The Benefit of Speech Enhancement to the
Hearing Impaired
N. Fink
Department of Bio-Medical
Engineering
Faculty of Engineering
Tel-Aviv University, ISRAEL
C. Muchnik
Department of Communications
Disorders
Faculty of Medicine
Tel-Aviv University, ISRAEL
M. Furst
School of Electrical Engineering
Faculty of Engineering
Tel-Aviv University, ISRAEL
Abstract—Most modern hearing aids include different types
of speech enhancement algorithms. Yet decreased speech
intelligibility in background noise remains a common
complaint of most hearing-impaired listeners, even when
speech enhancement algorithms are active. The hearing-aid
industry has generally adopted the algorithms that proved
most suitable for normal-hearing subjects. However, it is not
clear that an algorithm that benefits normal hearing will
also increase the intelligibility of the hearing impaired,
and vice versa.
We have recently developed a single-channel speech
enhancement technique based on an ear model that
incorporates outer-hair-cell functionality. The algorithm
was evaluated in a systematic speech intelligibility test of
Hebrew words. Hearing-impaired subjects, who used either a
hearing aid or a cochlear implant, demonstrated a significant
improvement in their performance with the algorithm.
Normal-hearing subjects, on the other hand, demonstrated no
improvement on the same task. We therefore suggest that
speech enhancement algorithms for the hearing impaired
should differ from those that benefit normal-hearing
subjects.
Keywords—speech in noise; cochlear reconstruction
algorithm; speech enhancement; speech intelligibility
I INTRODUCTION
Loss of hearing is a major health problem with
serious social implications. Many who have suffered a
hearing loss (HL) feel restricted socially and
professionally. One of the most common complaints
among patients with cochlear hearing loss is difficulty
understanding speech in a noisy environment, with or
without their hearing-assisting devices (hearing aid, HA, or
cochlear implant, CI). Current HAs work well in quiet
environments and provide the hearing impaired (HI) with
improved understanding of auditory signals, yet they are
less effective in noisy environments.
Speech Enhancement (SE) algorithms, often called
noise reduction (NR) algorithms, aim to improve sound
quality and speech recognition in noise. Single-channel
based SE algorithms for the HI are efficient in improving
speech recognition at positive SNRs. At positive SNRs
normal-hearing (NH) subjects have no difficulty in
recognizing speech and therefore do not need SE.
Multi-channel SE algorithms are more efficient than
single-channel ones, but their drawbacks include larger
HAs (such as behind-the-ear, BTE, devices) or the need for
binaural assisting devices.
The sound quality of SE algorithms has been broadly
evaluated, but few studies have focused on speech
recognition. Some of these studies evaluated the
recognition of NH listeners with single-channel SE
algorithms [1-3]. They reported an improvement in speech
recognition of up to 33% at an SNR of -5 dB with auditory
masked threshold in conjunction with noise suppression
(AMT-NS) [3]; a partial improvement with the AMT-NS
approach of up to 5% at SNRs of 0 dB and 5 dB and none at
an SNR of -5 dB [2]; or no improvement at all with four
families of SE algorithms (spectral subtractive, subspace,
statistical-model and Wiener-filter based) tested at SNRs
of 5 dB and 0 dB [1]. An evaluation of single-channel SE
algorithms with HA users reported a partial improvement
with the AMT-NS technique of 2% at SNRs of 5 dB, 0 dB and
-5 dB [2]. Studies of single-channel SE algorithms with CI
users reported improvements in sentence recognition of
8-21% at positive SNRs (0-9 dB) with spectral subtraction
[4] and of 20% at an SNR of 5 dB with the subspace
algorithm [5]. Alternatively, with a multi-channel SE
algorithm based on blind source separation, bilateral CI
recipients improved recognition by 40% at an SNR of
0 dB [6].
Recently, five promising algorithms for speech
enhancement were selected and implemented on a
common real-time hardware/software platform [7]. Two
SE algorithms were single-channel based (perceptually
optimized spectral subtraction and Wiener-filter-based
noise suppression) and three were multi-channel based
(broadband blind source separation; spatially
preprocessed speech-distortion-weighted multi-channel
Wiener filtering, MWF; and a binaural coherence
dereverberation filter). Listening tests were conducted by
different research groups at different sites, with NH and
bilaterally HI subjects with flat and sloping mild HL.
Three perceptual measures were used: speech reception
threshold (SRT), listening-effort scaling and preference
rating. In multitalker babble noise resembling an office
scenario (pseudo-diffuse), only one algorithm, the
spatially preprocessed speech-distortion-weighted
multi-channel Wiener filtering, provided an SRT
improvement (of 6-7 dB) relative to the unprocessed
condition.
To conclude, single-channel SE algorithms have not
demonstrated a persuasive speech recognition improvement
for the HI (HA users or CI recipients). Furthermore,
single-channel SE algorithms that may benefit a NH
subject at negative SNRs may not benefit a hearing-
impaired listener even at positive SNRs.
In the present work, a single-channel SE algorithm
based on an ear model was used to evaluate speech
recognition in a large number of normal-hearing subjects,
HA users and CI recipients.
II METHOD
A. Cochlear Representation Algorithm
The cochlear representation algorithm (CRA) is based
on a full computational model of the cochlea [8]. The
model relies less on heuristics (and parameter estimation)
than conventional algorithms that are not based on a
hearing model. The CRA integrates outer hair cells (OHC)
activity in a one-dimensional cochlear model. In response
to a word input to the cochlea, a representation of the
basilar membrane's velocity is displayed in color as a
function of time and distance from the stapes (Fig. 1).
Varying the OHC redundancy variable yields the response
of either a normal cochlea (Fig. 1, top panel) or a damaged
cochlea (Fig. 1, bottom panel). The dynamic properties of a
normal cochlear model output are used to reconstruct
noisy speech signals with improved SNR [9]. As displayed
in Fig. 2, the CRA represents the auditory signal
(Fig. 2b), distinguishes between noisy and noise-free
speech fragments in the input speech signal according to
threshold considerations (Fig. 2c), and applies a
corresponding masker (Fig. 2d). Reconstruction of the
masked signal yields an enhanced signal (Fig. 2f) compared
with the noisy input (Fig. 2a).
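The represent-detect-mask-reconstruct stages described above can be sketched as follows. This is a minimal illustration, not the study's implementation: `cochlear_model` and `inverse_model` are hypothetical stand-ins for the full cochlear simulation [8] and its inverse reconstruction [9], and the threshold value is illustrative.

```python
import numpy as np

def cra_enhance(noisy, cochlear_model, inverse_model, threshold_db=-40.0):
    """Sketch of the CRA stages: cochlear representation, speech
    detection by thresholding, masking, and reconstruction.
    `cochlear_model` and `inverse_model` are hypothetical stand-ins
    for the full cochlear simulation and its inverse."""
    # (b) time-place representation of the basilar-membrane response
    rep = cochlear_model(noisy)                    # (n_places, n_frames)
    # (c) flag time-place cells whose level exceeds the threshold
    level_db = 20.0 * np.log10(np.abs(rep) + 1e-12)
    speech_mask = level_db > threshold_db
    # (d, e) apply the binary masker, suppressing noise-only regions
    masked = rep * speech_mask
    # (f) reconstruct an enhanced time-domain signal
    return inverse_model(masked)
```

Any pair of forward/inverse transforms with matching shapes can be plugged in to experiment with the masking step in isolation.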
Figure 1: Representation of a normal (top) and damaged (bottom)
cochlea response to an input signal (word).
B. Psychoacoustical Experiments
1. Subjects
A total of 99 subjects participated in the
experiment, divided as follows: (1) NH; (2) HA users;
and (3) CI users. The number of participants in each
group, their mean age and their gender are summarized in
Table 1.
Figure 2: (a) Time-domain representation of the word shen at
SNR 0 dB. (b) Representation of the energy of the signal along the
cochlea as a function of time. (c) The algorithm progresses along the
time axis in narrow windows and identifies areas of speech (dark red).
(d) A mask with the shape of the identified speech is derived, masking
out areas that were not identified as speech. (e) The masker is applied
to the representation of the signal, which is then (f) reconstructed
back to the time domain.
All the HI subjects (subgroups (2) and (3)) were
binaurally impaired, and most used binaural assisting
devices. In the HA subgroup, nine subjects had a HA in
one ear only; the other ear was unassisted and totally deaf.
The CI subgroup included subjects who had a CI in at least
one ear: six had no assistance in the other ear, two had a
CI in both ears, and the rest had a HA in the other ear.
All the subjects were volunteers and received no
compensation. The HI subjects asked to participate in the
study after hearing a lecture about the potential of the
method.
                      HA users    CI users    Normal hearing
Subjects analyzed     33          22          41
Age range (yrs.)      23-64       14-82       17-52
Age (mean±std)        51±14       49±18       35±14
Gender (F, M)         14F, 19M    9F, 13M     11F, 30M
Table 1: Age and gender of the different groups participating in the
experimental task.
2. Word Database and Noise Types
The database consists of the Hebrew adaptation of the
AB list (HAB). The AB list, a set of monosyllabic real
words, comprises consonant-vowel-consonant (CVC)
words. The list was designed so that the different
phonemes of English are equally distributed
throughout the entire list [10]. The AB list is commonly
used in hearing tests because it reduces the effect of word
frequency and/or word familiarity on test scores.
Corresponding lists have been produced for other languages
and accents [11]. The HAB list was designed for native
Hebrew speakers and consists of 15 lists of 10 monosyllabic
words such as "sir" and "kir" [12]. A single female speaker
recorded the HAB list at a sampling rate of 44.1 kHz.
Gaussian white noise was added to the database at
various SNRs (0, 5, 10, 18, 24 and 30 dB). The clean and
noisy HAB lists were band-pass filtered between 500 Hz
and 8 kHz, and the filtered lists were applied to the CRA.
Eventually, each input word had a corresponding
reconstructed word. Altogether, the complete database
consisted of several HAB word sets (each comprising
the same 150 words) under various conditions: noise type
(white noise), noise level (SNRs of 0, 5, 10, 18, 24, 30 dB
and quiet) and treatment (CRA, none).
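The database construction above (adding white noise at a target SNR, then band-pass filtering between 500 Hz and 8 kHz) can be sketched as follows. The function names and the crude FFT-domain filter are illustrative assumptions, not the study's actual implementation, which is not described at this level of detail.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-plus-noise mixture has the
    requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def band_pass(signal, fs, lo=500.0, hi=8000.0):
    """Crude FFT-domain band-pass over the study's 500 Hz - 8 kHz band;
    a real implementation would use a proper filter design."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

With a 44.1 kHz recording, each clean word would be mixed at each of the listed SNRs and then filtered, yielding the noisy counterpart of every database entry.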
3. Procedure
Each subject underwent a standard audiometric test
procedure. The NH group consisted of subjects with
thresholds of up to 15 dB HL (pure-tone average, PTA, at
500, 1000 and 2000 Hz). The hearing of the patients with
HAs was assessed with and without their aid.
The psychoacoustical experiment was open-set word
recognition. Subjects were seated in a soundproof room,
wearing their HA/CI, in front of a loudspeaker. Words
randomly chosen from the recorded database were presented
to them. The level of the tested words was adjusted to the
most comfortable level (MCL).
Each subject was tested in 8 experimental sessions of
10-20 words each. Following each word, the subject was
asked to repeat or write the word he/she heard.
HI subjects were tested at SNR levels of 30, 24
and 18 dB, while the NH subjects were also tested at
lower SNRs, down to a minimum of 0 dB. HI subjects
were not able to perform the experiment at the very low
SNRs.
III RESULTS
Mean speech recognition scores of the 22 CI users are
depicted in Fig. 3 and of the 33 HA users in Fig. 4.
Figure 3: Mean speech recognition score in quiet and at different
noise levels for the CI users (n=22). White bars represent scores with
the CI alone; black bars represent scores with CI+CRA. Values above
the bars are mean speech recognition scores. Error bars are displayed.
* t-test, p<0.05.
For decreasing SNRs, speech intelligibility for the CI
users deteriorated both with and without the CRA: from 64%
in quiet to 29% at an SNR of 18 dB without treatment, and
from 57% to 46% with the CRA. Similarly for the HA users,
speech intelligibility deteriorated from 67% in quiet to
42% at an SNR of 18 dB without treatment, and from 66% to
53% with the CRA. The use of the CI or HA alone is
characterized by a gradual decline in performance as the
SNR decreases, reflecting the difficulty of understanding
speech as the energy of the competing noise increases. At
SNRs of 18, 24 and 30 dB (for CI) and at an SNR of 18 dB
(for HA), there was a significant improvement in
performance relative to the condition without the CRA
(t-test, p<0.05). A significant decline (t-test, p<0.05) in
word recognition with the CRA compared to without it was
found in quiet conditions for the CI users.
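The per-condition significance tests reported here compare the same subjects with and without the CRA, which calls for a paired t statistic. The helper below is a generic, standard-library sketch of that computation; the function name is ours and the study's per-subject scores are not reproduced here.

```python
import math
from statistics import mean, stdev

def paired_t(with_cra, without_cra):
    """Paired t statistic for per-subject recognition scores obtained
    with and without enhancement. A value beyond the critical t for
    n-1 degrees of freedom indicates a significant difference."""
    diffs = [a - b for a, b in zip(with_cra, without_cra)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), with the sample standard deviation
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

The resulting t would be compared against the two-tailed critical value for n-1 degrees of freedom to decide significance at p<0.05.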
Figure 4: Mean speech recognition score in quiet and at different
noise levels for the HA users (n=33). A significant improvement
(t-test, p=0.049) in word recognition with the CRA compared to without
it was found at an SNR of 18 dB.
The results of the NH group are depicted in Fig. 5.
Of the 41 normal-hearing subjects, 11 were tested at
high SNRs (similar to the HI groups) and reached a
ceiling effect. An additional 30 subjects were tested at
low SNRs (0, 5 and 10 dB). Both groups were tested in
quiet conditions and performed similarly with and without
the CRA (99% intelligibility). For decreasing SNRs, speech
intelligibility deteriorated with and without the CRA:
from 98% at an SNR of 30 dB to 53% at an SNR of 0 dB
without treatment, and from 95% to 50% with the CRA.
Figure 5: Mean speech recognition score for quiet and for different noise
levels for NH (n=41).
No significant difference in performance with
and without the CRA was found (0%, -3%, -4%, -4%,
+1%, 0% and -3% for quiet and SNRs of 30, 24, 18, 10, 5
and 0 dB, respectively, where negative values indicate
poorer performance with the CRA than without treatment).
These findings are consistent with comparative
intelligibility studies of single-microphone noise
reduction algorithms on NH listeners [1-2].
IV DISCUSSION
The psychoacoustical experiments revealed significantly
improved performance of HI subjects with the CRA, the speech
enhancement algorithm tested. No such improvement was
observed in NH listeners. It is therefore most likely that
NH and HI subjects use different strategies in processing
and recognizing noisy speech.
Subjects suffering from cochlear lesions who regularly
wear their HAs acquire new acoustic cues that potentially
reorganize their cerebral cortex, whereas similar subjects
who do not wear their HAs on a daily basis cannot acquire
such cues. Gatehouse [13] used the term "acclimatization
effect" to explain the difference in speech recognition
scores between the aided and un-aided ear.
We hypothesize that the groups' different recognition
performance stems from the different strategies they use
when processing the input from the cochlea. We assume that
the strategy used by NH listeners for speech recognition is
optimal use of the redundancy in the speech representation:
the speech can be recognized in different overlapping
spectral ranges, and the decision is based on one or more
frequency ranges. HI subjects who improved their speech
recognition with the CRA have probably become accustomed to
the distorted signal produced by their HA/CI, which
emphasizes certain frequency ranges and ignores others
where these subjects have no residual hearing. Other HI
subjects, who did not benefit from the CRA relative to
their HA or CI alone, probably still use the NH strategy of
attending to the whole frequency range of the speech
signal. Speech enhancement techniques reduce the competing
noise by estimating the noise spectrum and subtracting it
from the noisy speech signal. This generates a distorted
signal that emphasizes the part of the spectrum where the
noise is minimal but diminishes other parts of the speech
spectrum. A NH listener presented with the enhanced speech
cannot use his or her regular strategy, since some of the
speech spectrum is missing. On the other hand, the speech
enhancement algorithm supports the strategy used by some of
the HI, since it cleans and emphasizes the part of the
spectrum they perceive and use for recognition.
REFERENCES
[1] Y. Hu, P.C. Loizou. "A comparative intelligibility study of
single-microphone noise reduction algorithms". J. Acoust. Soc.
Am. 122, 3, 1777–1786, 2007.
[2] K.H. Arehart, J.H.L. Hansen, S. Gallant and L. Kalstein.
"Evaluation of an auditory masked threshold noise suppression
algorithm in normal-hearing and hearing-impaired listeners"
Speech communications 40: 575-592, 2003.
[3] D.E. Tsoukalas, J.N. Mourjopoulos, G. Kokkinakis. "Speech
enhancement based on audible noise suppression". IEEE Trans.
Speech Audio Process. 5: 497–513, 1997.
[4] L.P. Yang, Q.J. Fu. "Spectral subtraction-based speech
enhancement for cochlear implant patients in background noise".
J. Acoust. Soc. Am. 117,3 1001-1004, 2005.
[5] P.C. Loizou, A. Lobo, Y. Hu. "Subspace algorithms for noise
reduction in cochlear implants". J. Acoust. Soc. Am. 118, 2791–
2793, 2005.
[6] K. Kokkinakis, P.C. Loizou. "Using blind source separation
techniques to improve speech recognition in bilateral cochlear
implant patients". J. Acoust. Soc. Am. 123 (4), 2379-2390, 2008.
[7] H. Luts, K. Eneman, J. Wouters, et al. "Multicenter evaluation of
signal enhancement algorithms for hearing aids". J. Acoust. Soc.
Am. 127, 1491-1505. 2010.
[8] A. Cohen, M. Furst. "Integration of outer hair cell activity in a
one-dimensional cochlear model". J. Acoust. Soc. Am. 115, 5,
2185-2192, 2004.
[9] M. Furst, A. Cohen, V. Weisz. "Method apparatus and system for
processing acoustic signals". US patent #7,366,656. 2008
[10] A. Boothroyd. "Statistical theories of the speech discrimination
score". J. Acoust. Soc. Am., 43, 362-367, 1968.
[11] S.C. Purdy, B. Arlington, C. Johnstone. "Normative Data for the
New Zealand Recording of the CVC (Revised AB) Word List".
New Zealand Audiological Society, Bulletin 10(2):20-29, 2000.
[12] L. Kishon-Rabin, S. Patael, M. Menahemi, N. Amir. "Are the
perceptual effects of spectral smearing influenced by speaker
gender?" Journal of Basic and Clinical Physiology and
Pharmacology,15: 41-55, 2004.
[13] S. Gatehouse. "Apparent auditory deprivation effect of late onset:
the role of presentation level". J. Acoust. Soc. Am. 86, 2103-2106,
1989.