The Benefit of Speech Enhancement to the
Hearing Impaired
N. Fink
Department of Bio-Medical
Engineering
Faculty of Engineering
Tel-Aviv University, ISRAEL
C. Muchnik
Department of Communications
Disorders
Faculty of Medicine
Tel-Aviv University, ISRAEL
M. Furst
School of Electrical Engineering
Faculty of Engineering
Tel-Aviv University, ISRAEL
Abstract—Most modern hearing aids include different types
of speech enhancement algorithms. Yet decreased speech
intelligibility in background noise remains a common
complaint of most hearing-impaired listeners, even when
speech enhancement algorithms are active. The hearing-aid
industry has generally adopted the algorithms that proved
most suitable for normal-hearing subjects. However, it is not
clear that an algorithm that benefits normal hearing will
also increase the intelligibility of the hearing impaired,
and vice versa.
We have recently developed a single-channel speech
enhancement technique based on an ear model that
incorporates outer-hair-cell functionality. The algorithm
was evaluated in a systematic speech intelligibility test of
Hebrew words. Hearing-impaired subjects, who used either a
hearing aid or a cochlear implant, demonstrated a significant
improvement in their performance with the algorithm.
Normal-hearing subjects, on the other hand, demonstrated no
improvement on the same task. We therefore suggest that
speech enhancement algorithms for the hearing impaired
should differ from those that benefit normal-hearing
subjects.
Keywords—speech in noise; cochlear reconstruction
algorithm; speech enhancement; speech intelligibility
I INTRODUCTION
Loss of hearing is a major health problem with
serious social implications. Many who have suffered a
hearing loss (HL) feel restricted socially and
professionally. One of the most common complaints
among patients with cochlear hearing loss is difficulty
understanding speech in a noisy environment, with or
without their hearing-assisting devices (hearing aid, HA, or
cochlear implant, CI). Current HAs work well in quiet
environments and provide the hearing impaired (HI) with
improved understanding of auditory signals, yet they are
less effective in noisy environments.
Speech Enhancement (SE) algorithms, often called
noise reduction (NR) algorithms, aim to improve sound
quality and speech recognition in noise. Single-channel
based SE algorithms for the HI are efficient in improving
speech recognition at positive SNRs. At positive SNRs
normal-hearing (NH) subjects have no difficulty in
recognizing speech and therefore do not need SE.
Multi-channel SE algorithms are more efficient than
single-channel ones, but their drawbacks include larger
HAs (such as behind-the-ear, BTE, devices) or the need for
binaural assisting devices.
The sound quality of SE algorithms has been broadly
evaluated, but few studies have focused on speech
recognition. Some of these studies evaluated the
recognition of NH listeners with single-channel SE
algorithms [1-3]. They reported an improvement in speech
recognition of up to 33% at an SNR of -5 dB with auditory
masked threshold in conjunction with noise suppression
(AMT-NS) [3]; a partial improvement with the AMT-NS
approach of up to 5% at SNRs of 0 dB and 5 dB and none at
an SNR of -5 dB [2]; or no improvement at all with four
families of SE algorithms (spectral subtractive, subspace,
statistical-model and Wiener-filter based) tested at SNRs
of 5 dB and 0 dB [1]. An evaluation of single-channel SE
algorithms with HA users reported a partial improvement
with the AMT-NS technique of 2% at SNRs of 5 dB, 0 dB and
-5 dB [2]. Studies of single-channel SE algorithms with CI
users reported improvements in sentence recognition of
8-21% at positive SNRs (0-9 dB) with spectral subtraction
[4] and of 20% at an SNR of 5 dB with the subspace
algorithm [5]. Alternatively, with a multi-channel SE
algorithm based on blind source separation, bilateral CI
recipients improved recognition by 40% at an SNR of
0 dB [6].
Recently, five promising algorithms for speech
enhancement were selected and implemented on a
common real-time hardware/software platform [7]. Two
SE algorithms were single-channel based (perceptually
optimized spectral subtraction and Wiener-filter-based
noise suppression) and three were multi-channel based
(broadband blind source separation; spatially
preprocessed speech-distortion-weighted multi-channel
Wiener filtering, MWF; and a binaural coherence
dereverberation filter). Listening tests were conducted by
different research groups at different sites, with NH and
bilaterally HI subjects with flat and sloping mild HL.
Three perceptual measures were used: speech reception
threshold (SRT), listening-effort scaling and preference
rating. In multitalker babble noise resembling an office
scenario (pseudo-diffuse), only one algorithm, the
spatially preprocessed speech-distortion-weighted
multi-channel Wiener filtering, provided an SRT
improvement (of 6-7 dB) relative to the unprocessed
condition.
To conclude, single-channel SE algorithms have not
demonstrated a persuasive speech recognition improvement
for the HI (HA users or CI recipients). Furthermore,
single-channel SE algorithms that may benefit a NH
subject at negative SNRs may not benefit a hearing-
impaired listener even at positive SNRs.
In the present work, a single-channel SE algorithm
based on an ear model was used to evaluate speech
recognition in a large number of normal-hearing subjects,
HA users and CI recipients.
II METHOD
A. Cochlear Representation Algorithm
The cochlear representation algorithm (CRA) is based
on a full computational model of the cochlea [8]. The
model relies less on heuristics (and parameter estimation)
than conventional algorithms that are not based on a
hearing model. The CRA integrates outer hair cells (OHC)
activity in a one-dimensional cochlear model. In response
to a word input to the cochlea, a representation of the
basilar membrane's velocity is displayed in color as a
function of time and distance from the stapes (Fig. 1).
Varying the OHC redundancy variable yields the response
of either a normal cochlea (Fig. 1, top panel) or a damaged
cochlea (Fig. 1, bottom panel). The dynamic properties of a
normal cochlear model output are used to reconstruct
noisy speech signals with improved SNR [9]. As displayed
in Fig. 2, the CRA represents the auditory signal
(Fig. 2b), distinguishes between noisy and noise-free
speech fragments in the input speech signal according to
threshold considerations (Fig. 2c), and applies a
corresponding masker (Fig. 2d). Reconstruction of the
masked signal yields an enhanced signal (Fig. 2f) compared
with the noisy input (Fig. 2a).
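The represent-detect-mask-reconstruct stages described above can be sketched as follows. This is a minimal illustration, not the study's implementation: `cochlear_model` and `inverse_model` are hypothetical stand-ins for the full cochlear simulation [8] and its inverse reconstruction [9], and the threshold value is illustrative.

```python
import numpy as np

def cra_enhance(noisy, cochlear_model, inverse_model, threshold_db=-40.0):
    """Sketch of the CRA stages: cochlear representation, speech
    detection by thresholding, masking, and reconstruction.
    `cochlear_model` and `inverse_model` are hypothetical stand-ins
    for the full cochlear simulation and its inverse."""
    # (b) time-place representation of the basilar-membrane response
    rep = cochlear_model(noisy)                    # (n_places, n_frames)
    # (c) flag time-place cells whose level exceeds the threshold
    level_db = 20.0 * np.log10(np.abs(rep) + 1e-12)
    speech_mask = level_db > threshold_db
    # (d, e) apply the binary masker, suppressing noise-only regions
    masked = rep * speech_mask
    # (f) reconstruct an enhanced time-domain signal
    return inverse_model(masked)
```

Any pair of forward/inverse transforms with matching shapes can be plugged in to experiment with the masking step in isolation.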
Figure 1: Representation of a normal (top) and damaged (bottom)
cochlea response to an input signal (word).
B. Psychoacoustical Experiments
1. Subjects
A total of 99 subjects participated in the
experiment, divided as follows: (1) NH; (2) HA users;
and (3) CI users. The number of participants in each
group, their mean age and their gender are summarized in
Table 1.
Figure 2: (a) Time-domain representation of the word shen at
SNR 0 dB. (b) Representation of the energy of the signal along the
cochlea as a function of time. (c) The algorithm progresses along the
time axis in narrow windows and identifies areas of speech (dark red).
(d) A mask with the shape of the identified speech is derived, masking
out areas that were not identified as speech. (e) The masker is applied
to the representation of the signal, which is then (f) reconstructed
back to the time domain.
All the HI subjects (subgroups (2) and (3)) were
binaurally impaired, and most used binaural assisting
devices. In the HA subgroup, nine subjects had a HA in
one ear only; the other ear was unassisted and totally deaf.
The CI subgroup included subjects who had a CI in at least
one ear: six had no assistance in the other ear, two had a
CI in both ears, and the rest had a HA in the other ear.
All the subjects were volunteers and received no
compensation. The HI subjects asked to participate in the
study after hearing a lecture about the potential of the
method.
                      HA users    CI users    Normal hearing
Subjects analyzed     33          22          41
Age range (yrs.)      23-64       14-82       17-52
Age (mean±std)        51±14       49±18       35±14
Gender (F, M)         14F, 19M    9F, 13M     11F, 30M
Table 1: Age and gender of the different groups participating in the
experimental task.
2. Word Database and Noise Types
The database consists of the Hebrew adaptation of the
AB list (HAB). The AB list, a set of monosyllabic real
words, comprises consonant-vowel-consonant (CVC)
words. The list was designed so that the different
phonemes of English are equally distributed
throughout the entire list [10]. The AB list is commonly
used in hearing tests because it reduces the effect of word
frequency and/or word familiarity on test scores.
Corresponding lists have been produced for other languages
and accents [11]. The HAB list was designed for native
Hebrew speakers and consists of 15 lists of 10 monosyllabic
words such as "sir" and "kir" [12]. A single female speaker
recorded the HAB list at a sampling rate of 44.1 kHz.
Gaussian white noise was added to the database at
various SNRs (0, 5, 10, 18, 24 and 30 dB). The clean and
noisy HAB lists were band-pass filtered between 500 Hz
and 8 kHz, and the filtered lists were applied to the CRA.
Eventually, each input word had a corresponding
reconstructed word. Altogether, the complete database
consisted of several HAB word sets (each comprising
the same 150 words) under various conditions: noise type
(white noise), noise level (SNRs of 0, 5, 10, 18, 24, 30 dB
and quiet) and treatment (CRA, none).
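The database construction above (adding white noise at a target SNR, then band-pass filtering between 500 Hz and 8 kHz) can be sketched as follows. The function names and the crude FFT-domain filter are illustrative assumptions, not the study's actual implementation, which is not described at this level of detail.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-plus-noise mixture has the
    requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def band_pass(signal, fs, lo=500.0, hi=8000.0):
    """Crude FFT-domain band-pass over the study's 500 Hz - 8 kHz band;
    a real implementation would use a proper filter design."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

With a 44.1 kHz recording, each clean word would be mixed at each of the listed SNRs and then filtered, yielding the noisy counterpart of every database entry.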
3. Procedure
Each subject underwent a standard audiometric test
procedure. The NH group consisted of subjects with
thresholds of up to 15 dB HL (pure-tone average, PTA, at
500, 1000 and 2000 Hz). The hearing of the patients with
HAs was assessed with and without their aid.
The psychoacoustical experiment was open-set word
recognition. Subjects were seated in a soundproof room,
wearing their HA/CI, in front of a loudspeaker. Words
randomly chosen from the recorded database were presented
to them. The level of the tested words was adjusted to the
most comfortable level (MCL).
Each subject was tested in 8 experimental sessions of
10-20 words each. Following each word, the subject was
asked to repeat or write the word he/she heard.
HI subjects were tested at SNR levels of 30, 24
and 18 dB, while the NH subjects were also tested at
lower SNRs, down to a minimum of 0 dB. HI subjects
were not able to perform the experiment at the very low
SNRs.
III RESULTS
Mean speech recognition scores of the 22 CI users are
depicted in Fig. 3 and of the 33 HA users in Fig. 4.
Figure 3: Mean speech recognition score in quiet and at different
noise levels for the CI users (n=22). White bars represent scores with
the CI alone; black bars represent scores with CI+CRA. Values above
the bars are mean speech recognition scores. Error bars are displayed.
* t-test, p<0.05.
For decreasing SNRs, speech intelligibility for the CI
users deteriorated both with and without the CRA: from 64%
in quiet to 29% at an SNR of 18 dB without treatment, and
from 57% to 46% with the CRA. Similarly for the HA users,
speech intelligibility deteriorated from 67% in quiet to
42% at an SNR of 18 dB without treatment, and from 66% to
53% with the CRA. The use of the CI or HA alone is
characterized by a gradual decline in performance as the
SNR decreases, reflecting the difficulty of understanding
speech as the energy of the competing noise increases. At
SNRs of 18, 24 and 30 dB (for CI) and at an SNR of 18 dB
(for HA), there was a significant improvement in
performance relative to the condition without the CRA
(t-test, p<0.05). A significant decline (t-test, p<0.05) in
word recognition with the CRA compared to without it was
found in quiet conditions for the CI users.
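The per-condition significance tests reported here compare the same subjects with and without the CRA, which calls for a paired t statistic. The helper below is a generic, standard-library sketch of that computation; the function name is ours and the study's per-subject scores are not reproduced here.

```python
import math
from statistics import mean, stdev

def paired_t(with_cra, without_cra):
    """Paired t statistic for per-subject recognition scores obtained
    with and without enhancement. A value beyond the critical t for
    n-1 degrees of freedom indicates a significant difference."""
    diffs = [a - b for a, b in zip(with_cra, without_cra)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), with the sample standard deviation
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

The resulting t would be compared against the two-tailed critical value for n-1 degrees of freedom to decide significance at p<0.05.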
Figure 4: Mean speech recognition score in quiet and at different
noise levels for the HA users (n=33). A significant improvement
(t-test, p=0.049) in word recognition with the CRA compared to without
it was found at an SNR of 18 dB.
The results of the NH group are depicted in Fig. 5.
Of the 41 normal-hearing subjects, 11 were tested at
high SNRs (similar to the HI groups) and reached a
ceiling effect. An additional 30 subjects were tested at
low SNRs (0, 5 and 10 dB). Both groups were tested in
quiet conditions and performed similarly with and without
the CRA (99% intelligibility). For decreasing SNRs, speech
intelligibility deteriorated with and without the CRA:
from 98% at an SNR of 30 dB to 53% at an SNR of 0 dB
without treatment, and from 95% to 50% with the CRA.
Figure 5: Mean speech recognition score for quiet and for different noise
levels for NH (n=41).
No significant difference in performance with
and without the CRA was found (0%, -3%, -4%, -4%,
+1%, 0% and -3% for quiet and SNRs of 30, 24, 18, 10, 5
and 0 dB, respectively, where negative values indicate
poorer performance with the CRA than without treatment).
These findings are consistent with comparative
intelligibility studies of single-microphone noise
reduction algorithms on NH listeners [1-2].
IV DISCUSSION
The psychoacoustical experiments revealed significantly
improved performance of HI subjects with the CRA, the speech
enhancement algorithm tested. No such improvement was
observed in NH listeners. It is therefore most likely that
NH and HI subjects use different strategies in processing
and recognizing noisy speech.
Subjects suffering from cochlear lesions who regularly
wear their HAs acquire new acoustic cues that potentially
reorganize their cerebral cortex, whereas similar subjects
who do not wear their HAs on a daily basis cannot acquire
such cues. Gatehouse [13] used the term "acclimatization
effect" to explain the difference in speech recognition
scores between the aided and un-aided ear.
We hypothesize that the groups' different recognition
performance stems from the different strategies they use
when processing the input from the cochlea. We assume that
the strategy used by NH listeners for speech recognition is
optimal use of the redundancy in the speech representation:
the speech can be recognized in different overlapping
spectral ranges, and the decision is based on one or more
frequency ranges. HI subjects who improved their speech
recognition with the CRA have probably become accustomed to
the distorted signal produced by their HA/CI, which
emphasizes certain frequency ranges and ignores others
where these subjects have no residual hearing. Other HI
subjects, who did not benefit from the CRA relative to
their HA or CI alone, probably still use the NH strategy of
attending to the whole frequency range of the speech
signal. Speech enhancement techniques reduce the competing
noise by estimating the noise spectrum and subtracting it
from the noisy speech signal. This generates a distorted
signal that emphasizes the part of the spectrum where the
noise is minimal but diminishes other parts of the speech
spectrum. A NH listener presented with the enhanced speech
cannot use his or her regular strategy, since some of the
speech spectrum is missing. On the other hand, the speech
enhancement algorithm supports the strategy used by some of
the HI, since it cleans and emphasizes the part of the
spectrum they perceive and use for recognition.
REFERENCES
[1] Y. Hu, P.C. Loizou. "A comparative intelligibility study of
single-microphone noise reduction algorithms". J. Acoust. Soc.
Am. 122, 3, 1777–1786, 2007.
[2] K.H. Arehart, J.H.L. Hansen, S. Gallant and L. Kalstein.
"Evaluation of an auditory masked threshold noise suppression
algorithm in normal-hearing and hearing-impaired listeners"
Speech communications 40: 575-592, 2003.
[3] D.E. Tsoukalas, J.N. Mourjopoulos, G. Kokkinakis. "Speech
enhancement based on audible noise suppression". IEEE Trans.
Speech Audio Process. 5: 497–513, 1997.
[4] L.P. Yang, Q.J. Fu. "Spectral subtraction-based speech
enhancement for cochlear implant patients in background noise".
J. Acoust. Soc. Am. 117,3 1001-1004, 2005.
[5] P.C. Loizou, A. Lobo, Y. Hu. "Subspace algorithms for noise
reduction in cochlear implants". J. Acoust. Soc. Am. 118, 2791–
2793, 2005.
[6] K. Kokkinakis, P.C. Loizou. "Using blind source separation
techniques to improve speech recognition in bilateral cochlear
implant patients". J. Acoust. Soc. Am. 123 (4), 2379-2390, 2008.
[7] H. Luts, K. Eneman, J. Wouters, et al. "Multicenter evaluation of
signal enhancement algorithms for hearing aids". J. Acoust. Soc.
Am. 127, 1491-1505. 2010.
[8] A. Cohen, M. Furst. "Integration of outer hair cell activity in a
one-dimensional cochlear model". J. Acoust. Soc. Am. 115, 5,
2185-2192, 2004.
[9] M. Furst, A. Cohen, V. Weisz. "Method apparatus and system for
processing acoustic signals". US patent #7,366,656. 2008
[10] A. Boothroyd. "Statistical theories of the speech discrimination
score". J. Acoust. Soc. Am., 43, 362-367, 1968.
[11] S.C. Purdy, B. Arlington, C. Johnstone. "Normative Data for the
New Zealand Recording of the CVC (Revised AB) Word List".
New Zealand Audiological Society, Bulletin 10(2):20-29, 2000.
[12] L. Kishon-Rabin, S. Patael, M. Menahemi, N. Amir. "Are the
perceptual effects of spectral smearing influenced by speaker
gender?" Journal of Basic and Clinical Physiology and
Pharmacology,15: 41-55, 2004.
[13] S. Gatehouse. "Apparent auditory deprivation effect of late onset:
the role of presentation level". J. Acoust. Soc. Am. 86, 2103-2106,
1989.