8/3/2019 Weighting of the Fin1
http://slidepdf.com/reader/full/weighting-of-the-fin1 1/10
Weighting of the fine structure for the perception of monosyllabic
speech stimuli in the presence of noise
Introduction:
Information in speech is redundant. For normal-hearing subjects, this means that the signal is robust to corruption, and that speech remains intelligible under adverse listening conditions,
such as in high levels of background noise.
In the normal auditory system, a complex sound like speech is filtered into frequency channels on the basilar membrane. The signal at a given place can be considered as a time-varying envelope superimposed on the more rapid fluctuations of a carrier (temporal fine structure, TFS) whose rate depends partly on the center frequency and bandwidth of the channel; this fine structure is important for the perception of speech in noise. The relative envelope magnitude across channels conveys information about the spectral shape of the signal, and changes in the relative envelope magnitude indicate how the short-term spectrum changes over time, which plays a key role in the perception of speech in quiet.
The TFS carries information both about the fundamental frequency (F0) of the sound
(when it is periodic) and about its short-term spectrum.
The bandpass signal at a specific place on the basilar membrane (or the signal produced by bandpass filtering to simulate the waveform at one place on the basilar membrane) can be analyzed using the Hilbert transform to create what is called the "analytic signal" (Bracewell, 1986).
In the mammalian auditory system, phase locking tends to break down for frequencies above 4–5 kHz (Palmer and Russell, 1986), so it is generally assumed that TFS information is not used for frequencies above that limit. The role of TFS in speech perception for frequencies below 5 kHz remains somewhat unclear.
The upper limit of phase locking in humans is not known. Although TFS in the stimulus on the basilar membrane is present up to the highest audible frequencies, this paper is especially concerned with TFS information as represented in the patterns of phase locking in the auditory
nerve. This information probably weakens at high frequencies, and so one way of exploring the
use of TFS information is to examine changes in performance on various tasks as a function of
frequency.
Many studies have assessed the relative importance of TFS and envelope information for speech intelligibility in normal-hearing subjects. The challenge inherent in evaluating the individual contributions of frequency-specific (place) and temporally coded (temporal) cues to auditory perception typically arises from the difficulty of decomposing an auditory signal (such as speech) into a modulator (or envelope) and a carrier so that either can be "independently" altered, reduced, or replaced.
One such method involves decomposition of the signal by means of the Hilbert
transform. This method will be referred to as the Hilbert approach. Although it has several
variants, it can generally be described as follows. A priori, it is assumed that a broadband signal,
S(t), can be described as the sum of N modulated bands, S_n(t), such that

    S(t) = Σ_{n=1}^{N} S_n(t) = Σ_{n=1}^{N} m_n(t) c_n(t),    (1)
where m_n(t) and c_n(t) are, respectively, the modulator and the carrier in the nth band. In order to reduce possible confusion, the original modulator and carrier will always be referred to as m(t) and c(t), respectively. The computed envelope and phase (or temporal fine structure; TFS), defined later on, will always be referred to as a(t) and cos φ(t). From Eq. (1), it is clear that the modulator and the carrier could easily be manipulated separately. However, for an observed signal such as speech, m_n(t) and c_n(t) are unknown, and therefore must be determined. By
introducing Z_n(t), the analytic signal defined by

    Z_n(t) = S_n(t) + j H[S_n(t)],    (2)

where j = √(−1) and H[·] is the Hilbert transform, one can determine the Hilbert instantaneous amplitude, a_n(t), and the Hilbert instantaneous phase, φ_n(t), respectively given by

    a_n(t) = |Z_n(t)|  and  φ_n(t) = arg Z_n(t),

so that the original signal can be rewritten as

    S(t) = Σ_{n=1}^{N} a_n(t) cos φ_n(t).
It is commonly assumed that m_n(t) ≈ a_n(t) and c_n(t) ≈ cos φ_n(t), and thus one can manipulate the envelope and/or the fine structure independently and synthesize a modified version of the original signal.
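As a concrete illustration, the Hilbert decomposition described above can be sketched in Python with `scipy.signal.hilbert`; the band signal below is an arbitrary amplitude-modulated tone, not a stimulus from any cited study:

```python
import numpy as np
from scipy.signal import hilbert

# Illustrative band-limited signal: a 1-kHz carrier with a slow AM envelope.
fs = 16000                        # sampling rate in Hz (arbitrary choice)
t = np.arange(0, 0.05, 1 / fs)
band = (1 + 0.5 * np.sin(2 * np.pi * 40 * t)) * np.cos(2 * np.pi * 1000 * t)

z = hilbert(band)                 # analytic signal Z_n(t)
a = np.abs(z)                     # Hilbert instantaneous amplitude a_n(t)
phi = np.angle(z)                 # Hilbert instantaneous phase phi_n(t)

# a_n(t) * cos(phi_n(t)) recovers the band signal to numerical precision.
recon = a * np.cos(phi)
```

Manipulating `a` or `cos(phi)` separately before recombining is exactly the kind of operation the Hilbert approach permits.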
Several recent studies, however, suggest that the Hilbert approach may be inappropriate for decomposing complex signals such as speech. It should be noted that this restriction is limited to those situations where the envelope and/or the fine structure are manipulated (e.g., filtered) prior to being added back together to synthesize a new signal.
Ghitza (2001) first suggested that part of the original envelope information can be recovered
from the Hilbert fine structure at the output of the auditory filters. The intelligibility of TFS-speech may thus be influenced by reconstructed E cues. The reconstructed envelope cues contribute to the intelligibility of TFS-speech,
even though the envelope cues alone are not sufficient to give good intelligibility. The fact that
learning is required to achieve high intelligibility with TFS-speech may indicate that the auditory
system normally uses TFS cues in conjunction with envelope cues; when envelope cues are
minimal, TFS information may be difficult to interpret. Alternatively, the learning may reflect the fact that TFS cues are distorted in TFS-speech (relative to unprocessed speech), and it
may require some training to overcome the effects of the distortion.
Several behavioral (Zeng et al., 2004; Gilbert and Lorenzi, 2006) and neurophysiological (Heinz and Swaminathan, 2009) studies have since confirmed that envelopes derived from the TFS can produce good speech intelligibility. In the behavioral studies, normal-hearing (NH) listeners were presented with the TFS of speech stimuli or with a series of noise or tone carriers amplitude-modulated by the recovered envelopes.
In the latter case, a technique similar to vocoder processing (Shannon et al., 1995) was used, and the recovered envelopes corresponded to the outputs of a bank of gammachirp auditory filters (Irino and Patterson, 1997) in response to the original speech fine structure.
"Vocoder" processing has been used to remove TFS information from speech, thus allowing speech intelligibility based on envelope and spectral cues to be measured (Dudley, 1939; Van Tasell et al., 1987; Shannon et al., 1995).
A speech signal is filtered into a number of channels (N), and the envelope of each channel signal is used to modulate a carrier signal, typically a noise (for a noise vocoder) or a sine wave with a frequency equal to the channel center frequency (for a tone vocoder). The modulated signal for each channel is filtered to restrict the bandwidth to the original channel bandwidth, and the modulated signals from all channels are then combined. For a single talker, provided that N is sufficiently large, the resulting signal is highly intelligible to both normal-hearing and hearing-impaired subjects (Shannon et al., 1995; Turner et al., 1995; Baskent, 2006;
Lorenzi et al., 2006b). However, if the original signal includes both a target talker and a
background sound, intelligibility is greatly reduced, even for normal-hearing subjects (Dorman et
al., 1998; Fu et al., 1998; Qin and Oxenham, 2003; Stone and Moore, 2003), leading to the
suggestion that TFS information may be important for separation of a talker and background into
separate auditory streams (Friesen et al., 2001).
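The tone-vocoder steps just described (bandpass analysis, envelope extraction, modulation of a sine carrier at the channel centre frequency, band-limiting, and summation) can be sketched as follows; the band edges, filter order, and Hilbert-based envelope extraction are illustrative assumptions rather than the parameters of any cited study:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tone_vocode(x, fs, edges):
    """Minimal tone-vocoder sketch: filter into bands, extract each band's
    Hilbert envelope, and re-impose it on a sine at the band centre frequency."""
    out = np.zeros_like(x)
    t = np.arange(len(x)) / fs
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))              # envelope of this channel
        fc = np.sqrt(lo * hi)                    # geometric centre frequency
        carrier = np.cos(2 * np.pi * fc * t)     # tone carrier
        out += sosfiltfilt(sos, env * carrier)   # restrict to channel bandwidth
    return out

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
x = np.sin(2 * np.pi * 500 * t) * (1 + 0.8 * np.sin(2 * np.pi * 8 * t))
edges = [100, 300, 700, 1500, 3100, 6300]        # illustrative band edges (Hz)
y = tone_vocode(x, fs, edges)
```

A noise vocoder would substitute a band of noise for the tone carrier in each channel.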
Zeng et al. (2004) found up to 40% correct performance for sentences, and Gilbert and Lorenzi (2006) found up to 60% correct performance for consonants.
Gilbert and Lorenzi (2006) also showed that performance decreases with increasing number of
analysis bands. The authors attributed the effect of the number of bands to the ratio between the bandwidth of the analysis filters and that of the auditory filters. They also concluded that
consonant identification is essentially abolished when the bandwidth of the analysis filters is less
than or equal to four times the bandwidth of normal auditory filters.
India is multicultural and multilingual, and most of its speakers are bilingual; Indian English differs from the English of other countries because of this multilingualism. There is therefore a dearth of knowledge about the extent to which the low frequencies contribute to speech intelligibility in Indian English, and how much low-frequency hearing preservation is needed for the perception of speech in noise has not been explored.
If this information were known, it could serve as an outcome measure prior to implantation. Hence, the present study addresses the weighting of the fine structure for the perception of monosyllabic speech stimuli in noise, i.e., determining whether high-frequency or low-frequency fine structure information is required for speech perception in noise.
Methodology:
Subjects :
Age range: 18 to 25 years (young adults)
Control group: normal-hearing individuals (normal hearing as per ANSI criteria).
All should have normal hearing, defined as audiometric thresholds of 20 dB HL (hearing level) or better at octave frequencies between 250 and 8000 Hz, together with normal immittance measures and histories consistent with normal hearing.
Experimental group: individuals with moderately severe sensorineural hearing loss (postlingual deafness).
The hearing-impaired subjects were selected to have "flat" moderate hearing losses, and they were divided into two groups: young (n = 7; mean age = 24; range: 18–25) and elderly (n = 7; mean age = 68; range: 63–72), because there is some evidence that the ability to use TFS decreases with increasing age.
Air-conduction, bone-conduction, and immittance audiometry for the hearing-impaired subjects were consistent with sensorineural impairment. The origin of the hearing loss was unknown for all elderly subjects and was either congenital or hereditary for the young ones. All impaired subjects had been fitted with a hearing aid on the tested ear for at least 9 years.
Number of subjects: as many as possible within the time constraints of the data collection.
All subjects were fully informed about the goal of the present study and provided
written consent before their participation.
Stimuli to be used:
To overcome bias arising from differences in the subjects' semantic knowledge, and to check the efficiency of the technology, the speech material in the study consisted of 50 ISHA PB words.
All the PB words will be produced by a female and a male speaker and recorded using an SLM and Adobe Audition at a 44,100-Hz sampling rate, in a sound-proof booth, onto a laptop.
Instruments to be used:
MATLAB 2010a for signal processing.
GSI 61 (dual channel) for presenting stimuli.
Stimuli Synthesis
Phase I :
Stimuli: Speech signals will be digitized (16-bit resolution) at a 44.1-kHz sampling frequency; they will then be band-pass filtered using Butterworth filters (72 dB/oct rolloff) into critical bands based on the Greenwood frequency-position function, spanning the range 80–8,020 Hz. The bands will be less than two times as wide as the "normal" auditory filters (44), and probably comparable to the widths of the auditory filters of the impaired subjects, thus ensuring that recovered E cues would be minimal for both groups of subjects.
The use of these analysis bands also ensured that the amount of spectral
information provided by the E stimuli was similar for the normal-hearing and
hearing-impaired subjects (ref: Gilbert G, Lorenzi C (2006) J Acoust Soc Am 119:2438–2444).
These bandpass filtered signals were then processed in three ways.
In the first (referred to as "intact"), the signals were summed over all frequency bands. These signals contained both TFS and E information.
In the second (referred to as "E"), the envelope was extracted in each frequency band using the Hilbert transform followed by lowpass filtering with a Butterworth filter (cutoff frequency = 64 Hz, 72 dB/oct rolloff).
The filtered envelope was used to amplitude modulate a sine wave with a
frequency equal to the centre frequency of the band, and with random
starting phase.
The 16 amplitude-modulated sine waves were summed over all frequency bands. These stimuli contained only E information.
In the third (referred to as "TFS"), the Hilbert transform was used to decompose the signal in each frequency band into its E and TFS components.
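For a single analysis band, the E processing above can be sketched as follows; a 72 dB/oct Butterworth rolloff corresponds to a 12th-order filter (6 dB/oct per order), and the band signal here is an illustrative stand-in for one filtered speech channel:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 44100                                   # sampling rate used in the study
t = np.arange(0, 0.1, 1 / fs)
fc = 1000.0                                  # illustrative channel centre frequency
band = (1 + 0.5 * np.sin(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * fc * t)

z = hilbert(band)
env = np.abs(z)                              # Hilbert envelope (E)
tfs = np.cos(np.angle(z))                    # fine structure (TFS), unit amplitude

# 64-Hz lowpass, 12th-order Butterworth (~72 dB/oct), applied zero-phase.
sos = butter(12, 64, btype="lowpass", fs=fs, output="sos")
env_lp = sosfiltfilt(sos, env)

# Re-impose the smoothed envelope on a tone at the channel centre frequency,
# with a random starting phase, as in the E condition.
rng = np.random.default_rng(0)
phase = rng.uniform(0, 2 * np.pi)
e_channel = env_lp * np.cos(2 * np.pi * fc * t + phase)
```

Summing such channel signals over all bands yields the E stimuli; keeping `tfs` with a flat amplitude instead yields the TFS stimuli.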
Procedure
All stimuli were delivered monaurally to the right ear via TDH 39 headphones.
The stimuli were presented to the normal-hearing subjects at a level of SRS, and to the hearing-impaired subjects at a level ensuring that the stimuli were audible and comfortably loud.
Condition I:
Each individual will be presented with PB words both with and without noise.
Condition II:
Each individual will be presented with a processed signal in which the TFS information of at least one band of frequencies has been removed, each time in ascending (low to high) order of the eliminated band.
Response:
Oral repetition of the word would be expected.
Scoring:
A score of '0' for every wrong repetition and '1' for every correct repetition would be allotted.
ROL:
Band width:
The function described in 1961 (Greenwood, 1961b), fitted to the critical-band data of Fletcher (1940, 1953) and Zwicker et al. (1957), hypothesized that critical bandwidth, in Hz, might follow an exponential function of distance, x (in any physical units or normalized distance), along the cochlear partition:

    CB = c·10^(ax) + h,

so that critical bands correspond to a constant distance on the basilar membrane. The frequency-position function obtained by integrating this critical-band function (see Fig. 1) is:
    F = A(10^(ax) − k),

where F is in Hz and x is in mm, and where suitable constants (for man) are A = 165 and a = 0.06, the latter an empirical constant arising in the critical-band function but found also to agree closely with the logarithmic slope of Békésy's volume-compliance gradient for the human cochlear partition; and k, an integration constant left here at the original value 1, but that may sometimes be better replaced by a number from about 0.8 to 0.9 to set a lower frequency limit dictated by convention or by the best fit to data.
Although the value k = 0.88 would yield the conventional lower frequency limit of 20 Hz for man, I will continue to use 1.0 for man and most of the other species in this paper, except for the cat, since Liberman (1982) found that a k of 0.8 best adjusts this function to his low-frequency data points in the cat.
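As a sketch, the frequency-position function with the human constants quoted above (A = 165, a = 0.06 with x in mm, k = 1) can be used to place analysis-band edges at equal cochlear distances; the specific positions and number of bands below are illustrative choices, not the study's exact parameters:

```python
import numpy as np

def greenwood(x_mm, A=165.0, a=0.06, k=1.0):
    """Greenwood frequency-position function: F = A * (10**(a*x) - k), F in Hz."""
    return A * (10.0 ** (a * x_mm) - k)

# Equal steps in cochlear distance give exponentially spaced band edges.
x = np.linspace(2.86, 28.26, 17)     # illustrative positions (mm) for 16 bands
edges = greenwood(x)                 # spans roughly 80 Hz to 8 kHz
```

Because equal distances along the partition map to exponentially spaced frequencies, the resulting bands approximate critical-band spacing.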
No. of Bands:
Traditionally, the spectral magnitudes have been regarded as of primary importance for perception, although under some conditions the phases of the components play an important role (Moore, 2002).
As noted above, the bandpass signal at a given place on the basilar membrane can be analyzed using the Hilbert transform to create the "analytic signal" (Bracewell, 1986), which decomposes the time signal into its envelope (E; the relatively slow variations in amplitude over time) and temporal fine structure (TFS; the rapid oscillations with a rate close to the center frequency of the band).
Each filter was chosen to have a bandwidth of 1 ERB_N, where ERB_N stands for the equivalent rectangular bandwidth of the auditory filter as determined using young, normally hearing listeners at moderate sound levels (Glasberg and Moore, 1990; Moore, 2003). The suffix N denotes normal hearing.
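For reference, the Glasberg and Moore (1990) formula for ERB_N is ERB_N = 24.7 (4.37 F / 1000 + 1), with F the centre frequency in Hz; a minimal sketch:

```python
def erb_n(f_hz):
    """Equivalent rectangular bandwidth of the normal auditory filter
    (Glasberg and Moore, 1990), with centre frequency f_hz in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# At a 1-kHz centre frequency the normal auditory filter is ~133 Hz wide.
bw_1k = erb_n(1000.0)
```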
Traditionally, the envelope has been regarded as the most important carrier of information, at least for speech signals. Both E and TFS information are represented in the timing of neural discharges, although TFS information depends on phase locking to individual cycles of the stimulus waveform (Young and Sachs, 1979).
In most mammals, phase locking weakens for frequencies above 4–5 kHz, although some
useful phase locking information may persist for frequencies up to at least 10 kHz (Heinz et al.
2001).