Contributions of energetic vs. informational masking in multitalker backgrounds
The Complexities of Understanding Speech in Background Noise
Stuart Rosen
UCL Speech, Hearing and Phonetic Sciences
First International Conference onCognitive Hearing Science for Communication
1A caveat about cognitionImportant aspects of this problem are not cognitive but Cognitive processing relies on adequate sensory representations, and can compensate for impoverished sensory representations.
2Why is this interesting?Most speech is not heard in quiet, anechoic conditions.People vary a lot in how well they can understand speech in the presence of other sounds.Effects of hearing impairmentEffects of ageAuditory processing disorder (APD)?3Some determinants of performance: IThe nature of the target speech materialPredictabilitycontextnumber of alternative utterances frequency of usagesize of lexical neighbourhoods
Some determinants of performance: IIThe configuration of the environmentOpen air or in a room?How dry is a room?effects of reverberationspatial separation between target and noiseor, the transmission system (e.g. mobile telephone)distortion & noise added by the system
Some determinants of performance: IIITalker characteristicsDifferent talkers vary considerably in intrinsic intelligibilityTalkers vary their own speech depending upon demands of the situation hyper/hypo distinction of Lindblom (1990)Match between talker and listener accentsSome determinants of performance: IVListener characteristicsLinguistic developmentvocabulary knowledgeability to use contextthe presence of language impairmentsL1 vs L2Hearing sensitivity and any hearing prosthesis usedNeuro-developmental disordersLanguage impairmentAutism spectrum disorderAPD
Some determinants of performance: VThe nature of the background noiseslevel (SNR)fluctuations in levelspectral characteristics genuine noise: aperiodic or periodic?and/or other talkershow many there arespeaking your own language or a language you dont knowHow attention-grabbing the background noises are
The simplest case:A steady-state background noise
9Much is understood about what makes one steady noise more or less interfering than another
spectral shapeSNR10Energetic maskingNoises interfere with speech to the extent that have energy in the same frequency regionsCan be quantified in the articulation indexReflects direct interaction of masker and speech in the cochlea, which acts as a frequency analyser.
11But noises are typically not steady
12
masker
Fluctuating maskers afford glimpses of the target signaltargetmaskerglimpses13dip listening or glimpsingPeople with normal hearing can listen in the dips of an amplitude modulated masker
SRT for VCVs in simple on/off fluctuations as a function of the duration of the fluctuation.
Howard-Jones & Rosen (1993) Acustica better performance14Dips can be limited in frequency (checkerboard noise)SRT for VCVs in 10 Hz modulations with different numbers of channels.
Howard-Jones & Rosen (1993) JASA
better performanceBut maskers can be periodic too, most importantly, when speech is in the background.
16Miller (1947)The masking of speechIt has been said that the best place to hide a leaf is in the forest, and presumably the best place to hide a voice is among other voices.17Listening to speech in noise
Bouncyin quiet in steady noise in modulated noise against another talker
Childrens Coordinate Response Measuredog_blue_8.wavduck_pink_9.wav
18A useful distinctionEnergetic maskingmaskers interfere with speech to the extent that have energy in the same time/frequency regionsprimarily reflecting direct interaction of masker and speech in the cochlearelevance of glimpsing/dip listeningTemporal and/or spectral dips in the masker allow glimpses of target speechInformational maskingeverything else!
19Informational maskingSomething to do with target/masker similarity?signal and masker are both audible but the listener is unable to disentangle the elements of the target speech from a similar-sounding distracter (Brungart, 2001)
20Informational masking: a finer distinction (Shin-Cunningham, 2008)Problems in object formation Related to auditory scene analysissimilarities in auditory properties make segregation difficultvoice pitch, timbre, rate
Problems in object selection Related to attention and distractionthe masker may distract attention from the targete.g., more interference from a known as opposed to a foreign language
2 men1 woman, 1 man
21EM & IM appear to operate at different parts in the auditory pathwayEnergetic masking at the periphery, in the cochleaEarly developing abilitiesIncreased EM from hearing impairmentUnlikely to be a factor in APDInformational masking at higher centres Late developing abilities?Increased IM in older listeners?Increased IM in developmental disorders?But aspects of IM can be made difficult by peripheral factorse.g., CI users difficulties with auditory scene analysis
22little glimpsing for CI usersNelson et al. (2003)speech-spectrum-shaped masking noise square-wave modulated added to IEEE sentences
normal listenersbetter performance CI usersnot only poor frequency selectivity, but lack of sensation of voice pitch (poor perception of TFS) makes auditory scene analysis difficult:How do you tell the noise from the speech?
better performance But IM can be excessive in the presence of normal hearing Children find it hard to ignore another talker
better performance62 children and 23 young adults26
Slow development of abilities that minimise IM better performance62 children and 23 young adults27Increased IM in Specific Language Impairment (SLI)
9 SLI & 10 TD children aged 6-10 years better performance steady noise ed speechCCRM sentences MSc work of Csaba Redey-Nagy28Increased IM in some people withHigh Functioning Autism (HFA)
CCRM sentences in various backgroundsPhD work of Katharine Mairevidence for a temporal processing deficit but not the crucial factor in excessive masking for speechcontrolbetter HFAworse HFA29
Increased IM in some people withHigh Functioning Autism (HFA)CCRM sentences in various backgroundsPhD work of Katharine MairHFA poor performers (and younger children) are highly susceptible to informational masking but what aspect?
ASA?attention?linguistic aspects?controlbetter HFAworse HFA30An ecologically valid test bed for evaluating the roles of EM and IM:
Speech in n-talker babble for n=1,2,3 talkersMiller (1947)Increasing the number of talkers in the masker
SNR (dB) +12 +6 0 -6 -12 -18 It is relatively easy for a listener to distinguish between two voices, but as the number of rival voices is increased the desired speech is lost in the general jabber.
target words from multiple males babble: equal numbers of m/f(1 VOICE is male)
better performance 32
IEEE sentences in n-talker babble What happens as n increases?glimpsing opportunities so EM linguistic content so IM (selection?)number of Fo contours so IM (ASA)
better performance 1-talker voice pitch source with envelopes derived from n-talker babble
1-talkerbabble-modulated 1-talker F0 (plus with an unmodulated envelope)2-talker16-talker
342-talker voice pitch source with envelopes derived from n-talker babble
1-talkerbabble-modulated 2-talker F0 (plus with an unmodulated envelope)2-talker16-talker
35Unintelligible maskers on noise-vocoded IEEE sentences
noise1 Fo contour2 Fo contoursPeriodicity in the maskers leads to better performance, probably through better ASA
Its easier to ignore a single F0 contour, rather than two
but ...
Why improved performance for steady-state vs 16-talker envelopes?
Worse still, why glimpsing in noise?! better performance36ANOVA interaction significant, but not mixed model with nTalkers as a continuous predictorFinal remarksThe balance of EM & IM effects presumably varies with the age and hearing status of the listener
The linguistic effects seen may represent a separate aspect of IM apart from object formation and selection.
Unraveling the contributions of various factors in understanding the masking of speech by other sounds is very important But very complex!
37Tack s mycket!Work supported by:UCL Speech, Hearing and Phonetic SciencesNational Institutes of Health DC006014Bloedel Hearing Research Center
Thanks to my collaborators:Sophie Scott, Katharine Mair, Tim Green, Csaba Redey-Nagy, Jude Barwell, Zoe Lyall & Arooj Majeed of UCLPam Souza, Northwestern U
38
Top Related