
Contributions of energetic vs. informational masking in multitalker backgrounds

The Complexities of Understanding Speech in Background Noise

Stuart Rosen

UCL Speech, Hearing and Phonetic Sciences

First International Conference on Cognitive Hearing Science for Communication

A caveat about 'cognition'
Important aspects of this problem are not cognitive, but cognitive processing relies on adequate sensory representations, and can compensate for impoverished sensory representations.

Why is this interesting?
- Most speech is not heard in quiet, anechoic conditions.
- People vary a lot in how well they can understand speech in the presence of other sounds.
  - Effects of hearing impairment
  - Effects of age
  - Auditory processing disorder (APD)?

Some determinants of performance: I
The nature of the target speech material
- Predictability
  - context
  - number of alternative utterances
- frequency of usage
- size of lexical neighbourhoods

Some determinants of performance: II
The configuration of the environment
- Open air or in a room?
- How 'dry' is a room?
  - effects of reverberation
- spatial separation between target and noise
- or, the transmission system (e.g. mobile telephone)
  - distortion & noise added by the system

Some determinants of performance: III
Talker characteristics
- Different talkers vary considerably in intrinsic intelligibility
- Talkers vary their own speech depending upon the demands of the situation
  - hyper/hypo distinction of Lindblom (1990)
- Match between talker and listener accents

Some determinants of performance: IV
Listener characteristics
- Linguistic development
  - vocabulary knowledge
  - ability to use context
  - the presence of language impairments
- L1 vs L2
- Hearing sensitivity and any hearing prosthesis used
- Neuro-developmental disorders
  - Language impairment
  - Autism spectrum disorder
  - APD

Some determinants of performance: V
The nature of the background noises
- level (SNR)
- fluctuations in level
- spectral characteristics
- genuine noise: aperiodic or periodic?
- and/or other talkers
  - how many there are
  - speaking your own language or a language you don't know
- How attention-grabbing the background noises are

The simplest case: a steady-state background noise

Much is understood about what makes one steady noise more or less interfering than another

- spectral shape
- SNR

Energetic masking
- Noises interfere with speech to the extent that they have energy in the same frequency regions
- Can be quantified in the articulation index
- Reflects direct interaction of masker and speech in the cochlea, which acts as a frequency analyser.
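To make the idea of quantifying energetic masking concrete, here is a minimal sketch of an articulation-index style calculation. It is an illustration under stated assumptions, not the standardised procedure: the function name, the equal band weights, and the linear mapping of band SNR from -12 dB (no contribution) to +18 dB (full contribution) are all assumptions made for this example.

```python
def articulation_index(band_snrs_db, band_weights=None):
    """Simplified articulation-index style score in [0, 1].

    band_snrs_db: speech-to-noise ratio (dB) in each frequency band.
    band_weights: relative importance of each band (hypothetical values;
                  equal weights are assumed if none are given).
    """
    n = len(band_snrs_db)
    if n == 0:
        raise ValueError("at least one band is required")
    if band_weights is None:
        band_weights = [1.0] * n  # assumption: all bands equally important
    if len(band_weights) != n:
        raise ValueError("one weight per band is required")

    total_weight = sum(band_weights)
    score = 0.0
    for snr_db, weight in zip(band_snrs_db, band_weights):
        # Assumed mapping: -12 dB SNR -> 0 (band fully masked),
        # +18 dB SNR -> 1 (band fully audible), linear in between.
        audibility = min(1.0, max(0.0, (snr_db + 12.0) / 30.0))
        score += (weight / total_weight) * audibility
    return score


# A masker whose energy sits mainly in the lower bands:
print(articulation_index([-15.0, -6.0, 9.0, 20.0]))  # approximately 0.475
```

The sketch captures the point of the slide: only the overlap of masker and speech energy within each frequency band matters, so two noises at the same overall SNR can give different scores if their spectral shapes differ.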

But noises are typically not steady

Fluctuating maskers afford 'glimpses' of the target signal
[Figure labels: target, masker, glimpses]

'Dip listening' or 'glimpsing'
People with normal hearing can listen in the dips of an amplitude-modulated masker

SRT for VCVs in simple on/off fluctuations as a function of the duration of the fluctuation.

Howard-Jones & Rosen (1993) Acustica

Dips can be limited in frequency ('checkerboard' noise)
SRT for VCVs in 10 Hz modulations with different numbers of channels.

Howard-Jones & Rosen (1993) JASA
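The on/off and 'checkerboard' maskers can be pictured as a gating pattern over frequency channels and time. The sketch below is only an illustration of that pattern, not the stimulus-generation procedure of the cited studies: the function names, the frame rate, and the choice that adjacent channels alternate in phase are assumptions. With a single channel it reduces to a simple on/off square-wave modulation.

```python
def checkerboard_gate(n_channels, n_frames, mod_rate_hz=10.0, frame_rate_hz=1000.0):
    """Return gate[channel][frame]: 1 where the masker is on, 0 in a dip.

    Adjacent channels are gated in opposite phase (an assumption), so a dip
    in one channel coincides with noise in its neighbours.
    """
    half_period_frames = frame_rate_hz / (2.0 * mod_rate_hz)
    gate = []
    for channel in range(n_channels):
        row = []
        for frame in range(n_frames):
            half_cycle = int(frame // half_period_frames)  # index of current half-cycle
            row.append(1 if (half_cycle + channel) % 2 == 0 else 0)
        gate.append(row)
    return gate


def dip_fraction(gate):
    """Fraction of channel/time cells in which the masker is off."""
    cells = [value for row in gate for value in row]
    return cells.count(0) / len(cells)


# 10 Hz modulation, 4 channels, 200 ms at 1000 frames per second
gate = checkerboard_gate(n_channels=4, n_frames=200)
print(dip_fraction(gate))
```

To build an actual masker, each row would multiply a band of noise occupying that channel's frequency range; that filtering step is left out here.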

But maskers can be periodic too, most importantly when speech is in the background.

Miller (1947), The masking of speech
It has been said that the best place to hide a leaf is in the forest, and presumably the best place to hide a voice is among other voices.

Listening to speech in noise

Bouncy: in quiet; in steady noise; in modulated noise; against another talker

Children's Coordinate Response Measure (CCRM)
dog_blue_8.wav
duck_pink_9.wav

A useful distinction
- Energetic masking
  - maskers interfere with speech to the extent that they have energy in the same time/frequency regions
  - primarily reflecting direct interaction of masker and speech in the cochlea
  - relevance of glimpsing/dip listening
    - temporal and/or spectral dips in the masker allow 'glimpses' of target speech
- Informational masking
  - everything else!
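One way to picture energetic masking in time/frequency terms is to count the cells in which the target survives. The sketch below assumes that target and masker energies are already available on a common channel-by-frame grid, and it treats a cell as a glimpse when the local target-to-masker ratio exceeds a threshold. The 3 dB threshold, the function name, and the input layout are assumptions chosen for illustration.

```python
import math


def glimpse_proportion(target_energy, masker_energy, threshold_db=3.0):
    """Fraction of time/frequency cells where the target exceeds the masker.

    target_energy, masker_energy: nested lists [channel][frame] of
    non-negative energies on the same grid.
    threshold_db: local SNR a cell must exceed to count as a glimpse
                  (assumed value).
    """
    glimpsed = 0
    total = 0
    for target_row, masker_row in zip(target_energy, masker_energy):
        for target, masker in zip(target_row, masker_row):
            total += 1
            if target <= 0.0:
                continue  # nothing of the target to glimpse in this cell
            if masker <= 0.0:
                glimpsed += 1  # masker silent: the cell is a clean glimpse
                continue
            local_snr_db = 10.0 * math.log10(target / masker)
            if local_snr_db > threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


# Steady masker at the same energy as the target: no cell passes the threshold.
steady = glimpse_proportion([[1.0, 1.0, 1.0, 1.0]], [[1.0, 1.0, 1.0, 1.0]])
# Masker with deep dips in alternate cells: half the cells become glimpses.
dipped = glimpse_proportion([[1.0, 1.0, 1.0, 1.0]], [[10.0, 0.1, 10.0, 0.1]])
print(steady, dipped)
```

A measure of this kind says nothing about informational masking: two maskers can leave the same proportion of glimpses and still differ in how hard they are to ignore.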

Informational masking
- Something to do with target/masker similarity?
- signal and masker are both audible but the listener is unable to disentangle the elements of the target speech from a similar-sounding distracter (Brungart, 2001)

Informational masking: a finer distinction (Shinn-Cunningham, 2008)
- Problems in 'object formation'
  - Related to auditory scene analysis
  - similarities in auditory properties make segregation difficult
    - voice pitch, timbre, rate
- Problems in 'object selection'
  - Related to attention and distraction
  - the masker may distract attention from the target
    - e.g., more interference from a known as opposed to a foreign language

2 men
1 woman, 1 man

EM & IM appear to operate at different parts of the auditory pathway
- Energetic masking at the periphery, in the cochlea
  - Early developing abilities
  - Increased EM from hearing impairment
  - Unlikely to be a factor in APD
- Informational masking at higher centres
  - Late developing abilities?
  - Increased IM in older listeners?
  - Increased IM in developmental disorders?
- But aspects of IM can be made difficult by peripheral factors
  - e.g., CI users' difficulties with auditory scene analysis

Little glimpsing for CI users
Nelson et al. (2003)
Speech-spectrum-shaped masking noise, square-wave modulated, added to IEEE sentences
[Figure panels: normal listeners, CI users]
Not only poor frequency selectivity, but lack of sensation of voice pitch (poor perception of TFS) makes auditory scene analysis difficult: how do you tell the noise from the speech?

But IM can be excessive in the presence of normal hearing
Children find it hard to ignore another talker
62 children and 23 young adults

Slow development of abilities that minimise IM
62 children and 23 young adults

Increased IM in Specific Language Impairment (SLI)

9 SLI & 10 TD children aged 6-10 years
Conditions: steady noise, [...]ed speech
CCRM sentences
MSc work of Csaba Redey-Nagy

Increased IM in some people with High Functioning Autism (HFA)
CCRM sentences in various backgrounds
PhD work of Katharine Mair
Groups: control, better HFA, worse HFA
- Evidence for a temporal processing deficit, but not the crucial factor in excessive masking for speech
- HFA poor performers (and younger children) are highly susceptible to informational masking, but what aspect?
  - ASA?
  - attention?
  - linguistic aspects?

An ecologically valid test bed for evaluating the roles of EM and IM:

Speech in n-talker babble for n=1,2,3 talkers

Miller (1947)
Increasing the number of talkers in the masker
[Figure legend: SNR (dB) +12, +6, 0, -6, -12, -18]
It is relatively easy for a listener to distinguish between two voices, but as the number of rival voices is increased the desired speech is lost in the general jabber.
Target words from multiple males; babble: equal numbers of m/f (1 VOICE is male)
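Presenting a target in babble at a set of SNRs like those above comes down to scaling the masker relative to the target before adding the two. The following is a minimal sketch, assuming both signals are equal-length lists of samples and that SNR is defined on long-term RMS levels; the function names are hypothetical and this is not a description of how the original stimuli were prepared.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that target RMS / masker RMS equals snr_db, then add.

    Returns (mixture, masker_gain). The target level is left unchanged.
    """
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    masker_rms = rms(masker)
    if masker_rms == 0.0:
        raise ValueError("masker is silent; SNR is undefined")
    # Desired masker RMS is target RMS divided by 10^(SNR/20).
    gain = rms(target) / (masker_rms * 10.0 ** (snr_db / 20.0))
    mixture = [t + gain * m for t, m in zip(target, masker)]
    return mixture, gain


target = [1.0, -1.0, 1.0, -1.0]
masker = [2.0, 2.0, -2.0, -2.0]
for snr_db in (12, 6, 0, -6, -12, -18):
    _, gain = mix_at_snr(target, masker, snr_db)
    print(snr_db, round(gain, 3))
```

Note that a long-term RMS definition of SNR hides exactly what the fluctuating-masker results depend on: a 1-talker masker and a 16-talker babble at the same nominal SNR differ greatly in their momentary SNR.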


IEEE sentences in n-talker babble
What happens as n increases?
- fewer glimpsing opportunities, so more EM
- less linguistic content, so less IM (selection?)
- more F0 contours, so more IM (ASA)

1-talker voice pitch source with envelopes derived from n-talker babble
Babble-modulated 1-talker F0 (plus one with an unmodulated envelope)
Conditions: 1-talker, 2-talker, 16-talker

2-talker voice pitch source with envelopes derived from n-talker babble
Babble-modulated 2-talker F0 (plus one with an unmodulated envelope)
Conditions: 1-talker, 2-talker, 16-talker
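The general idea of imposing a babble-derived envelope on a periodic source can be sketched as follows. This is a loose illustration rather than the processing used for these stimuli: the envelope is taken by full-wave rectification and a moving average, a fixed-frequency sine stands in for the voice pitch source, and all function names and parameter values are assumptions.

```python
import math


def smoothed_envelope(signal, window):
    """Full-wave rectify, then smooth with a moving average of `window` samples."""
    rectified = [abs(v) for v in signal]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        envelope.append(running / min(i + 1, window))
    return envelope


def periodic_carrier(n_samples, f0_hz, sample_rate_hz):
    """Fixed-F0 sine used here as a stand-in for a voice pitch source."""
    return [math.sin(2.0 * math.pi * f0_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def impose_envelope(carrier, envelope):
    """Multiply the carrier by the envelope, sample by sample."""
    return [c * e for c, e in zip(carrier, envelope)]


# Toy 'babble': a burst followed by silence, so the envelope contains a dip.
babble = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
envelope = smoothed_envelope(babble, window=2)
masker = impose_envelope(periodic_carrier(len(babble), 100.0, 800.0), envelope)
print(envelope)
```

Replacing the envelope with a constant gives the unmodulated case, and swapping the sine for noise gives an aperiodic masker with the same fluctuations, which is the kind of comparison the conditions above are designed to make.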

Unintelligible maskers on noise-vocoded IEEE sentences

Maskers: noise; 1 F0 contour; 2 F0 contours
Periodicity in the maskers leads to better performance, probably through better ASA

It's easier to ignore a single F0 contour, rather than two

but ...

Why improved performance for steady-state vs 16-talker envelopes?

Worse still, why glimpsing in noise?!
ANOVA interaction significant, but not in a mixed model with nTalkers as a continuous predictor

Final remarks
The balance of EM & IM effects presumably varies with the age and hearing status of the listener

The linguistic effects seen may represent a separate aspect of IM apart from object formation and selection.

Unraveling the contributions of various factors in understanding the masking of speech by other sounds is very important, but very complex!

Tack så mycket! (Thank you very much!)

Work supported by:
- UCL Speech, Hearing and Phonetic Sciences
- National Institutes of Health DC006014
- Bloedel Hearing Research Center

Thanks to my collaborators:
- Sophie Scott, Katharine Mair, Tim Green, Csaba Redey-Nagy, Jude Barwell, Zoe Lyall & Arooj Majeed of UCL
- Pam Souza, Northwestern U
