Sound source segregation - UW Faculty Web Server


Transcript of Sound source segregation - UW Faculty Web Server

Page 1: Sound source segregation - UW Faculty Web Server

1

Sound source segregation

Development of the ability to separate concurrent sounds into auditory objects

So far we have no compelling explanation for the fact that infants and preschoolers have higher masked thresholds than adults. One potential explanation is that the immature auditory system has trouble segregating the signal from the noise-- in the same way that it might have trouble separating Mom’s voice from the sound coming from the TV. Here we consider what is known about the development of sound source segregation.

Page 2: Sound source segregation - UW Faculty Web Server

2

The problem…

[Figure: spectrogram schematic with frequency and time axes; cartoon listener saying “Wow! Psychophysics is interesting!”]

Just to remind you what we mean by sound source segregation: The idea is that multiple sound sources in the environment create sounds with overlapping spectra that are encoded by the ear. The brain somehow is able to figure out that some frequency components come from one source, others from another, and so on. We are then able to perceive the separate auditory objects in the environment.

Page 3: Sound source segregation - UW Faculty Web Server

3

Cues that adults use to segregate components into sources
- Spectral separation
- Spectral profile
- Harmonicity
- Spatial separation
- Temporal separation
- Temporal onsets and offsets
- Temporal modulations

There are several sorts of acoustic information that the auditory system could use to figure out which components belong together. Most studies of sound source segregation focus on identifying and manipulating acoustic cues thought to be responsible for sound source segregation. However, the auditory system rarely receives only one of these cues in isolation. Instead we have several cues available.

Page 4: Sound source segregation - UW Faculty Web Server

4

Measuring sound source segregation
- Auditory streaming
- “Thresholds” of sounds, segregated and not segregated
- Informational masking (indirect evidence)

There are three ways that sound source segregation has been studied developmentally. The first two of these are pretty direct measures-- auditory streaming, which I will review in more detail in the next slides; and the difference in thresholds for a sound when it can be segregated from competing sound and when it can’t be segregated. The third phenomenon here-- informational masking-- is one that is thought to involve sound source segregation, so I will review a little of what we know about its development as well. This is more indirect evidence, however, as informational masking isn’t all that well understood.

Page 5: Sound source segregation - UW Faculty Web Server

5

Auditory streaming

You might recall that in an auditory streaming experiment, we present a listener with two sounds and then try to figure out when the listener hears these sounds as one sound and when he hears them as two sounds. The sequence presented is illustrated in the top of this drawing. Two sequences, A and B, are presented, with the elements in each sequence alternating. When the listener hears these as a single sound (or sequence or stream), he reports that there is one sound that alternates from A to B. But if we make A and B more different in some way, the listener may then hear two separate sounds, a sequence of As and a sequence of Bs. These sounds will still be intermittent (not as shown in the bottom part of this picture), but the listener won’t be able to tell whether an A element precedes a B element or not. Now the listener hears two streams.

Page 6: Sound source segregation - UW Faculty Web Server

6

A single sound source is perceived

This is easier to understand with an example. Here we have two sequences of tones of different frequency. In the A sequence the tones are spaced further apart in time than they are in the B sequence. When you hear the sequences together, it sounds like one “galloping” sound and you can hear that they go high-low-low-high-- you can tell the order.
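The galloping stimulus described here can be sketched in code. This is an illustrative synthesis only: the frequencies, tone durations, gaps, and sample rate below are assumptions for the sketch, not parameters from the studies discussed.

```python
import numpy as np

def tone(freq_hz, dur_s, sr=16000):
    """Pure tone with a brief raised-cosine ramp to avoid onset clicks."""
    t = np.arange(int(dur_s * sr)) / sr
    y = np.sin(2 * np.pi * freq_hz * t)
    ramp = min(len(y) // 10, int(0.005 * sr))
    env = np.ones_like(y)
    env[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    env[-ramp:] = env[:ramp][::-1]
    return y * env

def aba_sequence(freq_a, freq_b, tone_dur=0.06, gap=0.02, n_triplets=5, sr=16000):
    """A-B-A (rest) triplets; the silent slot after each triplet
    produces the characteristic 'galloping' rhythm when heard as one stream."""
    silence = np.zeros(int(gap * sr))
    triplet = np.concatenate([
        tone(freq_a, tone_dur, sr), silence,
        tone(freq_b, tone_dur, sr), silence,
        tone(freq_a, tone_dur, sr), silence,
        np.zeros(int(tone_dur * sr)), silence,  # empty slot in place of a fourth tone
    ])
    return np.tile(triplet, n_triplets)

# Small A-B separation: typically heard as one galloping stream.
one_stream = aba_sequence(500, 560)
# Large (~octave) separation: tends to split into two streams.
two_streams = aba_sequence(500, 1000)
```

Writing either array to a sound file and listening makes the difference between the two conditions immediately apparent.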

Page 7: Sound source segregation - UW Faculty Web Server

7

Two sound sources are perceived

Now if I move the two sequences apart in frequency, the combination sounds different. You hear a rapid beep that is low and a slower beep that is high, and it is no longer evident what the order of events across sequences is. Now we say you hear two streams, and we think that is like being able to separate the sound into two different sources on the basis of frequency. When you do this experiment, you ask the adult listener to tell you whether he heard one stream or two.

Page 8: Sound source segregation - UW Faculty Web Server

8

Auditory streaming in infants

How many streams that time?

That experiment would be sort of difficult to do with infants, as they have trouble saying “one stream” and “two streams,” among other things. Several investigators have approached this problem by constructing stimuli that sound different to adults and then seeing if they also sound different to infants.

Page 9: Sound source segregation - UW Faculty Web Server

9

Auditory streaming in infants

This is an example of stimuli used in a study by McAdams and Bertoncini. They presented two sorts of sounds, a synthesized vibraphone sound that came out of a speaker to the infant’s right and a trumpet sound that came out of a speaker to the infant’s left. In this picture the “y-axis” is frequency and the “x-axis” is time. They played these sounds in two configurations. In the first one, at the top of this picture, the vibraphone sound was played three times, with increasing pitch, followed by the trumpet at a higher pitch still. When adults listen to this sound, they hear one stream of sound, repeatedly increasing in pitch, and if you turn the direction of pitch change around, as in the “retrograde sequence” panel, adults will readily report that the sound has changed. In the bottom configuration, though, adults hear two streams-- one is the vibraphone alternating between high and low notes, and the other is the trumpet alternating between low and high notes. Because the sounds are in separate streams, adults can’t judge the relative order of the elements in the two streams, and so if I reverse the order (“retrograde sequence”, configuration 2/2) they can’t tell that anything has changed. So the question is whether infants can discriminate between the initial and retrograde sequences of the two configurations. The infant subjects were newborns and they were tested with the high-amplitude sucking paradigm.

1. Demany, L., Auditory stream segregation in infancy. Infant Behav Dev, 1982. 5: p. 261-276.
2. Fassbender, C., Auditory grouping and segregation processes in infancy. 1993, Norderstedt, Germany: Kaste Verlag.
3. McAdams, S. and J. Bertoncini, Organization and discrimination of repeating sound sequences by newborn infants. J Acoust Soc Am, 1997. 102: p. 2945-53.

Page 10: Sound source segregation - UW Faculty Web Server

10

Auditory streaming in infants

Configuration 3/1 Configuration 2/2

Once they had picked the frequency differences between the sounds and the timing correctly, McAdams and Bertoncini were able to show that the group that heard the 3/1 configuration switch to its retrograde (experimental group in left panel) could discriminate the change in direction. This is indicated by the increase in sucking rate after the habituation point. (It turned out that they had to make the sequence pretty slow and have big frequency differences to make this work.) The infants who heard the 2/2 configuration responded to the switch to the retrograde sequence (experimental in right panel) the same way as the control infants who heard no change at all. McAdams and Bertoncini reason that 1) adults discriminate the 3/1 initial sequence from its retrograde because they do not segregate the streams, but do not discriminate the 2/2 initial sequence from its retrograde because they do segregate the streams; and 2) infants discriminate the 3/1 configuration from its retrograde, but not the 2/2 sequence from its retrograde, suggesting that they “hear” the sequences like the adults do. So these studies demonstrate that infants could be hearing auditory streams at least qualitatively like adults.

Page 11: Sound source segregation - UW Faculty Web Server

11

Electrophysiological measures of streaming in newborns

Winkler et al. (2003) examined event-related potentials to sounds in newborn infants, using logic similar to that used in the behavioral studies. The ERPs they measured were the mismatch negativity (MMN) and P3, a positive wave that usually follows the MMN in adults. These are responses that occur when a change in a sound occurs. In other words, if you repeat one sound a number of times and then switch to another sound, you get these responses to the switched sound.

Winkler et al. reasoned that if the repeated, or frequent, sound and the switched, or infrequent, sound were presented with other sounds, the switch in sound would be detected if the frequent and infrequent sounds were heard in a different stream from the other sounds, but not if the sounds were all heard in one stream.

The standard condition is shown in the top graph in the slide. A tone at 1813 Hz and 61 dB SPL is presented repeatedly; the infrequent tone is at 1813 Hz and 76 dB SPL. In the streaming condition, tones at 4 other frequencies and 4 other intensities are randomly interspersed with the frequent and infrequent tones, and all the tones are in a narrow frequency range. The prediction is that all the tones will be treated as one stream of sound, and the change from frequent to infrequent tones will not be noticed. In the “two streams” condition the random tones are lowered in frequency. Now the listener should hear two streams, and the expectation is that a response will be elicited when the tone switches from frequent to infrequent. Finally, because the two streaming conditions meant presenting the tones at a faster rate, they added a control condition that just uses the frequent and infrequent tones at a faster rate.

Page 12: Sound source segregation - UW Faculty Web Server

12

The results are shown on the right of the slide, where the average responses are shown. (Negative is up.) You'll notice that in the conditions in which a response is elicited in adults--all but the one-stream condition--a response is also elicited in infants, although the response from infants has a different morphology.

These results suggest that newborn infants have at least some of the neural circuitry that will allow them to separate sounds on the basis of frequency.

1. Winkler, I., E. Kushnerenko, J. Horvath, et al., Newborn infants can organize the auditory world. Proc Nat Acad Sci USA, 2003. 100: p. 11812-11815.

Page 13: Sound source segregation - UW Faculty Web Server

13

Auditory streaming in children

These studies of infants might convince us that infants can form auditory streams, but that is far from convincing us that they form streams just as adults do.

Sussman et al. (2007) looked at the development of auditory streaming by children between 5 and 11 years of age. Their stimuli are illustrated here, where the y-axis is frequency and the x-axis is time. In the condition illustrated in a, the x and o sounds are in the same frequency region and are expected to be heard as a single auditory stream (sort of galloping like the examples I played earlier). The stimulus in b should be heard as two streams, one with rapid pairs of tones, the other with slower single tones, because now x and o are far apart in frequency. Sussman et al. asked kids and adults to listen to sequences with different frequency separations and to report whether they heard 1 or 2 streams (after practicing with some very obvious examples).

1. Sussman, E., R. Wong, J. Horvath, et al., The development of the perceptual organization of sound by frequency separation in 5–11-year-old children. Hear Res, 2007. 225: p. 117–127.

Page 14: Sound source segregation - UW Faculty Web Server

14

Auditory streaming in children

The results of this experiment are shown here, with the proportion of times the listener said “2 streams” plotted as a function of the frequency separation between the two sequences, expressed in semitones. (Semitones are equal logarithmic steps, with 12 semitones in an octave.) The expected result is that as the frequency separation increases, the proportion of “2 stream” responses will increase-- and this is what you see for the adults (diamond symbols). The 9-11 year old children don't say “2 streams” until the separation is a bit bigger, compared to adults; the youngest kids need a pretty big frequency separation to say “2 streams,” and on average, they don’t say “2 streams” more than 70% of the time even at a nearly 2-octave separation between the streams.

[next slide] To try to verify this apparent age difference in auditory streaming, Sussman et al. did a second experiment in which they asked the listener to detect an intensity increment in the “x” tone stream. The hypothesis is that you can do this better if the tones are in 2 streams rather than 1. This graph shows the proportion of times that the listener reported an intensity increment as a function of the frequency separation between sequences. The “oddball” point is performance when only the x sequence was presented. All the listeners could do the oddball condition very well, although the younger kids aren’t quite as good as the older listeners. Notice, though, that the proportion of hits

Page 15: Sound source segregation - UW Faculty Web Server

15

Another way to look at auditory streaming in children

increases as the frequency difference increases for everyone. The 9-11 year olds need a little bit bigger frequency separation than the adults to achieve a certain level of performance. The younger kids still need a pretty big frequency separation. Interestingly, the older listeners are only getting about 80% hits at frequency separations where they nearly always reported 2 streams in the first experiment. Again, the youngest kids never get much above 70% hits even at big frequency separations.
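Since the frequency separations in these experiments are reported in semitones, it may help to make the conversion concrete. A semitone is one twelfth of an octave on a log-frequency scale, so the separation between two frequencies is 12·log2(f2/f1); the helper below is a minimal sketch of that definition.

```python
import math

def semitones(f_low_hz, f_high_hz):
    """Frequency separation in semitones: equal logarithmic steps,
    12 semitones per octave, so 12 * log2(f_high / f_low)."""
    return 12 * math.log2(f_high_hz / f_low_hz)

# One octave (a doubling of frequency) is 12 semitones.
octave = semitones(440, 880)
# A 2-octave separation, e.g. 523.25 Hz vs. 2093 Hz, is 24 semitones.
two_octaves = semitones(523.25, 2093)
```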

On the face of it, this study suggests that sound source segregation is immature in 5-11 year old children. It may well be, but there are a number of limitations of this study. One problem is that the researchers used an “x” sound of 1046.5 Hz, and then increased the “o” sequence above that for the adults and 9-11-year-olds. But then they used a 523.25 Hz “x” sound for the 5-8-year-olds, presumably so that the “o” sounds would be in the same frequency range for them. So it might be that the younger kids don’t hear the difference in frequency as well as the older listeners (and this is sort of consistent with the data on the development of frequency discrimination at low frequencies), and that’s why they don’t hear the two streams. The task requires “vigilance,” or sustained attention, that might be immature in the younger children as well. The results from the first experiment suggest that the younger kids were not paying attention a lot of the time. This may not be a good test of sound source segregation for them.

Page 16: Sound source segregation - UW Faculty Web Server

16

Electrophysiological measures of streaming in children

Sussman et al. have also conducted electrophysiological studies of auditory streaming in children, essentially the same experiment they did with newborns in the earlier slide. They are looking at the ERP to an intensity change in one stream, when it is presented alone, when it is presented with other “interfering” tones in the same frequency range, and when it is presented with interfering tones in a lower frequency range. The subjects were 7-10 years old. The grand average responses in the different conditions are shown to the right, where the shaded area represents a significant response; there’s a significant response in the oddball and segregated conditions, but not in the interference condition. In a second experiment, Sussman et al. showed that the kids labeled the segregated conditions as 2 streams and the interference condition as 1 stream nearly all of the time. (The frequency separation between streams was more than 2 octaves, and the interferers were moved down in frequency to near 250 Hz, so the target stream was probably noticeably louder as well. Thus, the streams would have been really different compared to the more recent study.)

The conclusion might be that the electrophysiological results are consistent with the behavioral results, but they don’t really shed any light on the relative maturity of auditory streaming in the school years.

1. Sussman, E., R. Wong, J. Horvath, et al., The development of the perceptual organization of sound by frequency separation in 5–11-year-old children. Hear Res, 2007. 225: p. 117–127.

Page 17: Sound source segregation - UW Faculty Web Server

17

Conclusion

Infants and children form “auditory streams.”

So the conclusion is that infants and children form auditory streams, but it isn’t clear how mature their ability to do this is or when it reaches adult levels. The Sussman et al. studies suggest that it might not be until well into the school years.

In any case, frequency differences are only one cue for sound source segregation, and not the most important one, so it is worthwhile to look at how well kids can use other acoustic cues to segregate sources.

Page 18: Sound source segregation - UW Faculty Web Server

18

Thresholds of sound, segregated and not segregated
- Spatial cues
- Synchronized visual information

Page 19: Sound source segregation - UW Faculty Web Server

19

Masking level difference

The MLD is the improvement in audibility that results from dichotic listening

N = noise, S = signal

Monotic = one ear (m)

Diotic = 2 ears, same sound in both (0)

Dichotic = 2 ears, different sound in each (π)

Modified from Gelfand (1998)

One phenomenon that illustrates the benefits of segregating sounds on the basis of spatial cues is the masking level difference, or MLD. If a tone is presented in noise, the threshold for the tone is the same whether the tone and noise are presented to one ear or two, as long as the same tone and noise are presented to the two ears. But if I present something different to the two ears-- say the noise to both ears, but the tone to only one ear--the threshold improves. I can get an even bigger improvement if I put the tone and noise in both ears, but flip the phase of the noise (NπS0) or of the tone (N0Sπ). This happens because the listener localizes the noise to one intracranial location and the tone to another in the conditions in which thresholds improve.

The masking level difference occurs in various conditions and it varies in size with condition. In MLD terminology, N stands for noise and S stands for signal. A subscript “m” means that the sound is presented to one ear only (monotic). A subscript “0” (zero) means that the sound is presented to both ears and that it is identical in the two ears (diotic). A subscript “π” means that the sound is presented to both ears but that the sound in one ear is 180 degrees (π radians) out of phase with the sound in the other ear (dichotic). N0Sπ produces the largest MLDs.
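The condition notation above can be made concrete by building the left- and right-ear waveforms directly. This is a sketch only: the sample rate, tone frequency, durations, and levels are illustrative assumptions, not values from any of the studies discussed.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed for illustration)

def mld_stimulus(condition, tone_hz=500, dur=0.3, rng=np.random.default_rng(0)):
    """Return (left, right) waveforms for common MLD conditions.

    'NoSo'  - identical noise and tone in both ears (diotic)
    'NoSpi' - identical noise, tone 180 degrees out of phase in one ear
    'NpiSo' - noise 180 degrees out of phase in one ear, identical tone
    """
    t = np.arange(int(dur * SR)) / SR
    noise = 0.1 * rng.standard_normal(len(t))
    tone = 0.05 * np.sin(2 * np.pi * tone_hz * t)
    if condition == "NoSo":
        left, right = noise + tone, noise + tone
    elif condition == "NoSpi":
        left, right = noise + tone, noise - tone  # tone phase-flipped in one ear
    elif condition == "NpiSo":
        left, right = noise + tone, -noise + tone  # noise phase-flipped in one ear
    else:
        raise ValueError(f"unknown condition: {condition}")
    return left, right

left, right = mld_stimulus("NoSpi")
```

Playing the NoSo and NoSπ versions over headphones is a classic demonstration: the tone that is inaudible diotically pops out when its interaural phase is flipped.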

Page 20: Sound source segregation - UW Faculty Web Server

20

MLD in infants

Nozza (1987) tested the MLD in infants between 6 and 11 months of age and in adults. He used a 35 dB spectrum level noise band as the masker. The stimuli were presented with TDH-39 earphones. He found that the difference between N0S0 and N0Sπ conditions was 10.4 dB for adults, but only 5.6 dB for infants, suggesting that infants derive less benefit from being able to spatially separate the tone and noise.

There are a couple of limitations of this study. One is that the infants’ thresholds are awfully good compared to others in the literature (we might expect 65 dB SPL rather than 55 dB SPL). The other is that to maximize the MLD you need to match the ears in intensity, which is hard to do for infants listening through TDH-39 earphones.

1. Nozza, R.J., The binaural masking level difference in infants and adults: Developmental change in binaural hearing. Infant Behav Dev, 1987. 10: p. 105-110.

Page 21: Sound source segregation - UW Faculty Web Server

21

MLD in children

It is a little easier to do this experiment with children, as Hall, Grose and their colleagues did in several studies. The results of their 1997 study are shown here. Each graph is for a different masker bandwidth. In each graph, the shaded area is the normal range of the adult MLD in that condition. The filled symbols are for younger kids, the unfilled symbols are for older kids, and the MLDs are plotted as a function of age. If you look at the rightmost panel, where the masker bandwidth was 1000 Hz (pretty broadband), even the 5-6 year olds are in the adult range. So while we’re not sure where the infants really stand or when the MLD is mature, we know it is before 6 years of age, at least for broadband maskers. However, you’ll notice in these other graphs that when you get to narrower bandwidth maskers, the younger kids start to drop out of the adult range. For a very narrow noise masker, even some of the older kids look immature. This suggests that the ability to use spatial information to segregate sounds may be influenced by other characteristics of the sounds involved during development, although adults seem to be able to cope with these other characteristics of sound better than kids do.

Page 22: Sound source segregation - UW Faculty Web Server

22

Spatial unmasking

[Figure: listener with the target word “Baseball” from a front loudspeaker and noise from front or side loudspeakers]

It is of some interest to look at sound source segregation using spatial cues in a somewhat more realistic situation, as some investigators have now started to do. These folks have examined kids’ ability to identify words presented in noise in sound field when the words and noise come from the same speaker and when the noise and words come from different speakers.

Page 23: Sound source segregation - UW Faculty Web Server

23

Spatial unmasking in preschool children

Garadat and Litovsky (2006) tested 3-4 and 4-5 year olds, comparing their threshold for identifying a word when both the noise and word were presented from the front speaker and when the word came from the front, but the noise came from the right speaker. What they found was that the younger kids had higher thresholds in general, but that they experienced the same amount of masking when the noise was played (differences between quiet and “front” or “right” conditions), and the benefit of moving the noise from the front speaker to the right speaker was about the same for these two age groups, about 9-10 dB.

Page 24: Sound source segregation - UW Faculty Web Server

24

Spatial unmasking in school-age children

In this study, Litovsky compared 4-7 year olds to adults for different sorts of maskers--speech or speech-shaped noise. Litovsky found that children got the same benefit as adults from moving the masker to a different speaker-- in this case it’s the amount of masking that is plotted, and everybody in just about every condition gets about 5 dB less masking when the masker is moved to the speaker on the right.

So in these sorts of realistic conditions, kids as young as 3 or 4 years old seem to be able to use spatial cues to segregate sources.

1. Litovsky, R.Y., Speech intelligibility and spatial release from masking in young children. J Acoust Soc Am, 2005. 117: p. 3091-9.

Page 25: Sound source segregation - UW Faculty Web Server

25

Preferential looking procedure

baseball, baseball, baseball

(“Twenty subjects were tested…”)

baseball, baseball, baseball
popcorn, popcorn, popcorn

People have not examined spatial unmasking in infants, but there are data bearing on infants’ ability to separate speech from competing speech. In this study Newman and Jusczyk used a visual fixation paradigm to see if infants would be able to hear a word presented against ongoing speech.

First the infant was just exposed to a word, repeated several times, but presented from the same speaker as a recording of a guy reading from the methods section of a journal article (this was their idea of a boring prose passage). Then after the baby had listened to that for a while, the researchers measured whether the infant preferred to listen to that familiar word or an unfamiliar word (one she hadn’t heard before). As long as the infant looks at the bull’s-eye she gets to hear the word presented, and when she looks away, the word is turned off. Sometimes, when she looks she hears the familiar word, sometimes she hears the unfamiliar word. Then you measure how long she looks for each of the words. If the baby looks more for the familiar word, then we conclude that she must have heard that word during the familiarization phase.

Page 26: Sound source segregation - UW Faculty Web Server

26

Speech-in-speech recognition in infants

Newman and Jusczyk’s results are shown here, on the left when the word was originally played at a 5-dB signal-to-noise ratio relative to the competing speech, and on the right when the signal-to-noise ratio was 0 dB. What is plotted is the time the infants spent looking to hear the familiar word as opposed to the unfamiliar word. Notice that they looked longer to the familiar word at 5 dB S/N ratio, but not at 0 dB S/N ratio. (Presumably adults would have been able to segregate the words at S/N ratios of 0 dB and even lower.)

Well, this isn’t a surprise to us, because we know that infants have higher masked thresholds than adults. But remember that at this age we’re not really sure why their thresholds are higher (at least you don’t know why at this point in the course). So maybe the problem is sound source segregation as opposed to masking.

In a subsequent experiment, Hollich et al. figured that if the problem was sound source segregation, then giving the infants some extra information to help them segregate the word would allow them to recognize the familiar word at lower signal-to-noise ratios. What they did was to repeat the experiment as before, but this time they had a video screen near the speaker with an image that was synchronized with the word during the familiarization phase. In one group of infants, the image was of a woman saying the word; in another it was an “oscilloscope”-like trace that was synchronized with the word. They ran control conditions in which the face was static and in which the face was talking but not synchronized with the word. The idea is that the synchronized image will help the infants to separate out the word because they know exactly when it is occurring.

Page 27: Sound source segregation - UW Faculty Web Server

27

Visual information improves speech-in-speech recognition in infants

Hollich et al.’s results are shown here as the looking times to the target word (formerly the familiar word) and to the nontarget word (formerly the unfamiliar word). The signal-to-noise ratio was 0 dB. When the synchronized face or the “oscilloscope-like” trace was presented with the word during familiarization, the infants now show a preference for the target (familiar) word. This suggests that the problem is not masking (at least in adults this sort of information doesn’t reduce masking much), but an issue of separation.

So we think that infants can segregate sound sources, but they may need extra information (greater intensity differences or synchronized visual information) to help them do so.

1. Hollich, G., R.S. Newman, and P.W. Jusczyk, Infants' use of synchronized visual information to separate streams of speech. Child Dev, 2005. 76: p. 598-613.
2. Newman, R.S. and P.W. Jusczyk, The cocktail party effect in infants. Percept Psychophys, 1996. 58: p. 1145-1156.

Page 28: Sound source segregation - UW Faculty Web Server

28

Testing whether children can segregate speech from speech

Ready Baron go to Blue 3 now
Ready Ringo go to Red 5 now

You might think that if infants can use visual information to help them segregate sound sources, then children should be even better at it. We don’t really know how well children can do this in the conditions in which infants have been tested. The experiments that have been done with children have used very difficult conditions.

The task is called the Coordinate Response Measure (CRM). The sentences in CRM are all of the form “Ready CALL SIGN go to COLOR NUMBER now”. The call sign can be Baron, Ringo, Eagle, Tango or something like that. There are 4 colors and the numbers 1-8 that can be in the sentence. The listener’s task in CRM is always to listen to the talker who says the call sign “Baron” and to report the color and number that he says. So for the example above, the child should respond “Blue 3”. However, at the same time as the target sentence is being played, a distractor sentence with a different call sign, color, and number is played. The target and distractor sentences are pretty much aligned temporally and they are spoken by two different male talkers. This can be a challenging task because the temporal alignment of the words in the sentences makes them hard to segregate and the difference in the fundamental frequencies of the two voices is not that great. The sentences come from the same location-- in other words, they don’t have much in the way of cues that allow their segregation.

In the study that Wightman and Kistler published in 2006, they asked whether having a video of a person saying “Ready Baron go to Blue 3 now,” or whatever the target sentence was, would help kids of different ages to do better on this task.
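The fixed sentence frame of the CRM task can be sketched as a small generator. The call signs come from the text above; the particular color names and the distractor-selection rule below are assumptions for illustration, since the source only says there are 4 colors and the numbers 1-8.

```python
import random

CALL_SIGNS = ["Baron", "Ringo", "Eagle", "Tango"]   # examples named in the text
COLORS = ["Blue", "Red", "Green", "White"]           # 4 colors (names assumed)
NUMBERS = list(range(1, 9))                          # numbers 1-8

def crm_sentence(call_sign, color, number):
    """All CRM sentences share one frame: Ready CALL SIGN go to COLOR NUMBER now."""
    return f"Ready {call_sign} go to {color} {number} now"

def crm_trial(rng=random.Random(0)):
    """One trial: the target always carries the call sign 'Baron';
    the distractor uses a different call sign, color, and number."""
    target = ("Baron", rng.choice(COLORS), rng.choice(NUMBERS))
    while True:
        distractor = (rng.choice([c for c in CALL_SIGNS if c != "Baron"]),
                      rng.choice(COLORS), rng.choice(NUMBERS))
        if distractor[1] != target[1] and distractor[2] != target[2]:
            break
    return crm_sentence(*target), crm_sentence(*distractor)
```

The listener's correct response is the color-number pair from the sentence beginning “Ready Baron”.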

Page 29: Sound source segregation - UW Faculty Web Server

29

Visual information doesn’t improve speech-in-speech recognition in children

Their results are shown here, separated for different age groups. Each graph shows percent correct as a function of signal-to-noise ratio. The horizontal lines show what people could do just by looking at the video, with dashed lines indicating 95% confidence intervals. The unfilled symbols are for the audio-only conditions and the filled symbols are for the audio-plus-visual condition. If you look at adults, over on the right, what you see is that in the audio-only condition, people only get 40% correct at -20 dB S/N, but they gradually improve as the S/N increases, up to 100%. When they get the audio plus visual, they get over 90% correct at all S/N-- this is probably because they appear to be really good at speech reading (remember that there are only 32 possible responses--with 4 colors and 8 numbers--so it should be pretty easy). But once the S/N gets over about -10 dB, the adults start to be able to use the audio and video together to improve performance. The 12-17 year olds are similar. They don’t do quite as well with the video only, and the S/N has to be maybe 5 dB before they start to use the audio and video together. The 9-12 year olds seem to be able to improve their performance a little at low S/N, but by -5 dB S/N their audiovisual and audio-only curves are right on top of each other-- it is as if they are depending only on the auditory information. The 6-9 year olds perform that way at all S/N. They can’t really do the video only and they don’t use the video with the audio to recognize the sentences. Wightman and Kistler also noticed that the kids who could use the video information in the audiovisual conditions the best were also the ones who did best in the video-only condition. This makes it seem like the little kids don’t even know how to use visual information and auditory information together at all.

Does this mean that kids don’t use visual information when they are talking to a real person in a real situation? Well, we know that hearing-impaired kids can use visual information to improve their speech perception, but then they may be forced to do that more than normal-hearing kids are. This issue is far from settled.

1. Wightman, F., D. Kistler, and D. Brungart, Informational masking of speech in children: Auditory-visual integration. J Acoust Soc Am, 2006. 119: p. 3940-3949.

Page 30

Conclusions

- Infants and children are more sensitive to sounds that can be segregated from competing sounds, although infants show less benefit of segregation cues.
- Under simple conditions, even 3-year-olds can use segregation cues as well as adults.
- Under complex conditions, even 10-year-olds do not use segregation cues as well as adults.

Page 31

Informational masking

[Figure: schematic of a 2AFC informational-masking task. Each panel plots level against frequency; rows show trials 1-3, columns show intervals 1 and 2.]

The last topic I want to cover related to sound source segregation is informational masking. I reviewed the conditions under which informational masking might occur during the discussion last week. Recall that the listener is trying to detect a target tone in the presence of a number of other tones, but that these other tones are far enough away from the target in frequency that they should not cause energetic masking (i.e., the same auditory nerve fibers aren't responding to both the target and masker tones). However, every time the masker tones are presented, their frequencies are randomly changed. So in a 2AFC task, the listener might hear the sound shown for trial 1, interval 1, with the target tone shown in red. Then in interval 2 the masker tones are different, and each time the masker tones are played they are different. The rather startling result of this experiment is that adults' thresholds for the target tone can be elevated by 30-50 dB. That threshold elevation is called informational masking. Not all adults experience informational masking. We don't know why there are individual differences, and we don't know any way in which the hearing of people who show informational masking differs from that of people who don't, at least not yet.
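The structure of such a trial can be sketched in code. This is only an illustration of the paradigm just described; the specific values (the protected band around the target, the range and number of masker tones) are assumptions for illustration, not the parameters of any published study.

```python
import random

TARGET_HZ = 1000.0
PROTECTED_BAND = (920.0, 1080.0)   # assumed region excluded around the target
MASKER_RANGE = (200.0, 5000.0)     # assumed span of possible masker frequencies

def draw_masker(n_tones: int) -> list[float]:
    """Draw a fresh set of masker frequencies, avoiding the protected band
    so the masker should cause little energetic masking of the target."""
    freqs = []
    while len(freqs) < n_tones:
        f = random.uniform(*MASKER_RANGE)
        if not (PROTECTED_BAND[0] <= f <= PROTECTED_BAND[1]):
            freqs.append(f)
    return freqs

def two_afc_trial(n_tones: int, target_interval: int) -> list[list[float]]:
    """One 2AFC trial: two intervals, each with a newly drawn masker
    (the key manipulation), and the target tone added to only one."""
    intervals = []
    for i in (0, 1):
        tones = draw_masker(n_tones)
        if i == target_interval:
            tones.append(TARGET_HZ)
        intervals.append(sorted(tones))
    return intervals
```

The listener's job is to report which interval contained the 1000 Hz tone; because the masker frequencies change on every presentation, the uncertainty they create--rather than overlap at the auditory nerve--is what elevates threshold.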

Informational masking may be related to sound source segregation. We believe this because if we manipulate the target or masker in ways that would promote sound source segregation, the amount of informational masking goes down. For example, if the target comes on a little after the masker, if the masker is amplitude modulated but the target is not, or if we play the masker from a different spatial location, informational masking goes down.

Page 32

Informational masking in children

Lutfi et al. examined informational masking in children of three ages (preschool, 4-5 years; early school age, 6-8 years; later school age, 9+ years) and in adults. These curves show the amount of masking (masked threshold for the target minus unmasked threshold for the target) as a function of the number of tones in the masker, ranging from 2 up to 2000. You see for the adults that the amount of masking increases with the number of masker components up to 30 components, and then decreases-- of course, when there are a lot of components, the masker sounds like a noise. The results for the kids follow a similar form: the amount of masking increases up to 20-30 components and then decreases, but the preschoolers show a lot more masking than the adults, the early school-aged kids show a little less than the preschoolers, and the late school-age kids are just a little bit worse than the adults. So by this account, children are more susceptible to informational masking than adults are-- and that is interpreted to mean that they are more affected by variability in the masker than adults are.

1. Lutfi, R.A., D.J. Kistler, E.L. Oh, et al., One factor underlies individual differences in auditory informational masking within and across age groups. Percept Psychophys, 2003. 65: p. 396-406.
2. Oh, E.L., F. Wightman, and R.A. Lutfi, Children's detection of pure-tone signals with random multitone maskers. J Acoust Soc Am, 2001. 109: p. 2888-2895.

Page 33

Informational masking in infants

Leibold and Werner conducted a study of informational masking in infants. They tested three conditions, shown in these graphs. Each graph shows sound frequency over time in the experiment. In the broadband noise condition, bursts of noise (shaded gray) were played repeatedly, and the infant learned to respond when a tone at 1000 Hz (black bar) was played with the noise. The condition that is like the ones we've already talked about in informational masking studies is on the bottom. The masker is two tones (gray bars), and every time the masker is played, the frequencies of the two tones are different. The baby still learned to respond when the target 1000 Hz tone was played. This is referred to as the random-frequency condition. The middle graph is for a condition with two tones in the masker, but the frequencies of the two tones are always the same. This is referred to as the fixed-frequency condition. So what you would expect for an adult is masking with the broadband noise--that's energetic masking; masking for the random-frequency condition--that's informational masking; and no masking for the fixed-frequency condition, because there is no variability to cause informational masking and the tones are too far away from the target to cause energetic masking.

1. Leibold, L.J. and L.A. Werner, Effect of masker-frequency variability on the detection performance of infants and adults. J Acoust Soc Am, 2006. 119: p. 3960-3970.

Page 34

Informational masking in infants

The results of this experiment are shown here. The adults' results pretty much follow the predictions: masking with broadband noise, lots of masking with random-frequency maskers, and not much masking with fixed-frequency maskers. The results for infants are different. For one thing, they show masking in all the conditions. When those two fixed-frequency tones are the masker, infants get nearly as much masking as they do with a broadband noise. Of course, they show even more masking when the tones' frequencies change randomly. So we might think that this result agrees with that of Lutfi et al.--the study with kids. The complication is the comparison between the fixed-frequency and random-frequency masker conditions. The difference between these two conditions is the same for the adults and infants. Using fixed-frequency maskers causes much more masking for the infants than for the adults. If we then randomly vary the frequencies of the tones in the masker, everybody gets worse by the same amount. So the problem for the infants is separating a tone from other tones; making the other tones vary randomly has the same effect on the infants as on the adults. This fixed-frequency condition has not generally been tested in other studies of informational masking.

Page 35

Fixed, remote-frequency masking in children

Leibold and Neff have recently shown that 5-7 and 8-10 year old children also exhibit masking by fixed-frequency tones, and that the difference between random and fixed masker thresholds is the same for children and for adults. So this problem of separating a tone from other tones seems to persist well into childhood.

1. Leibold, L. and D.L. Neff, Effects of masker-spectral variability and masker fringes in children and adults. J Acoust Soc Am, 2007. 121: p. 3666-3676.

Page 36

Informational masking?

[Figure: the same 2AFC trial schematic as before--level against frequency, trials 1-3, intervals 1 and 2.]

So we might conclude that infants and young children are more susceptible to informational masking than adults, and that they get informational masking under conditions that don't create informational masking for adults. Of course, maybe what we are measuring with those fixed-frequency maskers isn't informational masking at all. One way to look at that is to see whether manipulations that reduce informational masking (and make sound source segregation easier) also reduce masking by fixed-frequency tonal maskers. So far the answer seems to be "yes".

Page 37

Using temporal cues to reduce informational masking

Leibold and Neff also found that if you make the target tone a little shorter than the masker and start the masker a little before the target tone (i.e., you create a masker "fringe"), it has little effect on the threshold for the target in a broadband noise, but it does reduce the threshold for the tonal maskers for all age groups. (Notice that the adults also get some fixed-frequency masking and a release from masking with the fringe. So adults do get some masking in these conditions, just not as much as the children do.) That this temporal cue reduces fixed-frequency masking suggests that fixed-frequency masking is a sort of informational masking and that it may represent a failure of sound source segregation. So again, it appears that sound source segregation may be present early in life, but that it takes a long time to reach adult status.

Page 38

Summary and conclusions

- Infants and children can segregate sound sources, using the same acoustic cues that adults use.
- In simple situations, children, but not infants, can segregate sound sources as well as adults.
- In complex situations, sound source segregation may not be mature until well into the school years.