
  • Perception & Psychophysics 1984, 35 (3), 203-213

    Perceiving vowels from uniform spectra: Phonetic exploration of an auditory aftereffect

    QUENTIN SUMMERFIELD, MARK HAGGARD, JOHN FOSTER, and STUART GRAY

    MRC Institute of Hearing Research, University of Nottingham, Nottingham, England

    A carefully spoken vowel can generally be identified from the pattern of peaks and valleys in the envelope of its short-term power spectrum, and such patterning is usually necessary for the identification of the vowel. The present experiments demonstrate that segments of sound with uniform spectra, devoid of peaks and valleys, can be identified reliably as vowels under certain circumstances. In Experiment 1, 1,000 msec of a segment whose spectrum contained peaks in place of valleys and vice versa (i.e., the complement of a vowel) preceded a 25-msec spectral amplitude transition, during which the valleys became filled, leading into a 250-msec segment with a uniform spectrum. The segment with the uniform spectrum was identified as the vowel whose complement had preceded it. Experiment 2 showed that this effect was eliminated if the duration of the complement was less than 150 msec, if more than 500 msec of silence separated the uniform spectrum from the complement, or if the uniform spectrum and the complement were presented to different ears. This third result and comparisons with parameters of auditory aftereffects obtained by others with nonspeech stimuli suggest that the effect is rooted in peripheral adaptation processes and that central processes responsible for selective attention and perceptual grouping play only a minor role at most. Experiment 3 demonstrated that valleys in the spectral structure of a complement need be only 2 dB deep to generate the effect. The effect should therefore serve to enhance changes in spectral structure in natural speech and to alleviate the consequences of uneven frequency responses in communication channels.

    In a pilot experiment, we established that waveforms with uniform spectra, devoid of peaks and valleys, can be identified reliably as different vowels under certain circumstances. We synthesized waveforms whose spectra were complementary to those of rectangular approximations to the vowels /i/ and /e/. These complementary spectra had peaks in place of valleys and vice versa. We created stimuli in which a 400-msec segment of waveform with a uniform flat spectrum was surrounded by two 500-msec segments of one of the two vowel complements. Fifty-millisecond spectral amplitude transitions linked the two types of waveform. The segments with uniform spectra sounded like the vowels whose complements surrounded them. Taking a more realistic vowel spectrum as a starting point, Figure 1 displays a progression of power spectra computed at 50-msec intervals during a sequence in which the complement of the vowel /a/ surrounds a segment with a uniform spectrum (lower panel) and during a sequence in which the vowel /a/ itself alternates with silence (upper panel). The middle segment of each stimulus sounds like "AH."

    The basic phenomenon discussed in this paper was demonstrated by Stuart Gray before his tragic death in a road accident in December 1977. Some of the data reported here were presented to the 101st Meeting of the Acoustical Society of America, Ottawa, Ontario, May 1981, and to the summer meeting of the Experimental Psychology Society, Oxford, July 1981. We thank Carol Jameson for assistance in scoring the data and Andrew Sidwell, Matthew McGrath, Christopher Darwin, and two reviewers for their comments on earlier versions of the paper. Quentin Summerfield's mailing address is: MRC Institute of Hearing Research, University Park, Nottingham, NG7 2RD, England.

    At least three factors, discussed in detail below, could contribute to an effect of this type: (1) Listeners may use the spectral distribution of the first temporal derivative of local spectral amplitude directly as a cue to the identity of an isolated vowel (the "spectral change" hypothesis). (2) Listeners may use common covariation in amplitude to group perceptually a set of harmonics, thereby separating them from unrelated changes in spectral amplitude in background noises (the "perceptual grouping" hypothesis). (3) Adaptation may occur during the presentation of the first segment of the complement, resulting in an enhanced auditory representation of spectral energy at the onset of the uniform spectrum at frequencies corresponding to the valleys in the complement (the "adaptation" hypothesis).

    These hypotheses are interpretative and relate to different levels of auditory and cognitive processing. Thus, they need not be mutually exclusive, and the one to be emphasized may depend upon particular factors in the task under consideration and the purpose of the explanation adopted.

    Spectral Change Hypothesis

    Plots of rate of change of spectral amplitude by

    Copyright 1984 Psychonomic Society, Inc.


    Figure 2. Power spectra of examples of the four basic types of segment incorporated in the stimuli of Experiment 1. (a) The spectrum of a segment with a uniform spectrum. The amplitudes of the 50 harmonics of 100 Hz decrease at 6 dB per octave above 100 Hz. Apparent ripple results from the difference between the frequency sampling of the spectral analysis and the frequency spacing of the harmonics. (b) The spectrum of the vowel /u/. Three groups of three harmonics approximately centered on the frequencies of the first three formants of /u/ have been retained from the segment with the uniform spectrum, and the amplitudes of the outlying members of each trio have been reduced by 5 dB. (c) The spectral complement of /u/. Compared with the segment with the uniform spectrum, the amplitudes of nine harmonics have been reduced by their amplitudes in the vowel /u/. (d) The spectrum of a segment of the indistinct vowel /u/. Compared with the uniform spectrum, the amplitudes of the three groups of three harmonics comprising the vowel /u/ have been raised by 10 dB (outlying members of each trio) and by 15 dB (central members of each trio).

    with very similar formant frequencies in British and American English. The form of /ɝ/ used occurs only in American English. Relative to their levels in the uniform spectrum, the levels of the outlying members of each trio were attenuated by 5 dB to shape these pseudoformants. Panel b of Figure 2 displays the power spectrum of the vowel /u/. Table 1 lists the frequencies of the harmonics used to synthesize each of the five vowels. Intensities have been specified relative to a uniform spectrum, rolling off at 6 dB per octave, with the level of the 100-Hz harmonic at 55 dB.

    Five vowel-complement segments were defined by eliminating from the uniform spectrum the nine harmonics that would be present in the corresponding vowel. Panel c of Figure 2 displays the power spectrum of the complement of the vowel /u/.

    Finally, five indistinct vowel segments were derived from the uniform spectrum by raising the levels of the nine harmonics comprising each vowel by 10 dB (for the outlying members of each trio) and by 15 dB (for the central member of each trio). Panel d of Figure 2 displays the power spectrum of the indistinct vowel /u/.

    The stimuli used in the experiments reported below were derived from these four basic types of segment. Stimuli were output at 10,000 samples/sec through 12-bit digital-to-analog converters (DEC PDP-11/60, LPA11K), low-pass filtered at 4250 Hz (KEMO VBF/S, -96 dB/octave), and recorded on audio tape. In listening tests, the tapes were played to listeners binaurally, except in Experiment 2c, through TDH-39 headphones with the level of the 100-Hz harmonic at 74 dB SPL (B&K Type 4253 wide-band artificial ear, Type 4134 microphone, and Type 2203 sound-level meter, linear weighting) (55 dBA). The level of the segment with the uniform spectrum was 78 dB SPL (linear) (71 dBA).

    [Panels of the figure captioned above: (b) vowel, (c) vowel complement, (d) indistinct vowel. Horizontal axes: frequency (kHz), with ticks at 1.25, 2.50, 3.75, and 5.00.]

    Table 1
    Relative Intensities (dB) of the Harmonics (Hz) of 100 Hz Defining the Five Vowel Stimuli

                   EE             AH             OO             OR             ER
    Formant     Hz     dB      Hz     dB      Hz     dB      Hz     dB      Hz     dB
    F1         200  44.00     600  34.49     200  44.00     400  38.00     400  38.00
               300  45.49     700  38.16     300  45.49     500  41.07     500  41.07
               400  38.00     800  32.00     400  38.00     600  34.49     600  34.49
    F2        2200  23.24    1000  30.07     800  32.00     700  33.16    1300  27.80
              2300  27.86    1100  34.24     900  35.98     800  37.00    1400  32.16
              2400  22.49    1200  28.49    1000  30.07     900  30.98    1500  26.56
    F3        2900  20.85    2300  22.86    2100  23.65    2300  22.86    1600  26.00
              3000  25.56    2400  27.49    2200  28.24    2400  27.49    1700  30.48
              3100  20.27    2500  22.14    2300  22.86    2500  22.14    1800  24.98
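    The level rules for the four basic segment types can be sketched in a few lines. The following is a minimal illustration, not the authors' synthesis program: the function and constant names are hypothetical, the formant centres are simply read off Table 1, and the complement follows the body text's description (the nine vowel harmonics are eliminated). Under those assumptions the vowel levels reproduce the entries of Table 1.

```python
import math

F0_HZ = 100                 # harmonics are multiples of 100 Hz
N_HARMONICS = 50            # 100 Hz up to 5,000 Hz
REF_LEVEL_DB = 55.0         # level of the 100-Hz harmonic in the uniform spectrum
ROLLOFF_DB_PER_OCTAVE = 6.0

# Central harmonic (Hz) of each pseudoformant, read off Table 1 (F1, F2, F3).
FORMANT_CENTRES_HZ = {
    "EE": (300, 2300, 3000),
    "AH": (700, 1100, 2400),
    "OO": (300, 900, 2200),
    "OR": (500, 800, 2400),
    "ER": (500, 1400, 1700),
}


def uniform_level(freq_hz):
    """Level (dB) of one harmonic in the uniform spectrum."""
    return REF_LEVEL_DB - ROLLOFF_DB_PER_OCTAVE * math.log2(freq_hz / F0_HZ)


def trio_roles(vowel):
    """Map each of the nine formant harmonics to 'centre' or 'outer'."""
    roles = {}
    for centre in FORMANT_CENTRES_HZ[vowel]:
        roles[centre - F0_HZ] = "outer"
        roles[centre] = "centre"
        roles[centre + F0_HZ] = "outer"
    return roles


def segment_levels(kind, vowel=None):
    """Return {frequency_hz: level_db}; harmonics that are absent are omitted."""
    roles = trio_roles(vowel) if vowel else {}
    levels = {}
    for k in range(1, N_HARMONICS + 1):
        freq = k * F0_HZ
        base = uniform_level(freq)
        role = roles.get(freq)
        if kind == "uniform":
            levels[freq] = base
        elif kind == "vowel":
            if role:  # only the nine formant harmonics are retained
                levels[freq] = base - (5.0 if role == "outer" else 0.0)
        elif kind == "complement":
            if role is None:  # assumption: the nine harmonics are removed outright
                levels[freq] = base
        elif kind == "indistinct":
            boost = {"outer": 10.0, "centre": 15.0}.get(role, 0.0)
            levels[freq] = base + boost
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return levels
```

    For example, `segment_levels("vowel", "EE")` gives 44.00, 45.49, and 38.00 dB (after rounding to two decimals) for the 200-, 300-, and 400-Hz harmonics, matching the F1 rows of Table 1.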

    Subjects
    A screening test was prepared by recording a randomized sequence of test stimuli derived from the five vowel segments. The duration of each stimulus was 300 msec, with rise and fall times of 25 msec (linear in voltage and time). The stimuli were presented at the rate of one every 4 sec. Eight subjects, graduate students and members of staff of the Institute of Hearing Research, listened to the sequence. They were instructed to identify each stimulus as one of the sounds "ee," "ah," "oo," "or," and "er" by marking a response sheet. (In Southern British English, orthographic "or" is pronounced as a pure vowel with no /r/. All of the listeners were familiar with the American English /ɝ/.) Six of the eight subjects scored over 95% correct. The other two scored closer to 80%. In the three experiments reported below, data were gathered only from the former six. They ranged in age from 23 to 31 years and had audiometrically normal hearing. They included two of the authors.

    EXPERIMENT 1

    Experiment 1 was intended to establish first that stimuli akin to those displayed in the lower panel of Figure 1 can generate percepts of a sufficient range of vowel sounds to allow accuracy of identification to be used as the dependent measure in explorations of other parameters of the effect. The second goal was to determine whether perceptual sensitivity to spectral-amplitude changes, in the sense proposed by the spectral change hypothesis, operated symmetrically. Two types of test stimulus were created. In the first, the complement of one of the five different vowels preceded a spectral amplitude transition leading into a uniform spectrum. In the second, the order of the segments was reversed; the uniform spectrum preceded a spectral amplitude transition leading into a vowel complement. If perceptual sensitivity to increases of spectral amplitude per se is the basis of the effect, listeners would identify the uniform spectrum as a vowel only if the complement preceded it. However, if a full representation of physical changes in derivative terms is extracted, the offset of the segment with the uniform spectrum may sound like the vowel whose complement follows it.

    One basic problem had to be overcome in this and subsequent experiments. The complements of vowels have a spectral structure which, though unnatural, can be fairly vowel-like in some cases and can be categorized. For example, the complement of /i/ sounds a little like /a/, and vice versa. Potentially, listeners might learn the correspondence between the sounds of the complements and the effects they sometimes produce when coupled with uniform spectra, and so bias their responses. To overcome this problem, on half the trials in this and subsequent experiments, "foil" stimuli were presented. For Experiment 1, two foils were created from each test stimulus by replacing the uniform spectrum by an indistinct vowel. The indistinct versions of the vowels were used, rather than the vowels themselves, to avoid creating a class of "clear" stimuli in opposition to an unclear class. In no case was a complement paired with its own indistinct vowel. This procedure uncouples the association of particular complements with particular identification responses. It allows the accuracy with which foils are identified to be a criterion of the reliability of a listener's responses, provided the perceived identity of the foils is sufficiently robust not to be changed by the preceding complement.

    Method
    Two types of test stimulus were synthesized along with appropriate foils. In the complement-to-uniform (CU) stimuli, 1,000 msec of the complement of a vowel was followed by a 25-msec linear spectral amplitude transition leading into a 275-msec segment with a uniform spectrum, as schematized in Figure 3. During the spectral amplitude transition, the intensities of the 50 harmonics changed linearly in voltage and time from their values in the complement to their values in the uniform spectrum. The rise time was 100 msec, and the fall time was 25 msec. One test stimulus of this type was created for each vowel complement. Two foils were created from each test stimulus by following the vowel complement with a spectral amplitude transition leading into an indistinct vowel in the place of the uniform spectrum. In the uniform-to-complement (UC) stimuli, the order of the segments was reversed. A 275-msec segment of the uniform spectrum was followed by a 25-msec spectral amplitude transition leading into 1,000 msec of a vowel complement. The rise time was 25 msec, and the fall time was 100 msec. Ten foils were created by replacing the uniform spectra with indistinct vowels.

    Figure 3. Schematic representation of a complement-to-uniform (CU) stimulus from Experiment 1. Three segments are demarcated in the waveform envelope. The valleys in the spectrum of the vowel complement are progressively filled during the spectral amplitude transition, leading into the uniform spectrum. To make the nature of the spectral amplitude transition clear, the durations of the three segments have not been drawn to scale. [Segment labels: Complement, Transition, Uniform. Axes: amplitude against time; spectrum against frequency (kHz), 0 to 5.]
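    The transition described above, in which each harmonic moves linearly in voltage and time from its complement value to its uniform-spectrum value, can be sketched as follows. This is an illustrative reconstruction only: the function names are hypothetical, the harmonics are assumed to start in sine phase (the text does not state the phases), and the overall rise and fall of the waveform envelope are left out for brevity.

```python
import math

SAMPLE_RATE = 10_000   # samples/sec, as stated in the text
F0_HZ = 100            # harmonic spacing


def db_to_amp(level_db):
    """Convert a level in dB to a linear (voltage) amplitude."""
    return 10.0 ** (level_db / 20.0)


def cu_amplitude(a_start, a_end, t_ms, complement_ms=1000.0, transition_ms=25.0):
    """Amplitude of one harmonic at time t_ms: held, linear ramp, then held."""
    if t_ms <= complement_ms:
        return a_start
    if t_ms >= complement_ms + transition_ms:
        return a_end
    frac = (t_ms - complement_ms) / transition_ms
    return a_start + frac * (a_end - a_start)


def synthesize(start_amps, end_amps, duration_ms, **timing):
    """Sum the harmonics sample by sample.

    start_amps and end_amps map harmonic number k (frequency k * F0_HZ)
    to linear amplitude; a harmonic missing from the complement has a
    start amplitude of 0.0. Sine phase is an assumption of this sketch.
    """
    n_samples = int(SAMPLE_RATE * duration_ms / 1000)
    samples = []
    for i in range(n_samples):
        t_sec = i / SAMPLE_RATE
        t_ms = 1000.0 * t_sec
        value = 0.0
        for k, a_start in start_amps.items():
            amp = cu_amplitude(a_start, end_amps[k], t_ms, **timing)
            value += amp * math.sin(2.0 * math.pi * k * F0_HZ * t_sec)
        samples.append(value)
    return samples
```

    With the default timing, a harmonic that is absent from the complement (start amplitude 0.0) reaches half of its final voltage 12.5 msec into the transition. A UC stimulus would be obtained by swapping the start and end amplitudes and setting `complement_ms` to the 275-msec duration of the leading uniform segment.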

    Separate randomizations were recorded for the CU and UC stimuli. In each randomization, the five test stimuli each occurred 10 times and the 10 foils each occurred five times. Trials occurred at 5-sec intervals. Three randomizations of control stimuli were also recorded. The first contained 10 instances of each of the five 300-msec vowels used in the screening test. The second contained 10 instances of analogous 300-msec stimuli derived from the indistinct vowel segments. The third contained 10 instances of 300-msec stimuli derived from the vowel complements and 10 instances of a 300-msec segment of the uniform spectrum.
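    The composition of one such randomization (five test stimuli ten times each, ten foils five times each, so that half the trials are foils) can be sketched as below. The text does not say which two indistinct vowels were paired with each complement, so the pairing rule here is purely hypothetical; the only constraint taken from the text is that a complement is never paired with its own indistinct vowel. All names are illustrative.

```python
import random

VOWELS = ["EE", "AH", "OO", "OR", "ER"]


def build_trials(seed=0):
    """Return a shuffled list of (kind, complement, second_segment) trials."""
    tests = [("test", vowel, "uniform") for vowel in VOWELS]
    foils = []
    for i, vowel in enumerate(VOWELS):
        # Hypothetical pairing: the next two vowels in the list, cyclically.
        for step in (1, 2):
            other = VOWELS[(i + step) % len(VOWELS)]
            foils.append(("foil", vowel, other))
    trials = tests * 10 + foils * 5
    random.Random(seed).shuffle(trials)  # seeded for reproducibility
    return trials
```

    A call such as `build_trials(seed=1)` yields 100 trials, 50 of them foils, which matches the proportions given in the text.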

    The subjects listened first to the tape containing the vowel stimuli, second to the indistinct vowels, and third to the complements and uniform spectrum. They were instructed to identify each of the stimuli presented as one of the five sounds, "ee," "ah," "oo," "or," and "er." The subjects then listened to the CU stimuli. They were instructed to identify the onset of the second segment (the uniform spectrum) in one of the same five categories. Finally, they listened to the UC stimuli and were instructed to identify the offset of the first segment in one of the five categories, ignoring the quality of its onset.

    Results and Discussion
    Results, averaged over the six subjects, are given in Table 2. The first row shows that these listeners were by now 100% accurate when identifying the five vowel stimuli. Four of the six also identified the indistinct vowels (row 2) without error, and one subject was 90% correct and the other 88% correct. Overall, these scores were high enough to establish that the indistinct vowels would be effective as foils. Row 3 lists the percentages of occasions on which the segment with the uniform spectrum was identified as the vowel whose complement preceded it in the CU condition. Performance was above chance for each of the vowels individually. The foils were identified accurately (94%), suggesting that subjects followed the instructions and attempted to identify the onset of the second segment, not basing their responses on conscious inference from the perceived identity of the complements. In the UC condition, in row 4, performance was poor, although the foils were again identified accurately. It was not possible to identify the offset of the uniform spectrum as the vowel whose complement followed it. (The two subjects who were also authors listened several times to this tape but could not perceive the predicted vowel sound from the spectral amplitude transition leading into the vowel complement.) The fifth row of Table 2 lists as a control the percentages of responses in each of the five vowel categories made when the uniform spectrum was presented in isolation. Comparison of rows 4 and 5 suggests that listeners identified the uniform spectrum in Condition UC in essentially the same way as they identified it in isolation.

    Table 2
    Results of Experiment 1: Percentages of Correct Identification Responses in Each Vowel Category for Four Types of Stimulus [Vowels, Indistinct Vowels, Complement-Uniform (CU), Uniform-Complement (UC)] and the Absolute Percentage of Responses in Each Category Made When the Uniform Spectrum was Presented in Isolation, Averaged Over Six Subjects

                                  Response Category
    Stimulus              EE      AH      OO      OR      ER     Mean    Foils
    Vowels              100.0   100.0   100.0   100.0   100.0   100.0
    Indistinct Vowels    95.0    98.3    95.0    95.0    98.3    96.3
    CU                   98.3    96.7    83.3    71.7    78.3    85.7     94.3
    UC                    0.0    48.3    33.3    28.3    15.0    25.0     94.0
    Uniform               1.7    31.7    10.0    45.0    11.7
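    As a quick arithmetic check on Table 2, the row means can be recomputed from the per-vowel percentages, and the CU scores can be set against the responses given to the uniform spectrum in isolation. The sketch below only re-derives numbers already in the table; the dictionary layout and function names are illustrative, and the subtraction is an informal comparison, not an analysis reported by the authors.

```python
CATEGORIES = ["EE", "AH", "OO", "OR", "ER"]

# Per-vowel percentages copied from Table 2.
TABLE_2 = {
    "Vowels": [100.0, 100.0, 100.0, 100.0, 100.0],
    "Indistinct Vowels": [95.0, 98.3, 95.0, 95.0, 98.3],
    "CU": [98.3, 96.7, 83.3, 71.7, 78.3],
    "UC": [0.0, 48.3, 33.3, 28.3, 15.0],
    "Uniform": [1.7, 31.7, 10.0, 45.0, 11.7],
}


def row_mean(name):
    """Mean of one row, rounded to one decimal as in the table."""
    row = TABLE_2[name]
    return round(sum(row) / len(row), 1)


def cu_minus_isolated_uniform():
    """CU score minus the response rate to the uniform spectrum alone."""
    return {
        cat: round(cu - base, 1)
        for cat, cu, base in zip(CATEGORIES, TABLE_2["CU"], TABLE_2["Uniform"])
    }
```

    The recomputed means agree with the Mean column (96.3, 85.7, and 25.0 for rows 2 to 4), and the CU score exceeds the isolated-uniform response rate in every category, by the smallest margin for "or."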

    Experiment 1 demonstrates two things. First, the basic phenomenon, in which a segment with a uniform spectrum is heard as the vowel whose complement abuts it, has been shown to generate a range of predictable vowel percepts. However, vowels are heard only when the complement precedes the uniform spectrum. The result offers no support for the most general formulation of the spectral change hypothesis, although it remains possible that only positive-going changes (i.e., those at the onset of the uniform spectrum) are perceptually relevant. The perceptual grouping hypothesis could be formulated to encompass the obtained time asymmetry, but at the expense of parsimony, and the adaptation hypothesis predicts the result obtained.

    In the introduction, we noted that the three hypotheses are not directly opposed. To explore the relevance of each in their respective domains, we adopted an indirect approach. Experiment 2 explores the consequences of manipulating three parameters: the duration of the vowel complement, the separation of the complement and the uniform spectrum, and, in order to measure the extent of interaural transfer of the effect, the ear to which complement and uniform spectrum were presented. The time asymmetry of the present results using vowel identification and the general nature of the effects obtained by Wilson (1970) and Viemeister (1980) with nonspeech stimuli favor an account in terms of adaptation. However, a relatively natural task, such as vowel identification, could engage other, higher level processes in addition to lower level effects of adaptation. If so, the present effects may either emerge in situations where the corresponding psychoacoustic phenomena with nonspeech stimuli have not been found or may display aspects not explicable by adaptation.

    EXPERIMENT 2a

    Wilson (1970) found that the aftereffect produced by a comb-filtered noise is reduced by about 80% after 0.5 to 1 sec. The pattern of decay with time was approximately linear in log time and could therefore be extrapolated to determine the putative duration at which it would have decayed completely. This duration varied with the duration of the preceding comb-filtered noise and was about 0.75 sec following a 0.25-sec exposure, 1.25 sec following a 1.0-sec exposure, and 3.0 sec following a 4.0-sec exposure. Viemeister (1980) measured a more gradual rate of decay for the adaptation of masking produced by an incomplete harmonic complex. About 20% of the effect remained 6.4 sec after exposure to a 2.4-sec adaptor.
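    The extrapolation described for Wilson's data, in which a decay that is linear in log time is extended to the point where the effect would reach zero, amounts to solving a straight-line equation in log10(time). The sketch below is a generic illustration of that step: the function name is hypothetical, and any numbers passed to it are the caller's own, since Wilson's individual data points are not given here.

```python
import math


def extrapolate_zero_time(t1, effect1, t2, effect2):
    """Time at which an effect that is linear in log10(time) reaches zero.

    (t1, effect1) and (t2, effect2) are two points on the decay line,
    with times in seconds and the effect in any consistent unit.
    """
    x1, x2 = math.log10(t1), math.log10(t2)
    slope = (effect2 - effect1) / (x2 - x1)
    if slope >= 0:
        raise ValueError("the effect must decrease with time")
    intercept = effect1 - slope * x1
    return 10.0 ** (-intercept / slope)
```

    For instance, with made-up values in which the effect falls from 2.0 units at 0.1 sec to 1.0 unit at 1.0 sec, the line loses one unit per decade of time and reaches zero at about 10 sec.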

    In Experiment 2a, the duration of the interval between the end of a 1.0-sec vowel complement and the onset of the uniform spectrum was varied. In one condition, this interval was silent; in the second condition, it was occupied by a spectral amplitude transition. We reasoned that, if a version of the spectral change hypothesis held, there might be an optimal rate of spectral amplitude change for producing percepts of vowels, other than the notionally infinite rate achieved when vowel complements directly abutted the uniform spectrum. There are general grounds for expecting such optima in rate-sensitive systems (e.g., Moller, 1972). Maximum accuracy in vowel identification might then occur at some greater-than-zero separation of vowel complement and uniform spectrum. If not, or if the optimum is the fastest rate of change achievable, then greatest accuracy should occur when vowel complements directly abut the uniform spectrum.

    Method
    Two types of stimuli were derived from the complement-to-uniform stimuli in Experiment 1. In each, 1,000 msec of a vowel complement with a l

  • PERCENING VOWELS FROM UNIFORM SPECTRA 209

    vowels, Iii and lui, fell to SOOJo with approximately100 msec of silence, ranging across subjects from 20to S12 msec. Third, there was no evidence of an opti-mal duration for a linear spectral amplitude transi-tion. The maximum apparent in the mean data forlui with 32 msec of transition was evidenced by onlytwo individual subjects. Finally, performance at thelonger separations was better in the variable transi-tion condition.

    Overall, the results obtained in the variable-silencecondition suggest that the effect of a vowel comple-ment decays within, at most, SOO msec. This is ashorter decay time than the analogous durationsmeasured by Wilson (1970) and Viemeister (1980).There is no evidence here of cognitive processes ofselective attention or judgmental contrast sustainingthe effect beyond the duration expected from theadaptation component alone. It seems likely, there-fore, that performance was better in the variable-transition condition because some portion of thespectral amplitude transition acted as a continuationof the complement. In other words, although theintensity contrast would be reduced, it would beshifted nearer to the onset of the segment with theuniform spectrum. Below, in Experiment 3, this pos-sibility is explored by determining the minimalamount of spectral amplitude structure required in avowelcomplement to elicit the effect.

    MethodViemeister (1980) determined that the adaptation of masking

    produced by an incomplete harmonic series develops with '0 to100 msec of exposure and is essentially complete after 400 msec.Experiment 2b determined the duration of vowel complement re-quired to produce the present effect. Ten complement durationswere used, ranging from 2' to 2'0 msec in 2'-msec steps. The risetime of the complement was 2' msec. The duration of the spectralamplitude transition was 1 msec and the duration of the segmentwith the uniform spectrum was 27' msec with a 25-msec fall time.Complements of the three vowels Iii, 181, and lui were used.Two foils were constructed with each complement. A randomiza-tion was recorded in which each stimulus occurred eight times. Thesix subjects were instructed to identify the onset of the second seg-ment as one of the three vowel sounds "ee," "ah," and "00."

    EXPERIMENT 2b

    Results and DiscussionIn Figure S, the percentage of times that a segment

    with a uniform spectrum was identified as the vowelwhose complement preceded it has been plottedagainst the duration of the complement. Data foreach vowel complement are presented averaged overthe six subjects. The accuracy with which the foilswere identified was 97070. Performance with thecomplement of lui at long durations has been de-pressed by the performance of two of the subjects,who consistently failed to identify the following uni-form spectrum as "00." The other four subjects pro-duced consistent data and identified "ee" and "00"above SOOJo correct with between 7S and ISO msec of

    100

    . 80.....'-'

    t8 60

    .....c

    t! 40QJ0..

    20

    a

    transition

    100

    ..... 80'-'

    ~8 60

    .....CQJ

    40u'-'"0..

    20

    0

    have been plotted for each complement individually.In each condition, the foils were identified accurately(96070 in the variable-transition condition and 100070in the variable-silencecondition).

    There are four relevant aspects of the results. First,in general, performance was best when vowel com-plements and uniform spectra abutted one anotherwith no separation, and performance declined as theseparation was increased. Second, performanceappears to have been consistently good with the com-plement of the vowel la/. This probably occurred be-cause of an overall bias due to the uniform spectrumitself sounding like "ah" (or "or"), as the results ofExperiment 1 showed. Results obtained with theother two complements will be more informative ofperformance, therefore. In the variable-silence con-dition, accuracy of identification of the two other

    FllUft ... Results of Experiment 2a annaed over six subjects.Tbe pereentaae of occasions on wblcb a seament wltb a uniformspectrum wu "correcdy" Idendned u tbe vowel wbose comple-ment preceded It Is plotted aaalnst tbe dundou of tbe spectnlampUtude tnnsldon (upper panel) or tbe dundon of tbe sUentIn-te"al (lower panel) wblcb Dnked tbe vowel complement and tbeuniform spectrum. Results for three vowel complements aft dIs-played .pantely. Eacb point plots tbe results of 60 obse"adons.

  • 210 SUMMERFIELD, HAGGARD, FOSTER, AND GRAY

    100

    90

    80

    - 70uQIL- 60L-au- 50cQIU 40L-QIa.

    30

    20

    10

    0

    25 50 70 100 125 150 175 200 225 250duration of complement (ms)

    Fll1lft 5. Results of Experiment 1b avenled over six subjects.The percentale of occuloDl on whleh a seament with a nnlformspectrum wu "correedy" IdendOed u the vowel whose comple-ment preceded It Is plotted alalDlt the dundon of the comple-ment. Results for three vowel complements are displayed sepa·ntely. Each point plots the results of'" ohse"adoDl.

    complement. Acknowledging the different proceduresand criteria of correct performance, this range is simi-lar to the range of 50 to 100 msec established byViemeister (1980) as being necessary to produce adap-tation of masking. As in Experiment 2a, there is noevidence that the speech-identification task has al-lowed cognitive processes to extend the effect beyondthe temporal limits expected from its adaptationcomponent.

    the indistinct vowels provided a set of clearer instances of the vowels than did the uniform spectra following vowel complements. They might, therefore, distinguish the foils from the test stimuli and thereby defeat the object of including the foils. Accordingly, a new set of foils was constructed in which the vowel percept was slightly more ambiguous and less easily distinguished from the percepts produced by the test stimuli. In this new set of vowels, the levels of the three harmonics centered on the frequencies of each vowel formant were reduced by 5 dB from their levels in the original indistinct vowels.

    A randomization was recorded in which each stimulus and each foil occurred five times. The six subjects were instructed to identify the second segment in each presentation as one of the five vowel sounds "ah," "ee," "oo," "or," and "er."

    Results and Discussion

    Table 3 contains the percentages of times that a uniform spectrum was identified as the vowel whose complement preceded it. Results have been collapsed over the five complements, but are reported for each subject individually. Overall, performance was close to 90% in the monotic conditions and at chance in the dichotic conditions. There is no evidence of any interaural transfer. The foils were identified slightly more accurately when presented dichotically than when presented monotically; this suggests that the poor performance with the dichotic test stimuli did not result from switching attention between the ears or other task factors. It is more likely that the aftereffect of the vowel complements has depressed performance with the foils in the monotic conditions.

    The absence of interaural transfer of the present effect is consistent with Viemeister and Bacon's (1981) informal report. Although we acknowledge

    EXPERIMENT 2c

    Table 3
    Results of Experiment 2c for Six Individual Subjects: Percentages of Occasions on Which (1) a Segment With a Uniform Spectrum Was Identified as the Vowel Whose Complement Had Preceded It (Uniform Spectra) and (2) an Indistinct Vowel Following a Vowel Complement Was Identified Correctly (Foils), for Ipsilateral and Contralateral Presentations of Complements and Test Segments

                                  Test Segment
                  Uniform Spectra                  Foils
    Subject   Ipsilateral  Contralateral   Ipsilateral  Contralateral
    Q.S.          92            22             62            88
    J.F.          98            20             92            98
    M.R.          94            22             74            94
    M.F.          70            20             56            70
    P.B.         100            16             64            88
    S.R.          84            18             68            84
    Mean          89.7          19.7           69.3          87.0

    Note. Data have been averaged over five vowel complements and ears.
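The column means reported in Table 3 can be checked against the individual scores. The sketch below is only a consistency check: the assignment of each run of digits to a column is our reading of the table layout, and the chance level assumes that all five response alternatives are equally likely.

```python
# Individual percentages from Table 3, in subject order
# Q.S., J.F., M.R., M.F., P.B., S.R. (column assignment is our reading).
TABLE_3 = {
    ("uniform spectra", "ipsilateral"): [92, 98, 94, 70, 100, 84],
    ("uniform spectra", "contralateral"): [22, 20, 22, 20, 16, 18],
    ("foils", "ipsilateral"): [62, 92, 74, 56, 64, 68],
    ("foils", "contralateral"): [88, 98, 94, 70, 88, 84],
}

# Five response alternatives: "ah", "ee", "oo", "or", "er".
N_RESPONSES = 5
CHANCE_PERCENT = 100 / N_RESPONSES  # assumes equiprobable guessing


def column_means(table):
    """Return the mean of each column, rounded to one decimal place."""
    return {key: round(sum(scores) / len(scores), 1)
            for key, scores in table.items()}


if __name__ == "__main__":
    for (segment, ear), mean in column_means(TABLE_3).items():
        print(f"{segment:16s} {ear:14s} {mean:5.1f}")
    print(f"chance level: {CHANCE_PERCENT:.0f}%")
```

Under this reading, the contralateral mean for uniform spectra sits essentially at the five-alternative chance level, which is the basis for the statement that there was no interaural transfer.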

    Viemeister and Bacon (1981) suggested that their aftereffect does not result from central processes because it does not transfer between the ears, although they presented no formal data on this point. In Experiment 2c, 995 msec of the complement of one of the five vowels /i/, /a/, /u/, /ɔ/, and /ɜ/ preceded 295 msec of the uniform spectrum. The complement could occur in either the left or the right ear; the uniform spectrum could occur in either the ipsilateral or the contralateral ear. When both were presented to the same ear, they were linked by a 5-msec spectral amplitude transition. When they were presented to opposite ears, the fall time of the complement and the rise time of the uniform spectrum overlapped and were each 5 msec. In all conditions, the rise time of the complement was 100 msec and the fall time of the uniform spectrum was 25 msec.

    We were concerned that the subjects, who were becoming well practiced by now, might realize that even


    that the lack of interaural transfer does not positively confirm any particular peripheral site, it is more parsimonious to conclude that the present effect results from adaptation than to attempt to rescue the perceptual grouping hypothesis by presuming that the underlying mechanisms are monaural.

    EXPERIMENT 3

    In Experiment 2a, uniform spectra were identified more reliably when connected to vowel complements by spectral amplitude transitions than when separated by silence. Presumably, the spectral amplitude transition extended the effective duration, and hence extended the proximity to the uniform spectrum, of a complementary adaptor. That conclusion prompts the question of how much spectral amplitude structure is required in a vowel complement to produce an effect, or perhaps more precisely how much temporal amplitude contrast with the uniform spectrum is required around the nominal formant frequencies.

    Method

    For Experiment 3a, five new versions of the complement of each of the vowels /i/, /a/, and /u/, in which the depth of the spectral valleys was progressively reduced, were defined. To do this, the log power of each of the nine harmonics in the vowel spectra was reduced by 75%, 80%, 85%, 90%, and 95% from the values listed in Table 1. The five new complements were defined by subtracting these new levels from the levels of the corresponding harmonics in the uniform spectrum. We shall describe the new vowel complements as entailing 5%, 10%, 15%, 20%, or 25% of the spectral amplitude structure of the original vowel complements. Test stimuli were synthesized in which 1,000 msec of one of the 15 new vowel complements was followed by a 25-msec spectral amplitude transition leading into a uniform spectrum. Because there was now considerable variation in the sound of the complements, it was not necessary to synthesize foils. A randomization of 150 items was recorded in which each of the 15 test stimuli occurred 10 times.

    For Experiment 3b, an analogous set of vowel complements was defined in which the depth of spectral amplitude structure ranged from 1% to 9% in 1% steps. Twenty-seven test stimuli were synthesized, and a randomization of 270 items was recorded, in which each of the test stimuli occurred 10 times. Thus, in the two experiments, spectral structure contrasts ranging from about 1 dB up to 20 dB were examined.
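The construction of the reduced-depth complements can be sketched as follows. This is a minimal illustration of the arithmetic described in the Method, not the synthesis procedure itself: the harmonic levels in `VOWEL_DB` and the flat level in `UNIFORM_DB` are hypothetical placeholders (the actual values are those of Table 1), and the function names are ours.

```python
# Hypothetical log-power levels (dB) for the nine formant-related harmonics
# of one vowel; placeholders standing in for the Table 1 values.
VOWEL_DB = [40.0, 45.0, 40.0, 30.0, 35.0, 30.0, 25.0, 30.0, 25.0]

# Assumed flat level (dB) of the same harmonics in the uniform spectrum.
UNIFORM_DB = [50.0] * 9

# Fractions of the original spectral amplitude structure that were retained.
FRACTIONS_3A = [0.05, 0.10, 0.15, 0.20, 0.25]    # Experiment 3a
FRACTIONS_3B = [n / 100 for n in range(1, 10)]   # Experiment 3b: 1% to 9%


def scaled_complement(uniform_db, vowel_db, fraction):
    """Scale the vowel levels to `fraction` of their original values and
    subtract them from the uniform-spectrum levels."""
    return [u - fraction * v for u, v in zip(uniform_db, vowel_db)]


def valley_depths(uniform_db, complement_db):
    """Depth (dB) of each valley in the complement relative to the uniform spectrum."""
    return [u - c for u, c in zip(uniform_db, complement_db)]


if __name__ == "__main__":
    for fraction in FRACTIONS_3A:
        complement = scaled_complement(UNIFORM_DB, VOWEL_DB, fraction)
        depths = valley_depths(UNIFORM_DB, complement)
        mean_depth = sum(depths) / len(depths)
        print(f"{fraction:.0%} structure: mean valley depth {mean_depth:.2f} dB")
```

With three vowels, the two lists of fractions give the 15 and 27 test stimuli of Experiments 3a and 3b, respectively.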

    The six subjects listened to the tape recorded for Experiment 3a and then to the tape for Experiment 3b. In each case, they were instructed to identify the onset of the second segment as one of the sounds "ee," "ah," and "oo."

    Results and Discussion

    The results of Experiment 3a are displayed in the left-hand panel of Figure 6. The percentage of occasions on which the uniform spectrum was identified as the vowel whose complement preceded it has been plotted against the amount of spectral amplitude structure in the complement for each of the individual complements. Three of the six subjects performed above chance, even on the restricted task of identifying "oo" and "ee," with all of the modulation depths examined. The performance of the other three fell to chance only with the smallest amount of 5%. The right-hand panel of Figure 6 presents the results of Experiment 3b. Chance performance on the restricted task of identifying "ee" and "oo" occurred at 1.0%, 2.1%, 4.2%, 6.3%, 4.4%, and 8.0%, for the six individual listeners. The average is 4.3%. The average difference in the level of the nine formant-related harmonics between a complement with 4.3% spectral structure and the uniform spectrum would be 1.6 dB, ranging from 2.2 dB (300 Hz) to 1.1 dB (2100 Hz).
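The averaging step above can be reproduced directly. The short sketch below recomputes the mean threshold from the six individual values; the final line is our own inference, not a figure reported here: it assumes that valley depth in dB scales linearly with the percentage of spectral structure, as the Method describes, and backs out the mean depth that the full (100%) complement would then have.

```python
# Percentages of spectral structure at which each of the six listeners
# fell to chance in Experiment 3b.
CHANCE_THRESHOLDS_PERCENT = [1.0, 2.1, 4.2, 6.3, 4.4, 8.0]

# Reported mean level difference (dB) at the average threshold.
MEAN_DIFFERENCE_DB = 1.6


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


mean_threshold = round(mean(CHANCE_THRESHOLDS_PERCENT), 1)

# Assumption: depth in dB is proportional to percent structure, so the
# full complement would have a mean depth of 1.6 dB / 0.043.
implied_full_depth_db = MEAN_DIFFERENCE_DB / (mean_threshold / 100)

if __name__ == "__main__":
    print(f"mean threshold: {mean_threshold}%")
    print(f"implied mean depth of full complement: {implied_full_depth_db:.1f} dB")
```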

    The minimal spectral structure required to generate the effect is perhaps the most interesting of the present results. Possibly, different degrees of adaptation are produced by spectral components differing in amplitude by 2 dB around a mean level of 50 dB. Alternatively, if we accept Viemeister and Bacon's (1981) suggestion that the basis of the class of auditory aftereffect examined here is the adaptation of suppression, then suppression must enhance such spectral amplitude variations sufficiently to mean that when suppression adapts and the uniform spectrum is presented, appreciable peaks in the internal auditory spectrum result.

    GENERAL DISCUSSION

    The three groups of experiments described here demonstrate that if a segment of a waveform whose spectrum is complementary to that of a vowel (a spectrum with peaks in place of valleys and vice versa) precedes a segment with a uniform spectrum devoid of peaks and valleys, then the uniform spectrum, particularly at its onset, will sound like the vowel whose complement has preceded it. The effect does not occur if the complement follows the uniform spectrum. Approximately 125 msec of the complement of a vowel must precede a uniform spectrum for the effect to emerge, and the effect is eliminated if more than about 500 msec of silence intervene between the complement and the uniform spectrum, or if the complement is presented to one ear and the uniform spectrum to the other ear.

    Likely Bases for the Effect

    In terms of the parameters so far explored, the present effect is similar to two effects studied previously using nonspeech sounds. One is Wilson's (1970) demonstration that a broad-band noise presented after exposure to a sinusoidally comb-filtered noise is perceived as possessing a complementary spectral ripple. The other is the demonstration that if a sinusoidal component missing from a harmonic series is reintroduced, then that component is heard clearly against the background of the preexisting harmonics (e.g., Viemeister & Bacon, 1981). Superficially, it appears that each of these results can be explained by suggesting that the auditory response


    [Figure 6 plot: percent correct (0 to 100) against percent of full complement; axis values 25, 20, 15, 10, 5 in the left-hand panel and 9 to 2 in the right-hand panel.]

    Figure 6. Results of Experiments 3a and 3b averaged over six subjects. The percentage of occasions on which a segment with a uniform spectrum was judged "correctly" as the vowel whose complement preceded it is plotted against the amount of spectral structure in the complement. Results for complements derived from three vowels are displayed separately. Each point plots the results of 60 observations.

    diminishes through adaptation at frequencies where there is appreciable energy during the presentation of the adaptor, be it a vowel complement, a comb-filtered noise, or an incomplete harmonic series. When the subsequent segment (a uniform spectrum, a broad-band noise, or a complete harmonic series) is presented, energy at preexisting frequencies achieves a diminished auditory representation relative to energy at new frequencies. The latter stands out, therefore, at least at its onset.
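The simple-adaptation account can be illustrated with a toy calculation. The sketch below is not a model proposed or tested here: it assumes, purely for illustration, that the response to each component of the test segment is reduced by a fixed proportion (`adaptation`, a hypothetical parameter) of the amount by which the adaptor exceeded its own minimum level at that frequency. The adaptor levels are likewise hypothetical, and the alternative account in terms of adaptation of suppression is not represented.

```python
def internal_spectrum(adaptor_db, test_db, adaptation=0.5):
    """Toy internal representation (dB) of the test segment after the adaptor.

    Each test component is attenuated in proportion to how far the adaptor
    exceeded its minimum level at that frequency (illustrative assumption).
    """
    floor = min(adaptor_db)
    return [t - adaptation * (a - floor) for a, t in zip(adaptor_db, test_db)]


def peak_indices(levels):
    """Indices of interior components that exceed both neighbours."""
    return [i for i in range(1, len(levels) - 1)
            if levels[i] > levels[i - 1] and levels[i] > levels[i + 1]]


# Hypothetical vowel complement: valleys at components 1 and 5,
# where the corresponding vowel would have formant peaks.
COMPLEMENT_DB = [50.0, 30.0, 50.0, 50.0, 50.0, 35.0, 50.0, 50.0]

# Uniform spectrum: every component at the same level.
UNIFORM_DB = [50.0] * len(COMPLEMENT_DB)

if __name__ == "__main__":
    internal = internal_spectrum(COMPLEMENT_DB, UNIFORM_DB)
    print("internal spectrum (dB):", internal)
    print("peaks at components:", peak_indices(internal))
```

In this sketch the physically flat test spectrum acquires internal peaks exactly where the complement had valleys, which is the pattern that would lead the uniform spectrum to be heard as the vowel.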

    Comparison of the present results with temporal parameters of neural adaptation supports this explanation. At moderate-to-high stimulus levels, the physiological response to the onset of a tone, as reflected in the rate of discharge in the majority of auditory nerve fibers in guinea pig, shows an initially high rate of up to 800 spikes/sec for 1-2 msec, which falls rapidly during the next 5-10 msec and asymptotes at a rate between 150 and 200 spikes/sec (e.g., Yates & Robertson, 1980), but the recovery following the cessation of stimulation is swift. Recovery is complete 150 msec after the cessation of a tone burst. Comparably, about 100 msec of silence eliminated the present effect for four of the six listeners in Experiment 2a.

    However, although perhaps a necessary part of an explanation, simple adaptation may not provide a sufficient explanation for the phenomenon. For two listeners, the aftereffect was sustained for up to 500 msec in Experiment 2a, and analogous effects have been found to last for rather longer, between 3 and 6 sec, by Wilson (1970) and Viemeister (1980). This difference in time course suggests that factors other than simple adaptation may play a role.

    A possible factor was suggested by Viemeister and Bacon (1981). They demonstrated that the amount of forward masking produced by a reintroduced harmonic is greater than the amount produced by the corresponding member of a complete harmonic series. This finding suggests that the effect does not occur because of a reduction in the auditory response to components present in the adapting sound, but rather that the response to the reintroduced component is enhanced. As an explanation, Viemeister and Bacon suggested that suppression might adapt during the presentation of an incomplete harmonic series. That is, the effectiveness of processes that tend to enhance differences in level in adjacent spectral regions may diminish with exposure to a constant stimulus. As a result, new components are not suppressed relative to their preexisting neighbors, although they will act as suppressors themselves, and so achieve a more potent auditory representation than do old components and a more potent representation than they would achieve if suppression had not adapted. While no aspect of the present results is incompatible with this hypothesis, they should not be construed as providing a direct test of it. At any rate,


    they do tend to confirm that an explanation in terms of peripheral processes is required. Use of a speech-identification task might have been supposed to engage processes underlying perceptual grouping and selective attention that could enhance peripheral effects of adaptation. But the similarity of the time course of effects measured here in Experiment 2 to the analogous effects obtained with nonspeech stimuli by Wilson and by Viemeister, and the absence of an effect in the contralateral ear in Experiment 2c, suggest that central effects, if present at all, play a minor role.

    One aspect of the present results is difficult to compare with psychoacoustical results because corresponding conditions have not been explored. Experiment 3 demonstrated that, on average, listeners can identify uniform spectra as the vowels whose complements have preceded them with an average increase in the levels of the nine harmonics as small as 2 dB. This result and Viemeister and Bacon's explanatory hypothesis deserve further study.

    Potential Roles for the Present Effect

    Processes of rapid adaptation tend to enhance onsets selectively, while processes of suppression are usually regarded as preserving and possibly enhancing differences in level in adjacent spectral regions in the auditory representation of complex spectra (e.g., Houtgast, 1974; Moore & Glasberg, 1983). The present results emphasize two other potential roles. Either simple adaptation or the adaptation of suppression could serve to enhance changes in spectral amplitude when energy occurs in spectral regions where immediately previously there was less energy. In speech, therefore, changes in the spectral distribution of energy would be generally enhanced. This process could play a specific role in the perception of syllable-final formant transitions. At the beginning of the transitions, changes in frequency will make formants contrast with what were valleys in the preceding vowel. At the end of the transitions, overall intensity will tend to be low and the absence of any offset effect in Experiment 1 means that their discriminability will not be enhanced by the mechanism enhancing spectral representation of onsets. Any contrastive enhancement of the former type will therefore be valuable.

    More generally, from the ecological point of view, the phenomenon observed here may reduce the deleterious consequences of poor acoustical environments and low-fidelity communication channels. Being selective for change, preprocessing of the type achieved by simple adaptation, or the adaptation of suppression, tends to reduce the effect of static or statistical long-term spectral peaks and valleys upon the internal auditory representation, while enhancing the effect of the time-varying information that is of communicative or other biological significance. Responsiveness to change may also assist the short-term process of extracting speech signals from background noises. Where a noise has a relatively static RMS amplitude, the envelope variation due to the speech at the informative low modulation frequencies will be greater than that due to the noise. The change-sensitive mechanism will then enhance the effective signal-to-noise ratio, especially since, in speech, the informative modulation patterns in different critical bands are correlated. (See Hall et al., in press.) The ease with which we have tapped into this psychoacoustic phenomenon using a linguistic response suggests that central auditory analysis takes specific advantage of this aspect of peripheral preprocessing.

    REFERENCES

    DANNENBRING, G. L., & BREGMAN, A. S. (1978). Streaming vs. fusion of sinusoidal components of complex tones. Perception & Psychophysics, 24, 369-376.

    DARWIN, C. J. (1983). Auditory processing and speech perception. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X. Hillsdale, NJ: Erlbaum.

    HALL, J. W., HAGGARD, M. P., & FERNANDES, M. A. (in press). Detection in noise by spectro-temporal pattern analysis. Journal of the Acoustical Society of America.

    HOUTGAST, T. (1974). Lateral suppression in hearing: A psychophysical study of the ear's capability to preserve and enhance spectral contrasts. Soesterberg, The Netherlands: Institute for Perception TNO.

    KOFFKA, K. (1922). Perception: An introduction to Gestalt Theorie. Psychological Bulletin, 19, 551-585.

    MARR, D. (1982). Vision. San Francisco, CA: Freeman.

    MOLLER, A. (1972). Coding of amplitude and frequency modulated sounds in the cochlear nucleus of the rat. Acta Physiologica Scandinavica, 86, 223-238.

    MOORE, B. C. J., & GLASBERG, B. R. (1983). Masking patterns for synthetic vowels in simultaneous and forward masking. Journal of the Acoustical Society of America, 73, 906-917.

    VIEMEISTER, N. F. (1980). Adaptation of masking. In G. van den Brink & F. A. Bilsen (Eds.), Psychophysical, physiological, and behavioural studies in hearing (pp. 190-198). Delft: Delft University Press.

    VIEMEISTER, N. F., & BACON, S. P. (1981). Forward masking by enhanced components in harmonic complexes. Journal of the Acoustical Society of America, 71, 1502-1507.

    WILSON, J. P. (1970). An auditory after image. In R. Plomp & G. F. Smoorenburg (Eds.), Frequency analysis and periodicity detection in hearing (pp. 303-315). Leiden: A. W. Sijthoff.

    YATES, G. K., & ROBERTSON, D. (1980). Very rapid adaptation in auditory ganglion cells. In G. van den Brink & F. A. Bilsen (Eds.), Psychophysical, physiological, and behavioural studies in hearing. Delft: Delft University Press.

    ZWICKER, E. (1964). "Negative afterimage" in hearing. Journal of the Acoustical Society of America, 36, 2413-2415.

    (Manuscript received October 17, 1983; revision accepted for publication December 22, 1983.)