
Effects of Vocal Training and Phonatory Task on Voice Onset Time

*Christopher R. McCrea and †Richard J. Morris

*Johnson City, Tennessee, and †Tallahassee, Florida

Summary: Objectives/Hypothesis: The purpose of this study was to examine the temporal-acoustic differences between trained singers and nonsingers during speech and singing tasks. Methods: Thirty male participants were separated into two groups of 15 according to level of vocal training (ie, trained or untrained). The participants spoke and sang carrier phrases containing English voiced and voiceless bilabial stops, and voice onset time (VOT) was measured for the stop consonant productions. Results: Mixed analyses of variance revealed a significant main effect of phonatory task (speech vs singing) for /p/ and /b/, with VOT durations longer during speech than singing for /p/, and the opposite true for /b/. Furthermore, a significant phonatory task by vocal training interaction was observed for /p/ productions. Conclusions: The results indicated that the type of phonatory task influences VOT and that these influences are most obvious in trained singers secondary to the articulatory and phonatory adjustments learned during vocal training.

Key Words: Voice onset time; Phonatory task; Vocal training; Gender difference.

INTRODUCTION

It has been suggested that trained singers are perceived to sing better than nonsingers because trained singers learn to perform a variety of phonatory1–3 and articulatory/resonatory4–10 adjustments during singing that nonsingers do not. Although these articulatory and phonatory differences allow listeners to perceptually distinguish the two groups during singing, the acoustic cues that help listeners to perceptually separate trained singers and nonsingers have not been clearly identified.

Accepted for publication May 18, 2005. Presented at the 33rd Annual Symposium: Care of the Professional Voice, June 2–6, 2004, Philadelphia, Pennsylvania.

From the *Department of Communicative Disorders, East Tennessee State University, Johnson City, Tennessee; and †Florida State University, Tallahassee, Florida.

Supported by a Dissertation Research Grant funded by the Congress of Graduate Students, the Provost's Office, and the Office of Research, Florida State University.

Address correspondence and reprint requests to Christopher R. McCrea, Department of Communicative Disorders, East Tennessee State University, Box 70643, Johnson City, TN 37614-1702. E-mail: [email protected]

Journal of Voice, Vol. 21, No. 1, pp. 54–63
0892-1997/$32.00
© 2007 The Voice Foundation
doi:10.1016/j.jvoice.2005.05.002

Over the last 35 years, voice researchers have attempted to correlate the phonatory and articulatory movements of trained singers with changes in the acoustic voice signal.7–14 For example, Lindblom and Sundberg7–10,12 correlated vocal tract adjustments with changes in the acoustic signal through examination of long-term average spectra (LTAS), lateral x-ray pictures, and mathematical models of vocal tract function. Sundberg7 reported that increases in the width of the pyriform sinuses and laryngeal ventricle resulted in an increase of energy between 2500 and 3000 Hz in male singers, a spectral peak known as the singer's formant. The singer's formant has been associated with a perceptual vocal “ring” or “brightness,”15–17 and it has been used as a qualifier of “good quality” singing.18,19

Whereas the examination of LTAS for the presence of the singer's formant has provided a means that can frequently differentiate between trained singers and nonsingers, other acoustic measures have shown less promise. Brown et al11 compared the speech and singing productions of “America the Beautiful” for 20 trained singers and 20 nonsingers. In addition to perceptual judgments, a series of acoustic measures was conducted in an effort to acoustically distinguish the trained singers and nonsingers. The acoustic measures included standard deviation of fundamental frequency, jitter, shimmer, noise-to-harmonics ratio, and a series of duration measures, including sentence, word, and syllable duration, as well as consonant-to-vowel duration ratio for individual words. Only the male participants' standard deviation of fundamental frequency and perturbation measures during speech displayed significant differences between trained singers and nonsingers. None of the acoustic duration measures displayed significant differences between the trained singers and nonsingers.

Although Brown et al11 found no speech duration differences between trained singers and nonsingers, Rothman et al13 reported significant differences in word length and alveolar stop closure duration among perceptually identified singers, perceptually unidentified singers, and nonsingers. From standard passage readings by 20 trained singers and 20 age-matched nonsingers, naïve listeners correctly identified 5 of the 20 trained singers. The speech samples of the five correctly identified trained singers were acoustically analyzed for mean speaking fundamental frequency, sentence duration, word duration, and consonant-to-vowel duration ratio. The perceptually identified trained singers displayed significantly longer durations for the word “white” than the nonsingers or unidentified singers. In addition, the perceptually identified singers displayed significantly shorter stop closure durations for /t/ in the word “white” than the unidentified singers or the nonsingers. Although similar word and stop closure duration differences were not observed across other words or stops, the results indicated that there may be temporal-acoustic differences between trained singers and nonsingers during speech.

Whereas the previous study focused on consonant articulation during speech, the importance of consonant articulation during singing should not be overlooked. Vennard17 discussed the importance of articulation in singing and reported that consonants are an important aspect of lyrical singing, by which the linguistic meaning of a song is expressed. However, Vennard17 conceded that trained singers may not be up to this task during vocally challenging musical pieces and stated that “…frequently upon such high notes or in such florid work good pronunciation suffers.” Titze20 further noted that speech intelligibility is sometimes compromised in favor of musical phrasing. Unfortunately, data are lacking with regard to the accuracy of specific consonant articulation during a singing task.

The task of examining the articulation of consonants during speech and singing can be accomplished through temporal-acoustic measures. Voice onset time (VOT) has been established as an important acoustic measure used to distinguish voiced from voiceless stop consonants across a variety of languages.21,22 VOT is defined as the interval between the release of an oral constriction of a stop consonant and the start of vocal fold vibration for the following vowel.21 When examining VOT, it is important to realize that three VOT value ranges may be observed: negative VOT, zero VOT, and positive VOT. Negative VOT scores represent vocal fold vibration before the release of the oral constriction and are associated with the term “prevoicing.” Zero VOT represents the initiation of vocal fold vibration simultaneous with the release of the oral constriction. Finally, positive VOT is associated with the onset of vocal fold vibration after the release of the oral constriction. In English, all three of these VOT ranges can be observed in voiced stops, but only positive VOTs are observed in voiceless stops. The effectiveness of VOT as a tool to help researchers and clinicians distinguish stop consonants according to voicing and place of articulation has been thoroughly examined.21–27
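The three VOT ranges described above amount to a sign test on the interval. The sketch below is illustrative only: it assumes VOT is expressed in milliseconds with the stop release as time zero, and the `zero_band_ms` tolerance for treating near-zero values as "zero VOT" is a hypothetical parameter, not a criterion used in this study.

```python
def classify_vot(vot_ms: float, zero_band_ms: float = 0.0) -> str:
    """Return 'negative' (prevoicing), 'zero' (simultaneous), or 'positive' (voicing lags release).

    zero_band_ms is a hypothetical tolerance around 0 ms; it is not from the study.
    """
    if zero_band_ms < 0:
        raise ValueError("zero_band_ms must be non-negative")
    if vot_ms < -zero_band_ms:
        return "negative"
    if vot_ms > zero_band_ms:
        return "positive"
    return "zero"


def plausible_english_vot(vot_ms: float, voiced: bool, zero_band_ms: float = 0.0) -> bool:
    """English voiced stops may fall in any range; voiceless stops only in the positive range."""
    return voiced or classify_vot(vot_ms, zero_band_ms) == "positive"


# Illustrative values in ms, not measured data.
for vot, voiced in [(-20.0, True), (0.0, True), (31.5, False), (-5.0, False)]:
    print(vot, classify_vot(vot), plausible_english_vot(vot, voiced))
```

A voiceless token with a negative value would be flagged as implausible for English, which can serve as a quick sanity check on marker placement.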

Given that VOT is an effective indicator of subtle articulatory–phonatory interaction differences in speech production, it may prove an effective measure for acoustically representing previously reported physiologic differences between trained singers and nonsingers.1–10 As such, it may be a useful, noninvasive method for documenting the articulatory–phonatory aspects of vocal training during both speech and singing.

McCrea and Morris28 examined VOT for voiced and voiceless stop consonants produced by 10 male participants, 5 trained singers and 5 nonsingers, during speech and singing tasks. They reported significantly longer VOTs for the trained singers when compared with the nonsingers for both singing and speaking. It was suggested that the relatively long VOTs of the trained singers resulted from increased lingual control and lingual pressure. Their results indicated that trained singers and nonsingers display different acoustic-temporal patterns during speech and singing. However, the results from McCrea and Morris28 were based on only 10 male participants, and a larger sample is required to generalize any findings. Furthermore, McCrea and Morris28 noted that, because of their effort to control for rate, frequency, and intensity of production, the singing sample may not have been natural. More research is needed in which truly sung productions are measured to more clearly determine the effects of phonatory task (ie, speech versus singing) on VOT.

The first purpose of this study was to determine whether trained singers and nonsingers display different VOTs during speaking and/or singing. It was hypothesized that the trained singers would display significantly longer VOTs for bilabial stops than the nonsingers, regardless of task. The second purpose of this study was to examine whether the phonatory task (speech vs singing) significantly affected VOT. It was hypothesized that the mean VOTs for the sung productions would be longer than those for the spoken productions.

METHODS

Participants

The participants for this study included 30 men, divided into groups of 15 trained singers and 15 nonsingers. The trained singers ranged in age from 21 to 35 years (mean, 27.8 years), and they reported receiving an average of 9.37 years of private voice lessons. The nonsingers ranged in age from 21 to 24 years (mean, 22 years). All nonsingers reported that they had not received any form of vocal training since elementary school, including instruction received from singing in a band or in a middle school, high school, college, or church choir. All participants (1) were nonsmokers, (2) were between the ages of 21 and 35, (3) were first-language General American English speakers, and (4) reported no history of neurological, vascular, or sensory-motor impairment that would affect articulation, phonation, and/or respiration.

Equipment

All recordings occurred in a double-walled sound-treated booth (IAC Model 4276, Huddleston, Satterfield, Evans & Mauney, Architects and Engineers, Tallahassee, FL) with the participant standing. The voice signal was recorded via a stand-mounted microphone (Shure Model SM7) positioned 1 m in front of the participant at chest level and connected to a Computerized Speech Lab Model 4300B hardware/software system (CSL; Kay Elemetrics Corporation, Lincoln Park, NJ). Voice recordings were digitized at a sampling frequency of 44.1 kHz, stored, and analyzed using the CSL 4300B system. The light signal from a quartz metronome (Matrix MR-500) was used to pace speaking rate.

Procedure

The VOT values were calculated for bilabial American English stop consonants in word-initial position embedded in the phrases “A peek at a peacock” and “A bee at a beehive.” The VOT of /p/ in “peek” and “peacock” and of /b/ in “bee” and “beehive” was measured for each production. This cognate pair of stop consonants was selected because previous research had shown similar patterns among different cognate pairs when comparing singers and nonsingers.28 Each participant performed at least one experimenter-supervised trial production to ensure appropriate rate, fundamental frequency, and intensity. All participants could produce the phrases within an acceptable pitch and intensity range. The order of phrase production and phonatory task was counterbalanced within the groups of participants.


Each participant was instructed to sing and speak each phrase in a 5/4 rhythm on a comfortable, single note five consecutive times at an allegro rate of 160 beats per minute (approximately three syllables/second), as set by the flashing light of a metronome. Rate was controlled because rate of speech has been reported to have a significant effect on VOT, with slow rates associated with long VOTs and fast rates associated with short VOTs.29,30 Furthermore, a rate of approximately three syllables/second was chosen because it represents a moderately fast rate of speech and has been reported to be a common oral reading rate in healthy young adults.31–37

The first and last productions were excluded, leaving the middle three productions for analysis. A total of 720 tokens (30 participants × 2 phrases × 2 phonemes per phrase × 3 repetitions × 2 phonatory tasks) were recorded and measured for VOT.
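The token count follows directly from multiplying the design factors. A minimal check is sketched below; the dictionary keys and the `middle_repetitions` helper are illustrative names, not part of the study's software.

```python
from math import prod

# Design factors as listed in the text.
DESIGN = {
    "participants": 30,
    "phrases": 2,
    "phonemes_per_phrase": 2,
    "repetitions_analyzed": 3,
    "phonatory_tasks": 2,
}


def middle_repetitions(productions):
    """Drop the first and last of the consecutive productions of a phrase."""
    if len(productions) < 3:
        raise ValueError("need at least three productions")
    return productions[1:-1]


# Five consecutive productions per phrase and task leave three for analysis.
kept = middle_repetitions(["rep1", "rep2", "rep3", "rep4", "rep5"])
assert len(kept) == DESIGN["repetitions_analyzed"]

total_tokens = prod(DESIGN.values())
print(total_tokens)  # 720
```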

In an effort to capture a more realistic singing sample, the participants were instructed to imagine throughout the experiment that they were performing in a large auditorium filled with people. The cue was given to elicit a singing sample that represented how the participant would sound during a performance. The influence of imagery and performance environment on temporal-acoustic measures was reported by Rothman et al.38 Although the nonsingers in that study did not show significantly different duration measures across performance environments, the results indicated that cueing participants to visualize performing in front of a crowd influenced temporal-acoustic measurement and may provide researchers with more realistic values.

Temporal analysis

In accordance with previous VOT-related studies, VOT was measured by visually inspecting the acoustic signal as both an oscillogram trace and a sound spectrogram via the CSL 4300B software.39,40 Using both displays of each phrase, VOT was measured by placing a time marker at the onset of the noise burst of each stop and another marker at the onset of steady-state vocal fold vibration. Steady-state vocal fold vibration was determined using the combined appearance of the first vertical striation in the second formant on the sound spectrogram and the first downward peak of the complex vowel waveform on the oscillogram trace.39,40 The oscillogram and spectrogram were displayed in terms of time, denoted in milliseconds along the horizontal axis. This allowed for direct measurement of the time between the two markers and, thus, VOT. All VOT measures were made by the lead investigator.
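Once both markers are placed, VOT is simply their difference. The sketch below assumes marker times are read off the display in milliseconds, and the function names are hypothetical. Because VOT is taken as the voicing-onset marker minus the burst marker, a prevoiced token, whose voicing onset precedes the burst, comes out negative. Averaging over the three analyzed repetitions is shown as one plausible way to form a per-participant value, not necessarily the study's exact procedure.

```python
from statistics import mean


def vot_from_markers(burst_ms: float, voicing_ms: float) -> float:
    """VOT in ms: voicing-onset marker minus burst-onset marker (negative means prevoicing)."""
    return voicing_ms - burst_ms


def mean_vot(marker_pairs):
    """Mean VOT over (burst_ms, voicing_ms) pairs, e.g. the three analyzed repetitions."""
    return mean(vot_from_markers(b, v) for b, v in marker_pairs)


# Illustrative marker times (ms), not measured data.
print(vot_from_markers(512.0, 543.0))    # 31.0: voicing lags the burst
print(vot_from_markers(1200.0, 1180.0))  # -20.0: voicing precedes the burst
print(mean_vot([(500.0, 530.0), (1500.0, 1532.0), (2500.0, 2528.0)]))  # 30.0
```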

Statistical analysis

Two separate 2 × 2 mixed analyses of variance (ANOVAs), one for /p/ and one for /b/, were used to compare the participants' average VOTs across the between-subject factor of vocal training (ie, trained singers versus nonsingers) and the within-subject factor of phonatory task (ie, singing versus speaking). Separate ANOVAs were used to simplify the statistical design and because it was already well established that the VOTs of voiced and voiceless stops are significantly different.21,22,24 An alpha (α) level of 0.05 was set as the level of significance. Relative power (1 − β) and effect size (η²) were also reported.
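With two equal-sized groups and only two within-subject levels, each F ratio of a 2 × 2 mixed ANOVA can be computed from per-participant summary scores: the vocal training effect from each participant's mean across tasks, and the phonatory task and interaction effects from each participant's speech-minus-singing difference. The sketch below uses that algebraic equivalence in plain Python rather than a statistics package. It is an illustration under stated assumptions (balanced groups, one averaged VOT per participant per task), returns F ratios only, and is not the authors' analysis code. With 15 participants per group it yields the (1, 28) degrees of freedom reported in the Results.

```python
from statistics import mean


def _pooled_variance(groups):
    """Pooled within-group variance and its degrees of freedom."""
    ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return ss / df, df


def mixed_anova_2x2(trained, untrained):
    """F ratios for a balanced 2 (group, between) x 2 (task, within) mixed ANOVA.

    Each argument is a list of (speech_vot, singing_vot) pairs, one per participant.
    """
    if len(trained) != len(untrained) or len(trained) < 2:
        raise ValueError("groups must be equal-sized with at least two participants each")
    n = len(trained)
    groups = (trained, untrained)

    # Between-subject effect: one-way ANOVA on each participant's mean across tasks.
    subject_means = [[(speak + sing) / 2 for speak, sing in grp] for grp in groups]
    grand_mean = mean(m for grp in subject_means for m in grp)
    var_means, df_error = _pooled_variance(subject_means)
    f_group = n * sum((mean(grp) - grand_mean) ** 2 for grp in subject_means) / var_means

    # Within-subject effects: work on speech-minus-singing difference scores.
    diffs = [[speak - sing for speak, sing in grp] for grp in groups]
    mean_diff = mean(d for grp in diffs for d in grp)
    var_diffs, _ = _pooled_variance(diffs)
    f_task = 2 * n * mean_diff ** 2 / var_diffs
    f_interaction = n * sum((mean(grp) - mean_diff) ** 2 for grp in diffs) / var_diffs

    return {"df": (1, df_error), "group": f_group, "task": f_task, "interaction": f_interaction}


# Tiny illustrative data set (ms), not the study's measurements.
result = mixed_anova_2x2([(10, 6), (12, 6)], [(8, 8), (10, 8)])
print(result)
```

P values would then come from the F distribution with the returned degrees of freedom, for example via a statistics library.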

Reliability

One fifth of the 30 participants' productions (20% of the data) were chosen at random and reanalyzed by the same investigator at least 3 weeks postrecording to determine intrarater reliability. Interrater reliability was determined by having a research assistant (blind to the classification of the participants) measure VOT for 20% of the data that the lead investigator had previously measured. Both intrarater and interrater reliability were indexed by the Pearson product moment correlation, and both were high: interrater reliability was r = 0.85, and intrarater reliability was r = 0.95. The mean VOT difference between the original and remeasured data was 1.41 ms for intrarater reliability and 3.82 ms for interrater reliability.
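The reliability indices can be sketched as follows. The Pearson correlation is standard; treating the reported "mean VOT difference" as a mean absolute difference, and drawing the 20% subset by participant with a seeded random generator, are assumptions made for illustration. The measurement values below are invented placeholders.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_difference(x, y):
    """Mean absolute difference (ms) between original and remeasured VOTs (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return mean(abs(a - b) for a, b in zip(x, y))


def pick_reliability_subset(participant_ids, fraction=0.2, seed=None):
    """Randomly choose a fraction of participants whose tokens will be remeasured."""
    ids = list(participant_ids)
    k = round(len(ids) * fraction)
    return random.Random(seed).sample(ids, k)


subset = pick_reliability_subset(range(1, 31), seed=1)  # 6 of 30 participants
original = [30.1, -12.4, 35.0, 2.2]     # illustrative VOTs (ms), not study data
remeasured = [31.0, -10.9, 33.8, 3.0]
print(sorted(subset), pearson_r(original, remeasured), mean_abs_difference(original, remeasured))
```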

RESULTS

Trained singer and nonsinger main effect

As shown in Table 1, the VOTs were similar for the two groups: the trained singers displayed an average VOT for /p/ of 30.9 ms [standard deviation (SD) = 12.3 ms], and the nonsingers' average VOT for /p/ was 32.4 ms (SD = 10.5 ms). Similarly for /b/, the trained singers displayed an average VOT of −10.4 ms (SD = 20.7 ms), whereas the nonsingers' average VOT was −5.8 ms (SD = 17.6 ms). The differences between the two groups were not significant for either /p/ (F(1, 28) = 0.209; P = 0.65; η² = 0.01; 1 − β = 0.07) or /b/ (F(1, 28) = 0.850; P = 0.36; η² = 0.03; 1 − β = 0.14).

Phonatory task effects

In addition to examining trained singer and nonsinger differences, the VOT differences between the speech and the singing tasks were examined using the mixed ANOVAs. Examination of Figures 1 and 2 revealed that for the /p/ productions, both groups of participants used longer VOTs during speech tasks. The average VOT for /p/ during singing was 25.2 ms (SD = 10.4 ms) for the trained singers and 31.5 ms (SD = 11.5 ms) for the nonsingers, whereas the mean VOT for /p/ during speaking was 36.7 ms (SD = 11.5 ms) for the trained singers and 33.3 ms (SD = 9.6 ms) for the nonsingers. For /b/, the average VOT during singing was −20.0 ms (SD = 20.3 ms) for the trained singers and −15.2 ms (SD = 15.9 ms) for the nonsingers, and the average VOT during speech was −0.7 ms (SD = 16.5 ms) for the trained singers and 3.6 ms (SD = 14.2 ms) for the nonsingers. These differences in mean VOT between speech and singing were significant for /p/ (F(1, 28) = 8.86; P = 0.006; η² = 0.24; 1 − β = 0.82) and /b/ (F(1, 28) = 26.56; P < 0.001; η² = 0.49; 1 − β = 0.99).

TABLE 1. Trained Singer and Nonsinger Mean VOT and SD in Milliseconds Across Voiced and Voiceless Bilabial Stops

Vocal Training              Phoneme    Mean VOT (SD)
Trained singers (n = 15)    /p/        30.9 (12.3)
Trained singers (n = 15)    /b/        −10.4 (20.7)
Nonsingers (n = 15)         /p/        32.4 (10.5)
Nonsingers (n = 15)         /b/        −5.8 (17.6)

FIGURE 1. Comparison of the mean VOT for /p/ in milliseconds (ms) as a function of phonatory task and vocal training. [Plot: x-axis, phonatory task (speaking, singing); y-axis, voice onset time (ms), 0 to 50; separate series for nonsingers and singers.]

Examination of Figure 1 reveals that the trained singers displayed shorter /p/ VOTs than the nonsingers during singing (25.2 ms vs 31.5 ms) but longer /p/ VOTs during speaking (36.7 ms vs 33.3 ms). This phonatory task × vocal training interaction for /p/ was significant (F(1, 28) = 4.59; P = 0.040; η² = 0.14; 1 − β = 0.54). However, these between-group differences in VOT would be imperceptible, as all of these values are well within the normal VOT range for /p/.
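Because each participant contributed the same number of tokens to each task, a group's overall mean should equal the average of its two task means, and the interaction amounts to a difference between the two groups' task effects. The short sketch below (dictionary and function names are illustrative) recomputes both from the /p/ cell means reported above.

```python
# /p/ cell means (ms) reported in the Results: (speaking, singing)
P_CELL_MEANS = {
    "trained singers": (36.7, 25.2),
    "nonsingers": (33.3, 31.5),
}


def task_differences(cells):
    """Speech-minus-singing VOT difference (ms) for each group."""
    return {group: speak - sing for group, (speak, sing) in cells.items()}


def marginal_means(cells):
    """Group means across tasks, assuming equal token counts per task."""
    return {group: (speak + sing) / 2 for group, (speak, sing) in cells.items()}


diffs = task_differences(P_CELL_MEANS)
marginal = marginal_means(P_CELL_MEANS)
# Interaction contrast: how much larger the task effect is for one group than the other.
contrast = diffs["trained singers"] - diffs["nonsingers"]
print(diffs, marginal, round(contrast, 1))
```

The marginal means (about 30.95 and 32.4 ms) agree with Table 1 to within rounding, and the trained singers' speech-minus-singing difference (about 11.5 ms) is several times that of the nonsingers (about 1.8 ms), which is the pattern the significant interaction describes.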

There was no significant interaction between phonatory task and vocal training for /b/ (F(1, 28) = 0.007; P = 0.94; η² = 0.01; 1 − β = 0.05; Figure 2). Both the trained singers and the nonsingers used similar voice onset times during speaking (−0.7 ms vs 3.6 ms) and singing (−20.0 ms vs −15.2 ms). However, the trained singers tended to voice throughout the interval preceding their word-initial /b/ productions, which resulted in negative voice onset times.

FIGURE 2. Comparison of the mean VOT for /b/ in milliseconds (ms) as a function of phonatory task and vocal training. [Plot: x-axis, phonatory task (speaking, singing); y-axis, voice onset time (ms), −40 to 30; separate series for nonsingers and singers.]
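The reported effect sizes are consistent with partial eta squared, which can be recovered from an F ratio and its degrees of freedom as η²p = F·df_effect / (F·df_effect + df_error). The paper does not state which η² variant was computed, so this is an assumption; the sketch checks it against the F and η² pairs reported for the vocal training main effects, the phonatory task main effects, and the /p/ interaction, all with df = (1, 28).

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# (F, reported eta^2) pairs from the Results; every test has df = (1, 28).
REPORTED = {
    "training, /p/": (0.209, 0.01),
    "training, /b/": (0.850, 0.03),
    "task, /p/": (8.86, 0.24),
    "task, /b/": (26.56, 0.49),
    "task x training, /p/": (4.59, 0.14),
}

for label, (f, reported) in REPORTED.items():
    recovered = partial_eta_squared(f, 1, 28)
    print(f"{label:22s} F = {f:6.3f}  recovered = {recovered:.2f}  reported = {reported:.2f}")
```

With df = (1, 28) the formula reduces to F / (F + 28); for example, 8.86 / 36.86 ≈ 0.24.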

DISCUSSION

The purpose of this study was to examine the effects of vocal training and phonatory task on the VOTs of bilabial stops. There were no significant main effect differences between average VOTs for the trained singers and nonsingers for /p/ or /b/; however, there were significant differences between mean VOTs for spoken and sung productions. For /p/, sung productions displayed shorter VOTs than the spoken productions, whereas for /b/, the sung productions were produced with longer (more negative) VOTs than the spoken productions. In addition, a significant interaction for /p/ indicated that the difference between speech and singing was greater for trained singers than nonsingers. Discussion of the specific results follows.

Effects of vocal training

No significant differences in VOT were observed for the main effect of vocal training. The overall VOTs of the trained singers and nonsingers were fairly similar. These results are consistent with those of Brown et al,11 who found no significant difference between trained singers and nonsingers for sentence length, word length, and consonant-to-vowel duration ratios from speech samples. Although the current results partially agree with the findings of Brown et al,11 they do not agree with the findings from a recent study by McCrea and Morris,28 who found significantly longer VOTs for trained singers as compared with nonsingers. Methodological and analysis differences between the studies may be responsible for the different results. For example, McCrea and Morris28 included voicing as a within-subject variable in a single 2 × 2 × 2 mixed ANOVA, whereas the current design analyzed /p/ and /b/ in separate ANOVAs and thus removed voicing as a factor. Eliminating voicing as a factor may have reduced its statistical influence on the other factors and provided a more conservative analysis of the data. It is plausible that this more conservative analysis better reflects the observed lack of effect of vocal training on VOT.

One possible explanation for the similar mean VOTs observed for the trained singers and nonsingers in the current study is the nonsingers' amount of innate or natural singing talent. Watts et al41 recently reported that vocally untrained persons may have “natural singing talent,” as demonstrated through pitch-matching accuracy. With regard to the current study, it is possible that some nonsingers possessed natural vocal ability and used articulatory and phonatory movements similar to those used by trained singers. Although the inclusion criteria for the nonsingers in this study were relatively strict and excluded anyone who had received any vocal instruction or practice from high school to the present, some nonsingers may have possessed “natural” singing talent and produced /p/ and /b/ in a manner similar to that of the trained singers. If so, these talented nonsingers' VOTs would have resembled the trained singers' VOTs. To rule out this possibility, future research should examine the articulatory timing in trained singers, talented untrained singers, and untalented untrained singers.

Even though main effect VOT differences between trained singers and nonsingers were not apparent, the search should continue for a reliable acoustic correlate of the previously described perceptual13 and physiologic1–6,11 differences between trained singers and nonsingers. Focusing on higher spectral moments, such as standard deviation, skewness, and kurtosis, rather than on the mean may provide an acoustic link between the physiologic and perceptual distinctions between singers and nonsingers. Finally, future research attempting to acoustically separate trained singers from nonsingers should include some form of perceptual evaluation to correlate possible psychophysical interactions with the acoustic measures.
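One common way to compute the first four spectral moments treats a power spectrum as a probability distribution over frequency: the centroid is its mean, followed by its standard deviation, skewness, and kurtosis. The sketch below is a hedged illustration using NumPy; the Hann window, the use of excess kurtosis (raw kurtosis minus 3), and the function names are assumptions rather than methods from this study, and which signal segment to analyze (for example, the stop burst) is left open.

```python
import numpy as np


def spectral_moments(freqs, power):
    """Centroid, SD, skewness, and excess kurtosis of a spectrum treated as a distribution."""
    f = np.asarray(freqs, dtype=float)
    p = np.asarray(power, dtype=float)
    if f.shape != p.shape or p.sum() <= 0:
        raise ValueError("freqs and power must match in shape, and power must sum to > 0")
    w = p / p.sum()  # normalize power to weights summing to 1
    centroid = np.sum(w * f)
    var = np.sum(w * (f - centroid) ** 2)
    sd = np.sqrt(var)
    skew = np.sum(w * (f - centroid) ** 3) / sd ** 3
    kurt = np.sum(w * (f - centroid) ** 4) / var ** 2 - 3.0  # excess kurtosis
    return float(centroid), float(sd), float(skew), float(kurt)


def segment_moments(segment, fs):
    """Spectral moments of a Hann-windowed signal segment sampled at fs Hz."""
    x = np.asarray(segment, dtype=float) * np.hanning(len(segment))
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectral_moments(freqs, power)


# Toy symmetric spectrum: centroid 2, skewness 0.
print(spectral_moments([1, 2, 3], [1, 2, 1]))
```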

Effects of phonatory task

It was hypothesized that the participants would display longer VOTs during singing than speaking. The current results only partially support this hypothesis. Although speaking and singing tasks were associated with significantly different VOTs, for /p/ it was the VOTs during speaking that were longer than those measured during singing. For /b/, the VOTs were longer (more negative) during singing than speaking. McCrea and Morris28 reported significantly longer mean VOTs across voiced and voiceless stops produced during singing than during speaking by male trained singers and nonsingers. Methodological differences between the two studies may have caused the different results for /p/. The participants in the current study were told to imagine that they were reading and singing the phrases in an auditorium filled with people, despite producing the phrases in a sound-treated booth, whereas the participants in the previous study received no visual imagery instructions. The instructions in the current study represented an effort to make the sung phonatory tasks approximate a singing performance. The current results are consistent with the findings of Rothman et al,38 which indicated that the use of mental imagery results in temporal-acoustic measurement differences.

It is also possible that the participants simply placed more emphasis on the stops during speaking than during singing. Voiceless phonemes in a stressed word-initial position, as was the case in the current study, are associated with longer durations.31–33 Furthermore, the prolongation of a sound is a main cue that identifies it as being stressed. During singing, the participants may have been anxious to sing the vocalic portion of the words and thus produced the voiceless stops with shorter durations. Following this logic, a decrease in syllable stress with a voiceless stop would be associated with a shortened noise burst, which could explain the shorter VOT during singing than speaking. This adjustment would allow the participants to produce the syllable within the duration set by the metronome with a longer vowel.
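The timing argument can be made concrete with simple arithmetic. At the stated rate of roughly three syllables per second, each syllable lasts about 333 ms. The sketch below adopts a deliberately crude, hypothetical decomposition in which a /p/-initial syllable consists only of the VOT interval plus the vowel (closure and coda ignored), so any reduction in VOT becomes time available for the vowel; it illustrates the reasoning rather than modeling the actual productions.

```python
SYLLABLES_PER_SECOND = 3.0  # approximate rate stated in the Methods


def syllable_duration_ms(rate: float = SYLLABLES_PER_SECOND) -> float:
    """Nominal duration of one syllable at a given rate."""
    return 1000.0 / rate


def vowel_budget_ms(vot_ms: float, rate: float = SYLLABLES_PER_SECOND) -> float:
    """Time left for the vowel if, hypothetically, syllable = VOT + vowel."""
    return syllable_duration_ms(rate) - vot_ms


# Trained singers' mean /p/ VOTs from the Results (ms).
speaking_vot, singing_vot = 36.7, 25.2
extra_vowel_ms = vowel_budget_ms(singing_vot) - vowel_budget_ms(speaking_vot)
print(round(syllable_duration_ms(), 1), round(extra_vowel_ms, 1))
```

Under this simplification, the trained singers' shorter sung /p/ VOT frees roughly 11.5 ms per syllable for the vowel.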

Although this explanation may be appropriate for the productions containing /p/, the same is not true for /b/. Lisker and Abramson21 noted that voiced stops can display negative and positive VOTs. Thus, relatively long VOTs for voiced stops could be in the positive or the negative direction. In the current study, the longer VOTs for /b/ were generally in the negative direction, which indicates that the participants produced /b/ with prevoicing. During the singing task, the negative VOTs for /b/ became significantly longer, indicating that the participants continued phonating after producing the initial and medial /a/ in “A bee at a beehive.” This tendency was observed during VOT measurement; Figure 3 illustrates the tendency for the participants to continue phonation until the release of the stop burst during singing. Although prevoicing was also observed during the speaking tasks, the participants produced /b/ with prevoicing more often during singing. This supports the current hypothesis and the previous report28 that during singing a person will prolong a vocalic portion or quickly release the stop burst of a voiced consonant to maintain the melody and pitch of a sung phrase.

FIGURE 3. Oscillogram and spectrogram of “A bee at a beehive” sung by male trained singer #10.

The differences in VOT between the sung and the spoken productions may also reflect a difference in articulatory accuracy. Vennard17 and Titze20 reported that accuracy of articulation often suffers during singing. The relatively long VOTs for /p/ during speaking probably reflected an articulatorily accurate or stressed production. Likewise, the relatively short negative VOTs for /b/ during speaking may also reflect increased sound stress or articulatory accuracy. Unlike /p/, however, a voiced stop produced with increased emphasis or accuracy would show short negative or short positive VOTs. The long negative VOTs for /b/ and the short positive VOTs for /p/ during singing could reflect decreased articulatory accuracy and/or decreased sound emphasis because of the overriding desire to maintain stable melody, tone, and intensity during singing.

A potentially biasing factor may have been the manner in which the speaking and singing stimuli were modeled by the investigator. The researcher's speaking model may have placed greater emphasis on /p/, and the singing model less emphasis on /b/. The researcher may have unconsciously spoken the phrases with extra stress on the word-initial voiced and voiceless bilabial stops or used too little word-initial stress during the sung productions. Future studies should use research assistants blind to the purpose of the study and should be designed to control the manner in which the stimuli are presented to participants.

A significant interaction occurred between phonatory task and vocal training for /p/. In this interaction, the trained singers showed a larger VOT difference between speaking and singing than the nonsingers did. These results were not in agreement with two previously described studies designed to examine speech duration in trained singers and nonsingers.11,13 Brown et al11 reported no significant differences in spoken sentence, phrase, or word duration between trained singers and nonsingers; however, Brown et al11 did not examine VOT. Rothman et al13 examined closure duration for /t/, which according to its description was equivalent to VOT, and reported significant differences between perceptually identified singers and perceptually unidentified singers. However, Rothman et al13 reported that the closure durations of /t/ were significantly shorter for the perceptually identified singers than for the unidentified singers.

Although the longer /p/ VOTs of the trained singers during speaking do not agree with research examining speech duration in the two groups, the results were in general agreement with previous studies designed to examine articulatory and/or phonatory function in trained singers and nonsingers during speech and singing tasks. Brown et al4 and McGlone5,6 reported several articulatory differences between trained singers and nonsingers during singing, including greater jaw and tongue displacement and more stable lingual pressure for trained singers, but no differences between the groups during speech. Previous results have also indicated phonatory differences between trained singers and nonsingers during singing, including lower vertical laryngeal position for trained singers during high-frequency production2 and decreased vocal fold tension during loud, high-frequency phonation for trained singers.1,3

The current results indicated that the trainedsingers and nonsingers differ during sung produc-tions of a phrase containing /p/ or /b/ in word initialposition. This interaction further indicates that thesignificant main effect of phonatory task on VOTis more apparent in trained singers as comparedwith nonsingers. As can be observed in Figure 1,the general trend for /p/ VOT to be longer duringspeaking than singing was greater in the trainedsingers’ VOTs as compared with the nonsingers’VOTs. The observed interaction between phonatorytask and vocal training might be explained by ex-amining some specific articulatory adjustmentslearned during vocal training. For example, trainedsingers learn to shape the vocal tract in a specificconfiguration to produce a perceptually distinctivetone. This distinctive tone is perceptually character-ized as resonant or full sounding and has been asso-ciated with an acoustical phenomenon known as thesinging formant.8–10 Ultimately, the vocal tract ma-nipulations result in an increase of space in the pos-terior oropharyngeal cavity and/or an overalllengthening of the vocal tract. Although these vocal

Journal of Voice, Vol. 21, No. 1, 2007


62 CHRISTOPHER R. MCCREA AND RICHARD J. MORRIS

tract adjustments provide the singer with an increase in acoustic energy, which allows the singer to be heard over an orchestra, they may hinder articulatory accuracy.¹⁰ Vocal pedagogues and voice scientists have acknowledged that trained singers often sacrifice clear articulation to produce a perceptually desirable sound at a uniform intensity.¹⁶,¹⁷,²⁰ The relatively short positive VOTs for /p/ and long negative VOTs for /b/ during singing in the current study may reflect an articulatory consequence of trained singers producing the initial stop quickly so that they have time to open and lengthen the vocal tract and produce a perceptually resonant vowel, either before /b/ production or immediately after /p/ production. Finally, further research is needed to test the proposed relation between vocal tract configuration and articulatory accuracy.
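The sign convention behind these positive and negative values can be made concrete with a short sketch. Following the usual definition of Lisker and Abramson,²¹ VOT is the interval from stop release to the onset of vocal fold vibration, so voicing that lags the release yields a positive value and prevoicing yields a negative one. The function names and landmark times below are hypothetical and chosen only for illustration; they are not measurements from this study, and the sketch assumes the release burst and voicing onset have already been located on a waveform or spectrogram.

```python
def voice_onset_time_ms(release_ms: float, voicing_onset_ms: float) -> float:
    """VOT in ms: onset of voicing minus stop release (burst).

    Positive -> voicing lags the release; negative -> voicing leads it (prevoicing).
    """
    return voicing_onset_ms - release_ms


def voicing_relation(vot_ms: float) -> str:
    """Describe a VOT value by its sign only; no category boundaries are assumed."""
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms > 0:
        return "voicing lag"
    return "simultaneous"


# Hypothetical landmark times in ms as (release, voicing onset); illustration only.
tokens = {"/p/": (120.0, 175.0), "/b/": (140.0, 60.0)}

for label, (release, onset) in tokens.items():
    vot = voice_onset_time_ms(release, onset)
    print(f"{label}: VOT = {vot:+.1f} ms ({voicing_relation(vot)})")
```

With these made-up landmarks, the /p/ token comes out as a +55.0 ms voicing lag and the /b/ token as a -80.0 ms voicing lead, mirroring the direction (not the size) of the singing-task pattern discussed above.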

CONCLUSIONS

These acoustic results indicated that VOT may be an effective measure for examining differences in vocal tract adjustment between speech and singing. The results also provided further support for the notion that all participants used different articulatory and/or phonatory movements during speech than during singing. This finding indicates that, regardless of training, people make significant timing adjustments at the phoneme segment level when they sing, but trained singers seem to make more noticeable timing adjustments than nonsingers.
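The observation that trained singers make larger timing adjustments corresponds to the phonatory task by vocal training interaction. In a 2 × 2 layout, that interaction can be summarized descriptively as a difference of differences: the speech-minus-singing VOT difference for trained singers minus the same difference for nonsingers. The sketch below computes only this descriptive contrast; the function names and the /p/ VOT values are hypothetical, invented for illustration, and are not the study's data. It is not a substitute for the mixed analyses of variance used to test significance.

```python
from statistics import mean


def task_effect(speech_vots, singing_vots):
    """Mean speech VOT minus mean singing VOT (ms) for one group."""
    return mean(speech_vots) - mean(singing_vots)


def interaction_contrast(trained, untrained):
    """2 x 2 interaction contrast: trained task effect minus untrained task effect.

    Each argument is a (speech_vots, singing_vots) pair of sequences in ms.
    A positive result means the speech-minus-singing difference is larger
    for the trained group.
    """
    return task_effect(*trained) - task_effect(*untrained)


# Hypothetical /p/ VOTs in ms as (speech, singing); invented for illustration only.
trained = ([70, 74, 66], [48, 52, 50])
untrained = ([68, 72, 70], [62, 64, 60])

print(task_effect(*trained))    # 20
print(task_effect(*untrained))  # 8
print(interaction_contrast(trained, untrained))  # 12
```

A positive contrast like this one describes the pattern in which speech-singing differences are larger for trained singers; whether such a difference is reliable still depends on the inferential test.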

In conclusion, these results provide a foundation for future researchers interested in relating physiologic vocal tract adjustments during speech and singing to temporal-acoustic measures. Future research using a combination of physiologic, aerodynamic, acoustic, and perceptual measures should be conducted to examine more closely the effects of vocal tract adjustment on the temporal-acoustic signal and the differences between the vocal tract adjustments of trained singers and nonsingers during speech and singing. Future research examining VOT across voice types, such as tenors, baritones, and basses, may also provide insight into the singing mechanism.


REFERENCES

1. Gauffin J, Sundberg J. Spectral correlates of glottal voice source waveform characteristics. J Speech Hear Res. 1989;32:556–565.

2. Shipp T, Izdebski K. Vocal frequency and vertical larynx positioning by singers and nonsingers. J Acoust Soc Am. 1975;58:1104–1106.

3. Sundberg J, Rothenberg M. Some phonatory characteristics of singers and nonsingers. Sp Trans Lab-Quart Progress Stat Report. 1986;4:65–77.

4. Brown WS, Rothman H, Williams W. Physiological differences between singers and non-singers. In: Lawrence V, ed. Transcripts of the Seventh Symposium on Care of the Professional Voice. New York: Voice Foundation; 1975:11–18.

5. McGlone R. Lingual pressure variation during singing by trained and untrained individuals. Presented at the Fifth Symposium on Care of the Professional Voice, New York, June 1976.

6. McGlone R. Supraglottal air pressure variation from trained singers while speaking and singing. In: Lawrence V, ed. Transcripts of the Sixth Symposium on Care of the Professional Voice. New York: The Voice Foundation; 1977:48–49.

7. Sundberg J. Formant structure and articulation of spoken and sung vowels. Folia Phoniat. 1970;22:28–48.

8. Sundberg J. The source spectrum in professional singing. Folia Phoniat. 1973;25:71–90.

9. Sundberg J. Articulatory interpretation of the “singing formant.” J Acoust Soc Am. 1974;55:838–843.

10. Sundberg J. The acoustics of the singing voice. Scientific Am. 1977;3:82–91.

11. Brown WS, Rothman HB, Sapienza CM. Perceptual and acoustic study of professionally trained versus untrained voices. J Voice. 2000;14:301–309.

12. Lindblom BE, Sundberg J. Acoustical consequences of lip, tongue, jaw, and larynx movement. J Acoust Soc Am. 1971;50:1166–1179.

13. Rothman HB, Brown WS, Sapienza CM, Morris RJ. Acoustic analyses of trained singers perceptually identified from speaking samples. J Voice. 2001;15:25–35.

14. Schutte HK, Miller R. Differences in spectral analysis of a trained and an untrained singer. NATS Bull. 1983;Nov/Dec:22–26.

15. Bartholomew WT. A physical definition of “good voice” quality in male voice. J Acoust Soc Am. 1934;6:25–33.

16. Miller R. English, French, German, and Italian Techniques of Singing: A Study in National Tonal Preferences and How They Relate to Functional Efficiency. Metuchen, NJ: Scarecrow Press; 1977.

17. Vennard W. Singing: The Mechanism and the Technique. 4th ed. New York: Carl Fischer; 1967.

18. Kitzing P. LTAS criteria pertinent to the measurement of voice quality. J Phonet. 1986;14:477–482.

19. Wedin S, Leanderson R, Wedin L. Evaluation of voice training. Folia Phoniat. 1978;30:103–112.


63 EFFECTS OF VOCAL TRAINING

20. Titze IR. Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall; 1994.

21. Lisker L, Abramson A. A cross-language study of voicing in initial stops: acoustical measurements. Word. 1964;20:384–422.

22. Lisker L, Abramson A. Some effects of context on voice onset time in English stops. Lang Speech. 1967;10:1–28.

23. Baran JA, Laufer MZ, Daniloff R. Phonological contrastivity in conversation: a comparative study of voice onset time. J Phonet. 1977;5:339–350.

24. Klatt DH. Voice onset time, frication, and aspiration in word-initial consonant clusters. J Speech Hear Res. 1975;18:686–706.

25. Port RF, Rotunno R. Relation between voice-onset time and vowel duration. J Acoust Soc Am. 1979;66:654–662.

26. Weismer G. Sensitivity of VOT measures to certain segmental features in speech production. J Phonet. 1979;7:197–204.

27. Zlatin MA. Voicing contrast: perceptual and productive voice onset time characteristics of adults. J Acoust Soc Am. 1974;56:981–994.

28. McCrea CR, Morris RJ. Comparisons of voice onset time for trained male singers and male nonsingers during speech and singing. J Voice. In press.

29. Kessinger R, Blumstein S. Effects of speaking rate on voice-onset time in Thai, French, and English. J Phonet. 1997;25:143–168.

30. Kessinger R, Blumstein S. Effects of speaking rate on voice-onset time and vowel production: some implications for perception studies. J Phonet. 1998;26:117–128.

31. Crystal T, House A. A note on the variability of timing control. J Speech Hear Res. 1988;31:497–502.

32. Crystal T, House A. Articulation rate and duration of syllables and stress groups in connected speech. J Acoust Soc Am. 1990;49:1842–1848.

33. Miller JL, Grosjean F, Lomanto C. Articulation rate and its variability in spontaneous speech: a reanalysis and some implications. Phonetica. 1984;41:215–225.

34. Ramig L. Effects of physiological aging on selected acoustic characteristics of voice. J Commun Dis. 1983;16:217–226.

35. Snidecor JC. A comparative study of pitch and duration characteristics of impromptu speaking and oral reading. Speech Monographs. 1943;10:50–56.

36. Snidecor JC. The pitch and duration characteristics of superior female speakers during oral reading. J Speech Hear Dis. 1951;16:44–51.

37. Walker V. Durational characteristics of young adults during speaking and reading tasks. Folia Phoniat. 1988;40:12–20.

38. Rothman HB, Brown WS, LaFond JR. Spectral changes due to performance environment in singers, nonsingers, and actors. J Voice. 2002;16:323–332.

39. Brown WS, Morris RJ, Weiss R. Comparative methods for measurement of VOT. J Phonet. 1993;21:329–336.

40. Smith BL, Hillenbrand J, Ingrisano D. A comparison of temporal measures of speech using spectrograms and digital oscillograms. J Speech Hear Res. 1986;29:270–274.

41. Watts CR, Murphy J, Barnes-Burroughs K. Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback. J Voice. 2003;17:185–194.
