Toward Formant Synthesis With Articulatory Controls


Kenneth N. Stevens

Research Laboratory of Electronics and Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge MA 02139

ABSTRACT

This paper describes some recent advances in formant synthesis of speech. Formant synthesis is based on an acoustic representation of speech-sound production in terms of sources of sound and filtering of these sources by the vocal tract. An extension of this source-filter model introduces a set of higher-level control parameters that specify how the subglottal pressure, the laryngeal configuration and state, and the vocal-tract configuration change with time during an utterance. A set of mapping relations derives the lower-level acoustic parameters from the higher-level quasi-articulatory parameters. Synthesis of the segmental, syllabic, and prosodic aspects of an utterance with this more compact set of higher-level parameters is illustrated with examples. Some advantages of this approach to formant synthesis are described.

INTRODUCTION

Formant synthesis (as it is often called nowadays) had its beginning 40 to 50 years ago with the work of Gunnar Fant [1]. This approach to speech synthesis emerged from Fant's development of the source-filter theory of speech production [2]. The simplest form of this source-filter theory was applied most naturally to nonnasal vowels. It showed that the quasi-periodic acoustic source at the glottis is largely independent of the vocal-tract shape for vowels. The transfer function from the acoustic volume-velocity source at the glottis to the volume velocity at the lips was shown to be an all-pole transfer function. Consequently the synthesis of vowels was relatively simple: an appropriate glottal waveform whose frequency could be manipulated formed the excitation for a cascade of simple resonators whose frequencies could be changed by a set of control parameters.

This simple but important beginning had several limitations. First, while it could do a reasonable synthesis of sequences consisting of vowels and glides, it could not do as good a job for nasal vowels and consonants, for which the transfer function is more complex. Second, the synthesis of obstruent consonants required different sources, arising from noise due to turbulent airflow near a constriction in the vocal tract, and the filtering of these sources was far from being an all-pole filter. Third, the quality of the synthesized voice for vowels was not very natural, particularly when attempts were made to synthesize a female voice.

0-7803-7395-2/02/$17.00 ©2002 IEEE

KLSYN AND DECTALK

In the 1970s and 1980s, Dennis Klatt made significant advances in overcoming some of these limitations. His work led to the development of DECtalk, which has been used in a number of applications, particularly in reading machines for the blind.

In order to improve the quality of synthetic speech based on the source-filter approach, Dennis developed ways to produce the frication noise for obstruent consonants using a bank of resonators connected in parallel, each resonator representing a formant [10,11]. Control of the amplitude of excitation of each resonator by a frication noise source was provided, so that the spectrum of the noise could be shaped by specifying the relative amount of excitation of each natural frequency. Dennis also made significant improvements in the synthesis of the glottal source [12]. Several parameters were available to control the waveform of this source, and this flexibility in the source made it possible to synthesize different voices, including female voices. The synthesizer with this inventory of controls is known as KLsyn. With hand generation of the control parameters for this synthesizer, it was possible to copy-synthesize utterances that were indistinguishable from the utterances that were copied, thus demonstrating the versatility of the synthesizer. A similar demonstration, using a different synthesizer but also based on source-filter theory, was made by John Holmes in England [7].

Together with colleagues at Digital Equipment Corporation, Dennis developed a set of rules for controlling KLsyn from text. The complete text-to-speech system was first called Klattalk, and then DECtalk. The intelligibility of this synthesizer was quite high, although the synthetic speech lacked some naturalness. A number of male and female voices were available on the synthesizer. This synthesizer was well regarded as a reading machine for blind people, who particularly liked the feature that allowed the speed of reading to be adjusted.

The Klatt synthesizer (KLsyn) represented a significant advance in speech synthesis technology and in our understanding of human speech production. However, in that synthesizer there were a large number of parameters (called KL parameters) to control (about 40) and the rules for controlling the synthesizer were complex. For example, to generate an aspirated stop consonant, as in the syllable /ta/ in English, an initial burst of noise needs to be generated, about 5-10 milliseconds in duration; this is followed by an interval of aspiration; then the voicing source turns on, but for the first few tens of milliseconds there is an increased open quotient and spectrum tilt for this source, reflecting a spread glottis configuration; and, finally, the glottal source returns to normal voicing. Also, during the initial few tens of milliseconds of voicing, the fundamental frequency must be raised somewhat relative to its normal value [15,8]. The spread glottis during the aspiration and early part of voicing causes a small increase in the first-formant frequency relative to its value for modal voicing. All of these parameters need to be controlled in a synthesizer based on acoustic sources and filters. This combination of acoustic parameters, however, is a consequence of relatively simple articulatory movements --- a spreading of the glottis and a stiffening of the vocal folds, properly timed in relation to the tongue-blade release for the consonant. Or, as another example, the presence of nasalization in KLsyn is marked by the introduction of a pole-zero pair, each with a frequency and a bandwidth, and by an increase in the first-formant bandwidth because of increased acoustic losses in the nasal cavity. Again several parameters are needed to control the acoustic consequences of a simple articulatory gesture --- opening the velopharyngeal port.

At the time the DECtalk synthesizer was developed there were still aspects of the rules that required further work. Among these problems were the synthesis of nasal vowels and consonants, the synthesis of some consonant sequences, improvement of voice quality, and some prosodic aspects including timing, synthesis of pauses, and initiation and termination of phrases.
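As a concrete illustration of the resonators on which this kind of synthesizer is built, the following Python sketch implements the second-order digital resonator commonly used in Klatt-type synthesizers [10] and connects several of them either in cascade (the all-pole vowel model) or in parallel (separately weighted formants, as for frication noise). The function names, the sampling rate, and the formant values in the usage note are assumptions made for illustration. The parallel branch is deliberately simplified and is not a reproduction of the KLsyn configuration.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # sets the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter the sample list x through one resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, fs_hz):
    """Pass x through the resonators in series (all-pole transfer function)."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, fs_hz)
    return x


def parallel(x, formants, gains, fs_hz):
    """Sum resonator outputs, each scaled by its own excitation amplitude."""
    out = [0.0] * len(x)
    for (freq_hz, bw_hz), gain in zip(formants, gains):
        for i, y in enumerate(resonate(x, freq_hz, bw_hz, fs_hz)):
            out[i] += gain * y
    return out
```

For example, `cascade(source, [(500, 60), (1500, 90), (2500, 150)], 10000)` filters a glottal source through three formants, while the `gains` list of `parallel` plays the role of the per-formant excitation amplitudes that shape the frication spectrum.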

HLSYN

The examples given above, and many others, demonstrate that attempts to do synthesis using acoustically based parameters can lead to a complex set of rules involving the control of many parameters. The articulatory manipulations that are involved in human generation of aspects of simple utterances like nasalization, aspirated stop consonants, or consonant sequences are quite simple relative to the detailed patterns of parameters that are required in a strictly acoustically based synthesizer. During the past few years, a control system that is more articulatory-based has been developed for the basic Klatt formant synthesizer [19,6]. The number of articulatory-type parameters (higher-level, or HL parameters) that are controlled for this synthesizer is 13 rather than the 40-odd acoustically based parameters. The KL parameters are derived from the HL parameters through a set of mapping relations that reflect the transformation from articulatory movements to aerodynamic parameters to acoustic patterns. The synthesizer that includes the mapping relations and KLsyn is called HLsyn.

A block diagram of HLsyn is shown in Fig. 1b. The 13 control parameters are shown as inputs at the top. A listing of these parameters is given in Table 1, with a brief description of each parameter. The KL parameters that are derived from the mapping relations are divided into two classes: those that describe the acoustic properties of the sources, and those that specify the transfer functions. The labels on the sketch of the vocal tract in Fig. 1a illustrate the aspects of articulation that are controlled by each HL parameter.

[Figure 1: (a) vocal-tract sketch labeled with HL parameters; (b) block diagram: HL parameters (f0, ag, ap, al, ab, an, dc, ps, ue, f1, f2, f3, f4), then mapping relations (including the circuit model used to calculate pressures and flows), then KL parameters for sources and transfer functions, then speech output.]

Figure 1. (a) Sketch of the vocal tract and larynx showing articulators that are controlled by HL parameters in HLsyn. (b) Block diagrams showing HL parameters, mapping relations, and KL parameters calculated from these relations. (From Hanson and Stevens, [6].)
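To make the size of this control interface concrete, the 13 HL inputs of Fig. 1b can be held in a single record per time sample. The sketch below is only an illustration: the class name `HLFrame`, the helper `blend`, the linear interpolation, and every numerical value are assumptions, and the parameter names are those read from the figure.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class HLFrame:
    """One time sample of the 13 HL control parameters named in Fig. 1b."""
    f0: float  # fundamental frequency (Hz)
    ag: float  # glottal area (mm^2)
    ap: float  # posterior glottal opening (mm^2)
    al: float  # area of lip constriction (mm^2)
    ab: float  # area of tongue-blade constriction
    an: float  # area of velopharyngeal port
    dc: float  # change in compliance (%)
    ps: float  # subglottal pressure (cm H2O)
    ue: float  # as named in Fig. 1b
    f1: float  # natural frequencies of the vocal tract (Hz)
    f2: float
    f3: float
    f4: float


def blend(a, b, w):
    """Linear interpolation between two frames: w=0 gives a, w=1 gives b."""
    return HLFrame(**{
        f.name: (1.0 - w) * getattr(a, f.name) + w * getattr(b, f.name)
        for f in fields(HLFrame)
    })


# Illustrative values only, not taken from HLsyn.
vowel = HLFrame(f0=120.0, ag=4.0, ap=0.0, al=100.0, ab=100.0, an=0.0,
                dc=0.0, ps=8.0, ue=0.0, f1=700.0, f2=1200.0, f3=2500.0, f4=3500.0)
closure = replace(vowel, ab=0.0)        # tongue-blade closure, all else unchanged
halfway = blend(vowel, closure, 0.5)    # a point on the closing movement
```

A complete utterance would then be a time-ordered list of such frames, which the mapping relations convert to KL parameters.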

There are three types of control parameters in HLsyn: (1) a subglottal pressure ps that is the basic respiratory driving force; (2) a set of parameters (ag, dc, f0, ue) that control the glottal configuration and state, and, together with ps, control the glottal airflow; and (3) parameters that describe the shape of the vocal tract (f1-f4, al, ab, an). Note that the formant frequencies f1-f4 specify the natural frequencies of the vocal tract, and, although they are acoustic parameters, they are equivalent to specifying the vocal-tract shape for vowels.

TABLE 1. Description of HLsyn parameters

ps      Subglottal pressure (cm H2O)
al      Cross-sectional area of constriction at the lips (mm^2)
ab      Cross-sectional area of tongue blade
an      Cross-sectional area of velopharyngeal port
f1-f4   First four natural frequencies of vocal tract, assuming no narrow local constrictions (Hz)
f0      Fundamental frequency due to active adjustments of vocal folds (Hz)
ag      Average area of glottal opening between the membranous portion of the vocal folds (mm^2)
ap      Area of the posterior glottal opening (mm^2)
dc      Change in vocal-fold or wall compliances (%)

Central to the mapping relations from the HL parameters to KL parameters is a procedure for calculation of the airflows through the glottis and through the vocal tract, the intraoral pressure, and the influence of the yielding vocal-tract walls on these calculations. From these airflows and pressures the amplitudes of the glottal source and of turbulence noise at a vocal-tract constriction are calculated. It is important to note that the acoustic sources are not directly controlled in HLsyn as they are in KLsyn. Rather, these sources arise as a consequence of adjustments of the subglottal pressure and of the glottal and supraglottal constrictions in the vocal tract.

The mapping relations in HLsyn include parameters that can be set for different male and female speakers. These parameters include such factors as the default value of the average glottal opening during phonation, the vocal-tract length, the f0 range for the speaker, the transglottal pressure at phonation threshold, and others.

We illustrate the operation of HLsyn by describing some of the HL parameters used for the synthesis of the syllable /ta/ that was discussed above. The formation of the tongue-blade constriction is implemented by the area parameter ab, which is set to zero during the /t/ closure, and is then released with a particular trajectory [5,18]. Laryngeal activity is specified by the cross-sectional area ag of the glottis, which is widened, and the vocal-fold compliance dc, which is reduced to produce increased stiffness for the voiceless consonant. These actions both contribute to inhibition of glottal vibration. The tongue-body position during the consonant, and its movement after the consonant release, are described by the first two formant frequencies f1 and f2, which, in effect, specify the tongue-body position.

With proper timing of these parameters, the relatively complex acoustic pattern for /t/ described earlier is generated. These changes in the acoustic pattern happen automatically; they are built into the mapping relations from HL to KL parameters.

This example highlights the observation that the rules for producing individual segments can be organized with respect to the distinctive features for the segments. In this example, the distinctive features of the consonant /t/ are [-sonorant, -continuant, +tongue blade, -voiced], and each of these features refers either to a particular articulator or to some action of an articulator.
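The airflow calculation that is central to the mapping relations can be illustrated with a much reduced, steady-state version of the circuit: the subglottal pressure drives a single flow through the glottal constriction and one supraglottal constriction in series, and each constriction is assumed to drop a kinetic pressure of rho*U^2/(2*A^2). This sketch ignores the yielding walls, viscous losses, and all time dependence, so it is not the HLsyn procedure itself. The function name, the air density value, and the choice of units are assumptions.

```python
import math

RHO_AIR = 0.00114          # g/cm^3, assumed density of air in the vocal tract
DYN_PER_CM_H2O = 980.665   # dyn/cm^2 in 1 cm H2O


def series_flow(ps_cm_h2o, ag_mm2, ac_mm2):
    """Return (flow in cm^3/s, intraoral pressure in cm H2O).

    ag_mm2 is the glottal area and ac_mm2 the area of one supraglottal
    constriction (for example ab or al).
    """
    if ag_mm2 <= 0.0:
        # Closed glottis: no flow. The pressure above it is taken as
        # atmospheric (0), which is a convention if ac is also closed.
        return 0.0, 0.0
    if ac_mm2 <= 0.0:
        # Complete oral closure: no flow, intraoral pressure equals ps.
        return 0.0, ps_cm_h2o
    ag = ag_mm2 / 100.0    # cm^2
    ac = ac_mm2 / 100.0    # cm^2
    ps = ps_cm_h2o * DYN_PER_CM_H2O
    # The two kinetic pressure drops add up to ps:
    #   ps = (rho * U^2 / 2) * (1/ag^2 + 1/ac^2)
    flow = math.sqrt(2.0 * ps / RHO_AIR / (1.0 / ag ** 2 + 1.0 / ac ** 2))
    intraoral = RHO_AIR * flow ** 2 / (2.0 * ac ** 2) / DYN_PER_CM_H2O
    return flow, intraoral
```

With a /t/ closure (`ac_mm2 = 0`) the intraoral pressure rises to ps and the transglottal pressure `ps - intraoral` falls to zero, which is one reason voicing is not sustained during the closure; as the constriction opens, the pressure drop shifts back to the glottis.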

TOWARD RULES FOR SYNTHESIS USING HLSYN

We turn now to a review of the structure of some of the rules for synthesis of utterances when an articulatory-based synthesizer like HLsyn is used. It is assumed that there is an initial planning stage for an utterance in which the input to the synthesizer is a sequence of segments, each of which is a bundle of distinctive features. These segments are organized into syllables, and word boundaries are marked. Phrases are also marked, and syllable prominences are identified. In the following paragraphs we discuss some of the issues involved in formulating rules that calculate HL parameters (1) for simple CVC syllables, (2) for syllables that contain more complex consonant clusters, (3) for consonant sequences across syllable boundaries, and (4) for entire phrases, including pauses, which contain prominent and reduced syllables. Finally, the adjustments that are needed to synthesize different voices are discussed, together with approaches to the synthesis of different speech styles, including the variation of speech rate.

SIMPLE CVC SYLLABLES

Synthesis of a stressed CVC syllable in an utterance-internal position produces two abrupt landmarks in the signal, one at the time of release of the primary articulator for the initial consonant and one at the time of closure for the final consonant. (We do not consider here glides, for which there is no acoustic discontinuity in the signal.) The consonants can be sonorants or obstruents, and, in the case of obstruents, can be fricatives or stops which can be voiced or voiceless.

The synthesis of syllable-initial /t/ discussed above illustrates how the HL parameters are controlled to produce an initial voiceless stop consonant. Changing the voicing of the initial stop consonant (from /t/ to /d/) requires relatively simple adjustments in the HL parameters. The most obvious changes are in ag, which remains more or less in modal position for /d/, and in dc, which increases relative to its default value, i.e., the vocal folds become more slack. These adjustments facilitate glottal vibration during the consonant closure, in contrast to the adjustments for /t/, which inhibit glottal vibration. Similar simple parameter adjustments are needed to change from /t/ to /s/ (i.e., from stop to fricative, or from [-continuant] to [+continuant]), from /t/ to /p/ (i.e., a different place of articulation), or from /d/ to /n/ (i.e., from a stop to a nasal). And synthesis of other sonorant consonants or glides is achieved with proper formant transitions which can be specified by tables.

In terms of the HL parameters, then, the basic rules for synthesis of the initial consonant are organized based on the distinctive features for the consonant --- whether the consonant is [-sonorant] or [+sonorant], whether it is [-continuant] or [+continuant] in the case of consonants that are [-sonorant], whether it is [+voiced] or [-voiced], and the place of articulation. The rules for synthesizing two consonants that differ in a single feature involve, for the most part, changes in one of the HL parameters, although minor adjustments may be needed in some of the other parameters.

Synthesis of syllable-final consonants is similar to syllable-initial consonants, but with some changes in the details. For example, a syllable-final /t/ is produced by forming a closure with the parameter ab. In many cases, glottal vibration at the time of closure is inhibited by narrowing the area of the glottal opening, producing glottalization. For other voiceless consonants, an increase in ag is used to terminate glottal vibration at the time of closure. In the case of syllable-final nasal consonants, the opening of the velopharyngeal port (with the parameter an) is generally initiated well within the vowel.

The time course of the formant transitions throughout the CVC syllable reflects primarily the relatively slow movements of the tongue body, lip rounding, and jaw [14]. Close to the consonant landmarks there may be more rapid movements of some formants, particularly for labial consonants. The rules for generating these formant movements have been well studied, and will not be elaborated on here.
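The feature-based organization of the consonant rules described in this section can be sketched with consonants stored as feature bundles and a helper that lists the features in which two consonants differ. The dictionary layout, the names `CONSONANTS` and `differing_features`, and the reduction of each bundle to four entries are assumptions for illustration; in particular, /n/ is distinguished from /d/ here only by [sonorant], which is a simplification.

```python
# Feature bundles for the consonants contrasted in the text.
# True stands for "+", False for "-"; "place" names the primary articulator.
CONSONANTS = {
    "t": {"sonorant": False, "continuant": False, "voiced": False, "place": "tongue blade"},
    "d": {"sonorant": False, "continuant": False, "voiced": True,  "place": "tongue blade"},
    "s": {"sonorant": False, "continuant": True,  "voiced": False, "place": "tongue blade"},
    "p": {"sonorant": False, "continuant": False, "voiced": False, "place": "lips"},
    "n": {"sonorant": True,  "continuant": False, "voiced": True,  "place": "tongue blade"},
}


def differing_features(a, b):
    """Return the sorted names of the features whose values differ."""
    fa, fb = CONSONANTS[a], CONSONANTS[b]
    return sorted(name for name in fa if fa[name] != fb[name])


print(differing_features("t", "d"))   # the voicing contrast
print(differing_features("t", "s"))   # stop versus fricative
```

A rule system organized this way can key each parameter adjustment to a single feature change, for example the changes in ag and dc that the text associates with the /t/ to /d/ contrast.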

SYLLABLES WITH CONSONANT CLUSTERS

When a syllable begins or ends with a consonant cluster, as in small, for example, the HL parameters are, in some respects, similar to a concatenation of parameters for singleton consonants, but with some modification. The process is illustrated with the initial consonant sequence in small. The articulatory movements that a speaker produces in such a sequence involve two kinds of actions. One is the movement of a primary articulator, that is, an articulator that is producing a constriction or closure in the oral cavity (in this case, the tongue blade for /s/ and the lips for /m/). The other is the action of secondary articulators "behind" the constrictions formed by the primary articulators (in this case, the glottis and vocal folds for /s/ and the velopharyngeal opening for /m/). These two sets of actions must be coordinated to produce a sound pattern that can be interpreted by a listener. In this example of the word small, there is initially a narrowing of the tongue-blade constriction ab that is normally produced by /s/ in initial position. Toward the end of this action, the lip closure for /m/ is produced with al. The parameter al is zero for a few tens of milliseconds, and then increases. The time of opening of the lips is marked by a discontinuity in the sound. At the same time, the glottis area ag (an articulator behind the constriction) is increased for the voiceless /s/, and the velopharyngeal opening an (another such articulator) is increased for the nasal /m/. Termination of frication noise for /s/ occurs at the time an increases. Onset of glottal vibration occurs when ag is narrowed sufficiently to permit phonation. This interplay of four overlapping parameters --- ab and al for primary articulators and ag and an for the secondary articulators --- leads to a sequence of events: onset of frication noise, termination of frication noise, a time interval in which the velopharyngeal port is open and the glottis is also open, onset of voicing for nasal murmur, labial release, and velopharyngeal closure in the vowel. While these four parameters are doing their dance, the formant transitions reflect the slower tongue body movements.

Similar sequences of these two kinds of parameters --- primary articulatory movements and secondary articulators --- are involved in other within-syllable consonant clusters. There are, of course, constraints on the consonant sequences that can occur in prevocalic and in postvocalic position in English. For example, each member of a sequence of obstruent consonants must have the same voicing feature; and sonorant consonants are almost always adjacent to vowels.

CONSONANT SEQUENCES ACROSS SYLLABLE BOUNDARIES

While there are constraints on consonant sequences within a syllable, there are no such limitations on sequences across syllable or word boundaries. Thus we have sequences like his coat, where there is a change in voicing, in [continuant], and in place of articulation. Or, in the sequence dim sum there is a change in the feature [nasal], a shift from a [+sonorant] to a [-sonorant] segment, and a change in place of articulation. In these two examples, and in many others where the second syllable is not reduced, articulatory gestures for the syllable-initial consonant are well preserved, while the gestures for the syllable-final consonant are often weakened. These kinds of articulatory overlap and weakening need to be incorporated in the synthesis rules for HLsyn.

An exception to this tendency is a sequence consisting of a syllable-final stop or nasal consonant followed by a word-initial /ð/, as in seen those or caught these, where the nonstrident /ð/ undergoes significant modification [13]. (It is noted that /ð/ is the most frequently occurring syllable-initial consonant in English.) On the surface, these versions of /ð/ depart significantly from the basic representation of this consonant as [+continuant, -strident, +tongue blade], but the modifications in these preceding nasal or stop contexts can be shown to be based on simple principles of overlap and enhancement [9].

Similar principles can be involved in sequences of alveolar stop (or nasal) consonants followed by consonants with a different place of articulation. An example is in the word batman or teen bag, where the alveolar consonant in running speech may be weakened by failure to make the closure with the tongue blade. The tongue-body movement for the alveolar consonant, as implemented in HLsyn by movements of f2 and f3, is still largely preserved [9]. In these examples involving syllable-initial /ð/ and syllable-final alveolars, then, there are some gestures (and therefore some HL parameters) that remain the same independent of context, and other parameters that may be modified in casual speech. Synthesis rules must be aware of these modifications and invariances.
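The overlap of gestures discussed in this and the preceding section can be made concrete with piecewise-linear parameter tracks and a search for the times at which a track crosses a level. The sketch below applies this to the four parameters described for the initial cluster in small. All breakpoint times, areas, and the two threshold areas are invented for illustration, and the event rules are a simplified reading of the text rather than the HLsyn mapping relations, which derive frication and voicing from computed pressures and flows.

```python
def crossing_time(track, level, rising):
    """First time (ms) at which a piecewise-linear track crosses `level`.

    `track` is a list of (time_ms, value) breakpoints in time order.
    """
    for (t0, v0), (t1, v1) in zip(track, track[1:]):
        if rising and v0 <= level < v1:
            return t0 + (level - v0) / (v1 - v0) * (t1 - t0)
        if not rising and v0 > level >= v1:
            return t0 + (v0 - level) / (v0 - v1) * (t1 - t0)
    return None


# Hypothetical tracks (time in ms, area in mm^2) for the cluster in "small".
TRACKS = {
    "ab": [(0, 110), (50, 10), (120, 10), (160, 110)],         # tongue blade for /s/
    "al": [(0, 100), (100, 100), (130, 0), (180, 0), (210, 100)],  # lips for /m/
    "ag": [(0, 5), (40, 30), (110, 30), (160, 5)],              # glottis spread for /s/
    "an": [(0, 0), (110, 0), (150, 30), (230, 30), (290, 0)],   # velopharyngeal port
}


def cluster_events(tracks, fric_area=30.0, voice_area=10.0):
    """Return (event, time_ms) pairs in time order using simple threshold rules."""
    events = {
        "frication onset": crossing_time(tracks["ab"], fric_area, rising=False),
        "frication offset": crossing_time(tracks["an"], 0.0, rising=True),
        "voicing onset": crossing_time(tracks["ag"], voice_area, rising=False),
        "labial release": crossing_time(tracks["al"], 0.0, rising=True),
        "velopharyngeal closure": crossing_time(tracks["an"], 0.0, rising=False),
    }
    found = [(name, t) for name, t in events.items() if t is not None]
    return sorted(found, key=lambda item: item[1])


for name, t in cluster_events(TRACKS):
    print(f"{t:6.1f} ms  {name}")
```

Shifting the breakpoints of one track relative to the others changes the order or spacing of the events, which is the kind of adjustment that rules for overlap and weakening would make.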

SYNTHESIS OF PHRASES: IMPLEMENTATION OF PROSODY

We discuss here two aspects of prosody (apart from timing) for which control of HL parameters plays an important role. These are the subglottal pressure ps, and the glottal configuration, as specified by the area ag of the glottis. The beginning of a phrase is usually preceded by a pause, and the end of a phrase is usually followed by a pause. Frequently a phrase is produced on one respiratory limb. Internal to such a phrase, a pause may be inserted. Physiological actions for these various events in an utterance have been examined by Slifka [16], and we draw on her work in proposing how HL parameters can be manipulated in the synthesis of these events.

At the beginning of a phrase of this type, the subglottal pressure rises, and at some point during this rise exceeds the phonation threshold, and voicing begins. This phonation threshold is a part of the specification of the individual characteristics of a speaker in the HLsyn mapping relations. If the initial segment is an obstruent consonant, this consonant is initiated at a lower ps and phonation may begin at a ps higher than the threshold. The subglottal pressure usually continues to rise for the next few tens of milliseconds, depending on the location of the first prominence in the phrase. Implementing this type of onset in HLsyn produces a natural beginning to a phrase.

At the end of the phrase, the subglottal pressure decreases, and in HLsyn this influences the fundamental frequency and the amplitude of phonation. Phonation ceases when ps drops below the phonation threshold. It is common for the glottis to begin to spread near the end of the phrase. In HLsyn, this is implemented by increasing ag. An acoustic consequence in the synthesizer is a further decrease in amplitude of the glottal source, and a decrease in the high-frequency amplitude relative to the low-frequency amplitude of the source, leading to a breathy termination [17]. A pause within an expiration is usually implemented by a decrease in subglottal pressure of a few cm H2O, followed by an increase. This method of generating a pause is simulated in HLsyn by manipulating ps.

Within a phrase, different syllables in the planning stage are marked as more prominent or as reduced. Fundamental frequency and timing certainly play a role in implementing prominence. In HLsyn, the parameter ps can be increased to generate a prominent syllable, and ps can be decreased, together with an increase in the glottal opening ag, to produce a reduced syllable.

SYNTHESIZING DIFFERENT SPEAKERS AND SPEAKING STYLES

The synthesis of utterances with HLsyn or with KLsyn involves not only rules for generating the syllables, segments, and phrases, but also can include procedures for changing speaker-dependent characteristics and speaking style (e.g., speaking clearly, casually, or rapidly, or speaking with emotion). This is a topic that is only beginning to be examined, and HLsyn is a tool that can be used to contribute to research in these areas.

For example, one set of differences between speakers is related to their anatomical characteristics, such as vocal-tract length, vocal-fold anatomy, nasal-cavity anatomy, ratio of pharyngeal-to-oral cavity lengths, and resonances of the tracheal cavity. The work of Hanson [3] and of Hanson and Chuang [4] has documented the ranges of individual differences in the characteristics of the glottal source for female and male speakers, and some of these differences are presumably related to laryngeal anatomy. In KLsyn and in HLsyn there are speaker constants that can be adjusted to simulate these properties of the glottal source.

Other speaker characteristics are related to the dialect and to the learned speaking patterns of individual talkers. These differences are especially evident in vowel patterns. In a text-to-speech system involving HLsyn, these individual differences are handled with tables of vowel formant targets and offglides, as well as formant transitions into and out of consonants. Differences in the way speakers produce aspiration, the relative amplitude and spectrum of strident consonants, and the way phrases are initiated and terminated have all been observed, and these differences must be accounted for by parameters or tables in the synthesizer.

Finally, an articulatory-based synthesizer provides a natural way of manipulating clarity of speech and rate of speaking. The evidence from DECtalk is that very high speeds of synthesis are often used by blind individuals who are familiar with the synthesizer. Clarity and speed are presumably adjusted in part by changing the amount of overlap between articulatory gestures. For example, production of a clear version of the word batman would involve creating both a closure and a release for /t/, with accompanying glottal adjustments, whereas in rapid speech, the same gestures are used but with significantly more overlap. Although a beginning has been made toward the study of these and other topics related to speech style, there is still much to be learned.
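One of the anatomical speaker characteristics named in this section, vocal-tract length, has a simple first-order acoustic consequence that can be worked out directly. For a uniform tube closed at the glottis and open at the lips, the natural frequencies are F_n = (2n - 1) c / (4 L). The sketch below evaluates this textbook approximation; it is offered only to show the direction and size of the effect, not as the way HLsyn uses its vocal-tract-length constant, and the function name, the speed of sound, and the example lengths are assumptions.

```python
def neutral_formants(length_cm, count=4, c_cm_per_s=35400.0):
    """Natural frequencies (Hz) of a uniform tube closed at one end.

    F_n = (2n - 1) * c / (4 * L), for n = 1 .. count.
    """
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


# Two hypothetical speakers who differ only in vocal-tract length.
for label, length_cm in (("longer tract", 17.7), ("shorter tract", 14.75)):
    formants = ", ".join(f"{f:.0f}" for f in neutral_formants(length_cm))
    print(f"{label} ({length_cm} cm): {formants} Hz")
```

A shorter tract scales all of the neutral formants upward by the same ratio, which is one reason tables of vowel formant targets need to be speaker-specific.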

ACKNOWLEDGEMENTS

The development of the HLsyn synthesizer has been the work of a number of individuals, whose contributions are acknowledged. These include Corine Bickley, Robert Beaudoin, Ed Bruckert, Eric Carlson, Helen Hanson, and David Williams. This research has been supported in part by grants from the National Institutes of Health to Sensimetrics Corporation (NS27407, MH52358, and DC04331) and to the Massachusetts Institute of Technology (DC00075).

REFERENCES

[1] G. Fant, Modern Instruments and Methods for Acoustic Studies of Speech, Acta Polytechnica Scandinavica (Physics including Nucleonics series) Ph 1, pp. 1-81, 1958.
[2] G. Fant, Acoustic Theory of Speech Production, The Hague: Mouton, 1960.
[3] H. M. Hanson, Glottal Characteristics of Female Speakers: Acoustic Correlates, J. Acoust. Soc. Am. vol. 101, pp. 466-481, 1997.
[4] H. M. Hanson and E. S. Chuang, Glottal Characteristics of Male Speakers: Acoustic Correlates and Comparison with Female Data, J. Acoust. Soc. Am. vol. 106, pp. 1064-1077, 1999.
[5] H. M. Hanson and K. N. Stevens, Modelling Stop-Consonant Releases for Synthesis, J. Acoust. Soc. Am. vol. 107, pp. 2907, 2000.
[6] H. M. Hanson and K. N. Stevens, A Quasi-Articulatory Approach to Controlling Acoustic Source Parameters in a Klatt-Type Formant Synthesizer Using HLsyn, J. Acoust. Soc. Am., in press.
[7] J. N. Holmes, Formant Synthesizers: Cascade or Parallel? Speech Communication vol. 2, pp. 251-273, 1983.
[8] A. S. House and G. Fairbanks, The Influence of Consonant Environment Upon the Secondary Acoustical Characteristics of Vowels, J. Acoust. Soc. Am. vol. 25, pp. 105-113, 1953.
[9] S. J. Keyser and K. N. Stevens, Enhancement Revisited. In M. Kenstowicz (ed.), Ken Hale: A Life in Language, pp. 271-291, Cambridge MA: MIT Press, 2001.
[10] D. H. Klatt, Software for a Cascade/Parallel Formant Synthesizer, J. Acoust. Soc. Am. vol. 67, pp. 971-995, 1980.
[11] D. H. Klatt, Review of text-to-speech conversion for English, J. Acoust. Soc. Am. vol. 82, pp. 737-793, 1987.
[12] D. H. Klatt and L. C. Klatt, Analysis, Synthesis, and Perception of Voice Quality Variations among Female and Male Talkers, J. Acoust. Soc. Am. vol. 87, pp. 820-857, 1990.
[13] S. Y. Manuel, Speakers Nasalize /ð/ After /n/, but Listeners Still Hear /ð/, J. Phonetics vol. 23, pp. 453-476, 1995.
[14] S. Y. Manuel and K. N. Stevens, Formant Transitions: Teasing Apart Consonant and Vowel Contributions, Proc. ICPhS-95, vol. 4, pp. 436-439, 1995.
[15] R. N. Ohde, Fundamental Frequency as an Acoustic Correlate of Stop Consonant Voicing, J. Acoust. Soc. Am. vol. 75, pp. 224-230, 1984.
[16] J. Slifka, Respiratory Constraints on Speech Production at Prosodic Boundaries, Ph.D. thesis, Harvard-MIT Division of Health Sciences and Technology, MIT, 2000.
[17] K. N. Stevens, Prosodic Influences on Glottal Waveform: Preliminary Data, Proc. International Symposium on Prosody, Yokohama, Japan, pp. 53-64, 1994.
[18] K. N. Stevens, Acoustic Phonetics, Cambridge MA: MIT Press, 1998.
[19] K. N. Stevens and C. A. Bickley, Constraints among Parameters Simplify Control of Klatt Formant Synthesizer, J. Phonetics vol. 19, pp. 161-174, 1991.
