Composite Vocal Tract Model

    2.5 MATLAB Exercise - Composite Vocal Tract Model

    Program Directory: matlab_gui\composite_vocal_tract
    Program Name: composite_vocal_tract_GUI25.m
    GUI data file: composite_vocal_tract.mat
    Callbacks file: Callbacks_composite_vocal_tract_GUI25.m
    TADSP: Section 5.2.4, pp. 226-227, Figure 5.47

    This MATLAB exercise illustrates the role of each of the three components of the speech model for voiced and for unvoiced speech sounds, namely the vocal tract response, the glottal pulse (for voiced sounds), and the radiation response at the lips.

    Composite Vocal Tract Model - Theory of Operation

    Given a vowel sound, specified by a set of M = 3 (in this exercise) variable resonances and one fixed resonance at 4500 Hz (both center frequencies and bandwidths of the resonances are given), along with a fixed radiation characteristic and a fixed glottal pulse, this MATLAB exercise computes and plots (on a common graphics panel) the log magnitude frequency responses of the vocal tract (as specified by the set of M variable and one fixed resonances), the radiation characteristic, and the glottal pulse, along with the composite response of all three components. For an unvoiced vowel sound (a whispered vowel), the exercise computes and plots (on a common graphics panel) the log magnitude response of the vocal tract along with the log magnitude radiation characteristic.

    For the radiation characteristics, this exercise uses a simple first order system with response:

    $$R(z) = 1 - \alpha z^{-1} \qquad (1)$$

    For the glottal pulse characteristic, this exercise uses a Rosenberg glottal pulse approximation with a rise duration of 6% of the pitch period and a decay duration of 3% of the pitch period. The exercise also solves for the impulse response of the composite system and plots it on the upper graphics panel.

    The vowel spectrum is computed as a cascade of M = 3 variable and one fixed second-order systems (each one representing one of the M formants of the designated vowel sound) of the form:

    $$H_v(z) = \prod_{k=1}^{M} \frac{1 - 2e^{-2\pi B_k/F_s}\cos(2\pi F_k/F_s) + e^{-4\pi B_k/F_s}}{1 - 2e^{-2\pi B_k/F_s}\cos(2\pi F_k/F_s)\,z^{-1} + e^{-4\pi B_k/F_s}\,z^{-2}} \qquad (2)$$

    where M is 3 for this exercise, F_k, k = 1:M, is the set of vowel formant center frequencies, B_k, k = 1:M, is the set of vowel formant bandwidths, and F_s is the sampling frequency of the system (10000 Hz is nominally chosen as the sampling rate for this exercise).
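As a rough illustration of Equations (1) and (2), the following Python sketch (not the exercise's MATLAB code) evaluates the log magnitude responses on the unit circle. The formant frequencies and bandwidths, the bandwidth of the fixed 4500 Hz resonance, and the value of α are placeholder assumptions chosen only for illustration; the values actually used by the GUI are stored in its data file and are not listed in this text.

```python
import cmath
import math

FS = 10000.0   # sampling rate in Hz (nominal value from the text)
ALPHA = 0.98   # ASSUMED radiation coefficient; the text does not give alpha
# ASSUMED (center frequency, bandwidth) pairs in Hz: three variable formants
# plus the fixed resonance at 4500 Hz. Illustrative values only.
FORMANTS = [(270.0, 60.0), (2290.0, 100.0), (3010.0, 120.0), (4500.0, 200.0)]


def resonator_response(f_hz, center_hz, bw_hz, fs):
    """One second-order section of Eq. (2), evaluated at frequency f_hz."""
    r = math.exp(-2.0 * math.pi * bw_hz / fs)      # e^{-2 pi B_k / F_s}
    c = math.cos(2.0 * math.pi * center_hz / fs)   # cos(2 pi F_k / F_s)
    zinv = cmath.exp(-2j * math.pi * f_hz / fs)    # z^{-1} on the unit circle
    num = 1.0 - 2.0 * r * c + r * r                # numerator: unity gain at 0 Hz
    den = 1.0 - 2.0 * r * c * zinv + r * r * zinv * zinv
    return num / den


def vocal_tract_response(f_hz, formants, fs):
    """Cascade (product) of the second-order sections, as in Eq. (2)."""
    h = 1.0 + 0.0j
    for center_hz, bw_hz in formants:
        h *= resonator_response(f_hz, center_hz, bw_hz, fs)
    return h


def radiation_response(f_hz, alpha, fs):
    """First-order radiation characteristic of Eq. (1)."""
    return 1.0 - alpha * cmath.exp(-2j * math.pi * f_hz / fs)


def log_mag_db(h):
    """Log magnitude in dB of a complex response value."""
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    # Print the two component responses and their sum (in dB) every 500 Hz.
    for f in range(0, int(FS / 2) + 1, 500):
        vt = log_mag_db(vocal_tract_response(float(f), FORMANTS, FS))
        rad = log_mag_db(radiation_response(float(f), ALPHA, FS))
        print(f"{f:5d} Hz  vocal tract {vt:7.2f} dB  radiation {rad:7.2f} dB  sum {vt + rad:7.2f} dB")
```

Because the responses multiply, their log magnitudes add, which is why the composite curve can be read as the sum of the component curves. For voiced sounds the glottal pulse spectrum would be added in the same way.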

    Composite Vocal Tract Model - GUI Design

    The GUI for this exercise consists of two panels, 4 graphics panels, 1 title box and 11 buttons. The functionality of the two panels is:

    1. one panel for the graphics display,

    2. one panel for parameters related to the composite vocal tract model, and for running the program.

    The set of four graphics panels is used to display the following:

    1. the impulse response of the vowel selected for this exercise from a set of 10 standard vowel sounds (top graphics panel),

    2. the log magnitude frequency response of the Rosenberg glottal pulse, the selected vowel sound, and the radiation filter, along with the composite vocal tract log magnitude frequency response for voiced vowel sounds. For whispered vowel sounds, this exercise plots the log magnitude frequency response of the selected vowel sound, along with the log magnitude frequency response of the radiation characteristic, and the composite response for unvoiced vowel sounds (second graphics panel),

    3. a wideband spectrogram display of the composite vowel sound, using either voiced or unvoiced excitation (third graphics panel),

    4. a narrowband spectrogram display of the composite vowel sound, using either voiced or unvoiced excitation (fourth graphics panel).

    The title box displays the information about the selected vowel sound, the sampling rate and the period of the excitation pulses (in msec). The functionality of the 11 buttons is:

    1. a popupmenu button that allows the user to select the vowel sound (from a set of 10 vowel sounds) which is to be synthesized using the glottal pulse stream and the vowel formants,

    2. an editable button that specifies the period of the voiced vowel excitation pulses in msec (the default value is 10 msec),

    3. an editable button that specifies the sampling rate of the simulation in samples per second (the default value is Fs = 10000 samples per second),

    4. a popupmenu button that allows the user to select the vowel sound excitation as either periodic Glottal Pulses (for voiced vowel sounds), or Gaussian noise excitation (for whispered vowel sounds),

    5. an editable button that specifies the opening cycle of the glottal pulse as a percentage of the pitch period of the sound,

    6. an editable button that specifies the closing cycle of the glottal pulse as a percentage of the pitch period of the sound,

    7. a pushbutton to compute and plot the log magnitude response of the three individual vowel sound components (for voiced vowels) and the two individual vowel sound components (for whispered vowels), along with a plot of the composite log magnitude frequency response of the selected vowel sound (either voiced or whispered),

    8. a pushbutton to play, in sequence, the pitch pulse sequence, the excitation function obtained from the convolution of the glottal pulse with the pitch pulse sequence, and the resulting vowel sound obtained by convolving the vowel impulse response with the excitation sequence (for voiced vowel sounds), or the noise excitation function and the composite whispered vowel sound,

    9. an editable button that specifies the spectrogram dynamic range of plotting; the dynamic range parameter, dyn_range, has a default of 60 dB,

    10. a pushbutton to plot wideband and narrowband spectrograms for either the voiced vowel or the whispered vowel on separate graphics panels,

    11. a pushbutton to close the GUI.
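The voiced excitation described in buttons 2, 5, 6 and 8 can be sketched in Python as follows. This is a sketch under stated assumptions, not the GUI's own code: the pulse shape used here is the commonly cited Rosenberg form (a raised-cosine opening phase followed by a quarter-cosine closing phase), and all function and variable names are hypothetical.

```python
import math

FS = 10000         # sampling rate in samples per second (default from the text)
PERIOD_MS = 10.0   # excitation period in msec (default from the text)
OPEN_PCT = 6.0     # opening cycle, percent of the pitch period (default)
CLOSE_PCT = 3.0    # closing cycle, percent of the pitch period (default)


def rosenberg_pulse(n_open, n_close):
    """Rosenberg-style glottal pulse (ASSUMED standard form).

    Rises as 0.5*(1 - cos(pi*n/n_open)) for 0 <= n <= n_open, then falls as
    cos(pi*(n - n_open)/(2*n_close)) until it reaches zero.
    """
    pulse = []
    for n in range(n_open + n_close + 1):
        if n <= n_open:
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        else:
            pulse.append(math.cos(math.pi * (n - n_open) / (2.0 * n_close)))
    return pulse


def impulse_train(period_samples, n_samples):
    """Pitch pulse sequence: a unit impulse every period_samples samples."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(n_samples)]


def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip zero samples; the impulse train is mostly zeros
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Convert the GUI-style parameters (msec and percent) into sample counts.
PERIOD_SAMPLES = round(PERIOD_MS * 1e-3 * FS)
N_OPEN = round(OPEN_PCT / 100.0 * PERIOD_SAMPLES)
N_CLOSE = round(CLOSE_PCT / 100.0 * PERIOD_SAMPLES)

PULSE = rosenberg_pulse(N_OPEN, N_CLOSE)
# Excitation = glottal pulse convolved with the pitch pulse sequence (3 periods).
EXCITATION = convolve(impulse_train(PERIOD_SAMPLES, 3 * PERIOD_SAMPLES), PULSE)

if __name__ == "__main__":
    print("samples per period:", PERIOD_SAMPLES)
    print("glottal pulse:", [round(v, 3) for v in PULSE])
```

With the default settings the pulse occupies only a small fraction of each period, so the excitation is close to an impulse train; widening the opening and closing cycles smooths the excitation.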

    Composite Vocal Tract Model - Scripted Run

    A scripted run of the program composite_vocal_tract_GUI25.m is as follows:

    1. run the program composite_vocal_tract_GUI25.m from the directory matlab_gui\composite_vocal_tract,

    2. using the popupmenu button, select the vowel sound IY for this exercise, where the choice of vowels includes the 10 vowel sounds [IY, IH, EH, AE, AH, AA, AO, UH, UW, ER],

    3. using the editable buttons, choose a period, iperiod, and a sampling rate, fsd, for the exercise; the default values are 10 msec for iperiod and 10000 samples per second for fsd,

    4. using the popupmenu button, select the excitation for the vowel sound, namely either Glottal Pulses (for voiced vowels) or White Noise (for whispered vowels),

    5. using the editable buttons, choose values for the glottal pulse opening and closing cycles for voiced vowels, specified as percentages of the pitch period; the defaults of 6% opening cycle and 3% closing cycle represent very narrow glottal pulses, which produce excellent quality vowel sounds and useful vowel spectrograms,

    6. using the editable button for spectrogram dynamic range, dyn_range, choose a desired value; 60 dB is a fairly general range for most vowel sounds,

    7. hit the Plot Vowel Spectrogram button to plot the components of the voiced vowel spectrum, namely the glottal pulse spectrum, the vocal tract response, the radiation response, and the composite spectrum for voiced vowel sounds; similarly, for whispered vowel sounds, hitting this button plots the components of the whispered vowel spectrum, namely the vocal tract response, the radiation response, and the composite spectrum for whispered vowel sounds,

    8. hit the Play Vowel Sound button to play out a fixed number of samples of the impulse excitation, followed by the selected vowel sound excited by the glottal pulse excitation (for voiced vowel sounds), or the noise excitation and the whispered vowel (for whispered vowel sounds); hitting the Play Vowel Sound button causes a message box to pop up and specify the sequence of sounds to be played either for voiced vowel sounds or for whispered speech,

    9. experiment with different vowel sounds, pitch periods, sampling rates, and excitations to see and hear the impact on the resulting vowel sound,

    10. hit the Close GUI button to terminate the run.

    An example of the graphical output obtained from this exercise using the vowel sound /IY/ with an excitation pulse period of 10 msec and a sampling rate of fsd = 10000 samples per second is shown in Figure 1 for Glottal Pulses excitation and in Figure 2 for White Noise excitation. The upper graphics panel shows the vowel impulse response, and the second graphics panel shows the log magnitude frequency response of each of the three components of the vowel response (for voiced vowel sounds), or of each of the two components of the vowel response (for whispered vowel sounds), along with the composite response of the vocal tract. The third graphics panel shows a wideband spectrogram of the vowel sound, and the fourth graphics panel shows a narrowband spectrogram of the vowel sound.
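The impulse response shown in the upper panel can be approximated by running a unit impulse through the difference equations implied by Equations (1) and (2). The Python sketch below does this for the vocal tract and radiation components only (the whispered case); for voiced sounds the result would additionally be convolved with the glottal pulse. It is not the exercise's MATLAB code, and the formant values, the bandwidth of the 4500 Hz resonance, α, and the number of samples are illustrative assumptions.

```python
import math

FS = 10000.0   # sampling rate in Hz (nominal value from the text)
ALPHA = 0.98   # ASSUMED radiation coefficient; not given in the text
N = 1000       # ASSUMED number of output samples to compute
# ASSUMED (center frequency, bandwidth) pairs in Hz, for illustration only.
FORMANTS = [(270.0, 60.0), (2290.0, 100.0), (3010.0, 120.0), (4500.0, 200.0)]


def resonator_filter(x, center_hz, bw_hz, fs):
    """Filter x through one second-order section of Eq. (2).

    Difference equation: y[n] = G*x[n] + a1*y[n-1] + a2*y[n-2], where
    a1 = 2*r*cos(2*pi*F/fs), a2 = -r^2, r = exp(-2*pi*B/fs), and
    G = 1 - a1 - a2 is the numerator of Eq. (2).
    """
    r = math.exp(-2.0 * math.pi * bw_hz / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / fs)
    a2 = -r * r
    gain = 1.0 - a1 - a2
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = gain * xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


def radiation_filter(x, alpha):
    """Filter x through Eq. (1): y[n] = x[n] - alpha*x[n-1]."""
    y, prev = [], 0.0
    for xn in x:
        y.append(xn - alpha * prev)
        prev = xn
    return y


def vocal_tract_impulse_response(formants, fs, n_samples):
    """Impulse response of the cascade of second-order sections."""
    h = [1.0] + [0.0] * (n_samples - 1)   # unit impulse
    for center_hz, bw_hz in formants:
        h = resonator_filter(h, center_hz, bw_hz, fs)
    return h


H_VT = vocal_tract_impulse_response(FORMANTS, FS, N)
H_COMPOSITE = radiation_filter(H_VT, ALPHA)   # vocal tract followed by radiation

if __name__ == "__main__":
    # Print the first few samples together with their time in msec.
    for n in range(10):
        print(f"t = {1000.0 * n / FS:4.1f} msec  h = {H_COMPOSITE[n]: .5f}")
```

A quick sanity check on such a sketch is that the samples of the vocal tract impulse response sum to the response at 0 Hz, which Equation (2) normalizes to 1.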

    Composite Vocal Tract Model - Issues for Experimentation

    1. run the scripted exercise above, and answer the following:

    how long (in msec) is the impulse response of the synthetic vowel used for this exercise (IY in this case)?

    what is the pitch frequency, in Hz, for this exercise?

    how much improvement in sound quality is obtained by using the Rosenberg glottal pulse, the radiation response curve at the lips, and the vocal tract response, as obtained from a 3-formant representation of each of the 10 vowel sounds?

    2. change the vowel sound to any other vowel in the list and repeat the exercise with the new vowel sound

    is there a uniformity of the vowel sound quality across the different vowel sounds?

    does any of the vowels sound much better than the others?

    does any of the vowels sound much worse than the others?

    Figure 1: Voiced vowel sound impulse response (upper graphics panel) and log magnitude frequency responses of vocal tract response components (second graphics panel). Plots of wideband and narrowband spectrograms, for the voiced vowel sound, are shown in the third and fourth graphics panels.

    3. change either the pitch pulse period or the sampling rate

    what effect does changing the pitch period have on the resulting vowel quality?

    what effect does changing the sampling rate have on the resulting vowel quality?

    4. using the IY vowel sequence, answer the following questions:

    the radiation frequency response and the vocal tract frequency response for the IY vowel emphasize high frequencies. Why is the overall composite frequency response lower at high frequencies than at low frequencies?

    what feature(s) of the composite frequency response account for the rapidly varying components of the impulse response?

    note that there are two small bumps in the frequency response at very low frequencies. Which component causes each of these bumps?

    5. using the IH vowel sequence, answer the following questions:

    compare the components of the vowels IH and IY. How have the formant frequencies changed from IY to IH?

    why does the impulse response of IH have more pronounced rapidly varying components?

    note that the frequency response of IH has two distinct peaks at low frequencies. The lowest frequency peak is called the voice bar for the way it would show up in a spectrogram. What part of the model causes the voice bar peak in the composite frequency response?

    what requirement on the first formant frequency is necessary for there to be a clearly distinguished voice bar peak?

    Figure 2: Whispered vowel sound impulse response (upper graphics panel) and log magnitude frequency responses of vocal tract response components (second graphics panel). Plots of wideband and narrowband spectrograms, for the whispered vowel sound, are shown in the third and fourth graphics panels.