Voice Quality + Spectral Analysis, February 15, 2011

Today

• Today:

• Wrap up voice quality discussion

• Begin examination of spectral analysis

1. On the Tuesday after the break: back in the computer lab (SS 020).

• Analysis of Korean stops.

2. Remember:

• mid-term on Thursday

• Review sheet to be passed out today, once we wrap up voice quality…

3. Also note: the last ToBI homework

1. Modal Voice Settings

• At the low end of a speaker’s F0 range:

1. Adductive tension force is moderate

2. Medial compression force is moderate

3. Vocal folds are short and thick.

• = longitudinal tension is low

4. Moderate airflow

• F0 is increased by:

1. Increasing the longitudinal tension

• Achieved through activity of the cricothyroid muscle

2. Increasing airflow

For the Record

• Contraction of the cricothyroid muscle pulls down the thyroid cartilage.

• Interestingly: researchers often study the activity of this muscle using EMG.

A Little More Hardcore

• Increasing Medial Compression of the vocal folds can create tense voice.

• Remember the Mpi contrasts:

• Also check out the Steve Sklar video

• Increasing Medial Compression even further can induce ventricular voice

• …in which the ventricular folds vibrate along with the (true) vocal folds.

• (go back to the video + endoscopy evidence)

• Finally, amping up the intensity of all the laryngeal forces results in harsh voice.

• Compare with: “death metal voice”

2. Creaky Voice

• A voice quality that is somewhat similar to ventricular voice is creaky voice.

• Also known as “glottal fry”

• Laryngeal settings for creaky voice:

1. Ventricular folds often compressed down on true vocal folds.

2. High medial compression

3. Very little longitudinal tension

4. Low airflow

• Air bubbles up sporadically through the folds, near the thyroid arch.

Creaky EGG

• Note: vocal folds are very short during creaky voicing.

• Look at the creaky video.

Creaky Quirks

• Note: creaky voice often emerges at the low end of a speaker’s range.

• In a language like English, at the ends of utterances

• In a tone language, for very low tones.

• Note: creaky voice also often has a “double pulse” effect.

Modal to Creaky

Jitter

• Creaky voice often exhibits a lot of jitter and shimmer.

• Jitter =

• Variation in timing of glottal pulses

• Defined as a percentage:

• period deviation/period duration.
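As a rough illustration, the percentage above can be computed from a list of glottal period durations. The sketch below assumes the "local jitter" formulation (mean absolute difference between consecutive periods, divided by the mean period). The function name and the sample period values are hypothetical, and other jitter definitions exist.

```python
def jitter_percent(periods):
    """Local jitter as a percentage: mean absolute difference between
    consecutive periods, divided by the mean period.
    (Assumed formulation; not the only way to define jitter.)"""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_deviation = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_deviation / mean_period

# Hypothetical period durations in seconds (roughly 100 Hz voicing).
steady = [0.0100, 0.0100, 0.0100, 0.0100]
creaky = [0.0100, 0.0120, 0.0090, 0.0130]

print(jitter_percent(steady))  # perfectly regular pulses give 0.0
print(jitter_percent(creaky))  # irregular pulses give a much larger value
```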

Shimmer

• Shimmer =

• Variation in amplitude of glottal pulses

• Note: synthetic speech has to include jitter and shimmer

• …otherwise the voice won’t sound natural.

• Check out the “voice report” measures in Praat.
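To make the point about synthetic speech concrete, here is a minimal sketch of a glottal pulse train with and without timing and amplitude perturbation. The uniform random perturbation and the default values (1% jitter, 5% shimmer) are assumptions chosen for illustration; real synthesizers may model these quite differently.

```python
import random

def perturbed_pulses(f0, n_pulses, jitter=0.01, shimmer=0.05, seed=0):
    """Return (time, amplitude) pairs for a glottal pulse train.

    jitter  : maximum fractional deviation of each period (assumed value)
    shimmer : maximum fractional deviation of each pulse amplitude (assumed value)
    """
    rng = random.Random(seed)
    period = 1.0 / f0
    t = 0.0
    pulses = []
    for _ in range(n_pulses):
        amplitude = 1.0 + rng.uniform(-shimmer, shimmer)
        pulses.append((t, amplitude))
        # Stretch or shrink the next period by a small random fraction.
        t += period * (1.0 + rng.uniform(-jitter, jitter))
    return pulses

robotic = perturbed_pulses(100, 5, jitter=0.0, shimmer=0.0)  # perfectly regular
natural = perturbed_pulses(100, 5)                           # slightly irregular
```

With both parameters at zero, every pulse has the same amplitude and spacing, which is the kind of signal that tends to sound mechanical.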

Harsh Voice

• A “raucous voice quality” (Holmes, 1932)

• Acoustically: the glottal pulses are aperiodic (F0 is irregular)

• = lots of jitter (variability in time)

• Articulatorily: harsh voice does not add anything new to the voice quality parameters;

• it just increases the intensity of those already in operation.

• Harsh voice involves “excessive approximation of the vocal folds”

• = high medial compression and high adductive tension

Harsh, continued

• “Harshness results from overtensions in the throat and neck; it is often if not usually accompanied by hypertensions of the whole body.” (Gray and Wise, 1959)

• Harsh F0 is usually > 100 Hz

• Creaky F0 is usually < 100 Hz

3. Breathy Voice

• In breathy voice, the vocal folds remain open…

• and “wave” in the airflow coming up from the lungs.

• Laryngeal settings for breathy voice:

1. Low medial compression

2. Minimal adductive tension

3. Variable longitudinal tension (for F0 control)

4. Higher airflow

• Check out the breathy video.

Breathy Voice EGG

• Also note: closure phases in breathy voice are more symmetrical than in modal voice.

Some Real-Life Examples

[Figure: examples labeled “breathy” and “modal”]

Contrasts

• Gujarati contrasts breathy voiced vowels with modal voiced vowels:

• Hausa contrasts modal [j] with creaky/tense [j]:

• Hausa is spoken in West Africa (primarily in Nigeria)

• Creaky consonants are also said to be laryngealized.

All Three

• Jalapa Mazatec has a three-way contrast between modal, breathy and creaky voiced vowels:

• Jalapa Mazatec is spoken in southern Mexico, around Oaxaca and Veracruz.

Voiced Aspirated

• Some languages distinguish between (breathy) voiced aspirated and voiceless aspirated stops and affricates.

• Check out Hindi:

One Random Thing

• Breathy voiced segments can “depress” the tone on a following segment.

• Examples from Tsonga:

• Tsonga is spoken in South Africa and Mozambique.

• Voiced stops also “depress” tones more than voiceless stops.

• depressor consonants

• Nobody really knows why.

Open Quotient

• From EGG measures, we can calculate the “open quotient” for any particular voicing cycle:

• open quotient = time glottis is open / period of voicing cycle

• EGG measures show that there are reliable differences in open quotient values between the three primary voicing types.

• Breathy voicing has a high open quotient

• Creaky voicing has a low open quotient

• Modal voicing is in between
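The open quotient is simple arithmetic once the open time and the cycle period have been measured. A minimal sketch follows; the function name and the example durations are hypothetical and are not taken from any EGG recording.

```python
def open_quotient(open_time, period):
    """Fraction of one voicing cycle during which the glottis is open."""
    if period <= 0 or not 0 <= open_time <= period:
        raise ValueError("need period > 0 and 0 <= open_time <= period")
    return open_time / period

# Hypothetical cycle: F0 = 100 Hz, so one period lasts 10 ms.
period = 0.010
half_open = open_quotient(0.005, period)    # open for half of the cycle
mostly_open = open_quotient(0.007, period)  # open for most of the cycle

print(half_open, mostly_open)
```

Both durations must be in the same unit (seconds here); the result is a unitless ratio between 0 and 1.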

Open Quotient Traces

[Figure: EGG trace, with one period and its open phase marked]

• The open quotient in modal voicing is generally around 0.5

Tense Voice

• Tense voice (from throat singing demo) has a lower open quotient.

• Result of medial compression.

• Actual value: about 0.3

[Figure: EGG trace for tense voice, with one period and its open phase marked]

OQ Traces, continued

• OQ for creaky voice is also supposed to be low…

• but it’s actually quite sporadic.

• Breathy voice OQ is quite high

• (0.65 or greater)

4. Whispery Voice

• When we whisper:

• The cartilaginous glottis remains open, but the ligamental glottis is closed.

• Air flows through the opening with a “hiss”

• The laryngeal settings:

1. Little or no adductive tension

2. Moderate to high medial compression

3. Moderate airflow

4. Longitudinal tension is irrelevant…

Nodules

• One of the more common voice disorders is the development of nodules on either or both of the vocal folds.

• nodule = callus-like bump

• What effect might this have on voice quality?

Last but not least

• What’s going on here?

• At some point, my voice changes from modal to falsetto.

5. Falsetto

• The laryngeal specifications for falsetto:

1. High longitudinal tension

2. High adductive tension

3. High medial compression

• Contraction of thyroarytenoids

4. Lower airflow than in modal voicing

• The results:

• Very high F0.

• Very thin area of contact between vocal folds.

• Air often escapes through the vocal folds.

Falsetto EGG

• The falsetto voice waveform is considerably more sinusoidal than modal voice.

Some Real EGGs

Modal voice (F0 = 140 Hz)

Falsetto voice (F0 = 372 Hz)

Voice Quality Summary

(AT = adductive tension, LT = longitudinal tension, MC = medial compression)

          AT        LT      MC        Flow
Modal     moderate  varies  moderate  med.
Tense     high      varies  high      high
Creaky    high      low     high      low
Whisper   low       N/A     high      med.
Breathy   low       varies  low       high
Falsetto  high      high    high      low

• Last but not least, Korean makes an interesting distinction between “emphatic” (or fortis) obstruents and unaspirated and aspirated (lenis) obstruents.

What’s going on here?

• A variety of things occur during the articulation of fortis consonants in Korean.

1. Glottis is not open as wide (during closure) as in lenis stops.

• Voicing begins more quickly after stop release

2. Increased airflow in fortis stops.

• Higher F0 after stop release.

3. Vocal folds are “more tense” than in lenis stops.

• = greater medial compression

• = “squarer” glottal waveform

Back to the Source…

• Modal voicing (by me):

• Note: completely closed and completely open phases are both actually quite short.

• Also: closure slope is greater than opening slope.

• Q: Why might there be differences in slope?

A Different Kind of Voicing

• The basic voice quality in khoomei is called xorekteer.

• Notice any differences in the EGG waveforms?

• This voice quality requires greater medial compression of the vocal folds.

• ...and also greater airflow

Why Should You Care?

• Remember that the most basic kind of sound wave is a sinewave.

[Figure: sinewave, with time on the x-axis and pressure on the y-axis]

• Sinewaves can be defined by three basic properties:

• Frequency, (peak) amplitude, phase
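These three properties are enough to compute the value of a sinewave at any moment. The sketch below assumes phase is expressed in radians; the sampling rate and the specific parameter values are hypothetical.

```python
import math

def sinewave(frequency, amplitude, phase, t):
    """Pressure value of a sinewave at time t (in seconds).

    frequency : cycles per second (Hz)
    amplitude : peak amplitude
    phase     : starting point in the cycle, in radians
    """
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)

# One full period of a 100 Hz sinewave with peak amplitude 1.0 and zero phase,
# sampled at a hypothetical rate of 8000 samples per second.
sample_rate = 8000
samples = [sinewave(100, 1.0, 0.0, n / sample_rate) for n in range(80)]

# Shifting the phase by a quarter cycle starts the wave at its peak instead of at zero.
shifted_start = sinewave(100, 1.0, math.pi / 2, 0.0)
```

Changing the frequency squeezes or stretches the wave in time, changing the amplitude scales it vertically, and changing the phase slides it left or right.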

Complex Waves

• It is possible to combine more than one sinewave together into a complex wave.

• At any given time, each wave will have some amplitude value.

• A1(t1) := Amplitude value of sinewave 1 at time 1

• A2(t1) := Amplitude value of sinewave 2 at time 1

• The amplitude value of the complex wave is the sum of these values.

• Ac(t1) = A1(t1) + A2(t1)

• Note: a harmonic is simply a component sinewave of a complex wave.
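The point-by-point summation can be sketched in a few lines. The component frequencies and amplitudes below are hypothetical (a strong low-frequency wave plus a weak high-frequency one), and phase is left at zero to keep the example short.

```python
import math

def sinewave(frequency, amplitude, t):
    """Value of a zero-phase sinewave at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency * t)

def complex_wave(components, t):
    """Ac(t): the sum of every component's value at time t.
    components is a list of (frequency, amplitude) pairs."""
    return sum(sinewave(f, a, t) for f, a in components)

# Hypothetical components: high amplitude at 100 Hz, low amplitude at 1000 Hz.
components = [(100, 1.0), (1000, 0.2)]

t1 = 0.0012                       # an arbitrary moment in time
a1 = sinewave(100, 1.0, t1)       # A1(t1)
a2 = sinewave(1000, 0.2, t1)      # A2(t1)
ac = complex_wave(components, t1) # Ac(t1) = A1(t1) + A2(t1)
```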

Complex Wave Example

• Take waveform 1:

• high amplitude

• low frequency

• Add waveform 2:

• low amplitude

• high frequency

• The sum is this complex waveform:

[Figure: waveform 1 + waveform 2 = complex waveform]

Another Perspective

• Sinewaves can also be represented by their power spectra.

• Frequency on the x-axis

• Intensity on the y-axis (related to peak amplitude)
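To see why a single sinewave shows up as a single peak, a power spectrum can be computed with a plain discrete Fourier transform. This is a minimal sketch for short signals only; the sampling rate, signal length, and 50 Hz test tone are hypothetical, and a real analysis (for example in Praat) would use a fast Fourier transform and typically report intensity in dB.

```python
import cmath
import math

def power_spectrum(samples, sample_rate):
    """Return (frequency in Hz, power) pairs computed with a direct DFT.
    Slow but dependency-free; suitable only for short signals."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(total) ** 2 / n ** 2))
    return spectrum

sample_rate = 1000   # hypothetical sampling rate (Hz)
n = 100              # 0.1 s of signal, so frequency bins are 10 Hz apart
wave = [math.sin(2 * math.pi * 50 * i / sample_rate) for i in range(n)]

spectrum = power_spectrum(wave, sample_rate)
peak_frequency, peak_power = max(spectrum, key=lambda pair: pair[1])
print(peak_frequency)  # the bin holding nearly all of the power
```

All of the power lands in the bin at the sinewave's own frequency; every other bin is essentially zero.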

[Figure panels: Waveform, Power Spectrum]

Putting the two together

[Figure panels: Waveform, Power Spectrum; the two sinewaves (+) and their sum (=), with the spectral peaks labeled “harmonics”]

More Combinations

• What happens if we keep adding more and more high frequency components to the sum?

[Figure: two examples of adding a further component (+) and the resulting sum (=)]
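One concrete case, chosen here as an assumption rather than read off the slide figures: if the added components are the odd harmonics of a fundamental, each with amplitude 1/k, the sum gets flatter and flatter on top and approaches a square wave. The sketch below measures how much the summed wave still "ripples" across the middle of its positive half-cycle.

```python
import math

def odd_harmonic_sum(n_harmonics, f0, t):
    """Sum of the first n odd harmonics of f0, each with amplitude 1/k.
    This particular series converges toward a square wave."""
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1                      # harmonic numbers 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * f0 * t) / k
    return total

f0 = 100  # hypothetical fundamental (Hz); the positive half-cycle spans 0 to 5 ms
# Sample points across the middle of that half-cycle (1.5 ms to 3.5 ms).
times = [0.0015 + 0.0002 * j for j in range(11)]

def ripple(n_harmonics):
    """Difference between the largest and smallest sampled values."""
    values = [odd_harmonic_sum(n_harmonics, f0, t) for t in times]
    return max(values) - min(values)

print(ripple(1))   # a lone sinewave is clearly curved over this stretch
print(ripple(50))  # with 50 components the top is nearly flat
```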

A Spectral Comparison

[Figure panels: Waveform, Power Spectrum]

What’s the Point?

• Remember our EGG waveforms for the different kinds of voice qualities:

• The glottal waveform for tense voice resembles a square wave.

• lots of high frequency components (harmonics)

What’s the point, part 2

• A modal voicing EGG looks like:

• It is less square and therefore has fewer high-frequency components.

• Although it is far from sinusoidal...

What’s the point, part 3

• Breathy and falsetto voice are more sinusoidal...

• And therefore the high frequency harmonics have less power, compared to the fundamental frequency.

Let’s Check ’em out

• Head over to Praat and check out the power spectra of:

• a sinewave

• a square wave

• a sawtooth wave

• tense voice

• modal voice

• creaky voice

• breathy voice

Spectral Tilt

• Spectral tilt = drop-off in intensity of higher harmonics, compared to the intensity of the fundamental.
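One way to put a number on this drop-off, assumed here for illustration since the slides give no formula, is the level of a higher harmonic relative to the fundamental in decibels. The amplitude patterns below are idealized (a sawtooth-like 1/k fall-off and a steeper 1/k² fall-off), not measurements of real voices.

```python
import math

def tilt_db(harmonic_amplitudes, n):
    """Level of harmonic n relative to the fundamental, in dB.
    harmonic_amplitudes[0] is the fundamental (harmonic 1).
    More negative values mean a steeper spectral tilt."""
    fundamental = harmonic_amplitudes[0]
    harmonic = harmonic_amplitudes[n - 1]
    return 20 * math.log10(harmonic / fundamental)

# Idealized harmonic amplitude patterns (hypothetical, for illustration only).
shallow = [1 / k for k in range(1, 11)]       # amplitudes fall off as 1/k
steep = [1 / k ** 2 for k in range(1, 11)]    # amplitudes fall off as 1/k^2

print(round(tilt_db(shallow, 10), 1))  # -20.0
print(round(tilt_db(steep, 10), 1))    # -40.0
```

A squarer, tenser waveform would pattern with the shallower tilt, and a more sinusoidal one (breathy or falsetto) with the steeper tilt.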
