Voice Quality + Spectral Analysis, February 15, 2011

Page 1

Voice Quality + Spectral Analysis

February 15, 2011

Page 2

Today

• Today:

• Wrap up voice quality discussion

• Begin examination of spectral analysis

1. On the Tuesday after the break: back in the computer lab (SS 020).

• Analysis of Korean stops.

2. Remember:

• mid-term on Thursday

• Review sheet to be passed out today, once we wrap up voice quality…

3. Also note: the last ToBI homework

Page 3

1. Modal Voice Settings

• At the low end of a speaker’s F0 range:

1. Adductive tension force is moderate

2. Medial compression force is moderate

3. Vocal folds are short and thick.

• = longitudinal tension is low

4. Moderate airflow

• F0 is increased by:

1. Increasing the longitudinal tension

• = activity of the cricothyroid muscle

2. Increasing airflow

Page 4

For the Record

• Contraction of the cricothyroid muscle pulls down the thyroid cartilage.

• Interestingly: researchers often study the activity of this muscle using EMG.

Page 5

A Little More Hardcore

• Increasing Medial Compression of the vocal folds can create tense voice.

• Remember the Mpi contrasts:

• Also check out the Steve Sklar video

• Increasing Medial Compression even further can induce ventricular voice

• …in which the ventricular folds vibrate along with the (true) vocal folds.

• (go back to the video + endoscopy evidence)

• Finally, amping up the intensity of all the laryngeal forces results in harsh voice.

• Compare with: “death metal voice”

Page 6

2. Creaky Voice

• A voice quality that is somewhat similar to ventricular voice is creaky voice.

• Also known as “glottal fry”

• Laryngeal settings for creaky voice:

1. Ventricular folds often compressed down on true vocal folds.

2. High medial compression

3. Very little longitudinal tension

4. Low airflow

• Air bubbles up sporadically through the folds, near the thyroid arch.

Page 7

Creaky EGG

• Note: vocal folds are very short during creaky voicing.

• Look at the creaky video.

Page 8

Creaky Quirks

• Note: creaky voice often emerges at the low end of a speaker’s range.

• In a language like English, at the ends of utterances

• In a tone language, for very low tones.

• Note: creaky voice also often has a “double pulse” effect.

Page 9

Modal to Creaky

[ ]

Page 10

Jitter

• Creaky voice often exhibits a lot of jitter and shimmer.

• Jitter =

• Variation in timing of glottal pulses

• Defined as a percentage:

• period deviation/period duration.

Page 11

Shimmer

• Shimmer =

• Variation in amplitude of glottal pulses

• Note: synthetic speech has to include jitter and shimmer

• …otherwise the voice won’t sound natural.

• Check out the “voice report” measures in Praat.
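The jitter and shimmer definitions above can be sketched in a few lines of code. This is only an illustration, assuming the "local" variant of each measure (mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage). The function name `local_perturbation_percent` and the measurement values are hypothetical and are not taken from Praat's voice report.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference, as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements from four consecutive glottal pulses.
periods_s = [0.010, 0.011, 0.009, 0.010]    # duration of each cycle (seconds)
peak_amplitudes = [0.80, 0.70, 0.90, 0.80]  # peak amplitude of each cycle

jitter = local_perturbation_percent(periods_s)         # variation in timing
shimmer = local_perturbation_percent(peak_amplitudes)  # variation in amplitude
print(f"jitter = {jitter:.1f}%, shimmer = {shimmer:.1f}%")
```

The same function serves both measures because jitter and shimmer differ only in what is measured per pulse: period duration versus amplitude.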

Page 12

Harsh Voice

• A “raucous voice quality” (Holmes, 1932)

• Acoustically: the fundamental frequency is irregular (voicing is aperiodic)

• = lots of jitter (variability in time)

• Articulatorily: harsh voice does not add anything new to the voice quality parameters;

• it just increases the intensity of those already in operation.

• Harsh voice involves “excessive approximation of the vocal folds”

• = high medial compression and high adductive tension

Page 13

Harsh, continued

• “Harshness results from overtensions in the throat and neck; it is often if not usually accompanied by hypertensions of the whole body.” (Gray and Wise, 1959)

• Harsh F0 is usually > 100 Hz

• Creaky F0 is usually < 100 Hz

Page 14

3. Breathy Voice

• In breathy voice, the vocal folds remain open…

• and “wave” in the airflow coming up from the lungs.

• Laryngeal settings for breathy voice:

1. Low medial compression

2. Minimal adductive tension

3. Variable longitudinal tension (for F0 control)

4. Higher airflow

• Check out the breathy video.

Page 15

Breathy Voice EGG

• Also note: closure phases in breathy voice are more symmetrical than in modal voice.

Page 16

Some Real-Life Examples

breathy

modal

Page 17

Contrasts

• Gujarati contrasts breathy voiced vowels with modal voiced vowels:

• Hausa contrasts modal [j] with creaky/tense [j]:

• Hausa is spoken in West Africa (primarily in Nigeria)

• Creaky consonants are also said to be laryngealized.

Page 18

All Three

• Jalapa Mazatec has a three-way contrast between modal, breathy and creaky voiced vowels:

• Jalapa Mazatec is spoken in southern Mexico, around Oaxaca and Veracruz.

Page 19

Voiced Aspirated

• Some languages distinguish between (breathy) voiced aspirated and voiceless aspirated stops and affricates.

• Check out Hindi:

Page 20

One Random Thing

• Breathy voiced segments can “depress” the tone on a following segment.

• Examples from Tsonga:

• Tsonga is spoken in South Africa and Mozambique.

• Voiced stops also “depress” tones more than voiceless stops.

• depressor consonants

• Nobody really knows why.

Page 21

Open Quotient

• From EGG measures, we can calculate the “open quotient” for any particular voicing cycle:

open quotient = (time glottis is open) / (period of voicing cycle)

• EGG measures show that there are reliable differences in open quotient values between the three primary voicing types.

• Breathy voicing has a high open quotient

• Creaky voicing has a low open quotient

• Modal voicing is in between
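The open quotient calculation can be sketched as follows. This is a simplified illustration, not a standard EGG analysis procedure: it assumes one voicing cycle has already been isolated and evenly sampled, that higher EGG values mean more vocal fold contact, and that the glottis counts as "open" whenever contact falls below a chosen threshold. The function name, the threshold, and the sample values are all hypothetical.

```python
def open_quotient(cycle, threshold):
    """Fraction of one voicing cycle during which the glottis counts as open."""
    if not cycle:
        raise ValueError("cycle must contain at least one sample")
    # With evenly spaced samples, the share of "open" samples equals
    # (time glottis is open) / (period of voicing cycle).
    open_samples = sum(1 for contact in cycle if contact < threshold)
    return open_samples / len(cycle)


# Hypothetical EGG contact values for one cycle each
# (higher value = more contact; this polarity is an assumption).
modal_cycle = [0.9, 0.8, 0.6, 0.2, 0.1, 0.1, 0.2, 0.7]
breathy_cycle = [0.6, 0.3, 0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4, 0.7]

modal_oq = open_quotient(modal_cycle, threshold=0.5)
breathy_oq = open_quotient(breathy_cycle, threshold=0.5)
print(modal_oq, breathy_oq)
```

In these made-up cycles the breathy example spends more of its period open than the modal one, which is the direction of the difference described above.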

Page 22

Open Quotient Traces

[Figure: EGG trace, with one period and its open phase labeled]

• The open quotient in modal voicing is generally around 0.5

Page 23

Tense Voice

• Tense voice (from throat singing demo) has a lower open quotient.

• Result of medial compression.

• Actual value: about 0.3

[Figure: EGG trace, with one period and its open phase labeled]

Page 24

OQ Traces, continued

• OQ for creaky voice is also supposed to be low…

• but it’s actually quite sporadic.

• Breathy voice OQ is quite high

• (0.65 or greater)

Page 25

4. Whispery Voice

• When we whisper:

• The cartilaginous glottis remains open, but the ligamental glottis is closed.

• Air flows through the opening with a “hiss”

• The laryngeal settings:

1. Little or no adductive tension

2. Moderate to high medial compression

3. Moderate airflow

4. Longitudinal tension is irrelevant…

Page 26

Nodules

• One of the more common voice disorders is the development of nodules on either or both of the vocal folds.

• nodule = callus-like bump

• What effect might this have on voice quality?

Page 27

Last but not least

• What’s going on here?

• At some point, my voice changes from modal to falsetto.

Page 28

5. Falsetto

• The laryngeal specifications for falsetto:

1. High longitudinal tension

2. High adductive tension

3. High medial compression

• Contraction of thyroarytenoids

4. Lower airflow than in modal voicing

• The results:

• Very high F0.

• Very thin area of contact between vocal folds.

• Air often escapes through the vocal folds.

Page 29

Falsetto EGG

• The falsetto voice waveform is considerably more sinusoidal than modal voice.

Page 30

Some Real EGGs

Modal voice (F0 = 140 Hz)

Falsetto voice (F0 = 372 Hz)

Page 31

Voice Quality Summary

(AT = adductive tension, LT = longitudinal tension, MC = medial compression)

           AT         LT       MC         Flow
Modal      moderate   varies   moderate   med.
Tense      high       varies   high       high
Creaky     high       low      high       low
Whisper    low        N/A      high       med.
Breathy    low        varies   low        high
Falsetto   high       high     high       low

Page 32

• Last but not least, Korean makes an interesting distinction between “emphatic” (or fortis) obstruents and unaspirated and aspirated (lenis) obstruents.

Page 33

What’s going on here?

• A variety of things occur during the articulation of fortis consonants in Korean.

1. Glottis is not open as wide (during closure) as in lenis stops.

Voicing begins more quickly after stop release

2. Increased airflow in fortis stops.

Higher F0 after stop release.

3. Vocal folds are “more tense” than in lenis stops.

• = greater medial compression

• = “squarer” glottal waveform

Page 34

Back to the Source…

• Modal voicing (by me):

• Note: completely closed and completely open phases are both actually quite short.

• Also: closure slope is greater than opening slope.

• Q: Why might there be differences in slope?

Page 35

A Different Kind of Voicing

• The basic voice quality in khoomei is called xorekteer.

• Notice any differences in the EGG waveforms?

• This voice quality requires greater medial compression of the vocal folds.

• ...and also greater airflow

Page 36

Why Should You Care?

• Remember that the most basic kind of sound wave is a sinewave.

[Figure: sinewave, plotted as pressure over time]

• Sinewaves can be defined by three basic properties:

• Frequency, (peak) amplitude, phase

Page 37

Complex Waves

• It is possible to combine more than one sinewave together into a complex wave.

• At any given time, each wave will have some amplitude value.

• A1(t1) := Amplitude value of sinewave 1 at time 1

• A2(t1) := Amplitude value of sinewave 2 at time 1

• The amplitude value of the complex wave is the sum of these values.

• Ac(t1) = A1(t1) + A2(t1)

• Note: a harmonic is simply a component sinewave of a complex wave.
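The point-by-point sum Ac(t1) = A1(t1) + A2(t1) can be sketched directly. This is a minimal illustration, assuming each component is described by the three sinewave properties named earlier (frequency, peak amplitude, phase). The function names and the two example components are made up for illustration.

```python
import math


def sinewave(freq_hz, peak_amplitude, phase_rad, t):
    """Amplitude value of a single sinewave at time t (in seconds)."""
    return peak_amplitude * math.sin(2 * math.pi * freq_hz * t + phase_rad)


def complex_wave(components, t):
    """Amplitude of the complex wave at time t: the sum of its components."""
    return sum(sinewave(f, a, p, t) for f, a, p in components)


# Two hypothetical components: (frequency in Hz, peak amplitude, phase in radians).
components = [
    (100.0, 1.0, 0.0),  # low frequency, high amplitude
    (300.0, 0.3, 0.0),  # high frequency, low amplitude
]

t1 = 0.0025  # a quarter of the way through one 100 Hz period
a1 = sinewave(100.0, 1.0, 0.0, t1)
a2 = sinewave(300.0, 0.3, 0.0, t1)
ac = complex_wave(components, t1)
print(a1, a2, ac)
```

At this particular time the first component is at its positive peak while the second is at its negative peak, so the sum is smaller than the first component alone.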

Page 38

Complex Wave Example

• Take waveform 1:

• high amplitude

• low frequency

• Add waveform 2:

• low amplitude

• high frequency

• The sum is this complex waveform:

[Figure: waveform 1 + waveform 2 = complex waveform]

Page 39

Another Perspective

• Sinewaves can also be represented by their power spectra.

• Frequency on the x-axis

• Intensity on the y-axis (related to peak amplitude)

[Figure: a waveform and its power spectrum]

Page 40

Putting the two together

[Figure: waveforms and power spectra of the two sinewaves and of their sum; the components of the sum are labeled “harmonics”]
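The relation between a complex waveform and its power spectrum can be sketched with a naive discrete Fourier transform. This is an illustration only, not how Praat computes spectra: the sample rate, duration, and the two components (100 Hz and 300 Hz) are hypothetical, chosen so that each component falls exactly on a frequency bin.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT: power in each frequency bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        powers.append(abs(coeff) ** 2 / n ** 2)
    return powers


sample_rate = 800  # samples per second (hypothetical)
n = 80             # 0.1 s of signal, so bins are 10 Hz apart

# Complex wave: a strong 100 Hz component plus a weaker 300 Hz component.
wave = [1.0 * math.sin(2 * math.pi * 100 * i / sample_rate)
        + 0.3 * math.sin(2 * math.pi * 300 * i / sample_rate)
        for i in range(n)]

powers = power_spectrum(wave)
bin_hz = sample_rate / n
peaks = [round(k * bin_hz) for k, p in enumerate(powers) if p > 1e-6]
print(peaks)
```

The waveform mixes the two components at every sample, but the spectrum separates them again: power appears only at the frequencies of the component sinewaves, with the stronger component showing the taller peak.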

Page 41

More Combinations

• What happens if we keep adding more and more high frequency components to the sum?

[Figure: waveforms summed with successively higher-frequency components]
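One way to explore this question is to add components in code. The sketch below assumes a specific recipe that the slide figures may or may not have used: odd harmonics of a fundamental, each with amplitude 1/k, which is the Fourier series of a square wave. The function name and the 100 Hz fundamental are hypothetical.

```python
import math


def square_wave_partial_sum(t, f0_hz, n_components):
    """Sum of the first n odd harmonics of f0, each scaled by 1/k (square-wave series)."""
    total = 0.0
    for i in range(n_components):
        k = 2 * i + 1  # odd harmonic numbers: 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * f0_hz * t) / k
    return 4 / math.pi * total


f0 = 100.0
t_quarter = 0.0025  # a quarter of the way through one period

one_component = square_wave_partial_sum(t_quarter, f0, 1)
many_components = square_wave_partial_sum(t_quarter, f0, 200)
print(one_component, many_components)
```

With a single component the value is just the peak of a sinewave. As more high-frequency components are added, the value settles toward the flat top of a square wave, and the edges of the waveform become steeper.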

Page 42

A Spectral Comparison

[Figure: waveform and power spectrum]

Page 43

What’s the Point?

• Remember our EGG waveforms for the different kinds of voice qualities:

• The glottal waveform for tense voice resembles a square wave.

• lots of high frequency components (harmonics)

Page 44

What’s the point, part 2

• A modal voicing EGG looks like:

• It is less square and therefore has fewer high-frequency components.

• Although it is far from sinusoidal...

Page 45

What’s the point, part 3

• Breathy and falsetto voice are more sinusoidal...

• And therefore the high frequency harmonics have less power, compared to the fundamental frequency.

Page 46

Let’s Check ’em out

• Head over to Praat and check out the power spectra of:

• a sinewave

• a square wave

• a sawtooth wave

• tense voice

• modal voice

• creaky voice

• breathy voice

Page 47

Spectral Tilt

• Spectral tilt = drop-off in intensity of higher harmonics, compared to the intensity of the fundamental.
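Spectral tilt can be sketched by expressing each harmonic's level in decibels relative to the fundamental. This is a rough illustration: the harmonic amplitudes below are made up, not measured from any of the voice samples, and the function name is hypothetical.

```python
import math


def harmonic_levels_db(amplitudes):
    """Level of each harmonic in dB relative to the fundamental (first entry)."""
    fundamental = amplitudes[0]
    return [20 * math.log10(a / fundamental) for a in amplitudes]


# Hypothetical peak amplitudes of the first three harmonics.
steep_source = [1.0, 0.25, 0.0625]  # high harmonics fall off quickly
shallow_source = [1.0, 0.5, 0.25]   # high harmonics stay relatively strong

steep = harmonic_levels_db(steep_source)
shallow = harmonic_levels_db(shallow_source)
print([round(level, 1) for level in steep])
print([round(level, 1) for level in shallow])
```

A more sinusoidal source, such as the breathy and falsetto voicing described earlier, would pattern with the steeper drop-off, while a squarer source such as tense voice would pattern with the shallower one.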