Course web page AUD 6306 Speech Science


AUD 6306

Speech Science

Dr. Peter Assmann

Spring semester 2019

Course web page

http://www.utdallas.edu/~assmann/aud6306/

• Course information

• Speech demos

• Lecture slides

• Assigned reading material

• Additional resources in speech & hearing

Course materials

No required text

Some recommended books:

Kent, R.D. & Read, C. (2001). The Acoustic Analysis of Speech. (Singular).

Stevens, K.N. (1999). Acoustic Phonetics (Current Studies in Linguistics). M.I.T. Press.

Coming soon …

Course requirements

• Class presentations (20%)

• Presentation reports and homework (20%)

• Midterm take-home exam (25%)

• Final take-home exam (35%)

Course theme:

Hearing and speech are closely linked


Class presentations and reports

Sign up for two presentation dates.

Pick two broad topics from the field of speech science (topics must be approved).

Select a suitable (peer-reviewed) paper for each topic and present a brief (10-15 minute) summary of the paper to the class to initiate/lead discussion.

Prepare a written report (2-3 pages), due about 2-3 weeks after the presentation.


• When you find a paper you’d like to present, email the citation or the PDF version of the paper to me for approval. (In some cases I may suggest an alternative or more recent paper.) I will post the paper on the readings web page and email the link to the class.


• Important note: papers must be selected and approved one week before the presentation to give others time to read them.

• Everyone is expected to read the assigned articles for each class and be prepared to discuss them.

Suggested Topics

Speech acoustics

Vowel production and perception

Consonant production and perception

Suprasegmentals and prosody

Speech perception in noise

Auditory grouping and segregation

Speech perception and hearing loss

Cochlear implants and speech coding

Development of speech perception

Second language acquisition

Audiovisual speech perception

Neural coding of speech

Models of speech perception

Finding papers

PubMed search engine:

http://www.ncbi.nlm.nih.gov/pubmed

Journal of the Acoustical Society of America:

http://scitation.aip.org/content/asa/journal/jasa

Finding papers

UTD library - online journals

http://www.utdallas.edu/library/resources/journals.html

https://libguides.utdallas.edu/journal-collections

https://libproxy.utdallas.edu/login?url=http://search.ebscohost.com/login.asp?profile=brain

Speech Science

Primate vocal tract

The evolution of speech: a comparative review

W. Tecumseh Fitch

Trends in Cognitive Sciences 4(7) July 2000

[Figure: vocal tracts of orangutan, chimpanzee and human, with tongue body, larynx and air sac labeled]

Vocal motor control

W. Tecumseh Fitch - Annu. Rev. Linguist. 2018. 4:255–279.

https://doi.org/10.1146/annurev-linguistics-011817-045748

Fitch 2018


Source-filter theory of speech production

Lungs (power supply) → vibrating vocal folds (oscillator) → vocal tract (resonator) → output sound

Human vocal tract

Acoustics of speech

● Articulation

● Phonation

Organs of speech

• Lungs: apply pressure to generate air stream (power supply)

• Larynx: air forced through the glottis, a small opening between the vocal folds (sound source)

• Vocal tract: pharynx, oral and nasal cavities serve as complex resonators (filter)

Source-Filter Theory

From Fitch, W.T. (2000). Trends in Cognitive Sciences

Audio demo: the source signal

• Source signal for an adult male voice

• Source signal for an adult female voice

• Source signal for a 10-year-old child


Vocal fold oscillation

• One-mass model

– Air flow through the glottis during the closing phase travels at the same speed because of inertia, producing lowered air pressure above the glottis.

Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html

Vocal fold oscillation

• Three-mass model

– One large mass (representing the thyroarytenoid muscle) and two smaller masses, M1 and M2 (representing the vocal fold surface). All three masses are connected by springs and damping constants.

Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html

Source-Filter Theory: Vowels

G. Fant (1960). Acoustic Theory of Speech Production

Linear systems theory

Assumptions: (1) linearity (2) time-invariance

Vowels can be decomposed into two primary components: a source (input signal) and a filter (modulates the input).

Time domain version:

U(t) ∗ T(t) ∗ R(t) = P(t), where ∗ denotes convolution

Frequency domain version:

U(f) · T(f) · R(f) = P(f)
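The time-domain and frequency-domain versions are linked by the convolution theorem: convolving signals in time is equivalent to multiplying their spectra. The following is a minimal numerical sketch in Python with NumPy. The impulse-train source and the decaying filter response are made-up illustrative signals (assumptions, not measured speech), and the radiation term R is left out for brevity.

```python
import numpy as np

# Illustrative signals (assumptions, not measured data):
# u: source, an impulse train with one pulse every 80 samples
# t: filter, a short decaying oscillation standing in for a vocal tract response
u = np.zeros(400)
u[::80] = 1.0
n = np.arange(64)
t = np.exp(-n / 16.0) * np.cos(2 * np.pi * n / 16.0)

# Time domain: the output is the convolution of source and filter.
p_time = np.convolve(u, t)

# Frequency domain: the output spectrum is the product of the two spectra.
# Both FFTs are zero-padded to the full length of the linear convolution.
n_fft = len(u) + len(t) - 1
p_freq = np.fft.irfft(np.fft.rfft(u, n_fft) * np.fft.rfft(t, n_fft), n_fft)

print(np.allclose(p_time, p_freq))
```

The two routes should agree to within rounding error, which is why the same model can be written either way.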

Source-Filter Theory: Vowels

[Figure: source-filter theory for vowels; axes: frequency vs. amplitude]

Source properties

In voiced sounds the glottal source spectrum contains a series of lines called harmonics.

The lowest one is called the fundamental frequency (F0).

[Figure: amplitude spectrum of the glottal source with F0 marked; axes: frequency (Hz, 0-1000) vs. relative amplitude (dB, -50 to 0)]

Demo: harmonic synthesis

Additive harmonic synthesis: vowel /i/

Cumulative sum of harmonics: vowel /i/

Additive synthesis: “wheel”

Cumulative sum of partials:
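The idea behind the additive synthesis demos can be sketched in a few lines of Python with NumPy: a periodic sound is built by summing sinusoids at integer multiples of F0. The function name, the F0 value and the harmonic amplitudes below are hypothetical choices for illustration, not the values used in the course demos.

```python
import numpy as np

def additive_synthesis(f0, amplitudes, duration=0.5, fs=16000):
    """Sum sinusoids at integer multiples of f0 (harmonic k has frequency k * f0)."""
    t = np.arange(int(duration * fs)) / fs
    signal = np.zeros_like(t)
    for k, amp in enumerate(amplitudes, start=1):
        signal += amp * np.sin(2 * np.pi * k * f0 * t)
    return signal

# Hypothetical amplitudes for the first six harmonics (illustrative only).
amps = [1.0, 0.5, 0.3, 0.2, 0.1, 0.05]
wave = additive_synthesis(f0=120.0, amplitudes=amps)

# The strongest spectral component sits at f0 because harmonic 1 has the largest amplitude.
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1 / 16000)
print(freqs[np.argmax(spectrum)])
```

Changing the relative amplitudes changes the spectrum envelope (and so the vowel-like quality) while the harmonic spacing, and therefore F0, stays fixed.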


Filter properties

The vocal tract resonances (called formants) produce peaks in the spectrum envelope.

Formants are labelled F1, F2, F3, ... in order of increasing frequency.

[Figure: amplitude spectrum with superimposed LPC spectral envelope, formant peaks F1-F4 marked; axes: frequency (kHz, 0-4) vs. amplitude (dB, -50 to 0)]

Source-filter theory

[Figure: source, filter and radiation combine to give the output sound, shown for the vowels /i/ and /a/; axes: frequency vs. amplitude. Source: J. Hillenbrand]

[Figure: vowel articulation. Robert Mannell, Macquarie University, http://clas.mq.edu.au/phonetics/phonetics/vowelartic/vowel_artic.gif]


/s/ and /sh/ Source-Filter Models Overlaid

Speech terminology…

Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.

Formants: resonances of the vocal tract.

[Figure: amplitude spectrum with F0 and a formant peak labeled; axes: frequency vs. amplitude]

Source properties: Pitch

Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.

Source properties: Pitch

F0 can be removed by filtering (as in telephone circuits) and the pitch remains the same.

This is the problem of the missing fundamental, one of the oldest problems in hearing science.

Pitch is determined by the frequency pattern of the harmonics (or their equivalent in the time domain, the periodicities in the waveform).
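The missing fundamental can be illustrated numerically. The sketch below (Python with NumPy) builds a tone from harmonics 2 to 5 of a 200 Hz fundamental, with no energy at 200 Hz itself, and then looks for the strongest periodicity in the waveform using autocorrelation. The sampling rate, F0 value and search range are assumptions chosen for illustration, and autocorrelation is used here only as a simple stand-in for a periodicity analysis, not as a model of how the ear computes pitch.

```python
import numpy as np

fs = 16000                       # sampling rate in Hz (assumed)
f0 = 200                         # fundamental frequency in Hz (assumed)
t = np.arange(int(0.1 * fs)) / fs

# Harmonics 2 to 5 only: there is no component at f0 itself.
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(2, 6))

# Autocorrelation for non-negative lags.
ac = np.correlate(x, x, mode="full")[len(x) - 1:]

# Search lags corresponding to 50-500 Hz for the strongest periodicity.
min_lag, max_lag = fs // 500, fs // 50
best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
estimated_f0 = fs / best_lag
print(estimated_f0)
```

The waveform still repeats every 1/F0 seconds even though the F0 component is absent, so the periodicity estimate lands on the missing fundamental.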

Harmonicity and Periodicity

Period: regularly repeating pattern in the waveform

Period duration, T0 = 6 ms

Harmonic: regularly repeating peak in the amplitude spectrum

F0 = 1 / T0

F0 = 1000 / 6 ≈ 167 Hz

[Figure: waveform (top) and amplitude spectrum (bottom); spectrum axes: frequency (kHz, 0-2.5) vs. amplitude (dB, -40 to 20)]

Harmonics are integer multiples of F0 and are evenly spaced in frequency.
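As a small worked example of these two relations, the Python sketch below converts a period duration to F0 and lists the first few harmonics. The function names are hypothetical helpers introduced only for this illustration.

```python
def f0_from_period(period_ms):
    """Convert period duration in milliseconds to fundamental frequency in Hz."""
    return 1000.0 / period_ms

def harmonic_frequencies(f0, count):
    """Return the first `count` harmonics: integer multiples of f0."""
    return [k * f0 for k in range(1, count + 1)]

f0 = f0_from_period(6.0)
print(round(f0, 1))                                     # 166.7
print([round(h) for h in harmonic_frequencies(f0, 4)])  # [167, 333, 500, 667]
```

The spacing between neighboring harmonics equals F0, which is why the harmonic spacing in the amplitude spectrum and the period in the waveform carry the same information.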



Harmonic singing

Harmonic singing (also called overtone singing) involves changing the shape of the vocal tract to align the resonance frequencies (formants) with harmonics of the fundamental.

A low, sustained fundamental is produced, similar to the drone of a bagpipe, along with flute-like harmonics that drift in and out.


Harmonic singing

Tuvan throat singing

http://www.youtube.com/watch?v=DY1pcEtHI_w&feature=youtu.be

Amazing Grace

http://www.youtube.com/watch?v=mO4Uh-Mini4&feature=youtu.be

Overtone singing

https://www.youtube.com/watch?v=UHTF1-IhuC0#t=21

Effects of F0 changes

Source-filter

independence

Effects of formant frequency changes

Source-filter

independence


Voicing irregularities

Shimmer: variation in amplitude from one cycle to the next.

Jitter: variation in frequency (period duration) from one cycle to the next.
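One common way to put numbers on these two measures is the "local" form: the mean absolute difference between consecutive cycles, divided by the mean value. The Python sketch below assumes that definition (other definitions exist) and uses hypothetical cycle-by-cycle measurements, not real voice data.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean."""
    values = np.asarray(values, dtype=float)
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical cycle-by-cycle measurements (illustrative numbers only).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.0]       # period durations
peak_amps = [1.00, 0.90, 1.10, 1.00, 1.00]   # peak amplitudes

jitter = local_perturbation(periods_ms)      # cycle-to-cycle variation in period
shimmer = local_perturbation(peak_amps)      # cycle-to-cycle variation in amplitude
print(f"jitter = {100 * jitter:.2f}%, shimmer = {100 * shimmer:.2f}%")
```

The same function serves both measures because the only difference is which per-cycle quantity is tracked.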

Voicing irregularities

Breathy voice is associated with a glottal waveform with a steeper roll-off than modal voice. As a result there is less energy in the higher harmonics (steeper slope in the spectrum).

Vocal tract properties

Resonating tube model

– approximation for neutral vowel (schwa), [ə]

– closed at one end (glottis); open at the other (lips)

– uniform cross-sectional area

– curvature is relatively unimportant

[Figure: uniform tube model (schwa), a tube of length L running from the glottis to the lips]

Vocal tract model

Quarter-wave resonator:

Fn = (2n – 1) c / (4L)

– Fn is the frequency of formant n in Hz

– c is the velocity of sound in air (about 35000 cm/s)

– L is the length of the vocal tract (about 17.5 cm for an adult male)

– F1 = (2(1) – 1) × 35000 / (4 × 17.5) = 500 Hz

– F2 = (2(2) – 1) × 35000 / (4 × 17.5) = 1500 Hz

– F3 = (2(3) – 1) × 35000 / (4 × 17.5) = 2500 Hz
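The quarter-wave calculation above can be written as a short Python function. The function name and its default arguments are hypothetical choices for this sketch; the numerical values of c and L are the ones given on the slide.

```python
def quarter_wave_formants(length_cm, c_cm_per_s=35000.0, count=3):
    """Resonances of a uniform tube closed at one end: Fn = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm) for n in range(1, count + 1)]

print(quarter_wave_formants(17.5))   # [500.0, 1500.0, 2500.0]
```

Because L appears in the denominator, a shorter tube gives proportionally higher resonances: calling the function with a hypothetical length of 14 cm returns 625, 1875 and 3125 Hz.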


Acoustic vowel space

[Figure: acoustic vowel space showing i “heed”, ɑ “hod”, u “who’d” and ə; axes: first formant (F1) frequency (Hz, 1000 down to 0) vs. second formant (F2) frequency (Hz, 0-3000)]


Helium speech

The speed of sound in a helium/oxygen mixture at 20°C is about 93000 cm/s, compared to 35000 cm/s in air. This increases the resonance frequencies but has relatively little effect on F0.

In helium speech, the formants are shifted up but the pitch stays the same.


Helium speech

Exercise: Compute the frequencies of F1, F2 and F3 for a 17.5 cm vocal tract producing the vowel /ə/ (schwa) in a helium/air mixture (velocity c ≈ 93000 cm/s)

Fn = (2n – 1) c / (4L)

F1 = (2(1) – 1) × 93000 / (4 × 17.5) = 1328.6 Hz

F2 = (2(2) – 1) × 93000 / (4 × 17.5) = 3985.7 Hz

F3 = (2(3) – 1) × 93000 / (4 × 17.5) = 6642.9 Hz
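The exercise can also be checked in Python. The sketch below uses the two sound speeds quoted in the slides and shows that, for a fixed tube length, every resonance is scaled by the same factor, the ratio of the two sound speeds. The constant and function names are hypothetical.

```python
C_AIR = 35000.0     # speed of sound in air, cm/s (value from the slides)
C_HELIUM = 93000.0  # speed of sound in the helium mixture, cm/s (value from the slides)
LENGTH_CM = 17.5    # vocal tract length, cm (value from the slides)

def formant(n, c, length_cm):
    """Quarter-wave resonance n of a uniform tube: (2n - 1) c / (4 L)."""
    return (2 * n - 1) * c / (4.0 * length_cm)

for n in (1, 2, 3):
    air = formant(n, C_AIR, LENGTH_CM)
    helium = formant(n, C_HELIUM, LENGTH_CM)
    print(f"F{n}: air {air:.1f} Hz, helium {helium:.1f} Hz, ratio {helium / air:.3f}")
```

The ratio is the same for every formant (93000 / 35000, about 2.66), so the whole formant pattern is shifted upward while F0, which is set by the vocal folds, is not part of this calculation at all.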

Helium speech

Audio demos

– Speech in air

– Speech in helium

– Pitch in air

– Pitch in helium

http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

[Figure: spectrograms of speech in air and speech in helium; axes: time (ms) vs. frequency (kHz, 0-4)]


Sulfur Hexafluoride

Helium

– density of 0.1786 g/L at sea level

Air


Sulfur Hexafluoride (SF6)


Speech production with vocal tract filled with SF6

http://www.youtube.com/watch?v=d-XbjFn3aqE

Perturbation Theory

The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /ɑ/.

[Figure: change in F1 (delta F1) as a function of constriction location, from glottis to lips]


Perturbation Theory

The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.

[Figure: change in F2 (delta F2) as a function of constriction location, from glottis to lips]

Perturbation Theory

The third formant (F3) is lowered by a constriction at the lips or at the back of the mouth or in the upper pharynx. This occurs in /r/ and /r/-colored vowels like American English /ɚ/ (as in “heard”).

[Figure: change in F3 (delta F3) as a function of constriction location, from glottis to lips]

Perturbation Theory

F3 is raised when the constriction is behind the lips and teeth or near the upper pharynx.

[Figure: change in F3 (delta F3) as a function of constriction location, from glottis to lips]

Perturbation Theory

All formants tend to drop in frequency when the vocal tract length is increased or when a constriction is formed at the lips.

[Figure: vocal tract tube from glottis to lips]

Perturbation Theory

F1 frequency is correlated with jaw opening (and inversely related to tongue height).

[Figure: amplitude spectrum; axes: frequency (kHz, 0-4) vs. amplitude (dB, -50 to 0)]

Perturbation Theory

F2 frequency is correlated with tongue advancement (front-back dimension).

[Figure: amplitude spectrum; axes: frequency (kHz, 0-4) vs. amplitude (dB, -50 to 0)]

Digital representations of signals

[Figure: sampling and quantization of a signal; axes: time vs. amplitude]
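The two steps named on this slide can be sketched in Python with NumPy: sampling takes the signal's value at discrete times, and quantization rounds each value to one of a fixed number of levels. The sampling rate, bit depth and test tone below are assumptions chosen to keep the example small, and the uniform quantizer is only one simple scheme.

```python
import numpy as np

fs = 8000          # samples per second (assumed)
n_bits = 4         # bits per sample; deliberately coarse so the steps are large
duration = 0.01    # seconds

# Sampling: evaluate the signal at the discrete times n / fs.
t = np.arange(int(duration * fs)) / fs
x = np.sin(2 * np.pi * 440 * t)          # amplitudes in the range -1 to 1

# Quantization: round each sample to the nearest of 2**n_bits uniform levels.
levels = 2 ** n_bits
step = 2.0 / (levels - 1)
x_q = np.round((x + 1.0) / step) * step - 1.0

print(len(t), len(np.unique(x_q)) <= levels, np.max(np.abs(x - x_q)) <= step / 2 + 1e-12)
```

With this quantizer the rounding error never exceeds half a step, so each extra bit halves the step size and with it the worst-case error.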

Spectral analysis

Amplitude spectrum: sound pressure levels associated with different frequency components of a signal

– Power or intensity

– Amplitude or magnitude

– Log units and decibels (dB)

Phase spectrum: relative phases associated with different frequency components

– Degrees or radians
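A short Python sketch with NumPy shows how the amplitude spectrum (in linear units and in dB) and the phase spectrum (in radians or degrees) come out of a Fourier transform. The two-component test signal and the sampling rate are assumptions made up for the illustration, and the dB values here are relative to an amplitude of 1 rather than to a calibrated sound pressure reference.

```python
import numpy as np

fs = 8000                                  # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of signal
# Two components: 200 Hz at full amplitude, 600 Hz at one tenth of that amplitude
# and shifted in phase by a quarter cycle.
x = np.cos(2 * np.pi * 200 * t) + 0.1 * np.cos(2 * np.pi * 600 * t + np.pi / 2)

spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

magnitude = np.abs(spectrum) / (len(x) / 2)        # linear amplitude
level_db = 20 * np.log10(magnitude + 1e-12)        # decibels re: amplitude 1
phase_rad = np.angle(spectrum)                     # phase in radians

for f in (200, 600):
    k = int(np.argmin(np.abs(freqs - f)))
    print(f, round(float(level_db[k]), 1), round(float(np.degrees(phase_rad[k])), 1))
```

A tenfold drop in amplitude corresponds to a 20 dB drop in level, which is the reason log units are convenient for displaying speech spectra with a wide range of component amplitudes.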

Spectral analysis of speech

Why perform a frequency analysis of speech?

– Ear+brain carry out a form of frequency analysis

– Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform

Spectral analysis of speech

But: the ear is not a spectrum analyzer.

– Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.

Measuring formants

[Figure: amplitude spectra of the vowels /i/, /a/ and /u/, with individual harmonics shown as lines; axes: frequency (Hz, 0-4000) vs. relative amplitude (dB, 0-60)]

Formant frequency peak estimation requires an interpolation process.
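One simple way to interpolate a peak between harmonics is to fit a parabola through the strongest harmonic and its two neighbors and take the vertex. The Python sketch below illustrates that idea only; it is an assumed stand-in, not necessarily the interpolation method intended in the course (LPC analysis is another common approach). The function name and the harmonic levels are hypothetical.

```python
import numpy as np

def interpolate_peak(freqs_hz, levels_db):
    """Fit a parabola through the highest point and its two neighbors; return the vertex frequency."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    levels_db = np.asarray(levels_db, dtype=float)
    k = int(np.argmax(levels_db))
    k = min(max(k, 1), len(freqs_hz) - 2)      # keep one neighbor on each side
    a, b, _ = np.polyfit(freqs_hz[k - 1:k + 2], levels_db[k - 1:k + 2], 2)
    return -b / (2 * a)

# Hypothetical harmonic levels for F0 = 100 Hz, drawn from a smooth envelope peaking at 290 Hz.
harmonics = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
levels = 60.0 - 0.0005 * (harmonics - 290.0) ** 2
print(round(float(interpolate_peak(harmonics, levels)), 1))
```

The strongest harmonic is at 300 Hz, but the interpolated peak falls between harmonics, which is the point of the slide: the formant frequency need not coincide with any single harmonic.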

Formant Estimation

Vowel spectra have peaks corresponding to the center frequencies of formants.

[Figure: spectrum of natural vowel / / with formant peaks F1, F2 and F3 marked; axes: frequency (kHz, 0-6) vs. amplitude (dB, -20 to 60)]

Formant Estimation

But: harmonics also generate spectral peaks; formant frequencies do not necessarily coincide with harmonic frequencies.

[Figure: low-frequency region of a vowel spectrum showing individual harmonics and the F1 peak; axes: frequency (kHz, 0-2) vs. amplitude (dB, 20-60)]

Children’s speech

Children’s voices have high F0s.

When F0 is 400 Hz (not unusual for 3-year-olds), only 4 harmonics appear in the frequency range from 0 to 1600 Hz.

[Figure: harmonic spectrum; axes: frequency (kHz, 0-1.5) vs. amplitude]

Sparse sampling problem

Vowel identity is dependent on the frequencies of formant peaks.

Formants are difficult to estimate when fundamental frequency is high.
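The sparse sampling problem comes down to simple counting: the spectrum envelope is only "sampled" at the harmonic frequencies, and a higher F0 leaves fewer harmonics in any given band. A minimal Python sketch follows; the function name is hypothetical, and the 100 and 200 Hz values are illustrative comparison points rather than figures from the slides.

```python
def harmonics_in_band(f0_hz, f_max_hz):
    """Frequencies of the harmonics of f0 that fall at or below f_max."""
    count = int(f_max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]

for f0 in (100, 200, 400):
    print(f0, len(harmonics_in_band(f0, 1600)))
```

At F0 = 400 Hz only four harmonics fall at or below 1600 Hz, so the envelope in the region that usually holds F1 and F2 is defined by very few points.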

[Figure: harmonic spectrum; axes: frequency (kHz, 0-1.5) vs. amplitude]

Formants sometimes appear to merge

[Figure: FFT spectrum with superimposed LPC spectrum; axes: frequency (kHz, 0-3.5) vs. amplitude (dB, 0-70)]

Short-term amplitude spectrum

[Figure: short-term amplitude spectrum with formant peaks at F1 = 281 Hz, F2 = 2196 Hz and F3 = 2755 Hz; axes: frequency (kHz, 0-4) vs. amplitude (dB, -10 to 60)]

Speech spectrogram

Running amplitude spectra (codes amplitude changes in different frequency bands over time).

Speech spectrograms

What is a speech spectrogram?

– Display of amplitude spectrum at successive instants in time ("running spectra")

– How can 3 dimensions be represented on a two-dimensional display?

  • Gray-scale spectrogram

  • Waterfall plots

  • Animation
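The "running spectra" idea can be sketched in Python with NumPy: cut the signal into short overlapping frames, apply a window to each frame, and compute one amplitude spectrum per frame. The function name, the 25 ms window, the 10 ms hop and the test tone are assumptions for this sketch, not settings taken from the course software.

```python
import numpy as np

def spectrogram(x, fs, win_ms=25.0, hop_ms=10.0):
    """Running amplitude spectra: one windowed FFT per analysis frame, in dB."""
    win_len = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    level_db = 20 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return times, freqs, level_db   # level_db[frame index, frequency bin]

fs = 8000
t = np.arange(fs // 2) / fs
tone = np.sin(2 * np.pi * 1000 * t)          # steady 1000 Hz test tone
times, freqs, level_db = spectrogram(tone, fs)
print(level_db.shape, freqs[np.argmax(level_db, axis=1)][:3])
```

The result is a two-dimensional array (time by frequency) whose values are the third dimension; a gray-scale spectrogram maps those values to darkness, while a waterfall plot draws each row as a separate curve.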

Speech spectrograms

Why are speech spectrograms useful?

– Shows dynamic properties of speech

– Incorporates frequency analysis

– Related to speech production

– Helps to visually identify speech cues

American English vowel space

i “heed”

ɩ “hid”

e “hayed”

ɛ “head”

æ “had”

ʌ “hut”

ɑ “hod”

ɔ “hawed”

o “hoed”

ʊ “hood”

u “who’d”

ə “schwa”

[Figure: vowel chart; horizontal axis: advancement (front, center, back; ← F2); vertical axis: height (high, mid, low; F1 →)]

“The watchdog”

[Figure: waveform and spectrograms of “The watchdog” with formant tracks F1, F2 and F3 marked; frequency axes in kHz (0-5 and 0-4)]

TrackDraw: a graphical speech synthesizer


Peterson and Barney (1952)

Acoustic measurements (made from spectrograms) of formant frequencies (F1, F2, F3) in vowels spoken by 76 men, women and children.

Vowel space: projection of a given talker’s vowels onto an F1 × F2 plane.

Simple target model: vowels are differentiated (perceptually) by F1 and F2 frequencies measured in the middle of the vowel (vowel target).

[Figure: vowel symbols plotted separately for men, women and children; axes: F1 frequency (Hz, 200-1000) vs. frequency of F2 (Hz, 1000-3500)]


Invariance problem

[Figure: scatter plots of individual vowel tokens, each plotted as a vowel symbol; axes: frequency of F1 (kHz, 0.2-1.0) vs. frequency of F2 (kHz, 0.5-4.0)]

Invariance problem

Dynamic cues in vowel perception

Talker normalization theories

– Potter and Steinberg (1950): invariant pattern of stimulation shifted up or down along the basilar membrane

– Miller (1989): Formant ratio theory

– Joos (1948): Frame of reference theory

– Nearey (1989): Extrinsic and intrinsic factors

Formant Dynamics

Formant frequency changes over time:

[Figure: amplitude spectrum with formant peaks F1-F5 marked; axes: frequency (kHz, 0-5) vs. amplitude (dB, 0-60)]

Vowel-inherent spectral change

Dual-target model

[Figure: “ ” formant tracks (F1 and F2) for males and females]

FFs as a function of age and sex

[Figure: geometric mean formant frequency (Hz, 1200-1800) vs. age (years, 5-18) for boys and girls]

Vowel formant space: F1 x F2

Current study: Adults

[Figure: vowel positions for males and females; axes: F1 frequency (Hz, 300-1000) vs. F2 frequency (Hz, 1000-3500)]

Vowel formant space: F1 x F2

Current study: Children (all age groups)

[Figure: vowel positions for boys and girls; axes: F1 frequency (Hz, 300-1000) vs. F2 frequency (Hz, 1000-3500)]

Graphical interpretation of CLIH (sliding template) model

Movement along diagonal for different speakers

Fixed pattern of ‘holes’ in the template corresponds to stored vowel reference pattern

Nearey & Assmann, 2006

F0 as a function of age and sex

[Figure: fundamental frequency (Hz, 100-300) vs. age (years, 5-18) for boys and girls]

F0 distribution – child talkers

[Figure: histogram; axes: fundamental frequency (Hz, 50-400) vs. number of tokens (0-1000)]

F0 distribution – males (blue)

[Figure: histogram; axes: fundamental frequency (Hz, 50-400) vs. number of tokens (0-1000)]

F0 distribution – females (red)

[Figure: histogram; axes: fundamental frequency (Hz, 50-400) vs. number of tokens (0-1000)]

Wavesurfer

Download Wavesurfer: www.speech.kth.se/wavesurfer

Wavesurfer User Manual: www.speech.kth.se/wavesurfer/man.html
