Course web page AUD 6306 Speech Science
Transcript of Course web page AUD 6306 Speech Science
1
AUD 6306
Speech Science
Dr. Peter Assmann
Spring semester 2019
Course web page
http://www.utdallas.edu/~assmann/aud6306/
• Course information
• Speech demos
• Lecture slides
• Assigned reading material
• Additional resources in speech & hearing
Course materials
No required text
Some recommended books:
Kent, R.D. & Read, C. (2001). The Acoustic
Analysis of Speech. (Singular).
Stevens, K.N. (1999). Acoustic Phonetics
(Current Studies in Linguistics). M.I.T. Press.
Coming soon …
Course requirements
• Class presentations (20%)
• Presentation reports and homework (20%)
• Midterm take-home exam (25%)
• Final take-home exam (35%)
Course theme:
Hearing and speech are closely linked
2
Class presentations and reports
Sign up for two presentation dates.
Pick two broad topics from the field of speech
science (topics must be approved).
Select a suitable (peer-reviewed) paper for each
topic and present a brief (10-15 minute) summary
of the paper to the class to initiate/lead discussion
Prepare a written report (2-3 pages) due about
2-3 weeks after the presentation.
Class presentations and reports
• When you find a paper you’d like to present, email
the citation or the PDF version of the paper to me
for approval. (In some cases I may suggest an
alternative, or more recent paper). I will post the
paper on the readings web page and email the link
to the class.
Class presentations and reports
• Important note: papers must be selected and
approved one week before the presentation to
provide others time to read them.
• Everyone is expected to read the assigned articles
for each class and be prepared to discuss them.
Suggested Topics
Speech acoustics
Vowel production and perception
Consonant production and perception
Suprasegmentals and prosody
Speech perception in noise
Auditory grouping and segregation
Speech perception and hearing loss
Cochlear implants and speech coding
Development of speech perception
Second language acquisition
Audiovisual speech perception
Neural coding of speech
Models of speech perception
PubMed search engine:
http://www.ncbi.nlm.nih.gov/pubmed
Finding papers
3
Journal of the Acoustical Society of America:
http://scitation.aip.org/content/asa/journal/jasa
Finding papers UTD library - online journals
http://www.utdallas.edu/library/resources/journals.html
https://libguides.utdallas.edu/journal-collections
https://libproxy.utdallas.edu/login?url=http://search.eb
scohost.com/login.asp?profile=brain
Speech Science Primate vocal tractThe evolution of speech:
a comparative review
W. Tecumseh Fitch
Trends in Cognitive Sciences 4(7) July 2000
larynx
orangutan chimpanzee humantongue body
larynx
air sac
Vocal motor control
W. Tecumseh Fitch - Annu. Rev. Linguist. 2018. 4:255–279.
https://doi.org/10.1146/annurev-linguistics-011817-045748
Fitch 2018
Primate vocal tractThe evolution of speech:
a comparative review
W. Tecumseh Fitch
Trends in Cognitive Sciences 4(7) July 2000
4
Source-filter theory of speech production
OutputSound
Vocal Tract
VibratingVocal folds
Lungs
Power supply Oscillator Resonator
Human vocal tract
Acoustics
of speech
● Articulation
● Phonation
Organs of speech
• Lungs: apply pressure to generate
air stream (power supply)
• Larynx: air forced through the
glottis, a small opening between the
vocal folds (sound source)
• Vocal tract: pharynx, oral and
nasal cavities serve as complex
resonators (filter)
Source-Filter Theory
From Fitch, W.T. (2000). Trends in Cognitive Sciences
Audio demo: the source signal
• Source signal for an adult male voice
• Source signal for an adult female voice
• Source signal for a 10-year child
5
Vocal fold oscillation
• One-mass model
– Air flow through the
glottis during the closing
phase travels at the
same speed because of
inertia, producing
lowered air pressure
above the glottis.
Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html
Vocal fold oscillation
• Three-Mass Model
– One large mass
(representing the
thyroarytenoid muscle)
and two smaller masses,
M1 and M2 (representing
the vocal fold surface) .
All three masses are
connected by springs and
damping constants.
Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html
Source-Filter Theory: Vowels
G. Fant (1960). Acoustic Theory of Speech Production
Linear systems theory
Assumptions: (1) linearity (2) time-invariance
Vowels can be decomposed into two primary components: a source (input signal) and a filter(modulates the input).
Time domain version:
U( t ) T( t ) R( t ) = P( t )
Frequency domain version:
U( f ) · T( f ) · R( f ) = P( f )
Source-Filter Theory: Vowels
Frequency
Am
pli
tud
e
Source properties
In voiced sounds the glottal source spectrum contains
a series of lines called harmonics.
The lowest one is called the fundamental frequency
(F0).
0 200 400 600 800 1000-50
-40
-30
-20
-10
0
Frequency (Hz)
Rela
tive A
mplit
ude (
dB
)
Frequency (Hz)
F0
Amplitude
Spectrum
Demo: harmonic synthesis
Additive harmonic synthesis: vowel /i/
Cumulative sum of harmonics: vowel /i/
Additive synthesis: “wheel”
Cumulative sum of partials:
6
Filter properties
The vocal tract resonances (called formants) produce
peaks in the spectrum envelope.
Formants are labelled F1, F2, F3, ... in order of
increasing frequency.
0 1 2 3 4-50
-40
-30
-20
-10
0
Frequency (kHz)
Am
plit
ud
e in
dB
F1 F2
F3
F4AmplitudeSpectrum
(with superimposedLPC spectral envelope)
source filter radiation =output sound
Frequency
Am
pli
tude
/ i /
/ a /
Source-filter theory
Source: J. Hillenbrand
Robert Mannell, Macquarie Universityhttp://clas.mq.edu.au/phonetics/phonetics/vowelartic/vowel_artic.gif
Source-filter theorySource: J. Hillenbrand
Source-filter theorySource: J. Hillenbrand
Source-filter theorySource: J. Hillenbrand
7
Source-filter theorySource: J. Hillenbrand
/s/ and /sh/ Source-Filter Models Overlaid
Speech terminology…
Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.
Formants: resonances of the vocal tract.
F0
Formant
Frequency
Amplitude
Source properties: Pitch
Fundamental frequency (F0) is determined by the
rate of vocal fold vibration, and is responsible for the
perceived voice pitch.
Source properties: Pitch
F0 can be removed by filtering (as in telephone circuits)
and the pitch remains the same.
This is the problem of the missing fundamental, one
of the oldest problems in hearing science.
Pitch is determined by the frequency pattern of the
harmonics (or their equivalent in the time domain, the
periodicities in the waveform).
Harmonicity and Periodicity
Period: regularly repeating pattern in the
waveform
Period duration, T0 = 6 ms
waveform
Harmonicity and Periodicity
Harmonic: regularly repeating peak in the
amplitude spectrum
0 0.5 1 1.5 2 2.5
-40
-20
0
20
Frequency (kHz)
Am
plit
ud
e (
dB
)
F0 = 1000 / 6 = 166 Hz
F0 = 1 / T0
amplitude
spectrum
8
Harmonics are
integer multiples
of F0 and are
evenly spaced in
frequency
Harmonicity and Periodicity
• Period: regularly repeating pattern in
the waveformPeriod duration T0 = 6 ms
0 0.5 1 1.5 2 2.5
-40
-20
0
20
Frequency (kHz)
Am
plit
ud
e (
dB
)
F0 = 1000 / 6 = 166 Hz
F0 = 1 / T0
Waveform
Amplitude
Spectrum
Frequency (kHz)
Harmonic singing
Harmonic singing (also called overtone
singing) involves changing the shape of the
vocal tract to align the resonance frequencies
(formants) with harmonics of the fundamental.
A low, sustained fundamental is produced,
similar to the drone of a bagpipe, along with
flute-like harmonics that drift in and out.
Harmonic singing
Harmonic singing (also called overtone
singing) involves changing the shape of the
vocal tract to align the resonance frequencies
(formants) with harmonics of the fundamental.
A low, sustained fundamental is produced,
similar to the drone of a bagpipe, along with
flute-like harmonics that drift in and out.
Harmonic singing
Tuvan throat singing
http://www.youtube.com/watch?v=DY1pcEtHI_w&feature=youtu.be
Amazing Grace
http://www.youtube.com/watch?v=mO4Uh-Mini4&feature=youtu.be
Overtone singing
https://www.youtube.com/watch?v=UHTF1-IhuC0#t=21
Effects of F0 changes
Source-filter
independence
Effects of formant frequency changes
Source-filter
independence
9
Voicing irregularities
Shimmer: variation in amplitude from one
cycle to the next.
Jitter: variation in frequency (period duration)
from one cycle to the next.
Voicing irregularities
Breathy voice is associated with a glottal
waveform with a steeper roll-off than modal
voice. As a result there is less energy in the
higher harmonics (steeper slope in the
spectrum).
Vocal tract properties
Resonating tube model– approximation for neutral vowel (schwa), [Ə]
– closed at one end (glottis); open at the other (lips)
– uniform cross-sectional area
– curvature is relatively unimportant
Glottis Lips
length, L
[ Ə ]
Uniform tube model (schwa)
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– Fn is the frequency of formant n in Hz
– c is the velocity of sound (about 35000 cm/sec)
– L is the length of the vocal tract (17.5 for adult male)
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– F1 = (2(1) –1)*35000/(4*17.5) = 500 Hz
– F2 = (2(2) –1)*35000/(4*17.5) = 1500 Hz
– F3 = (2(3) –1)*35000/(4*17.5) = 2500 Hz
10
Acoustic vowel space
i “heed”
ɑ “hod”
u “who’d”
First formant, F1 frequency (Hz)
Se
co
nd
fo
rma
nt,
F2
fre
qu
en
cy (
Hz)
10008006004002000
0
1000
2000
3000
Ə
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– Fn is the frequency of formant n in Hz
– c is the velocity of sound in air (about 35000 cm/sec)
– L is the length of the vocal tract (17.5 for adult male)
L
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– F1 = (2(1) –1)*35000/(4*17.5) =
– F2 = (2(2) –1)*35000/(4*17.5) =
– F3 = (2(3) –1)*35000/(4*17.5) =
L
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– F1 = (2(1) –1)*35000/(4*17.5) = 500 Hz
– F2 = (2(2) –1)*35000/(4*17.5) = 1500 Hz
– F3 = (2(3) –1)*35000/(4*17.5) = 2500 Hz
L
Helium speech
The speed of sound in a helium/oxygen mixture
at 20°C is about 93000 cm/s, compared to
35000 cm/s in air. This increases the resonance
frequencies but has relatively little effect on F0.
In helium speech, the formants are shifted up
but the pitch stays the same.
Helium speech
Exercise: Compute the frequencies of F1, F2
and F3 for a 17.5 cm vocal tract producing the
vowel /ә/ (schwa) in a helium/air mixture
(velocity c ≈ 93000 cm/s)
Fn = ( 2n – 1 ) c / 4 L
F1 = (2(1) –1)*93000/(4*17.5) =
F2 = (2(2) –1)*93000 /(4*17.5) =
F3 = (2(3) –1)*93000 /(4*17.5) =
11
Helium speech
Exercise: Compute the frequencies of F1, F2
and F3 for a 17.5 cm vocal tract producing the
vowel /ә/ (schwa) in a helium/air mixture
(velocity c ≈ 93000 cm/s)
Fn = ( 2n – 1 ) c / 4 L
F1 = (2(1) –1)*93000/(4*17.5) = 1328.6
F2 = (2(2) –1)*93000 /(4*17.5) = 3985.7
F3 = (2(3) –1)*93000 /(4*17.5) = 6642.9
Helium speech
Audio demos
– Speech in air
– Speech in helium
– Pitch in air
– Pitch in helium
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Time (ms)
Fre
quency (
kH
z)
0 100 200 300 400 500 600 700 800 9000
1
2
3
4
Speech in airSpeech in air
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Time (ms)
Fre
quency (
kH
z)
0 100 200 300 400 500 600 700 8000
1
2
3
4
Speech in heliumSpeech in helium
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Sulfur Hexaflouride
Helium
– density of 0.1786 g/L at sea level
Air
– density of 1.225 g/L at sea level
Sulfur Hexaflouride (SF6)
– density of 6.12 g/L at sea level
Speech production with vocal tract filled with SF6
http://www.youtube.com/watch?v=d-XbjFn3aqE
Perturbation Theory
The first formant (F1) frequency is lowered by
a constriction in the front half of the vocal tract
(/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /u/.
delta
F1
glottis lips
12
Perturbation Theory
The second formant (F2) is lowered by a
constriction near the lips or just above the
pharynx; in /u/ both of these regions are
constricted. F2 is raised when the constriction is
behind the lips and teeth, as in the vowel /i/.
delta
F2
glottis lips
Perturbation Theory
The third formant (F3) is lowered by a
constriction at the lips or at the back of the
mouth or in the upper pharynx. This occurs in
/r/ and /r/-colored vowels like American English /ɚ/ (as in “heard”).
delta
F3
glottis lips
Perturbation Theory
F3 is raised when the constriction is behind
the lips and teeth or near the upper pharynx.
delta
F3
glottis lips
Perturbation Theory
All formants tend to drop in frequency when
the vocal tract length is increased or when a
constriction is formed at the lips.
glottis lips
Perturbation Theory
F1 frequency is correlated with jaw
opening (and inversely related to tongue
height ).
0 1 2 3 4-50
-40
-30
-20
-10
0
Frequency (kHz)
Am
plit
ud
e in
dB
amplitude
spectrum
Perturbation Theory
F2 frequency is correlated with tongue
advancement (front-back dimension)
0 1 2 3 4-50
-40
-30
-20
-10
0
Frequency (kHz)
Am
plit
ud
e in
dB
amplitude
spectrum
13
Digital representations of signals
time
am
plit
ude
sampling quantization
Spectral analysis
Amplitude spectrum: sound pressure levels
associated with different frequency
components of a signal Power or intensity
Amplitude or magnitude
Log units and decibels (dB)
Phase spectrum: relative phases associated
with different frequency components Degrees or radians
Spectral analysis of speech
Why perform a frequency analyses of speech?
– Ear+brain carry out a form of frequency analysis
– Relevant features of speech are more readily visible
in the amplitude spectrum than in the raw waveform
Spectral analysis of speech
But: the ear is not a spectrum analyzer.
– Auditory frequency selectivity is best at low
frequencies and gets progressively worse at higher
frequencies.
Measuring formants
0 500 1000 1500 2000 2500 3000 3500 4000 0
20
40
60
Click here to play vowel /i/ Relative amplitude (d
B)
0 500 1000 1500 2000 2500 3000 3500 4000 0
20
40
60
Click here to play vowel /a/ Relative amplitude (d
B)
0 500 1000 1500 2000 2500 3000 3500 4000 0
20
40
60
Click here to play vowel /u/
Frequency (Hz)
Relative amplitude (d
B)
Click on lines to hear individual harmonics
Formant frequency peak
estimation requires an
interpolation process.
amplitude
spectrum
14
0 2 4 6-20
0
20
40
60
Frequency (kHz)
Am
plit
ude (
dB
)
Formant Estimation
F1 F2 F3Vowel spectra have peaks
corresponding to the center frequencies of formants
Formants
Spectrum of natural vowel / /
0 0.5 1 1.5 220
30
40
50
60
Frequency (kHz)
Am
plit
ude (
dB
)
Formant Estimation
F1But: harmonics also
generate spectral peaks;formant frequencies do not necessarily coincide with
harmonic frequencies
harmonics
Children’s speech
Children’s voices have high F0s.
When F0 is 400 Hz (not unusual for 3-year
olds), only 4 harmonics appear in the
frequency range between 0-1600 Hz.
Frequency (kHz)
Am
pli
tud
e
0 1.5
Sparce sampling problem
Vowel identity is dependent on the
frequencies of formant peaks.
Formants are difficult to estimate when
fundamental frequency is high.
Am
pli
tud
e
0 1.5
0 0.5 1 1.5 2 2.5 3 3.50
10
20
30
40
50
60
70
Frequency (kHz)
Am
plit
ude (
dB
)
LPC spectrum
FFT spectrum
LPC spectrumFormants sometimes appear to merge
LPC
spectrum
15
Short-term amplitude spectrum
0 1 2 3 4-10
0
10
20
30
40
50
60
Frequency (kHz)
Am
plit
ud
e (
dB
)
F3 = 2755 Hz
F1 = 281 Hz
F2 = 2196 Hz
Speech spectrogram
running amplitude spectra (codes amplitude
changes in different frequency bands over time).
Speech spectrograms
What is a speech spectrogram?
– Display of amplitude spectrum at successive
instants in time ("running spectra")
– How can 3 dimensions be represented on a two-
dimensional display? Gray-scale spectrogram
Waterfall plots
Animation
Speech spectrograms
Why are speech spectrograms useful?
– Shows dynamic properties of speech
– Incorporates frequency analysis
– Related to speech production
– Helps to visually identify speech cues
i “heed”
ɩ “hid”
e “hayed”
ɛ “head”
æ “had”ʌ “hut”
ɑ “hod”
ɔ “hawed”
o “hoed”
ʊ “hood”
u “who’d”
American English vowel space
Advancement
Height
F1
→
← F2
front center back
high
mid
low
Ə “schwa”
16
“The watchdog”
waveform
spectrogram
F3
F2
F10
5
Fre
qu
en
cy (
kH
z)
F3
F2
F1
0
4
Fre
qu
en
cy (
kH
z)
0
TrackDraw: a graphical speech synthesizer
TrackDraw: a graphical speech synthesizer
Peterson and Barney (1952)
Acoustic measurements (made from spectrograms) of formant frequencies (F1, F2, F3) in vowels spoken by 76 men, women and children.
vowel space: projection of a given talker’s vowels in a F1 x F2 plane
Simple target model: vowels are differentiated (perceptually) by F1 and F2 frequencies measured in the middle of the vowel (vowel target).
17
200 300 400 600 800 1000 200
1000
1500
2000
2500
3000
3500
i
i
eæ
L
a
cu
i
i
ie
æ
L
a
cu
i
i
ie
æ
L
a
c
u
i
F1 frequency (Hz)
Fre
qu
en
cy o
f F
2 (
Hz)
Peterson and Barney (1952)
Men
Women
Children
Peterson and Barney (1952)
i “heed”
ɩ “hid”
e “hayed”
ɛ “head”
æ “had”ʌ “hut”
ɑ “hod”
ɔ “hawed”
o “hoed”
ʊ “hood”
u “who’d”
American English vowel space
Advancement
Height
F1
→
← F2
front center back
high
mid
low
Ə “schwa”
0.2 0.4 0.6 0.8 1.0 0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Fre
quency o
f F
2 (
Hz)
Peterson and Barney (1952)
i
i
i
i
i
ii
i
ii
ii
ii
ii
iii
iii
i i
iiii
ii
ii
ii
i i
ii
i i
iii
i
ii
i i
ii
ii
ii
ii
iiii
ii
i
i i i
ii
ii
ii i
ii
i
ii ii
ii
ii
iii i
ii i i
i
ii
i
i i
ii
ii
ii
ii
i i
i
i
i
i
ii
i i
i
iii
ii
i i
ii ii
ii
i
i
e
eee
ee e
e e
eeeee
e
eee
e
e ee
eee
e
e eee
e
e
ee
e e
ee ee
ee ee
ee
ee
e
ee
eee ee
ee ee e
e e
eee
@
@
@@@
@
@@
@@@ @
@@
@
@@@
@@ @
@@
@@@
@@@
@
@
@@
@@@@
@
@
@
@ @
@@
@
@
@@
@@@
@
@@@@@@
@@
@@
@@
@@
LL
LL
LL
LL
L
L L
LLLLL
LL
LL
LL
LL
LL
LL
LL
L
L
LL
LL
LL LL
L
L
LL
L
L
L
L
L
L
LL
LL
L
L
L LLL
LL
LL
LL
a aa
aa a a a
aa
a
aaa
aa
aaa
aaa
a
a
aaaa
aa
aa
a
aaa
a aaa
aaa
a
aa
a
aaa
a
aa
a
a
a
aa
a
a
a
a
a
a aa
c c
cc
ccc
c
c
c
c
c
c
c
cc
cc
cc
c
c cc
cc
cc
c
c
cc
c
c
cc
ccc
c
c
cc
c
c
c
c
c
c
c
c
c
cc
cc
cc
c c
c
c
c
c
cc
vv
v
v
v
v
vvvv
v
vv vv
v
vvv
v
vv
vv
v
vv
v
vv
vv
v
v
vv
v
v
vv
vv
v
vv
v
v
v
v vvv
vv
v
v
vv
vvv
vvv vv
u
u
uu
u
u
u
u
u
uu
u
uu
u
u
u
u
uu
u
u
u
u
uu
uu
uu
uu
uu
u
u
uu
u
u
u u
u
u
u
u
uu
u
u
u
u
u
u
u
uu
u u
u
uuu
uuu
3
3
3
3
33
33
33
33 3
333
3
3
33
3
333
3333
3
3
33
3 33
333
3
33 3
33
33
3
3
33
33
3
3
3 3
333
3 33
33
33
iii
iii
ii
ii
ii
ii
iiii
ii
iii i
ii
i i
i i
ii
ii
ii
i iii
ii ii
i i
ii
i
iii
ii
ii
ii
iiii
ii
i
ii
iii
ii
ii
iii
ii
iiiii
ii
i
i
ii
iii
i
iiii
iiii
ii
i
i
iii
i
i iee
ee
e
e
ee
ee
e
eee
e e
ee
ee
ee
e
e eeee
ee
e
ee
e
ee
eee e
ee
e
eee
e
e
e ee e
ee
e
e
@@
@@
@@
@@@@
@
@
@@@@ @
@
@@
@
@@@
@
@@ @ @
@
@@
@@
@
@
@@ @
@@
@
@@
@@
@@
@@
@
@
@@
@@
L L
L
LLL
LL
L
L
LLLL
L
L
L
L
LLL
LL
L
LL
LL
LL
LLL
LL
L
L
L
L
L
LL
LL
L
LLL
LL
L
L
LL
L
L
aa
a
aaa
aa
aa aa
a aaaa
aa
aa
aa
a
a
a
aa
aaa
a
a
a aaa
a
a
a
aa
aa
aa
aa
aa
aa
a
a
a
a
c
c
cc
c
c
cc
c
c
cc
c cc
c
c
c c
c
c
c
c c
cc c
c
cc
cc
cc
cc
c
c
c
cc c
c
c
cccc
cc
c
c
cc
cc
vv
vv
vvv
vv v
vvv
v
vv
v
v
vvv
vv
v
vv
v v
v
v
v
vv
v
v
v
vv
vv
vv
v
v
v
v
vv v
v
vvvv
v v
u
u
u
uu
u
uu
uu
u
u
u u
u
u
u
u
u
u
u
u
u
u
u
u
u u
u
u
u
u
uu
u
u
uu
u
u
uu
u
u
uu
uu
u
uuu
u
u
u
u
3
3
33
3
3
33
3
3
3
33
3
3 3
33 3
3
333
3
33
3
33 3
33
33
33
33
3 3
3
3
33
3
3
3
3
33
33 3
3
33
ii
ii
i
ii
i
ii
i
i
ii
i
i
i
ii
i
i
i
iiii
ii
ii
ii
i
i
ii
ii
i i
i
i
i
i iiii
ii
i i
i i i
ii
i
ii
e
e
e
e
ee
ee
e
e
ee
eee
eee
e e
ee
eeee
ee
ee
@
@@
@
@ @
@@
@@
@
@
@@
@
@
@@
@@
@
@@
@ @
@
@@
@@
LLL
L
L
L
LL L
L
LL
LL
L
L
LL
LL
L L
L
L LLL
L L
L
a
a
a
a
a
a
a
aa
aa
a
aa
a
a
aa
a
a
a
a
a
a
a
a
a a
aa
c cc
c
c
c
cc
c
c
c
c
c
c
cc
c
c
c
cc
cc
c
c
c
c
c
cc
vvvv
v
vv
v
v
v v
v
v
v
v
v
vv
v
v v
vv
v
v vv
v
v
v
u
u
u
u
u u
u
u
u
u u
u
u
u
u
u
u
u
u
u
u
u
u
u
u
u
uu
u
u
3
3
33
33
3
3
33
33
3
3
3
333
33
33
3
3
3 3
3333
Peterson and Barney (1952)
Frequency of F1 (kHz)
Fre
qu
ency
of
F2
(k
Hz)
0.2 0.4 0.6 0.8 1.00.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Fre
quency o
f F
2 (
Hz)
Peterson and Barney (1952)
ii
i
i
ii
i
i
ii
ii
ii
ii
iii
iii
i i
iiii
ii
ii
ii
i i
ii
i i
iii
i
ii
i i
ii
ii
ii
ii
iiiiii
ii i i
iii iii i
ii
i
ii ii
ii
ii
iii i
ii i i
i
ii
i
i i
ii
ii
ii
ii
i i
ii
i
i
ii
i i
i
iii
ii
i iii i
i
ii
ii
e
eee
ee e
e e
eeeee
e
eee
ee e
eeee
e
e eee
ee
ee
e e
ee ee
ee ee
ee
ee
ee
e
eee ee
ee ee e
e e
eee
@
@@@@
@
@@
@@@ @
@@
@
@@@
@@ @
@@
@@@
@@@@
@
@@
@@@@@
@
@
@ @@
@
@@
@@
@ @@@
@@@@@@
@@
@@
@@
@@
LL
LL
LL
LL
L
L L
LLLLL
LL
LL
LL
LL
LL
LL
LL
LL
LL
LL
LL LL
L
L
LL
L
L
LL
L
L
LL
LL
L
LLL
LL
LL
LL
LL
a aa
aa a aa
aa
a
aaa
aa
aaaaa
a
a
a
aaaa
aa
aa
aaaa
a aaa
aaa
a
aa
a
aaa
a
aa
a
a
a
aa
a
a
a
aa
a aa
c c
cc
ccc
c
c
c
c
c
c
c
cc
cc
ccc
c cc
cc
ccc
c
cc
c
c
cc
ccc
c
c
cc
c
c
c
c
c
c
c
c
c
cc
cc
cc
c c
c
c
c
c
cc
vv
v
v
v
v
vvvv
vvv vv
v
vvv
v
vv
vv
v
vv
v
vv
vv
v
v
vv
v
v
vv
vv
v
vv
v
v
v
v vvv
vv
v
v
vv
vvv
vvv vv
u
u
uu
u
u
u
u
u
uu
u
uu
u
u
u
u
uu
u
u
u
u
uu
uu
uu
uu
uu
u
u
uu
u
u
u u
uu
u
u
uu
u
u
u
u
u
u
u
uu
u u
u
uuu
uuu
3
3
3
3
33
333
3
33 3333
3
3
33
3
333
3333
33
33
3 33
333
3
33 3
33
333
3
33
33
3
3
3 3
333
3 33
33
33
iii ii
i
ii
ii
ii
ii
iiii
ii
iii ii
ii ii i
ii
ii
iii iii
ii ii
i i
ii
iiii
iiii
ii
iiii
ii
ii
ii
ii
ii
ii
iii
ii
iiiii
ii
i
i
ii
iii
iii
iiii
ii
ii
ii
iii
i
i iee
ee
e
e
ee
ee
e
eee
e e
ee
ee
ee
e
e eeee
ee
e
ee
e
ee
eee e
ee
e
eee
ee
e eee
ee
e
e
@@
@@
@@
@@@@
@
@
@@@@ @@
@@@
@@@
@
@@ @ @
@
@@
@@
@
@
@@ @@
@
@
@@
@@
@@
@@
@
@
@@
@@
L L
L
LLL
LL
L
L
LLLL
LL
L
L
LLL
LL
L
LL
LL
LL
LLL
LL
L
L
L
L
L
LL
LL
L
LLL
LL
L
L
LL
L
L
aa
a
aaa
aa
aa aa
a aaaa
aa
aa
aa
a
a
a
aa
aaa
a
a
a aaa
a
aa
aa
aa
aa
aa
aa
aa
a
a
a
a
c
c
cc c
c
cc
c
c
cc
c cc
c
c
c c
c
c
c
c c
cc c
c
cc
cc
cc
cc
c
c
c
cc c
c
c
cccc
cc
c
c
cc
cc
vv
vv
vvv
vv v
vvv
v
vv
v
v
vvv
vv
v
vv
v v
v
v
v
vv
v
v
v
vv
vv
vv
v
v
v
v
vv v
v
vvvv
v v
u
u
u
uu
u
uu
uu
u
u
u u
uu
u
u
u
u
u
u
u
u
u
u
u u
u
u
u
u
uu
u
u
uu
u
u
uu
u
u
uu
uu
u
uuu
u
u
u
u
33
33
3
3
33
3
3
3
33
3
3 3
33 3
3
333
3
33
3
33 3
33
33
33
33
3 3
3
3
33
3
3
3
333
33 3
33
3
ii
ii
i
ii
i
ii
i
i
ii
i
i
i
ii
i
i
i
iiii
ii
ii
ii
i
i
ii
ii
i i
i
i
ii ii
ii
ii
i i
i i i
ii
i
ii
e
e
e
e
ee
ee
ee
ee
eee
eee
e e
ee
eeee
ee
ee
@
@@
@
@ @
@@
@@
@
@
@@
@
@
@@
@@
@
@@
@ @@
@@
@@
LLL
L
L
L
LL L
L
LL
LL
L
L
LL
LL
L L
L
L LLL
L L
L
a
a
a
a
a
a
a
aa
aa
a
aa
a
a
aa
a
a
a
a
a
a
aa
a a
aa
c ccc
c
c
cc
c
c
c
c
c
c
cc
c
c
c
cc
cc
c
c
c
c
c
cc
vvvv
v
vv
v
v
v v
v
v
v
v
v
vv
v
v v
vv
v
v vv
v
v
v
u
u
u
u
u u
u
u
u
u u
u
u
u
u
u
u
u
u
u
u
u
u
u
u
u
uu
u
u
3
3
33
33
3
3
33
33
3
3
3
33 3
3 3
333
3
33
3333
Invariance problem
Frequency of F1 (kHz)
Fre
qu
ency
of
F2
(k
Hz)
Invariance problem
Dynamic cues in vowel perception
Talker normalization theories
– Potter and Steinberg (1950): invariant pattern
of stimulation shifted up or down along the
basilar membrane
– Miller (1989): Formant ratio theory
– Joos (1948): Frame of reference theory
– Nearey (1989): Extrinsic and intrinsic factors
Formant Dynamics
Formant frequency changes over time:
0 1 2 3 4 5 0
20
40
60
Frequency (kHz)
Am
plit
ude (
dB
)
F1
F2
F3
F4 F5
18
Vowel-inherent spectral change Dual-target model
F1
F2
F1
F2
Males
Females
““ formant tracks
5 6 7 8 9 10 11 12 13 14 15 16 17 18
1200
1400
1600
1800
Age (years)
Geo. M
ean form
ant fr
equency (
Hz)
Boys
Girls
FFs as a function of age and sex
Vowel formant space: F1 x F2Current study: Adults
300 400 500 600 800 1000
1000
1500
2000
2500
3000
3500
U o ɑ
æI
i
Uo
ɑ
ʌ
æɛ
I
i
Males
Females
F1 Frequency (Hz)
F2 F
requency (
Hz)
ɛ
ʌ
Vowel formant space: F1 x F2Current study: Children (all age groups)
300 400 500 600 800 1000
1000
1500
2000
2500
3000
3500
Ã
I
ʌBoys
Girls
F1 Frequency (Hz)
F2 F
req
uen
cy
(H
z)
ii
II ɛ
ɛ ææU
U oo ɑ
ɑ
19
Graphical interpretation of CLIH
(sliding template) model
Movement along
diagonal for
different
speakers
Fixed pattern
of ‘holes’ in the
template
correspond to
stored vowel
reference
pattern
Nearey & Assmann,
2006
5 6 7 8 9 10 11 12 13 14 15 16 17 18
100
150
200
300
Age (years)
Fundam
enta
l fr
equency (
Hz)
Boys
Girls
F0 as a function of age and sex
F0 distribution – child talkers
50 100 150 200 250 300 350 4000
200
400
600
800
1000
Fundamental Frequency (Hz)
Nu
mb
er
of
toke
ns
F0 distribution – males (blue)
50 100 150 200 250 300 350 4000
200
400
600
800
1000
Fundamental Frequency (Hz)
Nu
mb
er
of
toke
ns
20
F0 distribution – females (red)
50 100 150 200 250 300 350 4000
200
400
600
800
1000
Fundamental Frequency (Hz)
Nu
mb
er
of
toke
ns
Wavesurfer
Download Wavesurfer:www.speech.kth.se/wavesurfer
Wavesurfer User Manualwww.speech.kth.se/wavesurfer/man.html