Speech Sounds of American English - MIT

• There are over 40 speech sounds in American English which can be organized by their basic manner of production

Manner Class   Number
Vowels         18
Fricatives      8
Stops           6
Nasals          3
Semivowels      4
Affricates      2
Aspirant        1

• Vowels, glides, and consonants differ in degree of constriction

• Sonorant consonants have no pressure build up at constriction

• Nasal consonants lower the velum allowing airflow in nasal cavity

• Continuant consonants do not block airflow in oral cavity
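As a quick check on the inventory above, the manner classes can be tallied in a few lines of Python. This is only a sketch: the name `MANNER_CLASS_COUNTS` is an illustrative assumption, and the counts are the ones given in the table.

```python
# Counts per manner class, copied from the manner-class table.
# The variable name is illustrative only; it does not come from the lecture.
MANNER_CLASS_COUNTS = {
    "vowels": 18,
    "fricatives": 8,
    "stops": 6,
    "nasals": 3,
    "semivowels": 4,
    "affricates": 2,
    "aspirant": 1,
}

# Summing the classes gives the size of the whole inventory.
total_sounds = sum(MANNER_CLASS_COUNTS.values())
print(total_sounds)  # 42, consistent with "over 40 speech sounds"
```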

6.345 Automatic Speech Recognition Speech Sounds 1

Lecture # 3-4 Session 2003

Vowel Production

• No significant constriction in the vocal tract

• Usually produced with periodic excitation

• Acoustic characteristics depend on the position of the jaw, tongue, and lips

[Figure panels: [i] [æ] [a] [u]]

Vowels of American English

• There are approximately 18 vowels in American English made up of monophthongs, diphthongs, and reduced vowels (schwas)

/iʸ/  iy  beat      /ɔ/   ao  bought    /aʸ/  ay   bite
/ɪ/   ih  bit       /ʌ/   ah  but       /ɔʸ/  oy   Boyd
/eʸ/  ey  bait      /oʷ/  ow  boat      /aʷ/  aw   bout
/ɛ/   eh  bet       /ʊ/   uh  book      [ə]   ax   about
/æ/   ae  bat       /u/   uw  boot      [ɨ]   ix   roses
/a/   aa  Bob       /ɝ/   er  Bert      [ɚ]   axr  butter

• They are often described by the articulatory features: High/Low, Front/Back, Retroflexed, Rounded, and Tense/Lax
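The two-letter labels in the vowel table (iy, ih, ey, and so on) are convenient as dictionary keys. The sketch below stores the table as a Python mapping; the names `VOWEL_EXAMPLES`, `REDUCED_VOWELS`, and `is_reduced` are assumptions made for illustration, and the reduced set is simply the three bracketed schwa entries.

```python
# Label -> example word, copied from the vowel table.
VOWEL_EXAMPLES = {
    "iy": "beat", "ih": "bit", "ey": "bait", "eh": "bet", "ae": "bat", "aa": "Bob",
    "ao": "bought", "ah": "but", "ow": "boat", "uh": "book", "uw": "boot", "er": "Bert",
    "ay": "bite", "oy": "Boyd", "aw": "bout",
    "ax": "about", "ix": "roses", "axr": "butter",
}

# The three reduced vowels (schwas) listed in brackets in the table.
REDUCED_VOWELS = {"ax", "ix", "axr"}


def is_reduced(label: str) -> bool:
    """Return True if the label names one of the reduced vowels."""
    return label in REDUCED_VOWELS


print(len(VOWEL_EXAMPLES))  # 18 vowels in total
```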


Spectrograms of the Cardinal Vowels

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for beet /biʸt/, bat /bæt/, bott /bat/, and boot /but/]

Vowel Formant Averages

• Vowels are often characterized by the lower three formants

• High/Low is correlated with the first formant, F1

• Front/Back is correlated with the second formant, F2

• Retroflexion is marked by a low third formant, F3
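The correlations above can be turned into a rough labeling rule. The sketch below is illustrative only: the function name `describe_vowel` and all three threshold values are assumptions, not figures from the lecture, and real systems would normalize for speaker differences. It relies on the usual direction of the correlations: a higher F1 goes with a lower vowel, and a higher F2 goes with a more fronted vowel.

```python
def describe_vowel(f1_hz, f2_hz, f3_hz,
                   f1_boundary_hz=500.0,
                   f2_boundary_hz=1500.0,
                   f3_retroflex_hz=2000.0):
    """Map three formant frequencies to coarse articulatory labels.

    All thresholds are illustrative assumptions, not measured values.
    """
    return {
        # Higher F1 corresponds to a lower vowel.
        "height": "low" if f1_hz >= f1_boundary_hz else "high",
        # Higher F2 corresponds to a more fronted vowel.
        "backness": "front" if f2_hz >= f2_boundary_hz else "back",
        # A low F3 is the cue for retroflexion.
        "retroflex": f3_hz < f3_retroflex_hz,
    }


# Made-up formant values, chosen only to exercise each branch.
print(describe_vowel(300, 2300, 3000))
print(describe_vowel(700, 1100, 2500))
```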

[Figure: average frequency (Hz, axis 0 to 3500) of F1, F2, and F3 for each vowel, plotted separately for female speakers and male speakers]

Vowel Durations

• Each vowel has a different intrinsic duration

• Schwas have distinctly shorter durations (50 ms)

• /ɪ, ɛ, ʌ, ʊ/ are the shortest monophthongs

• Context can greatly influence vowel duration

[Figure: average duration (ms, axis 0 to 250) of each vowel, plotted separately for female speakers and male speakers]

Rob's Happy Little Vowel Chart

"So inaccurate, yet so useful."

SCHWAS:
Plain      [ə]  About    [əbaʷt]
Front      [ɨ]  Roses    [roʷzɨz]
Retroflex  [ɚ]  Forever  [fɚɛvɚ]

[Chart: vowels arranged by height (HIGH, MID, LOW; F1 increases toward LOW) and by frontness (BACK to FRONT; F2 increases toward FRONT). Vowels shown: i, ɪ, e, ɛ, æ, ʌ, ə, a, o, ɔ, ʊ, u]

TENSE = Towards Edges, tends to be longer

LAX = Towards Center, tends to be shorter

Think F3 is mighty low? Your pal ɝ is the way to go!

Fricative Production

• Turbulence produced at narrow constriction

• Constriction position determines acoustic characteristics

• Can be produced with periodic excitation

[Figure panels: [f] [θ] [s] [ʃ]]

Fricatives of American English

• There are 8 fricatives in American English

• Four places of articulation: Labio-Dental (Labial), Interdental (Dental), Alveolar, and Palato-Alveolar (Palatal)

• They are often described by the features Voiced/Unvoiced, or Strident/Non-Strident (constriction behind alveolar ridge)

Type       Unvoiced           Voiced
Labial     /f/  f   fee       /v/  v   v
Dental     /θ/  th  thief     /ð/  dh  thee
Alveolar   /s/  s   see       /z/  z   z
Palatal    /ʃ/  sh  she       /ʒ/  zh  Gigi
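The fricative table is small enough to encode directly. The sketch below is one possible representation; the `Fricative` type, the field names, and the helper `is_strident` are assumptions for illustration. The strident set follows the usual grouping in which the alveolar and palatal fricatives are strident and the labial and dental ones are not.

```python
from typing import NamedTuple


class Fricative(NamedTuple):
    label: str   # two-letter style label from the table
    place: str   # place of articulation
    voiced: bool


# One entry per cell of the fricative table.
FRICATIVES = [
    Fricative("f", "labial", False), Fricative("v", "labial", True),
    Fricative("th", "dental", False), Fricative("dh", "dental", True),
    Fricative("s", "alveolar", False), Fricative("z", "alveolar", True),
    Fricative("sh", "palatal", False), Fricative("zh", "palatal", True),
]

# Assumption: stridency is decided by place alone.
STRIDENT_PLACES = {"alveolar", "palatal"}


def is_strident(fricative: Fricative) -> bool:
    """Return True for the alveolar and palatal fricatives."""
    return fricative.place in STRIDENT_PLACES


strident_labels = [f.label for f in FRICATIVES if is_strident(f)]
print(strident_labels)
```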


Spectrograms of Unvoiced Fricatives

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for fee /fiʸ/, thief /θiʸf/, see /siʸ/, and she /ʃiʸ/]

Fricative Energy

[Figure: probability density (unadjusted for frequency) of average total energy, axis from -100 to -40, for non-strident and strident fricatives]

Strident fricatives tend to be stronger than non-strident fricatives.

Fricative Durations

[Figure: probability density (unadjusted for frequency) of duration, axis from 0.0 to 0.30, for unvoiced and voiced fricatives]

Voiced fricatives tend to be shorter than unvoiced fricatives.

Examples of Fricative Voicing Contrast

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for sue /su/, zoo /zu/, face /feʸs/, and phase /feʸz/]

Rob's Friendly Little Consonant Chart

"Somewhat more accurate, yet somewhat less useful."

Place of articulation across, manner of articulation down. Within each pair: unvoiced, then voiced.

Manner      Labial   Dental   Alveolar   Palatal   Velar
Nasal       m                 n                    ŋ
Fricative   f v      θ ð      s z        ʃ ʒ
Stop        p b               t d                  k g

Fricatives: f, v, θ, ð are Weak (Non-strident); s, z, ʃ, ʒ are Strong (Strident)

The Semi-vowels:

y is like an extreme i

w is like an extreme u

l is like an extreme o

r is like an extreme ɝ

The Affricates:

č is like t+ʃ

ǰ is like d+ʒ

The Odds and Ends:

h (unvoiced h)

ɦ (voiced h)

ɾ (flap), ɾ̃ (nasalized flap)

ʔ (glottal stop)

What is this word?

[Figure: wide-band spectrogram (0 to 8 kHz) of an unidentified word, with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels; time axis 0.0 to 1.1 seconds]

Stop Production

• Complete closure in the vocal tract, pressure build up

• Sudden release of the constriction, turbulence noise

• Can have periodic excitation during closure

[Figure panels: [b] [d] [g]]

Stops of American English

• There are 6 stop consonants in American English

• Three places of articulation: Labial, Alveolar, and Velar

• Each place of articulation has a voiced and unvoiced stop

Type       Voiced             Unvoiced
Labial     /b/  b  bought     /p/  p  pot
Alveolar   /d/  d  dot        /t/  t  tot
Velar      /g/  g  got        /k/  k  cot

• Unvoiced stops are typically aspirated

• Voiced stops usually exhibit a "voice bar" during closure

• Information about formant transitions and the release is useful for classification

Spectrograms of Unvoiced Stops

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for poop /pup/, toot /tut/, and kook /kuk/]

Examples of Stop Voicing Contrast

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for pop /pap/ and bob /bab/]

Singleton Stop Durations

[Figure: VOT duration (ms, axis 0 to 80) for the singleton stops b, d, g, p, t, k]

Voice onset times (VOTs) are longer for unvoiced stops.
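Because VOT separates the two stop classes, a single threshold already gives a crude voicing guess. The sketch below is a toy rule, not a method from the lecture: the function name `guess_stop_voicing` and the 30 ms default boundary are assumptions chosen for illustration, and a practical classifier would combine VOT with other cues.

```python
def guess_stop_voicing(vot_ms: float, boundary_ms: float = 30.0) -> str:
    """Guess stop voicing from voice onset time alone.

    boundary_ms is an illustrative assumption, not a measured value.
    """
    # Longer VOT points to an unvoiced stop, shorter VOT to a voiced one.
    return "unvoiced" if vot_ms >= boundary_ms else "voiced"


# Made-up VOT values in milliseconds.
print(guess_stop_voicing(60.0))
print(guess_stop_voicing(10.0))
```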


Voicing Cues for Stops

There are many voicing cues for a stop.

/s/-Stop Durations

[Figure: VOT duration (ms, axis 0 to 80) for p, t, k in /s/-stop sequences]

Unvoiced stops are unaspirated in /s/-stop sequences.

Examples of Front and Back Velars

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for keep /kiʸp/ and cot /kɔt/]

What is this word?

[Figure: wide-band spectrogram (0 to 8 kHz) of an unidentified word, with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels; time axis 0.0 to 0.8 seconds]

Nasal Production

• Velum lowering results in airflow through nasal cavity

• Consonants produced with closure in oral cavity

• Nasal murmurs have similar spectral characteristics

[Figure panels: [m] [n] [ŋ]]

Nasals of American English

• Three places of articulation: Labial, Alveolar, and Velar

Type       Nasal
Labial     /m/  m   me
Alveolar   /n/  n   knee
Velar      /ŋ/  ng  sing

• Nasal consonants are always attached to a vowel, though they can form an entire syllable in unstressed environments ([n̩], [m̩], [ŋ̩])

• /ŋ/ is always post-vocalic in English

• Place identified by neighboring formant transitions

Spectrograms of Nasals

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for simmer /sɪmɝ/, sinner /sɪnɝ/, and singer /sɪŋɝ/]

What is this word?

[Figure: wide-band spectrogram (0 to 8 kHz) of an unidentified word, with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels; time axis 0.0 to 0.8 seconds]

Semivowel Production

• Constriction in vocal tract, no turbulence

• Slower articulatory motion than other consonants

• Laterals form complete closure with tongue tip, airflow via sides of constriction

[Figure panels: [w] [y] [r] [l]]

Semivowels of American English

• There are 4 semivowels in American English

• Sometimes referred to as Liquids or Glides

Type      Semivowel         Nearest Vowel
Glides    /w/  w  wet       /u/
          /y/  y  yet       /i/
Liquids   /r/  r  red       /ɝ/
          /l/  l  let       /o/

• Glides are a more extreme articulation of a corresponding vowel

– Similar, though more extreme, formant positions

– Generally weaker due to narrower constriction

• Semivowels are always attached to a vowel, though /l/ can form an entire syllable in unstressed environments ([l̩])

Spectrograms of Semivowels

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for we /wiʸ/, ye /yiʸ/, reed /riʸd/, and lee /liʸ/]

Acoustic Properties of Semivowels

• /w/ and /l/ are the most confusable semivowels

• /w/ is characterized by a very low F1 and F2

– Typically a rapid spectral falloff above F2

• /l/ is characterized by a low F1 and F2

– Often presence of high frequency energy

– Postvocalic /l/ characterized by minimal spectral discontinuity, gradual motion of formants

• /y/ is characterized by very low F1, very high F2

– /y/ only occurs in a syllable onset position (i.e., pre-vocalic)

• /r/ is characterized by a very low F3

– Prevocalic F3 < medial F3 < postvocalic F3


What is this word?

[Figure: wide-band spectrogram (0 to 8 kHz) of an unidentified word, with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels; time axis 0.0 to 1.3 seconds]

Affricate Production

• There are two affricates in American English:

Voiced              Unvoiced
/ǰ/  jh  judge      /č/  ch  church

• Alveolar-stop palatal-fricative pairs

• Sudden release of the constriction, turbulence noise

• Can have periodic excitation during closure


Aspirant Production

• There is one aspirant in American English: /h/ (e.g., "hat")

• Produced by generating turbulence excitation at glottis

• No constriction in the vocal tract, normal formant excitation

• Sub-glottal coupling results in little energy in F1 region

• Periodic excitation can be present in medial position


Spectrograms of Affricates and Aspirant

[Figure: wide-band spectrograms (0 to 8 kHz) with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels, plotted against time in seconds, for each /iʸč/ and huge /hyuǰ/]

What is this word?

[Figure: wide-band spectrogram (0 to 8 kHz) of an unidentified word, with zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform panels; time axis 0.0 to 0.8 seconds]

Phonotactic Constraints

• Phonotactics is the study of allowable sound sequences

• Analyses of word-initial and -final clusters reveal:

– 73 distinct initial clusters (about 10 "foreign" clusters)

– 208 distinct final clusters

• Can be used to eliminate impossible phoneme sequences:

– /tk/ can’t end a word, and

– /kt/ can’t begin a word,

– Therefore, */. . . t k t . . ./ is an impossible sequence
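The elimination argument above can be sketched as a search over possible word boundaries: a consonant run is possible only if some cut leaves a legal word-final cluster on the left and a legal word-initial cluster on the right. The function name and the two tiny cluster inventories below are assumptions made for illustration; the real inventories would be the 73 initial and 208 final clusters.

```python
def can_split_at_word_boundary(consonants, legal_final, legal_initial):
    """Return True if some cut gives a legal final cluster followed by a
    legal initial cluster (either side of the cut may be empty)."""
    run = tuple(consonants)
    for cut in range(len(run) + 1):
        final_part, initial_part = run[:cut], run[cut:]
        final_ok = not final_part or final_part in legal_final
        initial_ok = not initial_part or initial_part in legal_initial
        if final_ok and initial_ok:
            return True
    return False


# Toy inventories, for illustration only.
# /kt/ may end a word (as in "act") but may not begin one; /tk/ may do neither.
LEGAL_FINAL = {("t",), ("k",), ("k", "t")}
LEGAL_INITIAL = {("t",), ("k",)}

print(can_split_at_word_boundary(["t", "k", "t"], LEGAL_FINAL, LEGAL_INITIAL))
print(can_split_at_word_boundary(["k", "t", "k"], LEGAL_FINAL, LEGAL_INITIAL))
```

With these toy sets, the run t-k-t has no legal cut, while k-t-k can be cut after /kt/.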


Word-Initial Consonants from MWP Dictionary

-    of        hy   human     sf   sphere     tr  true
b    be        ǰ    just      sk   school     ts  tsunami
bl   black     k    can       skl  sclerosis  tw  twenty
br   bring     kl   class     skr  screen     ty  Tuesday
by   beauty    kr   cross     skw  square     θ   thief
č    child     kw   quite     sky  skewer     θr  through
d    do        ky   curious   sl   slow       θw  thwart
dr   drive     l    like      sm   small      ð   the
dw   dwell     m    more      sn   snake      v   very
f    for       mw   moire     sp   special    vw  voyager
fl   floor     my   music     spl  split      vy  view
fr   from      n    not       spr  spring     w   was
fy   few       p    people    spy  spurious   y   you
g    good      pl   place     st   state      z   zero
gl   glass     pr   price     str  street     zl  zloty
gr   great     pw   pueblo    sw   sweet      zw  zwieback
gw   guava     py   pure      ʃ    she        ʒ   genre
h    he        r    right     ʃr   shrewd
hw   which     s    so        t    to

The Syllable

• Syllable structure captures many useful generalizations

– Phoneme realization often depends on syllabification

– Many phonological rules depend on syllable structure

• Syllable structure is predicated on the notion of ranking the speech sounds in terms of their sonority values

Sounds                Sonority Value   Examples
Low Vowels            10               /a, ɔ/
Mid Vowels             9               /e, o/
High Vowels            8               /i, u/
Flaps                  7               /r/
Laterals               6               /l/
Nasals                 5               /m, n, ŋ/
Voiced Fricatives      4               /v, ð, z/
Unvoiced Fricatives    3               /f, θ, s/
Voiced Stops           2               /b, d, g/
Unvoiced Stops         1               /p, t, k/

Syllables and Sonority

• Utterances can be divided into syllables

• The number of syllables equals the number of sonority peaks

• Within any syllable, there is a segment constituting a sonority peak that is preceded and/or followed by a sequence of segments with progressively decreasing sonority values

suprasegmental
s  u  p  r  ʌ  s  ɛ  g  m  ɛ  n  t  ə  l
3  8  1  7  9  3  9  2  5  9  5  1  9  6

minimization
m  ɪ  n  ɪ  m  aʸ  z  e  ʃ  ə  n
5  8  5  8  5  10  4  9  3  9  5

fire
f  aʸ      ɚ
3  10 (8)  9
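The peak-counting rule can be checked against the three examples with a short Python sketch. The function name `count_sonority_peaks` is an assumption for illustration, the sonority sequences are the ones listed for each word, and for "fire" the parenthesized value 8 is treated here as its own step, which is one possible reading.

```python
def count_sonority_peaks(sonority):
    """Count segments that are more sonorous than each immediate neighbor."""
    peaks = 0
    last = len(sonority) - 1
    for i, value in enumerate(sonority):
        # Edge segments only need to beat the single neighbor they have.
        left_ok = i == 0 or sonority[i - 1] < value
        right_ok = i == last or sonority[i + 1] < value
        if left_ok and right_ok:
            peaks += 1
    return peaks


# Sonority sequences copied from the examples.
suprasegmental = [3, 8, 1, 7, 9, 3, 9, 2, 5, 9, 5, 1, 9, 6]
minimization = [5, 8, 5, 8, 5, 10, 4, 9, 3, 9, 5]
fire = [3, 10, 8, 9]  # assumption: the parenthesized 8 counts as a dip

print(count_sonority_peaks(suprasegmental))  # 5
print(count_sonority_peaks(minimization))    # 5
print(count_sonority_peaks(fire))            # 2
```

Under this reading, "fire" comes out with two peaks, which matches its pronunciation as two syllables when the final retroflex schwa forms its own nucleus.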


The Syllable Template

[Figure: syllable template tree with nodes Onset, Rhyme, Nucleus, Coda, and Affix]

• Branches marked by ◦ are optional

• Nucleus must contain a non-obstruent

• Sonority decreases away from nucleus

• Affix contains only coronals: /s, z, t, d, T, D, C, J/

• Only the last syllable in a word can have an affix

• /sp/, /st/, and /sk/ are treated as single obstruents

Some Examples

Template slots: Outer Onset, Inner Onset, Nucleus, Inner Coda, Outer Coda, Affix 1, Affix 2, Affix 3

crown     k r a w n
fledged   f l ɛ ǰ d
links     l ɪ ŋ k s
dwarves   d w a r v z
stick     st ɪ k
sixths    s ɪ k s θ s

Words Containing /r/ and /l/

Template slots: Outer Onset, Inner Onset, Nucleus, Inner Coda, Outer Coda, Affix 1, Affix 2, Affix 3

rock      r a k
crock     k r a k
curt      k ɝ t
cart      k a r t
car       k a r
lick      l ɪ k
bottle    b a t l̩
kill      k ɪ l

Acoustic Realizations of /r/

[Figure: examples of rock /rak/, curt /kɝt/, and car /kar/]

Acoustic Realizations of /l/

[Figure: examples of lick /lɪk/, bottle /batl̩/, and kill /kɪl/]

Allophonic Variations at Syllable Boundaries

[Figure: examples of nitrate /naʸ treʸt/ and night rate /naʸt reʸt/]

Assignment 2
