Geoff Plant, OAM MED-EL UKbeta.batod.org.uk/content/batod/conferences/conf13/acoustic... · Second...


Acoustic Phonetics

Geoff Plant, OAM

MED-EL UK


Background Knowledge

Believe that therapists benefit from a sound knowledge of:
Articulatory phonetics: how speech is produced
Acoustic phonetics: the physical characteristics of speech

Segmental / Suprasegmental

Segmental: The sounds of speech (phonemes, vowels and consonants), the building blocks

Suprasegmental: The prosodic aspects of speech (rate, pitch, duration, intensity)


Spectrograms = a visual representation of speech

Horizontal axis = time in seconds
Vertical axis = frequency in kHz
Shading = intensity of signal

Can also present information on voice pitch (red line) and the energy waveform.

Pitch is multiplied by 10 to make it easily visible on the spectrogram.
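As a rough illustration of where the three spectrogram dimensions come from, the sketch below cuts a signal into short frames (time), computes a magnitude spectrum for each frame with a plain discrete Fourier transform (frequency), and keeps the magnitudes (intensity). The sample rate, frame length, hop size, and the 1 kHz test tone are assumptions chosen for the example, and the function names are hypothetical; analysis software uses faster and more refined methods.

```python
import cmath
import math


def frame_spectrum(frame, sample_rate):
    """Return (frequencies_hz, magnitudes) for one frame using a naive DFT."""
    n = len(frame)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectrogram(signal, sample_rate, frame_len, hop):
    """Return a list of (time_s, frequencies_hz, magnitudes), one per frame."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        freqs, mags = frame_spectrum(signal[start:start + frame_len], sample_rate)
        rows.append((start / sample_rate, freqs, mags))
    return rows


# Assumed test signal: 0.1 s of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
frames = spectrogram(tone, SAMPLE_RATE, frame_len=160, hop=80)

for time_s, freqs, mags in frames:
    peak_hz = freqs[mags.index(max(mags))]
    print(f"t = {time_s:.3f} s, strongest component at {peak_hz:.0f} Hz")
```

Plotting time against frequency, with the magnitudes shown as shading, gives the spectrogram display described above.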

Shown are four different productions of the

Vocal Folds

Source of voiced speech, produces a vibrating air stream

Vocal folds abducted: to breathe
Vocal folds adducted: to speak

Vocal Fold Vibration

http://www.voiceproblem.org/anatomy/understanding.asp


Average F0: Female Speakers

Age (years)           Duffy (1970) SF0    Hollien & Paul (1967) SF0
11                    266 Hz              -
13 (pre-menarche)     260 Hz              -
13 (post-menarche)    245 Hz              -
15                    237 Hz              216 Hz
16                    -                   214 Hz
17                    -                   211.5 Hz

De Pinto & Hollien (1981)

Study of the same 11 Australian women speakers recorded in 1945 (aged 18-25), and around 35 years later (aged 52-60).

              1945      1980
Mean SF0      228 Hz    180 Hz

Why has this change occurred?
May have been excited about being recorded in 1945, but a good possibility that SF0 has declined in past 50 years
Also need to consider changes in vocal folds (change in mass & loss of elasticity) following menopause
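To put the change from 228 Hz to 180 Hz in perspective, the short calculation below expresses it in hertz, as a percentage, and in semitones. The semitone scale is an assumption added here for illustration (it is a common way to compare pitch differences, not something taken from the study), and the function name is hypothetical.

```python
import math


def semitone_difference(f_start_hz, f_end_hz):
    """Pitch interval from f_start_hz to f_end_hz in semitones (negative = lower)."""
    return 12 * math.log2(f_end_hz / f_start_hz)


mean_sf0_1945 = 228.0  # Hz, from the slide
mean_sf0_1980 = 180.0  # Hz, from the slide

drop_hz = mean_sf0_1945 - mean_sf0_1980
drop_percent = 100 * drop_hz / mean_sf0_1945
drop_semitones = semitone_difference(mean_sf0_1945, mean_sf0_1980)

print(f"Drop: {drop_hz:.0f} Hz ({drop_percent:.1f}%), {drop_semitones:.2f} semitones")
```

The drop works out to 48 Hz, roughly 21% or about four semitones.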

Average F0: Male Speakers

Adolescent Boys (Hollien & Hollien, 1971)

Groups: US (North South), US (Middle), Europe (Poland)

Age (years)    SF0
12             248 Hz, 269 Hz
13             229 Hz, 246 Hz
14             185 Hz/158 Hz, 193 Hz, 218 Hz
15             160 Hz, 169 Hz
16             145 Hz
18             128 Hz

Emphatic Stress

Available cues include:
FREQUENCY: Pitch (F0) rise on stressed word
DURATION: Stressed word is longer
INTENSITY: Stressed word is louder

John ran down the road in Denver
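One way to see how the three cues could combine is sketched below: each word gets a score for F0, duration, and intensity relative to the rest of the sentence, and the word with the highest combined score is taken as the stressed one. The per-word measurements are invented purely for illustration (they are not data from the talk), equal weighting of the cues is an assumption, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values so each cue contributes on a comparable scale."""
    m = mean(values)
    s = pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def most_stressed(words):
    """Return the text of the word with the highest summed cue score."""
    cues = ("f0_hz", "duration_ms", "intensity_db")
    totals = [0.0] * len(words)
    for cue in cues:
        for i, z in enumerate(z_scores([w[cue] for w in words])):
            totals[i] += z
    best = max(range(len(words)), key=totals.__getitem__)
    return words[best]["text"]


# Made-up measurements for "John RAN down the road in Denver".
example = [
    {"text": "John", "f0_hz": 120, "duration_ms": 250, "intensity_db": 68},
    {"text": "ran", "f0_hz": 165, "duration_ms": 380, "intensity_db": 74},
    {"text": "down", "f0_hz": 118, "duration_ms": 260, "intensity_db": 67},
    {"text": "the", "f0_hz": 112, "duration_ms": 90, "intensity_db": 62},
    {"text": "road", "f0_hz": 115, "duration_ms": 300, "intensity_db": 66},
    {"text": "in", "f0_hz": 108, "duration_ms": 110, "intensity_db": 61},
    {"text": "Denver", "f0_hz": 110, "duration_ms": 360, "intensity_db": 65},
]

print(most_stressed(example))
```

Words differ in length and loudness even without stress, so a practical version would compare each word against its own unstressed baseline rather than against its neighbours.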


Source

Filter

Source/Filter

Long-term-average-spectrum

frequency axis, and provides a reproducible representation of overall spectral voice characteristics

Cleveland, Sundberg, & Stone, 2000

Vowels

Crystal, 1995

Vowels

The number of vowels varies from language to language; some have as few as three
English has around 20 vowels and diphthongs (varies by dialect)
Swedish has 18 vowels: 9 long and 9 short

/ba/ /bu/ /bi/

Vowels are produced by moving the body of the tongue up/down, forwards/backwards
Changes in lip shape: spread/neutral/rounded
Changes in tongue/lips lead to acoustic changes
Most energy is in the low frequencies

Formants: energy peaks related to tongue position and lip shape, vocal tract resonances

First two formants (F1, F2) are usually sufficient to allow identification of a vowel

First Formant (F1)

heed head had hard

First formant (F1) is related to tongue height: is the tongue high, mid, or low?

High tongue = LOW F1
Low tongue = HIGH F1

[Figure: British English F1 (Wells, 1962), vowels arranged from HIGH to LOW tongue position]

Second Formant (F2)

heed hard horde

Second formant (F2) is related to tongue place: is it front, mid, or back?

Front tongue = HIGH F2
Back tongue = LOW F2
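The two rules of thumb (high tongue = low F1, front tongue = high F2) can be turned into a small lookup, sketched below. The boundary frequencies and the example formant values are round numbers assumed for illustration only; they are not the Wells (1962) measurements, and real boundaries shift with speaker, age, and dialect. The function name is hypothetical.

```python
def describe_tongue_position(f1_hz, f2_hz,
                             f1_bounds=(400.0, 600.0),
                             f2_bounds=(1200.0, 1800.0)):
    """Map F1/F2 to a rough (height, place) description.

    f1_bounds and f2_bounds are assumed cut-offs, not published values.
    """
    # Low F1 means a high tongue; high F1 means a low tongue.
    if f1_hz < f1_bounds[0]:
        height = "high"
    elif f1_hz < f1_bounds[1]:
        height = "mid"
    else:
        height = "low"

    # High F2 means a front tongue; low F2 means a back tongue.
    if f2_hz >= f2_bounds[1]:
        place = "front"
    elif f2_hz >= f2_bounds[0]:
        place = "mid"
    else:
        place = "back"

    return height, place


# Illustrative (not measured) formant values for three vowel types.
print(describe_tongue_position(300, 2300))  # a "heed"-like vowel
print(describe_tongue_position(700, 1100))  # a "hard"-like vowel
print(describe_tongue_position(450, 800))   # a "horde"-like vowel
```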

[Figure: British English F2 (Wells, 1962), vowels arranged from FRONT to BACK tongue position]

Long Vowels

Short Vowels

British English F1 / F2

Fry, 1979 based on data from Wells, 1962

[Figure: F1/F2 vowel plot. F2 axis: FRONT to BACK. F1 axis: LOW to HIGH]

Australian English F1 / F2

Bernard, 1989

[Figure: F1/F2 vowel plot. F2 axis: FRONT to BACK. F1 axis: LOW to HIGH]

Regional Differences (US English)

California (Hagiwara, 1997)
Midwest (Hillenbrand et al., 1995)

Gender Differences (Californian English)

[Figure: F1 and F2 (frequency in Hz, axis 0 to 3500) for the vowels i, u, a, comparing adult male and adult female speakers (Hagiwara, 1997)]

Age & Gender Differences (F1)

Australian girls and boys (Busby and Plant, 1995)

Age & Gender Differences (F2)

Australian girls and boys (Busby and Plant, 1995)

Diphthongs

Formed by moving from one vowel-like position to another.

For example, moving from a position approximating [a] to a position approximating [i].

Movement involves changes in tongue position, lip shape, etc.

Diphthongs

buy, boy, bough, beau

Formant movement reflects change in vocal tract shape

[Figures: diphthong formant tracks; axes run 0 to 2500 and 0 to 800]

Consonants

syllables, produced when the vocal tract is either blocked or so restricted that there is audible

Crystal, 1995

Consonants

Consonants in English are categorized by:
Voicing: are they voiced or voiceless?
Manner of Articulation: how are they produced?
Place of Articulation: where are they produced?

Voiced/Voiceless

Voiced: vocal cords closed & vibrating
Voiceless: vocal cords open

English contrasts: Voiced / Voiceless

Voiced/Voiceless pairs only differ in voicing

/ / / / / / / /

Critical cue to initial voicing in stop consonants is the time from consonant release to onset of voicing: VOT (Voice-Onset-Time)
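A minimal sketch of the VOT measurement follows: subtract the time of the consonant release from the time at which voicing begins, then compare the result with a category boundary. The 30 ms boundary and the example time stamps are assumptions for illustration; the actual boundary varies with place of articulation, speaker, and language. The function names are hypothetical.

```python
def voice_onset_time_ms(release_s, voicing_onset_s):
    """VOT in milliseconds: time from consonant release to onset of voicing."""
    return (voicing_onset_s - release_s) * 1000.0


def classify_initial_stop(vot_ms, boundary_ms=30.0):
    """Label an initial stop from its VOT; boundary_ms is an assumed cut-off."""
    return "voiceless" if vot_ms >= boundary_ms else "voiced"


# Hypothetical time stamps (seconds) read off a spectrogram.
pea_vot = voice_onset_time_ms(release_s=0.100, voicing_onset_s=0.165)
bee_vot = voice_onset_time_ms(release_s=0.100, voicing_onset_s=0.110)

print(f"pea: {pea_vot:.0f} ms -> {classify_initial_stop(pea_vot)}")
print(f"bee: {bee_vot:.0f} ms -> {classify_initial_stop(bee_vot)}")
```

A long lag between release and voicing points to the voiceless member of the pair, and a short lag to the voiced member.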

Pea / Bee
Tea / Dee
Key / Ghee

Critical cue to voicing in initial continuant consonants is the presence/absence of low frequency speech energy

Final Voicing

Primary cue to final consonant voicing is the duration of the preceding vowel

Vowel duration preceding a voiceless consonant is shorter than the duration of the vowel when it precedes a voiced consonant

cop/cob tap/tab

cot/cod tat/tad

pick/pig puck/pug

sooth/soothe sheath/sheathe

safe/save waif/wave

peace/peas bus/buzz

etch/edge batch/badge
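The vowel-duration cue can be sketched as a simple comparison within each minimal pair: the word with the longer vowel is guessed to end in the voiced consonant. The durations below are invented for illustration and are not measurements from the talk; the function name is hypothetical.

```python
def guess_voiced_member(pair):
    """Given two (word, vowel_duration_ms) tuples, return the word with the longer vowel."""
    return max(pair, key=lambda item: item[1])[0]


# Hypothetical vowel durations in milliseconds.
HYPOTHETICAL_PAIRS = [
    (("cop", 150.0), ("cob", 230.0)),
    (("peace", 180.0), ("peas", 290.0)),
]

guesses = [guess_voiced_member(pair) for pair in HYPOTHETICAL_PAIRS]
for pair, guess in zip(HYPOTHETICAL_PAIRS, guesses):
    ratio = max(d for _, d in pair) / min(d for _, d in pair)
    print(f"{pair[0][0]}/{pair[1][0]}: voiced-final guess = {guess} (ratio {ratio:.2f})")
```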

Manner of Articulation

Stops: period of silence, burst upon release
Fricatives: air forced through narrow constriction; non-sibilant & sibilant fricatives
Affricates: stop closure followed by fricative-like release
Nasals: airstream passes through the nose
Semi-vowels: vowel-like movement, glide

STOPS: period of silence, burst release

FRICATIVES: lower intensity, broad band or high frequency energy

AFFRICATES: stop closure followed by fricative-like release

NASALS: airstream comes out through nose, low frequency energy peak around 300 Hz, weak upper frequencies

My boring mate Norman knows nothing


SEMI-VOWELS: glide from a vowel-like target to the following vowel. Vital cue for /r/ is F3 movement

[Figure: Semivowels F1/F2; labels LOW, MID, HIGH]

[Figure: Semivowels F2/F3]

Place of Articulation

Where does the constriction/narrowing occur: at the lips, the area behind the teeth, the soft palate, etc.?

In English: lips, lips/teeth, teeth/tongue, ridge behind upper teeth, hard palate, soft palate, glottis

All are voiced stops that differ only in their place of articulation. Cues to identification include spectrum of the burst (red arrow), direction of F2 movement (blue arrow), and duration of aspiration phase.


harp hart hark


Look at the average spectrum of the initial consonant in the 30 msec following consonant release

30 msec average spectrum following consonant release (Red = FFT analysis, Blue = LPC analysis)
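The Fourier side of this analysis can be sketched as follows: take the 30 msec of signal that starts at the consonant release and compute its magnitude spectrum. The code uses a plain discrete Fourier transform from the standard library; the sample rate, the synthetic "burst" (a 2 kHz tone after a silent closure), and the function name are assumptions made for the example. LPC analysis is not covered here.

```python
import cmath
import math


def spectrum_after_release(signal, sample_rate, release_index, window_ms=30.0):
    """Return [(frequency_hz, magnitude), ...] for the window after release."""
    n = int(round(sample_rate * window_ms / 1000.0))
    segment = signal[release_index:release_index + n]
    if len(segment) < n:
        raise ValueError("signal is too short for the requested window")
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(segment)
        )
        spectrum.append((k * sample_rate / n, abs(acc) / n))
    return spectrum


# Assumed test signal: 10 ms of silence (closure), then a 2 kHz tone.
SAMPLE_RATE = 16000
closure = [0.0] * 160
burst = [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(640)]
signal = closure + burst

spectrum = spectrum_after_release(signal, SAMPLE_RATE, release_index=len(closure))
peak_hz, peak_mag = max(spectrum, key=lambda pair: pair[1])
print(f"Energy peak in the first 30 ms after release: {peak_hz:.0f} Hz")
```

With real recordings the release point has to be located first, for example by eye on the waveform, and the resulting spectrum shape is then compared across places of articulation.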


All are voiceless fricatives. Cues include frequency shape of consonant, intensity, and F2 movement (transition).

I say more / I say no

A common error

Another common error

beet boot sheet shoot

Voicing is usually easy for most hard-of-hearing and many deaf people to pick up via aided hearing. It is impossible to see via lipreading.

Manner is more difficult, but many deaf people can pick differences between stops/continuants, nasals/orals, etc. Some lipreading cues, but not many.

Place is almost always difficult for hard-of-hearing and deaf people, BUT many cues via lipreading.

Contact

Geoff Plant
Hearing Rehabilitation Foundation
Somerville, MA 02143 USA

[email protected]
Website: www.hearf.org