Speech Perception [ ] recognize speech wreck a nice beach ?

Page 1:

Speech Perception

[]

recognize speech wreck a nice beach?

Page 2:

The Major Questions in Speech Perception:

1) How do we identify the sounds we hear?

2) What about the “lack of invariance” in the speech signal?

3) What about degraded signals?

Page 3:

How do we identify sounds?

Speech occurs at an alarming rate:

(estimates vary between 120 and 180 wpm) … that is 10-15, or by some estimates 25-30, phonetic segments/second!
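As a rough check on the figures above, the conversion from words per minute to segments per second can be sketched as follows. The average of 5 phonetic segments per word is an assumption chosen for illustration, not a figure from the slides, and the function name is hypothetical.

```python
def segments_per_second(words_per_minute, segments_per_word=5.0):
    """Convert a speaking rate in words/minute to phonetic segments/second.

    segments_per_word is an assumed average, not a value from the source.
    """
    words_per_second = words_per_minute / 60.0
    return words_per_second * segments_per_word


for wpm in (120, 180):
    print(f"{wpm} wpm -> {segments_per_second(wpm):.1f} segments/second")
```

Under that assumption, 120-180 wpm corresponds to the lower 10-15 segments/second range; the higher 25-30 estimate would need a faster rate or more segments per word.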

The speech signal is continuous – there are no easily identifiable boundaries between words

The speech signal to the right is segmented into “how are you?”

Page 4:

How do we deal with the “lack of invariance” in a speech signal?

Lack of Invariance comes from:

• Coarticulation effects (Allophonic variation)

“Tom Burton tried to steal a butter plate”

• Speaker variation

• No exact repetition

• Reduction / deletion of segments

Page 5:

Acoustic Cues

• No single acoustic cue is reliably present for any given phoneme
– for [di] and [du], the /d/ is very different, but speakers will indicate that it’s still the same segment

• Each phoneme has more than one acoustic cue
– voice-onset time (VOT)
– energy in the burst
– onset frequency of the first formant
– placement in syllable

Page 6:

Voice Onset Time (VOT)

• Measure of time between the burst of air and beginning of vocal-fold vibration of the adjacent vowel

• Best single cue for distinguishing between voiced/voiceless consonants in many languages: English, Dutch, Spanish, Hungarian, Tamil, Cantonese, Thai, and Eastern Armenian… (Lisker & Abramson, 1964)

• BUT we can still interpret whispered speech! (practically all voiceless)

Page 7:
Page 8:

Categorical Perception (chunking of speech signals)

• Although speech is non-discrete, we perceive it discretely!

• Task: Identify the sound

0------10------20------30------40------50------60

/d/ 100% | 50% | 100% /t/
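The identification pattern above (near 100% /d/ at short VOT, 50% at the boundary, near 100% /t/ at long VOT) can be approximated with a logistic curve. This is only an illustrative sketch: the 30 ms boundary is read off the scale above, while the slope value and the function name are assumptions.

```python
import math


def prob_t(vot_ms, boundary_ms=30.0, slope=0.5):
    """Probability of labelling a stimulus /t/ given its VOT (logistic curve).

    boundary_ms matches the 50% point on the slide; slope is an assumed value.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


for vot in range(0, 61, 10):
    p = prob_t(vot)
    print(f"VOT {vot:2d} ms: /d/ {100 * (1 - p):5.1f}%   /t/ {100 * p:5.1f}%")
```

A steeper slope makes the switch from /d/ to /t/ more abrupt, which is the sense in which a continuous signal is perceived discretely.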

Page 9:

Categorical Perception (Yeni-Komshian and LaFontaine, 1983)

– 7 stimuli, between [di:]/[ti:] (VOT 0 - 60 ms)

0----10----20----30----40----50----60 (VOT in ms)

Stimulus pairs: same, 1-step, 2-steps

• Task: Discriminate between these sounds (2 steps apart, so a 20 ms difference in VOT)

0/20 ms – 100% same
10/30 ms – 50% same
20/40 ms – 100% different
30/50 ms – 50% same
40/60 ms – 100% same
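The discrimination results above fall out of a simple idealization: listeners tell two stimuli apart only when they assign them different labels. The sketch below assumes that idealization, a 30 ms boundary, and independent labelling of the two stimuli; none of these is stated in the slides, and the function names are hypothetical.

```python
def prob_t(vot_ms, boundary_ms=30):
    """Idealized identification: /d/ below the boundary, /t/ above, 50/50 at it."""
    if vot_ms < boundary_ms:
        return 0.0
    if vot_ms > boundary_ms:
        return 1.0
    return 0.5


def prob_heard_different(vot_a, vot_b):
    """Chance that two stimuli receive different labels (independent labelling)."""
    pa, pb = prob_t(vot_a), prob_t(vot_b)
    return pa * (1 - pb) + pb * (1 - pa)


for a, b in [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60)]:
    print(f"{a}/{b} ms: {100 * prob_heard_different(a, b):.0f}% different")
```

Only the pair that straddles the boundary (20/40 ms) is always heard as different, even though every pair differs by the same 20 ms.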

Page 10:

What about bilinguals?

• VOT boundaries vary between languages

• Perception studies show compromise effects

Canadian French-English bilinguals (Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973)

Spanish-English bilinguals (Williams, 1977, 1980)

Bilinguals seem to have developed a single perceptual system!

Page 11:

Coarticulation Effects

Phonemes are influenced by the sounds around them!

• Take naturally recorded speech
• Remove vowel
• Guess the vowel

– Example: “see” [si:] – remove the vowel
– Play 150 ms of /s/
– Can identify removed vowel (for most vowels)

Page 12:

How is speech perceived under less than ideal conditions?

Top-down: semantic context, syntactic structure

UNDERSTANDING

Bottom-up: acoustic information

Page 13:

A demonstration

The McGurk Effect:

We use visual AND auditory cues to determine what segments we’re hearing!

Page 14:

Top-Down Processing (using semantic and syntactic information to decode individual words in fluent speech)

*language* *speech* *recognition* *talk*

recognize speech

[]

Bottom-Up Processing (using acoustic information to encode the speech signal)

Page 15:

Phoneme Restoration Effect (Warren, 1970)

• Replaced sounds with a cough

• Word presented in a sentence
– The bill was sent to the legi_lature.

• “Where does the cough occur?”

• Participants thought whole word was present. The /s/ was mentally restored!
– It was found that the _eel was on the orange.
– It was found that the _eel was on the shoe.

Page 16:

Semantic Influences (Garnes and Bond, 1976)

• 16 tokens, spanning the spectrum of bait-date-gate

• 3 carrier sentences:
– Here’s the fishing gear and the ______.
– Check the time and the _______.
– Paint the fence and the _______.

• If unambiguous, get semantically implausible sentences (Paint the fence and the bait.)

• If ambiguous (near a phoneme boundary), semantic context effects
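One way to picture the two outcomes above is to treat the percept as acoustic evidence multiplied by contextual plausibility. The sketch below is an illustration under that assumption, not the analysis used in the study; every number and name in it is made up.

```python
def combine(acoustic, context):
    """Multiply bottom-up evidence by top-down plausibility and normalize."""
    scores = {w: acoustic[w] * context[w] for w in acoustic}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def best(posterior):
    """Return the word with the highest combined score."""
    return max(posterior, key=posterior.get)


# All numbers below are invented for illustration.
paint_context = {"bait": 0.1, "date": 0.1, "gate": 0.8}  # "Paint the fence and the ___"
clear_bait = {"bait": 0.98, "date": 0.01, "gate": 0.01}  # unambiguous token
ambiguous = {"bait": 0.45, "date": 0.10, "gate": 0.45}   # token near a boundary

print(best(combine(clear_bait, paint_context)))  # acoustics dominate
print(best(combine(ambiguous, paint_context)))   # context decides
```

With a clear token the acoustic evidence outweighs the context, giving the implausible "bait"; with an ambiguous token the context tips the result to "gate".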

Page 17:

Slurred Speech

• Syntactic and semantic cues help!

• Words (with noise) are perceived more accurately in sentences than in isolation
– (Pollack & Pickett, 1964): recorded conversations and excised individual words. When the excised words were presented to listeners for identification, only half were correctly recognized.

Page 18:

Rules of Rapid Speech: “hanmethethimbook”

• Often can drop the las consonan

• Consonants in clusters may be modified to have the same blace of articulation/voicing
– |thimbook|, |thingcarpet|, |Istambul|
– NOT: |thingbook|, |thim slice|
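The place-assimilation rule above can be sketched as a small function. It works on spelling rather than phonetic transcription, handles only a word-final "n", and treats every spelled "c" as a velar sound; these simplifications and the function name are assumptions made for illustration.

```python
BILABIALS = set("pbm")
VELARS = set("kgc")  # treats every spelled "c" as velar, a simplification


def assimilate(first, second):
    """Join two words, assimilating a final "n" to the next consonant's place."""
    if first.endswith("n") and second:
        nxt = second[0].lower()
        if nxt in BILABIALS:
            first = first[:-1] + "m"   # n becomes m before p, b, m
        elif nxt in VELARS:
            first = first[:-1] + "ng"  # n becomes ng before k, g, c
    return first + second


for pair in [("thin", "book"), ("thin", "carpet"), ("thin", "slice")]:
    print(pair, "->", assimilate(*pair))
```

The nasal changes only when the following consonant has a different place of articulation, so "thin slice" keeps its "n".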

• Almost all vowels can be shortened

Page 19:

Listening for Mispronunciation (Cole, 1973; Cole, Jakimik, & Cooper, 1978; Cole & Jakimik, 1980)

20-minute story. Press a button whenever you hear a mispronunciation.

• Notice more stop errors with voicing:
– 70% for stops (boot to poot)
– 64% for affricates (chance to jance)
– 38% for fricatives (fin to vin)

• Notice almost all place changes (80-90%):
– (take to pake)
– no higher percentage if voicing also changed (take to gake)

• Notice more errors at beginnings of words:
– 72% for word-initial segments (dish to tish)
– 33% for word-final segments (split to splid)

• Conclusion: we DO use bottom-up information!

Page 20:

Mad Gab

Page 21:

Mad Gab

• Lit Told Hid High No

Page 22:

Mad Gab

• Ate Whole Freak Haul

Page 23:

Mad Gab

• Hike Air Rub Ouch Hue

Page 24:

Mad Gab

• Huff Ink Earn Elf Aisle

Page 25:

Mad Gab

• Ale All Heap Hop

(A lollipop)

Page 26:

Mad Gab

• Butcher Ed Stew Gather

(Put your heads together)

Page 27:

Mad Gab

• Lease Hummer Reap Wrest Lee

(Lisa Marie Presley)

Page 28:

Mad Gab

• Bill Spare Reed Oh-boy!

(Pillsbury Dough Boy)

Page 29:

Models of Speech Perception

• Motor Theory of Speech Perception
– Speech signals interpreted by reference to motor speech movements

• Cohort Model

• TRACE Model

Page 30:

Models of Speech Perception

• Motor Theory of Speech Perception

• Cohort Model:
1) The acoustic information at the beginning of a word activates a “cohort” of possible words
2) Syntax and semantics influence the selection of the target word from the cohort

• TRACE Model

Page 31:

Cohort size

• Standard dictionary
– after 50 ms, 115 nouns share the same sounds
– after 100 ms, 43 nouns
– after 200 ms, 11 nouns
– after 300 ms, 5 nouns

(Average word length, depending on speech rate, for one-, two-, and three-syllable words is between 550 – 830 ms)

Word recognition occurs before the isolation point! (only one word possible)
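The shrinking cohort can be illustrated with a toy lexicon in which letters stand in for speech sounds. This sketch covers only the acoustic narrowing step; it does not model the syntactic and semantic selection that lets recognition happen earlier. The word list and function names are hypothetical.

```python
LEXICON = ["speech", "speed", "speak", "spell", "spend", "beach", "beat"]  # toy list


def cohort(heard_so_far, lexicon=LEXICON):
    """Return the words still consistent with the input heard so far."""
    return [w for w in lexicon if w.startswith(heard_so_far)]


def isolation_point(word, lexicon=LEXICON):
    """Number of segments after which `word` is the only candidate left, or None."""
    for n in range(1, len(word) + 1):
        if cohort(word[:n], lexicon) == [word]:
            return n
    return None


for n in range(1, len("speech") + 1):
    print("speech"[:n], cohort("speech"[:n]))
```

In this toy lexicon "speech" is the only candidate left after five of its six segments, which mirrors how the cohort shrinks well before the word ends.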

Page 32:

Models of Speech Perception

• Motor Theory of Speech Perception

• Cohort Model

• TRACE Model: Neural Network (Elman and McClelland 1984, 1986)

– processing occurs through excitatory and inhibitory connections between processing units called nodes

– 3 levels of nodes: features, phonemes, and words all highly interconnected
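The idea of excitatory and inhibitory connections can be shown with a much-reduced sketch. It is not the published TRACE model: it has a single word level, uses letters in place of features and phonemes, and all parameter values, words, and function names are assumptions chosen for illustration.

```python
WORDS = ["bank", "band", "tank"]  # toy lexicon, hypothetical


def bottom_up(word, heard):
    """Share of aligned positions where the word matches the input."""
    matches = sum(1 for a, b in zip(word, heard) if a == b)
    return matches / max(len(word), len(heard))


def run(heard, steps=20, excite=0.2, inhibit=0.1, decay=0.1):
    """Update word activations with excitation, lateral inhibition, and decay."""
    act = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        new = {}
        for w in WORDS:
            rivals = sum(act[r] for r in WORDS if r != w)
            delta = excite * bottom_up(w, heard) - inhibit * rivals - decay * act[w]
            new[w] = min(1.0, max(0.0, act[w] + delta))  # keep activation in [0, 1]
        act = new
    return act


for word, activation in run("bank").items():
    print(f"{word}: {activation:.2f}")
```

Each word node is excited by matching input and inhibited by its competitors, so the best-matching word ends up with the highest activation while partial matches are held down.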

Page 33:

Evidence for the TRACE model (or other interactive models)

• We activate all possible words from the phonology regardless of semantic fit
– “He swam across to the far side of the river and scrambled up the bank before running off” primes bank as financial institution!

– parts of words cause priming:
• “trombone” primes rib just as well as “bone” does

– word boundaries don’t interfere with phonological retrieval
• nudist is primed by the phrase “new distance”

– BUT we eliminate all the irrelevant words within a few syllables

Page 34:

For more information

• b-d-g continuum: http://www.phonetik.uni-muenchen.de/Lehre/Skripten/Haskins/Haskins/MISC/PP/bdg/bdgau.html

• Resources on phonetics and phonology: http://faculty.washington.edu/dillon/PhonResources/

• Why we need prosody and lexical access: http://emsah.uq.edu.au/linguistics/book/flant.htm