Speech Perception Richard Wright Linguistics 453.
Speech Perception
Richard Wright
Linguistics 453

Class Overview
Physiology
Auditory Shaping of the Signal
Auditory Cues
Normalization and Context
Experiment Types

Physiology 1: The Ear
Outer: Pinna, Ear Canal, Ear Drum
Middle: Ossicles, Oval Window
Inner: Cochlea — Basilar Membrane, Tectorial Membrane, Hair Cells

[Figure: anatomy of the ear. Labels: pinna, ear canal (external auditory meatus), ear drum (tympanic membrane), ossicular chain, oval window, cochlea, auditory nerve]

Physiology 1: The Outer Ear
Pinna: directional hearing
Ear Canal: high frequency emphasis (very short resonator closed at one end)
Ear Drum: membrane's vibrations convert pressure fluctuations to mechanical movement
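The ear canal's high frequency emphasis follows from treating it as a tube closed at one end (by the ear drum), which resonates at odd multiples of a quarter wavelength. A minimal sketch, assuming a canal length of 2.5 cm and a speed of sound of 350 m/s; both values are illustrative assumptions, not figures from the slides:

```python
def quarter_wave_resonances(length_m, speed_of_sound=350.0, n_modes=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Such a tube resonates at odd multiples of c / (4 * L).
    """
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_modes + 1)]


# Assumed, illustrative canal length of 2.5 cm.
canal_resonances = quarter_wave_resonances(0.025)
print(canal_resonances)
```

Because the tube is so short, even its lowest resonance lies in the kilohertz range, which is why the canal boosts high rather than low frequencies.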

Physiology 1: The Middle Ear
Ossicles (Malleus, Incus, Stapes):
– Convert eardrum movement to movement of oval window; overcomes the air-to-fluid impedance mismatch
– Lower frequency emphasis (500-4000 Hz)
– Lessen impact of very loud noises by stiffening (damping)

Physiology 1: The Inner Ear
Cochlea: fluid-filled cavity; wave propagation in fluid caused by movement of oval window
Basilar Membrane: stiff and narrow at base, wide and flaccid at apex; base = high frequencies and apex = low frequencies (acts like a series of band-pass filters). Most of the membrane is devoted to sounds below 5000 Hz.
Shearing between the Basilar and Tectorial membranes displaces hair cells, exciting cochlear nerve endings
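The base-to-apex frequency layout can be illustrated with a place-to-frequency map. The sketch below uses the Greenwood function with constants commonly quoted for the human cochlea; the slides do not name any particular map, so both the function and the constants are assumptions made here for illustration:

```python
import math

# Greenwood-function constants commonly quoted for humans (assumed here).
A, ALPHA, K = 165.4, 2.1, 0.88


def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x) - K)


def frequency_to_place(f_hz):
    """Relative position (0 = apex, 1 = base) tuned to frequency f_hz."""
    return math.log10(f_hz / A + K) / ALPHA


share_below_5k = frequency_to_place(5000.0)
print(f"apex: {place_to_frequency(0.0):.0f} Hz, base: {place_to_frequency(1.0):.0f} Hz")
print(f"share of membrane tuned below 5000 Hz: {share_below_5k:.2f}")
```

Under these assumed constants, roughly 70% of the membrane's length is tuned below 5000 Hz, which is in line with the claim above.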

Physiology 2: Neural Pathway
Cochlear Nerve
Cochlear Nucleus
Lateral Lemniscus
Auditory Cortex

[Figure: ascending auditory pathway. Labels: cochlear nerve, cochlear nucleus, superior olive, lateral lemniscus, inferior colliculus, medial geniculate, auditory radiations, cortex, mid-line, Held, Monakow, Probst, CIC]

Auditory Shaping of the Signal
Frequency Selectivity: Changes in frequency of stimulus do not result in equivalent changes in sensitivity
Non-linear loudness sensitivity
Phase Locking and noise reduction
Lateral Inhibition and Tuning
Onsets and neural spikes

Frequency Selectivity
[Figure: Bark function. x-axis: frequency, 0 to 5000 Hz; y-axis: Bark, 0 to 18]
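The compression that the Bark function describes can be sketched numerically. The slides do not say which Hz-to-Bark formula the plot uses, so the example below assumes Traunmüller's (1990) approximation as one common choice:

```python
def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale (assumed formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


for f in (500, 1000, 2000, 3000, 4000, 5000):
    print(f, round(hz_to_bark(f), 2))

# The same 500 Hz change covers more Bark at low than at high frequencies.
low_step = hz_to_bark(1000) - hz_to_bark(500)
high_step = hz_to_bark(4500) - hz_to_bark(4000)
print(round(low_step, 2), round(high_step, 2))
```

Equal steps in Hz are therefore not equal perceptual steps, which is the point of the frequency selectivity slide.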

Onset Advantage
[Figure: schematic of speech signal (F1, F2) and the response of an auditory nerve fiber, from Delgutte and Kiang (1984). Labels: consonant release transient, formant transitions, steady state (saturated response), rapid adaptation, short term adaptation, spontaneous level of fiber]

What are Cues?
Cues: information in the signal that listeners use in recovering the segmental content of the utterance
– Place cues
– Manner cues
– Voicing cues
– Vowel quality cues

Distribution of Cues
[Figure: place cues on a schematic of F1, F2, F3. Labels: stop release burst, fricative noise, F2 transitions, nasal pole and zero]

Distribution of Cues
[Figure: manner cues on a schematic of F1, F2, F3. Labels: stop release burst, fricative noise, nasal pole and zero, abruptness and degree of attenuation, slope of formant transitions, nasalization of vowel]

Distribution of Cues
[Figure: voicing cues on a schematic of F1, F2, F3. Labels: release burst amplitude, aspiration noise, VOT, periodicity, stricture duration, vowel duration]

Distribution of Cues
Stop release bursts are very brief and difficult to recover: stops rely on formant transition cues
Fricative noise, particularly sibilant, contains robust cues: fricatives may be recovered in the absence of formant transitions
Nasals contain strong manner cues but weak place cues

Onset Advantage
Redundancy advantage:
Onset stops automatically have both a release burst and a set of formant transitions
Coda stops may be unreleased and therefore have less cue redundancy

Onset Advantage
[Figure: onset consonant with flanking vowels. Labels: F2 transitions, release burst, abrupt attenuation, VOT, vowel length, constriction duration]

Experimental Tasks
Identification
Discrimination
Rating
Method of Adjustment (MOA)

Exp. Tasks 1: Identification
Listeners are asked to identify stimuli as speech sounds...
Open set: options open
Forced choice: listeners' choices constrained

Experiment 1: Onset vs Coda
Stimuli
– male speaker of American English
– /ba, da, ga, ab, ad, ag/ bursts excised
– 16 bit, 22 kHz
– mixed in three levels of white noise:
• no noise
• noise at 2 dB above RMS of signal
• noise at 2 dB below RMS of signal
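Mixing at a level defined relative to the signal's RMS can be sketched as follows. This reads "noise at 2 dB above RMS of signal" as a noise RMS 2 dB higher than the signal RMS, which is an interpretation; the function names and the sine-wave stand-in for speech are hypothetical:

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_noise(signal, noise_offset_db, rng):
    """Add Gaussian white noise whose RMS sits noise_offset_db above the signal RMS.

    A negative offset puts the noise below the signal. Returns (mixture, noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_rms = rms(signal) * 10 ** (noise_offset_db / 20.0)
    gain = target_rms / rms(noise)
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled


rng = random.Random(0)
# Stand-in for a speech token: 0.1 s of a 200 Hz sine at a nominal 22 kHz rate.
signal = [math.sin(2 * math.pi * 200 * t / 22000) for t in range(2200)]
for offset_db in (2.0, -2.0):
    mixture, noise = mix_with_noise(signal, offset_db, rng)
    measured = 20 * math.log10(rms(noise) / rms(signal))
    print(f"requested {offset_db:+.1f} dB, measured {measured:+.2f} dB")
```

The no-noise condition simply skips the mixing step. Rescaling the mixture to avoid clipping at 16 bits is not handled in this sketch.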

Experiment 1: Onset vs Coda
Task
– onsets & codas mixed and randomized
– presented binaurally over headphones
– 3-way forced choice task: “B D G”
– labeled button press
– self paced
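A mixed, randomized trial list and a per-condition score for a task like this can be sketched as below. The number of repetitions, the condition labels, and all function names are hypothetical; the slides do not describe how the experiment was implemented:

```python
import random
from collections import defaultdict

SYLLABLES = ["ba", "da", "ga", "ab", "ad", "ag"]
NOISE_LEVELS = ["none", "+2 dB", "-2 dB"]  # labels chosen for this sketch


def build_trials(n_repeats, rng):
    """Cross syllables with noise levels, repeat, and shuffle into one list."""
    trials = [(syl, noise)
              for syl in SYLLABLES
              for noise in NOISE_LEVELS
              for _ in range(n_repeats)]
    rng.shuffle(trials)
    return trials


def correct_answer(syllable):
    """Button label (B, D or G) for the stop consonant in the syllable."""
    return next(ch for ch in syllable if ch in "bdg").upper()


def position(syllable):
    """'onset' for CV items such as 'ba', 'coda' for VC items such as 'ab'."""
    return "onset" if syllable[0] in "bdg" else "coda"


def percent_correct(trials, responses):
    """Percent correct for each (position, noise level) cell."""
    hits, totals = defaultdict(int), defaultdict(int)
    for (syl, noise), resp in zip(trials, responses):
        cell = (position(syl), noise)
        totals[cell] += 1
        hits[cell] += resp == correct_answer(syl)
    return {cell: 100.0 * hits[cell] / totals[cell] for cell in totals}


trials = build_trials(n_repeats=2, rng=random.Random(0))
# A listener who always answers correctly, used only to exercise the scoring.
responses = [correct_answer(syl) for syl, _ in trials]
scores = percent_correct(trials, responses)
print(scores)
```

Scoring by position and noise level keeps onsets and codas comparable even though they are interleaved during presentation.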

Exp. Tasks 2: Discrimination
Listeners are asked to respond “same” or “different” to presented sets of stimuli
AX discrimination: fixed initial stimulus, variable second stimulus (same/different)
ABX discrimination: two fixed initial stimuli, variable third stimulus (same A, same B)
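The two designs differ only in how each trial is assembled. A minimal sketch, using integer step numbers as hypothetical stand-ins for stimuli along a continuum:

```python
import random


def make_ax_trials(standard, comparisons, rng):
    """AX: the first stimulus is fixed, the second varies; answer is same/different."""
    trials = [(standard, x, "same" if x == standard else "different")
              for x in comparisons]
    rng.shuffle(trials)
    return trials


def make_abx_trials(a, b, n_repeats, rng):
    """ABX: A and B are fixed, X is a copy of one of them; answer is A or B."""
    trials = []
    for _ in range(n_repeats):
        trials.append((a, b, a, "A"))
        trials.append((a, b, b, "B"))
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
ax_trials = make_ax_trials(standard=3, comparisons=[1, 2, 3, 4, 5], rng=rng)
abx_trials = make_abx_trials(a=2, b=4, n_repeats=3, rng=rng)
print(ax_trials)
print(abx_trials)
```

Storing the expected answer with each trial makes later scoring a simple comparison against the listener's response.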

Experiment 2: vowel discrimination
Stimuli
– Synthetic vowel continuum
– Equal steps: 2.37 Bark along F1-F2 dimension
– 16 bit, 11 kHz
– variable AX design
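One way to place continuum points at equal Bark steps is to convert the endpoint formants to Bark, step along the straight line between them, and convert back to Hz. The sketch below assumes Traunmüller's Bark approximation and Euclidean distance in the F1-F2 Bark plane; the endpoint formant values are hypothetical, and the slides do not say how the original steps were computed:

```python
import math


def hz_to_bark(f_hz):
    """Traunmüller approximation of the Bark scale (assumed formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_continuum(start_hz, end_hz, step_bark):
    """(F1, F2) points in Hz, step_bark apart, on the line from start toward end.

    Assumes the two endpoints are distinct.
    """
    z0 = [hz_to_bark(f) for f in start_hz]
    z1 = [hz_to_bark(f) for f in end_hz]
    dist = math.dist(z0, z1)
    n_steps = int(dist // step_bark)
    points = []
    for i in range(n_steps + 1):
        t = i * step_bark / dist
        z = [a + t * (b - a) for a, b in zip(z0, z1)]
        points.append(tuple(bark_to_hz(v) for v in z))
    return points


# Hypothetical endpoints given as (F1, F2) in Hz.
points = bark_continuum((300.0, 2300.0), (700.0, 1200.0), step_bark=2.37)
for f1, f2 in points:
    print(round(f1), round(f2))
```

The points are unevenly spaced in Hz because the steps are equal on the Bark scale rather than the Hz scale.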

Experiment 2: vowel discrimination
Task
– same/different response to vowel pairs
– presented binaurally over headphones
– labeled button press
– speeded (limited time to decide)

Exp. Tasks 3: Ratings
Listeners are asked to rate a stimulus in some way: goodness, similarity, accentedness
Example: Effect of intonational contour on naturalness: listeners hear sentences with and without f0 contour and rate naturalness on a 1-5 scale.

Exp. Tasks 4: MOA
Listeners are asked to adjust a stimulus along some dimensions until it fits some criterion: matches another stimulus, sounds most natural, matches a category, etc. (can be identification, discrimination, or rating exp.)

Advantages and shortcomings 1
Open identification
– Good: most natural, subjects understand
– Bad: time consuming, little control of variables, stats difficult (non-comparable responses across subjects)
Forced choice identification
– Good: less time consuming, control of response variables
– Bad: not as natural

Advantages and shortcomings 2
Discrimination
– Good: allows experimenter to map relationship between classification and discrimination
– Bad: very time consuming, not at all natural, unintuitive to subjects

Advantages and shortcomings 3
Rating
– Good: allows experimenter to map preferences in a multidimensional space, allows for correlation between one or more aspects of stimulus
– Bad: hard to control interactions between preferences and stimulus variables, not that natural

Advantages and shortcomings 4
Method of adjustment (MOA)
– Good: much quicker method of mapping multidimensional perceptual space
– Bad: not natural, complex interaction of stimulus variables