Colleagues: Allen Braun (NIH), Greg Hickok (UC Irvine), Jonathan Simon (Univ. Maryland)
A brain’s-eye-view of speech perception
David Poeppel
Cognitive Neuroscience of Language Lab, Department of Linguistics and Department of Biology
Neuroscience and Cognitive Science Program, University of Maryland College Park
Students: Anthony Boemio, Maria Chait, Huan Luo, Virginie van Wassenhove
“chair” “uncomfortable” “lunch” “soon”
encoding?
representation?
Is this a hard problem? Yes! If it could be solved straightforwardly (e.g. by machine), Mark Liberman would be in Tahiti having cold beers.
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
- Psychophysical evidence for temporal integration
- Imaging evidence
Diagram (built up over several slides):
- analysis of auditory signal → spectro-temporal representation → FEATURES
- interface with lexical items, word recognition (auditory-lexical interface) → FEATURES
- coordinate transform from acoustic to articulatory space (auditory-motor interface)
- production, articulation of speech → FEATURES
Hypothesis about storage: distinctive features
  [-voice]   [+voice]   [+voice]
  [+labial]  [+high]    [+labial]
  [-round]   [+round]   [-round]
  [….]       [….]       [….]
Hypothesis about production: distinctive features
  [-voice]   [+voice]
  [+labial]  [+high]
  [….]       [….]
Unifying concept: distinctive feature
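The storage hypothesis treats a stored word form as a sequence of distinctive-feature bundles, the same unit used for analysis and production. A minimal Python sketch of that data structure, assuming each segment is a mapping from feature name to a binary value; the three-segment form copies the feature values from the slide, and `to_notation` and `shared_features` are hypothetical helpers for illustration only.

```python
from typing import Dict, List

# A segment is a bundle of binary distinctive features: True = "+", False = "-".
Segment = Dict[str, bool]

# Stored form with the feature values from the slide's three columns
# (the unspecified [….] rows are left out).
STORED_FORM: List[Segment] = [
    {"voice": False, "labial": True, "round": False},
    {"voice": True, "high": True, "round": True},
    {"voice": True, "labial": True, "round": False},
]


def to_notation(seg: Segment) -> str:
    """Render a bundle in bracket notation, e.g. '[-voice] [+labial] [-round]'."""
    return " ".join(f"[{'+' if value else '-'}{name}]" for name, value in seg.items())


def shared_features(a: Segment, b: Segment) -> Segment:
    """Hypothetical helper: features specified in both bundles with the same value."""
    return {name: value for name, value in a.items() if name in b and b[name] == value}


for seg in STORED_FORM:
    print(to_notation(seg))
print(shared_features(STORED_FORM[0], STORED_FORM[2]))
```

Because perception, storage, and production all index the same feature names, a comparison such as `shared_features` works regardless of which side of the model produced the bundle.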
STG (bilateral): acoustic-phonetic speech codes
pMTG (left): sound-meaning interface
Area Spt (left): auditory-motor interface
pIFG/dPM (left): articulatory-based speech codes
Hickok & Poeppel (2000), Trends in Cognitive Sciences; Hickok & Poeppel (in press), Cognition
MTG and IFG overlap when controlling for the overt/covert distinction across tasks
Hypothesized functions:
- lexical selection (MTG)
- lexical phonological code retrieval (MTG)
- post-lexical syllabification (IFG)
Shared neural correlates of word production and perception processes
Bilateral mid/posterior STG; left anterior STG; left mid/posterior MTG; left posterior IFG
Indefrey & Levelt (in press), Cognition: meta-analysis of neuroimaging data, perception/production overlap
Scott & Johnsrude (2003)
Possible subregions of inferior frontal gyrus (Burton, 2001)
Auditory studies: Burton et al. (2000), Demonet et al. (1992, 1994), Fiez et al. (1995), Zatorre et al. (1992, 1996)
Visual studies: Sergent et al. (1992, 1993), Poldrack et al. (1999), Paulesu et al. (1993, 1996), Shaywitz et al. (1995)
Auditory lexical decision versus FM/sweeps (a), CP/syllables (b), and rest (c); axial slices at z = +6, +9, +12 (D. Poeppel et al., in press)
fMRI (yellow blobs) and MEG (red dots) recordings of speech perception show pronounced activation of both left and right temporal cortices
T. Roberts & D. Poeppel (in preparation)
Binder et al. (2000)
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
- Psychophysical evidence for temporal integration
- Imaging evidence
The local/global distinction in vision is intuitively clear
Chuck Close
What information does the brain extract from speech signals?
Acoustic and articulatory phonetic phenomena occur on different time scales
Phenomena at the scale of formant transitions, subsegmental cues: “short stuff”, order of magnitude 20-50 ms
Phenomena at the scale of syllables (tonality and prosody): “long stuff”, order of magnitude 150-250 ms
fine structure
envelope
Does different granularity in time matter?
Segmental and subsegmental information: serial order in speech (fool/flu, carp/crap, bat/tab)
Supra-segmental information: prosody (Sleep during lecture! vs. Sleep during lecture?)
The local/global distinction can be conceptualized as a multi-resolution analysis in time
Diagram:
- Supra-segmental information (time ~200 ms): syllabicity, metrics, tone
- Segmental information (time ~20-50 ms): features, segments
Both feed a binding process, followed by further processing.
Temporal integration windows
Psychophysical and electrophysiologic evidence suggests that perceptual information is integrated and analysed in temporal integration windows (v. Bekesy 1933; Stevens and Hall 1966; Näätänen 1992; Theunissen and Miller 1995; etc.).
The importance of the concept of a temporal integration window is that it suggests the discontinuous processing of information in the time domain. The CNS, on this view, treats time not as a continuous variable but as a series of temporal windows, and extracts data from a given window.
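The idea of windowed, discontinuous processing can be made concrete by summarizing a continuous signal with one value per non-overlapping window. A minimal NumPy sketch, assuming RMS energy as the per-window summary and a 1 kHz sampling rate (both arbitrary choices for illustration, not part of the proposal); the 25 ms and 200 ms window sizes follow the slides, and `integrate_windows` is a hypothetical name.

```python
import numpy as np


def integrate_windows(x: np.ndarray, fs: float, window_ms: float) -> np.ndarray:
    """Summarize x by one RMS value per non-overlapping window of window_ms.

    Trailing samples that do not fill a whole window are discarded.
    """
    n = int(round(fs * window_ms / 1000.0))  # samples per window
    n_windows = len(x) // n
    frames = x[: n_windows * n].reshape(n_windows, n)
    return np.sqrt(np.mean(frames ** 2, axis=1))


fs = 1000.0                     # assumed sampling rate (Hz)
t = np.arange(int(fs)) / fs     # 1 s of samples
x = np.sin(2 * np.pi * 5 * t)   # toy signal: a 5 Hz sinusoid

short_rms = integrate_windows(x, fs, 25)   # fine temporal grain
long_rms = integrate_windows(x, fs, 200)   # coarse temporal grain
print(len(short_rms), len(long_rms))
```

Each 200 ms window spans exactly one cycle of the 5 Hz sinusoid, so every long-window value equals 1/√2, while the 25 ms windows track the rise and fall within each cycle.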
Diagram: the arrow of time in physics versus the arrow of time in the central nervous system, divided into short temporal integration windows (25 ms) and long temporal integration windows (200 ms)
Asymmetric sampling/quantization of the speech waveform
This p a p er i s h ar d tp u b l i sh
Two spectrograms of the same word illustrate how different analysis windows highlight different aspects of the sounds.
(a) High time resolution, low frequency resolution: each glottal pulse is visible as a vertical striation
(b) Low time resolution, high frequency resolution: each harmonic is visible as a horizontal stripe
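The trade-off between the two spectrograms follows from the analysis window length: an N-sample FFT window has bins spaced fs/N Hz apart, so shortening the window sharpens time resolution at the expense of frequency resolution. A minimal NumPy sketch of that relation; the 16 kHz sampling rate and the 5 ms and 40 ms window lengths are illustrative assumptions, not values from the slides.

```python
import numpy as np


def stft_resolution(fs: float, window_ms: float):
    """Return (window length in samples, spacing of FFT frequency bins in Hz)."""
    n = int(round(fs * window_ms / 1000.0))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # bin centre frequencies for an n-point FFT
    return n, float(freqs[1] - freqs[0])


fs = 16000.0  # assumed sampling rate (Hz)
for label, window_ms in [("(a) short window", 5.0), ("(b) long window", 40.0)]:
    n, df = stft_resolution(fs, window_ms)
    print(f"{label}: {window_ms:g} ms = {n} samples, {df:.0f} Hz between bins")
```

For a voice with a 100 Hz fundamental (10 ms between glottal pulses, harmonics 100 Hz apart), the 5 ms window is shorter than one pitch period, so individual pulses appear as striations, while the 40 ms window spans several periods and its 25 Hz bins are fine enough to separate the harmonics.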
Hypothesis: Asymmetric Sampling in Time (AST)
Left temporal cortical areas preferentially extract information over 25 ms temporal integration windows.
Right hemisphere areas preferentially integrate over long, 150-250 ms integration windows.
By assumption, the auditory input signal has a neural representation that is bilaterally symmetric (e.g. at the level of core auditory cortex); beyond the initial representation, the signal is elaborated asymmetrically in the time domain.
Another way to conceptualize the AST proposal is to say that the sampling rate of non-primary auditory areas differs between hemispheres, with the LH sampling at high frequencies (~40 Hz) and the RH sampling at low frequencies (4-10 Hz).
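The correspondence between integration-window size and sampling rate in the AST proposal is simply f = 1/T. A short Python check of the pairs used on the slides; the two function names are hypothetical.

```python
def window_to_rate_hz(window_ms: float) -> float:
    """Sampling rate (Hz) implied by taking one sample per integration window."""
    return 1000.0 / window_ms


def rate_to_window_ms(rate_hz: float) -> float:
    """Integration-window length (ms) implied by a sampling rate in Hz."""
    return 1000.0 / rate_hz


for w in (25, 150, 250):
    print(f"{w} ms window -> {window_to_rate_hz(w):.1f} Hz")
for f in (4, 10, 40):
    print(f"{f} Hz -> {rate_to_window_ms(f):.0f} ms window")
```

By the same relation, the 4-10 Hz range quoted for the RH corresponds to windows of 100-250 ms, somewhat broader than the 150-250 ms range stated above.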
Figure: the AST model
a. Physiological lateralization: symmetric representation of spectro-temporal receptive fields in primary auditory cortex; temporally asymmetric elaboration of perceptual representations in non-primary cortex. Axes: proportion of neuronal ensembles (LH, RH) vs. size of temporal integration windows (ms) [associated oscillatory frequency (Hz)], from 25 ms [40 Hz] to 250 ms [4 Hz].
b. Functional lateralization: LH, analyses requiring high temporal resolution (e.g. formant transitions); RH, analyses requiring high spectral resolution (e.g. intonation contours).
Asymmetric sampling in time (AST) characteristics
• AST is an example of functional segregation, a standard concept.
• AST is an example of multi-resolution analysis, a signal processing strategy common in other cortical domains (cf. visual areas MT and V4 which, among other differences, have phasic versus tonic firing properties, respectively).
• AST speaks to the “granularity” of perceptual representations: the model suggests that there exist basic perceptual representations that correspond to the different temporal windows (e.g. featural information is, on this view, as basic as the envelope of syllables).
• The AST model connects in plausible ways to the local versus global distinction: there are multiple representations of a given signal on different scales (cf. wavelets).
Global ==> ‘large-chunk’ analysis, e.g., syllabic level
Local ==> ‘small-chunk’ analysis, e.g., subsegmental level
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
  • AST model
- Psychophysical evidence for temporal integration
- Imaging evidence
Perception of FM sweeps
Huan Luo, Mike Gordon, Anthony Boemio, David Poeppel
FM sweep example
Figure: waveform (amplitude -0.2 to 0.2) and spectrogram (0-5000 Hz) over 0-0.08 s
80 ms linear FM sweep from 3 to 2 kHz
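A linear FM sweep like this one can be synthesized by integrating a linearly changing instantaneous frequency: for a sweep from f0 to f1 over duration T, the phase is 2π(f0·t + (f1 − f0)·t²/(2T)). A minimal NumPy sketch; the 44.1 kHz sampling rate, unit amplitude, and the function name `linear_fm_sweep` are assumptions, and the onset/offset ramps a real stimulus would need are omitted.

```python
import numpy as np


def linear_fm_sweep(f0: float, f1: float, dur_s: float, fs: float = 44100.0) -> np.ndarray:
    """Sine whose instantaneous frequency moves linearly from f0 to f1 (Hz) over dur_s seconds."""
    t = np.arange(int(round(dur_s * fs))) / fs
    # Phase is the integral of f(t) = f0 + (f1 - f0) * t / dur_s.
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t ** 2 / (2 * dur_s))
    return np.sin(phase)


# The example stimulus: 80 ms, downward from 3 kHz to 2 kHz.
sweep = linear_fm_sweep(3000.0, 2000.0, 0.080)
```

Swapping the first two arguments, `linear_fm_sweep(2000.0, 3000.0, 0.080)`, gives the corresponding upward sweep over the same span and duration.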
The rationale
• Important cues for speech perception: formant transitions in speech sounds (for example, F2 direction can distinguish /ba/ from /da/)
• Importance in tone languages
• The vertebrate auditory system is well equipped to analyze FM signals.
Tone languages
• For example, Chinese, Thai…
• The direction of FM (of the fundamental frequency) is important in the language to make lexical distinctions.
• (Four tones in Chinese)
/Ma 1/, /Ma 2/ , /Ma 3/, /Ma 4/
Questions
• How good are we at discriminating these signals? Determine the threshold stimulus duration (corresponding to FM rate) for detecting FM direction.
• Is there any performance difference between UP and DOWN detection?
• Will language experience affect the performance of such a basic perceptual ability?
Stimuli
• Linearly frequency modulated
• Frequency range studied: 2-3 kHz (0.5 oct)
• Two directions (Up / Down)
• FM rate (frequency range/time) changed by changing duration; for each frequency range, the frequency span is kept constant (slow / fast)
• Stimulus duration: from 5 ms (100 oct/sec) to 640 ms (0.8 oct/sec)
Tasks
• Detection and discrimination of UP versus DOWN
• 2AFC, 2IFC, 3IFC
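The FM rates on the performance axis follow from dividing the frequency span in octaves by the stimulus duration. A short Python check, assuming the nominal 0.5-octave span stated above, which is what the tabulated rates imply; the exact span of 2-3 kHz is log2(1.5) ≈ 0.585 octave, which would give proportionally higher rates.

```python
import math

NOMINAL_SPAN_OCT = 0.5                   # span stated on the slide
EXACT_SPAN_OCT = math.log2(3000 / 2000)  # exact span of 2-3 kHz, about 0.585 octave


def fm_rate_oct_per_s(duration_ms: float, span_oct: float = NOMINAL_SPAN_OCT) -> float:
    """FM rate (octaves per second) of a sweep covering span_oct in duration_ms."""
    return span_oct / (duration_ms / 1000.0)


for d in (5, 10, 20, 30, 40, 50, 80, 160, 320, 640):
    print(f"{d:>4} ms -> {fm_rate_oct_per_s(d):7.2f} oct/s")
```

The printed values (100, 50, 25, 16.67, 12.5, 10, 6.25, ...) match the axis labels once rounded as on the slide.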
Performance (three panels: 2-3 kHz, 1-1.5 kHz, 600-900 Hz)
Figure: % correct (30-100%) for Up and Down sweeps vs. FM rate (oct/sec) [stimulus duration (ms)]: 100 [5], 50 [10], 25 [20], 16.7 [30], 12.5 [40], 10 [50], 6.2 [80], 3.1 [160], 1.6 [320], 0.8 [640]
English speakers
• 3 frequency ranges relevant to speech (approximately F1, F2, F3 ranges)
• single-interval 2AFC
Two main findings:
• threshold for UP at 20 ms
• UP better than DOWN
Gordon & Poeppel (2001), JASA-ARLO
2IFC
• To eliminate bias strategies that subjects could use
• To see whether the asymmetric performance of English subjects is due to an “Up preference bias”
Interval 1: UP, Interval 2: Down
Which interval (1 or 2) contains the sound with the target direction?
The two sounds have the same duration, so the only difference is direction.
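Because every 2IFC trial presents both directions, proportion correct can be converted to a bias-free sensitivity index with the standard signal-detection relation for 2IFC, d′ = √2 · z(Pc), where z is the inverse of the standard normal CDF. This conversion is not part of the reported analysis; it is a minimal sketch using only Python's standard library, and `dprime_2ifc` is a hypothetical name.

```python
import math
from statistics import NormalDist


def dprime_2ifc(pc: float) -> float:
    """d' for an unbiased observer in a two-interval forced-choice task: sqrt(2) * z(pc)."""
    if not 0.0 < pc < 1.0:
        raise ValueError("pc must lie strictly between 0 and 1")
    return math.sqrt(2.0) * NormalDist().inv_cdf(pc)


for pc in (0.5, 0.75, 0.9):
    print(f"Pc = {pc:.2f} -> d' = {dprime_2ifc(pc):.2f}")
```

Proportions of exactly 0 or 1 need a correction before conversion, which is why the function rejects them. The 3IFC oddity task follows a different, more complex relation that this sketch does not cover.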
Results for Chinese Subjects
Expt. 2 (Chinese subjects)
Figure: percent correct (0-100%) for Up and Down vs. duration (5, 10, 20, 30, 40, 50, 80, 160, 320 ms)
No significant difference between Up and Down
Threshold for both UP and DOWN is about 20 ms
Results for English Subjects
Expt. 2 (English subjects)
Figure: percent correct (0-100%) for Up and Down vs. duration (5-320 ms)
No difference now between UP and DOWN
Threshold for both at 20 ms
No difference between Chinese and English subjects now.
3IFC
Standard: UP, Interval 1: UP, Interval 2: Down
Choose the interval containing the sound that differs from the other two (a difference in quality rather than only in direction)
3IFC versus 2IFC
Expt. 3 vs. Expt. 2 (English speakers) and Expt. 3 vs. Expt. 2 (Chinese speakers)
Figure: percent correct (0-100%) vs. duration (5-320 ms) for Up and Down (2IFC) and difference detection (3IFC)
No difference between Chinese and English subjects
Threshold confirmed at 20 ms
Conclusion
• Importance of 20 msec as the threshold for discrimination of FM sweeps
- corresponds to temporal order threshold determined by Hirsh 1959
- consistent with Schouten 1985, 1989 testing FM sweeps
- this basic threshold arguably reflects the shortest integration window that generates robust auditory percepts.
Click trains
Anthony Boemio & David Poeppel
Click Stimuli
Psychophysics
Auditory visual integration: the McGurk effect
Virginie van Wassenhove, Ken Grant, David Poeppel
McGurk Effect
• Audiovisual (AV) token
• Visual (V) token
• Auditory (A) token
Figure: response rate (%) (-40% to 100%) vs. SOA (ms), from A lead (-467 ms) through 0 to A lag (+467 ms) in steps of about 67 ms; curves: fusion rate, visually driven, auditorily driven, corrected fusion rate
Response rate as a function of SOA (ms) in the ApVk McGurk pair.
Mean responses (N=21) and standard errors. Fusion rate (open red squares) and corrected fusion rate (filled red squares, dotted line) are /ta/ responses, visually driven responses (open green triangles) are /ka/, and auditorily driven responses (filled blue circles) are /pa/. A negative value in corrected fusion rate is interpreted as a visually dominated error response /ta/.
Identification Task (3AFC) ApVk
True bimodal responses
TWI
Simultaneity Judgment Task (2AFC) ApVk vs. AtVt and AbVg vs. AdVd
Figure: simultaneity rate (%) (0-100%) vs. SOA (ms), from A lead (-467 ms) through 0 to A lag (+467 ms); curves: ApVk, AtVt, AbVg, AdVd
Simultaneity judgment task. Simultaneity judgment as a function of SOA (ms) in both incongruent and congruent conditions (ApVk and AtVt, N=21; AbVg and AdVd, N=18). The congruent conditions (open symbols) are associated with a broader and higher simultaneity judgment profile than the incongruent conditions (filled symbols).
Temporal Window of Integration (TWI) across Tasks and Bimodal Speech Stimuli
Figure: temporal windows of integration on an SOA axis from -200 ms (A lead) to +200 ms (A lag), for each stimulus and task (ID = identification, S = simultaneity judgment)

Stimulus  Task  A lead, left boundary (ms)  A lag, right boundary (ms)  Plateau center (ms)  Window size (ms)
ApVk      ID    -25                         +136                        +56                  161
ApVk      S     -44                         +117                        +37                  161
AtVt      S     -80                         +125                        +23                  205
AbVg      ID    -34                         +174                        +70                  208
AbVg      S     -37                         +122                        +43                  159
AdVd      S     -74                         +131                        +29                  205
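The derived columns of the table are consistent with simple arithmetic on the two boundaries: window size is the right (A lag) boundary minus the left (A lead) boundary, and the plateau center matches their midpoint rounded to the nearest millisecond. A Python sketch that recomputes them from the tabulated boundaries; the midpoint reading of "plateau center" is an inference from the numbers, not a definition given in the source.

```python
import math

# (stimulus, task, left boundary ms, right boundary ms), copied from the table.
ROWS = [
    ("ApVk", "ID", -25, 136),
    ("ApVk", "S", -44, 117),
    ("AtVt", "S", -80, 125),
    ("AbVg", "ID", -34, 174),
    ("AbVg", "S", -37, 122),
    ("AdVd", "S", -74, 131),
]


def window_stats(left_ms: int, right_ms: int):
    """Return (window size, midpoint rounded half up), both in ms."""
    size = right_ms - left_ms
    # Round half up explicitly; Python's round() uses round-half-to-even.
    center = math.floor((left_ms + right_ms) / 2 + 0.5)
    return size, center


for stim, task, left, right in ROWS:
    size, center = window_stats(left, right)
    print(f"{stim} {task:>2}: size {size} ms, center {center:+d} ms")
```

Half-up rounding matters here: with Python's built-in `round`, the ApVk/S midpoint of 36.5 ms would become 36 rather than the tabulated 37.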
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
  • AST model
- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration
- Imaging evidence
Binding of Temporal Quanta in Speech Processing
Maria Chait, Steven Greenberg, Takayuki Arai, David Poeppel
Multi-resolution analysis hypothesis
Diagram (the “SYLLABLE”):
- Supra-segmental information (time scale ~300 ms): syllabicity, stress, tone
- (Sub)segmental information (time scale ~30 ms): features
Both feed a binding process.
Filtering, computing the envelope and fine structure (diagram)
- The original signal is split into 14 frequency bands (0-265 Hz, 265-315 Hz, …, 5045-6000 Hz); each band yields an envelope and a fine structure (E1, FS1; E2, FS2; …; E14, FS14).
- S_low: each envelope is low-pass filtered (0-3 Hz) and multiplied by its fine structure (E1×FS1, E2×FS2, …, E14×FS14).
- S_high: each envelope is high-pass filtered (22- Hz) and multiplied by its fine structure (E1×FS1, E2×FS2, …, E14×FS14).
Signal processing:
• 0-6 kHz
• 14 channels
• spaced in 1/3-octave steps along the cochlear frequency map
• every two neighboring channels are separated by 50 Hz
Envelope extraction
Figure: amplitude vs. time for the original envelope, the low-passed envelope, and the high-passed envelope (panels: original, high-passed, low-passed)
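The per-channel manipulation in the diagram (split a band into envelope E and fine structure FS, filter E, then recombine E × FS) can be sketched with SciPy. This sketch assumes a Hilbert-transform decomposition and fourth-order zero-phase Butterworth filters, neither of which is specified on the slides; it uses the 22-40 Hz band from Experiment 1 for the fast envelope, handles a single already band-limited channel, and does not reproduce the 14-channel filterbank. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def envelope_fine_structure(band: np.ndarray):
    """Split one band-limited channel into a Hilbert envelope E and fine structure FS."""
    analytic = hilbert(band)
    env = np.abs(analytic)             # E: instantaneous amplitude
    fine = np.cos(np.angle(analytic))  # FS: unit-amplitude carrier
    return env, fine


def filter_envelope(env: np.ndarray, fs: float, kind: str) -> np.ndarray:
    """Zero-phase filter an envelope: 'low' keeps 0-3 Hz, 'band' keeps 22-40 Hz."""
    if kind == "low":
        sos = butter(4, 3.0, btype="lowpass", fs=fs, output="sos")
    elif kind == "band":
        sos = butter(4, [22.0, 40.0], btype="bandpass", fs=fs, output="sos")
    else:
        raise ValueError("kind must be 'low' or 'band'")
    return sosfiltfilt(sos, env)


fs = 16000.0                    # assumed sampling rate (Hz)
t = np.arange(int(fs)) / fs     # 1 s
# Toy channel: a 1 kHz carrier with a 4 Hz amplitude modulation.
band = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env, fine = envelope_fine_structure(band)
env_low = filter_envelope(env, fs, "low")
env_band = filter_envelope(env, fs, "band")
s_low = env_low * fine    # this channel's E x FS term with a 0-3 Hz envelope
s_band = env_band * fine  # this channel's E x FS term with a 22-40 Hz envelope
```

Before filtering, E × FS reproduces the channel exactly (it is the real part of the analytic signal), so any change in the output comes from the envelope filter alone.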
Evidence:
• Comodulation masking release
• Ahissar et al. (2001): phase locking in the auditory cortex to the envelope of sentence stimuli
• Shannon (1995)
• Drullman (1994):
  Effect of low-pass filtering the envelope on speech reception:
  * severe reduction at 0-2 Hz cutoff frequencies
  * marginal contribution of frequencies above 16 Hz
  Effect of high-pass filtering the envelope:
  * reduction in speech intelligibility for cutoff frequencies above 64 Hz
  * no reduction in sentence intelligibility when only frequencies below 4 Hz are reduced
Experiment 1
Stimuli:
- 53 sentences from the IEEE corpus
- Nonsense syllables (CUNY): 8 blocks, 2 (voiced/voiceless) × 2 vowels (/a/, /i/) × 2 (CV/VC)
- 3 manipulations: 0-3 Hz low pass; 22-40 Hz band pass; 0-3 and 22-40 Hz
Each subject hears all 53 sentences but only one manipulation per sentence. A practice block of 26 sentences precedes the experiment.
Task:
- Sentences: subjects are asked to write down what they heard as precisely as they can
- Syllables: 7-alternative forced choice
Presented dichotically
Results
Figure: percent correct (0-100%) for the 0-3 Hz (low-pass), 22-40 Hz (high-pass), and dichotic conditions. Is the dichotic score simply high-pass plus low-pass?
The result reflects the interaction between information carried on the short and long time scales.
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
  • AST model
- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration
  • Interaction of temporal windows
- Imaging evidence
fMRI study of temporal structure in concatenated FMs
Anthony Boemio, Allen Braun, Steven Fromm, David Poeppel
Stimulus Properties
All 13 stimuli have nearly identical long-term spectra and RMS power over the entire 9-second stimulus duration. Stimuli differ only in segment duration, which was determined by drawing from a Gaussian distribution (previous panel), with means of 12, 25, 45, 85, 160, and 300 ms.
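The segment-duration manipulation can be sketched as drawing durations from a Gaussian until the 9 s stimulus is filled. In this NumPy sketch the standard deviation (a fixed fraction of the mean, `sd_fraction`), the lower clip at 1 ms, and the truncation of the final segment are all hypothetical choices; the slides give only the means, and `draw_segment_durations` is an illustrative name.

```python
import numpy as np


def draw_segment_durations(mean_ms: float, total_ms: float = 9000.0,
                           sd_fraction: float = 0.25, seed: int = 0) -> np.ndarray:
    """Draw Gaussian segment durations (ms) that together fill total_ms.

    sd_fraction is a hypothetical parameter (SD = sd_fraction * mean).
    Durations are clipped at 1 ms and the last one is truncated to fit.
    """
    rng = np.random.default_rng(seed)
    n_max = int(total_ms) + 1  # with every duration >= 1 ms this many draws always suffice
    d = np.clip(rng.normal(mean_ms, sd_fraction * mean_ms, size=n_max), 1.0, None)
    ends = np.cumsum(d)
    n = int(np.searchsorted(ends, total_ms)) + 1  # segments needed to reach total_ms
    d = d[:n]
    d[-1] -= ends[n - 1] - total_ms  # shorten the final segment to end exactly at total_ms
    return d


for m in (12, 25, 45, 85, 160, 300):
    segs = draw_segment_durations(m)
    print(f"mean {m:>3} ms: {len(segs)} segments, {len(segs) - 1} transitions")
```

Shorter mean durations pack far more segment transitions into the same 9 s, which is the quantity the hemodynamic model weighs.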
Figure: spectrograms, amplitude vs. time, and power spectral densities (PSDs) for the FM, CNST, and TONE stimuli (time 0-1 s; frequency 100-10^4 Hz; PSD scale 10^-10 to 1)
fMRI
• Single-trial sparse acquisition paradigm (clustered volume acquisition)
• 1.5T GE Signa, echo-planar sequence
• 11.4 s TR (9 s signal, 2.4 s volume), TE 40 ms
• 24 reps/condition
• SPM 99 random-effects model, p < 0.05 corrected
SPM 99 cohort analysis
FMs-CNST categorical contrasts (p < 0.05 corrected)
Figure: mean supra-threshold voxels (0-200) vs. segment duration (12, 25, 45, 85, 160, 300 ms), summed over all conditions, auditory areas, and hemispheres; error bars are SEM
The threshold is set by the categorical contrast to the CNST stimulus; anything below this level will be zero in the SPM.
Only 1 second of stimuli is shown for clarity.
Segment / segment transition
Hemodynamic response/stimulus model: not all segment transitions are equal.
A model that includes both the segment transitions and the segments themselves, but assumes that transitions between long segments contribute more to the response than transitions between short ones, reproduces the observed relation between activation and segment duration (left).
FM/TONE vs. CNST
Figure (STS only; STG only, left and right hemispheres): supra-threshold voxels vs. SOA (12, 25, 45, 85, 160, 300 ms), plotted as counts and on a 0.00-0.75 scale
P-values by region:
Effect               MTG/STS   STG
Type                 0.5994    0.0933
Hemi                 0.3127    0.7514
Rate                 <.0001    <.0001
Type x Hemi          0.9396    0.8152
Type x Rate          0.3772    0.0578
Hemi x Rate          0.0034    0.7211
Type x Hemi x Rate   0.4137    0.3209
MEG study of spectral responses to complex sounds
David Poeppel, Huan Luo, Dana Ritter, Anthony Boemio, Didier Depireux, Jonathan Simon
Figure: sensitivity of neuronal ensembles in LH and RH vs. size of temporal integration windows (ms) [associated oscillatory frequency (Hz)], from 25 ms [40 Hz] to 250 ms [4 Hz]
The asymmetric sampling in time (AST) hypothesis predicts electrophysiological asymmetries in specific frequency bands, gamma (25-55 Hz) and theta (3-8 Hz), because the hypothesized temporal quantization is reflected as oscillatory activity.
Flow chart: LH and RH signals; RMS; gamma band-pass filter and theta band-pass filter; RMS; outputs: gamma for LH, gamma for RH, theta for LH, theta for RH; multi-taper spectral analysis
Result
Power ratio in specific frequency bands, P(L)/(P(L)+P(R)), computed with Kaiser, Remez, and elliptic filters:
        Kaiser   Remez    Elliptic
Gamma   0.4769   0.4751   0.4733
Theta   0.3958   0.3965   0.4210
• The difference is much greater in the theta band (low frequency band), where RH activation is greater than LH.
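The index in the table is P(L)/(P(L)+P(R)): 0.5 means equal power in the two hemispheres, and values below 0.5 mean more power on the right. A minimal SciPy sketch of the band-pass, RMS, ratio path of the flow chart, assuming fourth-order zero-phase Butterworth filters (the study compared Kaiser, Remez, and elliptic designs, which are not reproduced here) and mean squared amplitude as the power measure; the signals are synthetic and the function names hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges (Hz) as stated for the AST prediction.
BANDS_HZ = {"theta": (3.0, 8.0), "gamma": (25.0, 55.0)}


def band_power(x: np.ndarray, fs: float, band: str) -> float:
    """Mean squared amplitude (RMS squared) of x after zero-phase band-pass filtering."""
    lo, hi = BANDS_HZ[band]
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)
    return float(np.mean(y ** 2))


def lateralization_ratio(left: np.ndarray, right: np.ndarray, fs: float, band: str) -> float:
    """P(L) / (P(L) + P(R)): 0.5 is symmetric, below 0.5 means more right-hemisphere power."""
    p_left = band_power(left, fs, band)
    p_right = band_power(right, fs, band)
    return p_left / (p_left + p_right)


# Synthetic check: the right channel carries twice the theta amplitude of the left.
fs = 1000.0
t = np.arange(int(10 * fs)) / fs
theta = np.sin(2 * np.pi * 5 * t)
gamma = np.sin(2 * np.pi * 40 * t)
left = theta + gamma
right = 2.0 * theta + gamma
print(f"theta: {lateralization_ratio(left, right, fs, 'theta'):.3f}")
print(f"gamma: {lateralization_ratio(left, right, fs, 'gamma'):.3f}")
```

Doubling the amplitude quadruples the power, so the theta ratio comes out near 1/(1+4) = 0.2, while the gamma ratio stays near 0.5 because both channels carry the same gamma component.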
Distribution of spectral responses
Outline
(1) Fractionating the problem in space:
Towards a functional anatomy of speech perception
(2) Fractionating the problem in time:
Towards a functional physiology of speech perception
- A hypothesis about the quantization of time
  • AST model
- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration
  • Interaction of temporal windows
- Imaging evidence
  • fMRI: temporal sensitivity and lateralization
  • MEG spectral lateralization
STG (bilateral): acoustic-phonetic speech codes
pMTG (left): sound-meaning interface
Area Spt (left): auditory-motor interface
pIFG/dPM (left): articulatory-based speech codes
Hickok & Poeppel (2000), Trends in Cognitive Sciences; Hickok & Poeppel (in press), Cognition
Figure: the AST model
a. Physiological lateralization: symmetric representation of spectro-temporal receptive fields in primary auditory cortex; temporally asymmetric elaboration of perceptual representations in non-primary cortex. Axes: proportion of neuronal ensembles (LH, RH) vs. size of temporal integration windows (ms) [associated oscillatory frequency (Hz)], from 25 ms [40 Hz] to 250 ms [4 Hz].
b. Functional lateralization: LH, analyses requiring high temporal resolution (e.g. formant transitions); RH, analyses requiring high spectral resolution (e.g. intonation contours).
Asymmetric sampling in time (AST) builds on anatomical symmetry but permits functional asymmetry
Conclusion
The input signal (e.g. speech) must interface with higher-order symbolic representations of different types (e.g. segmental representations relevant to lexical access and supra-segmental representations relevant to interpretation).
These higher-order representation categories appear to be lateralized (e.g. segmental phonology/LH, phrasal prosody/RH).
The timing-based asymmetry provides a possible cortical ‘logistical’ or ‘administrative’ device that helps create representations of the appropriate granularity.
If this is on the right track, the syllable is, at least for perception, as elementary a unit as the feature/segment. Both are basic.
Analysis-by-synthesis I
Hypothesize-and-test models
Analysis
Synthesis
Peripheral auditory processing
Segmentation and labeling
spectral representation
Lexical access code
Long-term memory: abstract lexical repr.
Recoding
acoustic-phonetic manifestations of words
contextual information
MATCHING PROCESS
BEST LEXICAL CANDIDATE
Where do the candidates for synthesis come from?
Analysis-by-synthesis II
Analysis-by-synthesis model of lexical hypothesis generation and verification (adapted and extended from Klatt, 1979)
spectral analysis
analysis-by-synthesis verification: “internal forward model”
speech waveform
segmental analysis
lexical search
synt./seman. analysis
peripheral and central ‘neurogram’
partial feature matrix
lexical hypotheses
predicted subsequent items
best-scoring lexical candidates
acceptable word string
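The hypothesize-and-test loop can be summarized in a few lines: analysis proposes lexical hypotheses, synthesis predicts what each hypothesis should look like, and a matching process keeps the best-scoring candidate. A toy Python sketch, assuming words are stored as segment sequences, prediction is a simple lexicon lookup (a stand-in for the internal forward model), and the match score counts agreeing positions; the lexicon and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lexicon = Dict[str, List[str]]


def analysis_by_synthesis(
    observed: List[str],
    lexicon: Lexicon,
    propose: Callable[[List[str], Lexicon], List[str]],
    synthesize: Callable[[str], List[str]],
) -> Optional[Tuple[str, int]]:
    """Return (best lexical candidate, match score), or None if nothing is proposed."""
    best: Optional[Tuple[str, int]] = None
    for word in propose(observed, lexicon):    # lexical hypotheses
        predicted = synthesize(word)            # internal forward model
        score = sum(p == o for p, o in zip(predicted, observed))  # matching process
        if best is None or score > best[1]:
            best = (word, score)
    return best


# Hypothetical toy lexicon; the proposal step keeps words of the observed length.
LEXICON: Lexicon = {"bat": ["b", "a", "t"], "tab": ["t", "a", "b"], "pat": ["p", "a", "t"]}


def propose_same_length(observed: List[str], lexicon: Lexicon) -> List[str]:
    return [w for w, segs in lexicon.items() if len(segs) == len(observed)]


result = analysis_by_synthesis(["b", "a", "t"], LEXICON, propose_same_length, lambda w: LEXICON[w])
print(result)
```

The open question on the slide, where candidates for synthesis come from, corresponds to the `propose` argument: the loop itself is indifferent to how hypotheses are generated.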
Analysis-by-synthesis III
The same model, with candidate cortical regions: auditory cortex; pSTG? MTG? ITG?; frontal areas (articulatory codes): left IFG, premotor; temporo-parietal areas?