The cocktail-party effect
How can the listener understand speech at all in this situation?
What are the factors and dimensions of the cocktail-party effect?
Can the researcher simplify and narrow down the real situation in order to run parametric experiments? Can the results of such experiments be carried back to the full, unsimplified real situation?
Help comes from Albert Bregman (“Auditory Scene Analysis”, 1990)
“Stream segregation”: separating concurrent sound streams
Two kinds of segregation:
1 automatic, primitive (peripheral in origin, bottom-up)
2 schema-driven (originating at a high level, top-down)
Grouping principle
Sounds, or components of sounds, are treated as coming from a single source if they can be grouped by one or more shared features, e.g. harmonics of the same fundamental, the same temporal envelope, the same angle of incidence, etc.
The “cocktail-party effect:”
• (trying to) follow one particular talker’s speech in a crowd
[Figure: two panels with a frequency axis from 0.5 to 4 kHz]
Auditory Segregation:
• Definitions
• The psychophysical space of auditory segregation dimensions
• Part I:
-- The problem of dimensionality
-- 1D data: discrimination in informational masking
-- Prediction of 2D segregation from 1D informational masking estimates
• Part II:
-- Correlation between pairs of segregation dimensions computed from obtained and predicted 2D data
THE "COCKTAIL PARTY EFFECT":
One speech source (=the "target") is segregated from other simultaneous speech sources
FACT:
• Simultaneous speech sources differ along multiple dimensions
• Differences along dimensions have to be resolved
• Values on all dimensions have to be correctly associated with a given source
DEFINITION OF SEGREGATION: Two simultaneous sounds that differ along two dimensions are segregated when (1) the differences along both dimensions can be resolved and (2) the correct values of each dimension are associated with either sound
Thus, if Speaker “A” utters “X” and Speaker “B” utters “Y”, reporting “AX” and “BY” indicates segregation, but reporting “AY” and “BX” does not.
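The two-part definition can be read as a scoring rule for a single two-source trial. The following is a minimal sketch, assuming each stimulus and response is represented as a mapping from a source label to the value it carried; the function name `is_segregated` and this representation are hypothetical and are not taken from the original experiments.

```python
def is_segregated(stimulus, response):
    """Score one trial against the two-part definition of segregation.

    stimulus, response: dicts mapping a source label (e.g. "A") to the
    value that source carried (e.g. "X"). This representation is a
    hypothetical choice made only for illustration.
    """
    # Condition (1): the differences were resolved, i.e. the listener
    # reports the same sources and the same set of values as presented.
    resolved = (set(response) == set(stimulus)
                and sorted(response.values()) == sorted(stimulus.values()))
    # Condition (2): each value is associated with the correct source.
    return resolved and all(response[s] == stimulus[s] for s in stimulus)


stimulus = {"A": "X", "B": "Y"}
print(is_segregated(stimulus, {"A": "X", "B": "Y"}))  # response "AX", "BY"
print(is_segregated(stimulus, {"A": "Y", "B": "X"}))  # response "AY", "BX"
```

The first call returns True and the second False: in the swapped response both values were heard, but they were tied to the wrong sources.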
Stimulus:
(Interval 1): f0 lo with F hi; f0 hi with F low
(Interval 2): f0 lo with F low; f0 hi with F hi
Response: “High f0 pitch’s formant went from low to high”: Correct
f0 lo: low pitch; f0 hi: high pitch
F hi: high formant; F low: low formant
Dimensions: pitch and (unique) formant peak frequency
THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: “WHAT” “WHEN” “WHERE”
[Figure: stimulus layout. Location: 0° azimuth, 0° elevation (subject’s own HRTFs). Spectrum: amplitude vs. frequency (Hz), 400 to 700. Timing: random masker (P = PMSK) or signal (P = PSIG); 150 ms, 150 ms, 300 ms; mS, mM. Labels: F (spectral region), f0 (pitch)]
[Figure: segregation thresholds in a space of normalized temporal structure difference units, normalized spectral difference units, and normalized spatial difference units (axis ticks 1 to 3); Segregation (d'=3), Fusion (d'=1). Credit: Experimental Audiology Research, VA Martinez, CA]
THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: “WHAT” “WHEN” “WHERE”
Outside the “WHAT”/“WHEN”/“WHERE” space: SEGREGATION
Inside the “WHAT”/“WHEN”/“WHERE” space: FUSION
Between the “WHAT”/“WHEN”/“WHERE” dimensions: TRADE-OFF
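TRADE-OFF: The Heisenberg-Gabor principle
$\Delta f \, \Delta t = k$
extended: $\Delta f \, \Delta t \, \Delta\theta = k$ or $\Delta f \, \Delta t \, \Delta\theta \, [(1-\rho_{ft})(1-\rho_{f\theta})(1-\rho_{t\theta})]^{-1} = k$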
Questions:
Are the three dimensions orthogonal?
Why is orthogonality (or correlation) important?
Can we determine the correlation between the dimensions?
Stimulus:
(Interval 1): f0 lo at Left; f0 hi at Right
(Interval 2): f0 lo at Right; f0 hi at Left
Response: “High f0 pitch went from Right to Left”: Correct
f0 lo: low pitch; f0 hi: high pitch
Left: left of midline; Right: right of midline
Dimensions: pitch and azimuth
Hypothesis:
1D resolution in “informational noise” is a prerequisite for segregation, where “informational noise” could be:
1. Informational masking within one dimension between streams
2. Interference of information between dimensions
Informational noise:
• Pitch: many f0’s each with many components (same location and flat envelope)
• Location: many locations (same spectrum/pitch and flat envelope)
• Envelope structure: random pattern of bursts (same spectrum/pitch and location)
Goal:
Compare thresholds obtained for different dimensions
[Figure: 1D informational masking, pitch difference (3-comp. signals), informational maskers, spectrum < 1 kHz. x: Δf0/f0 (Hz), 0.01 to 1.00; y: S/N at threshold (dB), 0 to 20; subjects S1, S2, S3]
[Figure: 1D informational masking, azimuth difference (multicomp. signals), informational maskers. x: azimuth (deg), 10 to 100; y: S/N at threshold (dB), 0 to 35; subjects S1, S2, S3]
[Figure: 1D informational masking, different rhythmic patterns (3-comp. signals), informational maskers. x: weighted AM-depth S/N (dB), 0 to 12; y: S/N at threshold (dB), 5 to 15; subjects S1, S2, S3]
Finding: because the masking functions are (quasi-)linear in log, i.e.,
$b \log D \approx \text{constant}$,
informational masking in 1D resolution seems to obey the power law
$D^b = C$
• Use b obtained from 1D informational masking results to transform 2D thresholds D into informational masking S/N thresholds in dB
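One way to picture this step is a straight-line fit on a log axis followed by a change of scale. The sketch below assumes that the 1D masking function has the form threshold (dB) = b · log10(D) + c, which is one reading of "(quasi-)linear in log"; the function names and the data points are hypothetical and are not values from the study.

```python
import math


def fit_log_linear(d_values, thresholds_db):
    """Least-squares fit of thresholds_db = b * log10(D) + c.

    Returns (b, c). The log-linear form is an assumption of this sketch.
    """
    xs = [math.log10(d) for d in d_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(thresholds_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, thresholds_db))
    b = sxy / sxx
    c = mean_y - b * mean_x
    return b, c


def to_masking_db(d, b, c):
    """Map a 2D segregation threshold D onto the 1D S/N scale in dB."""
    return b * math.log10(d) + c


# Hypothetical 1D data: difference D on one dimension vs. S/N threshold in dB.
d_1d = [0.01, 0.1, 1.0]
thr_1d = [20.0, 12.0, 4.0]
b, c = fit_log_linear(d_1d, thr_1d)

# Hypothetical 2D segregation thresholds, re-expressed in dB.
for d_2d in (0.05, 0.5):
    print(d_2d, to_masking_db(d_2d, b, c))
```

Putting every dimension on the same dB scale is what makes thresholds from pitch, azimuth, and rhythm directly comparable.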
Since
$b \log D \approx \text{constant}$,
and informational masking in 1D resolution approximately obeys the power law
$D^b = C$,
2D segregation on dimensions $D_1$, $D_2$ can be predicted from one-dimensional observations through the trade-off
$D_1^{b_1} D_2^{b_2} = k$
or
$b_1 \log D_1 = \log k - b_2 \log D_2$
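The log form of the trade-off, b1 log D1 = log k – b2 log D2, can be solved for one threshold given the other. The sketch below does that, reading the trade-off as D1^b1 · D2^b2 = k, which is what the log form implies; the exponents and the constant k are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
import math


def predict_d1(d2, b1, b2, k):
    """Solve D1**b1 * D2**b2 = k for D1, using the log form of the trade-off."""
    log_d1 = (math.log10(k) - b2 * math.log10(d2)) / b1
    return 10 ** log_d1


# Hypothetical exponents and constant, for illustration only.
b1, b2, k = 2.0, 1.0, 100.0

for d2 in (1.0, 4.0, 25.0):
    d1 = predict_d1(d2, b1, b2, k)
    # The product in the last column stays at k along the trade-off line.
    print(d2, d1, d1 ** b1 * d2 ** b2)
```

As the difference on the second dimension grows, the difference needed on the first one shrinks, which is the trade-off between dimensions.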
[Figure: Azimuth vs. rhythm in 1D (predicted), spectrum < 1 kHz. x: 1-D info masking for temp. fluct. (dB), 7 to 13; y: 1-D info masking for azimuth resol. (dB), 50 to 80; subjects S1, S2, S3]
[Figure: Azimuth vs. frequency in 1D (predicted), spectrum 1<2.5 kHz. x: 1-D info masking for Δf0/f0 resol., 5 to 20; y: 1-D info masking for azim. resol. (dB), 0 to 40; subjects S1, S2, S3]
[Figure: Frequency vs. rhythm in 1D (predicted). x: 1-D info masking for temp. fluct. (dB), 7 to 13; y: 1-D info masking for Δf0/f0 resol., 0 to 15; subjects S1, S2, S3]
Now let’s see real 2D segregation data
• First use Δx/x scales for both dimensions
• Then show the same data with both scales transformed to dB as indicated by the 1D informational masking data
[Figure: Azimuth vs. Pitch (rhythm same), spectrum < 2.4 kHz. Δx/x panel: x: Δf0/f0 (pitch), 0.1 to 1.0; y: azimuth segreg. thresh. (deg), 10 to 100. dB panel: x: 2D pitch segreg. info. masking (dB), 3 to 10; y: 2D azimuth segreg. info. masking (dB), 10 to 30. Subjects S1, S2, S3]
[Figure: Rhythm vs. Spectrum/Pitch (azimuth same), average fmod = 4.375 Hz. Δx/x panel: x: spectr./pitch Δf/f, 0.1 to 0.8; y: temp. fluct. segreg. thresh. (dB), 0 to 6. dB panel: x: 2D info. mask. for spectr./pitch segreg. (dB), 10 to 20; y: 2D info. mask. for rhythm segreg. (dB), 10 to 70. Subjects S1, S2, S3]
[Figure: Azimuth vs. rhythm (pitch/spectrum same), spectrum < 2.4 kHz. First panel: x: temp. fluct. for segregation (dB), 5 to 10; y: azimuth thresh. for segreg. (deg), 10 to 100. dB panel: x: temp. fluct. for segregation (dB), 4 to 11; y: info. mask. thresh. for azim. segreg. (dB), 0 to 40. Subjects S1, S2, S3]
Now let us compare predicted and obtained slopes of informational masking of one dimension by another:
The difference between predicted and observed slopes will be estimated by changing the angle between the x and y axes of the 1D data lines until they overlap with the 2D data lines.
The difference between predicted (=orthogonal) and obtained 2D slopes for each subject thus provides an estimate of the correlation between the segregation information carried by a particular pair of dimensions in the “cocktail-party” effect for that subject.
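The axis-rotation idea can be sketched geometrically. The code below is one possible reading, not the authors' procedure: it assumes the predicted line y = m·x is drawn on axes that are φ apart instead of 90°, so that its apparent slope becomes m·sin φ / (1 + m·cos φ), and it takes cos φ as the correlation estimate. Both the formula and the function names are assumptions of this sketch, and the slopes in the example are hypothetical.

```python
import math


def apparent_slope(m, phi):
    """Slope seen in an ordinary plot of y = m*x drawn on axes phi radians apart."""
    return m * math.sin(phi) / (1.0 + m * math.cos(phi))


def axis_angle(m_pred, m_obs, tol=1e-9):
    """Find phi in (0, pi) at which the predicted line overlaps the observed one."""
    if m_pred == 0.0:
        raise ValueError("predicted slope must be nonzero")
    # Solve sin(phi) - m_obs*cos(phi) = m_obs/m_pred via R*sin(phi - alpha).
    alpha = math.atan(m_obs)
    ratio = m_obs / (m_pred * math.hypot(1.0, m_obs))
    if abs(ratio) > 1.0:
        raise ValueError("no axis angle maps the predicted slope onto the observed one")
    s = math.asin(ratio)
    # The arcsine has two branches; keep the one that reproduces m_obs.
    for phi in (alpha + s, alpha + math.pi - s):
        if tol < phi < math.pi - tol:
            denom = 1.0 + m_pred * math.cos(phi)
            if abs(denom) > tol and abs(apparent_slope(m_pred, phi) - m_obs) < 1e-6:
                return phi
    raise ValueError("no axis angle in (0, pi) matches the observed slope")


def correlation_estimate(m_pred, m_obs):
    """Cosine of the axis angle; 0 means the two dimensions look orthogonal."""
    return math.cos(axis_angle(m_pred, m_obs))


# Hypothetical slopes: identical slopes give an estimate of (numerically) zero.
print(correlation_estimate(2.0, 2.0))
# A predicted slope of 1 seen on axes 60 degrees apart.
m_obs = apparent_slope(1.0, math.radians(60.0))
print(correlation_estimate(1.0, m_obs))
```

When the observed and predicted slopes agree, the axes stay at 90° and the estimate is zero, matching the slide's use of "orthogonal" for the predicted case.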
[Figure: Azimuth vs. Pitch (spectrum and rhythm same), spectrum < 2.4 kHz. x: 1-D info masking for Δf0/f0 resol., 5 to 20; y: 1-D info masking for azim. resol. (dB), 0 to 40; predicted/orthogonal (pred./orth.) and observed (obs.) lines; labeled values: 0.220, 0.152, 0.003; subjects S1, S2, S3]
[Figure: Azimuth vs. rhythm (pitch and spectrum same), spectrum < 1 kHz. x: 1-D info masking for temp. fluct. (dB), 6 to 13; y: 1-D info masking for azimuth resol. (dB), 40 to 80; predicted/orthogonal (pred./orth.) and observed (obs.) lines; labeled values: 0.217, 0.017, 0.251; subjects S1, S2, S3]
[Figure: Spectrum/Pitch vs. Rhythm (location same), spectrum 1< kHz. x: 1-D info masking for temp. fluct. (dB), 5 to 15; y: 1-D info masking for Δf0/f0 resol., 0 to 60; predicted/orthogonal (pred./orth.) and observed (obs.) lines; labeled values: 0.307, 0.340, 0.053; subjects S1, S2, S3]
Conclusions
• By and large, segregation cues provided by the three cardinal dimensions are not independent
• To segregate two streams, listeners will obtain cues from whatever dimension yields them the most easily
• Non-optimal choice of cues leads to interference between streams and between dimensions
• Segregation is likely to be helped by highlighting streams rather than by aiding the processing of a given dimension