Piotr Majdak, Robert Baumgartner, and Bernhard Laback



1. INTRODUCTION

➢ The ability to localize sound sources in sagittal planes (top, down, front, back, see Fig. 1) varies considerably across listeners. Why some listeners localize better than others is not yet clear.

➢ The directional acoustic spectral features, described by directional transfer functions (DTFs), also vary considerably across listeners.

➢ We investigated to what extent the listener-specific quality of directional cues provided by the DTFs contributes to the listener-specific sound-localization performance.

➢ We used a model of sagittal-plane sound localization to simulate listening with others’ ears after a complete recalibration to the tested DTFs.

2. METHODS

➢ Localization model (Baumgartner et al., 2014):

• Inputs:

▫ DTFs for the filtering of the target sound

▫ Template DTFs the listener is calibrated to

▫ Sensitivity S: listener's efficiency in processing localization cues (S = 0: extremely good localizer; S = 2: extremely poor localizer)

• Output: probability of responding at a polar angle given model parameters

• Structure (simplified, see Fig. 2):

▫ Peripheral processing (the same for the target and template): Filtering of incoming sound by a binaural pair of DTFs; Spectral analysis: Gammatone filter bank, temporal average, logarithmic amplitude; Simulation of DCN functionality by positive spectral gradient extraction → Internal cue considering spectral rising edges only

▫ Comparison process (target vs. each template): Absolute differences of internal cue averaged across frequency bands → Distance metric

▫ Spatial mapping: Distance metric → Monaural similarity (listener-specific ability to discriminate spectral cues) → Binaural similarity (binaural weighting according to target's lateral angle) → Sensorimotor mapping (Gaussian response scatter constant in elevation accounting for lateral compression of polar dimension) → Probability mass vector (PMV, normalization assuming the discrete distribution of similarity indices being proportional to distribution of polar-angle responses)

• Evaluated for various effects of modifications of DTFs or target sounds on localization performance (Baumgartner et al., 2014)

• Implemented in the Auditory Modeling Toolbox: baumgartner2014
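The comparison and mapping stages listed above can be illustrated with a heavily simplified, monaural sketch. It is not the baumgartner2014 implementation: the Gammatone filter bank, the binaural weighting, and the sensorimotor scatter are omitted, band levels in dB are taken as given, and the sigmoid form of the similarity function together with its slope `gamma = 6` are assumptions chosen only for illustration. All function names are hypothetical.

```python
import math
import sys

def positive_gradient(band_levels_db):
    """Keep only rising spectral edges: rectified difference of adjacent bands."""
    return [max(hi - lo, 0.0) for lo, hi in zip(band_levels_db, band_levels_db[1:])]

def distance(target_cue, template_cue):
    """Distance metric: absolute cue differences averaged across bands."""
    diffs = [abs(t - p) for t, p in zip(target_cue, template_cue)]
    return sum(diffs) / len(diffs)

def similarity(d, sensitivity, gamma=6.0):
    """ASSUMED sigmoid: small distances map to similarity near 1.
    A larger sensitivity S shifts the sigmoid so that larger distances
    still count as similar (poorer discrimination)."""
    floor = sys.float_info.epsilon  # keeps the later normalization well defined
    return 1.0 - 1.0 / (1.0 + math.exp(-gamma * (d - sensitivity))) + floor

def pmv(target_levels_db, templates_db, sensitivity):
    """Probability mass vector over the template polar angles."""
    target_cue = positive_gradient(target_levels_db)
    sims = [
        similarity(distance(target_cue, positive_gradient(t)), sensitivity)
        for t in templates_db
    ]
    total = sum(sims)
    return [s / total for s in sims]

# Toy band levels (dB) for three template directions; the target matches the first.
templates = [[0, 3, 6, 3, 0], [6, 3, 0, 3, 6], [0, 0, 0, 0, 0]]
p_good = pmv(templates[0], templates, sensitivity=0.5)
p_poor = pmv(templates[0], templates, sensitivity=2.0)
print(p_good, p_poor)
```

In this toy case the probability mass concentrates on the matching template for the small S and spreads to a neighbouring template for the large S, which is the qualitative behaviour the sensitivity parameter is meant to capture.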

➢ Listener-specific localization responses from Goupell et al. (2010) and Majdak et al. (2013):

• Subjects: 23 normal-hearing listeners

• Stimuli: 500-ms frozen token of white noise, filtered by listener-specific free-field DTFs measured at a distance of 1.2 m for elevations from −30° to 80° and azimuths all around the listener (spacing: 2.5° within ±45°; 5° elsewhere)

• Apparatus: virtual visual environment, manual pointer response

• Visual training (first-person shooter game) and acoustic training (with feedback)

• Localization test: at least 300 trials (without feedback, mixed with other conditions of the corresponding study)

➢ Data analysis:

• Quadrant error rate (QE): Relative occurrence of target-to-response deviations > 90°, i.e., typically front/back confusions

• RMS local polar errors (PE): Combined measure of accuracy and precision of local responses (i.e., QE removed)

• Predicted QE and PE: expected values computed from the model PMVs

• Relation between actual and predicted performance: Pearson's correlation coefficient, r
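The measures listed above can be written out as a minimal sketch. The assumptions are mine, not the poster's: polar angles are in degrees, deviations are wrapped onto the circle, responses deviating by exactly 90° count as local, and the expected PE is computed from the PMV restricted to local responses. Function names are hypothetical.

```python
import math

def polar_deviation(target_deg, response_deg):
    """Absolute polar-angle deviation wrapped to the range [0, 180] degrees."""
    return abs((response_deg - target_deg + 180.0) % 360.0 - 180.0)

def quadrant_error_rate(targets, responses):
    """QE in percent: share of responses deviating by more than 90 degrees."""
    devs = [polar_deviation(t, r) for t, r in zip(targets, responses)]
    return 100.0 * sum(d > 90.0 for d in devs) / len(devs)

def local_polar_error(targets, responses):
    """PE in degrees: RMS deviation of the local responses (quadrant errors removed)."""
    local = [d for d in (polar_deviation(t, r) for t, r in zip(targets, responses))
             if d <= 90.0]
    return math.sqrt(sum(d * d for d in local) / len(local)) if local else float("nan")

def expected_errors(target_deg, response_angles, pmv):
    """Expected QE (percent) and PE (degrees) for one target, given a PMV."""
    devs = [polar_deviation(target_deg, a) for a in response_angles]
    p_quadrant = sum(p for d, p in zip(devs, pmv) if d > 90.0)
    p_local = sum(p for d, p in zip(devs, pmv) if d <= 90.0)
    if p_local == 0.0:
        return 100.0 * p_quadrant, float("nan")
    pe = math.sqrt(sum(p * d * d for d, p in zip(devs, pmv) if d <= 90.0) / p_local)
    return 100.0 * p_quadrant, pe

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data: four trials with the target at 0 degrees, one front/back confusion.
qe = quadrant_error_rate([0, 0, 0, 0], [10, -10, 170, 20])
pe = local_polar_error([0, 0, 0, 0], [10, -10, 170, 20])
print(qe, pe)
```

The wrapping step matters because polar angles near the rear (for example 260° and −80°) describe neighbouring directions even though their numerical difference is large.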

➢ Model calibration:

• Listener-specific DTFs for template and incoming sound

• PMVs: for each listener, for all target directions, and for varying sensitivity S ranging from 0 to 2 in steps of 0.1 (Fig. 3)

• QEs and PEs: for each listener, as functions of sensitivity (Fig. 4).

• Optimal sensitivity: the value of S that minimizes the sum of squared residuals between the actual and predicted performance (Fig. 5A)
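The calibration procedure amounts to a grid search over S. The sketch below assumes a hypothetical callback `predict(S)` that wraps the model and returns the predicted (QE, PE) pair for one listener. How the QE and PE residuals are weighted against each other is not stated here, so the unweighted sum used below is an assumption.

```python
def sensitivity_grid(start=0.0, stop=2.0, step=0.1):
    """Sensitivities from start to stop (inclusive), rounded to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]

def optimal_sensitivity(predict, actual_qe, actual_pe, grid=None):
    """Return the grid value of S with the smallest sum of squared residuals.

    predict: hypothetical callable S -> (predicted_qe, predicted_pe).
    """
    if grid is None:
        grid = sensitivity_grid()

    def cost(s):
        qe, pe = predict(s)
        # ASSUMPTION: QE and PE residuals are summed without weighting.
        return (qe - actual_qe) ** 2 + (pe - actual_pe) ** 2

    return min(grid, key=cost)

# Toy stand-in for the model: both errors grow linearly with S.
def toy_predict(s):
    return 5.0 + 10.0 * s, 25.0 + 8.0 * s

best = optimal_sensitivity(toy_predict, actual_qe=12.0, actual_pe=30.6)
print(best)
```

A finer grid or a continuous optimizer could replace the 0.1 step, but the grid mirrors the sampling of S described in the list above.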

3. RESULTS

➢ DTF sets and the sensitivity S systematically permuted across the members of the listener group:

• Each listener indexed by k

• Sk is the sensitivity of the k-th listener

• Dk is the DTF set of the k-th listener

• PEs and QEs predicted as functions of Sk and Dk (see Fig. 6)

• Reference: Optimal listener-specific sensitivity (Fig. 5A)

➢ Effect of listener-specific DTFs:

• Same sensitivity Sk for all listeners

• Fig. 5B: Sensitivity for best correlation considering PE and QE

• Fig. 5C: Average sensitivity (cf. Langendijk and Bronkhorst, 2002)

➢ Effect of listener-specific sensitivity:

• The same DTF set Dk for all listeners

• Fig. 5D: DTF set showing best correlation (DTF set of listener k = 2)

➢ Comparison of variance caused by one of the two factors:

▫ Listener-specific sensitivity:

• SD of errors for listener-constant DTF

▫ Listener-specific DTF sets:

• SD of errors for listener-constant sensitivity

• DTF set contributed less than sensitivity to the performance variability of the group (see Fig. 7). On average, the factor sensitivity caused more than twice as much variability as the factor DTF set.
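The variance comparison can be sketched on a matrix of predicted errors in which one index selects the sensitivity Sk and the other the DTF set Dk. The matrix layout, the use of the sample standard deviation, and the toy numbers are assumptions for illustration only and are not taken from the poster's data.

```python
from statistics import mean, stdev

def variability_by_factor(errors):
    """errors[i][j]: predicted error using the sensitivity of listener i
    and the DTF set of listener j (square matrix, ASSUMED layout).

    Returns two lists of SDs:
      sd_sensitivity[j]: DTF set j held constant, sensitivity varied
      sd_dtf[i]:         sensitivity i held constant, DTF set varied
    """
    n = len(errors)
    sd_sensitivity = [stdev([errors[i][j] for i in range(n)]) for j in range(n)]
    sd_dtf = [stdev([errors[i][j] for j in range(n)]) for i in range(n)]
    return sd_sensitivity, sd_dtf

# Toy matrix: the error grows by 4 per sensitivity step and by 1 per DTF set.
toy = [[30.0 + 4.0 * i + 1.0 * j for j in range(3)] for i in range(3)]
sd_s, sd_d = variability_by_factor(toy)
ratio = mean(sd_s) / mean(sd_d)
print(sd_s, sd_d, ratio)
```

In this toy matrix the sensitivity factor produces four times the spread of the DTF factor by construction; with real model predictions the same ratio would quantify which factor dominates.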

4. CONCLUSIONS

➢ The listener-specifically calibrated model predictions yielded a correlation between actual and predicted performance of 0.91.

➢ The permutation of the listener-specific sensitivity affected the predicted localization performance much more (correlation: 0.22) than the permutation of the DTF sets (correlation: 0.82).

➢ This suggests that the across-listener variability in sagittal-plane localization performance is only marginally attributable to the quality of directional cues in human DTFs.

➢ Rather, the sensitivity parameter, supposed to represent the listener's efficiency in processing directional cues (e.g., spectral-shape sensitivity, Andéol et al., 2013), appears to be more important.

➢ This finding might be relevant for developing non-spatial experimental measures serving as predictors of listener-specific sound-localization performance.

5. REFERENCES

Baumgartner, R., Majdak, P., and Laback, B. (2014). “Modeling sound-source localization in sagittal planes for human listeners,” J Acoust Soc Am, 136, 791–802.

Langendijk, E. H. A. and Bronkhorst, A. W. (2002). “Contribution of spectral cues to human sound localization,” J Acoust Soc Am, 112, 1583–1596.

Majdak, P., Walder, T., and Laback, B. (2013). “Effect of long-term training on sound localization performance with spectrally warped and band-limited head-related transfer functions,” J Acoust Soc Am, 134, 2148–2159.

Goupell, M. J., Majdak, P., and Laback, B. (2010). “Median-plane sound localization as a function of the number of spectral channels using a channel vocoder,” J Acoust Soc Am, 127, 990–1001.

Andéol, G., Macpherson, E. A., and Sabin, A. T. (2013). “Sound localization in noise and sensitivity to spectral shape,” Hear Res, 304, 20–27.

Listener-Specific Sound-Localization Performance: A Matter of Better Ears?

Piotr Majdak, Robert Baumgartner, and Bernhard Laback

Acoustics Research Institute, Austrian Academy of Sciences, Austria

38th Annual Mid-Winter Meeting of the Association for Research in Otolaryngology, February 21-25, 2015, Baltimore, MD


Corresponding author: Piotr Majdak, Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, A-1040 Wien, Austria

E-Mail: [email protected] http://www.kfs.oeaw.ac.at

This work was supported by the Austrian Science Fund (FWF P 24124).

Fig. 3: Actual and modeled localization. Actual localization responses (circles) and modeled response probabilities (PMVs, brightness encoded) calculated for three sensitivities S and four exemplary listeners indexed by k.

Fig. 1: Interaural-polar coordinate system. Figure labels: polar angle, lateral angle.

Fig. 6: Localization performance depends on the sensitivity and DTF set. Predicted PEs and QEs as functions of the sensitivity of k-th listener (Sk) and DTF set of k-th listener (Dk). The listener group was sorted such that the sensitivity increases with increasing k and the same sorting order was used for Dk.

Fig. 7: DTF set contributes less than sensitivity to the performance variability of the group. SDs of PE and QE as functions of either listener-constant DTF set calculated for listener-specific sensitivities (Sk varied, blue squares) or the listener-constant sensitivity calculated for listener-specific DTF sets (DTF varied, red circles). The abscissa is sorted by the ascending listener-specific sensitivity Sk.

Fig. 2: Simplified structure of the localization model from Baumgartner et al. (2014). The incoming target sound is peripherally processed, compared to an internal template set, and mapped yielding the probability for responding at a given polar angle. Blue arrows: free parameters of the corresponding sections.

Fig. 4: Predicted localization performance depends on the sensitivity. PEs and QEs as functions of S for four exemplary listeners (color from Fig. 3). Lines: model predictions. Symbols: actual performance obtained in the localization experiment (placement on the abscissa: optimal listener-specific sensitivity Sk).

Fig. 5: Predicted versus actual localization performance. Predicted PEs and QEs as functions of the actual PEs and QEs, respectively, for each listener. (A) Optimal listener-specific sensitivities Sk. (B) Listener-constant sensitivity yielding best correlation for PE and QE, S = 1.05. (C) Listener-constant sensitivity corresponding to across-listener average, S = 0.70. (D) Listener-specific sensitivity Sk and the same DTF set (k = 2) for all listeners. The correlation coefficient is denoted by r.

Fig. 2 block labels: Incoming sound, DTF set, Template set, Peripheral Processing (for both incoming sound and template set), Comparison Process, Spatial Mapping, Sensitivity, Response probabilities.