Laboratory for Experimental ORL, K.U.Leuven, Belgium
Dept. of Electrotechn. Eng. ESAT/SISTA, K.U.Leuven, Belgium
Combining noise reduction and binaural cue preservation for hearing aids: MWF-ITF
Multichannel Wiener Filter with Interaural Transfer Function
Van den Bogaert T., Doclo S., Moonen M. and Wouters J.
ICASSP 2007
download at https://gilbert.med.kuleuven.be/~u0041407/2007ICASSP.pdf Contact: [email protected]
20-04-2007
Overview
• Problem statement: binaural hearing aids, noise reduction and preservation of binaural cues
• Multichannel Wiener Filter approaches:
  – MWF: a standard N-microphone Multichannel Wiener Filter approach
  – MWF-ITF: extension of MWF to an MWF with integrated Interaural Transfer Function
  – Experimental results: SRT and localization
    • Objective measures
    • Perceptual measures (N=5)
Problem statement
• Hearing impairment → reduction of speech intelligibility in background noise (even with amplification)
  – Signal processing to selectively enhance the useful speech signal
  – Multiple microphones available: spectral + spatial processing
  – Most hearing-impaired listeners are fitted with hearing aids at both ears
• Binaural hearing: everything relating to hearing with two ears simultaneously (vs. bilateral)
  – Binaural cues, in addition to spectral and temporal cues, play an important role in binaural noise reduction and sound localization (so it is important to preserve these cues)
  – It has been reported that current bilateral noise reduction systems have a negative impact on binaural hearing [Van den Bogaert et al. 2005, Van den Bogaert et al. 2006]
Problem statement
Bilateral system:
- If adaptive, typically no control on left/right processing
- Hard to preserve interaural cues (mic mismatch, imperfections, ...)
Binaural system:
+ More microphones = more performance?
+ More control on left/right processing?
- Need of binaural link
Problem statement
• Main binaural cues
  – Interaural phase or interaural time differences (IPD/ITD)
    • ITD range: from 10 µs to 700 µs
    • Of the signal for f < 1300 Hz and/or on the low-frequency envelope for complex sounds
  – Interaural level differences (ILD)
    • ILD range: from 1 dB up to more than 30 dB
    • Physically significant for f > 2000 Hz
[Figure: signal source and listener, illustrating IPD/ITD and ILD]
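The two cues can be illustrated on a synthetic two-channel signal. The following is a minimal sketch and not part of the original slides: the function name `estimate_itd_ild`, the broadband noise test signal, the 48 kHz sampling rate and the chosen delay and attenuation are all assumptions made for illustration. The ITD is read from the peak of the cross-correlation and the ILD from the power ratio of the two ear signals.

```python
import numpy as np

def estimate_itd_ild(left, right, fs):
    """Return (itd_seconds, ild_db); a positive ITD means the right ear lags."""
    n = len(left)
    # Full cross-correlation; output index n-1 corresponds to lag 0.
    xcorr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(xcorr)) - (n - 1)
    itd = lag / fs
    # Broadband level difference between the ears, in dB.
    ild = 10.0 * np.log10(np.sum(left**2) / np.sum(right**2))
    return itd, ild

# Hypothetical test signal: the right ear receives a delayed, attenuated copy.
fs = 48000
rng = np.random.default_rng(0)
src = rng.standard_normal(4800)
delay = 24                                  # 24 samples at 48 kHz = 500 µs
left = src
right = 0.5 * np.concatenate([np.zeros(delay), src[:-delay]])

itd, ild = estimate_itd_ild(left, right, fs)
print(f"ITD = {itd * 1e6:.0f} µs, ILD = {ild:.1f} dB")
```

The simulated delay of 500 µs lies inside the ITD range quoted above, and halving the amplitude at one ear corresponds to an ILD of roughly 6 dB.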
Problem statement
Design criteria for binaural noise reduction for HAs:
- Maximize noise reduction by using all available microphone signals (binaural link assumed); 2 outputs needed
- Preserve the binaural cues
- Limit the amount of speech distortion
(HA constraints: robustness of the system, low complexity, …)
Overview of binaural noise reduction techniques
• Different approaches
  – BSS methods (Robert Aichner, ICASSP, 3 days ago)
  – Fixed beamforming [e.g. Desloge 1997]
    • Low complexity
    • Limited performance, only speech cues may be preserved (in ideal situations)
  – CASA-based techniques [e.g. Wittkop 2003]
    • Perfect preservation of speech/noise cues
    • Mostly for 2 microphones, “spectral subtraction”-like problems
  – Adaptive beamforming, based on a GSC structure, passing the low-frequency part of the signal unprocessed [e.g. Welker 1997]
    • Preserves parts of the binaural cues
    • Substantial drop in noise reduction
  – Binaural multichannel Wiener filter [e.g. Doclo 2002, Spriet 2004]
    • Speech cues are preserved
    • No assumptions about positions of sources and microphones
    • Noise cues may be distorted
Extension of MWF: preservation of binaural speech and noise cues without substantially compromising noise reduction performance
Multichannel Wiener Filter
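[Figure: a hearing aid listening scenario. A speaker $S(\omega)$ and a noise source $N(\omega)$ are picked up by the left and right hearing aids; the microphone signals $Y_0(\omega)$ and $Y_1(\omega)$ are filtered by $W_0(\omega)$ and $W_1(\omega)$ to give the outputs $Z_0(\omega)$ and $Z_1(\omega)$]
2M microphones, $m = 0 \ldots (M-1)$ on each hearing aid:
$$Y_{0,m}(\omega) = X_{0,m}(\omega) + V_{0,m}(\omega)$$
$$Y_{1,m}(\omega) = X_{1,m}(\omega) + V_{1,m}(\omega)$$
with $X$ the speech component and $V$ the noise component
Filtered output: $Z_0(\omega) = \mathbf{W}_0^H(\omega)\,\mathbf{Y}(\omega)$ and $Z_1(\omega) = \mathbf{W}_1^H(\omega)\,\mathbf{Y}(\omega)$
Goal: to estimate the speech component at the reference microphone of each hearing aid ($r_0$, $r_1$), typically the front omnidirectional one
Standard multichannel Wiener filter: $\partial J_{MSE} / \partial \mathbf{W} = 0$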
Multichannel Wiener Filter
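To control or reduce speech distortion, rewrite the cost function by introducing a trade-off parameter $\mu$ between noise reduction and speech distortion:
$$J_{MSE}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\}$$
(first term: speech distortion; second term: noise reduction; $\mu$: trade-off parameter)
Speech distortion weighted multichannel Wiener filter:
$$\frac{\partial J_{MSE}}{\partial \mathbf{W}} = 0 \;\Rightarrow\; \mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}$$
In standard hearing aid beamforming, speech distortion is typically avoided by calibrating the speech reference path and removing the speech component from the noise reference path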
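The speech distortion weighted filter can be sketched numerically for one frequency bin and one reference microphone. This is an illustration under stated assumptions, not the authors' implementation: the function name `sdw_mwf`, the symbol `mu` for the trade-off parameter, the rank-1 speech covariance and the random noise covariance are all hypothetical choices. Minimizing E|X_ref - w^H X|^2 + mu E|w^H V|^2 gives w = (Rx + mu Rv)^-1 Rx e_ref, which is what the code computes.

```python
import numpy as np

def sdw_mwf(Rx, Rv, ref, mu=1.0):
    """SDW-MWF for one frequency bin: w = (Rx + mu*Rv)^-1 Rx e_ref."""
    e_ref = np.zeros(Rx.shape[0])
    e_ref[ref] = 1.0                      # selects the reference microphone
    return np.linalg.solve(Rx + mu * Rv, Rx @ e_ref)

def output_snr_db(w, Rx, Rv):
    """SNR at the filter output, from the speech and noise covariances."""
    speech = np.real(w.conj() @ Rx @ w)
    noise = np.real(w.conj() @ Rv @ w)
    return 10.0 * np.log10(speech / noise)

# Hypothetical scenario: 4 microphones, one speech source (rank-1 Rx),
# random positive definite noise covariance.
rng = np.random.default_rng(1)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rx = np.outer(a, a.conj())
Rv = B @ B.conj().T + 0.1 * np.eye(M)

ref = 0
snr_in = 10.0 * np.log10(np.real(Rx[ref, ref]) / np.real(Rv[ref, ref]))
w1 = sdw_mwf(Rx, Rv, ref, mu=1.0)
w5 = sdw_mwf(Rx, Rv, ref, mu=5.0)
snr_out = output_snr_db(w1, Rx, Rv)
noise1 = np.real(w1.conj() @ Rv @ w1)     # residual noise power, mu = 1
noise5 = np.real(w5.conj() @ Rv @ w5)     # residual noise power, mu = 5
print(f"SNR in {snr_in:.1f} dB, SNR out {snr_out:.1f} dB")
```

A larger `mu` puts more weight on the noise term, so the residual noise power drops at the cost of more speech distortion.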
Multichannel Wiener Filter
$$\mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}, \quad \mathbf{R} = \begin{bmatrix} \mathbf{R}_x + \mu\mathbf{R}_v & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_x + \mu\mathbf{R}_v \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} \mathbf{r}_{x0} \\ \mathbf{r}_{x1} \end{bmatrix}, \quad \mathbf{R}_x = \mathbf{R}_y - \mathbf{R}_v$$
($\mathbf{R}$ and $\mathbf{r}$ are estimated as a function of the VAD)
• Depends only on the second-order statistics of speech and noise; no assumptions about the speech and noise sources (can be integrated in the VAD)
• Perfectly preserves the interaural cues of the speech component, since the left and the right hearing aid each estimate the speech component in their own front microphone
• Shifts the interaural cues of the noise component to the cues of the speech component!
Add a term related to the binaural cues of the noise component to the MWF cost function
Possible cues: ITD, ILD, Interaural Transfer Function (ITF)
→ MWF-ITF
Multichannel Wiener Filter – ITF extension
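$$J_{tot}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\} + \alpha\, E\left\{ \left| \mathbf{W}_0^H \mathbf{X} - ITF_{in}^x\, \mathbf{W}_1^H \mathbf{X} \right|^2 \right\} + \beta\, E\left\{ \left| \mathbf{W}_0^H \mathbf{V} - ITF_{in}^v\, \mathbf{W}_1^H \mathbf{V} \right|^2 \right\}$$
Under the assumption of a single noise source:
$$ITF_{in}^v = \frac{E\{V_{0,r_0} V_{1,r_1}^*\}}{E\{V_{1,r_1} V_{1,r_1}^*\}} = \frac{\mathbf{R}_v(r_0, r_1)}{\mathbf{R}_v(r_1, r_1)}$$
This can be done for both the speech and the noise component
Goal: the ITF of the noise component at the output = ITF at the input
What is the influence of beta and alpha on the localization and SNR improvement performance?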
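The effect of the ITF terms can be explored numerically. The sketch below is an illustration, not the authors' code: it assumes that the total cost is the speech distortion weighted MWF cost plus `alpha` and `beta` times the speech and noise ITF terms, writes each ITF term as a quadratic form in the stacked filter W = [W0; W1], and solves the resulting linear system for one frequency bin. The function names, the reference microphone indices and the random covariance matrices are hypothetical.

```python
import numpy as np

def mwf_itf(Rx, Rv, r0, r1, mu=1.0, alpha=0.0, beta=0.0):
    """Return (W0, W1) minimizing the MWF cost plus the ITF terms, one bin."""
    n = Rx.shape[0]
    itf_x = Rx[r0, r1] / Rx[r1, r1]       # input ITF of the speech component
    itf_v = Rv[r0, r1] / Rv[r1, r1]       # input ITF of the noise component

    def itf_block(R, itf):
        # Quadratic form of E|W0^H S - itf * W1^H S|^2 in W = [W0; W1].
        return np.block([[R, -np.conj(itf) * R],
                         [-itf * R, abs(itf) ** 2 * R]])

    Rm = Rx + mu * Rv
    Z = np.zeros((n, n))
    A = np.block([[Rm, Z], [Z, Rm]])
    A = A + alpha * itf_block(Rx, itf_x) + beta * itf_block(Rv, itf_v)
    r = np.concatenate([Rx[:, r0], Rx[:, r1]])
    W = np.linalg.solve(A, r)
    return W[:n], W[n:]

def output_itf(W0, W1, R):
    """ITF at the output for a component with input covariance R."""
    return (W0.conj() @ R @ W1) / (W1.conj() @ R @ W1)

# Hypothetical scenario: 2 microphones per hearing aid, rank-1 speech.
rng = np.random.default_rng(2)
n, r0, r1 = 4, 0, 2
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Rx = np.outer(a, a.conj())
Rv = B @ B.conj().T + 0.1 * np.eye(n)

itf_x_in = Rx[r0, r1] / Rx[r1, r1]
itf_v_in = Rv[r0, r1] / Rv[r1, r1]
itf_v_mwf = output_itf(*mwf_itf(Rx, Rv, r0, r1), Rv)             # beta = 0
itf_v_big = output_itf(*mwf_itf(Rx, Rv, r0, r1, beta=1e6), Rv)   # large beta
print(abs(itf_v_mwf - itf_x_in), abs(itf_v_big - itf_v_in))
```

With `beta = 0` the noise component at the output takes on the ITF of the speech component, and with a very large `beta` it returns to the input noise ITF. Intermediate values trade cue preservation against noise reduction.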
Experimental results
• Identification of HRTFs:
  – Binaural recordings on a CORTEX MK2 artificial head
  – 2 omnidirectional microphones on each hearing aid (d = 1 cm)
  – HRTFs measured at -90°:15°:90° and 90°:30°:270°, 1 m from the head
  – Conditions: T60 = 140 ms (T60 = 590 ms added), fs = 16 kHz, μ = 1
• Objective evaluation:
  – AI-weighted SNR improvement
  – ITD and ILD error
• Perceptual evaluation:
  – Headphone experiments with the recorded HRTFs
  – SRT measurements (50% speech intelligibility)
  – Localization using the prerecorded HRTFs: the S and N components are sent separately through the fixed filter, and subjects localize S and N in the room where the HRTFs were recorded
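Experimental results
[Block diagram: the left and right input signals are transformed with an FFT, $\mathbf{Y}(\omega) = \mathbf{X}(\omega) + \mathbf{V}(\omega)$, and filtered in the frequency domain with $\mathbf{W}(\omega) = [\mathbf{W}_0(\omega); \mathbf{W}_1(\omega)]$; an IFFT gives the left output $Z_0 = Z_{x0} + Z_{v0}$ and the right output $Z_1 = Z_{x1} + Z_{v1}$. A VAD drives the off-line computation of the statistics $\mathbf{R}_v(\omega)$, $\mathbf{R}_x(\omega)$, from which the filters are calculated for this specific scenario]
Stored filters are converged for a condition SxNy, with Sx = speech-weighted noise from angle x and Ny = babble noise from angle y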
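The off-line computation of the statistics can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions: a perfect VAD is assumed, the noise is assumed stationary, and the function name `estimate_statistics` and the synthetic signals are hypothetical. The noisy-speech correlation matrix is averaged over speech-active frames, the noise correlation matrix over noise-only frames, and the speech correlation matrix follows from their difference.

```python
import numpy as np

def estimate_statistics(Y, vad):
    """Y: (frames, mics) complex STFT coefficients of one bin; vad: bool mask."""
    Ys, Yn = Y[vad], Y[~vad]
    Ry = Ys.T @ Ys.conj() / len(Ys)       # E{Y Y^H} on speech + noise frames
    Rv = Yn.T @ Yn.conj() / len(Yn)       # E{V V^H} on noise-only frames
    Rx = Ry - Rv                          # speech statistics by subtraction
    return Rx, Rv

rng = np.random.default_rng(3)
frames, mics = 40000, 4

def cgauss(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical transfer functions of one speech and one noise source.
a = rng.standard_normal(mics) + 1j * rng.standard_normal(mics)
b = rng.standard_normal(mics) + 1j * rng.standard_normal(mics)

vad = rng.random(frames) < 0.5                         # perfect VAD labels
X = cgauss(frames, 1) * a * vad[:, None]               # speech only when active
V = 0.5 * (cgauss(frames, 1) * b + 0.1 * cgauss(frames, mics))
Rx_hat, Rv_hat = estimate_statistics(X + V, vad)

Rx_true = np.outer(a, a.conj())
rel_err = np.linalg.norm(Rx_hat - Rx_true) / np.linalg.norm(Rx_true)
print(f"relative error of the speech statistics: {rel_err:.3f}")
```

In practice VAD errors and non-stationary noise degrade these estimates, which this idealized example does not model.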
Experimental results: objective evaluation
• Error measures correlated with the design criteria:
  – Maximize speech intelligibility: intelligibility-weighted SNR improvement (left/right)
    $$\Delta SNR_L = \sum_i I_i \, \Delta SNR_{L,i}$$
    with $I_i$ the importance of the i-th frequency bin for speech intelligibility
  – Minimize interaural cue distortion
    • ITD of speech and noise component (low-pass filter, 1500 Hz):
      $$\Delta ITD_x = \sum_i I_i \, \Delta ITD_{x,i}, \quad \Delta ITD_{x,i} = \left| \angle E\{X_{0,r_0} X_{1,r_1}^*\} - \angle E\{Z_{x0} Z_{x1}^*\} \right|$$
    • ILD of speech and noise component:
      $$\Delta ILD_x = \sum_i \left| ILD_{out,i}^x - ILD_{in,i}^x \right|$$
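The intelligibility-weighted SNR improvement is a weighted sum of per-band SNR improvements. The sketch below illustrates the arithmetic only: the function name, the four bands and the importance weights are hypothetical placeholder values, not standardized band-importance data, and the weights are assumed to be normalized so that they sum to 1.

```python
import numpy as np

def weighted_snr_improvement(snr_in_db, snr_out_db, importance):
    """Sum over bands of I_i * (SNR_out,i - SNR_in,i), weights normalized."""
    w = np.asarray(importance, dtype=float)
    w = w / w.sum()
    delta = np.asarray(snr_out_db, dtype=float) - np.asarray(snr_in_db, dtype=float)
    return float(np.sum(w * delta))

# Placeholder example with four frequency bands.
importance = [0.1, 0.3, 0.4, 0.2]
snr_in_db = [0.0, 0.0, 0.0, 0.0]
snr_out_db = [5.0, 10.0, 15.0, 10.0]
delta_snr = weighted_snr_improvement(snr_in_db, snr_out_db, importance)
print(delta_snr)   # 0.1*5 + 0.3*10 + 0.4*15 + 0.2*10 = 11.5
```

Bands that matter more for intelligibility dominate the result, so a large improvement in a low-importance band contributes little.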
objective evaluation: localization
[Figure, condition S0N60: four surface plots showing the ILD error [dB] of the speech component, the ILD error [dB] of the noise component, the ITD error [rad] of the speech component and the ITD error [rad] of the noise component, each as a function of two filter parameters (axis ranges 0 to 0.2 and 0 to 0.4)]
objective evaluation: SRT
$$\Delta SNR_L = \sum_i I_i \, \Delta SNR_{L,i}$$
with $I_i$ the importance of the i-th frequency bin for speech intelligibility
[Figure, condition S0N60, T60 = 0.14 s: intelligibility-weighted SNR improvement [dB] at the left ear and at the right ear (vertical axis 0 to 25 dB), as a function of α and β]
Additions:
– For T60 = 0.59 s, the left-ear performance for S0N60 drops to 5 dB SNR AI and the right-ear performance drops to 7 dB SNR AI noise reduction
– Going from 2 to 4 microphones gives a gain of about 3 dB SNR AI to 9 dB SNR AI compared to the 2-microphone performance
SRT: perceptual evaluation
[Figure: SRT improvement (vertical axis from 9 to 15) as a function of beta (0, 0.1, 0.3, 1, 10); VU noise at 60°, alpha = 0]
– Adaptive SRT procedure to find the 50% Speech Reception Threshold
– S0N60, Dutch VU sentences, T60 = 140 ms
– Average SRT without processing = -9.2 dB
– SRT improvements in the range 11-13 dB
The binaural speech intelligibility advantage due to the spatial separation of the speech and noise components does not seem to compensate for the loss in SNR improvement
Addition: performance drops to around 6 dB SRT gain if T60 = 590 ms (S0N90 tested for N=2)
Localization: perceptual evaluation
• Condition SxN0: speech arrives from angle x, with x from -90° to +90° in steps of 30°; noise arrives from 0°.
• Perceptual procedure: calculate the MWF filters offline, trained on the spatial condition SxNy. Then run a telephone ring arriving from angle x and from angle y separately through the filters and store the results. Play these wav files over headphones to the subject and ask the subject to localize the telephone signal.
Localization: perceptual evaluation
[Figure, condition SxN0: four panels of presented and perceived angle (both axes from -90° to 90° in steps of 15°), showing the localization of Sx and the localization of N0 for alpha = 0, beta = 0 and for alpha = 0, beta = 10]
Localization: perceptual evaluation
[Figure, condition SxN0, 5 subjects: localization error (°) of Sx in SxN0 and of N0 in SxN0, shown as a function of the angle x (-90° to 90° in steps of 30°) for beta = 0, 0.1, 0.3, 1, 10 and 100, and as a function of beta (0, 0.1, 0.3, 1, 10, 100) for alpha = 0 and alpha = 0.5]
Localization: perceptual evaluation
• Sum of the localization errors of Sx and N0
• Parameters can be tuned to achieve better overall localization performance, at the cost of some noise reduction
• There is a correlation between the physical and the perceptual evaluation, even for localization. However, the error measures are far from perfect (they do not include diffuseness, …)
[Figure, condition SxN0, 5 subjects: localization error of Sx + localization error of N0 (°, vertical axis from 0 to 80) as a function of beta (0, 0.1, 0.3, 1, 10, 100) for alpha = 0 and alpha = 0.5]
Conclusions
• The (speech distortion weighted) MWF preserves the speech cues, not the noise cues
• By constraining the filters W to a region where the noise ITF is preserved, MWF-ITF enables a trade-off between preservation of speech and noise cues and noise reduction performance (a solution, but not the perfect solution: multiple spectrally overlapping noise sources, ...)
• Preserving localization cues showed neither a large benefit in SRT score (expected from the spatial separation of speech and noise) nor a large reduction (expected from the extra constraints set on W).
Acknowledgements
download at https://gilbert.med.kuleuven.be/~u0041407/2007ICASSP.pdf Contact: [email protected]
objective evaluation: SRT (additions)
• The gain of going from 2 mics on one HA to 3 or 4 mics (low reverb):
  – Single noise scenario:
    • 2-mic performance: 5 to 19 dB SNR AI improvement
    • 2 to 3 mics: +2/+5 dB SNR AI improvement
    • 3 to 4 mics: +1/+4 dB SNR AI improvement
    • (max if the noise source is at the position of the 2 mics -> adding a good SNR signal as 3rd or 4th microphone)
  – Multiple noise sources (3 noise sources):
    • 2-mic performance: around 7 dB SNR AI improvement
    • 2 to 3 mics: +2 dB SNR AI improvement
    • 3 to 4 mics: +2 dB SNR AI improvement
Localization: objective evaluation
[Figure, condition SxN0: ITD error [%] of the speech component and ITD error [%] of the noise component (vertical axis from 0 to 60) as a function of the angle of the speech source (-80° to 80°), for beta = 0, 0.1, 0.3, 1, 10 and 100 (VU_man + auditec 0deg, SNR = 0 dB)]
Large changes in direction from the speech component to the noise component → increase the weight (cf. physical and perceptual evaluation)