Unsupervised Signal Processing_Channel Equalization and Source Separation
Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source...
Transcript of Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source...
Source Separation Tutorial Mini-Series ISpeech enhancement algorithms
Eunjoon Cho
Stanford University, EE
April 2nd, 2013
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 1 / 41
Speech enhancement: A source separation perspective
Source separation: Decoupling of two or more sources with no, littleor some prior information.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 2 / 41
Speech enhancement: A source separation perspective
Source separation: Decoupling of two or more sources with no, littleor some prior information.
Speech enhancement is a natural application for source separation.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 2 / 41
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.
A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.
Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
Speech enhancement: Separate sources
Based on an estimate of the noise, how can we estimate the speech.
Spectral subtraction [Bol79]Wiener filtering: MMSE Estimator [LO79]Spectral Amplitude MMSE Estimator [EM84]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 6 / 41
Frequency domain approaches
Discuss approaches in the frequency domain: Y (ω) = X(ω) +D(ω)
Overall flow of STFT processing
y(n)w(n − mR)
ym(n) Ym(w)
Xm(w)x(n) xm(n)Overlap add IFT
FT
Estimation
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 7 / 41
Spectral subtraction
|Ym(ω)|
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Spectral subtraction
|Ym(ω)|, |Dm(ω)|
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Spectral subtraction
|Ym(ω)|, E [|Dm(ω)|]
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Spectral subtraction
|Xm(ω)| = |Ym(ω)| − E [|Dm(ω)|]
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Spectral subtraction
|Xm(ω)| = max{|Ym(ω)| − E [|Dm(ω)|] , 0}
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Spectral subtraction
|Xm(ω)| = max{|Ym(ω)| − E [|Dm(ω)|] , 0}∠Xm(ω) = ∠Ym(ω)
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
Power spectral subtraction
|Xm(ω)|α = max{|Ym(ω)|α − E [|Dm(ω)|α] , 0}
The power spectral subtraction: α = 2
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 9 / 41
Power spectral subtraction
|Xm(ω)|α = max{|Ym(ω)|α − E [|Dm(ω)|α] , 0}
The power spectral subtraction: α = 2
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 9 / 41
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
Gain for power spectral subtraction
Hm(ω) =
√γ(ω)− 1
γ(ω)
0 5 10 15 20−25
−20
−15
−10
−5
0
SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 11 / 41
Gain for general spectral subtraction
|Xm(ω)|α = |Ym(ω)|α − E [|Dm(ω)|α]Different gain functions for various α
Hm(ω) =|Xm(ω)||Ym(ω)|
=
(γ(ω)α/2 − 1
γ(ω)α/2
)1/α
0 5 10 15 20−50
−45
−40
−35
−30
−25
−20
−15
−10
−5
0
SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
α = 1
α = 2
α = 3
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 12 / 41
Example of spectral subtraction
Noisy speech
Power spectral subtraction
Magnitude spectral subtraction
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 13 / 41
Musical noise
Q. Why do we get musical noise?
A1. Inaccurate estimate of unknown variables.
From Ym(ω) = Xm(ω) +Dm(ω),
|Xm(ω)|2 = |Ym(ω)|2 − |Dm(ω)|2− (Xm(ω)D∗m(ω) +X∗m(ω)Dm(ω))
|Dm(ω)|2 ≈ E[|Dm(ω)|2
]2Re{Xm(ω)D∗m(ω)} ≈ E [2Re{Xm(ω)D∗m(ω)}] = 0
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 14 / 41
Musical noise
Q. Why do we get musical noise?
A1. Inaccurate estimate of unknown variables.
From Ym(ω) = Xm(ω) +Dm(ω),
|Xm(ω)|2 = |Ym(ω)|2 − |Dm(ω)|2− (Xm(ω)D∗m(ω) +X∗m(ω)Dm(ω))
|Dm(ω)|2 ≈ E[|Dm(ω)|2
]2Re{Xm(ω)D∗m(ω)} ≈ E [2Re{Xm(ω)D∗m(ω)}] = 0
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 14 / 41
Noise estimation
Simple method is to average power spectra when there is no speechactivity |Dm(ω)|2 = E
[|Ym(ω)|2
]= E
[|Dm(ω)|2
]Need an accurate voice activity detector (VAD)Issues with non-stationary noise (babble noise) conditions
Issues with |Dm(ω)|2 = E[|Dm(ω)|2
]The noise power spectrum (periodogram) has high variance withrespect to the underlying power spectral density
20 40 60 80 100 120−30
−25
−20
−15
−10
−5
0
5
FFT Index
Po
we
r (d
B)
Periodogram
True power
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 15 / 41
Cross term spectrum
2Re{Xm(ω)D∗m(ω)} requires the phase information which is difficult
to estimate
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−80
−60
−40
−20
0
20
40
Freq (Hz)
Pow
er
Magnitude (
dB
)
Frame 18
|X(ω)|2
|Y(ω)|2
|2Re(X(ω)D*(ω))|
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 16 / 41
Musical noise
Q. Why do we get musical noise?
A1. In accurate estimate of unknown variables.
A2. How we engineer situations when we have a bad estimate.
Half rectify negative values.
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 17 / 41
Artifacts from half-wave rectifying
1
1Image from [Bol79]Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 18 / 41
Oversubtraction [BSM79]
Over-subtract the noise estimate to reduce noise peaks
|Xm(ω)|2 = |Ym(ω)|2 − αE[|Dm(ω)|2
]Comes at the expense of attenuating the underlying signal
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 1, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 3, Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 19 / 41
Oversubtraction [BSM79]
Fill in valleys at frequencies to mask residue noise
|Xm(ω)|2 = max{|Ym(ω)|2 − αE[|Dm(ω)|2
],
βE[|Dm(ω)|2
]}
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, β = 0.001, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, β = 0.5, Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 20 / 41
Oversubtraction [BSM79]
α should be dependent on the frame segmental SNR (γ)
Less attenuation (small α) for high SNR, and more attenuation (largeα) for low SNR
−10 −5 0 5 10 15 20 250
1
2
3
4
5
6
7
8
SNR (dB)
α
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 21 / 41
Example of over spectral subtraction
Noisy speech
Power spectral subtraction
Spectral over subtraction: α = 15, β = 0.01
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 22 / 41
Wiener filter
Find optimal linear filter that outputs the desired signal (clean speech)
e(n) = x(n)− x(n) = x(n)−M−1∑k=0
hky(n− k)
Find h∗ that minimizes E[e2(n)
]by solving
∂E[e2(n)]∂h = 0
h∗ = R−1yy ryx = (Rxx + Rdd)
−1rxx
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 23 / 41
Wiener filter in frequency domain
If we assume a non-causal IIR filter, using the convolution theorem,i.e., x(n) ∗ h(n)↔ X(w)H(w)
E(ω) = X(w)−H(w)Y (w)
If we minimize E[|E(ω)|2
]with respect to H(ω), we have
H(ω) =E [X(ω)Y ∗(ω)]
E [|Y (w)|2] =E [X(ω)(X(ω)∗ +D(ω)∗)]
E [|Y (w)|2]
=E[|X(ω)|2
]E [|Y (ω)|2] =
E[|X(ω)|2
]E [|X(ω)|2] + E [|D(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 24 / 41
Parametric wiener filters
Wiener filter
H(ω) =E[|X(ω)|2
]E [|Y (ω)|2] =
E[|X(ω)|2
]E [|X(ω)|2] + E [|D(ω)|2]
More generally,
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 25 / 41
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
Wiener filter gain
If we replace E[|X(ω)|2] = |Y (ω)|2 − E[|D(ω)|2
]Wiener filter
H(ω) =E[|X(ω)|2]
E[|X(ω)|2] + E [|D(ω)|2] =γ(ω)− 1
γ(ω)
, where γ(ω) = |Y (ω)|2E[|D(ω)|2] .
The square root wiener filter = power spectral subtraction
H(ω) 12=
√γ(ω)− 1
γ(ω)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 27 / 41
Wiener filter gain
0 5 10 15 20−45
−40
−35
−30
−25
−20
−15
−10
−5
0
A−posteriori SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
Wiener filter
Square root wiener filter(i.e., Power spectral subtraction
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 28 / 41
Connection with spectral subtraction
If α 6= 1, using the same method
|X(ω)|2 = |Y (ω)|2 − αE[|D(ω)|2
]which is the spectral over subtraction method
Note the Wiener filter is zero-phase, and thus ∠X(ω) = ∠Y (ω), justlike the spectral subtraction method
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 29 / 41
MMSE-STSA Estimator
Suggested by Eprahim and Malah [EM84]
Estimator that minimizes the mean square error of the spectralmagnitude
Given X(ωk) = Xkej∠X(ωk),
minE[(Xk − Xk)
2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 30 / 41
Comparison of MMSE-STSA Estimator with Wiener filter
1 MMSE in the complex spectrum vs. magnitude spectrum
Wiener: minE[(X(ωk)− X(ωk))
2)]
MMSE-STSA: minE[(Xk − Xk)
2]
2 Linear assumption vs. assumption on distribution of Xk
Wiener: minE[(X(ωk)−H(ωk)Y (ωk))
2]
MMSE-STSA: minE[(Xk − Xk)
2], where expectation is taken over
p(Y (ωk), Xk)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 31 / 41
MMSE-STSA Estimator
minE[(Xk − Xk)
2]
From Bayesian statistics the optimal MMSE estimator is,
Xk = E [Xk|Y (ωk)]
=
∫ ∞0
xkp(xk|Y (ωk))dxk
=
∫∞0 xkp(Y (ωk)|xk)p(xk)dxk
p(Y (ωk))
We need knowledge on the distribution of X(ωk) and Y (ωk)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 32 / 41
Distribution assumption for X(wk), D(wk), and Y (wk)
Fourier transform coefficients (of both speech and noise) are Gaussiandistributed.
From central limit theorem: Y (ωk) =∑N−1
n=0 y(n)e−jωkn
CLT holds for weakly dependent signals tooThe variance of the distribution E|Y (ωk)|2 is time varying
X(wk) ∼ N (0, E[|X(wk)|2
])
D(wk) ∼ N (0, E[|D(wk)|2
])
Y (wk) ∼ N (0, E[|X(wk)|2
]+ E
[|D(wk)|2
])
Xk ∼ Rayleigh(σ), with σ =√E [|X(wk)|2] /2
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 33 / 41
Spectral gain of MMSE-STSA estimator
The spectral gain can be represented with two variables.
The a-priori SNR: ξk =E[|X(ωk)|2]E[|D(ωk)|2]
The a-posteriori SNR: γk = |Y (ωk)|2E[|D(ωk)|2]
Using a temporary variable, νk =ξk
1+ξkγk,
Xk =
√π
2
√νkγk
exp(−νk
2
) [(1 + νk)I0
(νk2
)+ νkI1
(νk2
)]Yk
= G(ξk, γk)Yk
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 34 / 41
Gain as a function of a-priori SNR
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 35 / 41
Estimating the a-priori SNR
ξk =E[|X(ωk)|2
]E [|D(ωk)|2]
Instantaneous SNR: ξk =|Y (ωk)|2−E[|D(ωk)|2]
E[|D(ωk)|2]= |Y (ωk)|2
E[|D(ωk)|2]− 1
Decision directed approach
ξk(m) = aX2k(m− 1)
E [|D(ωk,m− 1)|2] + (1− a)( |Y (ωk)|2E [|D(ωk)|2]
− 1
)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 36 / 41
Effect of smoothed SNR
Instantaneous SNR:
ξk = γ(ωk)− 1 =|Y (ωk)|2
E [|D(ωk)|2]− 1
Decision directed approach
ξk(m) = aX2k(m− 1)
E [|D(ωk,m− 1)|2] + (1− a)( |Y (ωk)|2E [|D(ωk)|2]
− 1
)
0 20 40 60 80 100 120 140 160 180−50
−40
−30
−20
−10
0
10
20
30
Time
SN
R (
dB
)
ξ
k
γk − 1
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 37 / 41
Examples with decision directed a-priori SNR estimation
Noisy speech
MMSE-STSA (Instantaneous SNR)
MMSE-STSA (Decision Directed SNR)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 38 / 41
Examples with decision directed a-priori SNR estimation
Noisy speech
Wiener Filter (Instantaneous SNR)
Wiener Filter (Decision Directed SNR)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 39 / 41
Summary
Model noise when speech is absent. |D(ω)| = E [|D(ω)|]Separate speech by applying gain on the noisy spectrum.
1 Spectral subtraction: |X(ω)| = |Y (ω)| − |D(ω)|2 Wiener filter: X(ω) =
E[|X(ω)|2]E[|X(ω)|2]+E[|D(ω)|2]Y (ω)
3 STSA-MSME: X(ω) = G(ξ(ω), γ(ω))Y (ω)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 40 / 41
References I
Steven Boll, Suppression of acoustic noise in speech using spectralsubtraction, Acoustics, Speech and Signal Processing, IEEETransactions on 27 (1979), no. 2, 113–120.
M. Berouti, M. Schwartz, and J. Makhoul, Enhancement of speechcorrupted by acoustic noise, IEEE Int. Conf. Acoust., Speech, SignalProcessing (ICASSP ’79), 1979, pp. 208–211.
Y. Ephraim and D. Malah, Speech enhancement using a minimummean-square error short-time spectral amplitude estimator, IEEETransactions on Acoustics, Speech and Signal Processing (1984),1109–1121.
Jae S Lim and Alan V Oppenheim, Enhancement and bandwidthcompression of noisy speech, Proceedings of the IEEE 67 (1979),no. 12, 1586–1604.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 41 / 41