SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS
Jain-De Lee
Emad M. Grais, Hakan Erdogan
17th International Conference on Digital Signal Processing, 2011
Outline
INTRODUCTION
NON-NEGATIVE MATRIX FACTORIZATION
SIGNAL SEPARATION AND MASKING
EXPERIMENTS AND DISCUSSION
CONCLUSION
Introduction
There are two main stages in this work
– Training stage
– Separation stage
NMF is combined with different types of masks to improve the separation process
– The separation process is faster
– NMF needs fewer iterations
Introduction
Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)
$$X(t,f) = S(t,f) + M(t,f)$$
$$|X(t,f)|\,e^{j\phi_X(t,f)} = |S(t,f)|\,e^{j\phi_S(t,f)} + |M(t,f)|\,e^{j\phi_M(t,f)}$$
where X(t, f) is the STFT of x(t)
– Assume the sources have the same phase angle as the mixed signal, so that the magnitude spectrograms add
$$X = S + M$$
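The role of the equal-phase assumption can be checked numerically. The sketch below uses small random matrices as hypothetical stand-ins for the magnitude spectrograms (none of the values come from the slides): when both sources share one phase, the magnitudes add exactly; with unrelated phases, only the triangle inequality holds.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.random((4, 3))  # hypothetical speech magnitudes
M = rng.random((4, 3))  # hypothetical music magnitudes
phase = rng.uniform(-np.pi, np.pi, size=(4, 3))

# Same phase for both sources: |X| equals S + M exactly.
X_same = S * np.exp(1j * phase) + M * np.exp(1j * phase)
same_phase_ok = np.allclose(np.abs(X_same), S + M)

# Unrelated phases: |X| can only be bounded by S + M.
other_phase = rng.uniform(-np.pi, np.pi, size=(4, 3))
X_diff = S * np.exp(1j * phase) + M * np.exp(1j * other_phase)
gap = (S + M) - np.abs(X_diff)  # non-negative everywhere

print(same_phase_ok, gap.min() >= 0)
```

So X = S + M is an approximation whose quality depends on how close the source phases are to the mixture phase.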
Non-negative Matrix Factorization
Non-negative matrix factorization algorithm
$$V_{[n \times m]} \approx B_{[n \times d]}\, W_{[d \times m]}$$
Minimization problem
$$\min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \ge 0$$
Different cost functions C of NMF
– Euclidean distance
– KL divergence
Non-negative Matrix Factorization
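Euclidean distance cost function
$$\min_{B,W} C(V, BW), \qquad C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$$
KL divergence cost function
$$\min_{B,W} C(V, BW), \qquad C(V, BW) = \sum_{i,j} \left( V_{i,j} \log\frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$
Multiplicative Update Algorithm (for the KL divergence cost)
$$B \leftarrow B \otimes \frac{\frac{V}{BW}\, W^T}{\mathbf{1}\, W^T}$$
$$W \leftarrow W \otimes \frac{B^T\, \frac{V}{BW}}{B^T\, \mathbf{1}}$$
where 1 is a matrix of ones with the same size as V, ⊗ is element-wise multiplication, and the divisions are element-wise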
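The multiplicative updates for the KL divergence cost can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names `nmf_kl` and `kl_cost`, the random initialization, the iteration count, and the small constant `eps` guarding against division by zero are all assumptions.

```python
import numpy as np

def nmf_kl(V, d, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative V (n x m) as B (n x d) @ W (d x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    B = rng.random((n, d)) + eps
    W = rng.random((d, m)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # B <- B * ((V / BW) W^T) / (1 W^T)
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        # W <- W * (B^T (V / BW)) / (B^T 1)
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W

def kl_cost(V, BW, eps=1e-12):
    """Generalized KL divergence between V and its approximation BW."""
    return float(np.sum(V * np.log((V + eps) / (BW + eps)) - V + BW))

# Synthetic non-negative matrix of rank 4 (not real audio data).
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))

B0, W0 = nmf_kl(V, d=4, n_iter=1)    # after one update
B1, W1 = nmf_kl(V, d=4, n_iter=200)  # after many updates
cost_start = kl_cost(V, B0 @ W0)
cost_end = kl_cost(V, B1 @ W1)
print(cost_start, cost_end)
```

Because every factor in the updates is non-negative, B and W stay non-negative without any explicit projection step.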
Non-negative Matrix Factorization
The training magnitude spectrograms of speech and music are each decomposed by NMF
$$S_{train} \approx B_{speech}\, W_{speech}$$
$$M_{train} \approx B_{music}\, W_{music}$$
Larger number of basis vectors
– Lower approximation error
– Redundant set of bases
– Requires more computation time
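The trade-off in the number of basis vectors can be illustrated on synthetic data. In the sketch below, `S_train` is a random matrix standing in for a training spectrogram, the basis counts 2, 8 and 16 are arbitrary, and `nmf_kl` is a hypothetical helper applying the KL multiplicative updates; none of these values come from the slides.

```python
import numpy as np

def nmf_kl(V, d, n_iter=200, seed=0, eps=1e-12):
    """KL-divergence NMF: V (n x m) ~ B (n x d) @ W (d x m)."""
    rng = np.random.default_rng(seed)
    B = rng.random((V.shape[0], d)) + eps
    W = rng.random((d, V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W

def kl_cost(V, BW, eps=1e-12):
    return float(np.sum(V * np.log((V + eps) / (BW + eps)) - V + BW))

rng = np.random.default_rng(0)
S_train = rng.random((30, 40))  # synthetic stand-in, not real speech

errors = {}
for d in (2, 8, 16):
    B_speech, W_speech = nmf_kl(S_train, d)
    errors[d] = kl_cost(S_train, B_speech @ W_speech)

for d, err in errors.items():
    print(f"d={d:2d}  KL error={err:.3f}")
```

The error drops as d grows, while each iteration costs more because B and W both grow with d.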
Signal Separation and Masking
NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal
$$X \approx [B_{speech} \;\; B_{music}]\, W$$
The initial spectrogram estimates for the speech and music signals are respectively calculated as follows
$$\tilde{S} = B_{speech}\, W_S$$
$$\tilde{M} = B_{music}\, W_M$$
where $W_S$ and $W_M$ are submatrices of matrix W
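The separation stage can be sketched as follows, assuming the trained bases are held fixed and only the weight matrix W is updated with the KL rule. The bases here are random stand-ins for trained ones, and `solve_weights`, the dimensions, and the iteration count are hypothetical choices for illustration.

```python
import numpy as np

def solve_weights(X, B, n_iter=100, seed=0, eps=1e-12):
    """Update only W in X ~ B @ W, keeping the basis matrix B fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    denom = B.T @ np.ones_like(X) + eps  # B^T 1, constant across iterations
    for _ in range(n_iter):
        W *= (B.T @ (X / (B @ W + eps))) / denom
    return W

rng = np.random.default_rng(0)
n, d_s, d_m, frames = 16, 3, 2, 25
B_speech = rng.random((n, d_s))  # stand-in for trained speech bases
B_music = rng.random((n, d_m))   # stand-in for trained music bases
X = B_speech @ rng.random((d_s, frames)) + B_music @ rng.random((d_m, frames))

B = np.hstack([B_speech, B_music])  # [B_speech B_music]
W = solve_weights(X, B)
W_S, W_M = W[:d_s], W[d_s:]         # rows of W matching each basis set

S_tilde = B_speech @ W_S  # initial speech estimate
M_tilde = B_music @ W_M   # initial music estimate
print(S_tilde.shape, M_tilde.shape)
```

The row split of W follows the column order of the stacked basis matrix, so the first `d_s` rows belong to speech and the rest to music.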
Signal Separation and Masking
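Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows
$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$
Source signals reconstruction
$$\hat{S} = H \otimes X$$
$$\hat{M} = (\mathbf{1} - H) \otimes X$$
where 1 is a matrix of ones, ⊗ is element-wise multiplication, and the powers and the division in H are element-wise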
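A minimal sketch of the mask and the reconstruction step, using random matrices as placeholders for the initial estimates and the mixture. The function name `spectral_mask` and the constant `eps` are assumptions added for the example.

```python
import numpy as np

def spectral_mask(S_tilde, M_tilde, p, eps=1e-12):
    """H = S~^p / (S~^p + M~^p), all operations element-wise."""
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    return Sp / (Sp + Mp + eps)

rng = np.random.default_rng(0)
S_tilde = rng.random((5, 4))  # placeholder initial speech estimate
M_tilde = rng.random((5, 4))  # placeholder initial music estimate
X = rng.random((5, 4)) + 0.1  # placeholder mixture magnitudes

H = spectral_mask(S_tilde, M_tilde, p=2)
S_hat = H * X          # speech estimate
M_hat = (1.0 - H) * X  # music estimate
print(np.allclose(S_hat + M_hat, X))
```

Because the two masks H and 1 − H sum to one, the two estimates always add up to the mixture spectrogram, even when the NMF approximation of X is not exact.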
Signal Separation and Masking
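Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2
$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$
– Hard mask, p → ∞
$$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$$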
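The relation between the two special masks can be seen on a small hand-picked example. The values below are invented for illustration, and p = 50 is an arbitrary stand-in for a very large exponent.

```python
import numpy as np

S_tilde = np.array([[0.9, 0.2], [0.5, 0.1]])  # made-up speech estimates
M_tilde = np.array([[0.3, 0.8], [0.4, 0.6]])  # made-up music estimates

def mask(S, M, p):
    return S**p / (S**p + M**p)

H_wiener = mask(S_tilde, M_tilde, 2)     # soft mask, values in (0, 1)
H_hard = np.round(H_wiener)              # binary mask
H_large_p = mask(S_tilde, M_tilde, 50)   # mask with a large exponent

print(H_wiener)
print(H_hard)
print(np.allclose(H_large_p, H_hard, atol=1e-3))
```

Each time-frequency cell of the hard mask goes entirely to whichever source has the larger initial estimate, and a large p pushes the soft mask toward that same binary decision.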
Signal Separation and Masking
The value of the mask versus the linear ratio for different values of p
Experiments and Discussion
Simulation
– 16 kHz sampling rate
– Speech
• Training speech data: 540 short utterances
• Testing speech data: 20 utterances
– Music
• 38 pieces for training
• 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
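The analysis settings above can be turned into a magnitude spectrogram with plain NumPy. The sampling rate, window type, and FFT size come from the slide; the hop size of 256 samples is an assumption (the slides do not state the overlap), and the 1 kHz test tone and the function name `magnitude_spectrogram` are only there to make the example checkable.

```python
import numpy as np

FS = 16_000   # sampling rate from the slide
N_FFT = 512   # window and FFT size from the slide
HOP = 256     # assumed 50% overlap; not stated in the slides

def magnitude_spectrogram(x, n_fft=N_FFT, hop=HOP):
    """Return |STFT| with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=0))

t = np.arange(FS) / FS             # one second of samples
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone
X = magnitude_spectrogram(x)

# Bin spacing is FS / N_FFT = 31.25 Hz, so 1 kHz falls on bin 32.
peak_bin = int(np.argmax(X.mean(axis=1)))
print(X.shape, peak_bin)
```

The rows of `X` are frequency bins and the columns are time frames, which matches the layout of the matrix V factored by NMF.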
Experiments and Discussion
Performance measurement of the separation
Conclusion
– The family of masks has a parameter that controls the saturation level
– The proposed algorithm gives better results and helps speed up the separation process