Hybrid NMF APSIPA2014 invited

Post on 21-Apr-2017

Hybrid Multichannel Signal Separation Using Supervised Nonnegative Matrix Factorization

Daichi Kitamura (The University of Tokyo, Japan)

Hiroshi Saruwatari (The University of Tokyo, Japan)

Satoshi Nakamura (Nara Institute of Science and Technology, Japan)

Yu Takahashi (Yamaha Corporation, Japan)

Kazunobu Kondo (Yamaha Corporation, Japan)

Hirokazu Kameoka (The University of Tokyo, Japan)

The University of Tokyo, YAMAHA

2

Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Multichannel NMF
• 3. Proposed method
  – SNMF with spectrogram restoration and its hybrid method
• 4. Experiments
  – Closed data experiment
  – Open data experiment
• 5. Conclusions

4

Research background
• Signal separation has received much attention.
  – Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance further, an SNMF-based multichannel signal separation method is required.

Goal: separate the target signal from multichannel signals with high accuracy.

6

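NMF [Lee, et al., 2001]

• NMF can extract significant spectral patterns.
  – The basis matrix contains the spectral patterns that appear frequently in the observed matrix.
• Decomposition: observed matrix (spectrogram) ≈ basis matrix (spectral patterns) × activation matrix (time-varying gains)
  – Observed matrix: (number of frequency bins) × (number of time frames)
  – Basis matrix: (number of frequency bins) × (number of bases)
  – Activation matrix: (number of bases) × (number of time frames)

[Figure: an amplitude spectrogram (frequency vs. time) factorized into the basis matrix (frequency vs. basis) and the activation matrix (basis vs. time).]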
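The factorization described on this slide can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the squared Euclidean distance and the classic multiplicative updates, and the names `nmf_euclidean`, `F`, `G`, and the toy data are hypothetical.

```python
import numpy as np

def nmf_euclidean(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G.

    F (freq x n_bases) holds spectral patterns and G (n_bases x time) holds
    their time-varying gains. Assumption: squared Euclidean cost with
    multiplicative updates. Returns F, G, and the cost after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.uniform(0.1, 1.0, (n_freq, n_bases))
    G = rng.uniform(0.1, 1.0, (n_bases, n_time))
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        costs.append(float(np.sum((Y - F @ G) ** 2)))
    return F, G, costs

# Toy spectrogram (hypothetical data): two spectral patterns active at different times.
patterns = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.8]])
gains = np.array([[1.0, 1.0, 0.0, 0.0, 0.5], [0.0, 0.0, 1.0, 1.0, 0.5]])
Y = patterns @ gains
F, G, costs = nmf_euclidean(Y, n_bases=2)
```

The columns of `F` play the role of the basis matrix and the rows of `G` the activation matrix; the cost list lets one check that the fit improves over the iterations.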

7

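Supervised NMF [Smaragdis, et al., 2007]

• SNMF: a supervised spectral separation method
  – Training process: a supervised basis matrix (spectral dictionary) is learned from sample sounds of the target signal.
  – Separation process: the mixed signal is decomposed with the supervised basis matrix fixed, and the remaining matrices are optimized. The fixed bases represent the target signal; the remaining bases represent the other signals.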
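The two processes on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited work: it uses the squared Euclidean distance, and the function names `train_bases` and `separate`, the matrix names `F`, `G`, `H`, `U`, and the random toy data are all hypothetical.

```python
import numpy as np

def train_bases(Y_train, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Training process: learn the supervised basis matrix F from sample sounds."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.1, 1.0, (Y_train.shape[0], n_bases))
    G = rng.uniform(0.1, 1.0, (n_bases, Y_train.shape[1]))
    for _ in range(n_iter):
        F *= (Y_train @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y_train) / (F.T @ F @ G + eps)
    return F

def separate(Y_mix, F, n_other, n_iter=200, eps=1e-12, seed=1):
    """Separation process: F stays fixed; G, H, and U are optimized."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(0.1, 1.0, (F.shape[1], Y_mix.shape[1]))
    H = rng.uniform(0.1, 1.0, (Y_mix.shape[0], n_other))
    U = rng.uniform(0.1, 1.0, (n_other, Y_mix.shape[1]))
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y_mix) / (F.T @ X + eps)
        X = F @ G + H @ U
        H *= (Y_mix @ U.T) / (X @ U.T + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y_mix) / (H.T @ X + eps)
    # Target estimate and estimate of the other signals.
    return F @ G, H @ U

# Hypothetical toy data: random spectral patterns for the target and the other sources.
rng = np.random.default_rng(42)
target_patterns = rng.uniform(0.0, 1.0, (6, 2))
other_patterns = rng.uniform(0.0, 1.0, (6, 2))
Y_train = target_patterns @ rng.uniform(0.0, 1.0, (2, 30))   # sample sounds of the target
Y_mix = (target_patterns @ rng.uniform(0.0, 1.0, (2, 10))
         + other_patterns @ rng.uniform(0.0, 1.0, (2, 10)))  # mixed signal
F = train_bases(Y_train, n_bases=2)
target_est, other_est = separate(Y_mix, F, n_other=2)
```

Nothing in this sketch stops the free bases `H` from also explaining part of the target, which is one reason practical SNMF variants add penalty terms.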

8

Problems of SNMF
• SNMF handles only single-channel signals.
  – For a multichannel signal, SNMF cannot use the information between channels.
• When many interference sources exist, the separation performance of SNMF degrades markedly.

[Figure: separating a mixture of many sources leaves residual components.]

9

Multichannel NMF [Sawada, et al., 2013]

• Multichannel NMF
  – is a natural extension of NMF to multichannel signals
  – uses spatial information to cluster the bases and thereby achieves unsupervised separation.
• Problem: multichannel NMF depends strongly on the initial values and lacks robustness.

[Figure: sources observed with a microphone array.]

10

Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Multichannel NMF
• 3. Proposed method
  – Motivation and strategy
  – SNMF with spectrogram restoration and its hybrid method
• 4. Experiments
  – Closed data experiment
  – Open data experiment
• 5. Conclusions

11

Motivation and strategy

• Sawada's multichannel NMF
  – is a unified method that solves the spatial and spectral separations jointly.
  – maximizes a likelihood of the observed spectrograms with respect to the spatial direction of the target signal and the source components of all signals (target and other).
  – In the supervised situation, the target spectral patterns are given.
  – The joint problem is too difficult to solve (lacks robustness).
  – It is computationally inefficient (long computation time).

12

Motivation and strategy

• Proposed hybrid method
  – approximates the joint problem by dividing it into two parts: unsupervised spatial separation (classical D.O.A. estimation) and supervised spectral separation (SNMF-based method).
  – The spatial separation is carried out with classical direction-of-arrival (D.O.A.) estimation methods.
    • These methods are very efficient and stable.
  – This is a divide-and-conquer approach.

13

Directional clustering [Araki, et al., 2007]

• Directional clustering
  – Unsupervised spatial separation method
  – k-means clustering (fast and stable)
• Problem
  – Artificial distortion arises owing to the binary masking.

[Figure: the stereo input signal (L and R channels) contains left, center, and right sources. Each time-frequency bin of the spectrogram is assigned to one direction (L, C, or R). The binary mask is 1 for the bins assigned to the direction to be extracted (center in this example) and 0 elsewhere. The separated signal is the entry-wise product of the binary mask and the spectrogram.]
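The masking step can be illustrated with a small sketch. It is a simplification under stated assumptions, not the clustering of the cited work: each time-frequency bin is described only by a panning angle computed from the two channel magnitudes, and a one-dimensional k-means with evenly spread initial centroids groups the bins. The function name `directional_clustering` and the toy gains are hypothetical.

```python
import numpy as np

def directional_clustering(left, right, n_clusters=3, n_iter=20):
    """Cluster time-frequency bins of a stereo spectrogram by panning angle.

    left, right: arrays (freq x time) of the two channel spectrograms.
    Returns integer labels (freq x time), ordered from left (0) to right.
    """
    # Panning angle in [0, pi/2]: 0 means left channel only, pi/2 right channel only.
    angle = np.arctan2(np.abs(right), np.abs(left))
    # Deterministic initialization: centroids spread evenly over the angle range.
    centroids = np.linspace(0.0, np.pi / 2, n_clusters)
    for _ in range(n_iter):
        # Assign each bin to the nearest centroid, then recompute the centroids.
        labels = np.argmin(np.abs(angle[..., None] - centroids), axis=-1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = angle[labels == k].mean()
    return labels

# Hypothetical toy data: every bin belongs to exactly one of three directions.
rng = np.random.default_rng(0)
truth = rng.integers(0, 3, size=(8, 10))               # 0 = left, 1 = center, 2 = right
gains = np.array([[1.0, 0.1], [0.7, 0.7], [0.1, 1.0]])  # (left gain, right gain) per direction
magnitude = rng.uniform(0.5, 1.5, size=(8, 10))
left = magnitude * gains[truth, 0]
right = magnitude * gains[truth, 1]

labels = directional_clustering(left, right)
mask = (labels == 1).astype(float)   # binary mask for the center direction
separated = mask * left              # entry-wise product with the spectrogram
```

Bins outside the selected cluster are set exactly to zero, which is the source of the spectral holes and of the artificial distortion mentioned above.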

14

Proposed method: hybrid separation

• Hybrid separation method
  – Input stereo signal (L and R channels)
  – Spatial separation method (directional clustering)
  – SNMF-based separation method (SNMF with spectrogram restoration)
  – Separated signal

15

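SNMF with spectrogram restoration

• The separated cluster contains spectral holes (lost components).
• The proposed SNMF treats these holes as unseen observations.
• The fittest supervised bases (dictionary of the target signal) are extrapolated into the holes to fix up the spectrogram.

[Figure: spectrogram (frequency vs. time) of the separated cluster, with the holes marked.]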

16

SNMF with spectrogram restoration

[Figure: frequency of source components versus direction (left, center, right) in three panels: (a) input signal, with the target marked; (b) after directional clustering (binary masking); (c) after super-resolution-based SNMF, with the extrapolated components marked.]

[Figure: processing flow. Directional clustering turns the observed spectrogram (target and interference) into the separated cluster. SNMF with spectrogram restoration then extrapolates the supervised spectral bases to obtain the reconstructed data.]

17

Decomposition model and cost function

• Decomposition model: the spectrogram of the separated cluster is approximated with the supervised bases (fixed) and their activations, together with further bases and activations for the remaining components.
• Cost function: the sum of three terms
  – Main term: the divergence between the observation and the model. It is defined at all grids except for the holes by using the binary masking matrix obtained from directional clustering (a binary index that excludes the holes).
  – Regularization term: uses the binary complement of the mask and has its own weighting parameter.
  – Penalty term [Kitamura, et al. 2014]: uses the Frobenius norm and has its own weighting parameter.
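The equations on this slide did not survive as text. As a hedged sketch only, one form that is consistent with the annotations (a masked main divergence, a regularization term on the binary complement, and a Frobenius-norm penalty) is given below. All symbols are assumed notation rather than taken from the transcript: $Y$ is the observed spectrogram, $F$ the fixed supervised bases, $G$ their activations, $H$ and $U$ the other bases and activations, $I$ the binary mask with entries $i_{\omega,t}$, $\bar{i}_{\omega,t} = 1 - i_{\omega,t}$ its binary complement, and $\lambda$, $\mu$ the weighting parameters. The exact arguments of the regularization and penalty terms are assumptions.

```latex
% Assumed decomposition model: F is fixed, G, H, U are variables.
Y \approx FG + HU

% Assumed cost function: masked main term + regularization on the holes + penalty.
\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\,
      D\!\left( y_{\omega,t} \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t}
                                   + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\,
      D_{\mathrm{R}}\!\left( 0 \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t} \right)
  + \mu \left\| F^{\mathsf{T}} H \right\|_{\mathrm{Fr}}^{2}
```

In this reading, the first term fits the model only where observations exist, the second keeps the extrapolated target components in the holes from growing without bound, and the third discourages the free bases $H$ from resembling the supervised bases $F$.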

21

Generalized divergence: β-divergence

• β-divergence [Eguchi, et al., 2001]
  – β = 2: EUC-distance
  – β = 1: KL-divergence (the best criterion for signal separation [Kitamura, et al., 2014])
  – β = 0: IS-divergence
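For reference, the standard β-divergence and its three special cases can be written as a short function. This is a generic sketch of the textbook definition, summed over all entries; the function name `beta_divergence` is hypothetical, and strictly positive inputs are assumed so that the logarithms and divisions are defined.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of the element-wise beta-divergence D_beta(y || x).

    Assumes y > 0 and x > 0 element-wise.
    beta = 2: half the squared Euclidean distance,
    beta = 1: generalized Kullback-Leibler divergence,
    beta = 0: Itakura-Saito divergence.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 1:
        return float(np.sum(y * np.log(y / x) - y + x))
    if beta == 0:
        return float(np.sum(y / x - np.log(y / x) - 1.0))
    return float(np.sum(y ** beta / (beta * (beta - 1.0))
                        + x ** beta / beta
                        - y * x ** (beta - 1.0) / (beta - 1.0)))

y = [1.0, 2.0]
x = [2.0, 4.0]
euc = beta_divergence(y, x, 2)   # 0.5 * ((1 - 2)**2 + (2 - 4)**2)
kl = beta_divergence(y, x, 1)
is_div = beta_divergence(y, x, 0)
```

The KL and IS cases are the limits of the general expression as β approaches 1 and 0, which is why they need separate branches.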

22

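Decomposition model and cost function

• We used two β-divergences with separate β values, one for the main cost and one for the regularization cost.
• Decomposition model: the supervised bases are fixed.
• Cost function: the divergences in the main and regularization terms are these two β-divergences.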

23

Update rules
• Update rules can be derived for optimizing the three variable matrices, that is, all matrices of the decomposition model except the fixed supervised bases.
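The update rules themselves are not preserved in the transcript. As a hedged illustration of the general idea only, the sketch below fits a model with fixed supervised bases on the observed bins and ignores the holes. It is deliberately simplified and is not the authors' algorithm: it assumes the squared Euclidean distance and omits the regularization and penalty terms. The names `masked_snmf`, `F`, `G`, `H`, `U`, and the toy data are hypothetical.

```python
import numpy as np

def masked_snmf(Y, mask, F, n_other=2, n_iter=200, eps=1e-12, seed=0):
    """Fit Y ~ F @ G + H @ U on the bins where mask == 1, with F fixed.

    Simplification (assumption): squared Euclidean cost, no regularization
    or penalty term. Returns G, H, U, and the masked cost per iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    G = rng.uniform(0.1, 1.0, (F.shape[1], n_time))
    H = rng.uniform(0.1, 1.0, (n_freq, n_other))
    U = rng.uniform(0.1, 1.0, (n_other, n_time))
    MY = mask * Y
    costs = []
    for _ in range(n_iter):
        # Weighted multiplicative updates: holes (mask == 0) contribute nothing.
        X = F @ G + H @ U
        G *= (F.T @ MY) / (F.T @ (mask * X) + eps)
        X = F @ G + H @ U
        H *= (MY @ U.T) / ((mask * X) @ U.T + eps)
        X = F @ G + H @ U
        U *= (H.T @ MY) / (H.T @ (mask * X) + eps)
        X = F @ G + H @ U
        costs.append(float(np.sum(mask * (Y - X) ** 2)))
    return G, H, U, costs

# Hypothetical toy data: a target-only spectrogram with randomly placed holes.
rng = np.random.default_rng(3)
F = rng.uniform(0.0, 1.0, (8, 2))                        # stands in for a trained dictionary
Y = F @ rng.uniform(0.0, 1.0, (2, 12))
mask = (rng.uniform(size=Y.shape) > 0.3).astype(float)   # 0 marks a hole
G, H, U, costs = masked_snmf(mask * Y, mask, F)
restored = F @ G   # target estimate, defined in the holes as well
```

Because `F @ G` is defined at every bin, the fixed bases extrapolate values into the holes even though those bins never enter the cost.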

25

Experimental condition

• The mixed signal includes four melodies (sources).
• Three compositions of instruments
  – We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody

Dataset  Melody 1  Melody 2  Midrange     Bass
No. 1    Oboe      Flute     Piano        Trombone
No. 2    Trumpet   Violin    Harpsichord  Fagotto
No. 3    Horn      Clarinet  Piano        Cello

[Figure: source layout with numbered sources placed in the left, center, and right directions; the target source is marked.]

26

Experimental result: closed data

• Signal-to-distortion ratio (SDR)
  – total quality of the separation, which includes the degree of separation and the absence of artificial distortion.

[Figure: SDR [dB] (0 to 14; higher is better) versus β of the NMF divergence (0 to 4; β = 1 is the KL-divergence and β = 2 is the EUC-distance) for conventional SNMF (single-channel SNMF), the proposed hybrid method, directional clustering, and supervised multichannel NMF [Sawada].]
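To make the metric concrete, a simplified SDR can be computed as the ratio of the reference energy to the energy of everything that differs from the reference. This sketch is an assumption for illustration and is not necessarily the evaluation procedure used in the experiments, which may decompose the error into several components; the function name `sdr_db` and the signals are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB (higher is better).

    Treats the whole difference between estimate and reference as distortion.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2)))

reference = np.array([1.0, 0.0, -1.0, 0.0])
estimate = 1.1 * reference           # 10 % amplitude error
score = sdr_db(reference, estimate)  # amplitude ratio 10 corresponds to 20 dB
```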

27

SNMF with spectrogram restoration

• SNMF with spectrogram restoration has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation is the KL-divergence (β = 1).
• In contrast, a divergence with a higher β value is suitable for the basis extrapolation.

28

Trade-off: separation and restoration

• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is determined by the trade-off between the separation and restoration abilities.

[Figure: two example spectra (amplitude [dB], -10 to 0, versus frequency [kHz], 0 to 5), one with strong sparseness and one with weak sparseness.]

[Figure: performance versus β (0 to 4), with curves for separation, restoration, and the total performance of the hybrid method.]

29

Experimental condition

• Open data experiment
  – used different tone generators for the training and test signals
    • Supervision signal (24 notes that cover all the notes in the target melody): provided by tone generator A
    • Test signal: provided by tone generator B (more realistic sound) + background noise (SNR = 10 dB)

[Figure: source layout with numbered sources placed in the left, center, and right directions; the target source is marked.]

30

Experimental result: open data

• Signal-to-distortion ratio (SDR)
  – total quality of the separation, which includes the degree of separation and the absence of artificial distortion.

[Figure: SDR [dB] (-4 to 10; higher is better) versus β of the NMF divergence (0 to 4; β = 1 is the KL-divergence and β = 2 is the EUC-distance) for conventional SNMF (single-channel SNMF), the proposed hybrid method, directional clustering, and supervised multichannel NMF [Sawada].]

31

Conclusions
• We proposed a hybrid multichannel signal separation method that combines directional clustering and SNMF with spectrogram restoration.
• There is a trade-off between the separation and restoration abilities.

Thank you for your attention!

Demonstration is available!