Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)


Page 1: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization

Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura

(Nara Institute of Science and Technology, Japan)

Page 2: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Outline

Background and related study

Problem and purpose

Proposed method 1

- Depth estimation based on DOA distribution

Proposed method 2

- Activation-shared nonnegative matrix factorization

Experiments

Conclusions


Page 3: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Background

With the advent of 3D TV, the reproduction of 3D images has been realized.

Problem: the viewer feels uncomfortable owing to the mismatch between the picture image and the sound image.

[Figure: 3D TV showing the positions of the picture image and the sound image]

To solve this problem, sound field reproduction techniques have been studied actively. They can present the “direction” and “depth” of sound images to the listener.

However, a 3D sound reproduction system has not been established yet.

Page 4: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Related study: wave field synthesis

Wave Field Synthesis (WFS) [A. J. Berkhout, et al., 1993]

- Sound field reproduction technique
- Can represent the "depth" of sound images
- WFS allows us to create sound images in front of the loudspeakers.

[Figure: loudspeaker array creating a sound image between the loudspeakers and the listener]

Drawback of WFS

WFS requires the primary source information of the sound images:

1. Individual sound sources (source separation: mixed signal → individual sources)
2. Localization information (localization estimation of sound images)

This information has been lost in existing contents through down-mixing, so an up-mixing method is required.

Page 5: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Spatial sound system using existing contents

Flow of proposed up-mixer

Stereo contents (mixed multichannel signal) → up-mixer → wave field synthesis → spatial sound reproduction

The up-mixer consists of:

1. Sound source separation
2. Directional estimation (conventional method) and depth estimation (new depth estimation, this study)

Depth estimation of sound images has not been proposed.

Page 6: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Related study: directional clustering [Araki, et al., 2007]

Mixed stereo signal → Fourier transform → normalization → clustering → inverse Fourier transform → individual sources of each cluster

[Figure: scatter plots of the L-ch input signal versus the R-ch input signal, showing the source components and the spatial representative vectors before normalization, after normalization, and after clustering]
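The normalization and clustering steps can be sketched as follows. This is a minimal illustration, not the method of Araki et al.: it assumes magnitude spectrograms only (phase is ignored), uses a plain k-means with cosine similarity, and the function name `directional_clustering` and its parameters are hypothetical.

```python
import numpy as np

def directional_clustering(L_mag, R_mag, n_clusters=3, n_iter=50, eps=1e-12):
    """Cluster time-frequency bins by the direction of their (|L|, |R|) vector.

    Assumption: inputs are magnitude spectrograms of identical shape.
    """
    # One 2-D vector per time-frequency bin.
    vecs = np.stack([np.ravel(L_mag), np.ravel(R_mag)], axis=1).astype(float)
    # Normalization: keep only the direction of each vector.
    unit = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + eps)
    # Initial spatial representative vectors, spread between "left" and "right".
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    centroids = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    for _ in range(n_iter):
        # Assign each bin to the most similar representative vector.
        labels = np.argmax(unit @ centroids.T, axis=1)
        # Re-estimate each representative vector from its members.
        for k in range(n_clusters):
            members = unit[labels == k]
            if len(members):
                c = members.sum(axis=0)
                centroids[k] = c / (np.linalg.norm(c) + eps)
    labels = np.argmax(unit @ centroids.T, axis=1)
    return labels.reshape(np.shape(L_mag)), centroids
```

With three clusters, label 0 collects left-dominant bins, label 1 the center, and label 2 right-dominant bins. A binary mask `labels == k` applied to the complex spectrogram, followed by the inverse Fourier transform, would give the signal of cluster k.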


Page 8: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Problem and purpose

Problem: WFS requires specific localization information of individual sound sources to reproduce a sound field.

Up-mixer: directional estimation methods have been developed, e.g., directional estimation based on VBAP [Hirata, et al., 2011].

How can we get depth information?

Purpose: establishing a new depth estimation method.

Proposed method: depth estimation method using the direction of arrival (DOA) distribution.


Page 10: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 1: depth estimation based on DOA

DOA: “direction of arrival” of sound waves. We estimate the depth using the DOA distribution.

Mixed signal → directional clustering → individual sources → weighted DOA histogram

- Directional information: amplitude ratio of the two channels
- Weighting term: magnitude of each vector

[Figure: weighted DOA histogram. Horizontal axis: direction of arrival (left, center, right). Vertical axis: frequency of source components]
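A weighted DOA histogram of this kind can be sketched as below. The mapping from amplitude ratio to angle (an arctangent of |R|/|L|, so that 0 degrees is left only, 45 degrees is center, and 90 degrees is right only) is an assumption made for illustration; the slides do not preserve the exact definition. The function name is hypothetical.

```python
import numpy as np

def weighted_doa_histogram(L_mag, R_mag, n_bins=90):
    """Histogram of per-bin directions, weighted by the vector magnitude."""
    L = np.ravel(np.asarray(L_mag, dtype=float))
    R = np.ravel(np.asarray(R_mag, dtype=float))
    # Directional information: angle derived from the amplitude ratio (assumed mapping).
    doa_deg = np.clip(np.degrees(np.arctan2(R, L)), 0.0, 90.0)
    # Weighting term: magnitude of each (|L|, |R|) vector.
    weights = np.hypot(L, R)
    hist, edges = np.histogram(doa_deg, bins=n_bins, range=(0.0, 90.0), weights=weights)
    return hist, edges
```

Weighting by magnitude means that strong source components dominate the histogram, while low-energy bins contribute little.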

Page 11: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 1: depth estimation based on DOA

Difference of DOA shape corresponding to source distance

In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion.

- Close source: the observed DOA histogram becomes spiky.
- Far source: the observed DOA histogram becomes smooth.

[Figure: DOA histograms of a close source and a far source. Horizontal axis: direction of arrival. Vertical axis: frequency of source components]

The observed DOA distribution of the target source can be used as a cue for depth estimation.

Page 12: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 1: modeling of DOA distribution

To model the DOA, we propose a new modeling method using the GGD.

Generalized Gaussian distribution: GGD [Box, et al., 1973]

- Flexible family of probability density functions (PDFs)
- The shape of the GGD changes depending on βshape.
- βshape = 2: Gaussian distribution PDF
- βshape = 1: Laplacian distribution PDF

[Equation: definition of GGD]
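For reference, the GGD is commonly written in the following form. The symbols \mu (location) and \alpha (scale) are assumptions introduced here, since the slide's own notation is not preserved; only the shape parameter \beta_{\mathrm{shape}} comes from the slides.

```latex
p(x;\, \mu, \alpha, \beta_{\mathrm{shape}})
  = \frac{\beta_{\mathrm{shape}}}{2\alpha\,\Gamma(1/\beta_{\mathrm{shape}})}
    \exp\!\left[ -\left( \frac{|x-\mu|}{\alpha} \right)^{\beta_{\mathrm{shape}}} \right]
```

Setting \beta_{\mathrm{shape}} = 2 gives a Gaussian PDF with variance \alpha^2/2, and \beta_{\mathrm{shape}} = 1 gives a Laplacian PDF with scale \alpha.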

Page 13: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 1: modeling of DOA distribution

Modeling of DOA distribution based on GGD parameter

[Figure: DOA histograms of a close source and a far source. Horizontal axis: direction of arrival. Vertical axis: frequency of source components]

- Source is close ⇔ βshape is small
- Source is far ⇔ βshape is large

We propose a new depth estimation method based on the GGD. The shape parameter βshape is utilized as the metric.

Page 14: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: problem in proposed method 1

Problem of signal processing

Normalization problem: small noise components are enhanced.

[Figure: scatter plot of the L-ch input signal versus the R-ch input signal for a binaural-recorded signal, and the resulting DOA histogram (left, center, right; vertical axis: frequency of source components) disturbed by noise]

Background noise and artificial distortion generated by signal processing interfere with the DOA histogram.

Feature extraction: activation-shared multichannel NMF


Page 16: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Nonnegative matrix factorization: NMF [Lee, et al., 2001]

- NMF is a sparse representation.
- NMF can extract significant features from the observed matrix.

[Figure: observed matrix (spectrogram, Ω × T) ≈ basis matrix (spectral patterns, Ω × K) × activation matrix (time-varying gain, K × T)]

Ω: number of frequency bins, T: number of time frames, K: number of bases

The sparse representation provides high performance for noise reduction, compression, and feature extraction.

We eliminate background noise and artificial distortion.
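The factorization above can be sketched with the multiplicative update rules of Lee and Seung for the Euclidean distance. This is a minimal single-channel illustration; the names `nmf`, `F` (basis matrix), and `G` (activation matrix) are chosen here and are not taken from the slides.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (Omega x T) as F (Omega x K) @ G (K x T)."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y.shape[0], n_bases)) + eps   # spectral patterns
    G = rng.random((n_bases, Y.shape[1])) + eps   # time-varying gains
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
    return F, G
```

Choosing K much smaller than Ω and T forces the model to keep only the dominant spectral patterns, which is why the reconstruction F @ G tends to suppress weak noise components.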

Page 17: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: problem of conventional NMF

Conventional NMF: NMFs are applied in parallel (L-ch NMF and R-ch NMF), so the bases of the two channels are trained without correlation.

Conventional NMFs generate an artificial fluctuation in the amplitude ratio (directional information).

The DOA information is disturbed.

Page 18: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Proposed method: activation-shared multichannel NMF

The activation matrix is shared through all channels (L-ch NMF and R-ch NMF).

[Equation: cost function, defined with the β-divergence between the entries of the matrices]

This reduces the dimensionality of the input signal while maintaining directional information.
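One plausible form of such a cost function is sketched below. All symbols are assumptions, since the slide's equation is not preserved: c is the channel index, y_{c,\omega t} is an entry of the observed spectrogram of channel c, f_{c,\omega k} is an entry of the basis matrix of channel c, and g_{kt} is an entry of the activation matrix shared by all channels.

```latex
\mathcal{J} = \sum_{c} \sum_{\omega=1}^{\Omega} \sum_{t=1}^{T}
  D_{\beta}\!\left( y_{c,\omega t} \,\Big\|\, \sum_{k=1}^{K} f_{c,\omega k}\, g_{kt} \right)
```

Because g_{kt} carries no channel index, each basis is forced to have the same time-varying gain in every channel, so the inter-channel amplitude ratio of a basis is fixed by its per-channel spectral patterns.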

Page 19: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

β-divergence [Eguchi, et al., 2001]

A generalized divergence whose form varies with the value of β. Special cases:

- Euclidean distance
- Generalized Kullback-Leibler divergence
- Itakura–Saito divergence
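In the parameterization most commonly used in the NMF literature, the β-divergence between an observed value y and a model value x is written as below. This parameterization is an assumption here, because the slide's own formula and β values are not preserved.

```latex
D_{\beta}(y \,\|\, x) =
\begin{cases}
\dfrac{y^{\beta}}{\beta(\beta-1)} + \dfrac{x^{\beta}}{\beta} - \dfrac{y\,x^{\beta-1}}{\beta-1} & (\beta \neq 0, 1) \\[2mm]
y \log\dfrac{y}{x} - y + x & (\beta = 1) \\[2mm]
\dfrac{y}{x} - \log\dfrac{y}{x} - 1 & (\beta = 0)
\end{cases}
```

Under this convention, \beta = 2 gives the squared Euclidean distance (y-x)^2/2, \beta = 1 the generalized Kullback-Leibler divergence, and \beta = 0 the Itakura–Saito divergence.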

Page 20: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Derivation of optimal variables using the β-divergence

The auxiliary function method is an optimization scheme that uses an upper bound function.

1. Design the auxiliary function as an upper bound of the original cost function.
2. Minimize the original cost function indirectly by minimizing the auxiliary function.

Page 21: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Cost function

The first and second terms become convex or concave functions depending on the value of β.

[Table: convexity (convex or concave) of the first and second terms for each range of β]

Page 22: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Cost function

The upper bound function of each term is defined by applying:

- Convex function: Jensen's inequality
- Concave function: tangent line inequality
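In general form, the two inequalities read as follows. The symbols are introduced here for illustration: f is a convex function, g is a differentiable concave function, \lambda_k are nonnegative weights that sum to one, and x_0 is the tangent point.

```latex
f\!\left( \sum_{k} \lambda_k x_k \right) \le \sum_{k} \lambda_k f(x_k),
\qquad \lambda_k \ge 0,\ \sum_{k} \lambda_k = 1

g(x) \le g(x_0) + g'(x_0)\,(x - x_0)
```

Both bounds become equalities for a suitable choice of the auxiliary variables (\lambda_k or x_0), which is what allows the auxiliary function to touch the original cost function at the current estimate.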

Page 23: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Update rules

The update rules for optimization are obtained from the derivative of the auxiliary function w.r.t. each objective variable.

[Equations: update rules for the entries of the basis matrices and the shared activation matrix]
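As an illustration of what such update rules can look like, the sketch below implements activation-shared NMF for the generalized Kullback-Leibler divergence only (β = 1 in the common convention). This is an assumed special case, not the slide's general-β rules, and the names `activation_shared_nmf` and `kl_cost` are hypothetical.

```python
import numpy as np

def activation_shared_nmf(Y_list, n_bases, n_iter=100, eps=1e-12, seed=0):
    """Per-channel bases F_c with one activation matrix G shared by all channels.

    Y_list: list of nonnegative spectrograms (one per channel, same number of frames).
    """
    rng = np.random.default_rng(seed)
    n_frames = Y_list[0].shape[1]
    F_list = [rng.random((Y.shape[0], n_bases)) + eps for Y in Y_list]
    G = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        # Update the basis matrix of each channel with G fixed.
        for c, Y in enumerate(Y_list):
            X = F_list[c] @ G + eps
            F_list[c] *= ((Y / X) @ G.T) / (G.sum(axis=1)[None, :] + eps)
        # Update the shared activation by accumulating over all channels.
        num = np.zeros_like(G)
        den = np.zeros((n_bases, 1))
        for F, Y in zip(F_list, Y_list):
            X = F @ G + eps
            num += F.T @ (Y / X)
            den += F.sum(axis=0)[:, None]
        G *= num / (den + eps)
    return F_list, G

def kl_cost(Y_list, F_list, G, eps=1e-12):
    """Sum over channels of the generalized KL divergence between Y_c and F_c @ G."""
    total = 0.0
    for Y, F in zip(Y_list, F_list):
        X = F @ G + eps
        total += np.sum(Y * np.log((Y + eps) / X) - Y + X)
    return float(total)
```

For a stereo signal, `Y_list` would hold the L-ch and R-ch magnitude spectrograms, and the denoised channels are `F_list[0] @ G` and `F_list[1] @ G`. Sharing G is equivalent to running one NMF on the vertically stacked spectrograms, which is why the channels cannot fluctuate independently over time.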

Page 24: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Flow of proposed depth estimation method

Input stereo signal (L-ch, R-ch) → STFT → cluster L, cluster C, cluster R

For each cluster: activation-shared NMF → weighted DOA histogram → depth estimation

[Figure: weighted DOA histogram of each cluster. Horizontal axis: direction of arrival. Vertical axis: frequency of source components]

We can estimate depth information by calculating the shape parameter of the DOA histogram.


Page 26: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Experimental conditions

Conditions

- Mixed stereo signals consist of 3 instruments.
- The target source is located at the center at 7 distances.
- There are 6 combination patterns with respect to direction.

[Figure: source layout with the target source at the center, placed at intervals, and the interference sources on the left and right]

[Table: mixing source parameters (test source 1, test source 2, test source 3, reverberation time, NMF beta, NMF basis)]

Compared methods

- Conventional method 1: weighted DOA histogram (not processed by NMF)
- Conventional method 2: processed by conventional NMF
- Proposed method: processed by proposed NMF

Page 27: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Experimental conditions

Reference sound sources were generated using the image method.

Image method [Allen, et al., 1979]

- Technique for simulating a room impulse response
- The volume of the room, source location, microphone location, and absorption coefficient can be set arbitrarily.

[Figure: geometry of the image method (real source and image sources)]

[Figure: example of a room impulse response. Horizontal axis: time index. Vertical axis: amplitude]
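The idea of the image method can be sketched for a rectangular room as follows. This is a simplified illustration under stated assumptions: one frequency-independent reflection coefficient for all six walls, delays rounded to the nearest sample, and a hypothetical function name and default values that are not the experimental settings of the slides.

```python
import itertools
import numpy as np

def image_method_rir(room, src, mic, reflection=0.8, fs=16000, c=343.0,
                     max_order=3, length=4096):
    """Room impulse response of a shoebox room from mirrored image sources."""
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    rir = np.zeros(length)
    orders = range(-max_order, max_order + 1)
    for n in itertools.product(orders, repeat=3):
        for q in itertools.product((0, 1), repeat=3):
            n_arr, q_arr = np.array(n), np.array(q)
            # Position of the image source (the real source for n = q = 0).
            image = (1 - 2 * q_arr) * src + 2 * n_arr * room
            dist = np.linalg.norm(image - mic)
            # Number of wall reflections this image represents.
            n_refl = int(np.sum(np.abs(n_arr - q_arr) + np.abs(n_arr)))
            delay = int(round(dist / c * fs))
            if delay < length:
                # Spherical spreading loss times the accumulated wall loss.
                rir[delay] += reflection ** n_refl / (4 * np.pi * dist)
    return rir
```

Convolving a dry source signal with the responses simulated for the left and right microphone positions gives a reference stereo signal for a chosen source distance. Moving the source farther away weakens the direct path relative to the reflections.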

Page 28: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Experimental results

Results 1 (data set 1)

Target source: vocal. Interference source (left): piano. Interference source (right): guitar.

[Figure: estimation results of each method for data set 1, with the layout of the target source and the interference sources]

・ The results of the conventional methods do not agree with the oracle (image method).
・ The proposed method correctly estimates the distance of the target source.

Page 29: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Experimental results: correlation coefficient

Results 2

Correlation coefficient between reference value and estimated value

Table: Correlation coefficient of each method

Data set                      1       2       3       4       5       6
Target source                 Vocal   Vocal   Guitar  Guitar  Piano   Piano
Interference source (left)    Piano   Guitar  Piano   Vocal   Vocal   Guitar
Interference source (right)   Guitar  Piano   Vocal   Piano   Guitar  Vocal
Conventional method 1         0.350   0.532   0.154   0.277   0.602   0.496
Conventional method 2         0.189   0.165   0.044   -0.037  0.426   0.157
Proposed method               0.986   0.925   0.777   0.651   0.791   0.856

• A strong relation between the estimated value of the proposed method and the distance of the target source is indicated.

• The efficacy of the proposed method is confirmed.
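A correlation coefficient of this kind can be computed as sketched below. The slides do not state which estimator was used, so the Pearson correlation coefficient is an assumption, and the distances and estimates in the usage note are hypothetical values.

```python
import numpy as np

def correlation_coefficient(reference, estimate):
    """Pearson correlation between reference distances and estimated values."""
    return float(np.corrcoef(reference, estimate)[0, 1])
```

For example, with seven hypothetical reference distances `np.arange(1, 8)` and the seven estimated shape parameters of one method, a value close to 1 means that the estimate grows almost linearly with the true distance.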

Page 30: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Conclusions

We proposed a new depth estimation method for a sound source in a mixed signal using the shape of the DOA distribution.

The shape of the DOA distribution is modeled by the GGD.

We also proposed a new feature extraction method for multichannel signals, activation-shared multichannel NMF.

The experimental results indicated the efficacy of the proposed method.


Page 32: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Derivation of parameter βshape

The maximum-likelihood-based shape parameter estimation has no closed-form solution in the GGD.

We propose a closed-form parameter estimation algorithm based on some approximation and kurtosis.

[Equations: kurtosis of the DOA histogram, moments of the GGD, and the relation equation of kurtosis and shape parameter, written with the observed DOA histogram and the gamma function]
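The relation between kurtosis and shape parameter can be illustrated numerically. For a GGD the kurtosis equals Γ(5/β)Γ(1/β)/Γ(3/β)², which decreases monotonically in β. The sketch below inverts this relation by bisection; note that this numerical inversion is swapped in for illustration and is not the closed-form estimate derived in the slides. Function names and search bounds are hypothetical.

```python
import math
import numpy as np

def ggd_kurtosis(beta):
    """Kurtosis of a generalized Gaussian distribution with shape parameter beta."""
    return math.gamma(5.0 / beta) * math.gamma(1.0 / beta) / math.gamma(3.0 / beta) ** 2

def estimate_beta_shape(samples, weights=None, lo=0.2, hi=20.0, tol=1e-6):
    """Estimate beta from the (optionally weighted) kurtosis of DOA values."""
    x = np.asarray(samples, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = np.sum(w * x)
    m2 = np.sum(w * (x - mean) ** 2)
    m4 = np.sum(w * (x - mean) ** 4)
    kurt = m4 / m2 ** 2
    # Keep the target inside the range that the search interval can represent.
    kurt = min(max(kurt, ggd_kurtosis(hi)), ggd_kurtosis(lo))
    # Kurtosis decreases as beta increases, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurt:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A spiky (Laplacian-like) histogram has a kurtosis near 6 and yields an estimate near 1, while a Gaussian-like histogram has a kurtosis near 3 and yields an estimate near 2, in line with the close-versus-far interpretation of βshape.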

Page 33: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Derivation of parameter βshape

There is no exact closed-form solution of the inverse function.

Approximation of the gamma function:

1. Introduce the modified Stirling's formula.
2. Take the logarithm.

Page 34: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Derivation of parameter βshape

This results in a quadratic equation to be solved, from which we can derive the closed-form estimate of the shape parameter.

[Equations: quadratic equation and closed-form estimate of the shape parameter]

Preparation of the depth estimation method is completed.

Page 35: Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Proposed method 2: activation-shared multichannel NMF

Preliminary experiment: example of DOA histogram

[Figure: DOA histograms of the center cluster of a mixed source (3 instruments). Horizontal axis: direction of arrival [degree]. Panels: weighted DOA histogram, (individually applied) conventional NMF, (activation-shared) proposed NMF]

- Conventional NMF (L-ch NMF and R-ch NMF applied individually): fluctuations are generated in the DOA.
- Proposed NMF (activation-shared): feature extraction while maintaining directional information.