Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative...
![Page 1: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/1.jpg)
Depth Estimation of Sound Images Using
Directional Clustering and Activation-Shared
Nonnegative Matrix Factorization
Tomo Miyauchi, Daichi Kitamura,
Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
![Page 2: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/2.jpg)
Outline
• Background and related study
• Problem and purpose
• Proposed method 1
  - Depth estimation based on DOA distribution
• Proposed method 2
  - Activation-shared nonnegative matrix factorization
• Experiments
• Conclusions
![Page 3: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/3.jpg)
Background
With the advent of 3D TV, the reproduction of 3D images has been realized.
Problem: the viewer feels uncomfortable because of the mismatch between the picture image and the sound image.
[Figure: 3D TV with the positions of the picture image and the sound image]
To solve this problem, sound field reproduction techniques have been studied actively.
These techniques can present the "direction" and "depth" of the sound images to the listener.
However, a 3D sound reproduction system has not been established yet.
![Page 4: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/4.jpg)
Related study: wave field synthesis
Wave Field Synthesis (WFS) [A. J. Berkhout, et al., 1993]
Sound field reproduction technique that can represent the "depth" of sound images.
WFS allows us to create sound images in front of the loudspeakers.
[Figure: loudspeaker array, sound image, and listener]
Drawback of WFS
WFS requires the primary source information of sound images:
1. Individual sound source
2. Localization information
This information has been lost in existing contents through down-mixing.
↓
Up-mixing methods are required:
1. Source separation (mixed signal → individual source)
2. Localization estimation of sound images
![Page 5: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/5.jpg)
Spatial sound system using existing contents
Flow of proposed up-mixer: stereo contents (mixed multichannel signal) → 1. sound source separation → 2. directional estimation and depth estimation → wave field synthesis → spatial sound reproduction
Conventional methods: sound source separation and directional estimation
This study: new depth estimation
Depth estimation of sound images has not been proposed.
![Page 6: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/6.jpg)
Related study: directional clustering [Araki, et al., 2007]
Mixed stereo signal → Fourier transform → Normalization → Clustering → Inverse Fourier transform → Individual sources of each cluster
[Figure: scatter plots of L-ch input signal versus R-ch input signal before normalization, after normalization, and after clustering; markers: source component, spatial representative vector]
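The normalization and clustering steps can be sketched as follows. This is a simplified illustration rather than the exact procedure of [Araki, et al., 2007]: it assumes each time-frequency component is reduced to a magnitude pair (|L|, |R|), uses a cosine-similarity k-means as the clustering rule, and starts from hand-picked representative vectors. The function names and the toy data are hypothetical.

```python
import math

def normalize(vec):
    """Scale a 2-D amplitude vector (|L|, |R|) to unit length."""
    norm = math.hypot(vec[0], vec[1])
    return (vec[0] / norm, vec[1] / norm) if norm > 0 else (0.0, 0.0)

def directional_clustering(components, centroids, n_iter=20):
    """Assign each component to the closest spatial representative vector.

    components: list of (|L|, |R|) pairs, one per time-frequency bin.
    centroids:  initial representative vectors (an assumed initial guess).
    Returns (labels, centroids).
    """
    unit = [normalize(c) for c in components]
    centroids = [normalize(c) for c in centroids]
    labels = [0] * len(unit)
    for _ in range(n_iter):
        # Assignment step: the centroid with the largest cosine similarity wins.
        labels = [max(range(len(centroids)),
                      key=lambda k: u[0] * centroids[k][0] + u[1] * centroids[k][1])
                  for u in unit]
        # Update step: mean direction of the members, renormalized.
        for k in range(len(centroids)):
            members = [u for u, lab in zip(unit, labels) if lab == k]
            if members:
                mean = (sum(m[0] for m in members), sum(m[1] for m in members))
                centroids[k] = normalize(mean)
    return labels, centroids

# Toy components panned to the left, center, and right.
components = [(1.0, 0.1), (0.9, 0.2), (0.5, 0.5), (0.7, 0.6), (0.1, 1.0), (0.2, 0.8)]
initial = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # left, center, right
labels, centroids = directional_clustering(components, initial)
```

In a full separator, the labels would be used as binary masks on the stereo spectrogram before the inverse Fourier transform.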
![Page 7: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/7.jpg)
![Page 8: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/8.jpg)
Problem and purpose
Problem: WFS requires specific localization information of individual sound sources to reproduce a sound field.
Up-mixer: directional estimation methods have been developed, e.g., directional estimation based on VBAP [Hirata, et al., 2011].
How can we get depth information?
Purpose: establishing a new depth estimation method.
Proposed method: depth estimation method using direction of arrival (DOA) distribution.
![Page 9: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/9.jpg)
![Page 10: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/10.jpg)
Proposed method 1: depth estimation based on DOA
DOA → "Direction of arrival" of sound waves
We estimate the depth using the DOA distribution.
Mixed signal → Directional clustering → Individual sources → Weighted DOA histogram
Directional information (DOA): amplitude ratio of the L-ch and R-ch components
Weighting term: magnitude of each vector
[Figure: weighted DOA histogram; x-axis: direction of arrival (left, center, right); y-axis: frequency of source components]
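A weighted DOA histogram of this kind can be sketched as below. The mapping from amplitude ratio to angle is an assumption made for illustration: the DOA is taken as atan2(|R|, |L|), so 0 degrees is fully left, 45 degrees is the center, and 90 degrees is fully right. The function name and bin count are hypothetical.

```python
import math

def weighted_doa_histogram(left_mag, right_mag, n_bins=18):
    """Histogram of DOA over [0, 90] degrees, weighted by vector magnitude.

    left_mag, right_mag: magnitudes of the time-frequency components
    of the L-ch and R-ch signals (same length).
    """
    hist = [0.0] * n_bins
    width = 90.0 / n_bins
    for l, r in zip(left_mag, right_mag):
        weight = math.hypot(l, r)             # weighting term: magnitude of the vector
        if weight == 0.0:
            continue                          # silent bins carry no direction
        doa = math.degrees(math.atan2(r, l))  # directional information from the amplitude ratio
        index = min(int(doa / width), n_bins - 1)
        hist[index] += weight
    return hist

# Three components: fully left, center, fully right.
hist = weighted_doa_histogram([1.0, 0.5, 0.0], [0.0, 0.5, 2.0], n_bins=3)
```

Weighting by magnitude keeps low-energy components from dominating the histogram shape.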
![Page 11: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/11.jpg)
Proposed method 1: depth estimation based on DOA
Difference of DOA shape corresponding to source distance
• In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion.
Close source: the observed DOA histogram becomes spiky.
Far source: the observed DOA histogram becomes smooth.
[Figure: DOA histograms of a close source and a far source; x-axis: direction of arrival; y-axis: frequency of source components]
The observed DOA distribution of the target source can be used as a cue for depth estimation.
![Page 12: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/12.jpg)
Proposed method 1: modeling of DOA distribution
Generalized Gaussian distribution: GGD [Box, et al., 1973]
Flexible family of probability density functions (PDFs)
• To model the DOA distribution, we propose a new modeling method using GGD.
[Equation: definition of GGD]
The shape of GGD changes depending on βshape.
βshape = 1: Laplacian distribution PDF
βshape = 2: Gaussian distribution PDF
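For reference, the usual textbook form of the GGD density is p(x) = β / (2αΓ(1/β)) · exp(−(|x − μ| / α)^β), with location μ, scale α, and shape β. The sketch below evaluates that form and checks the two special cases named on the slide. The parameter names are assumptions, since the slide's own equation is not preserved in this transcript.

```python
import math

def ggd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian PDF with location mu, scale alpha, and shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))

# beta = 2 with alpha = sqrt(2) * sigma reproduces the Gaussian PDF (here sigma = 1).
gauss_at_1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
ggd_gauss = ggd_pdf(1.0, alpha=math.sqrt(2.0), beta=2.0)

# beta = 1 reproduces the Laplacian PDF with scale alpha (here alpha = 1).
laplace_at_1 = 0.5 * math.exp(-1.0)
ggd_laplace = ggd_pdf(1.0, alpha=1.0, beta=1.0)
```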
![Page 13: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/13.jpg)
Proposed method 1: modeling of DOA distribution
Modeling of DOA distribution based on GGD parameter
[Figure: DOA histograms of a close source and a far source with fitted GGD; x-axis: direction of arrival; y-axis: frequency of source components]
Source is close ⇔ βshape is small
Source is far ⇔ βshape is large
We propose a new depth estimation method based on GGD.
The shape parameter βshape is utilized as the metric.
![Page 14: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/14.jpg)
Proposed method 2: problem in proposed method 1
• Background noise and artificial distortion generated by signal processing interfere with the DOA histogram.
Problem of signal processing (normalization problem): small noise components are enhanced.
[Figure: scatter plot of a binaural-recorded signal (L-ch input signal versus R-ch input signal) and the resulting DOA histogram with noise; x-axis: DOA (left, center, right); y-axis: frequency of source components]
Feature extraction: activation-shared multichannel NMF
![Page 15: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/15.jpg)
![Page 16: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/16.jpg)
Proposed method 2: activation-shared multichannel NMF
Nonnegative matrix factorization: NMF [Lee, et al., 2001]
Observed matrix (spectrogram) ≈ Basis matrix (spectral patterns) × Activation matrix (time-varying gain)
Ω: Number of frequency bins
T: Number of time frames
K: Number of bases
[Figure: spectrogram (time versus frequency), basis matrix (frequency versus amplitude), and activation matrix (time versus amplitude)]
NMF is a sparse representation and can extract significant features from the observed matrix.
• The sparse representation provides high performance for noise reduction, compression, and feature extraction.
We eliminate background noise and artificial distortion.
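As a minimal sketch of single-channel NMF, the code below uses the standard multiplicative updates for the squared Euclidean cost. The cost choice, the helper names, the random initialization, and the toy spectrogram are assumptions for illustration, not the settings used in the experiments.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def squared_error(y, f, g):
    approx = matmul(f, g)
    return sum((y[i][j] - approx[i][j]) ** 2
               for i in range(len(y)) for j in range(len(y[0])))

def nmf(y, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix y (Omega x T) into F (Omega x K) and G (K x T)."""
    rng = random.Random(seed)
    n_freq, n_frames = len(y), len(y[0])
    f = [[rng.random() + 0.1 for _ in range(n_bases)] for _ in range(n_freq)]
    g = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(n_bases)]
    for _ in range(n_iter):
        # Basis update: F <- F * (Y G^T) / (F G G^T), element-wise.
        num = matmul(y, transpose(g))
        den = matmul(matmul(f, g), transpose(g))
        f = [[f[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_bases)]
             for i in range(n_freq)]
        # Activation update: G <- G * (F^T Y) / (F^T F G), element-wise.
        num = matmul(transpose(f), y)
        den = matmul(transpose(f), matmul(f, g))
        g = [[g[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_frames)]
             for k in range(n_bases)]
    return f, g

# Toy spectrogram built from two spectral patterns.
y = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 3.0], [0.0, 1.0, 1.0]]
f, g = nmf(y, n_bases=2)
```

Because the updates only multiply by nonnegative ratios, F and G stay nonnegative throughout the iterations.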
![Page 17: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/17.jpg)
Proposed method 2: problem of conventional NMF
Conventional NMF: NMFs are applied in parallel (L-ch NMF and R-ch NMF), so the bases of each channel are trained without correlation to the other channel.
• Conventional NMFs generate an artificial fluctuation in the directional information (amplitude ratio).
DOA information is disturbed.
![Page 18: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/18.jpg)
Proposed method 2: activation-shared multichannel NMF
Proposed method: activation-shared multichannel NMF
The activation matrix is shared through all channels (L-ch NMF and R-ch NMF).
This reduces the dimensionality of the input signal while maintaining directional information.
[Equation: cost function, expressed with the β-divergence between the entries of the matrices]
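The cost function itself is not preserved in this transcript. One form consistent with the slide's description (a β-divergence summed over channels, with a channel-dependent basis matrix and a single shared activation matrix) is sketched below. All symbols are assumptions: c is the channel index, y_{c,ω,t} an entry of the observed spectrogram of channel c, f_{c,ω,k} an entry of its basis matrix, g_{k,t} an entry of the shared activation matrix, T the number of time frames, and K the number of bases.

```latex
\mathcal{J} = \sum_{c} \sum_{\omega=1}^{\Omega} \sum_{t=1}^{T}
  \mathcal{D}_{\beta}\!\left( y_{c,\omega,t} \,\middle\|\, \sum_{k=1}^{K} f_{c,\omega,k}\, g_{k,t} \right)
```

Because g_{k,t} carries no channel index, every channel is forced to use the same time-varying gains, and inter-channel level differences can only be expressed through the bases.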
![Page 19: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/19.jpg)
Proposed method 2: activation-shared multichannel NMF
β-divergence [Eguchi, et al., 2001]
Generalized divergence between two variables, whose form depends on β.
β = 2: Euclidean distance
β = 1: Generalized Kullback-Leibler divergence
β = 0: Itakura–Saito divergence
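The commonly used scalar definition of the β-divergence can be written as a small function, as sketched below. The cases β = 1 and β = 0 are the limits of the general expression, and β = 2 gives half the squared Euclidean distance. The function name is hypothetical, and the slide's own normalization may differ from this textbook form.

```python
import math

def beta_divergence(y, x, beta):
    """Beta-divergence D_beta(y || x) for positive scalars y and x."""
    if beta == 1.0:   # generalized Kullback-Leibler divergence (limit beta -> 1)
        return y * math.log(y / x) - y + x
    if beta == 0.0:   # Itakura-Saito divergence (limit beta -> 0)
        return y / x - math.log(y / x) - 1.0
    return (y ** beta / (beta * (beta - 1.0))
            + x ** beta / beta
            - y * x ** (beta - 1.0) / (beta - 1.0))

d_euclid = beta_divergence(3.0, 1.0, 2.0)   # 0.5 * (3 - 1)^2
d_kl = beta_divergence(2.0, 1.0, 1.0)       # 2 * ln(2) - 1
d_is = beta_divergence(2.0, 1.0, 0.0)       # 2 - ln(2) - 1
```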
![Page 20: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/20.jpg)
Proposed method 2: activation-shared multichannel NMF
Derivation of optimal variables using the β-divergence
The auxiliary function method is an optimization scheme that uses an upper bound function.
1. Design an auxiliary function that is an upper bound of the original cost function.
2. Minimize the original cost function indirectly by minimizing the auxiliary function.
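In generic notation, the two steps correspond to the standard majorization-minimization argument sketched below. The symbols are assumptions because the slide's own symbols are not preserved: θ denotes the model variables, θ̄ the auxiliary variables, J the cost function, and J⁺ the auxiliary function.

```latex
\mathcal{J}(\theta) = \min_{\bar{\theta}} \mathcal{J}^{+}(\theta, \bar{\theta}),
\qquad
\bar{\theta}^{(n+1)} = \arg\min_{\bar{\theta}} \mathcal{J}^{+}\bigl(\theta^{(n)}, \bar{\theta}\bigr),
\qquad
\theta^{(n+1)} = \arg\min_{\theta} \mathcal{J}^{+}\bigl(\theta, \bar{\theta}^{(n+1)}\bigr)

\mathcal{J}\bigl(\theta^{(n+1)}\bigr)
\le \mathcal{J}^{+}\bigl(\theta^{(n+1)}, \bar{\theta}^{(n+1)}\bigr)
\le \mathcal{J}^{+}\bigl(\theta^{(n)}, \bar{\theta}^{(n+1)}\bigr)
= \mathcal{J}\bigl(\theta^{(n)}\bigr)
```

The chain of inequalities shows that alternating the two minimizations never increases the original cost.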
![Page 21: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/21.jpg)
Proposed method 2: activation-shared multichannel NMF
Cost function
The first and second terms become convex or concave functions depending on the value of β.
[Table: convexity (convex or concave) of the first and second terms of the cost function for each range of β]
![Page 22: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/22.jpg)
Proposed method 2: activation-shared multichannel NMF
Cost function
The upper bound function of each term is defined by applying:
• Convex function: Jensen's inequality
• Concave function: tangent line inequality
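The two inequalities can be stated in generic form as follows. The notation is an assumption: φ is a convex function, ψ is a differentiable concave function, λ_k are nonnegative weights that sum to one, and x̄ is the tangent point that plays the role of an auxiliary variable.

```latex
% Jensen's inequality (convex \varphi, \lambda_k \ge 0, \sum_k \lambda_k = 1)
\varphi\!\left(\sum_{k} x_k\right)
= \varphi\!\left(\sum_{k} \lambda_k \frac{x_k}{\lambda_k}\right)
\le \sum_{k} \lambda_k\, \varphi\!\left(\frac{x_k}{\lambda_k}\right)

% Tangent line inequality (concave \psi)
\psi(x) \le \psi(\bar{x}) + \psi'(\bar{x})\,(x - \bar{x})
```

Both bounds hold with equality for a suitable choice of λ_k and x̄, which is the property an auxiliary function needs.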
![Page 23: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/23.jpg)
Proposed method 2: activation-shared multichannel NMF
Update rules
• The update rules for optimization are obtained from the derivative of the auxiliary function with respect to each objective variable (the entries of the matrices).
[Equation: update rules for the entries of the basis and activation matrices]
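The update equations are not preserved in this transcript. The sketch below uses the familiar multiplicative updates for the β-divergence, with the activation update summed over all channels so that one activation matrix is shared. It is an illustration under assumptions (update exponent fixed to 1, intended for 1 ≤ β ≤ 2; hypothetical function names and toy data), not necessarily the exact rules derived on the slide.

```python
import random

def reconstruct(f, g):
    """Model spectrogram F G for one channel (lists of rows)."""
    return [[sum(f_row[k] * g[k][t] for k in range(len(g)))
             for t in range(len(g[0]))] for f_row in f]

def shared_nmf(ys, n_bases, beta=2.0, n_iter=100, seed=0, eps=1e-12):
    """Factorize each channel spectrogram ys[c] as F[c] G with one shared G."""
    rng = random.Random(seed)
    n_ch, n_freq, n_frames = len(ys), len(ys[0]), len(ys[0][0])
    fs = [[[rng.random() + 0.1 for _ in range(n_bases)] for _ in range(n_freq)]
          for _ in range(n_ch)]
    g = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(n_bases)]
    for _ in range(n_iter):
        # Basis update: each channel keeps its own spectral patterns.
        for c in range(n_ch):
            yh = reconstruct(fs[c], g)
            for w in range(n_freq):
                for k in range(n_bases):
                    num = sum(ys[c][w][t] * (yh[w][t] + eps) ** (beta - 2.0) * g[k][t]
                              for t in range(n_frames))
                    den = sum((yh[w][t] + eps) ** (beta - 1.0) * g[k][t]
                              for t in range(n_frames))
                    fs[c][w][k] *= num / (den + eps)
        # Activation update: numerator and denominator are summed over all channels.
        yhs = [reconstruct(fs[c], g) for c in range(n_ch)]
        new_g = []
        for k in range(n_bases):
            row = []
            for t in range(n_frames):
                num = sum(ys[c][w][t] * (yhs[c][w][t] + eps) ** (beta - 2.0) * fs[c][w][k]
                          for c in range(n_ch) for w in range(n_freq))
                den = sum((yhs[c][w][t] + eps) ** (beta - 1.0) * fs[c][w][k]
                          for c in range(n_ch) for w in range(n_freq))
                row.append(g[k][t] * num / (den + eps))
            new_g.append(row)
        g = new_g
    return fs, g

# Toy stereo spectrograms: the same source, 6 dB lower in the right channel.
left = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 3.0]]
right = [[0.5 * v for v in row] for row in left]
fs, g = shared_nmf([left, right], n_bases=2)
```

Since the gains in G are common to both channels, the level difference between the channels can only appear in the ratio of the two basis matrices.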
![Page 24: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/24.jpg)
Flow of proposed depth estimation method
1. Input stereo signal (L-ch, R-ch)
2. STFT
3. Separation into Cluster L, Cluster C, and Cluster R
4. Activation-shared NMF (for each cluster)
5. Weighted DOA histogram (for each cluster)
6. Depth estimation (for each cluster)
[Figure: weighted DOA histogram of each cluster; x-axis: direction of arrival; y-axis: frequency of source components]
We can estimate depth information by calculating the shape parameter of the DOA histogram.
![Page 25: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/25.jpg)
![Page 26: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/26.jpg)
Experimental conditions
Conditions
• Mixed stereo signals consist of 3 instruments.
• The target source is located at the center at 7 distances.
• There are 6 combination patterns of source directions.
[Table: mixing source parameters (test source 1, test source 2, test source 3, reverberation time, NMF beta, NMF basis)]
[Figure: source layout; markers: interference source, target source placed at intervals]
Compared methods
Conventional method 1: weighted DOA histogram (not processed by NMF)
Conventional method 2: processed by conventional NMF
Proposed method: processed by proposed NMF
![Page 27: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/27.jpg)
Experimental conditions
Reference sound sources were generated using the image method.
Image method [Allen, et al., 1979]: technique of simulating a room impulse response
The following can be set arbitrarily:
• Volume of room
• Source location
• Microphone location
• Absorption coefficient
[Figure: geometry of image method (real source, image source); example of room impulse response (x-axis: time index; y-axis: amplitude)]
![Page 28: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/28.jpg)
Experimental results
Results 1
Data set 1
Target source: Vocal
Interference source (left): Piano
Interference source (right): Guitar
[Figure: estimated values of each method versus the distance of the target source; markers: interference source, target source]
・ The results of the conventional methods do not agree with the oracle (image method).
・ The proposed method correctly estimates the distance of the target source.
![Page 29: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/29.jpg)
Experimental results: correlation coefficient
Results 2
Correlation coefficient between reference value and estimated value
Table: Correlation coefficient of each method

| Data set | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Target source | Vocal | Vocal | Guitar | Guitar | Piano | Piano |
| Interference source (left) | Piano | Guitar | Piano | Vocal | Vocal | Guitar |
| Interference source (right) | Guitar | Piano | Vocal | Piano | Guitar | Vocal |
| Conventional method 1 | 0.350 | 0.532 | 0.154 | 0.277 | 0.602 | 0.496 |
| Conventional method 2 | 0.189 | 0.165 | 0.044 | -0.037 | 0.426 | 0.157 |
| Proposed method | 0.986 | 0.925 | 0.777 | 0.651 | 0.791 | 0.856 |

• A strong relation between the estimated value of the proposed method and the distance of the target source is indicated.
• The efficacy of the proposed method is confirmed.
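A correlation coefficient of this kind can be computed as sketched below, assuming it is the Pearson correlation between the reference distances and the estimated values. The numbers in the example are hypothetical and are not data from the experiment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical values for illustration only (7 distances of a target source).
reference_distance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
estimated_shape = [0.9, 1.1, 1.4, 1.5, 1.9, 2.0, 2.3]
r = pearson(reference_distance, estimated_shape)
```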
![Page 30: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/30.jpg)
Conclusions
• We proposed a new depth estimation method for sound sources in a mixed signal using the shape of the DOA distribution.
• The shape of the DOA distribution is modeled by GGD.
• We also proposed a new feature extraction method for multichannel signals: activation-shared multichannel NMF.
• The results of the experiment indicated the efficacy of the proposed method.
![Page 31: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/31.jpg)
![Page 32: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/32.jpg)
Derivation of parameter βshape
The maximum-likelihood-based shape parameter estimation has no closed-form solution in GGD.
We propose a closed-form parameter estimation algorithm based on some approximation and kurtosis.
[Equation: kurtosis of the observed DOA histogram]
[Equation: m-th moment of GGD, expressed with the gamma function]
[Equation: relation between kurtosis and shape parameter]
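The equations of this slide are not preserved in the transcript. For a zero-mean GGD with scale α and shape β, the standard moment and kurtosis expressions are as follows (the symbols m, α, and κ are assumptions):

```latex
\mathrm{E}\bigl[|x|^{m}\bigr] = \alpha^{m}\,
  \frac{\Gamma\!\left(\frac{m+1}{\beta}\right)}{\Gamma\!\left(\frac{1}{\beta}\right)},
\qquad
\kappa = \frac{\mathrm{E}[x^{4}]}{\mathrm{E}[x^{2}]^{2}}
       = \frac{\Gamma\!\left(\frac{5}{\beta}\right)\Gamma\!\left(\frac{1}{\beta}\right)}
              {\Gamma\!\left(\frac{3}{\beta}\right)^{2}}
```

As a check, β = 2 gives κ = 3 (Gaussian) and β = 1 gives κ = 6 (Laplacian). The scale α cancels, so the kurtosis depends on the shape parameter alone.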
![Page 33: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/33.jpg)
Derivation of parameter βshape
There is no exact closed-form solution of the inverse function.
Introduce the modified Stirling's formula (approximation of the gamma function).
[Equation: modified Stirling's formula]
Take the logarithm.
![Page 34: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/34.jpg)
Derivation of parameter βshape
This results in a quadratic equation to be solved.
By solving it, we can derive the closed-form estimate of the shape parameter.
[Equation: closed-form estimate of the shape parameter]
Preparation of the depth estimation method is completed.
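The closed-form estimator derived on these slides is not preserved in the transcript. As a stand-in, the sketch below inverts the kurtosis relation of the GGD numerically by bisection, which is a different technique from the Stirling-based closed form. It relies on the GGD kurtosis decreasing as the shape parameter grows. The function names and search interval are hypothetical.

```python
import math

def ggd_kurtosis(beta):
    """Kurtosis of a generalized Gaussian distribution with shape parameter beta."""
    lg = math.lgamma
    return math.exp(lg(5.0 / beta) + lg(1.0 / beta) - 2.0 * lg(3.0 / beta))

def histogram_kurtosis(centers, weights):
    """Kurtosis of a weighted histogram given bin centers and bin weights."""
    total = sum(weights)
    mean = sum(c * w for c, w in zip(centers, weights)) / total
    m2 = sum(w * (c - mean) ** 2 for c, w in zip(centers, weights)) / total
    m4 = sum(w * (c - mean) ** 4 for c, w in zip(centers, weights)) / total
    return m4 / m2 ** 2

def estimate_shape(kurtosis, lo=0.2, hi=20.0, n_iter=100):
    """Solve ggd_kurtosis(beta) = kurtosis for beta by bisection."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurtosis:
            lo = mid   # model is too peaky at mid: the solution has a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta_gauss = estimate_shape(3.0)     # kurtosis of a Gaussian
beta_laplace = estimate_shape(6.0)   # kurtosis of a Laplacian
toy_kurtosis = histogram_kurtosis([-1.0, 0.0, 1.0], [1.0, 2.0, 1.0])
```

In the flow of the proposed method, the input kurtosis would come from the weighted DOA histogram of a cluster, and the estimated shape parameter would serve as the depth metric.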
![Page 35: Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization](https://reader034.fdocuments.net/reader034/viewer/2022042613/549ec2f9ac79594c768b4868/html5/thumbnails/35.jpg)
Proposed method 2: activation-shared multichannel NMF
Preliminary experiment: example of DOA histogram
[Figure: weighted DOA histograms of the center cluster of a mixed source (3 instruments); x-axis: direction of arrival [degree]]
Conventional NMF (individually applied to L-ch and R-ch): fluctuations are generated in the DOA.
Proposed NMF (activation-shared): features are extracted while maintaining directional information.