Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis...

Blind Source Separation of Acoustic Signals Blind Source Separation of Acoustic Signals Based on Based on

Multistage Independent Component AnalysisMultistage Independent Component Analysis

Blind Source Separation of Acoustic Signals Blind Source Separation of Acoustic Signals Based on Based on

Multistage Independent Component AnalysisMultistage Independent Component Analysis

Hiroshi SARUWATARIHiroshi SARUWATARI,,

Tsuyoki NISHIKAWA, and Kiyohiro SHIKANOTsuyoki NISHIKAWA, and Kiyohiro SHIKANO

Graduate School of Information ScienceGraduate School of Information Science

Nara Institute of Science and TechnologyNara Institute of Science and Technology

Background of blind source separation (BSS)

Overview of Independent Component Analysis (ICA) Time-domain ICA (TDICA) Frequency-domain ICA (FDICA)

Disadvantages of TDICA and FDICA

Proposal of Multistage ICA

Experimental results under real acoustic condition

Conclusion

ContentsContents

BackgroundBackground

Hands-free speech recognition systemHands-free speech recognition system

Target SpeechMicrophone

Speech recognition system

？Interference

Interference is also observed at microphone.Speech recognition performance

significantly degrades.

Is it fine tomorrow?

Goal

Microphone array Receiver which consists of multiple elements To enhance target speech or reduce interference

Problem of microphone array processing A priori information is required.

Directions of arrival of the sound sources Breaks of target speech for filter adaptation

Background (Cont’d)Background (Cont’d)

Realization of high quality hands-free speech interface system

Problem of Microphone ArrayProblem of Microphone Array

• Delay and Sum： to produce narrow beam-pattern

• Adaptive： to update filters to reduce the noise

θ

TargetHigh sidelobe gains noise.

θ

Set target direction.

It is necessary to observe only noise for filter learning.

Null

Approach taken to estimate source signals only from the observed mixed signals. Any information about source directions and acoustic c

onditions is not required. Independent component Analysis (ICA) is mainly used.

Previous works on ICA J. Cardoso, 1989 C. Jutten, 1990 (Higher-order decorrelation) P. Common, 1994 (define the term “ICA”) A. Bell et al., 1995 (infomax)

Blind Source Separation (BSS)Blind Source Separation (BSS)

Microphone2

Microphone1MutuallyMutuallyIndependentIndependent KnownKnown

ICA-Based BSSICA-Based BSS

Speaker2

Speaker1

Good Morning!

Hello!

Observedsignal1

Observedsignal2Source2

Source1

To estimate source signalsTo estimate source signals

No a priori information

(unsupervised adaptive filtering)

BSS for Instantaneous mixtureBSS for Instantaneous mixture

)(

)(

)(

)( 11

1

111

tx

tx

ts

ts

AA

AA

LKLKL

K

Linearly Mixing ProcessLinearly Mixing Process

Mixing Matrix Source Observed

Separation ProcessSeparation ProcessSeparated Unmixing Matrix

)(

)(

)(

)( 1

1

1111

tx

tx

WW

WW

ty

ty

LKLK

L

K

Independent?Independent?

Cost Function

Optimize

Various Criterion for ICAVarious Criterion for ICA

• Decorrelation– To minimize correlation among signals

in multiple time durations

• Nonlinear function 1– To minimize higher-order correlation

• Nonlinear function 2– To assume p.d.f of sources

Separated Signal：　 T21 )(),...,()( tytyt y

diag)()(E T tt yy diag)()(E T tt yy

diag)()(E T3 tt yy diag)()(E T3 tt yy

diag)()(E T tt yyΦ diag)()(E T tt yyΦ

:Φ Sigmoid function

Cost Function for Nonlinear Function 2Cost Function for Nonlinear Function 2

),,( 1 Kyyp Kullback-Leibler (KL) divergence between 　　　　　　　　 and

K

k kyp1)(

1. Joint Entropy of y 2. Sum of marginal entropy of ky

・Minimized when are mutually independent・ This can be achieved by maximization of because hardly changes.

ky);( WH Y

);( WYH k

K

kk

K

k k

WYHWH

dyp

ppWKL

1

1

);();(

)(

)(log)()(

Y

yy

y

Derivation for Nonlinear Function 2Derivation for Nonlinear Function 2

1TT

T1T

T1T

)(E

)(E)(

)()()()(

WyyI

xyW

xxyWWW

W

y

x

dxpKL

)(WKL

Nonlinear Function 2 　⇒　 To be diagonalized

where

W

This can be approximated by Sigmoid Function in speech signal.

T

1

1 )(log...,,

)(log)(

K

K

yyp

yyp

y

To update along the negative gradient of

Why Instantaneous Mixture Model?Why Instantaneous Mixture Model?

• ICA for Instantaneous Mixture model→ Mixing matrix is represented as real-valued.→ No assumption for time delay among

　　microphones and room reflections.

It is a useful example to show how ICA can work, but only mathematical “Toy model”.

Is it applicable to sound signals in real acoustic environment?

NO!

ICA for Convolutive Mixture Model (1)ICA for Convolutive Mixture Model (1)

• In application to microphone array,– each received signal has a time delay which corresponds

to the sound direction and position of each element.– mixing matrix A is not simple scalar-valued coefficient,

but is represented as “convolution filter”.

)(

)(

)(

)(

)()(

)()( 11

1

111

tx

tx

ts

ts

tAtA

tAtA

LKLKL

K

Mixing Process in Real EnvironmentMixing Process in Real Environment

Mixing Matrix (FIR-Filter) Source Observed

Convolution

⇒We should use FIR filter for separation process.

ICA for Convolutive Mixture Model (2)ICA for Convolutive Mixture Model (2)

)(

)(

)(

)(

)()(

)()( 11

1

111

fX

fX

fS

fS

fAfA

fAfA

LKLKL

K

Mixing Process in Frequency DomainMixing Process in Frequency Domain

Complex-Valued Mixing Matrix Source observed

⇒We only solve the complex-valued instantaneous mixture

　　 in each subband independently.

To simplify the convolutive mixture down to instantaneous mixtures by the frequency tra

nsform

Permutation Problem in FDICAPermutation Problem in FDICA

Permutation and Gain DeterminacyPermutation and Gain Determinacy

・ ICA is conducted in each subband independently.

・ Ordering and Scaling of outputs are arbitrary in ICA.

ICA at 1 kHz

ICA at 1 kHz

2 kHzICA

1 kHzICA

FDICA

S1

S2

S1×0.5

S2×1

S2×0.4

S1×1.1

Solutions for Permutation ProblemSolutions for Permutation Problem

・ To use correlation among outputs waveforms

(Murata et al. 1998)

・ To use directivity pattern of W

(Kurita and Saruwatari, 2000)

・ To use correlation among unmixing matrices in 　 neighboring frequency bins

(Parra et al, 2000, Asano et al, 2001)

・ To use correlation among outputs waveforms

(Murata et al. 1998)

・ To use directivity pattern of W

(Kurita and Saruwatari, 2000)

・ To use correlation among unmixing matrices in 　 neighboring frequency bins

(Parra et al, 2000, Asano et al, 2001)

Problem in ICA-Based BSSProblem in ICA-Based BSS

It is necessary to achieve robust BSSIt is necessary to achieve robust BSS method against reverberation.method against reverberation.

It is necessary to achieve robust BSSIt is necessary to achieve robust BSS method against reverberation.method against reverberation.

・ Separation performance under reverberant

　 conditions significantly degrades.Why? Reverberation in typical roomReverberation in typical room

==2400-tap FIR-filter in 8 kHz sampling2400-tap FIR-filter in 8 kHz sampling

⇒⇒The number of parameters is too large.The number of parameters is too large.

Conventional ApproachesConventional Approaches

Frequency-Domain ICA (FDICA) To estimate the separation coefficients every

frequency bin in the frequency domain

Time-Domain ICA (TDICA) To estimate the separation FIR filter in the

time domain

FDICA-Based BSSFDICA-Based BSS

f<Advantages><Advantages> To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform Easy to converge the separation filter in iterative ICA learning with high stability

<Advantages><Advantages> To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform Easy to converge the separation filter in iterative ICA learning with high stability

In conventional dereverberation processing, dereverberation performance is improved as the number of subbands (filter length) is enlarged.

Performance of FDICAPerformance of FDICA

In FDICA, as the number of subbands is enlarged,

is source-separation performance also improved?

In FDICA, as the number of subbands is enlarged,


<Speculation>

To investigate the relation between the number of subbands and source-separation performance

Experimental SetupExperimental Setup

Sound source: 2-male and 2-female speech from ASJ corpus (12 combinations) Evaluation: Noise Reduction Rate

= Output SNR [dB] – Input SNR [dB]

• Interelement spacing is 4 cm.• Reverberation time is 300 ms.

FDICA AlgorithmFDICA Algorithm

Iterative learning algorithm (Amari, 1996):

where

YYYY ofpartimaginary :ofpartreal : (I)(R) ,

η: Step size parameter

Relation between Number of Subbands Relation between Number of Subbands and Separation Performanceand Separation Performance

6.16.6

7.2 7.68.5

7.4

9.4

3.0

0

2

4

6

8

10

32 64 128 256 512 1024 2048 4096Number of Subbands

Noi

se R

educ

tion

Rat

e [d

B]

Separation performance significantly degradesSeparation performance significantly degrades

<Disadvantages><Disadvantages> When the number of subbands becomes too large, the independence assumption of narrow-band signals collapses (Nishikawa, Araki, et al., 2001). Separation performance in FDICA is saturated before reaching a sufficient performance.

<Disadvantages><Disadvantages> When the number of subbands becomes too large, the independence assumption of narrow-band signals collapses (Nishikawa, Araki, et al., 2001). Separation performance in FDICA is saturated before reaching a sufficient performance.

Relation between Number of Subbands aRelation between Number of Subbands and Correlationnd Correlation

0

0.02

0.04

0.06

0.08

0.1

32 64 128 256 512 1024 2048 4096Number of Subbands

Cor

rela

tion

am

ong

Sig

nals

Higher correlationHigher correlation

If we increase the number of subbandsIf we increase the number of subbandstoo much, the correlation between too much, the correlation between narrow-band signals increases.narrow-band signals increases.

If we increase the number of subbandsIf we increase the number of subbandstoo much, the correlation between too much, the correlation between narrow-band signals increases.narrow-band signals increases.

Trade-off Relation between Independence and Trade-off Relation between Independence and Robustness against ReverberationRobustness against Reverberation

Number of Subbands LargeSmall

No

ise

Red

uct

ion

Rat

e

Low-Low-IndependenceIndependence

High-Independence Robust against Reverberation

Poor againstReverberation

TDICA-Based BSSTDICA-Based BSS

<Advantages> To treat the fullband signals where the independence assumption of sources usually holds High-convergence possibility near the optimal point

<Advantages> To treat the fullband signals where the independence assumption of sources usually holds High-convergence possibility near the optimal point

In conventional dereverberation processing, dereverberation performance is improved as the filter length is lengthened.

Performance of TDICAPerformance of TDICA

In TDICA, as the filter length is lengthened,


In TDICA, as the filter length is lengthened,


<Speculation>

To investigate the relation between the filter length and source-separation performance

TDICA AlgorithmTDICA Algorithm

Cost function

B

b

bb

BzQ

1

)()( )0(detlog)0( diagdetlog2

1))(( yy RRw

minimize

wheret

bbb nttn T)()()( )()()( yyRy

)()()()()( )(1

0

)()( tzktkt bK

k

bb xWxwy

1

0

)()(K

k

kzkz wW

)()( )()( kttz bbk xx

TDICA Algorithm (Cont’d)TDICA Algorithm (Cont’d)

Iterative equation of separation filter

Natural gradient of Q(w(z))

)()()(

))(()( T zz

k

zQkΔ ii

i

ii WW

w

Ww

)()()(1 kΔkk iii www

where α is the step-size parameter

Proposed Iterative Equation of TDICAProposed Iterative Equation of TDICA

Iterative equation of separation filter (TDICA1)

)()()()0(diag

)()0()(

)t()(1)(

1

)(1)(1

kzk

kB

k

iibb

B

b

bbi

wWRR

RRw

yy

yy

Evaluate only correlation of same time

Iterative equation of separation filter (TDICA2)

)()()(diag)0(diag

)()0(diag)(

)t()(1)(

1

)(1)(1

kzk

kB

k

iibb

B

b

bbi

wWRR

RRw

yy

yy

Expand to evaluate correlation of different time

5.8

4.4

2.8

0.90.30.10.30.40.30.2

0.60.2

7.8

1.7

0.4 0.3

0

1

2

3

4

5

6

7

8

10 20 50 100 200 500 1000 2000Filter Length [taps]

Noi

se R

educt

ion R

ate

[dB]

Results of TDICA1 and TDICA2Results of TDICA1 and TDICA2

TDICA 1 (only correlation of same time) TDICA 2 (correlation of different time)

It is necessary to evaluate correlation of different times to achieve a superior performance. Source separation using long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.

It is necessary to evaluate correlation of different times to achieve a superior performance. Source separation using long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.

<Discussions>

Advantages To simplify the convolutive

mixture down to instantaneous mixtures

Easy to converge the separation filter with high stability

Disadvantages Independence assumption

collapses in each narrow-band.

Separation performance is saturated before reaching a sufficient performance.

Problems and SolutionProblems and Solution

FDICA TDICA Advantages

To treat the fullband speech signals where the independence assumption of sources usually holds

High-convergence possibility near the optimal point

Disadvantages Iterative rule for filter learnin

g is complicated. Convergence degrades unde

r reverberant conditions.

ComplementComplement

Second-stageFirst-stage

To use advantages of FDICA and TDICA together

Multistage ICA (MSICA) combining FDICA and TDICA

Multistage ICA (MSICA) combining FDICA and TDICA

Separation Procedure of MSICASeparation Procedure of MSICA

FDICA TDICAMixing system

Separated signals of FDICA are regarded as the input signals for TDICA. Residual cross-talk components of FDICA can be removed by TDICA.

Separated signals of FDICA are regarded as the input signals for TDICA. Residual cross-talk components of FDICA can be removed by TDICA.

To investigate the relation between the filter length and source-separation performance of TDICA part in MSICA

Comparison among separation performances of TDICA, FDICA, and MSICA

Comparison among Each ICAComparison among Each ICA

10.2 10.1 10.4 10.6

12.5 12.7

0.9

2.8

4.4

5.8

7.8

1.7

0.4 0.3

10.011.0

0

2

4

6

8

10

12

14

10 20 50 100 200 500 1000 2000Filter Length [taps]

Noi

se R

educt

ion R

ate

[dB] TDICA MSICA FDICA

Relation between Filter Length Relation between Filter Length and Separation Performanceand Separation Performance

9.4 Separation performance of MSICA is improved even with the long filter. TDICA is still useful near the optimal point.

Separation performance of MSICA is improved even with the long filter. TDICA is still useful near the optimal point.

<Discussions>

To investigate the relation between the filter length and source-separation performance of TDICA part in MSICA

Comparison among separation performances of TDICA, FDICA, and MSICA

Comparison among Each ICAComparison among Each ICA

Comparison ResultsComparison Results

0

2

4

6

8

10

12

14

16

18

1 2 3 4 5 6 7 8 9 10 11 12

Combination of Speakers

Noi

se R

educ

tion

Rat

e [d

B] TDICA FDICA MSICAAverage: TDICA: 5.9 dB， FDICA: 9.4 dB，MSICA： 12.1 dBAverage: TDICA: 5.9 dB， FDICA: 9.4 dB，MSICA： 12.1 dB

Separation performance of MSICA is superior to

those of TDICA and FDICA.Combination of FDICA and TDICA is inherently

effective for improving the separation performance.

Separation performance of MSICA is superior to

those of TDICA and FDICA.Combination of FDICA and TDICA is inherently

effective for improving the separation performance.

<Discussions>

Spectral Distortion (TDICA)Spectral Distortion (TDICA)

10 taps: No whitening, 1000 taps: whitening

Spectral Distortion (FDICA, MSICA)Spectral Distortion (FDICA, MSICA)

FDICA: No whitening, MSICA: No whitening

Sound Demonstration of MSICASound Demonstration of MSICA

Reverberation time: 300 ms

Mixed speech (Female+Male)

Separated speech (Female)

Separated speech (Male)

FemaleFemale

40°

MaleMale

-30°

ConclusionConclusion

Disadvantages of FDICA and TDICA Separation performance of FDICA is saturated

before reaching a sufficient performance. Source separation using long filter fails in TDICA

because the iterative rule for FIR-filter learning is complicated.

MSICA combining FDICA and TDICA In TDICA part in MSICA, separation

performance is improved even with the long filter.

Separation performance of MSICA is superior to those of TDICA and FDICA.

Future WorkFuture Work

Further evaluation in real environment Robustness under reverberant conditions Larger array with more than 2-element

To apply BSS to speech recognition system Improvement of convergence speedOn-line and real-time algorithm

ICA and BSS: Where do we go?ICA and BSS: Where do we go?

We should go to NARA-city!

Why?

44thth International Symposium on International Symposium onICA and BSS (ICA2003), in NARAICA and BSS (ICA2003), in NARA

Date: April 1-4, 2003 Place: Nara, JAPAN Scientific Areas:

ICA and Factor Analysis, PCA etc. Blind source separation Blind and semi-blind equalization and deconvolution Blind identification Any signal processing application related with ICA

URL: http://ica2003.jp/

Analysis ConditionsAnalysis Conditions

Filter length10 (TDICA)

1000 (MSICA)

Number of blocks B

3 (TDICA)9 (MSICA)

ICA-iteration 500

Number of subbands 1024 point

Frame shift 16 point

ICA-iteration 30

FDICA, FDICA part in MSICA

TDICA, TDICA part in MSICA

Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis...

Documents

Transcript of Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis...