Independent Component Analysis

Independent Component Analysis for Blind Source Separation. Tatsuya Yokota, Tokyo Institute of Technology. Jan. 31, 2012. (28 slides)

Description

These are slides from a seminar at my laboratory.

Transcript of Independent Component Analysis

Page 1: Independent Component Analysis


Independent Component Analysis for Blind Source Separation

Tatsuya Yokota

Tokyo Institute of Technology

Jan. 31, 2012


Page 2: Independent Component Analysis

Outline

1. Blind Source Separation

2. Independent Component Analysis

3. Experiments

4. Summary


Page 3: Independent Component Analysis

What Is Blind Source Separation?

Blind Source Separation (BSS) is a method for estimating the original signals from observed signals, which consist of mixed original signals and noise.


Page 4: Independent Component Analysis

Example of BSS

BSS is often used for speech analysis and image analysis.


Page 5: Independent Component Analysis

Example of BSS (cont’d)

BSS is also very important for brain signal analysis.


Page 6: Independent Component Analysis

Model Formalization

The BSS problem is formalized as follows. The matrix

X ∈ R^{m×d} (1)

denotes the original signals, where m is the number of original signals and d is the dimension of each signal. We assume the observed signals Y ∈ R^{n×d} are given by the linear mixing system

Y = AX + E, (2)

where A ∈ R^{n×m} is the unknown mixing matrix and E ∈ R^{n×d} denotes noise. Typically, n ≥ m. The goal of BSS is to estimate A and X so that X recovers the unknown original signals as closely as possible.
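As a concrete illustration, the mixing model Y = AX + E can be simulated in a few lines of NumPy (a hypothetical toy setup; the sizes m, n, d and the noise scale are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, d = 2, 3, 500                    # 2 sources, 3 sensors, 500 samples each
X = rng.uniform(-1, 1, size=(m, d))    # original signals (one per row)
A = rng.normal(size=(n, m))            # unknown mixing matrix
E = 0.01 * rng.normal(size=(n, d))     # small additive noise

Y = A @ X + E                          # observed signals, shape (n, d)
```

BSS then asks for A and X given only Y.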


Page 7: Independent Component Analysis

Kinds of BSS Methods

Estimating A and X directly is ill-posed because the BSS model has many degrees of freedom: a huge number of pairs (A, X) satisfy Y = AX + E. Therefore, we need additional constraints to solve the BSS problem, such as:

PCA : orthogonal constraint

SCA : sparsity constraint

NMF : non-negativity constraint

ICA : independence constraint

Thus there are many methods for solving the BSS problem, depending on the constraint; which one to use depends on the subject matter. Non-negative Matrix Factorization (NMF) was introduced in my previous seminar; its solution can be obtained by the alternating least squares algorithm. Today, I will introduce another method, Independent Component Analysis.


Page 8: Independent Component Analysis

Independent Component Analysis

The Cocktail Party Problem

x1(t) = a11s1(t) + a12s2(t) + a13s3(t) (3)

x2(t) = a21s1(t) + a22s2(t) + a23s3(t) (4)

x3(t) = a31s1(t) + a32s2(t) + a33s3(t) (5)

x is an observed signal, and s is an original signal. We assume that {s1, s2, s3} are statistically independent of each other.

The Model of ICA

Independent Component Analysis (ICA) estimates the independent components s(t) from x(t).

x(t) = As(t) (6)


Page 9: Independent Component Analysis

Approach

Hypotheses of ICA

1. {s_i} are statistically independent of each other:

p(s1, s2, . . . , sn) = p(s1)p(s2) · · · p(sn). (7)

2. {s_i} follow non-Gaussian distributions. If the {s_i} are Gaussian, then ICA is impossible.

3. A is a regular (invertible) matrix. Therefore, we can rewrite the model as

s(t) = Bx(t), (8)

where B = A^{-1}. It is only necessary to estimate B so that the {s_i} are independent.


Page 10: Independent Component Analysis

Whitening and ICA

Definition of a White Signal

A white signal is any z satisfying

E[z] = 0, E[zz^T] = I. (9)

First, we show an example of original independent signals and the observed signals:

(a) source (s1, s2) (b) observed (x1, x2)

The observed signals x(t) are given by x(t) = As(t). ICA recovers the original signals via s(t) = Bx(t).


Page 11: Independent Component Analysis

Whitening and ICA (cont’d)

Whitening is a useful preprocessing step for ICA. First, we apply whitening to the observed signals x(t).

(c) observed (x1, x2) (d) whitening (z1, z2)

The whitened signals are denoted (z1, z2) and are given by

z(t) = V x(t), (10)

where V is a whitening matrix for x. The model becomes

s(t) = Uz(t) = UV x(t) = Bx(t), (11)

where U is an orthogonal transform matrix. Whitening thus simplifies the ICA problem: it is only necessary to estimate the orthogonal matrix U.
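As a sketch of how a whitening matrix can be computed (one common choice, V = E D^{-1/2} E^T from the eigendecomposition of the sample covariance; the slides do not fix a particular V, and the data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical observed signals: 2 channels with different scales and a bias
x = rng.normal(size=(2, 1000)) * np.array([[3.0], [0.5]]) + 1.0

xc = x - x.mean(axis=1, keepdims=True)      # center: E[z] = 0
C = xc @ xc.T / xc.shape[1]                 # sample covariance E[xx^T]
d_, Evec = np.linalg.eigh(C)                # C = Evec diag(d_) Evec^T
V = Evec @ np.diag(d_ ** -0.5) @ Evec.T     # whitening matrix
z = V @ xc                                  # whitened signals: E[zz^T] = I
```

After this step, any remaining demixing transform U is orthogonal.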


Page 12: Independent Component Analysis

Non-Gaussianity and ICA

Non-Gaussianity is a measure of independence. By the central limit theorem, the mixture x(t) must be more Gaussian than the sources s(t). Now let b_i^T be a separating vector, s_i(t) = b_i^T x(t). We want to maximize the non-Gaussianity of b_i^T x(t); such a b is one row of the solution B. For example, given the two vectors b′ and b in the figure, we can say that b is better than b′.


Page 13: Independent Component Analysis

Maximization of Kurtosis

Kurtosis is a measure of non-Gaussianity, defined by

kurt(y) = E[y^4] − 3(E[y^2])^2. (12)

If we assume that y is white (i.e., E[y] = 0, E[y^2] = 1), then

kurt(y) = E[y^4] − 3. (13)

We can solve the ICA problem by

b = argmax_b |kurt(b^T x(t))|. (14)
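A minimal numerical check of Eq. (12) on hypothetical samples: kurtosis is near zero for Gaussian data, negative for sub-Gaussian (uniform) data, and positive for super-Gaussian (Laplace) data.

```python
import numpy as np

def kurt(y):
    """Sample version of kurt(y) = E[y^4] - 3 (E[y^2])^2 from Eq. (12)."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(2)
g = rng.normal(size=200_000)        # Gaussian: kurtosis close to 0
u = rng.uniform(-1, 1, 200_000)     # uniform (sub-Gaussian): negative kurtosis
l = rng.laplace(size=200_000)       # Laplace (super-Gaussian): positive kurtosis
```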

Figure: Kurtosis


Page 14: Independent Component Analysis

Fast ICA algorithm based on Kurtosis

Let z be the white signal obtained from x. We consider maximizing the absolute value of the kurtosis:

maximize |kurt(w^T z)|, s.t. w^T w = 1. (15)

The gradient of |kurt(w^T z)| is given by

∂|kurt(w^T z)| / ∂w = ∂/∂w |E{(w^T z)^4} − 3(E{(w^T z)^2})^2| (16)

= ∂/∂w |E{(w^T z)^4} − 3(||w||^2)^2| (because E[zz^T] = I) (17)

= 4 sign[kurt(w^T z)] [E{z(w^T z)^3} − 3w||w||^2]. (18)


Page 15: Independent Component Analysis

Fast ICA algorithm based on Kurtosis (cont’d)

Using the gradient method, we obtain the following algorithm.

Gradient algorithm based on Kurtosis

w ← w + ∆w, (19)

w ← w / ||w||, (20)

∆w ∝ sign[kurt(w^T z)] [E{z(w^T z)^3} − 3w]. (21)

The algorithm above converges when ∆w ∝ w. Since w and −w are equivalent solutions, we can derive a fixed-point algorithm.

Fast ICA algorithm based on Kurtosis

w ← E{z(w^T z)^3} − 3w, (22)

w ← w / ||w||. (23)

It is well known as a fast-converging algorithm for ICA!
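The fixed-point updates (22)-(23) can be sketched for a single component on a whitened toy mixture (the two uniform sources and the mixing matrix here are hypothetical; sign flips between w and −w across iterations are harmless since the two are equivalent solutions):

```python
import numpy as np

rng = np.random.default_rng(3)

# two independent unit-variance uniform sources (sub-Gaussian), zero mean
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                                   # observed mixture

# center and whiten
x = x - x.mean(axis=1, keepdims=True)
C = x @ x.T / x.shape[1]
d_, Evec = np.linalg.eigh(C)
z = (Evec @ np.diag(d_ ** -0.5) @ Evec.T) @ x

# fixed-point iteration: w <- E[z (w^T z)^3] - 3w, then normalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = np.mean(z * (w @ z) ** 3, axis=1) - 3 * w
    w /= np.linalg.norm(w)

y = w @ z                                   # one recovered component
```

The recovered y should be strongly correlated with one of the sources (up to sign and permutation, the usual ICA ambiguities).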

Page 16: Independent Component Analysis

Example

(a) subgaussian (b) supergaussian

Figure: Example of ICA


Page 17: Independent Component Analysis

Issue of Kurtosis

Kurtosis has a fatal issue: it is very sensitive to outliers, because it is a fourth-order statistic. The following figure depicts the result of kurtosis-based ICA with outliers; the outlier rate is only 2%.

Figure: With outliers (20 : 1000)


Page 18: Independent Component Analysis

Neg-entropy based ICA

Since kurtosis is so sensitive to outliers, the neg-entropy is often used for ICA instead. Strictly speaking, an approximation of the neg-entropy is used, because it is robust to outliers. Neg-entropy is defined by

J(y) = H(y_Gauss) − H(y), (24)

where

H(y) = −∫ p_y(η) log p_y(η) dη, (25)

and y_Gauss is a Gaussian random variable with the same mean µ = E(y) and standard deviation σ = √(E[(y − µ)^2]) as y.

If y follows a Gaussian distribution, then J(y) = 0.


Page 19: Independent Component Analysis

Fast ICA algorithm based on Neg-entropy

The approximation procedure for the neg-entropy is involved, so it is omitted here. We just introduce the fast ICA algorithm based on neg-entropy:

Fast ICA algorithm based on Neg-entropy

w ← E[z g(w^T z)] − E[g′(w^T z)] w, (26)

w ← w / ||w||, (27)

where we can select the functions g and g′ from:

1. g1(y) = tanh(a1 y) and g′1(y) = a1(1 − tanh^2(a1 y)), with 1 ≤ a1 ≤ 2,

2. g2(y) = y exp(−y^2/2) and g′2(y) = (1 − y^2) exp(−y^2/2),

3. g3(y) = y^3 and g′3(y) = 3y^2.

Note that (g3, g′3) is equivalent to kurtosis-based ICA.
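A sketch of the neg-entropy fixed-point update (26)-(27) with g1 = tanh, on a hypothetical super-Gaussian (Laplace) toy mixture; the sources and mixing matrix are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# two independent super-Gaussian (Laplace) sources, unit variance
s = rng.laplace(size=(2, 5000))
s /= s.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s

# center and whiten
x = x - x.mean(axis=1, keepdims=True)
C = x @ x.T / x.shape[1]
d_, Evec = np.linalg.eigh(C)
z = (Evec @ np.diag(d_ ** -0.5) @ Evec.T) @ x

a1 = 1.0                                        # 1 <= a1 <= 2
g = lambda y: np.tanh(a1 * y)                   # g1
dg = lambda y: a1 * (1 - np.tanh(a1 * y) ** 2)  # g1'

# fixed-point iteration: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then normalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(200):
    wz = w @ z
    w = np.mean(z * g(wz), axis=1) - np.mean(dg(wz)) * w
    w /= np.linalg.norm(w)

y = w @ z                                       # one recovered component
```

Swapping in (g3, g′3) here reproduces the kurtosis-based update from the previous slides.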


Page 20: Independent Component Analysis

Examples

We can see that neg-entropy based ICA is robust to outliers.

(a) Kurtosis based (b) Neg-entropy based (using g1)

Figure: With outliers (20 : 1000)


Page 21: Independent Component Analysis

Experiments: Real Image 1

(a) newyork

(b) shanghai

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 22: Independent Component Analysis

Experiments: Real Image 2

(a) buta

(b) kobe

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 23: Independent Component Analysis

Experiments: Real Image 2 (using filtering)

(a) buta

(b) kobe

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 24: Independent Component Analysis

Experiments: Real Image 3 (using filtering)

(a) nyc (b) sha

(c) rock (d) pig

(e) obs1 (f) obs2

(g) obs3 (h) obs4

Figure: Ori. & Obs.

(a) estimated signal 1 (b) estimated signal 2

(c) estimated signal 3 (d) estimated signal 4

Figure: Estimated Signals


Page 25: Independent Component Analysis

Approaches of ICA

Many methods for ICA have been studied and proposed in this research area, for example:

1. Criteria for ICA [Hyvarinen et al., 2001]

Non-Gaussianity based ICA*: Kurtosis based ICA*, Neg-entropy based ICA*
MLE based ICA
Mutual information based ICA
Non-linear ICA
Tensor ICA

2. Solving algorithms for ICA

gradient method*
fast fixed-point algorithm* [Hyvarinen and Oja, 1997]

('*' marks the ones introduced today.)


Page 26: Independent Component Analysis

Summary

I introduced the BSS problem and basic ICA techniques (kurtosis, neg-entropy).

Kurtosis is sensitive to outliers.

Neg-entropy was proposed as a robust measure of non-Gaussianity.

I conducted ICA experiments using image data.

In some cases, poor results were obtained, but this issue was solved by using a differential filter, a technique proposed in [Hyvarinen, 1998].

We saw that the differential filter is very effective for ICA.


Page 27: Independent Component Analysis

Bibliography I

[Hyvarinen, 1998] Hyvarinen, A. (1998). Independent component analysis for time-dependent stochastic processes.

[Hyvarinen et al., 2001] Hyvarinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. Wiley.

[Hyvarinen and Oja, 1997] Hyvarinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492.


Page 28: Independent Component Analysis

Thank you for listening
