Independent Component Analysis

Independent Component Analysis for Blind Source Separation. Tatsuya Yokota, Tokyo Institute of Technology. Jan. 31, 2012. (28 slides)

Description

These are slides from a seminar at my laboratory.

Transcript of Independent Component Analysis

Page 1: Independent Component Analysis


Independent Component Analysis for Blind Source Separation

Tatsuya Yokota

Tokyo Institute of Technology

Jan. 31, 2012


Page 2: Independent Component Analysis

Outline

1. Blind Source Separation

2. Independent Component Analysis

3. Experiments

4. Summary


Page 3: Independent Component Analysis

What Is Blind Source Separation?

Blind Source Separation (BSS) is a method for estimating the original signals from observed signals, which consist of mixed original signals and noise.


Page 4: Independent Component Analysis

Example of BSS

BSS is often used for speech analysis and image analysis.


Page 5: Independent Component Analysis

Example of BSS (cont’d)

BSS is also very important for brain signal analysis.


Page 6: Independent Component Analysis

Model Formalization

The BSS problem is formalized as follows. The matrix

X ∈ R^{m×d} (1)

denotes the original signals, where m is the number of original signals and d is the dimension of each signal. We assume the observed signals Y ∈ R^{n×d} are given by the linear mixing system

Y = AX + E, (2)

where A ∈ R^{n×m} is the unknown mixing matrix and E ∈ R^{n×d} denotes noise. Typically, n ≥ m. The goal of BSS is to estimate A and X so that X recovers the unknown original signals as closely as possible.
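As a concrete illustration, the mixing model Y = AX + E can be simulated in a few lines of NumPy (a hypothetical toy setup; the sizes m, n, d and the noise scale are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, d = 2, 3, 500                    # 2 sources, 3 sensors, 500 samples each
X = rng.uniform(-1, 1, size=(m, d))    # original signals (one per row)
A = rng.normal(size=(n, m))            # unknown mixing matrix
E = 0.01 * rng.normal(size=(n, d))     # small additive noise

Y = A @ X + E                          # observed signals, shape (n, d)
```

BSS then asks for A and X given only Y.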


Page 7: Independent Component Analysis

Kinds of BSS Methods

Estimating A and X directly is ill-posed because the BSS model has many degrees of freedom: a huge number of pairs (A, X) satisfy Y = AX + E. Therefore, we need additional constraints to solve the BSS problem, such as:

PCA : orthogonal constraint

SCA : sparsity constraint

NMF : non-negativity constraint

ICA : independence constraint

Thus there are many methods for solving the BSS problem, depending on the constraint; which one to use depends on the subject matter. Non-negative Matrix Factorization (NMF) was introduced in my previous seminar; its solution can be obtained by the alternating least squares algorithm. Today, I will introduce another method, Independent Component Analysis.


Page 8: Independent Component Analysis

Independent Component Analysis

The Cocktail Party Problem

x1(t) = a11s1(t) + a12s2(t) + a13s3(t) (3)

x2(t) = a21s1(t) + a22s2(t) + a23s3(t) (4)

x3(t) = a31s1(t) + a32s2(t) + a33s3(t) (5)

x is an observed signal, and s is an original signal. We assume that {s1, s2, s3} are statistically independent of each other.

The Model of ICA

Independent Component Analysis (ICA) estimates the independent components s(t) from x(t).

x(t) = As(t) (6)


Page 9: Independent Component Analysis

Approach

Hypotheses of ICA

1. {s_i} are statistically independent of each other:

p(s1, s2, . . . , sn) = p(s1)p(s2) · · · p(sn). (7)

2. {s_i} follow non-Gaussian distributions. If the {s_i} are Gaussian, then ICA is impossible.

3. A is a regular (invertible) matrix. Therefore, we can rewrite the model as

s(t) = Bx(t), (8)

where B = A^{-1}. It is only necessary to estimate B so that the {s_i} are independent.


Page 10: Independent Component Analysis

Whitening and ICA

Definition of a White Signal

A white signal is any z satisfying

E[z] = 0, E[zz^T] = I. (9)

First, we show an example of original independent signals and the observed signals:

(a) source (s1, s2) (b) observed (x1, x2)

The observed signals x(t) are given by x(t) = As(t). ICA recovers the original signals via s(t) = Bx(t).


Page 11: Independent Component Analysis

Whitening and ICA (cont’d)

Whitening is a useful preprocessing step for ICA. First, we apply whitening to the observed signals x(t).

(c) observed (x1, x2) (d) whitening (z1, z2)

The whitened signals are denoted (z1, z2) and are given by

z(t) = V x(t), (10)

where V is a whitening matrix for x. The model becomes

s(t) = Uz(t) = UV x(t) = Bx(t), (11)

where U is an orthogonal transform matrix. Whitening thus simplifies the ICA problem: it is only necessary to estimate the orthogonal matrix U.
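As a sketch of how a whitening matrix can be computed (one common choice, V = E D^{-1/2} E^T from the eigendecomposition of the sample covariance; the slides do not fix a particular V, and the data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical observed signals: 2 channels with different scales and a bias
x = rng.normal(size=(2, 1000)) * np.array([[3.0], [0.5]]) + 1.0

xc = x - x.mean(axis=1, keepdims=True)      # center: E[z] = 0
C = xc @ xc.T / xc.shape[1]                 # sample covariance E[xx^T]
d_, Evec = np.linalg.eigh(C)                # C = Evec diag(d_) Evec^T
V = Evec @ np.diag(d_ ** -0.5) @ Evec.T     # whitening matrix
z = V @ xc                                  # whitened signals: E[zz^T] = I
```

After this step, any remaining demixing transform U is orthogonal.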


Page 12: Independent Component Analysis

Non-Gaussianity and ICA

Non-Gaussianity is a measure of independence. By the central limit theorem, the mixture x(t) must be more Gaussian than the sources s(t). Now let b_i^T be a separating vector, s_i(t) = b_i^T x(t). We want to maximize the non-Gaussianity of b_i^T x(t); such a b is one row of the solution B. For example, given the two vectors b′ and b in the figure, we can say that b is better than b′.


Page 13: Independent Component Analysis

Maximization of Kurtosis

Kurtosis is a measure of non-Gaussianity, defined by

kurt(y) = E[y^4] − 3(E[y^2])^2. (12)

If we assume that y is white (i.e., E[y] = 0, E[y^2] = 1), then

kurt(y) = E[y^4] − 3. (13)

We can solve the ICA problem by

b = argmax_b |kurt(b^T x(t))|. (14)
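A minimal numerical check of Eq. (12) on hypothetical samples: kurtosis is near zero for Gaussian data, negative for sub-Gaussian (uniform) data, and positive for super-Gaussian (Laplace) data.

```python
import numpy as np

def kurt(y):
    """Sample version of kurt(y) = E[y^4] - 3 (E[y^2])^2 from Eq. (12)."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(2)
g = rng.normal(size=200_000)        # Gaussian: kurtosis close to 0
u = rng.uniform(-1, 1, 200_000)     # uniform (sub-Gaussian): negative kurtosis
l = rng.laplace(size=200_000)       # Laplace (super-Gaussian): positive kurtosis
```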

Figure: Kurtosis


Page 14: Independent Component Analysis

Fast ICA algorithm based on Kurtosis

Let z be the white signal obtained from x. We consider maximizing the absolute value of the kurtosis:

maximize |kurt(w^T z)|, s.t. w^T w = 1. (15)

The gradient of |kurt(w^T z)| is given by

∂|kurt(w^T z)| / ∂w = ∂/∂w |E{(w^T z)^4} − 3(E{(w^T z)^2})^2| (16)

= ∂/∂w |E{(w^T z)^4} − 3(||w||^2)^2| (because E[zz^T] = I) (17)

= 4 sign[kurt(w^T z)] [E{z(w^T z)^3} − 3w||w||^2]. (18)


Page 15: Independent Component Analysis

Fast ICA algorithm based on Kurtosis (cont’d)

Using the gradient method, we obtain the following algorithm.

Gradient algorithm based on Kurtosis

w ← w + ∆w, (19)

w ← w / ||w||, (20)

∆w ∝ sign[kurt(w^T z)] [E{z(w^T z)^3} − 3w]. (21)

The algorithm above converges when ∆w ∝ w. Since w and −w are equivalent solutions, we can derive a fixed-point algorithm.

Fast ICA algorithm based on Kurtosis

w ← E{z(w^T z)^3} − 3w, (22)

w ← w / ||w||. (23)

It is well known as a fast-converging algorithm for ICA!
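The fixed-point updates (22)-(23) can be sketched for a single component on a whitened toy mixture (the two uniform sources and the mixing matrix here are hypothetical; sign flips between w and −w across iterations are harmless since the two are equivalent solutions):

```python
import numpy as np

rng = np.random.default_rng(3)

# two independent unit-variance uniform sources (sub-Gaussian), zero mean
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                                   # observed mixture

# center and whiten
x = x - x.mean(axis=1, keepdims=True)
C = x @ x.T / x.shape[1]
d_, Evec = np.linalg.eigh(C)
z = (Evec @ np.diag(d_ ** -0.5) @ Evec.T) @ x

# fixed-point iteration: w <- E[z (w^T z)^3] - 3w, then normalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = np.mean(z * (w @ z) ** 3, axis=1) - 3 * w
    w /= np.linalg.norm(w)

y = w @ z                                   # one recovered component
```

The recovered y should be strongly correlated with one of the sources (up to sign and permutation, the usual ICA ambiguities).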

Page 16: Independent Component Analysis

Example

(a) subgaussian (b) supergaussian

Figure: Example of ICA


Page 17: Independent Component Analysis

Issue of Kurtosis

Kurtosis has a fatal issue: it is very sensitive to outliers, because it is a fourth-order statistic. The following figure depicts the result of kurtosis-based ICA with outliers; the outlier rate is only 2%.

Figure: With outliers (20 : 1000)


Page 18: Independent Component Analysis

Neg-entropy based ICA

Since kurtosis is so sensitive to outliers, the neg-entropy is often used for ICA instead. Strictly speaking, an approximation of the neg-entropy is used, because it is robust to outliers. Neg-entropy is defined by

J(y) = H(y_Gauss) − H(y), (24)

where

H(y) = −∫ p_y(η) log p_y(η) dη, (25)

and y_Gauss is a Gaussian random variable with the same mean µ = E(y) and standard deviation σ = √(E[(y − µ)^2]) as y.

If y follows a Gaussian distribution, then J(y) = 0.


Page 19: Independent Component Analysis

Fast ICA algorithm based on Neg-entropy

The approximation procedure for the neg-entropy is involved, so it is omitted here. We just introduce the fast ICA algorithm based on neg-entropy:

Fast ICA algorithm based on Neg-entropy

w ← E[z g(w^T z)] − E[g′(w^T z)] w, (26)

w ← w / ||w||, (27)

where we can select the functions g and g′ from:

1. g1(y) = tanh(a1 y) and g′1(y) = a1(1 − tanh^2(a1 y)), with 1 ≤ a1 ≤ 2,

2. g2(y) = y exp(−y^2/2) and g′2(y) = (1 − y^2) exp(−y^2/2),

3. g3(y) = y^3 and g′3(y) = 3y^2.

Note that (g3, g′3) is equivalent to kurtosis-based ICA.
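A sketch of the neg-entropy fixed-point update (26)-(27) with g1 = tanh, on a hypothetical super-Gaussian (Laplace) toy mixture; the sources and mixing matrix are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# two independent super-Gaussian (Laplace) sources, unit variance
s = rng.laplace(size=(2, 5000))
s /= s.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s

# center and whiten
x = x - x.mean(axis=1, keepdims=True)
C = x @ x.T / x.shape[1]
d_, Evec = np.linalg.eigh(C)
z = (Evec @ np.diag(d_ ** -0.5) @ Evec.T) @ x

a1 = 1.0                                        # 1 <= a1 <= 2
g = lambda y: np.tanh(a1 * y)                   # g1
dg = lambda y: a1 * (1 - np.tanh(a1 * y) ** 2)  # g1'

# fixed-point iteration: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then normalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(200):
    wz = w @ z
    w = np.mean(z * g(wz), axis=1) - np.mean(dg(wz)) * w
    w /= np.linalg.norm(w)

y = w @ z                                       # one recovered component
```

Swapping in (g3, g′3) here reproduces the kurtosis-based update from the previous slides.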


Page 20: Independent Component Analysis

Examples

We can see that neg-entropy based ICA is robust to outliers.

(a) Kurtosis based (b) Neg-entropy based (using g1)

Figure: With outliers (20 : 1000)


Page 21: Independent Component Analysis

Experiments: Real Image 1

(a) newyork

(b) shanghai

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 22: Independent Component Analysis

Experiments: Real Image 2

(a) buta

(b) kobe

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 23: Independent Component Analysis

Experiments: Real Image 2 (using filtering)

(a) buta

(b) kobe

Figure: Original Signals

(a) ob 1 (b) ob 2

Figure: Observed Signals

(a) estimated signal 1

(b) estimated signal 2

Figure: Estimated Signals


Page 24: Independent Component Analysis

Experiments: Real Image 3 (using filtering)

(a) nyc (b) sha

(c) rock (d) pig

(e) obs1 (f) obs2

(g) obs3 (h) obs4

Figure: Ori. & Obs.

(a) estimated signal 1 (b) estimated signal 2

(c) estimated signal 3 (d) estimated signal 4

Figure: Estimated Signals


Page 25: Independent Component Analysis

Approaches of ICA

Many methods for ICA have been studied and proposed in this research area, for example:

1. Criteria for ICA [Hyvarinen et al., 2001]

Non-Gaussianity based ICA*: Kurtosis based ICA*, Neg-entropy based ICA*
MLE based ICA
Mutual information based ICA
Non-linear ICA
Tensor ICA

2. Solving algorithms for ICA

gradient method*
fast fixed-point algorithm* [Hyvarinen and Oja, 1997]

('*' marks the ones introduced today.)


Page 26: Independent Component Analysis

Summary

I introduced the BSS problem and basic ICA techniques (kurtosis, neg-entropy).

Kurtosis is sensitive to outliers.

Neg-entropy was proposed as a robust measure of non-Gaussianity.

I conducted ICA experiments using image data.

In some cases, poor results were obtained, but this issue was solved by using a differential filter, a technique proposed in [Hyvarinen, 1998].

We saw that the differential filter is very effective for ICA.


Page 27: Independent Component Analysis

Bibliography I

[Hyvarinen, 1998] Hyvarinen, A. (1998). Independent component analysis for time-dependent stochastic processes.

[Hyvarinen et al., 2001] Hyvarinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. Wiley.

[Hyvarinen and Oja, 1997] Hyvarinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492.


Page 28: Independent Component Analysis

Thank you for listening
