Blind Source Separation by Independent Components Analysis


Page 1: Blind Source Separation by Independent Components Analysis

Blind Source Separation by Independent Components Analysis

Professor Dr. Barrie W. Jervis
School of Engineering, Sheffield Hallam University, England
[email protected]

Page 2: Blind Source Separation by Independent Components Analysis

The Problem

• Temporally independent unknown source signals are linearly mixed in an unknown system to produce a set of measured output signals.

• It is required to determine the source signals.

Page 3: Blind Source Separation by Independent Components Analysis

• Methods of solving this problem are known as Blind Source Separation (BSS) techniques.

• In this presentation the method of Independent Components Analysis (ICA) will be described.

• The arrangement is illustrated in the next slide.

Page 4: Blind Source Separation by Independent Components Analysis

Arrangement for BSS by ICA

[Block diagram: the source signals s1, s2, ..., sn enter the mixing matrix A, which produces the measured signals x1, x2, ..., xn; the unmixing matrix W yields the activations u1, u2, ..., un; the nonlinearities g(.) give the outputs y1 = g1(u1), y2 = g2(u2), ..., yn = gn(un).]

Page 5: Blind Source Separation by Independent Components Analysis

Neural Network Interpretation

• The si are the independent source signals,

• A is the linear mixing matrix,

• The xi are the measured signals,

• W ≈ A⁻¹ is the estimated unmixing matrix,

• The ui are the estimated source signals or activations, i.e. ui ≈ si,

• The gi(ui) are monotonic nonlinear functions (sigmoids, hyperbolic tangents),

• The yi are the network outputs.
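As a concrete illustration of this arrangement, here is a minimal numerical sketch (the sources, mixing matrix, and sizes are invented for the demonstration; in practice A is unknown and W must be learned):

```python
import numpy as np

# Two hypothetical independent sources: a sine wave and a square wave.
t = np.linspace(0, 1, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),             # s1
               np.sign(np.sin(2 * np.pi * 3 * t))])   # s2

A = np.array([[1.0, 0.6],    # invented "unknown" mixing matrix
              [0.4, 1.0]])
x = A @ s                    # the measured signals

# If W ~= A^-1 has been learned, the activations u ~= s:
W = np.linalg.inv(A)         # here we cheat and use the exact inverse
u = W @ x
print(np.allclose(u, s))     # True: perfect unmixing recovers the sources
```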

Page 6: Blind Source Separation by Independent Components Analysis

Principles of Neural Network Approach

• Use Information Theory to derive an algorithm which minimises the mutual information between the outputs y=g(u).

• This minimises the mutual information between the source signal estimates, u, since g(u) introduces no dependencies.

• The different ui are then temporally independent and are the estimated source signals.

Page 7: Blind Source Separation by Independent Components Analysis

Cautions I

• The magnitudes and signs of the estimated source signals are unreliable, since
– the magnitudes are not scaled,
– the signs are undefined,
because magnitude and sign information is shared between the source signal vector and the unmixing matrix, W.

• The order of the outputs is permuted compared with the inputs.

Page 8: Blind Source Separation by Independent Components Analysis

Cautions II

• Similar overlapping source signals may not be properly extracted.

• If the number of output channels is less than the number of source signals, those source signals of lowest variance will not be extracted. This is a problem when these signals are important.

Page 9: Blind Source Separation by Independent Components Analysis

Information Theory I

• If X is a vector of variables (messages) xi which occur with probabilities P(xi), then the average information content of a stream of N messages is

$$H(X) = -\sum_{i=1}^{N} P(x_i)\,\log_2 P(x_i) \;\text{bits}$$

and is known as the entropy of the random variable, X.
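A direct numerical check of this formula, as a sketch with invented message probabilities:

```python
import numpy as np

def entropy_bits(p):
    """Entropy H(X) = -sum_i P(x_i) log2 P(x_i), skipping zero-probability terms."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.5, 0.5]))    # 1.0 bit: a fair coin
print(entropy_bits([0.9, 0.1]))    # ~0.469 bits: a biased coin carries less information
print(entropy_bits([0.25] * 4))    # 2.0 bits: four equally likely messages
```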

Page 10: Blind Source Separation by Independent Components Analysis

Information Theory II

• Note that the entropy is expressible in terms of probability.

• Given the probability density function (pdf) of X we can find the associated entropy.

• This link between entropy and pdf is of the greatest importance in ICA theory.

Page 11: Blind Source Separation by Independent Components Analysis

Information Theory III

• The joint entropy between two random variables X and Y is given by

$$H(X,Y) = -\sum_{x,y} P(x,y)\,\log_2 P(x,y)$$

• For independent variables,

$$H(X,Y) = H(X) + H(Y) \iff P(x,y) = P(x)\,P(y)$$

Page 12: Blind Source Separation by Independent Components Analysis

Information Theory IV

• The conditional entropy of Y given X measures the average uncertainty remaining about y when x is known, and is

$$H(Y|X) = -\sum_{x,y} P(x,y)\,\log_2 P(y|x)$$

• The mutual information between Y and X is

$$I(Y,X) = H(Y) + H(X) - H(Y,X) = H(Y) - H(Y|X)$$

• In ICA, X represents the measured signals, which are applied to the nonlinear function g(u) to obtain the outputs Y.
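These identities are easy to verify numerically. A sketch with an invented 2×2 joint distribution P(x, y):

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability table (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Invented joint distribution over two binary variables.
Pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)   # marginals

H_XY = H(Pxy)                      # joint entropy H(X,Y)
H_Y_given_X = H_XY - H(Px)         # chain rule: H(Y|X) = H(X,Y) - H(X)
I_XY = H(Px) + H(Py) - H_XY        # mutual information I(Y,X)
print(I_XY, H(Py) - H_Y_given_X)   # the two expressions for I(Y,X) agree
```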

Page 13: Blind Source Separation by Independent Components Analysis

Bell and Sejnowski’s ICA Theory (1995)

• Aim to maximise the amount of mutual information between the inputs X and the outputs Y of the neural network:

$$I(Y,X) = H(Y) - H(Y|X)$$

where H(Y|X) is the uncertainty about Y remaining when X is known.

• Y is a function of W and g(u).

• Here we seek to determine the W which produces the ui ≈ si, assuming the correct g(u).

Page 14: Blind Source Separation by Independent Components Analysis

Differentiating:

$$\frac{\partial}{\partial W} I(Y,X) = \frac{\partial}{\partial W} H(Y) - \frac{\partial}{\partial W} H(Y|X) = \frac{\partial}{\partial W} H(Y)$$

(The H(Y|X) term is 0, since it did not come through W from X.)

So, maximising this mutual information is equivalent to maximising the joint output entropy,

$$H(y_1, \ldots, y_N) = H(y_1) + \cdots + H(y_N) - I(y_1, \ldots, y_N)$$

which is seen to be equivalent to minimising the mutual information between the outputs, and hence between the ui, as desired.

Page 15: Blind Source Separation by Independent Components Analysis

The Functions g(u)

• The outputs yi are amplitude-bounded random variables, and so the marginal entropies H(yi) are maximum when the yi are uniformly distributed - a known statistical result.

• With the H(yi) maximised and the yi uniformly distributed, the mutual information between the outputs, I(y1, ..., yN), is zero, and the nonlinearity gi(ui) has the form of the cumulative distribution function of the probability density function of the si - a proven result.
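One way to see this concretely (a sketch, not from the original slides): the logistic sigmoid used below, y = 1/(1 + e^{-u}), is exactly the cdf of the logistic density, and that density is super-Gaussian (positive excess kurtosis), which is why a sigmoidal g(u) suits super-Gaussian sources:

```python
import numpy as np
from scipy.stats import logistic, kurtosis

u = np.linspace(-6, 6, 1001)
sigmoid = 1.0 / (1.0 + np.exp(-u))

# The logistic sigmoid is the cdf of the standard logistic distribution...
print(np.allclose(sigmoid, logistic.cdf(u)))   # True

# ...whose pdf has positive excess kurtosis, i.e. it is super-Gaussian.
samples = logistic.rvs(size=200_000, random_state=0)
print(kurtosis(samples))                       # ~1.2 > 0
```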

Page 16: Blind Source Separation by Independent Components Analysis

Pause and review g(u) and W

• W has to be chosen to maximise the joint output entropy H(Y), which minimises the mutual information between the estimated source signals, ui.

• The g(u) should be the cumulative distribution functions of the source signals, si.

• Determining the g(u) is a major problem.

Page 17: Blind Source Separation by Independent Components Analysis

One input and one output

• For a monotonic nonlinear function, g(x),

$$f_y(y) = \frac{f_x(x)}{\left|\,\partial y/\partial x\,\right|}$$

• Also

$$H(y) = -E\left[\ln f_y(y)\right] = -\int f_y(y)\,\ln f_y(y)\,dy$$

• Substituting:

$$H(y) = E\left[\ln\left|\frac{\partial y}{\partial x}\right|\right] - E\left[\ln f_x(x)\right]$$

(we only need to maximise the first term; the second is independent of W)

Page 18: Blind Source Separation by Independent Components Analysis

• A stochastic gradient ascent learning rule is adopted to maximise H(y) by assuming

$$\Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w}\ln\left|\frac{\partial y}{\partial x}\right| = \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right)$$

• Further progress requires knowledge of g(u). Assume for now, after Bell and Sejnowski, that g(u) is sigmoidal, i.e.

$$y = \frac{1}{1 + e^{-u}}$$

• Also assume

$$u = wx + w_0$$

Page 19: Blind Source Separation by Independent Components Analysis

Learning Rule: 1 input, 1 output

For the sigmoid, $\partial y/\partial x = wy(1 - y)$, and differentiating $\ln\left|\partial y/\partial x\right|$ with respect to $w$ and $w_0$, we find:

$$\Delta w \propto \frac{1}{w} + x(1 - 2y), \qquad \Delta w_0 \propto 1 - 2y$$
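A minimal sketch of this scalar rule in code (the input distribution, initial weights, and learning rate are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.logistic(size=10_000)      # hypothetical super-Gaussian input samples

w, w0, eta = 0.1, 0.0, 0.01        # invented initial weights and learning rate
for xi in x:
    y = 1.0 / (1.0 + np.exp(-(w * xi + w0)))        # sigmoidal output
    w  += eta * (1.0 / w + xi * (1.0 - 2.0 * y))    # delta-w rule from this slide
    w0 += eta * (1.0 - 2.0 * y)                     # bias rule
print(w, w0)   # w grows until the outputs y spread toward a uniform distribution
```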

Page 20: Blind Source Separation by Independent Components Analysis

Learning Rule: N inputs, N outputs

• Need

$$f_{\mathbf{y}}(\mathbf{y}) = \frac{f_{\mathbf{x}}(\mathbf{x})}{\left|\det J\right|}$$

where J is the Jacobian matrix,

$$J = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_N}{\partial x_1} & \cdots & \dfrac{\partial y_N}{\partial x_N} \end{bmatrix}$$

• Assuming g(u) is sigmoidal again, we obtain:

Page 21: Blind Source Separation by Independent Components Analysis

$$\Delta W \propto \left[W^T\right]^{-1} + (\mathbf{1} - 2\mathbf{y})\,\mathbf{x}^T, \qquad \Delta \mathbf{w}_0 \propto \mathbf{1} - 2\mathbf{y}$$

where $\mathbf{1}$ is the unit vector.

• The network is trained until the changes in the weights become acceptably small at each iteration.

• Thus the unmixing matrix W is found.

Page 22: Blind Source Separation by Independent Components Analysis

The Natural Gradient

• The computation of the inverse matrix $\left[W^T\right]^{-1}$ is time-consuming, and may be avoided by rescaling the entropy gradient, multiplying it by $W^T W$.

• Thus, for a sigmoidal g(u) we obtain

$$\Delta W \propto \frac{\partial H(\mathbf{y})}{\partial W}\,W^T W = \left[I + (\mathbf{1} - 2\mathbf{y})\,\mathbf{u}^T\right] W$$

• This is the natural gradient, introduced by Amari (1998), and now widely adopted.
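A minimal batch implementation of this natural-gradient update (a sketch under the sigmoidal assumption; the step size, iteration count, and initialisation are invented):

```python
import numpy as np

def infomax_ica(x, eta=0.01, iters=2000, seed=0):
    """Natural-gradient infomax sketch for super-Gaussian sources.
    x: (M, N) array of M zero-mean measured channels.
    Batch update: W <- W + eta * [I + (1 - 2y) u^T / N] W, with y = sigmoid(u).
    """
    M, N = x.shape
    rng = np.random.default_rng(seed)
    W = np.eye(M) + 0.01 * rng.standard_normal((M, M))  # near-identity start
    for _ in range(iters):
        u = W @ x                                 # current activation estimates
        y = 1.0 / (1.0 + np.exp(-u))              # sigmoidal outputs
        W += eta * (np.eye(M) + (1.0 - 2.0 * y) @ u.T / N) @ W
    return W

# Usage: W = infomax_ica(x); u = W @ x
# Rows of u estimate the sources, up to scale, sign, and permutation.
```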

Page 23: Blind Source Separation by Independent Components Analysis

The nonlinearity, g(u)

• We have already learnt that the g(u) should be the cumulative distribution functions of the individual source distributions.

• So far the g(u) have been assumed to be sigmoidal, so what are the pdfs of the si?

• The corresponding pdfs of the si are super-Gaussian.

Page 24: Blind Source Separation by Independent Components Analysis

Super- and sub-Gaussian pdfs

[Figure: sketches of a Gaussian, a super-Gaussian, and a sub-Gaussian pdf, P(si) plotted against si.]

* Note: there are no mathematical definitions of super- and sub-Gaussians.

Page 25: Blind Source Separation by Independent Components Analysis

Super- and sub-Gaussians

Super-Gaussians:
• kurtosis (fourth-order central moment, measures the flatness of the pdf) > 0,
• infrequent signals of short duration, e.g. evoked brain signals.

Sub-Gaussians:
• kurtosis < 0,
• signals mainly “on”, e.g. 50/60 Hz electrical mains supply, but also eye blinks.

Page 26: Blind Source Separation by Independent Components Analysis

Kurtosis

• Kurtosis = normalised 4th-order central moment:

$$\text{kurt}(u_i) = \frac{m_4}{m_2^2} - 3 = \frac{E\left[u_i^4\right]}{\left(E\left[u_i^2\right]\right)^2} - 3$$

and is seen to be calculated from the current estimates of the source signals.

• To separate the independent sources, information about their pdfs such as skewness (3rd. moment) and flatness (kurtosis) is required.

• First and 2nd. moments (mean and variance) are insufficient.
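A sketch of this kurtosis estimate applied to sample activations (the test distributions are invented; they land on the three classes of the previous slides):

```python
import numpy as np

def kurt(u):
    """Excess kurtosis m4/m2^2 - 3 of a (zero-mean) activation estimate."""
    u = u - u.mean()
    m2 = np.mean(u ** 2)
    m4 = np.mean(u ** 4)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
print(kurt(rng.standard_normal(100_000)))   # ~0   : Gaussian
print(kurt(rng.laplace(size=100_000)))      # ~+3  : super-Gaussian (peaky)
print(kurt(rng.uniform(-1, 1, 100_000)))    # ~-1.2: sub-Gaussian (flat)
```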

Page 27: Blind Source Separation by Independent Components Analysis

A more generalised learning rule

• Girolami (1997) showed that tanh(ui) and -tanh(ui) could be used for super- and sub-Gaussians respectively.

• Cardoso and Laheld (1996) developed a stability analysis to determine whether the source signals were to be considered super- or sub-Gaussian.

• Lee, Girolami, and Sejnowski (1998) applied these findings to develop their extended infomax algorithm for super- and sub-Gaussians using a kurtosis-based switching rule.

Page 28: Blind Source Separation by Independent Components Analysis

Extended Infomax Learning Rule

• With super-Gaussians modelled as

$$p(u) \propto N(0,1)\,\mathrm{sech}^2(u)$$

and sub-Gaussians as a Pearson mixture model

$$p(u) = \tfrac{1}{2}\left[N(\mu, \sigma^2) + N(-\mu, \sigma^2)\right]$$

the new extended learning rule is

$$\Delta W \propto \left[I - K\tanh(\mathbf{u})\,\mathbf{u}^T - \mathbf{u}\,\mathbf{u}^T\right] W, \qquad k_i = \begin{cases} 1, & \text{super-Gaussian} \\ -1, & \text{sub-Gaussian} \end{cases}$$

Page 29: Blind Source Separation by Independent Components Analysis

Switching Decision

• In the extended learning rule, the ki are the elements of the N-dimensional diagonal matrix K, and are set by the switching decision

$$k_i = \mathrm{sign}\left( E\left[\mathrm{sech}^2(u_i)\right] E\left[u_i^2\right] - E\left[u_i \tanh(u_i)\right] \right)$$

• Modifications of the formula for ki exist, but in our experience the extended algorithm has been unsatisfactory.
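Putting the extended rule and the switching decision together as a single batch update (a sketch with an invented helper name and step size; untuned, for illustration only):

```python
import numpy as np

def extended_infomax_step(W, x, eta=0.01):
    """One batch update of the extended infomax rule:
    W <- W + eta * [I - K tanh(u) u^T - u u^T] W, with each k_i switched by
    k_i = sign( E[sech^2(u_i)] E[u_i^2] - E[u_i tanh(u_i)] ).
    W: (M, M) current unmixing estimate; x: (M, N) zero-mean measurements.
    """
    M, N = x.shape
    u = W @ x
    tanh_u = np.tanh(u)
    sech2 = 1.0 - tanh_u ** 2                       # sech^2(u) = 1 - tanh^2(u)
    k = np.sign(np.mean(sech2, axis=1) * np.mean(u ** 2, axis=1)
                - np.mean(u * tanh_u, axis=1))      # +1 super-, -1 sub-Gaussian
    K = np.diag(k)
    dW = (np.eye(M) - K @ tanh_u @ u.T / N - u @ u.T / N) @ W
    return W + eta * dW
```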

Page 30: Blind Source Separation by Independent Components Analysis

Reasons for the unsatisfactory extended algorithm

1) Initial assumptions about the super- and sub-Gaussian distributions may be too inaccurate.

2) The switching criterion may be inadequate.

Alternatives

• Postulate vague distributions for the source signals, which are then developed iteratively during training.

• Use an alternative, e.g. statistically based, approach such as JADE (Cardoso).

Page 31: Blind Source Separation by Independent Components Analysis

Summary so far

• We have seen how W may be obtained by training the network, and the extended algorithm for switching between super- and sub-Gaussians has been described.

• Alternative approaches have been mentioned.

• Next we consider how to obtain the source signals, knowing W and the measured signals, x.

Page 32: Blind Source Separation by Independent Components Analysis

Source signal determination

• The system is:

si (unknown) → [mixing matrix A] → xi (measured) → [unmixing matrix W] → ui ≈ si (estimated) → [g(u)] → yi

• Hence U = W·x and x = A·s, where A ≈ W⁻¹ and U ≈ s.

•The rows of U are the estimated source signals, known as activations (as functions of time).

•The rows of x are the time-varying measured signals.

Page 33: Blind Source Separation by Independent Components Analysis

Source Signals

$$\underbrace{U}_{M \times N} = \underbrace{W}_{M \times M} \cdot \underbrace{x}_{M \times N}$$

$$\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ \vdots & & & \vdots \\ u_{M1} & u_{M2} & \cdots & u_{MN} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1M} \\ \vdots & & & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MM} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ \vdots & & & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{bmatrix}$$

(rows index the channel number; columns index time, or sample number)

Page 34: Blind Source Separation by Independent Components Analysis

Expressions for the Activations

$$u_{11} = w_{11}x_{11} + w_{12}x_{21} + \cdots + w_{1M}x_{M1}$$

$$u_{12} = w_{11}x_{12} + w_{12}x_{22} + \cdots + w_{1M}x_{M2}$$

• We see that consecutive values of u are obtained by filtering consecutive columns of x by the same row of W:

$$u_{2N} = w_{21}x_{1N} + w_{22}x_{2N} + w_{23}x_{3N} + \cdots + w_{2M}x_{MN}$$

• The ith row of u is the ith row of W multiplied into the columns of x.

Page 35: Blind Source Separation by Independent Components Analysis

Procedure

• Record N time points from each of M sensors, where N ≥ 5M.

• Pre-process the data, e.g. filtering, trend removal.

• Sphere the data using Principal Components Analysis (PCA). This is not essential, but speeds up the computation by first removing the first and second order moments.

• Compute the ui ≈ si. Include desphering.

• Analyse the results.
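A sketch of the sphering step and how it slots into the procedure (PCA whitening via the covariance eigendecomposition; the helper names, and the reuse of the earlier infomax_ica sketch, are our own):

```python
import numpy as np

def sphere(x):
    """PCA sphering (whitening): remove the mean and 2nd-order correlations.
    x: (M, N) recordings. Returns z with cov(z) = I and the sphering matrix S."""
    x = x - x.mean(axis=1, keepdims=True)   # mean removal (simplest pre-processing)
    d, E = np.linalg.eigh(np.cov(x))        # eigendecomposition of the covariance
    S = E @ np.diag(d ** -0.5) @ E.T        # symmetric whitening matrix
    return S @ x, S

# Sketch of the overall procedure on recorded data x (M sensors x N samples):
# z, S = sphere(x)          # sphere the data first
# W = infomax_ica(z)        # train the network on the sphered data
# u = W @ z                 # activations u ~= s
# W_total = W @ S           # combined unmixing matrix for the raw data (desphering)
```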

Page 36: Blind Source Separation by Independent Components Analysis

Optional Procedures I

• The contribution of each activation at a sensor may be found by “back-projecting” it to the sensor:

$$\mathbf{x} = W^{-1}\,\mathbf{s}$$

• For example, back-projecting activation s2 alone (all other activations set to zero):

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{bmatrix} = \begin{bmatrix} w'_{11} & w'_{12} & \cdots & w'_{1M} \\ w'_{21} & w'_{22} & \cdots & w'_{2M} \\ \vdots & & & \vdots \\ w'_{M1} & w'_{M2} & \cdots & w'_{MM} \end{bmatrix} \begin{bmatrix} 0 \\ s_2 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_i = w'_{i2}\,s_2$$

where the $w'_{ij}$ are the elements of $W^{-1}$.
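A sketch of back-projection in code (a hypothetical helper; by choosing which activations to keep, it also covers the artefact-removal use on the next slide):

```python
import numpy as np

def back_project(W, u, keep):
    """Back-project the activations listed in `keep` to the sensors; zero the rest.
    W: (M, M) unmixing matrix; u: (M, N) activations."""
    u_sel = np.zeros_like(u)
    u_sel[keep] = u[keep]
    return np.linalg.inv(W) @ u_sel      # x = W^{-1} s with only selected rows

# Contribution of activation s2 alone at every sensor (row index 1 is s2):
#   x_from_s2 = back_project(W, u, keep=[1])
# Artefact removal (next slide): keep everything except the artefactual rows:
#   x_clean = back_project(W, u, keep=[i for i in range(W.shape[0]) if i not in bad])
```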

Page 37: Blind Source Separation by Independent Components Analysis

Optional Procedures II

• A measured signal which is contaminated by artefacts or noise may be extracted by “back-projecting” all the signal activations to the measurement electrode, setting the other activations to zero. (An artefact and noise removal method.)

$$\mathbf{x} = W^{-1}\,\tilde{\mathbf{s}}, \qquad \tilde{s}_j = \begin{cases} s_j, & \text{signal activations (retained)} \\ 0, & \text{artefact and noise activations (zeroed)} \end{cases}$$

so that each reconstructed sensor signal is $x_i = \sum_j w'_{ij}\,\tilde{s}_j$, where the $w'_{ij}$ are the elements of $W^{-1}$.

Page 38: Blind Source Separation by Independent Components Analysis

Current Developments

• Overcomplete representations - more signal sources than sensors.

• Nonlinear mixing.

• Nonstationary sources.

• General formulation of g(u).

Page 39: Blind Source Separation by Independent Components Analysis

Conclusions

• It has been shown how to extract temporally independent unknown source signals from their linear mixtures at the outputs of an unknown system using Independent Components Analysis.

• Some of the limitations of the method have been mentioned.

• Current developments have been highlighted.