Functional Brain Signal Processing: EEG & fMRI
Lesson 7
Kaushik Majumdar
Indian Statistical Institute Bangalore Center
M.Tech. (CS), Semester III, Course B50
EEG Coherence Measures
Cross-correlation. Covariance:
\[
\mathrm{cov}(x, y) = E\big((x - E(x))\,(y - E(y))\big)
\]
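The covariance definition above can be checked numerically. This is an illustrative sketch with synthetic signals (not EEG data from the lecture); the sample mean stands in for the expectation E.

```python
import numpy as np

# Covariance as defined above: cov(x, y) = E((x - E(x))(y - E(y))).
# The two signals here are synthetic, purely for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.8 * x + 0.2 * rng.standard_normal(1000)   # y partly follows x

# Sample estimate of the covariance (mean of the product of deviations)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy > 0)   # positively correlated signals give positive covariance
```

With `bias=True`, `np.cov` uses the same divide-by-N estimate, so the two agree exactly.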
EEG Feature Extraction
Features of EEG signals can take many different forms, such as: amplitude, phase, Fourier coefficients, wavelet coefficients, etc.
Two Most Fundamental Aspects of Machine Learning
Differentiation: decomposing the data into features, and
Integration: classification of those features.
Fisher’s Discriminant
Duda, Hart & Stork, 2006
Fisher’s Discriminant (cont.)
\[
y_i = \mathbf{w}^T \mathbf{x}_i = \mathbf{w}^T (x_{i1}, x_{i2}, \ldots, x_{id})^T, \qquad i = 1, \ldots, n
\]
There are n d-dimensional data vectors x1, ….., xn, of which n1 vectors belong to a set D1 and n2 vectors belong to another set D2, with n1 + n2 = n. w is a d-dimensional weight vector such that ||w|| = 1; that is, w can only apply a rotation. The rotation has to be such that D1 and D2 are optimally separable by a projection onto a straight line in the d-dimensional space.
Fisher’s Discriminant (cont.)
The sample mean is an unbiased estimate of the population mean, so a difference in sample means indicates a difference between the underlying populations.
\[
\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{x}
\]
\[
\tilde{m}_i = \frac{1}{n_i} \sum_{y \in Y_i} y = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{w}^T \mathbf{x} = \mathbf{w}^T \mathbf{m}_i
\]
where \(Y_i = \{\mathbf{w}^T\mathbf{x} : \mathbf{x} \in D_i\}\) is the set of projected samples of \(D_i\).
Fisher’s Discriminant (cont.)
\[
\tilde{m}_1 - \tilde{m}_2 = \mathbf{w}^T(\mathbf{m}_1 - \mathbf{m}_2)
\]
The scatter of the projected samples of class \(i\) is
\[
\tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2
\]
Fisher's discriminant employs that particular \(\mathbf{w}\) in \(y = \mathbf{w}^T\mathbf{x}\) for which the criterion function
\[
J(\mathbf{w}) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}
\]
is maximized.
[Figure: samples of classes D1 and D2 projected onto a line.]
Fisher’s Discriminant (cont.)
Let us define
\[
S_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T
\quad \text{and} \quad
S_w = S_1 + S_2
\]
Since \(y = \mathbf{w}^T\mathbf{x}\), \(\mathbf{x} \in D_i\), \(i \in \{1, 2\}\), and \(\tilde{m}_i = \mathbf{w}^T\mathbf{m}_i\),
\[
\tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2
= \sum_{\mathbf{x} \in D_i} (\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\mathbf{m}_i)^2
= \sum_{\mathbf{x} \in D_i} \mathbf{w}^T(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T\mathbf{w}
= \mathbf{w}^T S_i \mathbf{w}
\]
Fisher’s Discriminant (cont.)
\[
\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} = \mathbf{w}^T S_1 \mathbf{w} + \mathbf{w}^T S_2 \mathbf{w} = \mathbf{w}^T S_w \mathbf{w}
\]
Similarly,
\[
(\tilde{m}_1 - \tilde{m}_2)^2 = (\mathbf{w}^T\mathbf{m}_1 - \mathbf{w}^T\mathbf{m}_2)^2
= \mathbf{w}^T(\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}
= \mathbf{w}^T S_B \mathbf{w}
\]
where
\[
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T
\]
\(S_w\) is called the within-class scatter matrix and \(S_B\) is called the between-class scatter matrix.
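As a numeric sanity check (with small synthetic classes of my own, not data from the lecture), one can verify that \(\mathbf{w}^T S_w \mathbf{w}\) equals the summed scatter of the projected samples:

```python
import numpy as np

# Verify w^T S_w w = s1^2 + s2^2 for an arbitrary unit vector w,
# using small synthetic classes (illustrative only).
rng = np.random.default_rng(4)
D1 = rng.standard_normal((30, 3)) + 1.0
D2 = rng.standard_normal((30, 3)) - 1.0
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
Sw = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)

w = np.array([1.0, 2.0, -1.0])
w /= np.linalg.norm(w)   # the slide requires ||w|| = 1

# Scatter of each class after projection onto the line defined by w
proj_scatter = sum(((Di @ w - (Di @ w).mean()) ** 2).sum() for Di in (D1, D2))
print(np.isclose(w @ Sw @ w, proj_scatter))   # True: the identity holds
```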
Fisher’s Discriminant (cont.)
\[
J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_w \mathbf{w}}
\]
\(J(\mathbf{w})\) is always a scalar quantity, and therefore \(S_B\mathbf{w} = f(\mathbf{w})\,S_w\mathbf{w}\) must hold for a scalar-valued function \(f\) of the vector variable \(\mathbf{w}\), because \(\mathbf{w}^T(S_B - f(\mathbf{w})S_w)\mathbf{w} = 0\).
Clearly, the maximum \(f(\mathbf{w})\) will make \(J(\mathbf{w})\) maximum. Let the maximum \(f(\mathbf{w}) = \lambda\). Then we can write
\[
S_B\mathbf{w} = \lambda S_w\mathbf{w}
\]
where \(\mathbf{w}\) is the vector for which \(J(\mathbf{w})\) is maximum.
\(S_B\mathbf{w}\) is in the direction of \(\mathbf{m}_1 - \mathbf{m}_2\) (elaborated in the next slide). Also, the scale of \(\mathbf{w}\) does not matter, only its direction does. So we can write
Fisher’s Discriminant (cont.)
\[
S_w\mathbf{w} \propto \mathbf{m}_1 - \mathbf{m}_2
\quad \text{or} \quad
\mathbf{w} = S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2)
\]
Note that
\[
S_B\mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}
= (\mathbf{m}_1 - \mathbf{m}_2)\{(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}\}
\]
Here all vectors are by default column vectors, if not stated otherwise, so all transpose operations give row vectors. \((\mathbf{m}_1 - \mathbf{m}_2)^T\) is a row vector and \(\mathbf{w}\) is a column vector; therefore the value within the braces above is a scalar. That is, \(S_B\mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)s\), where \(s\) is a scalar. This implies \(S_B\mathbf{w}\) is in the direction of \(\mathbf{m}_1 - \mathbf{m}_2\).
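The closed form \(\mathbf{w} = S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2)\) lends itself to a direct implementation. The following is a minimal sketch; the two synthetic classes D1 and D2 are my own illustrative data, not EEG:

```python
import numpy as np

# Minimal sketch of Fisher's discriminant: w = Sw^{-1} (m1 - m2).
rng = np.random.default_rng(1)
D1 = rng.standard_normal((50, 3)) + np.array([2.0, 0.0, 0.0])
D2 = rng.standard_normal((60, 3)) - np.array([2.0, 0.0, 0.0])

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S1 = (D1 - m1).T @ (D1 - m1)   # scatter matrix of D1
S2 = (D2 - m2).T @ (D2 - m2)   # scatter matrix of D2
Sw = S1 + S2                   # within-class scatter matrix

w = np.linalg.solve(Sw, m1 - m2)   # solves Sw w = m1 - m2
w /= np.linalg.norm(w)             # the slide requires ||w|| = 1

# Projections of the two classes onto the line defined by w
y1, y2 = D1 @ w, D2 @ w
print(y1.mean() > y2.mean())   # the projected means are separated
```

`np.linalg.solve` is used instead of explicitly inverting \(S_w\), which is the numerically preferable way to apply \(S_w^{-1}\).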
Dimensionality Reduction by Fisher’s Discriminant
From \(S_B\mathbf{w} = \lambda S_w\mathbf{w}\) we get \(S_w^{-1}S_B\mathbf{w} = \lambda I\mathbf{w}\), where \(I\) is the d-dimensional identity matrix. \(S_B\) and \(S_w\) are d-dimensional square matrices. For the purpose of classification (or pattern recognition) we only need those eigenvectors of \(S_w^{-1}S_B\) whose associated eigenvalues are large enough. The rest of the eigenvectors (and therefore dimensions) we can ignore.
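The eigenvector selection described above can be sketched as follows. The two-class data here are synthetic and illustrative; note that with two classes \(S_B\) has rank one, so only one eigenvalue is appreciably nonzero and the data reduce to one dimension:

```python
import numpy as np

# Dimensionality reduction via the eigenvectors of Sw^{-1} SB.
rng = np.random.default_rng(2)
D1 = rng.standard_normal((40, 4)) + 1.5
D2 = rng.standard_normal((40, 4)) - 1.5
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
Sw = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
SB = np.outer(m1 - m2, m1 - m2)        # rank-1 between-class scatter

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ SB)
order = np.argsort(eigvals.real)[::-1]  # largest eigenvalues first
keep = eigvecs.real[:, order[:1]]       # keep only the dominant eigenvector
reduced = np.vstack([D1, D2]) @ keep    # project data to 1 dimension
print(reduced.shape)                    # (80, 1)
```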
Logistic Regression
\[
y = b + \mathbf{w}^T\mathbf{x}
\]
\[
p(y;\, \mathbf{x}, \mathbf{w}, b) = \frac{1}{1 + \exp(-y)} = \frac{1}{1 + \exp(-(b + \mathbf{w}^T\mathbf{x}))}
\]
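The mapping from the discriminant value \(y = b + \mathbf{w}^T\mathbf{x}\) to a probability \(p = 1/(1 + \exp(-y))\) can be sketched directly; the weights and input below are arbitrary toy numbers:

```python
import numpy as np

# The logistic function maps any real y to a probability in (0, 1).
def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

w = np.array([0.5, -0.25])   # illustrative weight vector
b = 0.1                      # illustrative bias
x = np.array([2.0, 1.0])     # illustrative input

y = b + w @ x                # y = 0.1 + 1.0 - 0.25 = 0.85
p = logistic(y)              # probability of the positive class
print(0.0 < p < 1.0)         # always True, whatever y is
```

At \(y = 0\) the output is exactly 0.5, the decision boundary between the two classes.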
Logistic Regression (cont.)
Parra et al., NeuroImage, 22: 342 – 452, 2005
[Figure: the logistic function, showing p(y) and 1 − p(y).]
Logistic Regression vs. Fisher’s Discriminant
It has been shown theoretically that logistic regression is between one half and two thirds as effective as normal discriminant analysis for statistically interesting values of the parameters (B. Efron, The efficiency of logistic regression compared to normal discriminant analysis, JASA (1975) 892–898).
Logistic Regression (cont.)
\[
y(t_j) = \sum_{i=1}^{D} w_i x_i(t_j) + b
\]
\[
p_j = \frac{\exp(y(t_j))}{1 + \exp(y(t_j))}
\qquad
1 - p_j = \frac{1}{1 + \exp(y(t_j))}
\]
\[
\prod_{j=1}^{N} p_j
\]
is to be maximized, where N is the number of data points.
Logistic Regression (cont.)
\[
L(w_1, \ldots, w_D, b) = \sum_{j=1}^{N} \log \frac{\exp(y(t_j))}{1 + \exp(y(t_j))}
\]
Note that \(\exp(x)/(1 + \exp(x))\) is a monotonically increasing function, and so any set of weights which increases \(L\) will lead us closer to the optimal value of \(\mathbf{w}\). Even if we take an approximate \(\mathbf{w}\), the end result for EEG signal separation, for target and non-target or for different targets, will be almost similar to the case when a convergence technique for \(\mathbf{w}\) as described is followed. The two classes of data will be separated by the hyperplane normal to \(\mathbf{w}\), and the perpendicular distance of the hyperplane from the origin is \(|b| / \lVert\mathbf{w}\rVert\). In other words, the equation of the hyperplane is \(\mathbf{w}^T\mathbf{x} + b = 0\).
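Maximizing \(L\) can be done by simple gradient ascent. This sketch uses the standard equivalent formulation with labels \(t_j \in \{0, 1\}\), where the gradient is \(\partial L/\partial \mathbf{w} = \sum_j (t_j - p_j)\mathbf{x}_j\); the synthetic data and the learning rate 0.01 are illustrative choices of mine, not from the lecture:

```python
import numpy as np

# Gradient ascent on the logistic log-likelihood (synthetic 2-D data).
rng = np.random.default_rng(3)
X1 = rng.standard_normal((50, 2)) + 2.0   # class with label 1
X0 = rng.standard_normal((50, 2)) - 2.0   # class with label 0
X = np.vstack([X1, X0])
t = np.concatenate([np.ones(50), np.zeros(50)])

w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # current probabilities p_j
    w += 0.01 * (X.T @ (t - p))              # ascent step on the weights
    b += 0.01 * np.sum(t - p)                # ascent step on the bias

# Classify by the sign of w.x + b, i.e. by which side of the
# separating hyperplane w.x + b = 0 each point falls on.
pred = (X @ w + b) > 0
accuracy = np.mean(pred == t)
print(accuracy > 0.9)   # the well-separated classes are learned easily
```

Because \(L\) is concave in \((\mathbf{w}, b)\), this ascent converges toward the global maximum; as the slide notes, even stopping early gives a nearly identical separating hyperplane.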
Logistic Regression vs. Fisher’s Discriminant
FD projects the multidimensional data on a line, whose orientation is such that the separation of the projected data becomes maximum on that line.
LR assigns a probability distribution to the two different data sets in such a way that the distribution approaches 1 on one class and 0 on the other, exponentially fast.
This makes LR a better separator or classifier than FD.
References
R. Q. Quiroga, A. Kraskov, T. Kreuz and P. Grassberger, On performance of different synchronization measures in real data: a case study on EEG signals, Phys. Rev. E, 65(4): 041903, 2002.
R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 4e, John Wiley & Sons, New York, 2007, pp. 117–121.
THANK YOU
This lecture is available at http://www.isibang.ac.in/~kaushik