Functional Brain Signal Processing: EEG & fMRI
Lesson 7
Kaushik Majumdar
Indian Statistical Institute Bangalore Center
M.Tech. (CS), Semester III, Course B50
EEG Coherence Measures
Cross-correlation. Covariance:
\[
\mathrm{cov}(x, y) = E\big((x - E(x))\,(y - E(y))\big)
\]
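The covariance definition above can be checked numerically. This is an illustrative sketch with synthetic signals (not EEG data from the lecture); the sample mean stands in for the expectation E.

```python
import numpy as np

# Covariance as defined above: cov(x, y) = E((x - E(x))(y - E(y))).
# The two signals here are synthetic, purely for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.8 * x + 0.2 * rng.standard_normal(1000)   # y partly follows x

# Sample estimate of the covariance (mean of the product of deviations)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy > 0)   # positively correlated signals give positive covariance
```

With `bias=True`, `np.cov` uses the same divide-by-N estimate, so the two agree exactly.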
EEG Feature Extraction
Features of EEG signals can take many different forms, such as: amplitude, phase, Fourier coefficients, wavelet coefficients, etc.
Two Most Fundamental Aspects of Machine Learning
Differentiation: decomposing the data into features, and
Integration: classification of those features.
Fisher’s Discriminant
Duda, Hart & Stork, 2006
Fisher’s Discriminant (cont.)
\[
y_i = \mathbf{w}^T \mathbf{x}_i = \mathbf{w}^T (x_{i1}, x_{i2}, \ldots, x_{id})^T, \qquad i = 1, \ldots, n
\]
There are n d-dimensional data vectors x1, ….., xn, of which n1 vectors belong to a set D1 and n2 vectors belong to another set D2, with n1 + n2 = n. w is a d-dimensional weight vector such that ||w|| = 1; that is, w can only apply a rotation. The rotation has to be such that D1 and D2 are optimally separable by a projection onto a straight line in the d-dimensional space.
Fisher’s Discriminant (cont.)
The sample mean is an unbiased estimate of the population mean, so a difference in sample means indicates a difference between the underlying populations.
\[
\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{x}
\]
\[
\tilde{m}_i = \frac{1}{n_i} \sum_{y \in Y_i} y = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{w}^T \mathbf{x} = \mathbf{w}^T \mathbf{m}_i
\]
where \(Y_i = \{\mathbf{w}^T\mathbf{x} : \mathbf{x} \in D_i\}\) is the set of projected samples of \(D_i\).
Fisher’s Discriminant (cont.)
\[
\tilde{m}_1 - \tilde{m}_2 = \mathbf{w}^T(\mathbf{m}_1 - \mathbf{m}_2)
\]
The scatter of the projected samples of class \(i\) is
\[
\tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2
\]
Fisher's discriminant employs that particular \(\mathbf{w}\) in \(y = \mathbf{w}^T\mathbf{x}\) for which the criterion function
\[
J(\mathbf{w}) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}
\]
is maximized.
[Figure: samples of classes D1 and D2 projected onto a line.]
Fisher’s Discriminant (cont.)
Let us define
\[
S_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T
\quad \text{and} \quad
S_w = S_1 + S_2
\]
Since \(y = \mathbf{w}^T\mathbf{x}\), \(\mathbf{x} \in D_i\), \(i \in \{1, 2\}\), and \(\tilde{m}_i = \mathbf{w}^T\mathbf{m}_i\),
\[
\tilde{s}_i^{\,2} = \sum_{y \in Y_i} (y - \tilde{m}_i)^2
= \sum_{\mathbf{x} \in D_i} (\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\mathbf{m}_i)^2
= \sum_{\mathbf{x} \in D_i} \mathbf{w}^T(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T\mathbf{w}
= \mathbf{w}^T S_i \mathbf{w}
\]
Fisher’s Discriminant (cont.)
\[
\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} = \mathbf{w}^T S_1 \mathbf{w} + \mathbf{w}^T S_2 \mathbf{w} = \mathbf{w}^T S_w \mathbf{w}
\]
Similarly,
\[
(\tilde{m}_1 - \tilde{m}_2)^2 = (\mathbf{w}^T\mathbf{m}_1 - \mathbf{w}^T\mathbf{m}_2)^2
= \mathbf{w}^T(\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}
= \mathbf{w}^T S_B \mathbf{w}
\]
where
\[
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T
\]
\(S_w\) is called the within-class scatter matrix and \(S_B\) is called the between-class scatter matrix.
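As a numeric sanity check (with small synthetic classes of my own, not data from the lecture), one can verify that \(\mathbf{w}^T S_w \mathbf{w}\) equals the summed scatter of the projected samples:

```python
import numpy as np

# Verify w^T S_w w = s1^2 + s2^2 for an arbitrary unit vector w,
# using small synthetic classes (illustrative only).
rng = np.random.default_rng(4)
D1 = rng.standard_normal((30, 3)) + 1.0
D2 = rng.standard_normal((30, 3)) - 1.0
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
Sw = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)

w = np.array([1.0, 2.0, -1.0])
w /= np.linalg.norm(w)   # the slide requires ||w|| = 1

# Scatter of each class after projection onto the line defined by w
proj_scatter = sum(((Di @ w - (Di @ w).mean()) ** 2).sum() for Di in (D1, D2))
print(np.isclose(w @ Sw @ w, proj_scatter))   # True: the identity holds
```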
Fisher’s Discriminant (cont.)
\[
J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_w \mathbf{w}}
\]
\(J(\mathbf{w})\) is always a scalar quantity, and therefore \(S_B\mathbf{w} = f(\mathbf{w})\,S_w\mathbf{w}\) must hold for a scalar-valued function \(f\) of the vector variable \(\mathbf{w}\), because \(\mathbf{w}^T(S_B - f(\mathbf{w})S_w)\mathbf{w} = 0\).
Clearly, the maximum \(f(\mathbf{w})\) will make \(J(\mathbf{w})\) maximum. Let the maximum \(f(\mathbf{w}) = \lambda\). Then we can write
\[
S_B\mathbf{w} = \lambda S_w\mathbf{w}
\]
where \(\mathbf{w}\) is the vector for which \(J(\mathbf{w})\) is maximum.
\(S_B\mathbf{w}\) is in the direction of \(\mathbf{m}_1 - \mathbf{m}_2\) (elaborated in the next slide). Also, the scale of \(\mathbf{w}\) does not matter, only its direction does. So we can write
Fisher’s Discriminant (cont.)
\[
S_w\mathbf{w} \propto \mathbf{m}_1 - \mathbf{m}_2
\quad \text{or} \quad
\mathbf{w} = S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2)
\]
Note that
\[
S_B\mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}
= (\mathbf{m}_1 - \mathbf{m}_2)\{(\mathbf{m}_1 - \mathbf{m}_2)^T\mathbf{w}\}
\]
Here all vectors are by default column vectors, if not stated otherwise, so all transpose operations give row vectors. \((\mathbf{m}_1 - \mathbf{m}_2)^T\) is a row vector and \(\mathbf{w}\) is a column vector; therefore the value within the braces above is a scalar. That is, \(S_B\mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)s\), where \(s\) is a scalar. This implies \(S_B\mathbf{w}\) is in the direction of \(\mathbf{m}_1 - \mathbf{m}_2\).
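The closed form \(\mathbf{w} = S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2)\) lends itself to a direct implementation. The following is a minimal sketch; the two synthetic classes D1 and D2 are my own illustrative data, not EEG:

```python
import numpy as np

# Minimal sketch of Fisher's discriminant: w = Sw^{-1} (m1 - m2).
rng = np.random.default_rng(1)
D1 = rng.standard_normal((50, 3)) + np.array([2.0, 0.0, 0.0])
D2 = rng.standard_normal((60, 3)) - np.array([2.0, 0.0, 0.0])

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S1 = (D1 - m1).T @ (D1 - m1)   # scatter matrix of D1
S2 = (D2 - m2).T @ (D2 - m2)   # scatter matrix of D2
Sw = S1 + S2                   # within-class scatter matrix

w = np.linalg.solve(Sw, m1 - m2)   # solves Sw w = m1 - m2
w /= np.linalg.norm(w)             # the slide requires ||w|| = 1

# Projections of the two classes onto the line defined by w
y1, y2 = D1 @ w, D2 @ w
print(y1.mean() > y2.mean())   # the projected means are separated
```

`np.linalg.solve` is used instead of explicitly inverting \(S_w\), which is the numerically preferable way to apply \(S_w^{-1}\).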
Dimensionality Reduction by Fisher’s Discriminant
From \(S_B\mathbf{w} = \lambda S_w\mathbf{w}\) we get \(S_w^{-1}S_B\mathbf{w} = \lambda I\mathbf{w}\), where \(I\) is the d-dimensional identity matrix. \(S_B\) and \(S_w\) are d-dimensional square matrices. For the purpose of classification (or pattern recognition) we only need those eigenvectors of \(S_w^{-1}S_B\) whose associated eigenvalues are large enough. The rest of the eigenvectors (and therefore dimensions) we can ignore.
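The eigenvector selection described above can be sketched as follows. The two-class data here are synthetic and illustrative; note that with two classes \(S_B\) has rank one, so only one eigenvalue is appreciably nonzero and the data reduce to one dimension:

```python
import numpy as np

# Dimensionality reduction via the eigenvectors of Sw^{-1} SB.
rng = np.random.default_rng(2)
D1 = rng.standard_normal((40, 4)) + 1.5
D2 = rng.standard_normal((40, 4)) - 1.5
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
Sw = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
SB = np.outer(m1 - m2, m1 - m2)        # rank-1 between-class scatter

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ SB)
order = np.argsort(eigvals.real)[::-1]  # largest eigenvalues first
keep = eigvecs.real[:, order[:1]]       # keep only the dominant eigenvector
reduced = np.vstack([D1, D2]) @ keep    # project data to 1 dimension
print(reduced.shape)                    # (80, 1)
```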
Logistic Regression
\[
y = b + \mathbf{w}^T\mathbf{x}
\]
\[
p(y;\, \mathbf{x}, \mathbf{w}, b) = \frac{1}{1 + \exp(-y)} = \frac{1}{1 + \exp(-(b + \mathbf{w}^T\mathbf{x}))}
\]
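The mapping from the discriminant value \(y = b + \mathbf{w}^T\mathbf{x}\) to a probability \(p = 1/(1 + \exp(-y))\) can be sketched directly; the weights and input below are arbitrary toy numbers:

```python
import numpy as np

# The logistic function maps any real y to a probability in (0, 1).
def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

w = np.array([0.5, -0.25])   # illustrative weight vector
b = 0.1                      # illustrative bias
x = np.array([2.0, 1.0])     # illustrative input

y = b + w @ x                # y = 0.1 + 1.0 - 0.25 = 0.85
p = logistic(y)              # probability of the positive class
print(0.0 < p < 1.0)         # always True, whatever y is
```

At \(y = 0\) the output is exactly 0.5, the decision boundary between the two classes.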
Logistic Regression (cont.)
Parra et al., NeuroImage, 22: 342 – 452, 2005
[Figure: the logistic function, showing p(y) and 1 − p(y).]
Logistic Regression vs. Fisher’s Discriminant
It has been shown theoretically that logistic regression is between one half and two thirds as effective as normal discriminant analysis for statistically interesting values of the parameters (B. Efron, The efficiency of logistic regression compared to normal discriminant analysis, JASA (1975) 892–898).
Logistic Regression (cont.)
\[
y(t_j) = \sum_{i=1}^{D} w_i x_i(t_j) + b
\]
\[
p_j = \frac{\exp(y(t_j))}{1 + \exp(y(t_j))}
\qquad
1 - p_j = \frac{1}{1 + \exp(y(t_j))}
\]
\[
\prod_{j=1}^{N} p_j
\]
is to be maximized, where N is the number of data points.
Logistic Regression (cont.)
\[
L(w_1, \ldots, w_D, b) = \sum_{j=1}^{N} \log \frac{\exp(y(t_j))}{1 + \exp(y(t_j))}
\]
Note that \(\exp(x)/(1 + \exp(x))\) is a monotonically increasing function, and so any set of weights which increases \(L\) will lead us closer to the optimal value of \(\mathbf{w}\). Even if we take an approximate \(\mathbf{w}\), the end result for EEG signal separation, for target and non-target or for different targets, will be almost similar to the case when a convergence technique for \(\mathbf{w}\) as described is followed. The two classes of data will be separated by the hyperplane normal to \(\mathbf{w}\), and the perpendicular distance of the hyperplane from the origin is \(|b| / \lVert\mathbf{w}\rVert\). In other words, the equation of the hyperplane is \(\mathbf{w}^T\mathbf{x} + b = 0\).
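Maximizing \(L\) can be done by simple gradient ascent. This sketch uses the standard equivalent formulation with labels \(t_j \in \{0, 1\}\), where the gradient is \(\partial L/\partial \mathbf{w} = \sum_j (t_j - p_j)\mathbf{x}_j\); the synthetic data and the learning rate 0.01 are illustrative choices of mine, not from the lecture:

```python
import numpy as np

# Gradient ascent on the logistic log-likelihood (synthetic 2-D data).
rng = np.random.default_rng(3)
X1 = rng.standard_normal((50, 2)) + 2.0   # class with label 1
X0 = rng.standard_normal((50, 2)) - 2.0   # class with label 0
X = np.vstack([X1, X0])
t = np.concatenate([np.ones(50), np.zeros(50)])

w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # current probabilities p_j
    w += 0.01 * (X.T @ (t - p))              # ascent step on the weights
    b += 0.01 * np.sum(t - p)                # ascent step on the bias

# Classify by the sign of w.x + b, i.e. by which side of the
# separating hyperplane w.x + b = 0 each point falls on.
pred = (X @ w + b) > 0
accuracy = np.mean(pred == t)
print(accuracy > 0.9)   # the well-separated classes are learned easily
```

Because \(L\) is concave in \((\mathbf{w}, b)\), this ascent converges toward the global maximum; as the slide notes, even stopping early gives a nearly identical separating hyperplane.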
Logistic Regression vs. Fisher’s Discriminant
FD projects the multidimensional data on a line, whose orientation is such that the separation of the projected data becomes maximum on that line.
LR assigns a probability distribution to the two different data sets in such a way that the distribution approaches 1 on one class and 0 on the other, exponentially fast.
This makes LR a better separator or classifier than FD.
References
R. Q. Quiroga, A. Kraskov, T. Kreuz and P. Grassberger, On performance of different synchronization measures in real data: a case study on EEG signals, Phys. Rev. E, 65(4): 041903, 2002.
R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 4e, John Wiley & Sons, New York, 2007, pp. 117–121.
THANK YOU
This lecture is available at http://www.isibang.ac.in/~kaushik