Independent Component Analysis
PR, ANN, & ML
Mixture Data
- Data that are mingled from multiple sources
  - May not know how many sources
  - May not know the mixing mechanism
- Good representation
  - Uncorrelated, information-bearing components
  - PCA and Fisher's linear discriminant
- De-mixing or separation
  - ICA (independent component analysis)
- How do they differ?
PCA vs. ICA
- Independent events vs. uncorrelated events
[Figure: two scatter plots over axes x1, x2 — left: knowing x1 doesn't tell anything about x2 (independent); right: knowing x1 does tell something about x2 (uncorrelated but dependent)]
Uncorrelated vs. Independence
- Uncorrelated
  - A global property
  - Not preserved under nonlinear transforms
  - PCA requires uncorrelatedness
- Independence
  - A local property
  - Preserved under nonlinear transforms
  - ICA assumes independence
$\text{uncorrelated:}\quad E[(x_1 - E[x_1])(x_2 - E[x_2])] = 0$
$\text{independence:}\quad E[g_1(x_1)\, g_2(x_2) \cdots g_n(x_n)] = E[g_1(x_1)]\, E[g_2(x_2)] \cdots E[g_n(x_n)] \quad \forall g_i$
Uncorrelated vs. Independence
- Independence is stronger, requiring every possible function of x1 to be uncorrelated with x2
  - $E[(y_1 - E[y_1])(y_2 - E[y_2])] = 0$ → uncorrelated
  - $y_2 = y_1^2$ → uncorrelated but not independent
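A small numeric sketch of this point (the standard normal choice for y1 and the sample size are illustrative): y2 = y1² has zero covariance with y1, yet squaring y1 exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.standard_normal(100_000)
y2 = y1 ** 2                          # a deterministic function of y1

# Uncorrelated: E[(y1 - E y1)(y2 - E y2)] = E[y1^3] = 0 for a symmetric y1
cov = np.mean((y1 - y1.mean()) * (y2 - y2.mean()))

# Dependence shows up once we test a nonlinear function of each variable:
# E[y1^2 * y2] = E[y1^4] = 3, while E[y1^2] * E[y2] = 1
lhs = np.mean(y1**2 * y2)
rhs = np.mean(y1**2) * np.mean(y2)
print(cov, lhs, rhs)                  # cov near 0; lhs near 3, rhs near 1
```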
Uncorrelated vs. Independence
- Discrete variables x1 and x2
  - $(x_1, x_2)$ takes the values (0,1), (0,-1), (1,0), (-1,0), each with probability 1/4
  - x1 and x2 are uncorrelated: $E[x_1 x_2] = 0 = E[x_1]\,E[x_2]$
  - But not independent: $E[x_1^2 x_2^2] = 0 \ne 1/4 = E[x_1^2]\,E[x_2^2]$
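The four-point example can be verified directly (a minimal check, exact by construction since averaging over the four equiprobable points is the expectation):

```python
import numpy as np

# The four equiprobable outcomes (x1, x2) from the slide
pts = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)], dtype=float)
x1, x2 = pts[:, 0], pts[:, 1]

# Uncorrelated: E[x1 x2] = 0 = E[x1] E[x2]
cov = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)

# Not independent: E[x1^2 x2^2] = 0 while E[x1^2] E[x2^2] = 1/4
lhs = np.mean(x1**2 * x2**2)
rhs = np.mean(x1**2) * np.mean(x2**2)
print(cov, lhs, rhs)   # 0.0 0.0 0.25
```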
ICA Limitation
- Any distribution of x1 and x2 that is symmetric about the origin (centered at E[x1] and E[x2]) is uncorrelated
- Corollary: ICA does not apply to Gaussian variables
  - Any orthogonal transform (rotation or reflection) of a Gaussian leaves its distribution unchanged
Blind Source Separation
- Brain imaging
  - Different parts of the brain emit signals that are mixed together at sensors outside the head
- Teleconferencing
  - Different speakers talk at the same time, and their voices are mixed together at the microphones
- Geology
  - Oil exploration: shock waves from underground detonations are registered at multiple sensors
Approaches
- Nonlinear de-correlation
  - The separated components are uncorrelated, and nonlinear transforms of them are also uncorrelated
  - Minimum mutual information model
- Maximum non-Gaussianity
  - The central limit theorem implies that successive mixing increases Gaussianity
  - Go beyond the covariance matrix (kurtosis, a higher-order cumulant)
Mathematical Formulation
- $s_i$: sources; $x_j$: mixtures
- $A$: mixing matrix ($\mathbf{x} = A\mathbf{s}$)
- $W$: de-mixing matrix ($\mathbf{y} = W\mathbf{x}$)
- Implications
  - Cannot determine the variances of the sources
  - Cannot determine the ordering of the sources
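A sketch of the model x = As and its two ambiguities (the mixing matrix and Laplacian sources below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 1000))          # two non-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # hypothetical mixing matrix
x = A @ s                                # observed mixtures: x = A s

# If A were known, W = A^{-1} would recover the sources exactly
W = np.linalg.inv(A)
assert np.allclose(W @ x, s)

# Variance ambiguity: doubling a column of A while halving the matching
# source leaves the observed mixtures x unchanged
A2, s2 = A.copy(), s.copy()
A2[:, 0] *= 2.0
s2[0] *= 0.5
assert np.allclose(A2 @ s2, x)

# Ordering ambiguity: permuting the sources and the columns of A
# also preserves x, since P^T P = I for a permutation matrix P
P = np.array([[0.0, 1.0], [1.0, 0.0]])
assert np.allclose((A @ P.T) @ (P @ s), x)
```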
A Simple Formulation
- The central limit theorem states that a sum of independent random variables tends toward a Gaussian
- Non-Gaussianity is therefore desired for each independent component
A Simple Formulation
- Gaussian variables have zero kurtosis
- Supergaussian: spiky pdf with heavy tails (e.g., the Laplace distribution)
- Subgaussian: flat pdf (e.g., uniform)
- Maximize the magnitude of the kurtosis
$\mathrm{kurt}(x) = E[x^4] - 3\,(E[x^2])^2 = E[x^4] - 3 \quad \text{if } E[x^2] = 1$

Unit-variance Laplace density: $p(x) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|x|}$
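The kurtosis definition can be checked on samples from each regime (sample sizes and seeds are arbitrary choices):

```python
import numpy as np

def kurt(x):
    """Sample kurtosis kurt(x) = E[x^4] - 3 (E[x^2])^2, after centering."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

rng = np.random.default_rng(0)
n = 200_000
k_gauss = kurt(rng.standard_normal(n))                       # ~ 0: Gaussian
k_laplace = kurt(rng.laplace(scale=1 / np.sqrt(2), size=n))  # > 0: supergaussian
k_uniform = kurt(rng.uniform(-1, 1, size=n))                 # < 0: subgaussian
print(k_gauss, k_laplace, k_uniform)
```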
Math Framework: 2 Variables, 2 Observations
- All variables, s and y, are of unit variance
- z is constrained to the unit circle
- Maximum kurtosis is attained at two directions:
  - $z_1 = \pm 1,\ z_2 = 0$, or
  - $z_2 = \pm 1,\ z_1 = 0$
- Found through gradient search in w
- Drawback: sensitivity to noise
For independent variables:
$\mathrm{kurt}(x_1 + x_2) = \mathrm{kurt}(x_1) + \mathrm{kurt}(x_2)$
$\mathrm{kurt}(a\,x_1) = a^4\, \mathrm{kurt}(x_1)$
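These two properties justify searching the unit circle for extrema of |kurt(wᵀz)|. A minimal sketch, using whitening plus the classic one-unit kurtosis fixed-point update (the mixing matrix and sources are invented for this demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
S = np.vstack([rng.laplace(size=n),                       # supergaussian source
               rng.uniform(-np.sqrt(3), np.sqrt(3), n)])  # subgaussian source
X = np.array([[2.0, 1.0], [1.0, 1.5]]) @ S                # hypothetical mixtures

# Whiten the mixtures: Z = V X has identity covariance
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (np.diag(1 / np.sqrt(d)) @ E.T) @ X

# One-unit kurtosis fixed point: w <- E[z (w^T z)^3] - 3 w, renormalized,
# which seeks an extremum of kurt(w^T z) on the unit circle
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = (Z * (w @ Z) ** 3).mean(axis=1) - 3 * w
    w /= np.linalg.norm(w)

y = w @ Z                                   # one recovered component
corrs = [abs(np.corrcoef(y, s)[0, 1]) for s in S]
print(max(corrs))                           # close to 1: matches one source
```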
Information
- Recall some important concepts
  - Random variable x
  - Probability distribution of a random variable
  - Amount of information: surprise, uncertainty
  - Entropy: the weighted average of information
$0 \le p_k = P(\mathbf{x} = x_k) \le 1$
$I(\mathbf{x} = x_k) = \log\!\left(\frac{1}{p_k}\right) = -\log p_k$
$H(\mathbf{x}) = E[I(x_k)] = \sum_k p_k\, I(x_k) = -\sum_k p_k \log p_k$
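The definitions above, in code (log base 2 is a choice, giving entropy in bits; the distributions are toy examples):

```python
import numpy as np

def entropy(p):
    """H(x) = sum_k p_k log2(1 / p_k): the average surprise, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # the term for p_k = 0 is taken as 0
    return np.sum(p * np.log2(1 / p))

h_coin = entropy([0.5, 0.5])           # fair coin: 1 bit
h_sure = entropy([1.0])                # certain outcome: 0 bits
h_quad = entropy([0.25] * 4)           # uniform over 4 outcomes: 2 bits
print(h_coin, h_sure, h_quad)          # 1.0 0.0 2.0
```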
Entropy Basics
[Venn diagram: H(x) and H(y) overlap in the mutual information I(x;y); the non-overlapping parts are H(x|y) and H(y|x); the union is the joint entropy H(x,y)]

$H(X, Y) = H(Y) + H(X \mid Y)$
Mutual Information
[Venn diagram, as on the previous slide: I(x;y) is the overlap of H(x) and H(y)]

$I(x; y) = H(x) + H(y) - H(x, y)$
Kullback-Leibler divergence
- Also called information divergence or relative entropy
- A measure of the difference between two distributions, but it is not a metric
- $D_{p\|q}$ is nonnegative and is zero if and only if p and q are the same distribution
- A useful measure of independence:
  - Let p be the joint probability and q the product of the marginal probabilities
  - Then $D_{p\|q}$ is zero if and only if the random variables are independent
  - With $p = p(x, y)$ and $q = p(x)\,p(y)$, this is the same as saying that x and y are independent
$D_{p\|q} = \sum_k p_k \log\frac{p_k}{q_k} = -\sum_k p_k \log q_k + \sum_k p_k \log p_k = H(p, q) - H(p)$

$D_{p\|q}(\mathbf{x}) \ne D_{q\|p}(\mathbf{x})$
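A direct computation on the four-point pair from the earlier slide (the 3×3 grid layout is an illustrative encoding) shows the divergence between the joint pmf and the product of its marginals is strictly positive, confirming dependence:

```python
import numpy as np

def kl(p, q):
    """D_{p||q} = sum_k p_k log(p_k / q_k), in nats; zero iff p == q."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    m = p > 0                   # the term for p_k = 0 is taken as 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Joint pmf of the pair (x1, x2) on the grid {-1, 0, 1} x {-1, 0, 1}
joint = np.zeros((3, 3))
for i, j in [(1, 2), (1, 0), (2, 1), (0, 1)]:   # (0,1),(0,-1),(1,0),(-1,0)
    joint[i, j] = 0.25
prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # product of marginals

d_joint = kl(joint, prod)   # log 2 ~ 0.693 > 0: the pair is dependent
d_self = kl(prod, prod)     # 0.0: a distribution against itself
print(d_joint, d_self)
```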
Intuition
- Independence implies that the product of the marginal probabilities equals the joint probability
- The Kullback-Leibler divergence between them should be minimized
For independent components:
$p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2) \cdots p(x_n)$
$p(g_1(x_1), g_2(x_2), \ldots, g_n(x_n)) = p(g_1(x_1))\, p(g_2(x_2)) \cdots p(g_n(x_n))$

Minimize the divergence between the joint density and the product of the marginals:
$D_{p\|\tilde p} = \sum_{\mathbf{y}} p(\mathbf{y}) \log \frac{p(\mathbf{y})}{\prod_i p_i(y_i)}$
$D_{p_g\|\tilde p_g} = \sum_{\mathbf{y}} p(g(\mathbf{y})) \log \frac{p(g(\mathbf{y}))}{\prod_i p_i(g_i(y_i))}$
Math Details
- The matrix A should minimize the mutual information $I(Y)$ among the new signals $Y_i = (AX)_i$ derived from the original signal X
$I(X) = \sum_i H(X_i) - H(X)$

$Y = AX$
$I(Y) = \sum_i H(Y_i) - H(AX) = \sum_i H(Y_i) - H(X) - \log|\det A|$
$\phantom{I(Y)} = \sum_i H(Y_i) - H(X) \quad \text{when } |\det A| = 1$
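The mutual-information objective can be evaluated on the same four-point pmf used earlier, via the pairwise identity I(x1; x2) = H(x1) + H(x2) − H(x1, x2) (values in nats):

```python
import numpy as np

def H(p):
    """Shannon entropy of a pmf, in nats."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log(1 / p)))

# The four-point joint pmf: an uncorrelated but dependent pair
joint = np.zeros((3, 3))
for i, j in [(1, 2), (1, 0), (2, 1), (0, 1)]:
    joint[i, j] = 0.25

# I(x1; x2) = H(x1) + H(x2) - H(x1, x2)
mi = H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint)
print(mi)   # log 2 ~ 0.693: positive, so the components are not independent
```

The result equals the KL divergence between the joint and the product of marginals, as the slides' formulation requires.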
Information Theoretic Approach
- A Gaussian variable has the largest entropy among all variables of equal variance
- Negentropy J (non-Gaussianity) is to be maximized ($X_{gauss}$ and X have the same variance):
  - $J(X) = H(X_{gauss}) - H(X)$
- Difficulty: computing H requires the pdf
- Approximation:
$J(x) \approx \frac{1}{12}\, E[x^3]^2 + \frac{1}{48}\, \mathrm{kurt}(x)^2$
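The moment-based approximation can be sketched on samples (after standardizing x to zero mean and unit variance; the sample size and seed are arbitrary):

```python
import numpy as np

def kurt(x):
    """kurt(x) = E[x^4] - 3 (E[x^2])^2."""
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

def negentropy_approx(x):
    """J(x) ~ (1/12) E[x^3]^2 + (1/48) kurt(x)^2 for standardized x."""
    x = (x - x.mean()) / x.std()
    return np.mean(x**3) ** 2 / 12 + kurt(x) ** 2 / 48

rng = np.random.default_rng(0)
n = 200_000
j_gauss = negentropy_approx(rng.standard_normal(n))  # ~ 0 for a Gaussian
j_laplace = negentropy_approx(rng.laplace(size=n))   # clearly positive
print(j_gauss, j_laplace)
```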
Maximum Entropy Approach
Factorize the estimated components so that their joint density is the product of the marginals:
$p(\mathbf{x}(t)) = \prod_{i=1}^{d} p(x_i(t))$

With $\mathbf{s}(t) = A\mathbf{x}(t)$ and $\mathbf{y}(t) = W\mathbf{s}(t)$, the densities are related through the Jacobian J of the transform:
$p_y(\mathbf{y}(t)) = \frac{p_s(\mathbf{s}(t))}{|J|}$