Survey on ICA Technical Report, Aapo Hyvärinen, 1999. jagota/NCS.
• 2nd-order methods
– PCA / factor analysis
• Higher order methods
– Projection pursuit / Blind deconvolution
• ICA
– definitions
– criteria for identifiability
– relations to other methods
– Applications
• Contrast functions
• Algorithms
Outline
General model
x = As + n
• x: observations
• A: mixing matrix
• s: latent variables, factors, independent components
• n: noise
Principal component analysis
• Find direction(s) w, with ||w|| = 1, where the variance of w^T x is maximized.
• Equivalent to finding the eigenvectors of C = E(xx^T) corresponding to the k largest eigenvalues (x assumed zero-mean)
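The eigenvector formulation can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides: the function name `pca_directions` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_directions(X, k):
    """Return the k leading eigenvectors (columns) and eigenvalues of the sample covariance.

    X holds one observation per row (hypothetical layout chosen for this sketch).
    """
    Xc = X - X.mean(axis=0)                  # center so that C estimates E(xx^T)
    C = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
# Synthetic data with very different variances along the three axes
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.2])

W, lam = pca_directions(X, k=2)
# Variance of the projection w^T x onto the first direction
proj_var = np.var((X - X.mean(axis=0)) @ W[:, 0])
```

The variance of the projection onto the first direction, `proj_var`, coincides with the largest eigenvalue `lam[0]`, which is the sense in which the two formulations are equivalent.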
• Closely related to PCA
• x = As + n
• Method of principal factors:
– Assumes knowledge of covariance matrix of the noise: E(nn^T)
– PCA on: C = E(xx^T) - E(nn^T)
• Factors are not defined uniquely, but only up to a rotation
Factor analysis
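A small sketch of the method of principal factors, assuming a single factor and a noise covariance that is known exactly. The loading matrix `A`, the noise level and the sample size are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20000

A = np.array([[1.0], [2.0], [0.5]])       # hypothetical 3x1 loading (mixing) matrix
noise_cov = 0.25 * np.eye(3)              # assumed known E(nn^T)

s = rng.normal(size=(n_samples, 1))       # one latent factor
n = 0.5 * rng.normal(size=(n_samples, 3)) # noise with covariance 0.25 * I
X = s @ A.T + n                           # x = As + n, one observation per row

C = X.T @ X / n_samples                   # sample E(xx^T); data are zero-mean by construction
eigvals, eigvecs = np.linalg.eigh(C - noise_cov)  # PCA on E(xx^T) - E(nn^T)

top = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
cosine = abs(top @ A[:, 0]) / np.linalg.norm(A[:, 0])
```

After the noise covariance is subtracted, only one eigenvalue is far from zero, and its eigenvector is aligned with the column of `A` up to sign. With several factors the same ambiguity becomes an arbitrary rotation.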
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
Requires assumption that data are not Gaussian
Higher order methods
• Find direction w, such that w^T x has an 'interesting' distribution
• Argued that interesting directions are those that show the least Gaussian distribution
Projection pursuit
Differential entropy
• For y with density f: H(y) = -∫ f(y) log f(y) dy
• For a given variance, maximised when f is a Gaussian density
• Minimize H(w^T x) to find projection pursuit directions (y = w^T x)
• Difficult to estimate the density of w^T x
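The claim that the Gaussian has the largest differential entropy for a fixed variance can be checked against the standard closed-form entropies of a few common densities. The helper names below are hypothetical; the formulas are the textbook ones (natural logarithm, so entropies are in nats).

```python
import math

def entropy_gaussian(var):
    # H = 0.5 * log(2 * pi * e * var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def entropy_uniform(var):
    # Uniform density of width a has variance a^2 / 12 and entropy log(a)
    width = math.sqrt(12 * var)
    return math.log(width)

def entropy_laplace(var):
    # Laplace density with scale b has variance 2 * b^2 and entropy 1 + log(2 * b)
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)

h_gauss = entropy_gaussian(1.0)
h_laplace = entropy_laplace(1.0)
h_uniform = entropy_uniform(1.0)
print(h_gauss, h_laplace, h_uniform)
```

At unit variance the Gaussian value is the largest of the three, so a projection with lower entropy than a Gaussian of the same variance is, in this sense, more 'interesting'.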
• Observe filtered version of s(t):
x(t) = s(t)*g(t)
• Find filter h(t), such that
s(t) = h(t)*x(t)
Blind deconvolution
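The convolution model and the role of the inverse filter can be illustrated as follows. Note that this sketch is not blind: the filter g is assumed known, and h is built from it by hand, purely to show what a blind method would have to estimate from x(t) alone. The filter coefficients and signal are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=200)     # source signal s(t), hypothetical

g = np.array([1.0, 0.5])                 # assumed mixing filter g(t)
x = np.convolve(s, g)[:len(s)]           # observed signal x(t) = s(t) * g(t)

# Truncated inverse of g: the inverse of 1 + 0.5 z^-1 has coefficients (-0.5)^k
h = (-0.5) ** np.arange(20)
s_hat = np.convolve(x, h)[:len(s)]       # estimate of s(t) as h(t) * x(t)

max_error = np.max(np.abs(s_hat - s))
```

Because h is truncated after 20 taps, the recovery is approximate, but the residual error is on the order of 0.5^20.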
Definition 1 (General definition)
ICA of a random vector x consists of finding a linear transformation, s=Wx, so that the components, si, are as independent as possible, in the sense of maximizing some function F(s1,...,sm) that measures independence.
ICA definitions
Definition 2 (Noisy ICA)
ICA of a random vector x consists of estimating the following model for the data:
x = As + n
where the latent variables si are assumed independent
Definition 3 (Noise-free ICA)
x = As
ICA definitions
• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: f(y1,...,ym) = f1(y1) f2(y2) ... fm(ym)
• Uncorrelated: E(yi yj) - E(yi) E(yj) = 0, for i ≠ j
Statistical independence
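A standard example of two variables that are uncorrelated but not independent: y1 uniform on [-1, 1] and y2 = y1^2. The sketch below estimates both the covariance and a non-linear cross-moment from samples. The variable names are chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1.0, 1.0, size=100_000)
y2 = y1 ** 2                              # fully determined by y1, hence dependent

# Uncorrelatedness: E(y1 y2) - E(y1) E(y2) should be close to zero
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)

# Independence would also require E(g(y1) h(y2)) = E(g(y1)) E(h(y2)) for other functions;
# with g(u) = u^2 and h(u) = u the two sides differ clearly
dep = np.mean(y1 ** 2 * y2) - np.mean(y1 ** 2) * np.mean(y2)

print(cov, dep)
```

The covariance is close to zero while the non-linear cross-moment is not (its population value is 1/5 - 1/9). Second-order statistics therefore cannot detect this dependence.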
• All the independent components, with the possible exception of one, must be non-Gaussian
• The number of observed mixtures, m, must be at least as large as the number of independent components, n: m >= n
• The matrix A must be of full column rank
Note: with m < n, A may still be identifiable
Identifiability of ICA model
• Redundancy reduction
• Noise free case
– Find 'interesting' projections
– Special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA
Relations to other methods
• Blind source separation
– Cocktail party problem
• Feature extraction
• Blind deconvolution
Applications of ICA
ICA method = Objective function + Optimization algorithm
Objective (contrast) functions
• Multi-unit contrast functions
– Find all independent components
• One-unit contrast functions
– Find one independent component (at a time)
Mutual information
• I(y1,...,ym) = Σ_i H(yi) - H(y)
• Mutual information is always non-negative, and zero if and only if the yi are independent
• Difficult to estimate, approximations exist
• Find one vector, w, so that wTx equals one of the independent components, si
• Related to projection pursuit
• Prior knowledge of number of independent components not needed
One-unit contrast functions
• Difference between the differential entropy of a Gaussian variable with the same variance as y and the differential entropy of y: J(y) = H(y_gauss) - H(y)
Negentropy
• If the yi are uncorrelated, the mutual information can be expressed as I(y1,...,ym) = J(y) - Σ_i J(yi)
• J(y) can be approximated by higher-order cumulants, but estimation is sensitive to outliers
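One classical cumulant-based approximation, for a zero-mean, unit-variance y, is J(y) ≈ (1/12) E(y^3)^2 + (1/48) kurt(y)^2, with kurt(y) = E(y^4) - 3. The sketch below applies it to samples from three distributions. The function name `negentropy_cumulant` is a hypothetical helper, and the sample sizes are arbitrary.

```python
import numpy as np

def negentropy_cumulant(y):
    """Cumulant-based negentropy approximation for a 1-D sample (standardized internally)."""
    y = (y - y.mean()) / y.std()
    skew_term = np.mean(y ** 3) ** 2 / 12.0
    kurt = np.mean(y ** 4) - 3.0          # kurtosis of a standardized variable
    return skew_term + kurt ** 2 / 48.0

rng = np.random.default_rng(0)
n = 100_000
J_gauss = negentropy_cumulant(rng.normal(size=n))
J_uniform = negentropy_cumulant(rng.uniform(-1.0, 1.0, size=n))
J_laplace = negentropy_cumulant(rng.laplace(size=n))

print(J_gauss, J_uniform, J_laplace)
```

The estimate is near zero for the Gaussian sample and clearly positive for the other two. Because the estimate depends on fourth powers of the data, a few extreme samples can change it considerably, which is the outlier sensitivity noted above.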
• Have x=As, want to find s=Wx
• Preprocessing
– Centering of x
– Sphering (whitening) of x
• Find transformation v = Qx such that E(vv^T) = I
• Found via PCA / SVD
• Sphering does not solve problem alone
Algorithms
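The preprocessing steps can be sketched with an eigendecomposition of the sample covariance. This is a minimal illustration under assumed inputs: the function name `whiten`, the mixing matrix and the rotation angle are all made up for the example.

```python
import numpy as np

def whiten(X):
    """Center and sphere X (one observation per row); return V and Q with v = Qx."""
    Xc = X - X.mean(axis=0)                       # centering
    C = Xc.T @ Xc / Xc.shape[0]                   # sample covariance
    d, E = np.linalg.eigh(C)                      # C = E diag(d) E^T
    Q = np.diag(1.0 / np.sqrt(d)) @ E.T           # Q = D^(-1/2) E^T
    return Xc @ Q.T, Q

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(5000, 2))        # independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])            # hypothetical mixing matrix
X = S @ A.T                                       # x = As for each row

V, Q = whiten(X)
cov_V = V.T @ V / V.shape[0]                      # should be the identity

# Any rotation of the whitened data is still white
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
cov_rot = (V @ R.T).T @ (V @ R.T) / V.shape[0]
```

Both `cov_V` and `cov_rot` equal the identity up to rounding, which shows why sphering alone does not solve the problem: it leaves an orthogonal transformation undetermined, and that remaining rotation is what the ICA algorithm has to find.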
• Jutten-Herault
– Cancel non-linear cross-correlations
– Non-diagonal terms of W are updated according to ΔW_ij ∝ g1(yi) g2(yj), for i ≠ j, where g1 and g2 are odd non-linear functions
– The yi are updated iteratively as y = (I+W)^{-1} x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, ..., etc.
Algorithms (2)
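As an illustration of a one-unit algorithm, here is a minimal sketch of the kurtosis-based fixed-point iteration associated with FastICA, w ← E(v (w^T v)^3) - 3w followed by normalization, run on whitened data v. The slides only name the algorithm, so the sources, mixing matrix, iteration count and tolerance below are assumptions for the example rather than a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5000

# Two independent, unit-variance, non-Gaussian (uniform) sources
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_samples, 2))
A = np.array([[1.0, 0.5], [0.3, 1.0]])    # hypothetical mixing matrix
X = S @ A.T                               # x = As, one observation per row

# Preprocessing: centering and whitening
Xc = X - X.mean(axis=0)
d, E = np.linalg.eigh(Xc.T @ Xc / n_samples)
V = Xc @ (E / np.sqrt(d))                 # V = Xc E D^(-1/2), so cov(V) = I

# One-unit fixed-point iteration on the kurtosis contrast
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    y = V @ w
    w_new = (V * (y ** 3)[:, None]).mean(axis=0) - 3.0 * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(abs(w_new @ w) - 1.0) < 1e-10   # same direction up to sign
    w = w_new
    if converged:
        break

y = V @ w                                 # estimate of one independent component
corr = [abs(np.corrcoef(y, S[:, i])[0, 1]) for i in range(2)]
```

The recovered y should be strongly correlated (in absolute value) with exactly one of the two sources. Its sign and which source is found are not determined, in line with the ambiguities of the ICA model.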
• Definitions of ICA
• Conditions for identifiability of model
• Relations to other methods
• Contrast functions
– One-unit / multi-unit
– Mutual information / Negentropy
• Applications of ICA
• Algorithms
Summary