Principal Component Analysis and Independent Component Analysis in Neural Networks


Transcript of Principal Component Analysis and Independent Component Analysis in Neural Networks

  • Principal Component Analysis and Independent Component Analysis in Neural Networks. David Gleich. CS 152 – Neural Networks. 6 November 2003.

  • TLAs: TLA – Three Letter Acronym; PCA – Principal Component Analysis; ICA – Independent Component Analysis; SVD – Singular-Value Decomposition.

  • Outline: Principal Component Analysis (Introduction, Linear Algebra Approach, Neural Network Implementation); Independent Component Analysis (Introduction, Demos, Neural Network Implementations); References; Questions.

  • Principal Component Analysis. PCA identifies an m-dimensional explanation of n-dimensional data, where m < n. It originated as a statistical analysis technique. PCA attempts to minimize the reconstruction error under two restrictions: linear reconstruction and orthogonal factors. Equivalently, PCA attempts to maximize variance (proof coming).

  • PCA Applications: Dimensionality reduction (reduce a problem from n to m dimensions, with m < n).
  • PCA Example

  • PCA Example

  • PCA Example

  • Minimum Reconstruction Error ⇒ Maximum Variance. Proof from Diamantaras and Kung. Take a random vector x = [x1, x2, ..., xn]^T with E{x} = 0, i.e. zero mean. Form the covariance matrix R_x = E{x x^T}. Let y = Wx be an orthogonal, linear transformation of the data, so that W W^T = I. Reconstruct the data through W^T.

    Minimize the reconstruction error e = E{||x − W^T y||^2}.

  • Minimum Reconstruction Error ⇒ Maximum Variance. Expanding the error gives e = tr(R_x) − tr(W R_x W^T); since tr(R_x) is fixed, minimizing the error maximizes tr(W R_x W^T), and tr(W R_x W^T) is the variance of y.

  • PCA: Linear Algebra. Theorem: the minimum reconstruction error and maximum variance are achieved using W = [e1, e2, ..., em]^T, where e_i is the ith eigenvector of R_x with eigenvalue λ_i and the eigenvalues are sorted in descending order. Note that W is orthogonal.
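
    A quick numerical check of this theorem (an illustrative sketch, not from the slides): build zero-mean data, compute R_x, and confirm that no random unit direction yields more projection variance than the dominant eigenvector e1.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic data
    X -= X.mean(axis=0)                                        # zero mean

    R = X.T @ X / len(X)                   # sample covariance R_x
    vals, vecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    e1 = vecs[:, -1]                       # dominant eigenvector

    var_e1 = np.var(X @ e1)                # variance of the projection onto e1
    for _ in range(5):                     # no random direction beats e1
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)
        assert np.var(X @ d) <= var_e1 + 1e-9
    print(var_e1, vals[-1])                # these two numbers agree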

  • PCA with Linear Algebra. Given m signals of length n, construct the m × n data matrix X whose rows are the signals.

    Then subtract the mean from each signal and compute the covariance matrix C = X X^T.

  • PCA with Linear Algebra. Use the singular-value decomposition to find the eigenvalues and eigenvectors of C: U S V^T = C. Since C is symmetric, U = V, and U = [e1, e2, ..., em], where each eigenvector e_i is a principal component of the data.
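
    As a concrete illustration of this recipe, here is a minimal NumPy sketch (the function name pca and the toy signals are my own, not from the slides); it keeps the slides' convention of an m × n data matrix whose rows are signals.

    import numpy as np

    def pca(X, num_components):
        """PCA via the SVD of the covariance matrix; X is m signals x n samples."""
        Xc = X - X.mean(axis=1, keepdims=True)   # subtract each signal's mean
        C = Xc @ Xc.T                            # m x m covariance matrix C = X X^T
        U, S, Vt = np.linalg.svd(C)              # C is symmetric, so U holds its eigenvectors
        W = U[:, :num_components].T              # rows of W are the principal directions
        return W @ Xc, W                         # projected data and the components

    # toy usage: three correlated signals reduced to two components
    rng = np.random.default_rng(0)
    src = rng.normal(size=(2, 500))
    X = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.7]]) @ src
    Y, W = pca(X, 2)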

  • PCA with Neural Networks. Most PCA neural networks use some form of Hebbian learning: adjust the strength of the connection between units A and B in proportion to the product of their simultaneous activations, w_{k+1} = w_k + β_k y_k x_k, where y_k = w_k^T x_k. Applied directly, this equation is unstable: ||w_k||_2 → ∞ as k → ∞. Important note: neural PCA algorithms are unsupervised.

  • PCA with Neural Networks. Simplest fix: normalization. w_{k+1} = w_k + β_k y_k x_k, followed by w_{k+1} ← w_{k+1} / ||w_{k+1}||_2. This update is equivalent to a power method for computing the dominant eigenvector, and as k → ∞, w_k → e1.
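
    A minimal sketch of the normalized Hebbian update (my own illustration; the learning rate beta and the sample-cycling scheme are arbitrary choices), assuming zero-mean data:

    import numpy as np

    def normalized_hebbian(X, steps=2000, beta=0.01, seed=0):
        """w_{k+1} = w_k + beta_k y_k x_k, renormalized; X is m signals x n samples."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=X.shape[0])
        w /= np.linalg.norm(w)
        for k in range(steps):
            x = X[:, k % X.shape[1]]      # present one sample
            y = w @ x                     # unit output y_k = w_k^T x_k
            w = w + beta * y * x          # Hebbian update
            w /= np.linalg.norm(w)        # normalization keeps ||w|| = 1
        return w                          # approaches e1 (up to sign)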

  • PCA with Neural Networks. Another fix: Oja's rule, proposed in 1982 by Oja and Karhunen.

    w_{k+1} = w_k + β_k (y_k x_k − y_k^2 w_k). This is a linearized version of the normalized Hebbian rule. Convergence: as k → ∞, w_k → e1.
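
    For comparison, the same loop with Oja's rule needs no explicit normalization (again an illustrative sketch; the step size eta is an assumed choice):

    import numpy as np

    def oja_rule(X, steps=2000, eta=0.01, seed=0):
        """Oja's rule: w_{k+1} = w_k + eta * (y_k x_k - y_k^2 w_k)."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=X.shape[0])
        w /= np.linalg.norm(w)
        for k in range(steps):
            x = X[:, k % X.shape[1]]
            y = w @ x
            w = w + eta * (y * x - (y ** 2) * w)  # the -y^2 w term replaces normalization
        return w                                  # tends toward e1 (up to sign)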

  • PCA with Neural Networks: Subspace Model, APEX, and multi-layer auto-associative networks.

  • PCA with Neural Networks. Subspace Model: a multi-component extension of Oja's rule.

    ΔW_k = β_k (y_k x_k^T − y_k y_k^T W_k), where y_k = W_k x_k. Eventually W spans the same subspace as the top m principal eigenvectors. This method does not extract the exact eigenvectors.
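
    A sketch of the subspace update as written above (my own code; the learning rate and iteration count are arbitrary), assuming zero-mean data:

    import numpy as np

    def subspace_rule(X, m, steps=5000, eta=0.005, seed=0):
        """Delta W_k = eta * (y_k x_k^T - y_k y_k^T W_k), with y_k = W_k x_k."""
        rng = np.random.default_rng(seed)
        W = 0.1 * rng.normal(size=(m, X.shape[0]))
        for k in range(steps):
            x = X[:, k % X.shape[1]]
            y = W @ x
            W = W + eta * (np.outer(y, x) - np.outer(y, y) @ W)
        return W   # rows span the top-m principal subspace, not the individual eigenvectors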

  • PCA with Neural Networks. APEX Model: Kung and Diamantaras.

    y = Wx − Cy, so y = (I + C)^{-1} W x ≈ (I − C) W x.

  • PCA with Neural Networks. APEX Learning.

    Properties of the APEX model: it extracts the exact principal components; its updates are local (w_ab depends only on x_a, x_b, and w_ab); and −Cy acts as an orthogonalization term.

  • PCA with Neural Networks. Multi-layer networks: bottlenecks.

    Train using auto-associative outputs: the input x passes through W_L to an m-unit hidden layer and back through W_R to the output y, and the error is e = x − y. After training, W_L spans the subspace of the first m principal eigenvectors.
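
    A minimal sketch of training such a bottleneck network by gradient descent on the squared error (my own illustration; the slides do not spell out the training procedure beyond the auto-associative target):

    import numpy as np

    def train_bottleneck(X, m, epochs=500, eta=1e-3, seed=0):
        """Linear auto-associator: x -> W_L -> m hidden units -> W_R -> y, target y = x."""
        rng = np.random.default_rng(seed)
        n, N = X.shape
        W_L = 0.1 * rng.normal(size=(m, n))
        W_R = 0.1 * rng.normal(size=(n, m))
        for _ in range(epochs):
            H = W_L @ X                        # hidden code
            Y = W_R @ H                        # reconstruction
            E = X - Y                          # error e = x - y
            W_R += eta * E @ H.T / N           # gradient step for the decoder
            W_L += eta * W_R.T @ E @ X.T / N   # gradient step for the encoder
        return W_L, W_R                        # rows of W_L span the principal subspace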

  • Outline: Principal Component Analysis (Introduction, Linear Algebra Approach, Neural Network Implementation); Independent Component Analysis (Introduction, Demos, Neural Network Implementations); References; Questions.

  • Independent Component Analysis. Also known as Blind Source Separation. Proposed for neuromimetic hardware in 1983 by Herault and Jutten. ICA seeks components that are independent in the statistical sense. Two variables x, y are statistically independent iff P(x ∩ y) = P(x)P(y). Equivalently, E{g(x)h(y)} − E{g(x)}E{h(y)} = 0, where g and h are any functions.
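
    To make the E{g(x)h(y)} condition concrete, here is a small numerical check (my own example): x and y_dep below are uncorrelated, so plain correlation misses the dependence, but a nonlinear choice of g and h exposes it.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)
    y_indep = rng.normal(size=100_000)               # independent of x
    y_dep = x ** 2 + 0.1 * rng.normal(size=100_000)  # dependent on x, yet uncorrelated with it

    def cross_stat(g, h, x, y):
        """Sample estimate of E{g(x)h(y)} - E{g(x)}E{h(y)}."""
        return np.mean(g(x) * h(y)) - np.mean(g(x)) * np.mean(h(y))

    print(cross_stat(np.square, np.abs, x, y_indep))  # ~0: consistent with independence
    print(cross_stat(np.square, np.abs, x, y_dep))    # clearly nonzero: dependence revealed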

  • Statistical Independence. In other words, if we know something about x, that should tell us nothing about y.

  • Statistical Independence. In other words, if we know something about x, that should tell us nothing about y.

    (Scatter plots: dependent data vs. independent data.)

  • Independent Component Analysis. Given m signals of length n, construct the m × n data matrix X whose rows are the signals.

    We assume that X consists of m sources such that X = AS, where A is an unknown m × m mixing matrix and S holds the m independent sources.

  • Independent Component Analysis. ICA seeks to determine a matrix W such that Y = WX, where W is an m × m matrix and Y is the set of independent source signals, i.e. the independent components. If W ≈ A^{-1}, then Y = WX = A^{-1}AS = S. Note that the components need not be orthogonal, but the reconstruction is still linear.
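
    A tiny demonstration of the model (my own toy example): if A were known, W = A^{-1} would recover the sources exactly; ICA's job is to find such a W from X alone.

    import numpy as np

    rng = np.random.default_rng(0)
    S = np.vstack([np.sign(rng.normal(size=1000)),   # two independent sources
                   rng.uniform(-1, 1, size=1000)])
    A = np.array([[1.0, 0.6],
                  [0.4, 1.0]])                       # "unknown" mixing matrix
    X = A @ S                                        # observed mixtures

    W = np.linalg.inv(A)                             # ideal unmixing matrix W = A^-1
    Y = W @ X                                        # Y = A^-1 A S = S
    print(np.allclose(Y, S))                         # True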

  • ICA Example

  • ICA Example

  • PCA on this data?

  • Classic ICA Problem: the cocktail party. How to isolate a single conversation amidst the noisy environment? http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html (Diagram: Source 1 and Source 2 picked up by Mic 1 and Mic 2.)

  • More ICA Examples

  • More ICA Examples

  • Notes on ICA. ICA cannot perfectly reconstruct the original signals. If X = AS, then (1) since AS = (AM^{-1})(MS) for any invertible diagonal scaling M, we lose scale, and (2) since AS = (AP^{-1})(PS) for any permutation P, we lose order. Thus we can reconstruct only up to scale and order. The examples were done with FastICA, a non-neural, fixed-point algorithm.
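
    The demos used the FastICA package linked in the references; as a readily available stand-in, scikit-learn also ships a FastICA implementation, and a minimal usage sketch (with my own toy sources and mixing matrix) looks like this:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]  # two independent sources, n samples x 2
    A = np.array([[1.0, 0.5],
                  [0.7, 1.0]])                        # mixing matrix
    X = S @ A.T                                       # observed mixtures

    ica = FastICA(n_components=2, random_state=0)
    Y = ica.fit_transform(X)   # estimated sources: they match S only up to scale and order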

  • Neural ICA. ICA is typically posed as an optimization problem. Many iterative solutions to optimization problems can be cast into a neural network.

  • Feed-Forward Neural ICA. General network structure (diagram: x → B → y → Q → reconstruction of x).

    Learn B such that y = Bx has independent components. Learn Q to minimize the mean-squared reconstruction error.

  • Neural ICA. Herault-Jutten: local updates. B = (I + S)^{-1}, with S_{k+1} = S_k + β_k g(y_k) h(y_k^T); for example g(t) = t and h(t) = t^3, or g = hardlim and h = tansig. Bell and Sejnowski: information theory. B_{k+1} = B_k + β_k [B_k^{-T} + z_k x_k^T], where z(i) = (∂/∂u(i)) (∂u(i)/∂y(i)), u = f(Bx), and f = tansig, etc.
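
    A sketch of an information-theoretic update in the Bell and Sejnowski style, using the commonly cited form with a tanh nonlinearity, for which the score term reduces to z = −2 tanh(Bx) (my own simplification; the slides' choices of g, h, and f may differ):

    import numpy as np

    def infomax_ica(X, epochs=50, eta=0.001, seed=0):
        """B_{k+1} = B_k + eta * (B_k^{-T} + z_k x_k^T) with z_k = -2 tanh(B_k x_k)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        B = np.eye(m) + 0.01 * rng.normal(size=(m, m))
        for _ in range(epochs):
            for k in range(n):
                x = X[:, k:k + 1]                    # column vector
                z = -2.0 * np.tanh(B @ x)            # score term for tanh
                B = B + eta * (np.linalg.inv(B.T) + z @ x.T)
        return B   # rows of B separate the sources, up to scale and order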

  • Recurrent Neural ICA. Amari: fully recurrent neural network with self-inhibitory connections.

  • References
    Diamantaras, K. I., and S. Y. Kung. Principal Component Neural Networks.
    Comon, P. Independent Component Analysis, a new concept? Signal Processing, vol. 36, pp. 287–314, 1994.
    FastICA, http://www.cis.hut.fi/projects/ica/fastica/
    Oursland, A., J. D. Paula, and N. Mahmood. Case Studies in Independent Component Analysis.
    Weingessel, A. An Analysis of Learning Algorithms in PCA and SVD Neural Networks.
    Karhunen, J. Neural Approaches to Independent Component Analysis.
    Amari, S., A. Cichocki, and H. H. Yang. Recurrent Neural Networks for Blind Separation of Sources.

  • Questions?