Principal Component Analysis and Independent Component Analysis in Neural Networks


Transcript of Principal Component Analysis and Independent Component Analysis in Neural Networks

  • Principal Component Analysis and Independent Component Analysis in Neural Networks. David Gleich. CS 152 – Neural Networks. 6 November 2003.

  • TLAs: TLA – Three Letter Acronym; PCA – Principal Component Analysis; ICA – Independent Component Analysis; SVD – Singular-Value Decomposition.

  • Outline: Principal Component Analysis (Introduction, Linear Algebra Approach, Neural Network Implementation); Independent Component Analysis (Introduction, Demos, Neural Network Implementations); References; Questions.

  • Principal Component Analysis. PCA identifies an m-dimensional explanation of n-dimensional data, where m < n. It originated as a statistical analysis technique. PCA attempts to minimize the reconstruction error under two restrictions: linear reconstruction and orthogonal factors. Equivalently, PCA attempts to maximize variance (proof coming).

  • PCA Applications: Dimensionality reduction (reduce a problem from n to m dimensions, with m < n).
  • PCA Example

  • PCA Example

  • PCA Example

  • Minimum Reconstruction Error ⇒ Maximum Variance. Proof from Diamantaras and Kung. Take a random vector x = [x1, x2, ..., xn]^T with E{x} = 0, i.e. zero mean. Form the covariance matrix R_x = E{x x^T}. Let y = Wx be an orthogonal, linear transformation of the data, so that W W^T = I. Reconstruct the data through W^T.

    Minimize the reconstruction error e = E{||x − W^T y||^2}.

  • Minimum Reconstruction Error ⇒ Maximum Variance. Expanding the error gives e = tr(R_x) − tr(W R_x W^T); since tr(R_x) is fixed, minimizing the error maximizes tr(W R_x W^T), and tr(W R_x W^T) is the variance of y.

  • PCA: Linear Algebra. Theorem: the minimum reconstruction error and maximum variance are achieved using W = [e1, e2, ..., em]^T, where e_i is the ith eigenvector of R_x with eigenvalue λ_i and the eigenvalues are sorted in descending order. Note that W is orthogonal.
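
    A quick numerical check of this theorem (an illustrative sketch, not from the slides): build zero-mean data, compute R_x, and confirm that no random unit direction yields more projection variance than the dominant eigenvector e1.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic data
    X -= X.mean(axis=0)                                        # zero mean

    R = X.T @ X / len(X)                   # sample covariance R_x
    vals, vecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    e1 = vecs[:, -1]                       # dominant eigenvector

    var_e1 = np.var(X @ e1)                # variance of the projection onto e1
    for _ in range(5):                     # no random direction beats e1
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)
        assert np.var(X @ d) <= var_e1 + 1e-9
    print(var_e1, vals[-1])                # these two numbers agree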

  • PCA with Linear Algebra. Given m signals of length n, construct the m × n data matrix X whose rows are the signals.

    Then subtract the mean from each signal and compute the covariance matrix C = X X^T.

  • PCA with Linear Algebra. Use the singular-value decomposition to find the eigenvalues and eigenvectors of C: U S V^T = C. Since C is symmetric, U = V, and U = [e1, e2, ..., em], where each eigenvector e_i is a principal component of the data.
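
    As a concrete illustration of this recipe, here is a minimal NumPy sketch (the function name pca and the toy signals are my own, not from the slides); it keeps the slides' convention of an m × n data matrix whose rows are signals.

    import numpy as np

    def pca(X, num_components):
        """PCA via the SVD of the covariance matrix; X is m signals x n samples."""
        Xc = X - X.mean(axis=1, keepdims=True)   # subtract each signal's mean
        C = Xc @ Xc.T                            # m x m covariance matrix C = X X^T
        U, S, Vt = np.linalg.svd(C)              # C is symmetric, so U holds its eigenvectors
        W = U[:, :num_components].T              # rows of W are the principal directions
        return W @ Xc, W                         # projected data and the components

    # toy usage: three correlated signals reduced to two components
    rng = np.random.default_rng(0)
    src = rng.normal(size=(2, 500))
    X = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.7]]) @ src
    Y, W = pca(X, 2)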

  • PCA with Neural Networks. Most PCA neural networks use some form of Hebbian learning: adjust the strength of the connection between units A and B in proportion to the product of their simultaneous activations, w_{k+1} = w_k + β_k y_k x_k, where y_k = w_k^T x_k. Applied directly, this equation is unstable: ||w_k||_2 → ∞ as k → ∞. Important note: neural PCA algorithms are unsupervised.

  • PCA with Neural Networks. Simplest fix: normalization. w_{k+1} = w_k + β_k y_k x_k, followed by w_{k+1} ← w_{k+1} / ||w_{k+1}||_2. This update is equivalent to a power method for computing the dominant eigenvector, and as k → ∞, w_k → e1.
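
    A minimal sketch of the normalized Hebbian update (my own illustration; the learning rate beta and the sample-cycling scheme are arbitrary choices), assuming zero-mean data:

    import numpy as np

    def normalized_hebbian(X, steps=2000, beta=0.01, seed=0):
        """w_{k+1} = w_k + beta_k y_k x_k, renormalized; X is m signals x n samples."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=X.shape[0])
        w /= np.linalg.norm(w)
        for k in range(steps):
            x = X[:, k % X.shape[1]]      # present one sample
            y = w @ x                     # unit output y_k = w_k^T x_k
            w = w + beta * y * x          # Hebbian update
            w /= np.linalg.norm(w)        # normalization keeps ||w|| = 1
        return w                          # approaches e1 (up to sign)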

  • PCA with Neural Networks. Another fix: Oja's rule, proposed in 1982 by Oja and Karhunen.

    w_{k+1} = w_k + β_k (y_k x_k − y_k^2 w_k). This is a linearized version of the normalized Hebbian rule. Convergence: as k → ∞, w_k → e1.
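
    For comparison, the same loop with Oja's rule needs no explicit normalization (again an illustrative sketch; the step size eta is an assumed choice):

    import numpy as np

    def oja_rule(X, steps=2000, eta=0.01, seed=0):
        """Oja's rule: w_{k+1} = w_k + eta * (y_k x_k - y_k^2 w_k)."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=X.shape[0])
        w /= np.linalg.norm(w)
        for k in range(steps):
            x = X[:, k % X.shape[1]]
            y = w @ x
            w = w + eta * (y * x - (y ** 2) * w)  # the -y^2 w term replaces normalization
        return w                                  # tends toward e1 (up to sign)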

  • PCA with Neural Networks: Subspace Model, APEX, and multi-layer auto-associative networks.

  • PCA with Neural Networks. Subspace Model: a multi-component extension of Oja's rule.

    ΔW_k = β_k (y_k x_k^T − y_k y_k^T W_k), where y_k = W_k x_k. Eventually W spans the same subspace as the top m principal eigenvectors. This method does not extract the exact eigenvectors.
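
    A sketch of the subspace update as written above (my own code; the learning rate and iteration count are arbitrary), assuming zero-mean data:

    import numpy as np

    def subspace_rule(X, m, steps=5000, eta=0.005, seed=0):
        """Delta W_k = eta * (y_k x_k^T - y_k y_k^T W_k), with y_k = W_k x_k."""
        rng = np.random.default_rng(seed)
        W = 0.1 * rng.normal(size=(m, X.shape[0]))
        for k in range(steps):
            x = X[:, k % X.shape[1]]
            y = W @ x
            W = W + eta * (np.outer(y, x) - np.outer(y, y) @ W)
        return W   # rows span the top-m principal subspace, not the individual eigenvectors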

  • PCA with Neural Networks. APEX Model: Kung and Diamantaras.

    y = Wx − Cy, so y = (I + C)^{-1} W x ≈ (I − C) W x.

  • PCA with Neural Networks. APEX Learning.

    Properties of the APEX model: it extracts the exact principal components; its updates are local (w_ab depends only on x_a, x_b, and w_ab); and −Cy acts as an orthogonalization term.

  • PCA with Neural Networks. Multi-layer networks: bottlenecks.

    Train using auto-associative outputs: the input x passes through W_L to an m-unit hidden layer and back through W_R to the output y, and the error is e = x − y. After training, W_L spans the subspace of the first m principal eigenvectors.
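
    A minimal sketch of training such a bottleneck network by gradient descent on the squared error (my own illustration; the slides do not spell out the training procedure beyond the auto-associative target):

    import numpy as np

    def train_bottleneck(X, m, epochs=500, eta=1e-3, seed=0):
        """Linear auto-associator: x -> W_L -> m hidden units -> W_R -> y, target y = x."""
        rng = np.random.default_rng(seed)
        n, N = X.shape
        W_L = 0.1 * rng.normal(size=(m, n))
        W_R = 0.1 * rng.normal(size=(n, m))
        for _ in range(epochs):
            H = W_L @ X                        # hidden code
            Y = W_R @ H                        # reconstruction
            E = X - Y                          # error e = x - y
            W_R += eta * E @ H.T / N           # gradient step for the decoder
            W_L += eta * W_R.T @ E @ X.T / N   # gradient step for the encoder
        return W_L, W_R                        # rows of W_L span the principal subspace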

  • Outline: Principal Component Analysis (Introduction, Linear Algebra Approach, Neural Network Implementation); Independent Component Analysis (Introduction, Demos, Neural Network Implementations); References; Questions.

  • Independent Component Analysis. Also known as Blind Source Separation. Proposed for neuromimetic hardware in 1983 by Herault and Jutten. ICA seeks components that are independent in the statistical sense. Two variables x, y are statistically independent iff P(x ∩ y) = P(x)P(y). Equivalently, E{g(x)h(y)} − E{g(x)}E{h(y)} = 0, where g and h are any functions.
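
    To make the E{g(x)h(y)} condition concrete, here is a small numerical check (my own example): x and y_dep below are uncorrelated, so plain correlation misses the dependence, but a nonlinear choice of g and h exposes it.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)
    y_indep = rng.normal(size=100_000)               # independent of x
    y_dep = x ** 2 + 0.1 * rng.normal(size=100_000)  # dependent on x, yet uncorrelated with it

    def cross_stat(g, h, x, y):
        """Sample estimate of E{g(x)h(y)} - E{g(x)}E{h(y)}."""
        return np.mean(g(x) * h(y)) - np.mean(g(x)) * np.mean(h(y))

    print(cross_stat(np.square, np.abs, x, y_indep))  # ~0: consistent with independence
    print(cross_stat(np.square, np.abs, x, y_dep))    # clearly nonzero: dependence revealed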

  • Statistical Independence. In other words, if we know something about x, that should tell us nothing about y.

  • Statistical Independence. In other words, if we know something about x, that should tell us nothing about y.

    (Scatter plots: dependent data vs. independent data.)

  • Independent Component Analysis. Given m signals of length n, construct the m × n data matrix X whose rows are the signals.

    We assume that X consists of m sources such that X = AS, where A is an unknown m × m mixing matrix and S holds the m independent sources.

  • Independent Component Analysis. ICA seeks to determine a matrix W such that Y = WX, where W is an m × m matrix and Y is the set of independent source signals, i.e. the independent components. If W ≈ A^{-1}, then Y = WX = A^{-1}AS = S. Note that the components need not be orthogonal, but the reconstruction is still linear.
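
    A tiny demonstration of the model (my own toy example): if A were known, W = A^{-1} would recover the sources exactly; ICA's job is to find such a W from X alone.

    import numpy as np

    rng = np.random.default_rng(0)
    S = np.vstack([np.sign(rng.normal(size=1000)),   # two independent sources
                   rng.uniform(-1, 1, size=1000)])
    A = np.array([[1.0, 0.6],
                  [0.4, 1.0]])                       # "unknown" mixing matrix
    X = A @ S                                        # observed mixtures

    W = np.linalg.inv(A)                             # ideal unmixing matrix W = A^-1
    Y = W @ X                                        # Y = A^-1 A S = S
    print(np.allclose(Y, S))                         # True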

  • ICA Example

  • ICA Example

  • PCA on this data?

  • Classic ICA Problem: the cocktail party. How to isolate a single conversation amidst the noisy environment? http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html (Diagram: Source 1 and Source 2 picked up by Mic 1 and Mic 2.)

  • More ICA Examples

  • More ICA Examples

  • Notes on ICA. ICA cannot perfectly reconstruct the original signals. If X = AS, then (1) since AS = (AM^{-1})(MS) for any invertible diagonal scaling M, we lose scale, and (2) since AS = (AP^{-1})(PS) for any permutation P, we lose order. Thus we can reconstruct only up to scale and order. The examples were done with FastICA, a non-neural, fixed-point algorithm.
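
    The demos used the FastICA package linked in the references; as a readily available stand-in, scikit-learn also ships a FastICA implementation, and a minimal usage sketch (with my own toy sources and mixing matrix) looks like this:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]  # two independent sources, n samples x 2
    A = np.array([[1.0, 0.5],
                  [0.7, 1.0]])                        # mixing matrix
    X = S @ A.T                                       # observed mixtures

    ica = FastICA(n_components=2, random_state=0)
    Y = ica.fit_transform(X)   # estimated sources: they match S only up to scale and order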

  • Neural ICA. ICA is typically posed as an optimization problem. Many iterative solutions to optimization problems can be cast into a neural network.

  • Feed-Forward Neural ICA. General network structure (diagram: x → B → y → Q → reconstruction of x).

    Learn B such that y = Bx has independent components. Learn Q to minimize the mean-squared reconstruction error.

  • Neural ICA. Herault-Jutten: local updates. B = (I + S)^{-1}, with S_{k+1} = S_k + β_k g(y_k) h(y_k^T); for example g(t) = t and h(t) = t^3, or g = hardlim and h = tansig. Bell and Sejnowski: information theory. B_{k+1} = B_k + β_k [B_k^{-T} + z_k x_k^T], where z(i) = (∂/∂u(i)) (∂u(i)/∂y(i)), u = f(Bx), and f = tansig, etc.
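
    A sketch of an information-theoretic update in the Bell and Sejnowski style, using the commonly cited form with a tanh nonlinearity, for which the score term reduces to z = −2 tanh(Bx) (my own simplification; the slides' choices of g, h, and f may differ):

    import numpy as np

    def infomax_ica(X, epochs=50, eta=0.001, seed=0):
        """B_{k+1} = B_k + eta * (B_k^{-T} + z_k x_k^T) with z_k = -2 tanh(B_k x_k)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        B = np.eye(m) + 0.01 * rng.normal(size=(m, m))
        for _ in range(epochs):
            for k in range(n):
                x = X[:, k:k + 1]                    # column vector
                z = -2.0 * np.tanh(B @ x)            # score term for tanh
                B = B + eta * (np.linalg.inv(B.T) + z @ x.T)
        return B   # rows of B separate the sources, up to scale and order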

  • Recurrent Neural ICA. Amari: fully recurrent neural network with self-inhibitory connections.

  • References
    Diamantaras, K. I., and S. Y. Kung. Principal Component Neural Networks.
    Comon, P. Independent Component Analysis, a new concept? Signal Processing, vol. 36, pp. 287–314, 1994.
    FastICA, http://www.cis.hut.fi/projects/ica/fastica/
    Oursland, A., J. D. Paula, and N. Mahmood. Case Studies in Independent Component Analysis.
    Weingessel, A. An Analysis of Learning Algorithms in PCA and SVD Neural Networks.
    Karhunen, J. Neural Approaches to Independent Component Analysis.
    Amari, S., A. Cichocki, and H. H. Yang. Recurrent Neural Networks for Blind Separation of Sources.

  • Questions?