Principal Component Analysis (PCA)
COMP61021 Modelling and Visualization of High Dimensional Data
Additional reading can be found in the non-assessed exercises (week 8) on this course unit's teaching page.
Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2]
Outline
• Introduction
• Principle
• Algorithms
• Exemplar Applications
• Relevant Issues
• Conclusion
Introduction
• Principal component analysis (PCA)
– A method for high-dimensional data analysis via redundancy reduction
– Identifies an “optimal” low-dimensional linear projection: the one with maximum data variance in the new space
– Useful for data visualization, compression and feature extraction

PCA finds a new “coordinate system” of maximum data variance. Projection onto the principal axes leads to a new low-dimensional representation.
Principle
• Finding the 1st principal component
Given a data set of N data points in a d-dimensional space, $X = \{x_1, \cdots, x_N\}$
Principle
• Finding the 1st principal component (cont.)
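In the standard formulation (cf. Ch. 12 in [2]), with the sample mean and covariance

$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T,$$

the first principal component $u_1$ maximizes the projected variance $u_1^T S u_1$ subject to $u_1^T u_1 = 1$. A Lagrange-multiplier argument turns this into the eigenvalue problem $S u_1 = \lambda_1 u_1$, so $u_1$ is the eigenvector of $S$ with the largest eigenvalue, and the variance attained is exactly $\lambda_1$.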
Principle
• General formulation
We want to find M (M < d) principal components, so we need the eigenvalues ranked, $\lambda_i < \lambda_j$ if $i > j$, and the eigenvectors orthonormal, $u_i^T u_j = \delta_{ij}$ ($\delta_{ij} = 1$ if $i = j$ and 0 otherwise).
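A minimal numpy sketch of this formulation (function and variable names are ours, not from the slides):

```python
import numpy as np

def top_m_components(X, M):
    """Top-M principal directions of the sample covariance (a sketch).
    X is d x N with data points as columns."""
    x_bar = X.mean(axis=1, keepdims=True)
    X_hat = X - x_bar                          # centralize the data
    S = X_hat @ X_hat.T / X.shape[1]           # d x d covariance matrix
    lam, U = np.linalg.eigh(S)                 # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1][:M]          # re-rank: lambda_1 >= ... >= lambda_M
    return lam[order], U[:, order]             # columns satisfy u_i^T u_j = delta_ij
```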
Principle
• Data reconstruction after dimension reduction
From a “compressed” data point in the M-dimensional PCA space (M < d), we can reconstruct the data point in the original d-dimensional space.
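Written out, the encode/decode pair (using the mean $\bar{x}$ and projection matrix $U_M$ defined in the basic algorithm below) is

$$z = U_M^T (x - \bar{x}), \qquad x' = \bar{x} + U_M z = \bar{x} + U_M U_M^T (x - \bar{x}).$$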
Principle
• Perspective of minimizing reconstruction errors
PCA can also be formulated from the perspective of minimizing reconstruction errors.
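In the standard minimum-error formulation (cf. Ch. 12 in [2]), one minimizes the mean squared reconstruction error

$$E = \frac{1}{N} \sum_{n=1}^{N} \| x_n - x'_n \|^2, \qquad x'_n = \bar{x} + U_M U_M^T (x_n - \bar{x}),$$

over orthonormal $U_M$. This selects exactly the top-M eigenvectors of $S$, the same solution as maximizing variance, and the minimum error equals the sum of the discarded eigenvalues, $\sum_{i=M+1}^{d} \lambda_i$.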
Principle
• Dual PCA Idea
For a d × N (d >> N) matrix X, the data points span a linear space of dimensionality < N.
– $S = \frac{1}{N} X X^T$ is a d × d matrix; it is often computationally infeasible to solve its eigenvalue problem.
– $S' = \frac{1}{N} X^T X$ is an N × N matrix, and hence its eigenvalue problem is solvable.
– Fortunately, we can prove that S and S' share the same (nonzero) eigenvalues!
– If we obtain an eigenvector of S', we can use it to produce the corresponding eigenvector of S:

$$u_i = \frac{1}{\sqrt{N \lambda_i}} X v_i,$$

where $v_i$ is an eigenvector of S' and $u_i$ is its corresponding eigenvector of S, which shares the eigenvalue $\lambda_i$.
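A numpy sketch of this trick (names are ours; the eigenvalue clipping is our addition for numerical safety):

```python
import numpy as np

def dual_pca_components(X_hat, M):
    """Dual PCA sketch for d >> N. X_hat is d x N and already centralized."""
    N = X_hat.shape[1]
    S_prime = X_hat.T @ X_hat / N              # small N x N problem
    lam, V = np.linalg.eigh(S_prime)
    order = np.argsort(lam)[::-1][:M]
    lam, V = lam[order], V[:, order]
    lam = np.clip(lam, 1e-12, None)            # guard against round-off negatives
    U = X_hat @ V / np.sqrt(N * lam)           # u_i = X v_i / sqrt(N * lambda_i)
    return lam, U                              # columns of U are unit-norm
```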
Principle
• Singular Value Decomposition (SVD)
For a d × N matrix X, it can be decomposed into the following form: $X = U \Sigma V^T$
– $U$ is a d × d orthogonal matrix; column i is the ith eigenvector of $X X^T$
– $\Sigma$ is a d × N “diagonal” matrix: $\sigma_{ij} = 0$ if $i \ne j$, $\sigma_{ii} = \sqrt{\lambda_i}$, and $\sigma_{ii} \ge \sigma_{jj}$ if $i < j$
– $V$ is an N × N orthogonal matrix; column i is the ith eigenvector of $X^T X$
• Link to PCA
– If we make all the data centralized by subtracting the mean, $X X^T / N$ is the covariance matrix of X
– Column i in U corresponds to the ith principal component
– The properties of SVD allow us to deal with a high-dimensional data set of few data points (i.e., d >> N), as it does not use the covariance matrix directly.
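A hedged sketch of this SVD route (assuming X is a d × N numpy array; variable names are ours):

```python
import numpy as np

# SVD route to PCA: no d x d covariance matrix is ever formed,
# which is what makes the d >> N case feasible.
X_hat = X - X.mean(axis=1, keepdims=True)      # centralize first
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
lam = s**2 / X_hat.shape[1]                    # lambda_i = sigma_i^2 / N
# Columns of U are the principal components, already ordered by variance.
```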
Basic Algorithm
• Data Centralization
For a given data set X, d × N (d < N), subtract the mean vector $\bar{x}$ from all the instances in X to achieve the centralized data set, denoted by $\hat{X}$.
• Eigenanalysis
Calculate $S = \hat{X}\hat{X}^T / N$, find all d eigenvalues, ranking them so that $\lambda_1 \ge \cdots \ge \lambda_d$, with their corresponding eigenvectors $u_1, \cdots, u_d$.
• Finding principal components
Select the eigenvectors corresponding to the top M (M < d) largest eigenvalues of S to form a projection matrix $U_M = [u_1, \cdots, u_M]$, $\lambda_1 \ge \cdots \ge \lambda_M$.
• Encoding data point
$z = U_M^T (x - \bar{x})$: z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
$x' = \bar{x} + U_M z$: $x'$ is a d-dimensional vector reconstructing the data point x.
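A minimal numpy sketch of the whole basic algorithm (function names are ours; X is assumed to be a d × N array with data points as columns):

```python
import numpy as np

def pca_fit(X, M):
    """Basic algorithm above, as numpy (a sketch; X is d x N, d < N)."""
    x_bar = X.mean(axis=1, keepdims=True)      # mean vector
    X_hat = X - x_bar                          # data centralization
    S = X_hat @ X_hat.T / X.shape[1]           # S = X_hat X_hat^T / N
    lam, U = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1][:M]          # top M largest eigenvalues
    return x_bar, U[:, order]                  # projection matrix U_M

def pca_encode(x, x_bar, U_M):
    return U_M.T @ (x - x_bar)                 # z = U_M^T (x - x_bar)

def pca_decode(z, x_bar, U_M):
    return x_bar + U_M @ z                     # x' = x_bar + U_M z
```

For example, `x_bar, U_M = pca_fit(X, 2)` followed by `pca_decode(pca_encode(x, x_bar, U_M), x_bar, U_M)` yields the rank-2 reconstruction $x'$.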
Dual Algorithm
• Data Centralization
For a given data set X, d × N (d ≥ N), subtract the mean vector $\bar{x}$ from all the instances in X to achieve the centralized data set, denoted by $\hat{X}$.
• SVD Procedure
Calculate $Y = \hat{X}^T / N$ and apply the SVD to Y: $Y = U \Sigma V^T$. Then we achieve a d × d matrix (i.e., $V$).
• Finding principal components
Select the first M (M < d) columns of V to form a projection matrix $U_M = [v_1, \cdots, v_M]$.
• Encoding data point
$z = U_M^T (x - \bar{x})$: z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
$x' = \bar{x} + U_M z$: $x'$ is a d-dimensional vector reconstructing the data point x.
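A numpy sketch of the dual algorithm under the same assumptions (names ours); since eigenvectors are only defined up to sign, its projection matrix matches the basic algorithm's up to the sign of each column:

```python
import numpy as np

def dual_pca_fit(X, M):
    """Dual algorithm above, as numpy (a sketch; X is d x N, d >= N).
    Requires M <= N, since X_hat has at most N nonzero singular values."""
    x_bar = X.mean(axis=1, keepdims=True)
    Y = (X - x_bar).T / X.shape[1]             # Y = X_hat^T / N, an N x d matrix
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    U_M = Vt[:M].T                             # first M columns of V: d x M projection
    return x_bar, U_M                          # encode/decode as in the basic algorithm
```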
Examples
• Example 1: Synthetic data
Examples
• Example 2: Visualization of high-dimensional data
PCA applied to the visualization of microarray data.
Examples
• Example 3: Data compression
A hand-written digit “3” data set of 600 images, each 100 × 100 = 10,000 pixels.
[Figure: original images, their principal components, and the reconstructed images]
Examples
• Example 4: Feature extraction
Extract salient features (“eigenfaces”) from facial images to facilitate recognition.
Examples
• Example 4: Feature extraction (cont.)
– Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human faces.
– A human face may be considered a combination of these standard faces.
– The principal eigenface looks like a bland, androgynous average human face.
Examples
• Example 4: Feature extraction (cont.)
– When properly weighted, eigenfaces can be summed together to create an approximate face.
– Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
– Suppose we are going to use M eigenfaces; a facial image will then be represented by M “coordinates” in the PCA subspace.
– Feature vectors of M elements will be used in a face recognition system for both training and testing.
$z = U_M^T x$
– $U_M$: a $d^2 \times M$ matrix consisting of the top M eigenvectors (eigenfaces)
– $x$: a vector of $d^2$ elements converted from an image
– $z$: a vector of M elements to be a representation (features)
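As a concrete illustration (a minimal sketch; the nearest-neighbour classifier and all names here are our assumptions, not specified on the slides), such feature vectors could drive recognition like this:

```python
import numpy as np

def recognize(x, U_M, gallery_Z, gallery_ids):
    """Hypothetical nearest-neighbour recognizer in the PCA subspace.
    x: d^2-vector from a probe image; U_M: d^2 x M eigenface matrix;
    gallery_Z: M x K codes of K enrolled faces; gallery_ids: K labels."""
    z = U_M.T @ x                              # z = U_M^T x, the M features
    dists = np.linalg.norm(gallery_Z - z[:, None], axis=0)
    return gallery_ids[int(np.argmin(dists))]  # closest enrolled face wins
```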
Relevant Issues
• How to find an appropriate dimensionality, M, in the PCA space
– We use the Proportion of Variance (PoV) to determine it in practice:

$$\mathrm{PoV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_k + \cdots + \lambda_d}, \qquad k \le d$$

– When PoV ≥ 90%, the corresponding k will be assigned to be M.
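A minimal sketch of this rule (names ours), assuming the eigenvalues are already sorted in descending order:

```python
import numpy as np

def choose_m(lam, threshold=0.90):
    """Smallest k with PoV(k) >= threshold.
    lam: eigenvalues sorted so that lambda_1 >= ... >= lambda_d."""
    pov = np.cumsum(lam) / np.sum(lam)         # PoV(1), ..., PoV(d)
    return int(np.searchsorted(pov, threshold)) + 1
```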
Relevant Issues
• Limitations of the standard PCA
– Are the dimensions of maximum data variance always the relevant dimensions to preserve?
– Other techniques are required!
• Relevant component analysis (RCA)
• Linear discriminant analysis (LDA)
Relevant Issues
• Limitations of the standard PCA (cont.)
– Should the goal be finding independent rather than pairwise uncorrelated/orthogonal dimensions?
– Another technique is required!
• Independent component analysis (ICA)
[Figure: comparison of PCA and ICA]
Relevant Issues
• Limitations of the standard PCA (cont.)
– Reducing the dimensionality of complex distributions may require nonlinear processing
– Nonlinear PCA extensions:
• preserve the proximity between points in the input space, i.e., the local topology of the distribution
• enable unfolding of some manifolds in the input data
• keep the local topology

[Figure: nonlinear projection of a spiral; nonlinear projection of a horseshoe]
Relevant Issues
• Miscellaneous PCA extensions (>100)
– Probabilistic PCA
– 2-D PCA
– Sparse PCA / Scaled PCA
– Nonnegative matrix factorization
– PCA mixtures and local PCA
– Principal curve and surface analysis
– Kernel PCA (?)
– …
Conclusion
• PCA is a simple yet popular method for handling high-dimensional data and has inspired many other methods.
• It is a linear method for dimensionality reduction that projects the original data to a new “coordinate” system so as to maximize data variance.
• PCA can be interpreted from various perspectives, each leading to a different formulation.
• There are a number of limitations in the standard PCA.
• There are several variants and extensions that tend to overcome the limitations of the standard PCA.