Principal Component Analysis (PCA)
COMP61021 Modelling and Visualization of High Dimensional Data
Additional reading can be found in the non-assessed exercises (week 8) on this course unit's teaching page.
Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2]
Outline
• Introduction
• Principle
• Algorithms
• Exemplar Applications
• Relevant Issues
• Conclusion
Introduction
• Principal component analysis (PCA)
– A method for high-dimensional data analysis via redundancy reduction
– Identifies an “optimal” low-dimensional linear projection: the one with maximum data variance in the new space
– Useful for data visualization, compression and feature extraction

PCA finds a new “coordinate system” of maximum data variance. Projection onto the principal axes leads to a new low-dimensional representation.
Principle
• Finding the 1st principal component
Given a data set of N data points in a d-dimensional space, $X = \{x_1, \cdots, x_N\}$
Principle
• Finding the 1st principal component (cont.)
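In the standard formulation (cf. Ch. 12 in [2]), with the sample mean and covariance

$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T,$$

the first principal component $u_1$ maximizes the projected variance $u_1^T S u_1$ subject to $u_1^T u_1 = 1$. A Lagrange-multiplier argument turns this into the eigenvalue problem $S u_1 = \lambda_1 u_1$, so $u_1$ is the eigenvector of $S$ with the largest eigenvalue, and the variance attained is exactly $\lambda_1$.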
Principle
• General formulation
We want to find M (M < d) principal components, so we need the eigenvalues ranked, $\lambda_i < \lambda_j$ if $i > j$, and the eigenvectors orthonormal, $u_i^T u_j = \delta_{ij}$ ($\delta_{ij} = 1$ if $i = j$ and 0 otherwise).
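A minimal numpy sketch of this formulation (function and variable names are ours, not from the slides):

```python
import numpy as np

def top_m_components(X, M):
    """Top-M principal directions of the sample covariance (a sketch).
    X is d x N with data points as columns."""
    x_bar = X.mean(axis=1, keepdims=True)
    X_hat = X - x_bar                          # centralize the data
    S = X_hat @ X_hat.T / X.shape[1]           # d x d covariance matrix
    lam, U = np.linalg.eigh(S)                 # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1][:M]          # re-rank: lambda_1 >= ... >= lambda_M
    return lam[order], U[:, order]             # columns satisfy u_i^T u_j = delta_ij
```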
Principle
• Data reconstruction after dimension reduction
From a “compressed” data point in the M-dimensional PCA space (M < d), we can reconstruct the data point in the original d-dimensional space.
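Written out, the encode/decode pair (using the mean $\bar{x}$ and projection matrix $U_M$ defined in the basic algorithm below) is

$$z = U_M^T (x - \bar{x}), \qquad x' = \bar{x} + U_M z = \bar{x} + U_M U_M^T (x - \bar{x}).$$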
Principle
• Perspective of minimizing reconstruction errors
PCA can also be formulated from the perspective of minimizing reconstruction errors.
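In the standard minimum-error formulation (cf. Ch. 12 in [2]), one minimizes the mean squared reconstruction error

$$E = \frac{1}{N} \sum_{n=1}^{N} \| x_n - x'_n \|^2, \qquad x'_n = \bar{x} + U_M U_M^T (x_n - \bar{x}),$$

over orthonormal $U_M$. This selects exactly the top-M eigenvectors of $S$, the same solution as maximizing variance, and the minimum error equals the sum of the discarded eigenvalues, $\sum_{i=M+1}^{d} \lambda_i$.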
Principle
• Dual PCA Idea
For a d × N (d >> N) matrix X, the data points span a linear space of dimensionality < N.
– $S = \frac{1}{N} X X^T$ is a d × d matrix; it is often computationally infeasible to solve its eigenvalue problem.
– $S' = \frac{1}{N} X^T X$ is an N × N matrix, and hence its eigenvalue problem is solvable.
– Fortunately, we can prove that S and S' share the same (nonzero) eigenvalues!
– If we obtain an eigenvector of S', we can use it to produce the corresponding eigenvector of S:

$$u_i = \frac{1}{\sqrt{N \lambda_i}} X v_i,$$

where $v_i$ is an eigenvector of S' and $u_i$ is its corresponding eigenvector of S, which shares the eigenvalue $\lambda_i$.
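A numpy sketch of this trick (names are ours; the eigenvalue clipping is our addition for numerical safety):

```python
import numpy as np

def dual_pca_components(X_hat, M):
    """Dual PCA sketch for d >> N. X_hat is d x N and already centralized."""
    N = X_hat.shape[1]
    S_prime = X_hat.T @ X_hat / N              # small N x N problem
    lam, V = np.linalg.eigh(S_prime)
    order = np.argsort(lam)[::-1][:M]
    lam, V = lam[order], V[:, order]
    lam = np.clip(lam, 1e-12, None)            # guard against round-off negatives
    U = X_hat @ V / np.sqrt(N * lam)           # u_i = X v_i / sqrt(N * lambda_i)
    return lam, U                              # columns of U are unit-norm
```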
Principle
• Singular Value Decomposition (SVD)
For a d × N matrix X, it can be decomposed into the following form: $X = U \Sigma V^T$
– $U$ is a d × d orthogonal matrix; column i is the ith eigenvector of $X X^T$
– $\Sigma$ is a d × N “diagonal” matrix: $\sigma_{ij} = 0$ if $i \ne j$, $\sigma_{ii} = \sqrt{\lambda_i}$, and $\sigma_{ii} \ge \sigma_{jj}$ if $i < j$
– $V$ is an N × N orthogonal matrix; column i is the ith eigenvector of $X^T X$
• Link to PCA
– If we make all the data centralized by subtracting the mean, $X X^T / N$ is the covariance matrix of X
– Column i in U corresponds to the ith principal component
– The properties of SVD allow us to deal with a high-dimensional data set of few data points (i.e., d >> N), as it does not use the covariance matrix directly.
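A hedged sketch of this SVD route (assuming X is a d × N numpy array; variable names are ours):

```python
import numpy as np

# SVD route to PCA: no d x d covariance matrix is ever formed,
# which is what makes the d >> N case feasible.
X_hat = X - X.mean(axis=1, keepdims=True)      # centralize first
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
lam = s**2 / X_hat.shape[1]                    # lambda_i = sigma_i^2 / N
# Columns of U are the principal components, already ordered by variance.
```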
Basic Algorithm
• Data Centralization
For a given data set X, d × N (d < N), subtract the mean vector $\bar{x}$ from all the instances in X to achieve the centralized data set, denoted by $\hat{X}$.
• Eigenanalysis
Calculate $S = \hat{X}\hat{X}^T / N$, find all d eigenvalues, ranking them so that $\lambda_1 \ge \cdots \ge \lambda_d$, with their corresponding eigenvectors $u_1, \cdots, u_d$.
• Finding principal components
Select the eigenvectors corresponding to the top M (M < d) largest eigenvalues of S to form a projection matrix $U_M = [u_1, \cdots, u_M]$, $\lambda_1 \ge \cdots \ge \lambda_M$.
• Encoding data point
$z = U_M^T (x - \bar{x})$: z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
$x' = \bar{x} + U_M z$: $x'$ is a d-dimensional vector reconstructing the data point x.
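A minimal numpy sketch of the whole basic algorithm (function names are ours; X is assumed to be a d × N array with data points as columns):

```python
import numpy as np

def pca_fit(X, M):
    """Basic algorithm above, as numpy (a sketch; X is d x N, d < N)."""
    x_bar = X.mean(axis=1, keepdims=True)      # mean vector
    X_hat = X - x_bar                          # data centralization
    S = X_hat @ X_hat.T / X.shape[1]           # S = X_hat X_hat^T / N
    lam, U = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1][:M]          # top M largest eigenvalues
    return x_bar, U[:, order]                  # projection matrix U_M

def pca_encode(x, x_bar, U_M):
    return U_M.T @ (x - x_bar)                 # z = U_M^T (x - x_bar)

def pca_decode(z, x_bar, U_M):
    return x_bar + U_M @ z                     # x' = x_bar + U_M z
```

For example, `x_bar, U_M = pca_fit(X, 2)` followed by `pca_decode(pca_encode(x, x_bar, U_M), x_bar, U_M)` yields the rank-2 reconstruction $x'$.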
Dual Algorithm
• Data Centralization
For a given data set X, d × N (d ≥ N), subtract the mean vector $\bar{x}$ from all the instances in X to achieve the centralized data set, denoted by $\hat{X}$.
• SVD Procedure
Calculate $Y = \hat{X}^T / N$ and apply the SVD to Y: $Y = U \Sigma V^T$. Then we achieve a d × d matrix (i.e., $V$).
• Finding principal components
Select the first M (M < d) columns of V to form a projection matrix $U_M = [v_1, \cdots, v_M]$.
• Encoding data point
$z = U_M^T (x - \bar{x})$: z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
$x' = \bar{x} + U_M z$: $x'$ is a d-dimensional vector reconstructing the data point x.
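A numpy sketch of the dual algorithm under the same assumptions (names ours); since eigenvectors are only defined up to sign, its projection matrix matches the basic algorithm's up to the sign of each column:

```python
import numpy as np

def dual_pca_fit(X, M):
    """Dual algorithm above, as numpy (a sketch; X is d x N, d >= N).
    Requires M <= N, since X_hat has at most N nonzero singular values."""
    x_bar = X.mean(axis=1, keepdims=True)
    Y = (X - x_bar).T / X.shape[1]             # Y = X_hat^T / N, an N x d matrix
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    U_M = Vt[:M].T                             # first M columns of V: d x M projection
    return x_bar, U_M                          # encode/decode as in the basic algorithm
```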
Examples
• Example 1: Synthetic data
Examples
• Example 2: Visualization of high-dimensional data
PCA applied to the visualization of microarray data.
Examples
• Example 3: Data compression
A hand-written digit “3” data set of 600 images, each 100 × 100 = 10,000 pixels.
[Figure: original images, their principal components, and the reconstructed images]
Examples
• Example 4: Feature extraction
Extract salient features (“eigenfaces”) from facial images to facilitate recognition.
Examples
• Example 4: Feature extraction (cont.)
– Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human faces.
– A human face may be considered a combination of these standard faces.
– The principal eigenface looks like a bland, androgynous average human face.
Examples
• Example 4: Feature extraction (cont.)
– When properly weighted, eigenfaces can be summed together to create an approximate face.
– Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
– Suppose we are going to use M eigenfaces; a facial image will then be represented by M “coordinates” in the PCA subspace.
– Feature vectors of M elements will be used in a face recognition system for both training and testing.
$z = U_M^T x$
– $U_M$: a $d^2 \times M$ matrix consisting of the top M eigenvectors (eigenfaces)
– $x$: a vector of $d^2$ elements converted from an image
– $z$: a vector of M elements to be a representation (features)
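As a concrete illustration (a minimal sketch; the nearest-neighbour classifier and all names here are our assumptions, not specified on the slides), such feature vectors could drive recognition like this:

```python
import numpy as np

def recognize(x, U_M, gallery_Z, gallery_ids):
    """Hypothetical nearest-neighbour recognizer in the PCA subspace.
    x: d^2-vector from a probe image; U_M: d^2 x M eigenface matrix;
    gallery_Z: M x K codes of K enrolled faces; gallery_ids: K labels."""
    z = U_M.T @ x                              # z = U_M^T x, the M features
    dists = np.linalg.norm(gallery_Z - z[:, None], axis=0)
    return gallery_ids[int(np.argmin(dists))]  # closest enrolled face wins
```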
Relevant Issues
• How to find an appropriate dimensionality, M, in the PCA space
– We use the Proportion of Variance (PoV) to determine it in practice:

$$\mathrm{PoV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_k + \cdots + \lambda_d}, \qquad k \le d$$

– When PoV ≥ 90%, the corresponding k will be assigned to be M.
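A minimal sketch of this rule (names ours), assuming the eigenvalues are already sorted in descending order:

```python
import numpy as np

def choose_m(lam, threshold=0.90):
    """Smallest k with PoV(k) >= threshold.
    lam: eigenvalues sorted so that lambda_1 >= ... >= lambda_d."""
    pov = np.cumsum(lam) / np.sum(lam)         # PoV(1), ..., PoV(d)
    return int(np.searchsorted(pov, threshold)) + 1
```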
Relevant Issues
• Limitations of the standard PCA
– Are the dimensions of maximum data variance always the relevant dimensions to preserve?
– Other techniques are required!
• Relevant component analysis (RCA)
• Linear discriminant analysis (LDA)
Relevant Issues
• Limitations of the standard PCA (cont.)
– Should the goal be finding independent rather than pairwise uncorrelated/orthogonal dimensions?
– Another technique is required!
• Independent component analysis (ICA)
[Figure: comparison of PCA and ICA]
Relevant Issues
• Limitations of the standard PCA (cont.)
– Reducing the dimensionality of complex distributions may require nonlinear processing
– Nonlinear PCA extensions:
• preserve the proximity between points in the input space, i.e., the local topology of the distribution
• enable unfolding of some manifolds in the input data
• keep the local topology

[Figure: nonlinear projection of a spiral; nonlinear projection of a horseshoe]
Relevant Issues
• Miscellaneous PCA extensions (>100)
– Probabilistic PCA
– 2-D PCA
– Sparse PCA / Scaled PCA
– Nonnegative matrix factorization
– PCA mixtures and local PCA
– Principal curve and surface analysis
– Kernel PCA (?)
– …
Conclusion
• PCA is a simple yet popular method for handling high-dimensional data and has inspired many other methods.
• It is a linear method for dimensionality reduction that projects the original data to a new “coordinate” system so as to maximize data variance.
• PCA can be interpreted from various perspectives, each leading to a different formulation.
• There are a number of limitations in the standard PCA.
• There are several variants and extensions that tend to overcome the limitations of the standard PCA.