Principal Component Analysis (PCA): syllabus.cs.manchester.ac.uk/pgt/COMP61021/lectures/PCA.pdf


Principal Component Analysis (PCA)

COMP61021 Modelling and Visualization of High Dimensional Data

Additional reading can be found in the non-assessed exercises (week 8) on this course unit's teaching page.

Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2]


Outline
• Introduction
• Principle
• Algorithms
• Exemplar Applications
• Relevant Issues
• Conclusion


Introduction
• Principal component analysis (PCA)
  – A method for high-dimensional data analysis via redundancy reduction
  – Identifies an "optimal" low-dimensional linear projection: maximum data variance in the new space
  – Useful for data visualization, compression and feature extraction

PCA finds a new "coordinate system" of maximum data variance.
Projection onto the principal axes leads to a new low-dimensional representation.


Principle
• Finding the 1st principal component
  Given a data set of N data points in a d-dimensional space, X = {x_1, ⋅⋅⋅, x_N}.
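The derivation itself appears only as images in the original slides; the following is a standard variance-maximization sketch, reconstructed by the editor with notation chosen to match the slides:

```latex
% Standard sketch (reconstructed): project each x_n onto a unit vector u_1
% and maximize the variance of the projections.
\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n , \qquad
S = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^{T}
% Maximize u_1^T S u_1 subject to u_1^T u_1 = 1 (Lagrange multiplier lambda_1):
\max_{u_1}\; u_1^{T} S u_1 + \lambda_1 \left( 1 - u_1^{T} u_1 \right)
\;\;\Longrightarrow\;\; S\,u_1 = \lambda_1 u_1
% So u_1 is the eigenvector of S with the largest eigenvalue, and the
% projected variance equals u_1^T S u_1 = lambda_1.
```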


Principle
• Finding the 1st principal component (cont.)


Principle
• General formulation
  We want to find M (M < d) principal components. So we need

    λ_i > λ_j if i < j,  and  u_iᵀu_j = δ_ij  (δ_ij = 1 if i = j and 0 otherwise)


Principle
• Data reconstruction after dimensionality reduction
  From a "compressed" data point in the M-dimensional PCA space (M < d), we can reconstruct the data point in the original d-dimensional space.


Principle
• Perspective of minimizing reconstruction errors
  PCA can also be formulated from the perspective of minimizing reconstruction errors.
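The details of this formulation are images in the original slides; a standard sketch of the reconstruction-error view (reconstructed by the editor, same symbols as before) is:

```latex
% Approximate each point by its projection onto M orthonormal directions
% u_1, ..., u_M and measure the mean squared reconstruction error:
J = \frac{1}{N}\sum_{n=1}^{N} \bigl\| x_n - \tilde{x}_n \bigr\|^{2} ,
\qquad
\tilde{x}_n = \bar{x} + \sum_{i=1}^{M} \bigl( u_i^{T}(x_n - \bar{x}) \bigr)\, u_i
% Expanding J gives J = \sum_{i=M+1}^{d} \lambda_i , which is minimized by
% choosing u_1, ..., u_M as the top-M eigenvectors of S -- the same answer
% as the variance-maximization view.
```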


Principle
• Dual PCA Idea
  For a d × N (d >> N) matrix X, the dimensionality of this linear space is < N.
  – S = XXᵀ/N is a d × d matrix; it is often computationally infeasible to solve its eigenvalue problem.
  – S′ = XᵀX/N is an N × N matrix, and hence its eigenvalue problem is solvable.
  – Fortunately, we can prove S and S′ share the same eigenvalues!
  – If we obtain an eigenvector of S′, we can use it to produce the corresponding eigenvector of S:

    u_i = X v_i / √(N λ_i)

  where v_i is an eigenvector of S′ and u_i is its corresponding eigenvector of S, which shares the eigenvalue λ_i.
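The dual trick can be checked numerically. This is a minimal sketch (all names and the toy sizes are ours, not from the slides), assuming the scaling u_i = X v_i / √(N λ_i) to obtain unit-norm eigenvectors:

```python
import numpy as np

# Hypothetical toy setting: d = 100 dimensions, N = 5 points,
# so the d x d eigenproblem is the expensive one.
rng = np.random.default_rng(0)
d, N = 100, 5
X = rng.standard_normal((d, N))
X = X - X.mean(axis=1, keepdims=True)      # centre the data

S = (X @ X.T) / N                          # d x d (large)
S_dual = (X.T @ X) / N                     # N x N (small)

lam, V = np.linalg.eigh(S_dual)            # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]             # rank them descending

# Map dual eigenvectors back: u_i = X v_i / sqrt(N * lam_i) (unit norm).
k = N - 1                                  # centred data has rank N - 1
U = X @ V[:, :k] / np.sqrt(N * lam[:k])

# Each u_i is an eigenvector of the big matrix S with the same eigenvalue.
for i in range(k):
    assert np.allclose(S @ U[:, i], lam[i] * U[:, i])
    assert np.isclose(np.linalg.norm(U[:, i]), 1.0)
```

The check works because S(Xv) = X(S′v) = λ(Xv), and ‖Xv‖² = vᵀXᵀXv = Nλ for a unit eigenvector v of S′.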


Principle
• Singular Value Decomposition (SVD)
  For a d × N matrix X, it can be decomposed into the following form:

    X = UΣVᵀ

  – U is a d × d orthogonal matrix; column i is the ith eigenvector of XXᵀ
  – Σ is a d × N "diagonal" matrix: Σ_ii = σ_i, λ_i = σ_i², and σ_i ≥ σ_j if i < j
  – V is an N × N orthogonal matrix; column i is the ith eigenvector of XᵀX
• Link to PCA
  – If we make all data centralized by subtracting the mean, XXᵀ/N is the covariance matrix of X
  – Column i in U corresponds to the ith principal component
  – Properties of SVD allow us to deal with a high-dimensional data set of few data points (i.e., d >> N), as it does not use a covariance matrix directly.
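The SVD-to-PCA link can be verified directly. A minimal sketch (names and sizes are ours); note the eigenvalues of XXᵀ are σ_i², so the covariance eigenvalues are σ_i²/N:

```python
import numpy as np

# Hypothetical check: the SVD of centred data recovers the principal
# components without ever forming the covariance matrix.
rng = np.random.default_rng(1)
d, N = 4, 200
X = rng.standard_normal((d, N))
X = X - X.mean(axis=1, keepdims=True)      # centralize by subtracting the mean

U, sigma, Vt = np.linalg.svd(X)            # X = U @ diag(sigma) @ Vt

cov = (X @ X.T) / N                        # covariance matrix X X^T / N
lam = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Singular values vs. covariance eigenvalues: lam_i = sigma_i**2 / N
# (for X X^T itself, the eigenvalues are sigma_i**2, as on the slide).
assert np.allclose(sigma**2 / N, lam)

# Columns of U are the eigenvectors of X X^T, i.e. the principal components.
for i in range(d):
    assert np.allclose(cov @ U[:, i], lam[i] * U[:, i])
```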


Basic Algorithm
• Data Centralization
  For a given data set X, d × N (d < N), subtract the mean vector x̄ from all the instances in X to obtain the centralized data set, denoted by X̂.
• Eigenanalysis
  Calculate S = X̂X̂ᵀ/N, find all d eigenvalues, rank them so that λ_1 ≥ ⋅⋅⋅ ≥ λ_d, and find their corresponding eigenvectors u_1, ⋅⋅⋅, u_d.
• Finding principal components
  Select the eigenvectors corresponding to the top M (M < d) largest eigenvalues of S to form a projection matrix U_M = [u_1, ⋅⋅⋅, u_M].
• Encoding data point
  z = U_Mᵀ(x − x̄); z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
  x′ = x̄ + U_M z; x′ is a d-dimensional vector reconstructing the data point x.

x′


Dual Algorithm
• Data Centralization
  For a given data set X, d × N (d ≥ N), subtract the mean vector x̄ from all the instances in X to obtain the centralized data set, denoted by X̂.
• SVD Procedure
  Calculate Y = X̂ᵀ/√N and apply the SVD to Y: Y = UΣVᵀ. Then we achieve a d × d matrix (i.e., V).
• Finding principal components
  Select the first M (M < d) columns of V to form a projection matrix U_M = [v_1, ⋅⋅⋅, v_M].
• Encoding data point
  z = U_Mᵀ(x − x̄); z is an M-dimensional vector encoding a data point x.
• Reconstructing data point (Decoding)
  x′ = x̄ + U_M z; x′ is a d-dimensional vector reconstructing the data point x.


Examples
• Example 1: Synthetic data


Examples
• Example 2: Visualization of high-dimensional data
  PCA applied to the visualization of microarray data.


Examples
• Example 3: Data compression
  A hand-written digit "3" data set of 600 images, 100 × 100 = 10,000 pixels.


• Example 3: Data compression (cont.)
  [Figure: original images, principal components, and reconstructed images]


Examples
• Example 4: Feature extraction
  Extract salient features ("Eigenfaces") from facial images to facilitate recognition.


Examples
• Example 4: Feature extraction (cont.)
  – Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human faces.
  – A human face may be considered a combination of these standard faces.
  – The principal eigenface looks like a bland, androgynous average human face.


Examples
• Example 4: Feature extraction (cont.)
  – When properly weighted, eigenfaces can be summed together to create an approximate face.
  – Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
  – Suppose we are going to use M eigenfaces. Then a facial image will be represented by M "coordinates" in the PCA subspace:

    z = U_Mᵀ x

    U_M: d² × M matrix consisting of the top M eigenvectors
    x: a vector of d² elements converted from an image
    z: a vector of M elements to be a representation (features)

  – Feature vectors of M elements will be used in a face recognition system for both training and testing.


Relevant Issues
• How to find an appropriate dimensionality, M, in the PCA space
  – We use the Proportion of Variance (PoV) to determine it in practice:

    PoV(k) = Σ_{i=1}^{k} λ_i / Σ_{i=1}^{d} λ_i = (λ_1 + ⋅⋅⋅ + λ_k) / (λ_1 + ⋅⋅⋅ + λ_k + ⋅⋅⋅ + λ_d)

  – When PoV ≥ 90%, the corresponding k will be assigned to be M.
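The PoV rule is a one-liner in practice. A minimal sketch (the function name, the 0.90 default matching the slide, and the toy spectrum are ours):

```python
import numpy as np

def choose_M(eigenvalues, threshold=0.90):
    """Pick the smallest k whose Proportion of Variance reaches the threshold.

    `eigenvalues` must be sorted in descending order, as in the slides.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    pov = np.cumsum(lam) / lam.sum()         # PoV(k) for k = 1..d
    return int(np.argmax(pov >= threshold)) + 1

# Hypothetical eigenvalue spectrum:
lam = [5.0, 3.0, 1.0, 0.5, 0.3, 0.2]
M = choose_M(lam)        # cumulative PoV: 0.5, 0.8, 0.9, ...
assert M == 3            # PoV reaches 90% at k = 3
```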


Relevant Issues
• Limitations of the standard PCA
  – Are the dimensions of maximum data variance always the relevant dimensions to preserve?
  – Other techniques are required!
    • Relevant component analysis (RCA)
    • Linear discriminant analysis (LDA)


Relevant Issues
• Limitations of the standard PCA (cont.)
  – Should the goal be finding independent rather than pair-wise uncorrelated/orthogonal dimensions?
  – Another technique is required!
    • Independent Component Analysis (ICA)
  [Figure: PCA vs. ICA projection directions]


Relevant Issues
• Limitations of the standard PCA (cont.)
  – Reducing the dimensionality of complex distributions may need nonlinear processing.
  – Nonlinear PCA extensions:
    • Preserve the proximity between points in the input space, i.e., the local topology of the distribution
    • Enable unfolding some varieties (manifolds) in the input data
    • Keep the local topology
  [Figures: nonlinear projection of a spiral; nonlinear projection of a horseshoe]


Relevant Issues
• Miscellaneous PCA extensions (>100)
  – Probabilistic PCA
  – 2-D PCA
  – Sparse PCA / Scaled PCA
  – Nonnegative Matrix Factorization
  – PCA mixture and local PCA
  – Principal Curve and Surface Analysis
  – Kernel PCA (?)
  – …


Conclusion
• PCA is a simple yet popular method for handling high-dimensional data and has inspired many other methods.
• It is a linear method for dimensionality reduction, projecting the original data onto a new "coordinate" system so as to maximize data variance.
• PCA can be interpreted from various perspectives, which leads to different formulations.
• There are a number of limitations in the standard PCA.
• There are several variants and extensions which tend to overcome the limitations of the standard PCA.