20090504_ir_studygroup
Theory and Toolkits of PCA
2009 5/4 IRLab Study Group
Presenter : Chin-Hui Chen
Agenda
Theory :
◦1. Scenario
◦2. What is PCA?
◦3. How to minimize Squared-Error ?
◦4. Dimensionality Reduction
Toolkit :
◦A list of PCA toolkits
◦Demo
Scenario (Point? Line?)
Consider a 2-dimension space
[Figure: points in the 2-dimension space; labels: d, Least Squared Error]
What is PCA ? (1)
Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called “principal components”.
What is PCA ? (2)
What can PCA do ?
◦Dimensionality Reduction
For example :
◦Assume N points in a D-dim space
◦e.g. {x1, x2, x3, x4} ; xi = (v1, v2)
◦Choose a set of M basis vectors for projection
◦e.g. {u1}
◦The basis vectors are orthonormal (each has length 1, and the inner product of any two is 0)
◦M << D (each point is then represented by M features)
◦e.g. xi = (p1)
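The example above can be sketched in a few lines of NumPy. The four points and the basis vector u1 below are made-up values chosen only for illustration; they are not from the slides.

```python
import numpy as np

# Hypothetical data: N = 4 points in D = 2 dimensions, one point per row.
points = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])

# Hypothetical basis with M = 1 vector: the unit vector along the diagonal.
u1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
basis = u1.reshape(2, 1)          # D x M matrix whose columns are the basis

# Orthonormality check: basis^T basis should be the M x M identity.
is_orthonormal = np.allclose(basis.T @ basis, np.eye(basis.shape[1]))

# Each point x_i = (v1, v2) becomes a single coordinate (p1).
projected = points @ basis        # N x M
```

Here every point lies on the diagonal, so the single coordinate p1 loses no information; for general data the choice of basis matters, which is what the rest of the talk addresses.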
How to minimize Squared-Error ?
Consider a D-dimension space
◦Given N points : {x1, x2, …, xn}
◦xi is a D-dim vector
How to
◦1. Find a point that minimizes the squared error
◦2. Find a line that minimizes the squared error
How to ? - Point
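◦Goal : Find x0 s.t. $J_0(x_0) = \sum_{k=1}^{n} \lVert x_0 - x_k \rVert^2$ is minimized.
◦Let $m = \frac{1}{n} \sum_{k=1}^{n} x_k$ (the sample mean).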
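One way to see why the sample mean is the minimizer is to expand the squared error around m. The criterion name J_0 is an assumption here, since the slide's own formula did not survive extraction; the cross term vanishes because the deviations from the mean sum to zero.

```latex
\begin{aligned}
J_0(x_0) &= \sum_{k=1}^{n} \lVert x_0 - x_k \rVert^2
          = \sum_{k=1}^{n} \lVert (x_0 - m) - (x_k - m) \rVert^2 \\
         &= n \lVert x_0 - m \rVert^2
            - 2 (x_0 - m)^t \sum_{k=1}^{n} (x_k - m)
            + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \\
         &= n \lVert x_0 - m \rVert^2 + \sum_{k=1}^{n} \lVert x_k - m \rVert^2,
\qquad m = \frac{1}{n} \sum_{k=1}^{n} x_k .
\end{aligned}
```

Only the first term depends on x0, and it is smallest (zero) when x0 equals m.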
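How to ? – Point - Line
∴ x0 = m
◦1. Find a point that minimizes the squared error
◦2. Find a line that minimizes the squared error
L : xk’ - x0 = ak e, so xk’ = x0 + ak e = m + ak e (e : unit vector along the line, ak : scalar)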
How to ? – Line
L : xk’ = m + ak e
Goal : Find a1…an that minimize $J_1(a_1, \dots, a_n, e) = \sum_{k=1}^{n} \lVert (m + a_k e) - x_k \rVert^2$
How to ? – Line
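Differentiating with respect to each ak and setting the result to zero : 2ak – 2et(xk-m) = 0, so ak = et(xk-m)
What does it mean ?
◦ak is the (signed) length of the projection of xk-m onto the direction e.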
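A small numerical check of this result, as a sketch only: the data points, the direction e, and the helper name line_error are all hypothetical. For a fixed unit vector e, the coefficients ak = et(xk-m) should give a squared error that no perturbed set of coefficients can beat.

```python
import numpy as np

def line_error(points, m, e, a):
    """Sum of squared distances between x_k and m + a_k * e."""
    reconstructed = m + np.outer(a, e)
    return float(np.sum((reconstructed - points) ** 2))

# Hypothetical data (one point per row) and an arbitrary unit direction e.
points = np.array([[2.0, 0.0], [0.0, 1.0], [3.0, 4.0], [-1.0, 3.0]])
m = points.mean(axis=0)
e = np.array([3.0, 4.0]) / 5.0

# Closed form from the slide: a_k = e^t (x_k - m).
a_opt = (points - m) @ e
best = line_error(points, m, e, a_opt)

# Perturbing the coefficients should never lower the error.
rng = np.random.default_rng(0)
perturbed = [
    line_error(points, m, e, a_opt + rng.normal(scale=0.5, size=len(a_opt)))
    for _ in range(100)
]
never_better = all(err >= best for err in perturbed)
```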
How to ? – Line
Then, how about e ?
How to ? – Line
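Let $S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^t$ (n times the covariance matrix; the scale does not change its eigenvectors). Substituting ak = et(xk-m) :
$J_1(e) = -e^t S e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$
The second term is independent of e.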
How to ? – Line
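To optimize f(x,y) without constraints, set its derivatives to zero.
But if x,y must satisfy g(x,y)=0, use a Lagrange multiplier.
Minimizing J’1(e) = -etSe is the same as maximizing etSe.
Because |e| = 1 , u = etSe – λ(ete-1)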
How to ? – Line
◦What is S ?
◦Covariance Matrix
◦Assume D-dim : S is a D × D matrix
How to ? – Line
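Setting the derivative of u with respect to e to zero gives Se = λe, and we know S.
Then, what is e ? An eigenvector of S.
Se = λe has the same form as the eigenvalue equation AX = λX.
Since etSe = λ for a unit vector e, the eigenvector with the largest eigenvalue gives the best line.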
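The eigenvector claim can be checked numerically. This is a sketch on made-up 2-D data, using numpy.linalg.eigh (which assumes a symmetric matrix and returns eigenvalues in ascending order); S is built here as a sum of outer products of the centered points, which is an assumption about normalization that does not affect the eigenvectors.

```python
import numpy as np

# Hypothetical 2-D data, one point per row.
points = np.array([[2.0, 1.0], [3.0, 3.0], [4.0, 3.0], [5.0, 5.0], [6.0, 5.0]])
centered = points - points.mean(axis=0)

# S as a sum of outer products of centered points (D x D, symmetric).
S = centered.T @ centered

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(S)
e = eigenvectors[:, -1]           # eigenvector of the largest eigenvalue
lam = eigenvalues[-1]

# S e = lambda e, and e^t S e equals lambda for a unit-length e.
satisfies_eigen_equation = np.allclose(S @ e, lam * e)

# No other unit direction (sampled every degree) gives a larger e^t S e.
angles = np.linspace(0.0, np.pi, 181)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
scores = np.einsum("ij,jk,ik->i", candidates, S, candidates)
e_is_best = bool(np.all(scores <= lam + 1e-9))
```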
How to ? – conclusion
Summary :
◦Find a line : xk’ = m + ak e
◦ak = et(xk-m)
◦Se = λe ; e is an eigenvector of the covariance matrix.
◦A D-dim space yields D eigenvectors.
Dimensionality Reduction
Consider a 2-dim space …
Original coordinates : X1 = (a,b) X2 = (c,d)
Coordinates on the new axes : X1 = (a’,b’) X2 = (c’,d’)
We are going to keep only the first new axis : X1 = (a’) X2 = (c’)
Dimensionality Reduction
We want to prove :
◦The axes of the projected data are uncorrelated.
Consider n m-dim vectors
◦{x1, x2, … ,xn}
◦Let X = [x1-x̄ x2-x̄ … xn-x̄]T, where x̄ is the mean (written m on the earlier slides; here m is the dimension)
◦Let E = [e1 e2 … em]
Se = λe (eigen decomposition)
◦Eigenvectors {e1,…,em}
◦Eigenvalues {λ1,…, λm}
Dimensionality Reduction
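$SE = [Se_1 \; Se_2 \; \dots \; Se_m] = [\lambda_1 e_1 \; \lambda_2 e_2 \; \dots \; \lambda_m e_m] = ED$, where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$
$S = EDE^{-1}$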
E = [e1 e2 … em]
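The factorization can be verified numerically. The 3 × 3 matrix below is a made-up symmetric matrix standing in for a covariance matrix, and numpy.linalg.eigh is used on the assumption that S is symmetric.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for a covariance matrix S (m = 3).
S = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigenvalues, E = np.linalg.eigh(S)   # columns of E are e_1 ... e_m
D = np.diag(eigenvalues)             # diagonal matrix of eigenvalues

se_equals_ed = np.allclose(S @ E, E @ D)                     # SE = ED
s_reconstructed = np.allclose(S, E @ D @ np.linalg.inv(E))   # S = E D E^-1
inverse_is_transpose = np.allclose(np.linalg.inv(E), E.T)    # E is orthonormal
```

The last check is worth noting: because the eigenvectors of a symmetric matrix are orthonormal, the inverse of E is simply its transpose.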
Dimensionality Reduction
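We want to know the new covariance matrix of the projected vectors.
Let Y = [y1 y2 … yn]T
E = [e1 e2 … em]
yk = ET(xk - mean), i.e. Y = XE with X and Y holding one vector per row
$S_Y = E^T S E = E^T (E D E^{-1}) E$, and $E^T E = I$ because the eigenvectors are orthonormal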
Dimensionality Reduction
SY = D
1. The covariance between any two different axes is 0.
2. The better an axis represents the data, the larger the variance along that axis, and the larger its λ.
Dimensionality Reduction
Conclusion : If we want to reduce dimension D to M (M<<D)
1. Find S
2. Find the eigenvalues and eigenvectors of S
3. Select the eigenvectors with the top M eigenvalues
4. Project the data onto them
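The four steps can be put together as a minimal sketch. The function name pca_reduce, its return values, and the sample data are hypothetical, and the covariance is normalized by n as an assumption; this is not the code of any toolkit listed in the talk.

```python
import numpy as np

def pca_reduce(points, num_components):
    """Project rows of `points` (n x D) onto the top `num_components` eigenvectors."""
    mean = points.mean(axis=0)
    centered = points - mean
    # 1. Find S (covariance matrix, D x D).
    S = centered.T @ centered / len(points)
    # 2. Eigenvalues and eigenvectors of S (eigh returns ascending order).
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    # 3. Select the top M by eigenvalue.
    order = np.argsort(eigenvalues)[::-1][:num_components]
    top_vectors = eigenvectors[:, order]
    # 4. Project the centered data.
    return centered @ top_vectors, eigenvalues[order], top_vectors

# Hypothetical usage: reduce 2-D points lying near a line to 1-D.
data = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])
reduced, top_values, top_vectors = pca_reduce(data, 1)
```

Returning the selected eigenvalues alongside the projection makes it easy to see how much variance each kept axis carries.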
Toolkits
A List of PCA Toolkits
C & Java
◦Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
Perl
◦PDL::PCA
Matlab
◦Statistics Toolbox™ : princomp
Weka
◦weka.attributeSelection.PrincipalComponents
(http://www.laps.ufpa.br/aldebaro/weka/feature_selection.html)
A List of PCA Toolkits
C & Java
◦Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
C :
◦Download : pca.c
◦Compile : cc pca.c -lm -o pcac
◦Run : ./pcac spectr.dat 36 8 R > pcaout.c.txt
Java :
◦Download : JAMA, PCAcorr.java
◦Compile : javac -classpath Jama-1.0.2.jar PCAcorr.java
◦Run : java PCAcorr iris.dat > pcaout.java.txt