No quiz this week.


Page 1: No quiz this week.

No quiz this week.

Page 2: No quiz this week.

Reading

• Principal components analysis:
  – Textbook, Chapter 10, Section 10.2
  – Smith, A Tutorial on Principal Components Analysis (linked to class webpage)

• Evolutionary Learning
  – Textbook, Chapter 12

Page 3: No quiz this week.

Dimensionality Reduction:

Principal Components Analysis

(New version of slides as of 3/5/2012)

Page 5: No quiz this week.

Data

[Figure: data plotted on axes x1 and x2]

Page 6: No quiz this week.

Data

[Figure: data plotted on axes x1 and x2]

First principal component: gives the direction of largest variation of the data.

Page 7: No quiz this week.

Data

[Figure: data plotted on axes x1 and x2]

First principal component: gives the direction of largest variation of the data.

Second principal component: gives the direction of second-largest variation.

Page 8: No quiz this week.

Rotation of Axes

[Figure: data on axes x1 and x2]

Page 9: No quiz this week.

Dimensionality reduction

[Figure: data on axes x1 and x2]

Page 10: No quiz this week.

Classification (on reduced dimensionality space)

[Figure: examples of classes + and − on axes x1 and x2]

Page 11: No quiz this week.

Classification (on reduced dimensionality space)

[Figure: examples of classes + and − on axes x1 and x2]

Note: Can be used for labeled or unlabeled data.

Page 12: No quiz this week.

Principal Components Analysis (PCA)

• Summary: PCA finds new orthogonal axes in the directions of largest variation in the data.

• PCA is used to create high-level features in order to improve classification and to reduce the dimensionality of the data without much loss of information.

• It is used in machine learning, signal processing, and image compression (among other things).

Page 13: No quiz this week.

Background for PCA

• Suppose attributes are A1 and A2, and we have n training examples. x’s denote values of A1 and y’s denote values of A2 over the training examples.

• Variance of an attribute:

$$\mathrm{var}(A_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
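As a small illustration of the variance formula, the following sketch computes the sample variance by hand and compares it with NumPy. The function name `sample_variance` and the values in `a1` are made up for this example.

```python
import numpy as np

def sample_variance(values):
    """Sample variance with the (n - 1) denominator used on the slide."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    return ((values - mean) ** 2).sum() / (len(values) - 1)

# Hypothetical values of attribute A1 over five training examples.
a1 = [2.0, 4.0, 4.0, 5.0, 10.0]

print(sample_variance(a1))    # manual formula
print(np.var(a1, ddof=1))     # NumPy equivalent; ddof=1 selects the n - 1 denominator
```

Note that `np.var` divides by n unless `ddof=1` is passed.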

Page 14: No quiz this week.

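• Covariance of two attributes:

$$\mathrm{cov}(A_1, A_2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• If covariance is positive, both dimensions tend to increase together. If negative, as one increases, the other tends to decrease. If zero, the two attributes are uncorrelated (no linear relationship), though not necessarily independent.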
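The three cases for the sign of the covariance can be checked on toy data. This is a minimal sketch; `sample_covariance` and the arrays are hypothetical. The last case shows that a covariance of zero does not by itself imply independence, since there y is fully determined by x.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance with the (n - 1) denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

print(sample_covariance(x, 3 * x + 1))  # positive: y increases with x
print(sample_covariance(x, -x))         # negative: y decreases as x increases
print(sample_covariance(x, x ** 2))     # zero, although y depends on x
```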

Page 15: No quiz this week.

• Covariance matrix
  – Suppose we have n attributes, A1, ..., An.

  – Covariance matrix:

$$C^{n \times n} = (c_{i,j}), \quad \text{where } c_{i,j} = \mathrm{cov}(A_i, A_j)$$

Page 16: No quiz this week.

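Covariance matrix

$$\begin{pmatrix} \mathrm{cov}(H,H) & \mathrm{cov}(H,M) \\ \mathrm{cov}(M,H) & \mathrm{cov}(M,M) \end{pmatrix} = \begin{pmatrix} \mathrm{var}(H) & 104.5 \\ 104.5 & \mathrm{var}(M) \end{pmatrix} = \begin{pmatrix} 47.7 & 104.5 \\ 104.5 & 370 \end{pmatrix}$$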
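A covariance matrix of this shape can be computed with `np.cov`. The sketch below uses a handful of invented (H, M) pairs, not the data behind the numbers on the slide, so the printed values will differ from those above.

```python
import numpy as np

# Hypothetical data: each row is one example, columns are attributes H and M.
data = np.array([
    [ 9.0, 39.0],
    [15.0, 56.0],
    [25.0, 93.0],
    [14.0, 61.0],
    [10.0, 50.0],
])

# rowvar=False tells np.cov that each column (not each row) is an attribute.
C = np.cov(data, rowvar=False)
print(C)

# The diagonal holds the variances, and the matrix is symmetric
# because cov(H, M) = cov(M, H).
print(np.isclose(C[0, 0], np.var(data[:, 0], ddof=1)))
print(np.allclose(C, C.T))
```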

Page 17: No quiz this week.

Review of Matrix Algebra

• Eigenvectors:
  – Let M be an n×n matrix.
    • A nonzero vector v is an eigenvector of M if M v = λ v
    • λ is called the eigenvalue associated with v

  – For any eigenvector v of M and nonzero scalar a,

$$M (a\mathbf{v}) = \lambda (a\mathbf{v})$$

  – Thus you can always choose eigenvectors of length 1:

$$\sqrt{v_1^2 + \dots + v_n^2} = 1$$

  – If M is symmetric with real entries, it has n eigenvectors that can be chosen orthogonal to one another.

  – Thus eigenvectors can be used as a new basis for an n-dimensional vector space.
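These properties can be verified numerically. The sketch below uses an arbitrary symmetric 2×2 matrix chosen for illustration and `np.linalg.eigh`, NumPy's routine for symmetric matrices, which returns unit-length eigenvectors as columns.

```python
import numpy as np

# Arbitrary symmetric matrix, chosen only for illustration.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and unit eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(M)

v = eigenvectors[:, 1]
lam = eigenvalues[1]
a = 3.0

print(np.allclose(M @ v, lam * v))              # M v = lambda v
print(np.allclose(M @ (a * v), lam * (a * v)))  # scaling keeps the eigenvector property
print(np.isclose(np.linalg.norm(v), 1.0))       # unit length
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(2)))  # mutually orthogonal
```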

Page 18: No quiz this week.

Principal Components Analysis (PCA)

1. Given original data set S = {x1, ..., xk}, produce a new set by subtracting the mean of each attribute Ai from the corresponding component of each xi.

Original data: mean = (1.81, 1.91). Mean-adjusted data: mean = (0, 0).
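Step 1 can be sketched in a few lines. The ten points below are the example data from Smith's tutorial; that the slides use the same data is an assumption, supported by the matching attribute means of 1.81 and 1.91.

```python
import numpy as np

# Ten 2-D points from Smith's tutorial (assumed to be the slide's example).
# Each row is one example; the columns are the two attributes.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

mean = data.mean(axis=0)       # one mean per attribute
data_adjust = data - mean      # broadcasting subtracts the mean from every row

print(mean)
print(data_adjust.mean(axis=0))  # zero, up to floating-point rounding
```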

Page 19: No quiz this week.
Page 20: No quiz this week.

2. Calculate the covariance matrix:

3. Calculate the (unit) eigenvectors and eigenvalues of the covariance matrix:

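Steps 2 and 3 might look as follows in NumPy. The data are again the ten points from Smith's tutorial, assumed to be the slide's example. Note that the sign of each eigenvector returned by a numerical routine is arbitrary, so it may differ from the sign shown on a slide.

```python
import numpy as np

# Ten 2-D points from Smith's tutorial (assumed to be the slide's example).
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
data_adjust = data - data.mean(axis=0)

# Step 2: covariance matrix of the mean-adjusted data (columns are attributes).
cov = np.cov(data_adjust, rowvar=False)

# Step 3: unit eigenvectors (as columns) and eigenvalues, in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print(cov)
print(eigenvalues)
print(eigenvectors)
```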

Page 21: No quiz this week.

The eigenvector with the largest eigenvalue traces the linear pattern in the data.

Page 22: No quiz this week.

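4. Order eigenvectors by eigenvalue, highest to lowest.

$$\mathbf{v}_1 = \begin{pmatrix} -.677873399 \\ -.735178956 \end{pmatrix}, \quad \lambda_1 = 1.28402771$$

$$\mathbf{v}_2 = \begin{pmatrix} -.735178956 \\ .677873399 \end{pmatrix}, \quad \lambda_2 = .0490833989$$

In general, you get n components. To reduce dimensionality to p, ignore the n − p components at the bottom of the list.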
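The ordering and truncation in step 4 can be sketched as below. The helper `top_components` and the 3×3 covariance matrix are hypothetical, picked so that the result is easy to read off.

```python
import numpy as np

def top_components(cov, p):
    """Return the p largest eigenvalues and their unit eigenvectors (as columns)."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]            # indices, highest eigenvalue first
    keep = order[:p]                                 # drop the n - p smallest
    return eigenvalues[keep], eigenvectors[:, keep]

# Hypothetical covariance matrix with n = 3 attributes.
cov = np.diag([0.5, 3.0, 1.5])

values, vectors = top_components(cov, p=2)
print(values)         # the two largest eigenvalues, in decreasing order
print(vectors.shape)  # one column per kept component
```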

Page 23: No quiz this week.

Construct new “feature vector” (assuming vi is a column vector).

Feature vector = (v1, v2, ..., vp)

$$\text{FeatureVector1} = (\mathbf{v}_1 \;\; \mathbf{v}_2) = \begin{pmatrix} -.677873399 & -.735178956 \\ -.735178956 & .677873399 \end{pmatrix}$$

or reduced dimension feature vector:

$$\text{FeatureVector2} = (\mathbf{v}_1) = \begin{pmatrix} -.677873399 \\ -.735178956 \end{pmatrix}$$

Page 24: No quiz this week.

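5. Derive the new data set.

TransformedData = RowFeatureVector × RowDataAdjust

where RowFeatureVector is the feature vector transposed, so that each row is an eigenvector, and RowDataAdjust = transpose of mean-adjusted data

This gives original data in terms of chosen components (eigenvectors), that is, along these axes.

$$\text{RowFeatureVector1} = \begin{pmatrix} -.677873399 & -.735178956 \\ -.735178956 & .677873399 \end{pmatrix}$$

$$\text{RowFeatureVector2} = \begin{pmatrix} -.677873399 & -.735178956 \end{pmatrix}$$

$$\text{RowDataAdjust} = \begin{pmatrix} .69 & -1.31 & .39 & .09 & 1.29 & .49 & .19 & -.81 & -.31 & -.71 \\ .49 & -1.21 & .99 & .29 & 1.09 & .79 & -.31 & -.81 & -.31 & -1.01 \end{pmatrix}$$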
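The projection in step 5 is a single matrix product. The sketch below again assumes the ten points from Smith's tutorial; the variable names mirror the slide's terms but are otherwise my own. After the transform, the variance along each new axis equals the corresponding eigenvalue.

```python
import numpy as np

# Ten 2-D points from Smith's tutorial (assumed to be the slide's example).
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
data_adjust = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data_adjust, rowvar=False))
order = np.argsort(eigenvalues)[::-1]          # highest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

row_feature_vector = eigenvectors.T            # each row is an eigenvector
row_data_adjust = data_adjust.T                # each column is an example

# Keep both components: data expressed along both new axes, shape (2, 10).
transformed = row_feature_vector @ row_data_adjust

# Keep only the first component: reduced to one dimension, shape (1, 10).
transformed_1d = row_feature_vector[:1] @ row_data_adjust

print(transformed.shape, transformed_1d.shape)
print(np.var(transformed, axis=1, ddof=1))     # matches the eigenvalues
```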

Page 25: No quiz this week.
Page 26: No quiz this week.

Intuition: We projected the data onto new axes that capture the strongest linear trends in the data set. Each transformed data point tells us how far it is above or below those trend lines.

Page 27: No quiz this week.
Page 28: No quiz this week.

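Reconstructing the original data

We did: TransformedData = RowFeatureVector × RowDataAdjust

so we can do

$$\text{RowDataAdjust} = \text{RowFeatureVector}^{-1} \times \text{TransformedData} = \text{RowFeatureVector}^{T} \times \text{TransformedData}$$

(The inverse equals the transpose because the rows of RowFeatureVector are orthonormal eigenvectors. If some components were dropped, the transpose form still applies but recovers the data only approximately.)

and RowDataOriginal = RowDataAdjust + OriginalMean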
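The reconstruction can be checked numerically. This sketch, again assuming the ten points from Smith's tutorial, recovers the data exactly when both components are kept and only approximately when one is dropped.

```python
import numpy as np

# Ten 2-D points from Smith's tutorial (assumed to be the slide's example).
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
mean = data.mean(axis=0)
data_adjust = data - mean

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data_adjust, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
row_feature_vector = eigenvectors.T            # each row is an eigenvector

# All components kept: the transpose acts as the inverse, so recovery is exact.
transformed = row_feature_vector @ data_adjust.T
restored = (row_feature_vector.T @ transformed).T + mean

# Only the first component kept: recovery is approximate.
reduced = row_feature_vector[:1]
approx = (reduced.T @ (reduced @ data_adjust.T)).T + mean

print(np.allclose(restored, data))
print(np.abs(approx - data).max())             # small but nonzero error
```

The information lost in the reduced case is exactly the variation along the dropped component.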

Page 29: No quiz this week.
Page 30: No quiz this week.

Textbook’s notation

• We have original data X and mean-subtracted data B, and covariance matrix C = cov(B), where C is an N×N matrix:

$$C = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^T$$

• We find matrix V such that the columns of V are the N eigenvectors of C and

$$D = V^{-1} C V = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \lambda_N \end{pmatrix}$$

where λi is the ith eigenvalue of C.

• Each eigenvalue in D corresponds to an eigenvector in V. The eigenvectors, sorted in order of decreasing eigenvalue, become the “feature vector” for PCA.
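The diagonalization D = V⁻¹ C V can be confirmed on random data. In this sketch the data, the sizes, and the variable names are invented; the covariance is formed as an average of outer products of the mean-subtracted rows of B, which is one reading of the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))     # hypothetical data: 50 examples, 3 attributes
B = X - X.mean(axis=0)           # mean-subtracted data

# Average of the outer products x_n x_n^T over the rows of B.
C = (B.T @ B) / len(B)

eigenvalues, V = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]      # decreasing eigenvalue
eigenvalues, V = eigenvalues[order], V[:, order]

D = np.linalg.inv(V) @ C @ V

print(np.allclose(D, np.diag(eigenvalues)))   # D is diagonal with the eigenvalues
print(np.allclose(np.linalg.inv(V), V.T))     # V is orthogonal, so its inverse is V^T
```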

Page 31: No quiz this week.

• With new data, compute TransformedData = RowFeatureVector RowDataAdjust

where RowDataAdjust = transpose of mean-adjusted data

Page 32: No quiz this week.

What you need to remember

• General idea of what PCA does

  – Finds new, rotated set of orthogonal axes that capture directions of largest variation

  – Allows some axes to be dropped, so data can be represented in lower-dimensional space.

  – This can improve classification performance and avoid overfitting due to large number of dimensions.

• You don’t need to remember details of PCA algorithm.

Page 33: No quiz this week.

Example: Linear discrimination using PCA for face recognition (“Eigenfaces”)

1. Preprocessing: “Normalize” faces

• Make images the same size

• Line up with respect to eyes

• Normalize intensities

Page 34: No quiz this week.
Page 35: No quiz this week.

2. Raw features are pixel intensity values (2061 features)

3. Each image is encoded as a vector Γi of these features

4. Compute “mean” face in training set:

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$$

Page 36: No quiz this week.

• Subtract the mean face from each face vector (Γi is the ith face vector, Ψ is the mean face):

$$\Phi_i = \Gamma_i - \Psi$$

• Compute the covariance matrix C

• Compute the (unit) eigenvectors vi of C

• Keep only the first K principal components (eigenvectors)

From W. Zhao et al., Discriminant analysis of principal components for face recognition.
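The eigenface steps above can be sketched with random vectors standing in for real face images. The sizes (20 images, 64 pixels, K = 5) and all variable names are assumptions for illustration; the slides use 2061 pixel features.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_pixels, K = 20, 64, 5            # hypothetical sizes
faces = rng.random((M, n_pixels))     # stand-in for M face vectors, one per row

mean_face = faces.mean(axis=0)        # the "mean" face
centered = faces - mean_face          # subtract the mean face from each face

C = np.cov(centered, rowvar=False)    # covariance matrix, n_pixels x n_pixels
eigenvalues, eigenvectors = np.linalg.eigh(C)

order = np.argsort(eigenvalues)[::-1][:K]   # keep the first K principal components
eigenfaces = eigenvectors[:, order]         # each column is one unit-length eigenface

# Each face is now described by K weights instead of n_pixels intensities.
weights = centered @ eigenfaces
print(eigenfaces.shape, weights.shape)
```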

Page 37: No quiz this week.
Page 38: No quiz this week.

Interpreting and Using Eigenfaces

The eigenfaces encode the principal sources of variation in the dataset (e.g., absence/presence of facial hair, skin tone, glasses, etc.).

We can represent any face as a linear combination of these “basis” faces.

Use this representation for:

• Face recognition (e.g., Euclidean distance from known faces)

• Linear discrimination (e.g., “glasses” versus “no glasses”, or “male” versus “female”)
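Recognition by Euclidean distance in eigenface space could look roughly like this. The helpers `fit_eigenfaces`, `project`, and `recognize` are hypothetical names, and random vectors again stand in for real face images.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """Return the mean face and the top-k eigenfaces (as columns)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    _, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
    eigenfaces = eigenvectors[:, ::-1][:, :k]   # eigh is ascending, so reverse
    return mean_face, eigenfaces

def project(face, mean_face, eigenfaces):
    """Weights of a face (or a stack of faces) on the eigenface basis."""
    return (face - mean_face) @ eigenfaces

def recognize(probe, known_weights, mean_face, eigenfaces):
    """Index of the known face closest to the probe in eigenface space."""
    w = project(probe, mean_face, eigenfaces)
    distances = np.linalg.norm(known_weights - w, axis=1)
    return int(np.argmin(distances))

rng = np.random.default_rng(0)
faces = rng.random((20, 64))                    # stand-in training faces
mean_face, eigenfaces = fit_eigenfaces(faces, k=5)
known_weights = project(faces, mean_face, eigenfaces)

# A slightly noisy copy of training face 7 plays the role of a new photo.
probe = faces[7] + 0.001 * rng.standard_normal(64)
print(recognize(probe, known_weights, mean_face, eigenfaces))
```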

Page 39: No quiz this week.

Eigenfaces Demo

• http://demonstrations.wolfram.com/FaceRecognitionUsingTheEigenfaceAlgorithm/

Page 40: No quiz this week.

Kernel PCA

• PCA: Assumes directions of variation are all straight lines

• Kernel PCA: Maps data to higher dimensional space,

Page 41: No quiz this week.

[Figure, from Wikipedia: original data, and the same data after kernel PCA]

Page 42: No quiz this week.

Kernel PCA

• Use Φ(x) and kernel matrix Kij = Φ(xi) · Φ(xj) to compute PCA transform.

(Optional: See 10.2.2 in textbook, though it might be a bit confusing. Also see “Kernel Principal Components Analysis” by Schölkopf et al., linked to the class website.)
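A minimal kernel PCA sketch follows. It assumes an RBF (Gaussian) kernel, which the slides do not specify, and the function name, the `gamma` value, and the two-circle data are all chosen for illustration. The kernel matrix is centered in feature space before its eigenvectors are computed, so Φ(x) never has to be evaluated explicitly.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """Project the training points onto the top kernel principal components."""
    # Kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)  (assumed RBF kernel).
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Projection of training point i on component k is sqrt(lambda_k) * alpha_k[i].
    return eigenvectors * np.sqrt(np.maximum(eigenvalues, 0.0))

# Two concentric circles: structure that straight-line PCA axes cannot capture.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = rbf_kernel_pca(X, gamma=0.5, n_components=2)
print(Z.shape)
```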

Page 43: No quiz this week.

Kernel Eigenfaces

(Yang et al., Face Recognition Using Kernel Eigenfaces, 2000)

Training data: ~ 400 images, 40 subjects

Original features: 644 pixel gray-scale values.

Transform data using kernel PCA, reduce dimensionality to the number of components giving lowest error.

Test: new photo of one of the subjects

Recognition done using nearest neighbor classification