Covariance Matrix Applications


Page 1: Covariance Matrix  Applications

Covariance Matrix Applications

Dimensionality Reduction

Page 2: Covariance Matrix  Applications

Outline

• What is the covariance matrix?

• Example

• Properties of the covariance matrix

• Spectral Decomposition – Principal Component Analysis

Page 3: Covariance Matrix  Applications

Covariance Matrix

• Covariance matrix captures the variance and linear correlation in multivariate/multidimensional data.

• If the data is an N x D matrix, the covariance matrix is a D x D square matrix.

• Think of N as the number of data instances (rows) and D as the number of attributes (columns).

Page 4: Covariance Matrix  Applications

Covariance Formula

• Let Data = N x D matrix.

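• The Cov(Data) is the d x d matrix (d = D, with $\mu_i = E(X_i)$ for attribute $X_i$):

$$
\mathrm{Cov}(\mathrm{Data}) =
\begin{pmatrix}
E[(X_1-\mu_1)(X_1-\mu_1)] & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_d-\mu_d)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(X_d-\mu_d)(X_1-\mu_1)] & E[(X_d-\mu_d)(X_2-\mu_2)] & \cdots & E[(X_d-\mu_d)(X_d-\mu_d)]
\end{pmatrix}
$$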

Page 5: Covariance Matrix  Applications

Example

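$$
R =
\begin{pmatrix}
1 & 2 & 3 \\
2 & 4 & 1 \\
3 & 1 & 1 \\
4 & 1 & 2
\end{pmatrix}
$$

$$
\mathrm{COV}(R) =
\begin{pmatrix}
1.67 & -1 & -0.5 \\
-1 & 2 & -0.33 \\
-0.5 & -0.33 & 0.9167
\end{pmatrix}
$$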
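The example can be checked numerically. The following is a minimal sketch, assuming R is the 4 x 3 matrix with rows (1, 2, 3), (2, 4, 1), (3, 1, 1), (4, 1, 2) and that the sample covariance (dividing by N - 1) is intended; the function name `covariance_matrix` is made up for illustration.

```python
import numpy as np

# Example data: 4 instances (rows) x 3 attributes (columns).
R = np.array([[1, 2, 3],
              [2, 4, 1],
              [3, 1, 1],
              [4, 1, 2]], dtype=float)

def covariance_matrix(data):
    """Sample covariance (divides by N - 1) of an N x D data matrix."""
    centered = data - data.mean(axis=0)   # subtract each column's mean
    return centered.T @ centered / (data.shape[0] - 1)

cov_R = covariance_matrix(R)
print(np.round(cov_R, 4))
```

The diagonal holds the variance of each attribute; the off-diagonal entries hold the pairwise covariances, so the result is a 3 x 3 symmetric matrix.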

Page 6: Covariance Matrix  Applications

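[Figure: three scatter plots, each shown with its 2 x 2 covariance matrix]

$$
\begin{pmatrix} 0.07 & 0.0 \\ 0.0 & 0.68 \end{pmatrix}
\qquad
\begin{pmatrix} 0.07 & 0.06 \\ 0.06 & 0.078 \end{pmatrix}
\qquad
\begin{pmatrix} 0.07 & 0.008 \\ 0.008 & 0.087 \end{pmatrix}
$$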

Moral: Covariance can only capture linear relationships
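A small sketch of this point, using made-up data rather than anything from the slides: y = x^2 is completely determined by x, yet on values of x that are symmetric around zero the covariance between them is essentially zero.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # symmetric around zero
y_linear = 2.0 * x                # linear dependence on x
y_quadratic = x ** 2              # deterministic, but nonlinear

# np.cov returns the 2 x 2 covariance matrix; [0, 1] is Cov(x, y).
cov_linear = np.cov(x, y_linear)[0, 1]
cov_quadratic = np.cov(x, y_quadratic)[0, 1]

print("linear:   ", cov_linear)
print("quadratic:", cov_quadratic)
```

The linear case gives a clearly positive covariance, while the quadratic case gives a value that is zero up to floating-point error, even though y depends entirely on x.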

Page 7: Covariance Matrix  Applications

Dimensionality Reduction

• If you work in “data analytics” it is common these days to be handed a data set which has lots of variables (dimensions).

• The information in these variables is often redundant – there are only a few sources of genuine information.

• Question: How can we identify these sources automatically?

Page 8: Covariance Matrix  Applications

Hidden Sources of Variance

[Figure: a data table with columns X1, X2, X3, X4, from which two hidden sources H1 and H2 are derived]

Model: Hidden Sources are Linear Combinations of Original Variables

Page 9: Covariance Matrix  Applications

Hidden Sources

• If each known variable provided different (non-redundant) information, the covariance matrix between the variables would be a diagonal matrix, i.e., the non-zero entries would appear only on the diagonal.

• In particular, if $H_i$ and $H_j$ are independent then $E[(H_i-\mu_i)(H_j-\mu_j)] = 0$.

Page 10: Covariance Matrix  Applications

Hidden Sources

• So the question is: what should the hidden sources be?

• It turns out that the “best” hidden sources are the eigenvectors of the covariance matrix.

• If A is a d x d matrix, then <λ, x> is an eigenvalue-eigenvector pair if

• Ax = λx
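The definition can be checked directly. This is a minimal sketch on a made-up 2 x 2 symmetric matrix (not one from the slides), using NumPy's `numpy.linalg.eigh`, which is meant for symmetric matrices such as covariance matrices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric d x d matrix with d = 2

# eigh returns the eigenvalues in ascending order and the
# matching eigenvectors as the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Each pair must satisfy A x = lambda x.
    assert np.allclose(A @ x, lam * x)
    print(lam, x)
```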

Page 11: Covariance Matrix  Applications

Explanation

We have two axes, X1 and X2. We want to project the data along the direction of maximum variance.
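One way to sketch this projection, assuming the direction of maximum variance is taken to be the eigenvector of the covariance matrix with the largest eigenvalue. The 2-D data below is hypothetical, generated only for illustration.

```python
import numpy as np

# Hypothetical 2-D data: points spread mostly along the diagonal X1 = X2.
t = np.linspace(-3.0, 3.0, 61)
wiggle = 0.3 * np.sin(5.0 * t)
data = np.column_stack([t + wiggle, t - wiggle])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last column
# of `eigenvectors` belongs to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
direction = eigenvectors[:, -1]

# Project every point onto that single direction (2-D -> 1-D).
projected = centered @ direction

print("direction:", direction)
print("variance along it:", projected.var(ddof=1))
```

The variance of the projected data equals the largest eigenvalue, and projecting onto the other eigenvector gives the smaller one.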

Page 12: Covariance Matrix  Applications

Covariance Matrix Properties

• The Covariance matrix is symmetric.

• Non-negative eigenvalues: $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$

• Corresponding eigenvectors: $u_1, u_2, \ldots, u_d$
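A short derivation of why these properties hold. The symbols $\Sigma$ (the covariance matrix), $X$ (the vector of attributes) and $\mu$ (its mean) are notation assumed here, not taken from the slides.

```latex
% Symmetry: the product inside the expectation commutes.
\Sigma_{ij} = E[(X_i-\mu_i)(X_j-\mu_j)] = E[(X_j-\mu_j)(X_i-\mu_i)] = \Sigma_{ji}

% Positive semi-definiteness: a quadratic form in \Sigma is a variance.
a^{t}\Sigma a = E\big[(a^{t}(X-\mu))^{2}\big] = \mathrm{Var}(a^{t}X) \ge 0
\quad \text{for every } a \in \mathbb{R}^{d}

% Apply this to a unit-length eigenvector u_i with \Sigma u_i = \lambda_i u_i.
u_i^{t}\Sigma u_i = \lambda_i\, u_i^{t}u_i = \lambda_i \ge 0
```

The last line also shows that each eigenvalue is the variance of the data projected onto its eigenvector.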

Page 13: Covariance Matrix  Applications

Principal Component Analysis

• Also known as (or closely related to)
– Singular Value Decomposition
– Latent Semantic Indexing

• Technique for data reduction. Essentially, reduce the number of columns while losing minimal information.

• Also think in terms of lossy compression.

Page 14: Covariance Matrix  Applications

Motivation

• Bulk of data has a time component

• For example, retail transactions, stock prices

• Data set can be organized as N x M table

• N customers and the price of the calls they made in 365 days

• M << N

Page 15: Covariance Matrix  Applications

Objective

• Compress the data matrix X into Xc, such that
– The compression ratio is high and the average error between the original and the compressed matrix is low
– N could be in the order of millions and M in the order of hundreds

Page 16: Covariance Matrix  Applications

Example database

Customer   Wed 7/10   Thu 7/11   Fri 7/12   Sat 7/13   Sun 7/14
ABC        1          1          1          0          0
DEF        2          2          2          0          0
GHI        1          1          1          0          0
KLM        5          5          5          0          0
smith      0          0          0          2          2
john       0          0          0          3          3
tom        0          0          0          1          1

Page 17: Covariance Matrix  Applications

Decision Support Queries

• What was the amount of sales to GHI on July 11?

• Find the total sales to business customers for the week ending July 12th?
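Against the uncompressed example database, both queries are simple lookups. The sketch below makes two assumptions that the slides do not state: the upper-case names (ABC, DEF, GHI, KLM) are the business customers, and "the week ending July 12th" covers the days in the table up to and including 7/12. The helper `sales_on` is a made-up name.

```python
import numpy as np

customers = ["ABC", "DEF", "GHI", "KLM", "smith", "john", "tom"]
dates = ["7/10", "7/11", "7/12", "7/13", "7/14"]
X = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

def sales_on(customer, date):
    """Single-cell query: sales to one customer on one day."""
    return X[customers.index(customer), dates.index(date)]

# Assumption: upper-case names denote business customers.
business_rows = [i for i, name in enumerate(customers) if name.isupper()]
# Assumption: the week ending 7/12 means the table's days up to 7/12.
week_cols = [dates.index(d) for d in ("7/10", "7/11", "7/12")]

ghi_july_11 = sales_on("GHI", "7/11")
business_total = X[np.ix_(business_rows, week_cols)].sum()

print(ghi_july_11, business_total)
```

The first query touches one cell and the second aggregates a block of cells; a compressed matrix has to answer both kinds with low error.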

Page 18: Covariance Matrix  Applications

Intuition behind SVD

[Figure: customers plotted as points on axes x and y, with a second pair of axes x' and y']

Customers are 2-D points

Page 19: Covariance Matrix  Applications

SVD Definition

• An N x M matrix X can be expressed as

$$X = U \Lambda V^{t}$$

where $\Lambda$ (Lambda) is a diagonal r x r matrix, U is N x r and V is M x r.

Page 20: Covariance Matrix  Applications

SVD Definition

• More importantly X can be written as

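$$X = \lambda_1 u_1 v_1^{t} + \lambda_2 u_2 v_2^{t} + \cdots + \lambda_r u_r v_r^{t}$$

where the eigenvalues $\lambda_i$ are in decreasing order.

$$X_c = \lambda_1 u_1 v_1^{t} + \lambda_2 u_2 v_2^{t} + \cdots + \lambda_k u_k v_k^{t}, \qquad k < r$$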

Page 21: Covariance Matrix  Applications

Example

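$$
X =
\begin{pmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{pmatrix}
\begin{pmatrix}
9.64 & 0 \\
0 & 5.29
\end{pmatrix}
\begin{pmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{pmatrix}
$$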
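The factors of this example can be reproduced with NumPy. This is a sketch, assuming X is the 7 x 5 example database (customers by days); `numpy.linalg.svd` may return singular vectors with flipped signs, which leaves the product unchanged.

```python
import numpy as np

# Example database: 7 customers (rows) x 5 days (columns).
X = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Thin SVD: U is 7 x 5, s holds 5 values in decreasing order, Vt is 5 x 5.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the numerically non-zero values; their count is the rank r.
r = int(np.sum(s > 1e-10))
U_r, lam, Vt_r = U[:, :r], s[:r], Vt[:r, :]

print("r =", r)
print("diagonal of Lambda:", np.round(lam, 2))
print("reconstructs X:", np.allclose(U_r @ np.diag(lam) @ Vt_r, X))
```

Only two values are non-zero because the table has two independent patterns: customers who buy on weekdays and customers who buy on the weekend.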

Page 22: Covariance Matrix  Applications

Compression

$$X = \sum_{i=1}^{r} \lambda_i u_i v_i^{t}$$

$$X_c = \sum_{i=1}^{k} \lambda_i u_i v_i^{t}$$

where $k \le r \le M$.
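The truncated sum can be sketched as follows, assuming X is the 7 x 5 example database and k = 1. The function name `compress` and the storage count (one u vector, one v vector and one value per kept term) are illustrative choices, not something the slides specify.

```python
import numpy as np

def compress(X, k):
    """Keep the k largest terms lambda_i * u_i * v_i^t of the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k columns of U by their values, then multiply by V^t.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

X = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

k = 1
X_c = compress(X, k)

n, m = X.shape
stored = k * (n + m + 1)            # numbers kept: u_i, v_i and lambda_i
error = np.linalg.norm(X - X_c)     # Frobenius norm of what was dropped

print(np.round(X_c, 2))
print("stored", stored, "numbers instead of", X.size, "error", round(error, 2))
```

With k = 1 the weekday block is kept exactly and the weekend block is lost, so the error equals the dropped value; with k = 2 the matrix is recovered exactly.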

Page 23: Covariance Matrix  Applications
Page 24: Covariance Matrix  Applications