Environmental Data Analysis with MatLab

Environmental Data Analysis with MatLab Lecture 15: Factor Analysis

Page 1: Environmental Data Analysis with  MatLab

Environmental Data Analysis with MatLab

Lecture 15:

Factor Analysis

Page 2: Environmental Data Analysis with  MatLab

SYLLABUS

Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

Page 3: Environmental Data Analysis with  MatLab

purpose of the lecture

introduce Factor Analysis, a method of detecting patterns in data

Page 4: Environmental Data Analysis with  MatLab

example:

sediment samples are a mix of several sources

[Figure: ocean sediment fed by source A and source B, with samples s1 through s5]

Page 5: Environmental Data Analysis with  MatLab

[Figure: ocean sediment samples s1 and s2, each described by elements e1 through e5]

what does the composition of the samples tell you about the composition of the sources?

Page 6: Environmental Data Analysis with  MatLab

another example

Atlantic Rock Dataset: chemical composition for several thousand rocks

Page 7: Environmental Data Analysis with  MatLab

Rocks are a mix of minerals, and minerals have a well-defined composition

[Figure: rocks 1 through 7 drawn as mixes of minerals 1, 2 and 3]

Page 8: Environmental Data Analysis with  MatLab

Which is simpler?

rocks have a chemical composition

or

rocks contain minerals and minerals have chemical compositions

Page 9: Environmental Data Analysis with  MatLab

answer will depend on how many minerals are involved

and how many elements are in each mineral

Page 10: Environmental Data Analysis with  MatLab

representing mixing with matrices

Page 11: Environmental Data Analysis with  MatLab

the sample matrix, S: N samples by M elements

e.g. sediment samples, rock samples

the word “element” is used in the abstract sense and may not refer to actual chemical elements

Page 12: Environmental Data Analysis with  MatLab

the factor matrix, F: P factors by M elements

e.g. sediment sources, minerals

note that there are P factors: a simplification if P<M

Page 13: Environmental Data Analysis with  MatLab

the loading matrix, C: N samples by P factors

specifies the mix of factors for each sample

Page 14: Environmental Data Analysis with  MatLab

summary

samples contain factors

factors contain elements
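
The three matrices can be tied together in a small MatLab sketch. The numbers are made up for illustration (two hypothetical factors, three elements, four samples); the point is only that the N×M sample matrix is the product of the N×P loading matrix and the P×M factor matrix, S = CF.

```matlab
% hypothetical factor matrix, F: P=2 factors (rows) by M=3 elements (columns)
F = [0.7, 0.2, 0.1;    % composition of factor 1
     0.1, 0.3, 0.6];   % composition of factor 2

% hypothetical loading matrix, C: N=4 samples (rows) by P=2 factors (columns)
C = [1.0, 0.0;         % sample 1 is pure factor 1
     0.5, 0.5;         % sample 2 is an even mix of the two factors
     0.2, 0.8;         % sample 3 is mostly factor 2
     0.0, 1.0];        % sample 4 is pure factor 2

% sample matrix, S: N=4 samples by M=3 elements
S = C*F;
disp(S);
```

Each row of S is then the element composition of one sample, built as a weighted sum of the rows of F.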

Page 15: Environmental Data Analysis with  MatLab

an important issue

how many factors are needed to represent the samples?

need at most P=M, but is P<M?

Page 16: Environmental Data Analysis with  MatLab

simple example using ternary diagrams

Page 17: Environmental Data Analysis with  MatLab

[Figure: ternary diagram with one element at each vertex, showing the samples]

Page 18: Environmental Data Analysis with  MatLab

[Figure: ternary diagram with one element at each vertex, in which the samples fall along a line]

line of samples implies only 2 factors, so P=2

Page 19: Environmental Data Analysis with  MatLab

[Figure: ternary diagram with one element at each vertex, showing the samples and the factors]

Page 20: Environmental Data Analysis with  MatLab

data do not uniquely determine factors

[Figure: two ternary diagrams, one showing factors f1 and f2 and the other showing factors f’1 and f’2]

A) two bracketing factors
B) most typical factor and deviation from it

Page 21: Environmental Data Analysis with  MatLab

mathematically

$S = CF = C'F'$ with $F' = MF$ and $C' = CM^{-1}$, where $M$ is any $P \times P$ matrix with an inverse

must rely on prior information to choose M
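
The non-uniqueness can be checked numerically. The sketch below uses made-up matrices C and F and an arbitrary invertible 2×2 matrix, here named `Mmix` (a hypothetical name, chosen so it is not confused with the number of elements, M); it is an illustration under those assumptions, not part of the lecture's own code.

```matlab
% hypothetical factors (P=2 by M=3) and loadings (N=4 by P=2)
F = [0.7, 0.2, 0.1;
     0.1, 0.3, 0.6];
C = [1.0, 0.0;
     0.5, 0.5;
     0.2, 0.8;
     0.0, 1.0];
S = C*F;

% any invertible P-by-P matrix will do; this one is arbitrary
Mmix = [1,  1;
        1, -1];

Fp = Mmix*F;     % F' = M F
Cp = C/Mmix;     % C' = C M^{-1}, computed without explicitly forming inv(Mmix)

% largest discrepancy between C*F and C'*F'; expected to be at rounding level
err = max(max(abs(S - Cp*Fp)));
disp(err);
```

Both (C, F) and (C', F') reproduce the same samples, which is why prior information is needed to prefer one over the other.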

Page 22: Environmental Data Analysis with  MatLab

a method to determine

the minimum number of factors, P, and

one possible set of factors

Page 23: Environmental Data Analysis with  MatLab

a digression, but an important one

suppose that we have an N×N square matrix, M, and we experiment with it by multiplying “input” vectors, v, by it to create “output” vectors, w

w = Mv

Page 24: Environmental Data Analysis with  MatLab

surprisingly, the answer to the question

when is the output parallel to the input ?

tells us everything about the matrix

Page 25: Environmental Data Analysis with  MatLab

if w is parallel to v, then w = λv

where λ is a proportionality factor

the equation w = Mv is then λv = Mv, or (M - λI)v = 0

Page 26: Environmental Data Analysis with  MatLab

but if (M - λI)v = 0, then it would seem that

$v = (M - \lambda I)^{-1}\,0 = 0$

which is not a very interesting solution: w is parallel to v when v is zero

Page 27: Environmental Data Analysis with  MatLab

to make an interesting solution you must choose λ so that

$(M - \lambda I)^{-1}$ doesn’t exist

which is equivalent to choosing λ so that

det(M - λ I)=0

Page 28: Environmental Data Analysis with  MatLab

since a matrix with zero determinant has no inverse

Page 29: Environmental Data Analysis with  MatLab

in the 2×2 case …

this is a quadratic equation in λ, and so has two solutions, $\lambda_1$ and $\lambda_2$
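
For reference, the 2×2 determinant can be written out explicitly. The element names $M_{11}$, $M_{12}$, $M_{21}$, $M_{22}$ are introduced here only for illustration:

```latex
\det(M - \lambda I)
  = \det\begin{bmatrix} M_{11}-\lambda & M_{12} \\ M_{21} & M_{22}-\lambda \end{bmatrix}
  = (M_{11}-\lambda)(M_{22}-\lambda) - M_{12}M_{21} = 0

\lambda^2 - (M_{11}+M_{22})\,\lambda + (M_{11}M_{22} - M_{12}M_{21}) = 0

\lambda_{1,2} = \tfrac{1}{2}\left[ (M_{11}+M_{22}) \pm \sqrt{(M_{11}-M_{22})^2 + 4\,M_{12}M_{21}} \right]
```

When M is symmetric, $M_{12}M_{21} = M_{12}^2 \ge 0$, so the quantity under the square root is never negative and both solutions are real.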

Page 30: Environmental Data Analysis with  MatLab

in the N×N case

det(M - λ I)=0

is an N-th order polynomial equation, and so has N solutions, $\lambda_1, \lambda_2, \ldots, \lambda_N$

each corresponds to a different v: $v^{(1)}, v^{(2)}, \ldots, v^{(N)}$

Page 31: Environmental Data Analysis with  MatLab

the N solutions, $\lambda_1, \lambda_2, \ldots, \lambda_N$, are called “eigenvalues”

the corresponding vectors, $v^{(1)}, v^{(2)}, \ldots, v^{(N)}$, are called “eigenvectors”

Page 32: Environmental Data Analysis with  MatLab

N×N matrix, M: w = Mv

when is the output parallel to the input?

N different cases

$M v^{(1)} = \lambda_1 v^{(1)}$
$M v^{(2)} = \lambda_2 v^{(2)}$
…
$M v^{(N)} = \lambda_N v^{(N)}$

Page 33: Environmental Data Analysis with  MatLab

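$M v^{(1)} = \lambda_1 v^{(1)}$, $M v^{(2)} = \lambda_2 v^{(2)}$, …, $M v^{(N)} = \lambda_N v^{(N)}$

simplify notation: $MV = V\Lambda$

where the columns of V are the eigenvectors and Λ is the diagonal matrix of eigenvalues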
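
A quick numerical check of MV = VΛ, sketched with MatLab's `eig()` on a small matrix whose values are chosen arbitrarily for illustration:

```matlab
% small example matrix (values chosen arbitrarily; it happens to be symmetric)
M = [2, 1;
     1, 2];

% columns of V are the eigenvectors; LAMBDA is diagonal and holds the eigenvalues
[V, LAMBDA] = eig(M);
lambda = diag(LAMBDA);

% M*V and V*LAMBDA should agree to within rounding error
err = max(max(abs(M*V - V*LAMBDA)));
disp(lambda);
disp(err);
```

For this particular matrix the characteristic equation is $(2-\lambda)^2 - 1 = 0$, so the eigenvalues are 1 and 3.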

Page 34: Environmental Data Analysis with  MatLab

In the text it’s shown that if M is symmetric

then

all λ’s are real

v’s are orthonormal

$v^{(i)T} v^{(j)} = 1$ if $i = j$, and $0$ if $i \neq j$

Page 35: Environmental Data Analysis with  MatLab

this implies $V^T V = V V^T = I$

Page 36: Environmental Data Analysis with  MatLab

$MV = V\Lambda$

post-multiply by $V^T$

$M = V \Lambda V^T$

M can be constructed from V and Λ, so the answer to

when is the output parallel to the input?

tells you everything about M
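
The orthonormality of the eigenvectors and the reconstruction of M can both be verified in a few lines. This is a sketch on an arbitrary symmetric 3×3 matrix, not an example taken from the lecture:

```matlab
% arbitrary symmetric example matrix
M = [4, 1, 2;
     1, 3, 0;
     2, 0, 5];

[V, LAMBDA] = eig(M);

% orthonormal eigenvectors: V'*V should be the identity matrix
errI = max(max(abs(V'*V - eye(3))));

% reconstruction: M should equal V*LAMBDA*V'
errM = max(max(abs(M - V*LAMBDA*V')));

disp([errI, errM]);
```

Both discrepancies should be at the level of rounding error.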

Page 37: Environmental Data Analysis with  MatLab

now here’s what this has to do with factors

Page 38: Environmental Data Analysis with  MatLab

suppose S is square and symmetric

then

$S = CF = V \Lambda V^T$

Page 39: Environmental Data Analysis with  MatLab

with $C = V\Lambda$ and $F = V^T$

Page 40: Environmental Data Analysis with  MatLab

S can be represented by M mutually-perpendicular factors, F

Page 41: Environmental Data Analysis with  MatLab

furthermore, suppose that only P eigenvalues are nonzero

the eigenvectors with zero eigenvalues can be thrown out of the equation

Page 42: Environmental Data Analysis with  MatLab

we can reduce the number of factors from M to P

$S = CF = V_P \Lambda_P V_P^T$, with $C = V_P \Lambda_P$ and $F = V_P^T$

S can be represented by P mutually-perpendicular factors, $F_P$
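
A sketch of this reduction for a small symmetric matrix that has one zero eigenvalue by construction. The matrix, the variable names and the cutoff used to decide that an eigenvalue counts as zero are all assumptions made for illustration:

```matlab
% symmetric 3-by-3 matrix built from two vectors, so one eigenvalue is zero
S = [1, 1, 0;
     1, 2, 1;
     0, 1, 1];

[V, LAMBDA] = eig(S);
lambda = diag(LAMBDA);

% treat eigenvalues below a small relative cutoff as zero (the cutoff is a choice)
keep = abs(lambda) > 1e-10*max(abs(lambda));
P = sum(keep);

VP = V(:, keep);                 % the P retained eigenvectors
LAMBDAP = diag(lambda(keep));    % the P nonzero eigenvalues

C = VP*LAMBDAP;                  % loadings
F = VP';                         % P mutually perpendicular factors (rows)

% S should still be reproduced with only P factors
err = max(max(abs(S - C*F)));
disp(P);
disp(err);
```

Here the eigenvalues are 0, 1 and 3, so two factors suffice.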

Page 43: Environmental Data Analysis with  MatLab

unfortunately …

S is usually neither square nor symmetric

so a patch in the methodology is needed

Page 44: Environmental Data Analysis with  MatLab

the trick …

$S^T S$ is an M×M square (and symmetric) matrix

Page 45: Environmental Data Analysis with  MatLab

suppose

$S^T S$ has eigenvalues $\Lambda_P$ and eigenvectors $V_P$

Page 46: Environmental Data Analysis with  MatLab

$S^T S$ written in terms of its eigenvalues and eigenvectors

Page 47: Environmental Data Analysis with  MatLab

write $\Lambda_P$ as product of its square roots

Page 48: Environmental Data Analysis with  MatLab

insert identity matrix, I

Page 49: Environmental Data Analysis with  MatLab

write $I = U_P^T U_P$, with $U_P$ as yet unknown

Page 50: Environmental Data Analysis with  MatLab

group and write first group as transpose of transpose

Page 51: Environmental Data Analysis with  MatLab

compare the result with $S^T S$
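
Written out, the sequence of steps might read as follows. This is a reconstruction from the step descriptions, using the symbols $V_P$, $\Lambda_P$ and $U_P$ as they appear in the text; the shorthand $\Sigma_P = \Lambda_P^{1/2}$ is the conventional one and is assumed here.

```latex
\begin{aligned}
S^T S &= V_P \Lambda_P V_P^T
  && \text{eigenvalues and eigenvectors} \\
&= V_P \Lambda_P^{1/2} \Lambda_P^{1/2} V_P^T
  && \text{product of square roots} \\
&= V_P \Lambda_P^{1/2} \, I \, \Lambda_P^{1/2} V_P^T
  && \text{insert identity matrix} \\
&= V_P \Lambda_P^{1/2} U_P^T U_P \Lambda_P^{1/2} V_P^T
  && I = U_P^T U_P \\
&= \left( U_P \Lambda_P^{1/2} V_P^T \right)^T \left( U_P \Lambda_P^{1/2} V_P^T \right)
  && \text{transpose of transpose}
\end{aligned}
```

Comparing the last line with $S^T S$ suggests $S = U_P \Lambda_P^{1/2} V_P^T = U_P \Sigma_P V_P^T$. Post-multiplying by $V_P \Sigma_P^{-1}$ and using $V_P^T V_P = I$ then gives the so-far unknown matrix, $U_P = S V_P \Sigma_P^{-1}$. The square roots are real because the nonzero eigenvalues of $S^T S$ are positive.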

Page 52: Environmental Data Analysis with  MatLab

so

Page 53: Environmental Data Analysis with  MatLab

and

so

Page 54: Environmental Data Analysis with  MatLab

and

so

this is called the “singular value decomposition” of S

the diagonal elements of Σ are called the “singular values”

now the non-square, non-symmetric matrix, S, is represented as a mix of P mutually perpendicular factors

Page 55: Environmental Data Analysis with  MatLab

the matrix of loadings, C

the matrix of factors, F

since C depends on Σ, the samples contain more of the factors with large singular values than of the factors with small singular values

Page 56: Environmental Data Analysis with  MatLab

in MatLab

svd() computes all M factors (you must decide how many to use)
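
A minimal sketch of the call, on a made-up 4×3 sample matrix. The identification of the loadings as C = UΣ and the factors as F = Vᵀ is an assumption here, chosen to be consistent with the statement that C depends on Σ; the variable names are illustrative.

```matlab
% hypothetical sample matrix: N=4 samples by M=3 elements
S = [0.70, 0.20, 0.10;
     0.40, 0.25, 0.35;
     0.22, 0.28, 0.50;
     0.10, 0.30, 0.60];

% economy-size decomposition: U is N-by-M, SIGMA and V are M-by-M
[U, SIGMA, V] = svd(S, 0);

sv = diag(SIGMA);    % singular values, returned in decreasing order
C = U*SIGMA;         % loadings: N samples by M factors (assumed definition)
F = V';              % factors: M factors by M elements (assumed definition)

% using all M factors, S should be recovered to within rounding error
err = max(max(abs(S - C*F)));
disp(sv');
disp(err);
```

With all M factors retained nothing has been simplified yet; the simplification comes from keeping only the factors with large singular values.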

Page 57: Environmental Data Analysis with  MatLab

[Figure: singular values, s(i), plotted against index, i, for i = 1 through 8; vertical axis from 0 to 5000]

singular values of the Atlantic Rock dataset (sorted into order of size)

Page 58: Environmental Data Analysis with  MatLab

[Figure: singular values of the Atlantic Rock dataset, s(i), plotted against index, i, with the smallest values annotated]

discard, since close to zero
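
One way to automate the decision is to compare each singular value with a cutoff. The sketch below uses synthetic samples that are exact mixes of two made-up factors, so only two singular values are nonzero; the relative cutoff of 1e-6 is a hypothetical choice, and with real data deciding what counts as close to zero remains a judgment call.

```matlab
% synthetic samples: exact mixes of two hypothetical factors
F0 = [0.7, 0.2, 0.1;
      0.1, 0.3, 0.6];
C0 = [1.0, 0.0;
      0.5, 0.5;
      0.2, 0.8;
      0.0, 1.0];
S = C0*F0;                        % N=4 samples by M=3 elements

[U, SIGMA, V] = svd(S, 0);
sv = diag(SIGMA);                 % sorted from largest to smallest

tol = 1e-6*sv(1);                 % hypothetical cutoff, relative to the largest value
P = sum(sv > tol);                % number of factors to keep

CP = U(:, 1:P)*SIGMA(1:P, 1:P);   % loadings for the P retained factors
FP = V(:, 1:P)';                  % the P retained factors

% samples rebuilt from only P factors
err = max(max(abs(S - CP*FP)));
disp(P);
disp(err);
```

Note that the retained factors FP need not equal the factors F0 used to build the samples, since the data do not uniquely determine the factors.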

Page 59: Environmental Data Analysis with  MatLab

factors of the Atlantic Rock dataset

Page 60: Environmental Data Analysis with  MatLab

factor of the Atlantic Rock dataset

factor 1 is the “typical factor”

Page 61: Environmental Data Analysis with  MatLab

factor of the Atlantic Rock dataset

factor 2: as MgO increases, Al2O3 and CaO decrease

Page 62: Environmental Data Analysis with  MatLab

factor of the Atlantic Rock dataset

factor 3: as Al2O3 increases, FeO and CaO increase

Page 63: Environmental Data Analysis with  MatLab

graphical representation of factors 2 through 5

[Figure: factors f2, f3, f4 and f5, each plotted against the elements SiO2, TiO2, Al2O3, FeOtotal, MgO, CaO, Na2O and K2O]

Page 64: Environmental Data Analysis with  MatLab

factor loadings C2 through C4 plotted in 3D

[Figure: three-dimensional scatter plot with axes C2, C3 and C4]

factors 2 through 4 capture most of the variability of the rocks

Page 65: Environmental Data Analysis with  MatLab

[Figure: four panels, A) through D), with labels Al2O3, TiO2, SiO2, K2O, FeO and MgO]