PCA – Examples & Applications (transcript, 42 slides)

Page 1:

PCA – Examples & Applications

➢ Objectives:

Showcase PCA analysis – in PC-ORD and the literature

Page 2:

Principal Components (PCA) – PC-ORD

➢ Results: Randomization tests

Page 3:

Principal Components (PCA) – PC-ORD

➢ Results: Species Loadings onto the PC Axes

➢ Use the scaled eigenvectors
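
A minimal sketch of what "scaled eigenvectors" means here, written in Python/NumPy rather than PC-ORD (the data are synthetic): each eigenvector of the correlation matrix is multiplied by the square root of its eigenvalue, and for a correlation-matrix PCA these scaled eigenvectors equal the correlations of the variables with the PC axes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                 # toy data: 30 samples x 5 species

# Correlation-matrix PCA: eigen-decompose the correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scaled eigenvectors: each eigenvector times the square root of its eigenvalue
loadings = eigvecs * np.sqrt(eigvals)

# For a correlation-matrix PCA these equal the correlations of the
# variables with the PC scores (i.e. with the axes)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigvecs
corr = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                  for k in range(X.shape[1])] for j in range(X.shape[1])])
print(np.allclose(loadings, corr))           # True
```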

Page 4:

Principal Components (PCA) – PC-ORD

➢ Results: Correlation with Axes

Page 5:

Principal Components (PCA) – Example

➢ Results: Graphs

Samples: points

Species: vectors

Page 6:

Principal Components (PCA) – PC-ORD

➢ Results: Graphs

Samples: points

Species: vectors

PC-ORD recommends displaying species as vectors and samples as points.
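
A hedged sketch of this display convention in Python/matplotlib (not PC-ORD's graphing module; data and species names are made up): sample scores plotted as points, species loadings overlaid as vectors from the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))                      # toy data: 40 samples x 4 species
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Z @ eigvecs                              # sample scores
loadings = eigvecs * np.sqrt(eigvals)             # species loadings

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=15)      # samples as points
for j, name in enumerate(["sp1", "sp2", "sp3", "sp4"]):
    # species as vectors (arbitrarily stretched x2 for visibility)
    ax.arrow(0, 0, loadings[j, 0] * 2, loadings[j, 1] * 2,
             head_width=0.05, color="red")
    ax.annotate(name, (loadings[j, 0] * 2, loadings[j, 1] * 2))
ax.set_xlabel("Axis 1")
ax.set_ylabel("Axis 2")
plt.show()
```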

Page 7:

➢ Rotation by NEDO: stretch the plot along the direction of most variation for the species

NEDO correlations with the axes – Axis 1: +0.51; Axis 2: -0.74

➢ Rotation highlights certain patterns. Report it in the Results.

Principal Components (PCA) – PC-ORD

Page 8:

➢ Percent of pattern explained in the original distance matrix

➢ Orthogonality of the PCA axes

Principal Components (PCA) – PC-ORD
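
A minimal sketch (Python/NumPy/SciPy, not PC-ORD output, synthetic data) of what these two checks amount to: the share of the pattern in the original Euclidean distance matrix reproduced by the first k axes, here measured by the squared correlation between original and ordination distances (one common after-the-fact measure), plus a check that the axes are orthogonal (scores mutually uncorrelated).

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 6))
Xc = X - X.mean(axis=0)                            # covariance-matrix PCA (centred data)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs

k = 2
d_orig = pdist(Xc)                                 # Euclidean distances, original space
d_ord = pdist(scores[:, :k])                       # distances on the first k ordination axes
r = np.corrcoef(d_orig, d_ord)[0, 1]
print(f"% of pattern in the distance matrix explained by {k} axes: {100 * r**2:.1f}")

# Orthogonality: off-diagonal covariances of the axis scores are ~0
S = np.cov(scores, rowvar=False)
print("max off-diagonal covariance:", np.max(np.abs(S - np.diag(np.diag(S)))))
```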

Page 9:

Principal Components (PCA) – Reporting

➢ What type of cross-correlation matrix did you use?

Use the covariance matrix; use Euclidean distance.

➢ If used with community data, justify using this linear model for species data.

Were the assumptions of linearity / normality met?

➢ How many axes were interpreted, and what proportion of variance was explained by these axes?

Describe the axes and the individual / cumulative variance.

➢ Principal eigenvectors – test of significance?

Not necessary, but an option using randomization tests.

➢ Rotation of the solution? Use of interpretation aids?

Explain overlays and correlations of variables with the axes.

Page 10:

PCA Example – Upwell

Where do we start? Data exploration + summarization

What do we look for?

Mean, S.D., skewness, kurtosis (transformations?)

Value ranges, outliers (typos?)
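
A short sketch of this exploration step in Python with pandas/SciPy (the column names stand in for the upwelling dataset and are invented): per-variable mean, SD, skewness, kurtosis, and range, plus a crude outlier flag, to spot candidates for transformation and possible typos.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
# Toy stand-in for the upwelling dataset; real column names would differ
df = pd.DataFrame({
    "upwell36": rng.lognormal(3, 0.6, 120),
    "upwell39": rng.lognormal(3, 0.8, 120),
    "MEI": rng.normal(0, 1, 120),
    "PDO": rng.normal(0, 1, 120),
})

summary = pd.DataFrame({
    "mean": df.mean(),
    "sd": df.std(),
    "skewness": df.apply(stats.skew),
    "kurtosis": df.apply(stats.kurtosis),
    "min": df.min(),
    "max": df.max(),
})
print(summary.round(2))

# Strong positive skew suggests a log transformation; extreme values may flag typos
z = (df - df.mean()) / df.std()
print((z.abs() > 3).sum())                  # count of |z| > 3 per variable
```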

Page 11:

PCA Example – Upwell

Principal Component - Results

Page 12:

PCA Example – Upwell

1st Stopping Rule (Eigenvalue > Broken-Stick Eigenvalue)

0 Axes meet this criterion

How Many Axes Give us the Right Answer?
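
For reference, a minimal sketch (Python/NumPy, not PC-ORD; the observed eigenvalues are invented) of the broken-stick criterion: the expected k-th eigenvalue under the broken-stick model for p variables is the sum of 1/i for i = k..p (on the correlation-matrix scale, where the eigenvalues sum to p), and an axis is retained only if its observed eigenvalue exceeds that expectation.

```python
import numpy as np

def broken_stick(p):
    """Expected eigenvalues under the broken-stick model for p variables
    (correlation-matrix scale: eigenvalues sum to p)."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) for k in range(1, p + 1)])

# Hypothetical observed eigenvalues for a 5-variable PCA
observed = np.array([1.9, 1.3, 0.9, 0.6, 0.3])
bs = broken_stick(len(observed))
keep = observed > bs
print(np.column_stack([observed, bs.round(2), keep]))
print("Axes meeting the broken-stick criterion:", keep.sum())
```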

Page 13:

PCA Example – Upwell

2nd Stopping Rule (Eigenvalue > Mean Randomization)

2 Axes meet this criterion

3rd Stopping Rule (p value)

2 Axes meet this criterion

Page 14:

➢ Performing a Randomization Test:

The randomization: shuffle the values within variables (columns) and recompute the correlation matrix and eigenvalues. Repeat many times.

The test: compare the actual eigenvalues (from the real data) against the eigenvalues from the randomizations.

Calculate the p value as:

p = (1 + n) / (1 + N)

where

n = number of randomizations where test statistic ≥ observed value

N = the total number of randomizations.

Principal Components – Randomizations
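
A compact sketch of this procedure in Python/NumPy (PC-ORD does this internally; the data and the choice of 999 randomizations here are illustrative): shuffle values within each column, recompute the correlation-matrix eigenvalues, and compute p = (1 + n) / (1 + N) for each axis.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix, largest first."""
    vals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return np.sort(vals)[::-1]

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))             # toy data: 50 samples x 5 variables
observed = pca_eigenvalues(X)

N = 999                                  # number of randomizations
null = np.empty((N, X.shape[1]))
for i in range(N):
    # shuffle values within variables (columns), then recompute eigenvalues
    Xs = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    null[i] = pca_eigenvalues(Xs)

# For each axis: how often is the randomized eigenvalue >= the observed one?
n_exceed = (null >= observed).sum(axis=0)
p = (1 + n_exceed) / (1 + N)
print("observed eigenvalues:", observed.round(2))
print("p values:", p.round(3))
```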

Page 15:

Rnd-Lambda – Compare the randomized eigenvalue for an axis to the observed eigenvalue for that axis

• fairly conservative and generally effective criterion

• more effective than Avg-Rnd when uncorrelated variables are included in the data

• performs better than the other measures with strongly non-normal data

Rnd-F – Compare the randomized pseudo-F-ratio for an axis to the observed pseudo-F for that axis

Pseudo-F-ratio: the eigenvalue for an axis divided by the sum of the remaining (smaller) eigenvalues (the sum-of-squares error – the remaining unexplained variance)

• particularly effective against uncorrelated variables

• performs poorly with grossly non-normal error structures

Avg-Rnd – Compare the observed eigenvalue for a given axis to the average eigenvalue obtained for that axis after randomization

• good when the data do not contain uncorrelated variables

• less stringent; too liberal when the data contain uncorrelated variables

Principal Components – Randomizations
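
A small self-contained sketch (Python/NumPy; the eigenvalues below are invented for illustration) of the statistics these criteria are built on: the pseudo-F-ratio for each axis, and the Avg-Rnd style comparison of each observed eigenvalue against the mean randomized eigenvalue for that axis.

```python
import numpy as np

def pseudo_f(eigvals):
    """Pseudo-F for each axis: eigenvalue / sum of the remaining (smaller) eigenvalues."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    remaining = np.array([eigvals[k + 1:].sum() for k in range(len(eigvals))])
    return np.divide(eigvals, remaining,
                     out=np.full_like(eigvals, np.inf), where=remaining > 0)

# Invented observed eigenvalues and mean eigenvalues from a randomization run
observed = np.array([2.1, 1.4, 0.8, 0.4, 0.3])
rand_mean = np.array([1.5, 1.2, 1.0, 0.8, 0.5])   # e.g. null.mean(axis=0) from the sketch above

print("pseudo-F per axis:", pseudo_f(observed).round(2))
print("Avg-Rnd keeps axes:", observed > rand_mean)
```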

Page 16:

PCA Example – Upwell39

Page 17:

PCA Example – Upwell36

Page 18:

PCA Example – Time

Page 19:

PCA Example – MEI

Page 20:

PCA Example – PDO

Page 21:

PCA Example – Upwell

Page 22:

PCA Example – Upwell

Page 23:

PCA Example – Upwell

[Figure: PCA_Upwell ordination, Axis 1 vs Axis 2, with TIME, MEI, PDO, upwell36, and upwell39 labelled]

Page 24:

PCA Example – Upwell

[Figure: PCA_Upwell ordination, Axis 1 vs Axis 3, with TIME, MEI, PDO, upwell36, and upwell39 labelled]

Page 25:

PCA Example – Upwell

NOTE: Use the Euclidean distance

Page 26:

PCA Example – Upwell

Page 27:

Page 28:

Principal Components – Reporting

➢ What type of cross-correlation matrix did you use?

Use the covariance matrix; use Euclidean distance.

➢ If used with community data, justify using this linear model for species data.

Were the assumptions of linearity / normality met?

➢ How many axes were interpreted, and what proportion of variance was explained by these axes?

Describe the axes and the individual / cumulative variance.

➢ Principal eigenvectors – test of significance?

Not necessary, but an option using randomization tests.

➢ Rotation of the solution? Use of interpretation aids?

Explain overlays and correlations of variables with the axes.

Page 29:

Principal Components (PCA) – Paper I

➢ Example: Weichler et al. (2004).

➢ Objective: Relate seabird densities to seven environmental parameters: (1) water depth, (2) distance to nearest land, (3) number of trawlers within a radius of 5 km, (4) sea surface temperature, (5) water temperature difference (0–10 m), (6) water temperature difference (0–30 m), and (7) water temperature difference (10–50 m)

➢ NOTE: Did not report cross-correlations of the environmental parameters

Page 30:

Principal Components (PCA) – Paper I

➢ Data Manipulations To Avoid Biases:

• Species densities (birds/km²) were selected as variables, and 10-min intervals (samples) were selected as cases

• Only species seen in at least five counting intervals were included – an arbitrary choice that covered a wide spectrum of species while ignoring those with few occurrences

• Only the commoner species, with numbers exceeding 1% of all individuals counted, were included in the analysis

• Dataset of 46 sections of the cruise tracks; each section comprised a hydrographic station approximately midway and 10-min intervals in two opposite directions (4–8 km away)

• Sample size: 46 samples / 7 variables – a ratio of ~6.5

Page 31:

Principal Components (PCA) – Paper I

➢ Community-Wide Result: Six principal eigenvalues (> 1), showing the % of variation explained and an ecological interpretation

➢ PC Axis Interpretations:

Page 32:

Principal Components (PCA) – Paper I

➢ Community-Wide Result: Loadings for the 11 seabird species and 7 variables on the six principal components

• 3 principal components: 50% of variance

• 6 principal components: 78% of variance


Page 33:

Principal Components (PCA) – Paper I

➢ Community-Wide Result: Axes explained using the (strongest) loadings of different species and environmental variables

➢ Note: Cannot determine which loadings are significant (what can we use to quantify correlation with the axes?)

Page 34:

Principal Components (PCA) – Paper II

➢ Example: Ainley et al. (2005).

➢ Objective: Relate densities of the 12 most abundant species of seabirds to 12 habitat variables: 5 biological, 4 oceanographic, 3 geographic (spatial)


Page 35:

Principal Components (PCA) – Paper II

➢ Oceanographic variables examined: sea-surface temperature / salinity, thermocline depth / strength

[Figure labels: Date, Distance to Fronts, Chl Max, Acoustic Biomass]

Page 36:

Principal Components (PCA) – Paper II

➢ Data Manipulations To Avoid Biases:

• Densities were log-transformed to meet normality assumptions

• Nevertheless, residuals generated in the regressions for some species did not meet those assumptions (skewness / kurtosis test for normality of residuals, p < 0.05)

• Least-squares regression analysis (ANOVA), however, is a very robust procedure with respect to non-normality (Seber, 1977; Kleinbaum et al., 1988)

• Yet, while these analyses yield the best linear unbiased estimator in the absence of normally distributed residuals, p-values near 0.05 must be viewed with caution (Seber, 1977)

Page 37:

Principal Components (PCA) – Paper II

➢ To avoid double-absences:

• Only 15-min transects in which any given species was recorded were analyzed

• The total sample size for the 12 species was 1209

➢ Is this an adequate sample size?

Rule of thumb: 5 samples per variable (Tabachnick and Fidell 1989)

1209 / 12 ~ 100 samples per variable

Page 38:

Principal Components (PCA) – Paper II

➢ Analysis Methods:

• Principal components analysis (PCA), in combination with Sidak multiple-comparison tests, was used to assess differences in habitat selection among the 12 seabird species

• To test for significant differences in habitat affinities among seabird species, two one-way ANOVAs were used: the first tested for differences among the PC1 scores of each species; the second compared the PC2 scores

• Differences between two species were considered significant if either one or both PC scores differed significantly
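
A hedged sketch of this kind of test in Python/SciPy (synthetic scores and made-up species labels, not the authors' data): a one-way ANOVA comparing PC1 scores across species, with the same call repeated for the PC2 scores.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Synthetic PC1 scores for three hypothetical species
pc1_scores = {"sp_A": rng.normal(0.5, 1, 40),
              "sp_B": rng.normal(-0.3, 1, 35),
              "sp_C": rng.normal(0.0, 1, 50)}

# One-way ANOVA on PC1 scores (repeat the same call with PC2 scores)
f_stat, p_val = stats.f_oneway(*pc1_scores.values())
print(f"PC1 scores: F = {f_stat:.2f}, p = {p_val:.3f}")
```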

Page 39:

Principal Components (PCA) – Paper II

➢ Community-Wide Result: The first and second PC axes explained 60% of the variance in the distribution of the 12 species

Page 40:

Principal Components (PCA) – Paper II

➢ Species-specific Results:

• Species mapped onto two (independent) dimensions

• Pair-wise associations (tested) denoted by circles

[Figure axis annotations: "Near Fronts", "Zoop Prey", "Salty, Green", "Fish Prey"]

Page 41:

Principal Components (PCA) – Comparisons

➢ Number of Axes:

- Selected 2 – easy to interpret (Ainley et al. 2005)

- Selected 6 – based on eigenvalues > 1 (Weichler et al. 2004)

➢ Display of Results:

- Plot & table of eigenvalues (Ainley et al. 2005)

- Eigenvalues & interpretation (description) (Weichler et al. 2004)

➢ Significance Tests:

- Pairwise species comparisons (ANOVA) (Ainley et al. 2005)

- Correlations with selected variables (Weichler et al. 2004)

Page 42:

Principal Components (PCA) – References

Ainley DG, Spear LB, Tynan CT, Barth JA, Pierce SD, Ford RG, Cowles TJ (2005). Physical and biological variables affecting seabird distributions during the upwelling season of the northern California Current. Deep-Sea Research II 52: 123–143.

Weichler T, Garthe S, Luna-Jorquera G, Moraga J (2004). Seabird distribution on the Humboldt Current in northern Chile in relation to hydrography, productivity, and fisheries. ICES Journal of Marine Science 61(1): 148–154.

Seber GAF (Ed.) (1977). Linear Regression Analysis. Wiley, New York.

Kleinbaum DG, Kupper LL, Muller KE (1988). Applied Regression Analysis and Other Multivariable Methods. PWS-KENT Publishing Company, Boston.

Tabachnick BG, Fidell LS (1989). Using Multivariate Statistics, 2nd ed. Harper and Row, New York.