Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin


Page 1: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Distance Measures and Ordination

Page 2: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Goals of Ordination

• To arrange items along an axis or multiple axes in a logical order

• To extract a few major gradients that explain much of the variability in the total dataset

• Most importantly: to interpret the gradients since important ecological processes generated them

Page 3: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

http://ordination.okstate.edu/

Page 4: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

What makes ordination possible?

• Variables (species) are “correlated” (in a broad sense)

• Correlated variables = redundancy

• Ordination thrives on the complex network of inter-correlations among species

Page 5: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Ordination helps to:

• Describe the strongest patterns of community composition

• Separate strong patterns from weak ones

• Reveal unforeseen patterns and suggest unforeseen processes

Page 6: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

“Direct” gradient analysis

• Order plots along measured environmental gradients

• e.g., regress diatom abundance on salinity

Page 7: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

“Indirect” gradient analysis

• Order plots according to
  – covariation among species, or
  – dissimilarity among sample units

• Following this step, we can then examine correlations between environment and ordination axes

• Axes = Gradients

• In PCA, these are called “Principal Components”

Page 8: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Data reduction

• Goal: to reduce the dimensionality of community datasets
  – (i.e., from 100 species down to 2 or 3 main gradients)

This is possible because of redundancy in the data (i.e., species are “correlated”)

n × p (plots × species) → n × d (plots × dimensions), with d much smaller than p

These d dimensions represent the strongest correlation structure in the data

Page 9: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Ordination Diagrams

[Figure: NMS ordination diagram. Axis 1: “Abiotic”; Axis 2: “Biotic”. Labels on the diagram: pH, Sand, Clay, Nitrogen, Elev, OM, LitterPine, Grazing]

Do not seek patterns as you would with a regression: axes are orthogonal (uncorrelated)

Know two things:

1) What the points represent (plots or species?)

2) Distance in the diagram is proportional to compositional dissimilarity

Page 10: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

How many axes?

• “How many discrete signals can be detected against a background of noise?”

• Typically we expect 2 or 3 gradients to be sufficient, but if we know that 5 independent environmental gradients are structuring the vegetation (water, light, CO2, nutrients, grazers, etc.), then perhaps 5 axes are justified

Page 11: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Two basic techniques

• Eigenanalysis methods: use information from the variance-covariance matrix or correlation matrix (e.g., PCA)
  – Appropriate for linear models since covariance is a measure of a linear association

• Distance-based methods: use information from a distance matrix (e.g., NMS)
  – Appropriate for nonlinear models since some distance measures and ordination techniques can “linearize” nonlinear associations

Page 12: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

A summary table of ordination methods

Page 13: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Ecological Distance Measures

Page 14: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Distance measures

• Distance = Difference = Dissimilarity

• Distance matrix is like a triangular mileage chart on maps (symmetric)

• We are interested in the distances between sample units (plots) in species space

Page 15: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Distance measures

• In univariate species space (one species), the distance between two points is their difference in abundances

• We will examine two kinds of distance measures:
  – Euclidean distance, and
  – Bray-Curtis (Sørensen) distance
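The two measures named above can be sketched in a few lines. This is a minimal illustration, not the workshop's own code: the plot abundances are made up, and the Bray-Curtis form used here (sum of absolute differences divided by the sum of all abundances) is the common quantitative definition, assumed to be the one intended.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two plots in species space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Bray-Curtis distance: sum|a - b| / sum(a + b), for abundances >= 0."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both plots are empty; the distance is undefined")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

# Hypothetical abundances of three species in three plots.
plot_a = [10, 0, 5]
plot_b = [0, 10, 5]
plot_c = [20, 0, 10]  # same proportions as plot_a, twice the abundance

d_ab_euclid = euclidean(plot_a, plot_b)
d_ab_bc = bray_curtis(plot_a, plot_b)
d_ac_bc = bray_curtis(plot_a, plot_c)
```

Two plots that share no species get a Bray-Curtis distance of exactly 1, whereas the Euclidean distance between them has no upper bound.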

Page 16: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Domains and Ranges

Distance     Domain of x    Range of d = f(x)
Euclidean    all            non-negative (d ≥ 0)
Sørensen     x ≥ 0          0 ≤ d ≤ 1 (or 0 ≤ d ≤ 100 as a percentage)

Page 17: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Which one works best?

“If species respond noiselessly to environmental gradients, then we seek a perfect linear relationship between distances in species space and distances in environmental space. Any departure from that represents a partial failure of our distance measure.”

McCune p. 51

Page 18: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Easy dataset (low beta diversity)

Figure 6.6

Page 19: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Difficult dataset (high beta diversity)

Intuitive property

Figure 6.7

Page 20: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

NMS is able to linearize the relationship between distance in species space and environmental distance because it is based on ranked distances (stay tuned)

Page 21: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Theoretical basis

• Our choice is primarily empirical: we should select measures that have shown superior performance

• One important theoretical basis: Euclidean distance (ED) measures distance through uninhabitable, impossibly species-rich space.

• In contrast, city-block distances are measured along the edges of species space, exactly where the sample units lie in the dust bunny distribution!

Page 22: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Nonmetric Multidimensional Scaling

(NMS, NMDS, MDS, NMMDS, etc.)

Page 23: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

NMS

• Uses a distance/dissimilarity matrix

• Makes no assumptions regarding linear relationships among variables

• Arranges plots in a space that best approximates the distances in a distance matrix

Page 24: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

From a map to a distance matrix

Calculate distances

Page 25: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

From a distance matrix to a map

NMS

Question: How well do the distances in the ordination match the distances in the distance matrix?

Page 26: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Advantages of NMS

• Avoids the assumptions of linear relations

• The use of ranked distances tends to linearize the relationship between distances in species space and distances in environmental space

• You can use any distance measure

Page 27: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Historical disadvantages of NMS

• Failing to find the best solution (low “stress”) due to local minima

• Slow computation time

These concerns have largely been dealt with given modern computer power

Page 28: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

In a nutshell

• NMS is an iterative search for the best positions of n entities on k dimensions (axes) that minimizes the stress of the k-dimensional configuration

• “Stress” is a measure of departure from monotonicity in the relationship between the original distance matrix and the distances in the ordination diagram

Page 29: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Achieving monotonicity

The closer the points lie to a monotonic line, the better the fit and the lower the stress.

If S* = 0, then relationship is perfectly monotonic
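To make "departure from monotonicity" concrete, here is a sketch of Kruskal's stress (formula 1) using a pool-adjacent-violators monotone fit. It is an assumption that this matches the exact variant used in any given software; ties in the original distances are simply left in input order, and the example distances are invented.

```python
import math

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(original, ordination):
    """Stress for paired lists of original and ordination distances."""
    # Order the ordination distances by the rank of the original distances.
    order = sorted(range(len(original)), key=lambda i: original[i])
    d = [ordination[i] for i in order]
    d_hat = monotone_fit(d)
    numerator = sum((x - y) ** 2 for x, y in zip(d, d_hat))
    denominator = sum(x ** 2 for x in d)
    return math.sqrt(numerator / denominator)

# Hypothetical distances for four pairs of plots.
original = [0.2, 0.5, 0.9, 0.7]
monotonic = stress(original, [1.0, 4.0, 9.0, 5.0])      # same rank order
not_monotonic = stress(original, [1.0, 4.0, 5.0, 9.0])  # two pairs swapped
```

The function returns stress on a 0 to 1 scale. The rule-of-thumb thresholds later in these slides (5, 10, 20, 30) appear to be on a 0 to 100 scale, so multiply by 100 before comparing.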

Blue = perfect fit, monotonic

Red = high stress, not monotonic

Fig 16.2

Page 30: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Instability

• Instability is calculated as the standard deviation in stress over the preceding 10 iterations

• Instabilities of 0.0001 or lower are generally preferred

sd = sqrt(var)
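The instability check can be sketched as follows. The slide does not say whether the sample or the population standard deviation is meant; this sketch assumes the sample form (`statistics.stdev`), and the stress history is invented.

```python
import statistics

def instability(stress_history, window=10):
    """Standard deviation of stress over the last `window` iterations."""
    recent = stress_history[-window:]
    if len(recent) < window:
        raise ValueError("need at least `window` iterations of stress values")
    return statistics.stdev(recent)  # assumption: sample sd (n - 1 denominator)

# Hypothetical run: stress drops quickly, then settles.
settled = [30.0, 22.0, 18.5, 17.2] + [17.0] * 10
still_moving = [30.0 - i for i in range(12)]

is_stable = instability(settled) <= 0.0001
is_unstable = instability(still_moving) > 0.0001
```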

Page 31: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Mini Example

Page 32: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Landscape analogy for NMS

Local minimum (strong, regular, geometric patterns emerge)

Global minimum

Page 33: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Reliability of Ordination

• Low stress and stable solutions

• Proportion of variance represented (R²)

• Monte Carlo tests

Page 34: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Variance represented?

• “Ode to an eigenvalue”

• NMS not based on partitioning variance, so there is no direct method

• Calculate R² for the relationship between Euclidean distances in the ordination and Bray-Curtis distances in the distance matrix

Axis   Increment   Cumulative R²
1      0.37        0.37
2      0.20        0.57
3      0.15        0.72
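One way to obtain such a table is sketched below: compute pairwise Euclidean distances from the first k ordination axes, then take the squared Pearson correlation with the original distances. The axis scores and distances are hypothetical, and treating the increment as the difference between successive cumulative values is an assumption about how the table was built.

```python
import math
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def ordination_distances(scores, k):
    """Pairwise Euclidean distances among plots using only the first k axes."""
    return [math.dist(p[:k], q[:k]) for p, q in combinations(scores, 2)]

# Hypothetical scores for four plots on two axes.
scores = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.5)]
# Hypothetical original distances, in the same pair order as combinations():
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
original = [0.30, 0.55, 0.80, 0.35, 0.60, 0.40]

cumulative = [r_squared(ordination_distances(scores, k), original) for k in (1, 2)]
increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
```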

Page 35: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Monte Carlo test

• Has the final NMS configuration extracted stronger axes than expected by chance?

• Compare stress obtained using your data with stress obtained from multiple runs of randomized versions of your data (randomly shuffled within columns)

• P-value = (1+n)/(1+N)

n = number of randomized runs with final stress less than or equal to the observed minimum stress

N = number of randomized runs

P-value = the proportion of randomized runs with stress less than or equal to the observed stress
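The randomization and the p-value formula can be sketched as below. The function names and the example stress values are illustrative; running NMS itself on each shuffled matrix is left out.

```python
import random

def shuffle_within_columns(matrix, rng):
    """Shuffle each column (species) independently across rows (plots)."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def monte_carlo_p(observed_stress, randomized_stresses):
    """P-value = (1 + n) / (1 + N), as defined on the slide."""
    n = sum(1 for s in randomized_stresses if s <= observed_stress)
    N = len(randomized_stresses)
    return (1 + n) / (1 + N)

# Hypothetical plots-by-species matrix and one randomized version of it.
community = [[5, 0, 2], [3, 1, 0], [0, 4, 7], [1, 6, 3]]
randomized = shuffle_within_columns(community, random.Random(1))

# Hypothetical result: none of 50 randomized runs reached the observed stress.
p_value = monte_carlo_p(12.4, [20.0] * 50)
```

With N randomized runs the smallest attainable p-value is 1 / (1 + N), which is one reason to avoid a very small number of randomized runs.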

Page 36: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Monte Carlo tests

Page 37: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Autopilot mode in PC-ORD

Parameter                      Quick and dirty   Medium    Slow and thorough
Maximum number of iterations   75                200       400
Instability criterion          0.001             0.0001    0.00001
Starting number of axes        3                 4         6
Number of real runs            5                 15        40
Number of randomized runs      20                30        50

Table 16.3 in McCune and Grace (2002)

Page 38: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Choosing the best solution

1. Select the appropriate number of dimensions

2. Seek low stress

3. Use a Monte Carlo test

4. Avoid unstable solutions

Page 39: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

1. How many dimensions?

elbow

Figure 16.3

One dimension is generally not used, unless the data is known to be unidimensional.

More than three becomes difficult to interpret.

Find the elbow and inspect Monte Carlo tests.

Page 40: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

2. Seek low stress

• <5 = excellent

• 5-10 = good

• 10-20 = fair, useable

• 20-30 = not great, still useable

• >30 = dangerously close to random

Adapted from Table 16.4, p 132

Page 41: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

A general procedure

• Carefully read pages 135-136

• In your papers, you should report the information that is listed on page 136

• Autopilot mode works really well, but don’t publish ordinations obtained using the Quick and Dirty option! Be sure to publish the parameter settings.

Page 42: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Interpreting NMS axes

• Two main/complementary approaches

– Evaluate how species abundances are correlated with NMS axes

– Evaluate how environmental variables are correlated with NMS axes

Page 43: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Overlays

• Overlays: flexible way to see whether a variable is patterned on an ordination; not limited to linear relationships

Axis 1

Page 44: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Overlays

Page 45: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Species versus Axes

Unimodal pattern

Linear pattern

Resist the temptation to use p-values when examining these relationships!
  – nonlinear
  – circular reasoning

Page 46: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Environmental Variables

• Joint plots: diagram of radiating lines, where the angle and length of a line indicate the direction and strength of the relationship

Page 47: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

PerMANOVA

Page 48: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

The analysis of community composition

• Continuous covariates
  – Use ordination to produce a continuous response variable (i.e., axis)
  – Use covariance analysis (multiple regression, SEM) to explain variance of the axis

• Categorical groups
  – Ordination is not required (remember, ordination is not the test)
  – Permutational MANOVA (PerMANOVA): can use on any experimental design
  – MRPP (only one-way or blocked designs)
  – ANOSIM (up to two factors, in R and PRIMER)

Page 49: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

MANOVA

• Multivariate Analysis of Variance

• Traditional parametric method

• Assumes linear relations among variables, multivariate normality, equal variances and covariances

• Not appropriate for community data

Page 50: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

PerMANOVA

• Permutational MANOVA

• Straightforward extension of ANOVA

• Decomposes variance in the distance matrix

• No distributional assumptions

• Can still be sensitive to heterogeneous variances (dispersion) among groups

• Anderson, M. 2001. Austral Ecology

Page 51: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

ANOVA

• Compare variability within groups versus variability among different groups

[Figure: y-axis “Elevation (m)”, ticks from 2200 to 2800]

Page 52: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

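Decomposing an observation (y_ij)

\[ y_{ij} - \bar{y}_{..} = (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}) \]

\[ \sum_i \sum_j (y_{ij} - \bar{y}_{..})^2 = n \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i.})^2 \]

Variability of observations about the grand mean = Variability of the ith trt mean about the grand mean + Variability of observations within each treatment

SStotal = SSamong + SSwithin

SStotal = SStreatment + SSerror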
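The partition SStotal = SSamong + SSwithin is easy to check numerically for a single variable. The group names and values below are invented; the among-group term weights each group by its own size, which reduces to the balanced-design form with a common n when groups are equal.

```python
# Hypothetical univariate responses in two treatment groups.
groups = {"grazed": [2.0, 3.0, 4.0], "ungrazed": [6.0, 7.0, 8.0]}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

# Variability of observations about the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Variability of treatment means about the grand mean, weighted by group size.
ss_among = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
)

# Variability of observations about their own treatment mean.
ss_within = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
)
```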

PROBLEM: WE CAN’T CALCULATE MEANS WITH SEMIMETRIC BRAY-CURTIS

Page 53: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

ANOVA

• Compare variability within groups versus variability among different groups

A simple 2-D case

Unknowable with semi-metric Bray-Curtis distances

Page 54: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

The key link

• The key to this method is that “the sum of squared distances between points and their centroid is equal to (and can be calculated directly from) the sum of squared interpoint distances divided by the number of points.”
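The quoted identity can be verified directly where a centroid is computable, that is, with Euclidean distances. The four points below are arbitrary illustration data.

```python
import math
from itertools import combinations

# Arbitrary points in a 2-D space.
points = [(1.0, 2.0), (3.0, 0.0), (4.0, 5.0), (0.0, 1.0)]
n = len(points)

# Route 1: squared distances from each point to the centroid.
centroid = tuple(sum(coord) / n for coord in zip(*points))
ss_from_centroid = sum(math.dist(p, centroid) ** 2 for p in points)

# Route 2: squared interpoint distances (each pair once), divided by n.
ss_from_pairs = sum(math.dist(p, q) ** 2 for p, q in combinations(points, 2)) / n
```

Route 2 never touches the centroid, so the same calculation can be applied to any distance matrix, including one where a centroid cannot be located.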

Page 55: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Why is this important?

• Couldn’t use semimetric Bray-Curtis distance in ANOVA context because central locations cannot be found

• But we don’t have to calculate the central locations anymore with this finding

• The analysis can proceed by using distances in any distance matrix

Page 56: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

One-way perMANOVA with two groups

Page 57: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Permuted p-values

P = (No. of Fπ ≥ F) / (Total no. of Fπ)

Fπ = F obtained with randomly shuffled data

Use at least 999 random permutations

I tend to use 9999 permutations

Page 58: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

The link with ANOVA

• This F statistic is equal to Fisher’s original F-ratio in the case of one variable and when Euclidean distances are used
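A compact sketch of a one-way pseudo-F computed only from a distance matrix, with a permutation p-value, is given below. It follows the sums-of-squares-from-distances idea described in these slides but is not a reference implementation. Counting the observed F as one of the permuted values, giving (count + 1) / (permutations + 1), is an assumption here, and the data are invented. With one variable and Euclidean distances, the result matches Fisher's F.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """One-way pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permutation_p(dist, labels, n_perm=999, seed=0):
    """Shuffle group labels and count pseudo-F values at least as large as observed."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            count += 1
    # Assumption: the observed F counts as one of the permuted values.
    return f_obs, (count + 1) / (n_perm + 1)

# Hypothetical single variable measured in two groups of three plots.
y = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
labels = ["grazed"] * 3 + ["ungrazed"] * 3
dist = [[abs(u - v) for v in y] for u in y]  # Euclidean distance in one dimension

f_obs, p_value = permutation_p(dist, labels)
```

Swapping `dist` for a Bray-Curtis matrix leaves the rest of the code unchanged, which is the practical point of working from distances rather than means.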

Page 59: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Example: grazing effects (one-way)

F = 36.6

Page 60: Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin

Example: two-way factorial