Analyzing or Explaining Beta Diversity
8/8/2019 Analyzing or Explaining Beta Diversity
1/12
CONCEPTS & SYNTHESIS: EMPHASIZING NEW IDEAS TO STIMULATE RESEARCH IN ECOLOGY
Ecology, 87(11), 2006, pp. 2697–2708. © 2006 by the Ecological Society of America
ANALYZING OR EXPLAINING BETA DIVERSITY? UNDERSTANDING THE TARGETS OF DIFFERENT METHODS OF ANALYSIS
HANNA TUOMISTO1 AND KALLE RUOKOLAINEN
Department of Biology, University of Turku, FI-20014 Turku, Finland
Abstract. Which statistical methods are appropriate for testing hypotheses about the origin of beta diversity has recently been actively discussed, especially whether one should use the raw-data approach (e.g., canonical analysis such as RDA and CCA) or the distance approach (e.g., Mantel test and multiple regression on distance matrices). Most of the confusion seems to stem from uncertainty as to what the response variable is in the different approaches. Here our aim is to clarify this issue. We also show that, although both the raw-data approach and the distance approach can often be used to address the same ecological hypothesis, they target fundamentally different predictions of those hypotheses. As the two approaches shed light on different aspects of the ecological hypotheses, they should be viewed as complementary rather than alternative ways of analyzing data. However, in some cases only one of the approaches may be appropriate. We argue that S. P. Hubbell's neutral theory can only be tested using the distance approach, because its testable predictions are stated in terms of distances, not in terms of raw data. In all cases, the choice of method must be based on which one addresses the question at hand; it cannot be based on which one provides the highest proportion of explained variance in simulation studies.
Key words: beta diversity; canonical analysis; community composition; ecological hypotheses; Mantel test; multiple regression; multiple regression on distance matrices; spatial variation; species abundances; variation partitioning.
INTRODUCTION
The question of what factors affect community
composition and its variation (beta diversity) has been
of considerable interest to biologists. Although the
concept of beta diversity dates at least to Whittaker
(1960, 1972), interest in it has increased dramatically
since the publication of Hubbell's book on the neutral
theory of biodiversity (Hubbell 2001). The neutral
theory challenged the widely held view that environmental
factors and ecological-niche differences between
species are the most important factors in determining
where species occur and at what abundances. Instead,
the neutral theory proposes that species abundances
fluctuate in a random walk due to random mortality and
stochastic but spatially restricted dispersal.
Multivariate-analysis methods that allow studying
questions related to beta diversity include canonical
analysis (or constrained ordination; e.g., RDA [redun-
dancy analysis] and CCA [canonical correspondence
analysis]) and the Mantel test and its derivatives.
Canonical analysis can be called a raw-data approach,
because there the input data are in the form of raw-data
tables, such as estimates of species abundances at study
sites and measurements of environmental variables at
the same study sites. The Mantel test can be called a
distance approach, because there the input data are in
the form of distance matrices that are based on the raw
data. Both approaches have been extensively used in the
ecological literature, with hundreds of ecological papers
mentioning the Mantel test, redundancy analysis, or
canonical correspondence analysis (easily verified by a
simple search in, e.g., ISI Web of Science or BIOSIS
Previews).
Variation partitioning provides the statistical means
to quantify the relative effects of different groups of
explanatory variables on the response variable of
interest. As proposed by Borcard et al. (1992), variation
partitioning can be based on RDA or CCA to partition
the variation in a species × sites raw-data table to
fractions uniquely or jointly explained by variation in
environmental and spatial variables. This kind of
variation partitioning represents the raw-data approach.
Manuscript received 4 January 2006; revised 31 March 2006; accepted 18 April 2006. Corresponding Editor: N. C. Kenkel.
1 E-mail: [email protected]
More recently, variation partitioning has been extended
to the distance approach by using multiple regression on
distance matrices (Duivenvoorden et al. 2002, Tuomisto
et al. 2003).
The raw-data and distance approaches have tradi-
tionally been used to study similar ecological questions,
and indeed they have been considered alternative
methods for the same purpose (e.g., Legendre 1993). However, recently a discussion has arisen as to which of
the approaches is more appropriate when one is
interested in beta diversity. This question, called "the
dilemma between the raw-data approach and the
distance approach," was the main topic of a recent paper
by Legendre et al. (2005).
The main conclusion of Legendre et al. (2005) was
that the proper statistical procedure for testing hypoth-
eses about the origin and maintenance of variation in
community composition among sites is canonical
variation partitioning, and that partitioning on distance
matrices should not be used to study the variation in
community composition among sites. We disagree with this conclusion, and find that although their paper made
several valuable points that clarified differences between
the analysis approaches, it also confused some impor-
tant concepts. This led to inconsistencies and errors in
their recommendations. We especially disagree with
their suggestion that the raw-data approach is preferable
to the distance approach for testing Hubbell's
neutral theory.
Most of the confusion in the dilemma between the
raw-data approach and the distance approach seems to
stem from uncertainty as to what is the response variable
in the different analyses. Here we attempt to clarify the
situation by evaluating which ecological and statistical
questions each analysis approach actually targets, and
what should be taken into account when selecting an
analysis method for a particular purpose.
LEVELS OF ABSTRACTION
The basic concepts
We start by specifying at which levels one can ask
ecological questions that are related to the distribution
of species along environmental and spatial gradients. We
distinguish three levels of abstraction (Fig. 1). The first,
basic level is formed by the raw-data tables, which
consist of the observations of the abundances of one or more species (A1 to Ap) in more than one study site (s1 to
sn), in which the values of one or more environmental
variables and spatial coordinates (x1 to xm) have also
been measured. The second level of abstraction is
derived from the first level and consists of the variation
in the raw-data tables. The third level of abstraction is
derived from the second level and consists of the
variation in the variation in the raw-data tables; i.e.,
(1) raw data → (2) variation in the raw data → (3)
variation in the variation in the raw data.
In the case of the species-data table, the sequence can
equally well be written as follows: (1) community
composition → (2) variation in community composition
→ (3) variation in the variation in community composition.
Here the term "community composition" is used
so that it encompasses both species composition and
species abundances. Variation in community composi-
tion across sites is beta diversity. Overall beta
diversity in the data set can be measured with the sum
of squares (SS) of the raw data. The SS or other beta-diversity indices (which measure the difference in species
composition and species abundances between sites) can
also be computed for all different site pairs and used to
construct a dissimilarity matrix; the mean of the cell
values in this matrix is also a measure of overall beta
diversity (Whittaker 1972, ter Braak 1983, Vellend 2001,
Legendre et al. 2005; Fig. 1). Therefore, the sequence of
the levels of abstraction can also be written as follows:
(1) community composition → (2) beta diversity → (3)
variation in beta diversity.
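The link between these two overall measures can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the abundance table is hypothetical, and the pairwise values are Euclidean distances, for which the total SS of the raw-data table equals the sum of the squared pairwise distances divided by the number of sites n. This identity does not carry over to other dissimilarity indices such as Bray-Curtis or Jaccard.

```python
import itertools
import math


def total_ss(table):
    """Total sum of squares of a sites-by-species table (deviations from species means)."""
    n, p = len(table), len(table[0])
    means = [sum(row[k] for row in table) / n for k in range(p)]
    return sum((row[k] - means[k]) ** 2 for row in table for k in range(p))


def euclidean_matrix(table):
    """Square matrix of Euclidean distances between all pairs of sites (rows)."""
    return [[math.dist(a, b) for b in table] for a in table]


def ss_from_distances(d):
    """Total SS recovered from Euclidean distances: sum of d_ij^2 over pairs i<j, divided by n."""
    n = len(d)
    return sum(d[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n


# Hypothetical abundances of 3 species (columns) at 4 study sites (rows).
Y = [[5, 0, 2],
     [3, 1, 2],
     [0, 4, 1],
     [1, 3, 0]]

D = euclidean_matrix(Y)
print(total_ss(Y), ss_from_distances(D))
```

Both functions give 27.5 for this table (up to floating-point rounding): the same overall beta diversity is obtained directly from the raw data and from the derived distance matrix.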
This sequence is conceptually analogous to the
relationship between position, velocity and acceleration
in physics: (1) position (geographical coordinates) → (2)
velocity (variation in position over time) → (3) acceleration
(variation in velocity over time, i.e., variation in
variation in position over time).
Legendre et al. (2005) also developed their arguments
from a three-level framework, but their levels were (1)
variation in species identity within communities (alpha
diversity), (2) variation in community composition
among sites (beta diversity), and (3) variation in beta
diversity among groups of sites. However, we think that
this does not provide an optimal framework for
clarifying the concepts, because alpha diversity is not a
logical starting point for deriving beta diversity in the
same way as community composition is. There is no
simple relationship between alpha diversity and either
community composition or beta diversity. If two sites
have exactly the same number of species (in exactly the
same proportions of abundance), their alpha diversities
are identical, but their community compositions can be
anything from identical to completely different, and beta
diversity can hence be anything between 0% (if all
species are shared between the sites in similar abundan-
ces) and 100% (if no species are shared). Therefore, our
level of abstraction 1 is different from that of Legendre
et al. (2005), but the levels of abstraction 2 and 3 are the
same.
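The point that equal alpha diversity says nothing about beta diversity can be illustrated with a toy example. This Python sketch uses hypothetical sites, measures alpha diversity as species richness, and measures beta diversity with the Jaccard dissimilarity (presence/absence only); both index choices are illustrative assumptions.

```python
def richness(site):
    """Alpha diversity as species richness: number of species with abundance > 0."""
    return sum(1 for abundance in site.values() if abundance > 0)


def jaccard_dissimilarity(site_a, site_b):
    """1 - shared/total species (presence/absence): 0 = same species, 1 = none shared."""
    a = {sp for sp, ab in site_a.items() if ab > 0}
    b = {sp for sp, ab in site_b.items() if ab > 0}
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


# Hypothetical sites with identical richness and identical abundance proportions.
s1 = {"sp1": 10, "sp2": 5, "sp3": 1}
s2 = {"sp1": 10, "sp2": 5, "sp3": 1}   # the same species as s1
s3 = {"sp4": 10, "sp5": 5, "sp6": 1}   # no species in common with s1

print(richness(s1), richness(s2), richness(s3))                     # 3 3 3
print(jaccard_dissimilarity(s1, s2), jaccard_dissimilarity(s1, s3))  # 0.0 1.0
```

All three sites have the same alpha diversity, yet their dissimilarity to s1 spans the full range from 0 to 1.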
Application to ecological questions
Now, let us turn to the kind of ecological questions we
may be interested in studying.
We may want to analyze the abundance of a single
species A, in which case we pay attention to just this
species in the community-composition raw-data table.
Then we are concerned with the questions: Why do some
sites have a higher abundance of species A than others?
i.e., Why is there variation in the abundance of species
A? Can the variation in the abundance of species A be
explained by variation in environmental characteristics
or geographical location of the sites? If so, then we can
predict the abundance of species A at a site if we know
the values of environmental variables at the site and its
geographical position. The response variable y is the abundance of species A at study sites s1 to sn, and the
independent variables x1 to xm are the environmental
variables and geographical coordinates measured at the
sites. Standard methods to test whether the variables are
statistically dependent on each other include correlation
analysis (where no distinction is made between response
and independent variables) and multiple regression
(where the variation in the independent variables is used
to explain the variation in the response variable in a
causal framework). The regression models that are fitted
can be either linear or more complex (e.g., Huisman et
al. 1993, Legendre and Legendre 1998, Oksanen and
Minchin 2002, Karadzic et al. 2003). This is what
Legendre et al. (2005) called the raw-data approach,
which is suitable for answering what they called level-2 questions. In terms of our levels of abstraction, in this
approach we analyze data from the level-of-abstraction
1 to test whether we can explain its variation, which is
expressed at level-of-abstraction 2 (Fig. 2).
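As a minimal sketch of this single-species raw-data analysis, the Python code below fits an ordinary least-squares multiple regression of the abundance of species A on two hypothetical site variables (soil nitrogen and longitude) and reports the proportion of variance explained (R²). The data are invented and noise-free, so the fit is exact; the helper names `solve` and `multiple_regression` are illustrative, not taken from any library.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(predictors, y):
    """Ordinary least squares of y on the predictor columns plus an intercept.

    Returns (coefficients, R^2); coefficients[0] is the intercept."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, 1 - ss_resid / ss_total


# Hypothetical data: (soil N, longitude) at six sites and the abundance of
# species A, generated noise-free as y = 2 + 3 * N - 1 * longitude.
sites_x = [(1, 0.5), (2, 1.5), (3, 1.0), (4, 3.0), (5, 2.0), (6, 2.5)]
abundance_a = [4.5, 6.5, 10.0, 11.0, 15.0, 17.5]

coefficients, r_squared = multiple_regression(sites_x, abundance_a)
print(coefficients, r_squared)
```

With real field data the residual sum of squares would not be zero, and R² would measure how much of the variation in the abundance of species A is accounted for by the explanatory variables.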
Alternatively, we may want to analyze the
abundances of all observed species at once, i.e.,
community composition. Then we are concerned with
the questions: Why do some sites have a different
community composition than others? i.e., Why is there
variation in community composition? Can the variation
in the abundances of species A1 to Ap, which form the
FIG. 1. Three levels of abstraction in studies concerning community composition and beta diversity. The overall amount of beta diversity in the raw-data table (whose cell values are the abundances of p species in n sites) can be summarized with a single number at the level of abstraction 2. This measure can be the sum of squares (SS) of the raw-data table, or the mean of the cell values in a distance matrix. The cell values in the distance matrix can be pairwise SS values or values of any other measure that quantifies the dissimilarity in species composition and abundances (community composition) between the two sites in each site pair. The overall amount of variation in beta diversity in the distance matrix can be summarized with a single number at the level of abstraction 3.
community, be explained by variation in environmental
characteristics or geographical location of a site? If so,
then we can predict the abundances of species A1 to Ap
(community composition) at a site if we know the values of environmental variables at the site and its
geographical position. The ecological question is similar to
the previous one, but now there are several response
variables instead of just one. A standard method to test
whether the variation in the independent variables can
explain the variation in the response variables is
canonical analysis, i.e., multiple regression as imple-
mented in RDA (redundancy analysis; used when the
expected response model is linear) and CCA (canonical
correspondence analysis; used when the expected re-
sponse model is unimodal; Legendre and Legendre
1998). Just as in the previous case, this is a raw-data
approach (Fig. 2).
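In RDA, the proportion of variation explained is the sum of squares of the fitted values, obtained by regressing every species on the explanatory variables, divided by the total SS of the species table. The Python sketch below computes only this ratio, for the simplest case of one explanatory variable; it skips the eigen-analysis that produces the canonical axes, and it does not implement CCA, which additionally uses chi-square weighting. The table and nitrogen values are hypothetical.

```python
def centered(values):
    """Deviations of a sequence from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def rda_explained_fraction(Y, x):
    """Proportion of the total SS of sites-by-species table Y explained by regressing
    every species column on a single explanatory variable x (linear response)."""
    xc = centered(x)
    sxx = sum(v * v for v in xc)
    ss_total = ss_fitted = 0.0
    for k in range(len(Y[0])):
        yc = centered([row[k] for row in Y])
        sxy = sum(a * b for a, b in zip(xc, yc))
        ss_total += sum(v * v for v in yc)
        ss_fitted += sxy * sxy / sxx  # SS of this species' fitted values
    return ss_fitted / ss_total


# Hypothetical table: 4 sites (rows) x 3 species (columns), and soil nitrogen per site.
nitrogen = [1, 2, 3, 4]
Y = [[2, 9, 1],
     [4, 8, 0],
     [6, 7, 0],
     [8, 6, 1]]
print(rda_explained_fraction(Y, nitrogen))  # about 0.96 (= 25/26)
```

Species 1 and 2 respond linearly to nitrogen and species 3 does not, so 25 of the 26 units of total SS are explained.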
We may also want to analyze variation in community
composition, i.e., beta diversity. Although beta diversity
can be computed for groups of sites that consist of any
number of sites ≥2, the analytical methods are best
developed for the special case where each group of sites
consists of exactly two sites. Using a fixed number of
sites in all groups simplifies the analyses, and using the
smallest possible number of sites per group maximizes
the power of the statistical tests, because each group of
sites is one data point in the analyses. For simplicity, we
therefore limit our present discussion to this situation.
Then we are concerned with the questions: Why are
some site pairs more different in community composi-
tion than others? i.e., Why is there variation in beta
diversity? Can the variation in the difference in community composition between two sites be explained
by variation in difference in environmental character-
istics or geographical location? i.e., Can variation in beta
diversity be explained by variation in environmental
difference or geographical distance? If so, then we can
predict the degree of beta diversity between two sites if
we know how different their environments are and how
far apart they are situated geographically. The response
variable Y is a distance matrix consisting of the n(n − 1)/2
pairwise differences in community composition (i.e.,
floristic or faunistic distances) between all possible pairs
of the study sites s1 to sn. This distance matrix can be
based on any of the various resemblance measures that
have been designed for species data, e.g., Jaccard, Bray-
Curtis, and Hellinger indices, or any other measure that
quantifies variation in community composition (such as
SS). The independent variables X1 to Xm are matrices of
geographical distances and the differences between sites
in environmental variables at the sites. Standard
methods to test whether the variables are statistically
dependent on each other include the Mantel test (to test
for linear or monotonic correlation between two
distance matrices), multiple regression on distance
matrices (which fits a linear regression), and generalized
FIG. 2. An analysis of level-1 data (the raw-data matrix) focuses on modeling what factors explain level-2 data (variation in the raw-data matrix). This is a level-2 question, which can be addressed using the raw-data approach. An analysis of level-2 data (the distance matrix) focuses on modeling what factors explain level-3 data (variation in the distance matrix). This is a level-3 question, which can be addressed using the distance approach.
distance modeling (which can fit more complex regres-
sion models; Legendre and Legendre 1998, Ferrier et al.
2002). This is what Legendre et al. (2005) called the
distance approach, which is suitable for answering
what they called level-3 questions. In terms of our levels
of abstraction, in this approach we analyze data from
the level of abstraction 2 to test whether we can explain
its variation, which is expressed at level of abstraction 3 (Fig. 2).
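A bare-bones version of the Mantel test helps make the distance approach concrete: the n(n − 1)/2 cell values of two distance matrices are correlated, and significance is assessed by permuting the sites of one matrix. The Python sketch below uses Pearson's r, a one-sided p-value, 999 permutations and hypothetical transect data; these are illustrative choices, not the settings of any particular software package.

```python
import random


def lower_triangle(d):
    """The n(n-1)/2 distances below the diagonal of a square distance matrix."""
    return [d[i][j] for i in range(len(d)) for j in range(i)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel statistic r and a one-sided permutation p-value (H1: r > 0).

    Rows and columns of d1 are permuted together, i.e., whole sites are relabeled."""
    rng = random.Random(seed)
    n = len(d1)
    v2 = lower_triangle(d2)
    r_obs = pearson(lower_triangle(d1), v2)
    order = list(range(n))
    as_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(lower_triangle(permuted), v2) >= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (permutations + 1)


# Hypothetical transect: six sites, geographic distances, and community distances
# that grow with geographic distance.
positions = [0, 1, 2, 3, 4, 5]
geo = [[abs(a - b) for b in positions] for a in positions]
comm = [[0.2 * abs(a - b) + 0.1 * ((a + b) % 2) for b in positions] for a in positions]
print(mantel(comm, geo))
```

Permuting whole sites rather than individual cells matters because distances that share a site are not independent of each other.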
The distance approach also includes spatial
autocorrelation analysis using correlograms or variograms,
where an autocorrelation coefficient (which can be
interpreted as a similarity measure) or semi-variance
(which can be interpreted as a dissimilarity measure) is
plotted against inter-site geographical distance (Legen-
dre and Legendre 1998).
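An empirical semivariogram can be sketched in a few lines. Assuming a single variable measured at each site (for example, the abundance of species A) and planar coordinates, the Python code below averages half the squared difference between site pairs within each geographic distance class. The bin width and the data are hypothetical.

```python
import math
from collections import defaultdict


def semivariogram(coords, values, bin_width):
    """Empirical semivariance per distance class k, covering distances in
    [k * bin_width, (k + 1) * bin_width): gamma = sum((z_i - z_j)^2) / (2 * N)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // bin_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Hypothetical transect: six sites 1 unit apart, with a linear trend in species A.
coords = [(x, 0) for x in range(6)]
abundance_a = [1, 2, 3, 4, 5, 6]
print(semivariogram(coords, abundance_a, bin_width=1.0))
```

For this linear trend the semivariance rises steadily with distance (0.5, 2.0, 4.5, 8.0 and 12.5 for distance classes 1 to 5), i.e., nearby sites are more similar than distant ones.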
THE DIFFERENCE BETWEEN ANALYZING
AND EXPLAINING BETA DIVERSITY
In the previous section, we consistently used the
convention that "analyzing X" refers to an analysis
where X is the response variable, and that such an
analysis "explains" the variation in X with the variation
in the independent (or explanatory) variables. This is
how these words are commonly used in a statistical
context; e.g., a regression line resulting from a regression
analysis can be said to explain a certain proportion of
the variance in the response variable (e.g., Legendre and
Legendre 1998).
Under this convention, "explaining beta diversity"
and "analyzing beta diversity" are clearly different
things. Explaining beta diversity is a level-2 question:
the response variable is community composition, and
what gets explained is variation in community compo-
sition (i.e., beta diversity). In contrast, analyzing beta
diversity is a level-3 question: the response variable is
beta diversity (i.e., variation in community composi-
tion), and what gets explained is variation in beta
diversity (Fig. 2). Consequently, if one aims at explain-
ing beta diversity then using the raw-data approach is
indicated, whereas if one aims at analyzing beta
diversity then using the distance approach is indicated.
These differences may be clarified by our physics
example. Analyzing community composition at different
points in space is like analyzing the position of an object
(say, a kite) at different points in time. Analyzing beta diversity is like analyzing the velocity of the kite (Fig. 2).
Say we are interested in understanding the causes of the
velocity of a kite. If we run an analysis using the raw-
data approach (where position is the response variable),
we learn how much of the kite's observed overall
velocity is due to movement in the up–down, left–right,
and forward–backward directions. But this does not tell
us why the kite had this particular overall velocity rather
than some other velocity; to answer this question, we
need to run an analysis using the distance approach
(where velocity is the response variable). Say we are
interested in understanding the causes of beta diversity
in a region. If we run an analysis using the raw-data
approach (where community composition is the re-
sponse variable), we learn how much of the observed
overall beta diversity in the region can be explained by
environmental factors and spatial coordinates. But this
does not tell us why the region had this particular overall
beta diversity rather than some other beta diversity; to
answer this question, we need to run an analysis using the distance approach (where beta diversity is the
response variable).
Against this background, it can be observed that
Legendre et al. (2005) accused several studies of having
misused the Mantel test, when in fact its use had been
entirely appropriate. Legendre et al. (2005:438–439)
wrote: "Here are examples from the recent literature in
which authors used a Mantel approach... although they
declared that the purpose of their study was the analysis
of the variation in community composition among
sites." Following the above convention, "analysis of
the variation in community composition among sites"
(i.e., analysis of beta diversity) is a level-3 question.
Since level-3 questions need to be addressed using the
distance approach, the Mantel test is a justified choice.
We have seen many papers that are inconsistent or non-
explicit about whether they are addressing level-2
questions or level-3 questions (including our own earlier
work), and we hereby urge ecologists to become more
aware of the levels of abstraction in ecological questions.
Failure to do so easily leads to misinterpretation of the
results.
THE TARGETS OF THE RAW-DATA
AND DISTANCE APPROACHES
The questions of interest in the raw-data approach
concern the relationships among the raw-data variables
that were measured in the field (level of abstraction 1),
and the analyses are based on quantifying to what extent
the variation in one group of raw-data variables can be
explained by the variation in another group of raw-data
variables (level-of-abstraction 2; hence the term level-2
question). The questions of interest in the distance
approach concern the relationships among distances
based on the raw data (level-of-abstraction 2), and the
analyses are based on quantifying to what extent the
variation in one group of distances can be explained by
the variation in another group of distances (level-of-abstraction 3; hence the term level-3 question).
One crucial difference between the two approaches is
that when the focus is on distances, neither the species
identities, the actual geographical locations of the study
sites, nor the actual values of the environmental
variables are relevant; we are interested only in how
big the differences in them are. In contrast, the raw-data
approach explicitly models the abundances of specific
species as a function of specific spatial coordinates and
specific values of environmental variables, which leads
to important differences in how the results of the two
approaches should be interpreted.
Generally, when one expects a monotonic relationship
between the raw-data variables, one can also expect a
monotonic relationship between the distances based on
these raw data. For example, if the abundance of a given
species increases or decreases monotonically along a
given environmental gradient (e.g., soil nitrogen con-
tent) or a spatial gradient (e.g., longitude), then the
abundance of the species is more similar in sites that are close to each other along that gradient than in sites that
are far apart. No matter whether the abundance of the
species increases or decreases along the gradient (e.g.,
whether the correlation between species abundance and
soil nitrogen content is positive or negative), the
expectation in terms of distances is the same: environ-
mentally more similar (or geographically more prox-
imate) sites have more similar species abundances than
more dissimilar/distant sites.
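The claim that the sign of a monotonic relationship disappears at the distance level can be verified directly. The Python sketch below uses a hypothetical species whose abundance either increases or decreases linearly with soil nitrogen, and shows that the matrices of pairwise abundance differences are identical in the two cases.

```python
def pairwise_differences(values):
    """Matrix of absolute pairwise differences of one variable across sites."""
    return [[abs(a - b) for b in values] for a in values]


nitrogen = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical soil N at five sites
increasing = [2 * n for n in nitrogen]         # abundance rises with nitrogen
decreasing = [10 - 2 * n for n in nitrogen]    # abundance falls with nitrogen
env_distance = pairwise_differences(nitrogen)  # differences in soil N between sites

# Raw-data correlations with N have opposite signs, yet the distance matrices match.
print(pairwise_differences(increasing) == pairwise_differences(decreasing))  # True
```

In both cases the abundance difference between two sites is simply twice their difference in soil nitrogen, so the distance approach sees one and the same pattern.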
However, if the relationship between the raw-data
variables is not monotonic, then a monotonic relation-
ship cannot be expected in the distances either. For
example, if species abundance has a unimodal response
to an environmental gradient (abundance is first zero,
then increases to a maximum and decreases back to
zero), difference in abundance will be very small both
when sites are environmentally very similar and when
they are very different. But even if the environmental
gradient is so long that it exceeds the range of a single
species, community composition as a whole may still
have a monotonic relationship with the environmental
gradient; this happens if different species replace each
other along the gradient.
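The contrast between a single species with a unimodal response and a community with species replacement can be illustrated with a small simulation. The Python sketch assumes a Gaussian response curve for the single species and, for the community, nine hypothetical species that each occur only within one gradient unit of their optimum; turnover is measured with the Jaccard dissimilarity. All of these are illustrative assumptions.

```python
import math


def gaussian_response(x, optimum, tolerance=1.0, peak=10.0):
    """Hypothetical unimodal (Gaussian) abundance response along a gradient."""
    return peak * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


gradient = list(range(9))  # nine hypothetical sites with environmental values 0..8

# One species with its optimum in the middle of the gradient.
abund = [gaussian_response(x, optimum=4) for x in gradient]
near = abs(abund[0] - abund[1])  # environmentally similar sites
mid = abs(abund[0] - abund[4])   # intermediate environmental difference
far = abs(abund[0] - abund[8])   # environmentally very different sites


def species_present(x, n_species=9, half_range=1):
    """Species k (optimum at k) occurs only where |x - k| <= half_range."""
    return {k for k in range(n_species) if abs(x - k) <= half_range}


def jaccard(a, b):
    """Jaccard dissimilarity between two species sets."""
    return 1 - len(a & b) / len(a | b)


community = [species_present(x) for x in gradient]
turnover = [jaccard(community[0], community[k]) for k in gradient]
print(near, mid, far)
print(turnover)
```

The single species differs little between sites 0 and 1 and not at all between sites 0 and 8, but a lot between sites 0 and 4, whereas community dissimilarity from site 0 never decreases along the gradient (it saturates at 1 once no species are shared).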
Data at higher levels of abstraction are derived from
the data at the lower levels of abstraction. Therefore, a
given behavior of the level-1 data predicts a given
behavior of the level-2 data. However, the opposite is
not true. When distances are computed, the information
on the identities and abundances of individual species is
lost, as is the information on the values of the
environmental variables and the geographical locations
of the study sites. This information cannot be recovered
from the distance data, because the same distance matrix
can be derived from an unlimited number of different
raw-data tables. For example, adding any constant to all
values in a raw-data table makes no difference to the
Euclidean distances derived from the table.
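This loss of information is easy to demonstrate: two different raw-data tables, one obtained from the other by adding a constant to every cell, yield exactly the same Euclidean distance matrix. The Python sketch below uses hypothetical abundances.

```python
import math


def euclidean_matrix(table):
    """Euclidean distances between all pairs of sites (rows) of a raw-data table."""
    return [[math.dist(a, b) for b in table] for a in table]


# Hypothetical abundances of 3 species at 3 sites, and the same table shifted by +7.
Y = [[5, 0, 2],
     [3, 1, 2],
     [0, 4, 1]]
Y_shifted = [[v + 7 for v in row] for row in Y]

print(Y == Y_shifted)                                      # False: different raw data
print(euclidean_matrix(Y) == euclidean_matrix(Y_shifted))  # True: identical distances
```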
Consequently, a process that is defined in terms of how level-2 data behave cannot be used to predict how
level-1 data should behave. If we know that the velocity
of a kite decreases when it flies against the wind, we can
use this information to model the velocity of the kite at
any point in time on the basis of the strength of the wind
and the velocity of the kite at some other point in time.
But this information cannot be used to model the
position of the kite, because when velocities were
computed, information on position was lost. Because
the same velocity can be obtained from an unlimited
number of starting positions, absolute position (level-1
data) cannot be recovered from velocity (level-2 data).
Similarly, spatial autocorrelation is a phenomenon that
is independent of absolute position. Spatial autocorrela-
tion causes nearby sites to be more similar than faraway
sites, at least over some distance interval, irrespective of
the actual spatial locations of the sites (Legendre and
Legendre 1998). Information on the strength of spatial
autocorrelation and the geographical distance between
two sites makes it possible to predict how different the two sites are in community composition. However, since
information on both absolute position of the sites and the
identities and abundances of individual species was lost
when the distances were computed, this information
cannot be used to model which species should be present
in the community at any particular site, or how
community composition should change towards any
particular direction. We return to this in the next section.
ECOLOGICAL VS. STATISTICAL HYPOTHESES
Three ecological hypotheses
Three hypotheses on the organization of community composition have been actively discussed recently, and
testing them was a central issue in the paper by Legendre
et al. (2005). In brief, the hypotheses are as follows: (1)
Species composition is uniform and the same dominant
species are found over large areas (e.g., Pitman et al.
2001); (2) Species composition fluctuates in a random,
autocorrelated way (e.g., Hubbell 2001); and (3) Species
composition is related to environmental conditions
(numerous authors).
Before moving on, it is important to notice that species
composition is not an entity that has ecological behavior
of its own, but it is a result of how individuals belonging
to different species behave. Therefore, these three hypotheses should be seen as logical consequences of
the following more fundamental ecological hypotheses:
(A) Individuals of all species are able to grow equally
well at all sites and in all ecological conditions present in
the area of interest. Species differ in competitive ability,
and the best competitors become dominant at all sites,
whereas less good competitors remain rare at all sites.
(B) Individuals of all species are able to grow equally
well at all sites and in all ecological conditions present in
the area of interest. All species are competitively equal,
and their abundances fluctuate in a random walk due to
random mortality and random but spatially autocorrelated
dispersal.
(C) Individuals of all species are not able to grow
equally well at all sites and in all ecological conditions
present in the area of interest. Species abundances vary
between sites in response to how suitable the environ-
mental conditions are for each species. All species are
not competitively equal, and competitive ranking may
change in response to changes in environmental
conditions. Species may also be excluded from some
sites because they are physiologically not able to grow in
the environmental conditions present.
Each of the hypotheses A–C describes an ecologist's
view on how species behave, but before they can be
formally tested, it is necessary to derive statistical
hypotheses (null and alternative hypotheses) from them.
As outlined in the previous sections, such statistical
hypotheses can be formulated either for each species
separately or for all species at the same time, and either
using the raw-data approach or the distance approach.
A part of the confusion in the paper by Legendre et al.
(2005) seems to stem from a failure to make a distinction
between ecological and statistical hypotheses. When
testing the ecological hypotheses A–C, one first needs to
think about what ecological predictions they imply, and
how these predictions can be translated into testable
statistical hypotheses. Only thereafter can one decide
which statistical method is appropriate. In the following,
we briefly describe such statistical hypotheses about
community composition and beta diversity that can be
derived from the ecological hypotheses A–C.
Testing ecological hypothesis A
The ecological hypothesis A means that the same competitively superior species are always most abundant,
so community composition is uniform over the
landscape and all sites have the same species in the same
(species-specific) abundances. The expected abundance
of any given species at any given site is equal to the mean
abundance of that species over all study sites, and any
deviations from the mean are due to sampling error.
From the ecological hypothesis A it follows that the
abundance of any given species should not vary much
over the study sites, and that community composition
should not vary much either; any variation found at
level-of-abstraction 2 should be within the limits of
random variation and not explainable by variation in environmental variables or spatial location. This prediction
is testable with statistical methods that use the
raw-data approach. An example of a statistical hypoth-
esis that can be derived from this prediction is H0: when
community composition is regressed on soil nitrogen
content in CCA, the regression coefficient equals zero.
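A statistical hypothesis of this kind is typically tested by permutation. For simplicity, the Python sketch below swaps in RDA (a linear model) for CCA and uses a single explanatory variable: the test statistic is the proportion of the species-table SS explained by soil nitrogen, and its null distribution is obtained by randomly reassigning the nitrogen values among sites. The data, seed and number of permutations are hypothetical choices.

```python
import random


def centered(values):
    """Deviations of a sequence from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def explained_fraction(Y, x):
    """R^2 of regressing all species columns of Y on one explanatory variable x."""
    xc = centered(x)
    sxx = sum(v * v for v in xc)
    ss_total = ss_fitted = 0.0
    for k in range(len(Y[0])):
        yc = centered([row[k] for row in Y])
        sxy = sum(a * b for a, b in zip(xc, yc))
        ss_total += sum(v * v for v in yc)
        ss_fitted += sxy * sxy / sxx
    return ss_fitted / ss_total


def permutation_test(Y, x, permutations=999, seed=1):
    """p-value for H0 'the regression coefficients of x equal zero', obtained by
    randomly reassigning the x values to sites."""
    rng = random.Random(seed)
    observed = explained_fraction(Y, x)
    shuffled = list(x)
    as_extreme = 1  # count the observed data as one permutation
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if explained_fraction(Y, shuffled) >= observed:
            as_extreme += 1
    return observed, as_extreme / (permutations + 1)


# Hypothetical data: soil nitrogen and abundances of 3 species at 8 sites.
nitrogen = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [[1, 8, 2], [3, 7, 0], [2, 7, 1], [4, 5, 3],
     [6, 4, 1], [5, 4, 2], [7, 2, 0], [8, 1, 2]]
print(permutation_test(Y, nitrogen))
```

A small p-value leads to rejecting H0, which would count against the prediction of hypothesis A that community composition does not vary with soil nitrogen.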
From the ecological hypothesis A it also follows that
beta diversity is small and similar over different pairs of
study sites; any variation found at level-of-abstraction 3
should be random and not explainable by variation in
environmental differences or geographical distance. This
prediction is testable with statistical methods that use
the distance approach. An example of a statistical
hypothesis that can be derived from this prediction is
H0: the Mantel correlation coefficient between floristic
distances (as measured with the Bray-Curtis index) and
differences in soil nitrogen content (as measured with the
Euclidean distance) is equal to or smaller than zero.
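The distance-approach counterpart can be sketched in the same way. Assuming hypothetical abundances and soil nitrogen values at seven sites, the Python code below builds a Bray-Curtis floristic distance matrix and an environmental distance matrix (for a single variable, the Euclidean distance is the absolute difference) and computes a one-sided Mantel p-value for H0: r ≤ 0.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two sites' abundance vectors (0 = identical)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def lower_triangle(d):
    """The n(n-1)/2 distances below the diagonal of a square matrix."""
    return [d[i][j] for i in range(len(d)) for j in range(i)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def one_sided_mantel(d1, d2, permutations=999, seed=2):
    """Mantel r and permutation p-value for H0: r <= 0 against H1: r > 0."""
    rng = random.Random(seed)
    n = len(d1)
    v2 = lower_triangle(d2)
    r_obs = pearson(lower_triangle(d1), v2)
    order = list(range(n))
    as_extreme = 1
    for _ in range(permutations):
        rng.shuffle(order)  # relabel whole sites of d1
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(lower_triangle(permuted), v2) >= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (permutations + 1)


# Hypothetical data: abundances of 3 species and soil nitrogen at 7 sites.
abund = [[9, 1, 0], [7, 3, 0], [6, 4, 1], [4, 6, 2], [2, 5, 4], [1, 3, 6], [0, 1, 8]]
soil_n = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
floristic = [[bray_curtis(a, b) for b in abund] for a in abund]
nitrogen_diff = [[abs(x - y) for y in soil_n] for x in soil_n]
print(one_sided_mantel(floristic, nitrogen_diff))
```

Under hypothesis A one would expect a non-significant result; in this invented data set floristic distances clearly increase with differences in soil nitrogen, so H0 would be rejected.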
It should be noted here that Pitman et al. (2001) may
not have meant the hypothesis to be interpreted as
strictly as this, but if competitive ability is allowed to
vary in response to environmental conditions, then the
ecological hypothesis A becomes indistinguishable from
ecological hypothesis C.
Testing ecological hypothesis B
The ecological hypothesis B means that any species can become abundant or rare at any site by chance, because all species are competitively equal. Species composition at any one site is not constant but fluctuates randomly, so there is no way to predict for a given point in time which species occur at which sites, and at what abundances they occur at those sites where they do occur. However, the fluctuations are spatially autocorrelated due to spatially limited dispersal. Sites may lose or gain different species by chance, but the closer two sites are to each other, the stronger the homogenizing effect of dispersal between them (Hubbell 2001, Condit et al. 2002).
Community composition is heterogeneous over the landscape at all spatial scales as a result of the cumulative effects of spatially autocorrelated random walks in species abundances. This spatial structure is entirely due to autocorrelation, and spatial dependence on underlying environmental variables is not present. No directional forces are operating, so space is assumed isotropic, i.e., the change in community composition per unit geographical distance is the same in all directions.
From ecological hypothesis B it follows that two
nearby sites should share more species in more similar
abundances than two sites further apart, but differences
in environment are irrelevant. Consequently, variation
in beta diversity at level-of-abstraction 3 should be
explainable by variation in geographical but not
environmental distances. This prediction is testable with
the distance approach.
According to ecological hypothesis B, species abundances fluctuate randomly and are therefore inherently unpredictable. As we saw above (see The targets of the raw-data and distance approaches), the presence of spatial autocorrelation does not help in predicting how species abundances (level-1 data) should behave. Even though spatial autocorrelation may give rise to spatial structure in community composition, such structure is random by definition, so it is not possible to predict a priori where specific species should attain high abundances, or where a specific community composition should occur. An existing spatial pattern in community composition can be described a posteriori, especially by such powerful methods as PCNM (principal coordinates of neighbor matrices) analysis (Borcard and Legendre 2002). However, doing so does not test the neutral model, because the neutral model did not predict that this was the particular spatial pattern that was expected to emerge in this particular case. Any specific spatial pattern in community composition is just as much in accordance with the neutral model as any other, as long as the degree of spatial autocorrelation is similar. And since spatial autocorrelation is defined in terms of distances, its presence and strength can only be tested using the distance approach.
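Because the neutral model makes predictions only about how dissimilarity changes with distance, a natural descriptive summary is the mean floristic dissimilarity within geographical distance classes. The sketch below assumes Bray-Curtis dissimilarity and equal-width distance classes; the function name and transect data are hypothetical, and a formal test would use, for example, a Mantel test or a Mantel correlogram. The usage lines show that two landscapes with mirror-image spatial patterns give identical profiles, which is the sense in which any specific pattern is equally compatible with the model.

```python
import math


def bray_curtis(a, b):
    den = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0


def distance_decay_profile(coords, abundances, class_width):
    """Mean Bray-Curtis dissimilarity for each geographical distance class.

    Species identities and absolute site positions enter only through
    pairwise dissimilarities and pairwise distances."""
    sums, counts = {}, {}
    n = len(coords)
    for i in range(n):
        for j in range(i):
            k = int(math.dist(coords[i], coords[j]) // class_width)
            sums[k] = sums.get(k, 0.0) + bray_curtis(abundances[i], abundances[j])
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical transect of six sites, 1 km apart.
coords = [(float(x), 0.0) for x in range(6)]
abundances = [[1, 6], [2, 5], [3, 4], [4, 3], [5, 2], [6, 1]]
profile = distance_decay_profile(coords, abundances, class_width=1.0)
# The mirror-image landscape has a different spatial pattern but the same profile.
mirrored = distance_decay_profile(coords, abundances[::-1], class_width=1.0)
print(profile)
print(mirrored)
```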
Consequently, from ecological hypothesis B follow no
testable predictions about the expected behavior of the
variation in the raw data at level-of-abstraction 2. The raw-data approach is concerned with quantifying the effect of location on the abundances of specific species, rather than quantifying the effect of distance between locations on beta diversity. Therefore, the raw-data approach fails to address the neutral model in a relevant way, and is unable either to falsify the neutral hypothesis or to quantify its relative contribution to the observed spatial pattern.
Testing ecological hypothesis C
The ecological hypothesis C means that species abundances vary between sites in response to variation in environmental conditions. Different species reach high abundances in different parts of the environmental gradient, and species may be restricted to just a portion of the gradient. Accordingly, a gradual turnover in community composition is observed along the environmental gradient.
Community composition is heterogeneous over the landscape, and its spatial structure is determined by spatial dependence on underlying environmental variables, which themselves may be spatially structured and/or autocorrelated. The spatial pattern in community composition may be very complex, especially if different environmental variables show different spatial patterns.
From ecological hypothesis C it follows that the variation in species abundances and community composition at level-of-abstraction 2 should be explainable by variation in environmental variables. This prediction is testable with the raw-data approach.
From ecological hypothesis C it also follows that two sites with similar environmental conditions (i.e., with small environmental distance) should have more similar community compositions (i.e., smaller degree of beta diversity) than two sites with more different environments; variation at level-of-abstraction 3 should be explainable by variation in differences in environmental conditions. This prediction is testable with the distance approach.
Summary of ecological-hypothesis testing
It is important to notice that the level-2 statistical hypotheses and the level-3 statistical hypotheses are independent from each other in the sense that one is not derived from the other. Instead, both are derived directly from predictions of an ecological hypothesis. A level-2 prediction is stated in terms of the raw data, and leads to a statistical hypothesis that can be tested with the raw-data approach (where community composition is the response variable). In contrast, a level-3 prediction is stated in terms of distances, and leads to a statistical hypothesis that can be tested with the distance approach (where beta diversity is the response variable).
The above considerations lead to the conclusion that
all three ecological hypotheses can be tested with the
distance approach, but only hypotheses A and C can be
tested with the raw-data approach.
When the three ecological hypotheses are tested using the distance approach, hypothesis A is indicated when neither environmental nor geographical distances provide a significant regression model of beta diversity. Hypothesis B is indicated when geographical distances but not environmental distances provide a significant model, and hypothesis C is indicated when environmental distances do provide a significant model. Differentiating between hypotheses B and C is difficult, because this necessitates differentiating between spatial autocorrelation and spatial dependence. This is especially difficult when the measured environmental variables are autocorrelated, as they often are. And even if the analyses indicate that some of the variation in beta diversity is entirely due to variation in geographical distances, and hence may be an expression of spatial autocorrelation, the possibility exists that there actually is spatial dependence on an unmeasured, spatially autocorrelated environmental variable.
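The decision rule above can be written down as a regression on distance matrices followed by permutation tests. This is a minimal sketch, assuming an ordinary least-squares fit of unfolded floristic distances on geographical and environmental distances and one-tailed p-values obtained by permuting the sites of the floristic matrix; the names (`mrm_test`, `indicated_hypothesis`) and data are hypothetical. The sketch does nothing to separate spatial autocorrelation from spatial dependence, which, as just noted, is the hard part.

```python
import random


def unfold(D, order=None):
    """Lower triangle of D as a list; `order` relabels the sites."""
    idx = list(order) if order is not None else list(range(len(D)))
    return [D[idx[i]][idx[j]] for i in range(len(D)) for j in range(i)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols_two(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept), via centred normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yc = [v - my for v in y]
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def mrm_test(D_flor, D_geo, D_env, n_perm=999, seed=1):
    """Regress floristic on geographical and environmental distances;
    one-tailed p-values from permuting the sites of D_flor."""
    rng = random.Random(seed)
    g, e = unfold(D_geo), unfold(D_env)
    observed = ols_two(unfold(D_flor), g, e)
    order = list(range(len(D_flor)))
    hits = [1, 1]
    for _ in range(n_perm):
        rng.shuffle(order)
        perm = ols_two(unfold(D_flor, order), g, e)
        for k in (0, 1):
            if perm[k] >= observed[k]:
                hits[k] += 1
    return observed, [h / (n_perm + 1) for h in hits]


def indicated_hypothesis(p_geo, p_env, alpha=0.05):
    """C if environmental distances are significant, B if only
    geographical distances are, A if neither is."""
    if p_env <= alpha:
        return "C"
    if p_geo <= alpha:
        return "B"
    return "A"


# Hypothetical data: six sites on a line; floristic distances track the environment.
pos = [0, 1, 2, 3, 4, 5]
env = [0, 3, 1, 4, 2, 5]
D_geo = [[abs(p - q) for q in pos] for p in pos]
D_env = [[abs(u - v) for v in env] for u in env]
D_flor = [[d / 5 for d in row] for row in D_env]
(b_geo, b_env), (p_geo, p_env) = mrm_test(D_flor, D_geo, D_env)
print(b_geo, b_env, p_geo, p_env, indicated_hypothesis(p_geo, p_env))
```

Testing partial coefficients by permuting only the response is one common simplification; with strongly correlated distance matrices its p-values should be read with caution.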
THE TARGET OF VARIATION PARTITIONING
Variation partitioning aims to quantify the relative effects of different groups of explanatory variables on the response variable(s) of interest. Variation partitioning in the context of community ecology was originally based on redundancy analysis (RDA) or canonical correspondence analysis (CCA), and its aim was to partition the variation in the raw community-composition data table into fractions explainable by variation in environmental variables and spatial location (Borcard et al. 1992). This form of variation partitioning is a raw-data approach (Figs. 1 and 2). A more recent extension of variation partitioning is based on multiple regression on distance matrices, and its aim is to partition the variation in floristic distances, i.e., the variation in beta diversity, into fractions explainable by variation in geographical distances and environmental differences (Duivenvoorden et al. 2002, Tuomisto et al. 2003). This form of variation partitioning is a distance approach (Figs. 1 and 2).
Variation partitioning has become a popular method to address questions related to the ecological hypotheses A, B, and C mentioned above (see Ecological vs. statistical hypotheses). Examples of studies that have used the RDA/CCA-based variation-partitioning method are Duivenvoorden (1995), Gilbert and Lechowicz (2004), Svenning et al. (2004), and Cottenie (2005). Examples of studies that have used the distance-based variation partitioning are Duivenvoorden et al. (2002), Tuomisto et al. (2003), Vormisto et al. (2004), and Jones et al. (2006).
Legendre et al. (2005) discuss the difference between the two approaches at length, and show that the variance of the dissimilarity matrix is not the same as the variance of the raw-data table, and that there is no simple relationship between the two variances. To this extent, we agree with them. If the raw-data table (level-of-abstraction 1) has nonzero variance (level-of-abstraction 2), this indicates that there is nonzero beta diversity (analogous with nonzero velocity of a kite). Even if this is the case, the variance at level-of-abstraction 3 may still be zero; this happens when all sites within the community are equally different from each other (no variation in beta diversity; analogous with a kite moving with a constant velocity). Consequently, it is true by definition that a decomposition of the variance of the dissimilarity matrix does not decompose the variance of the raw-data matrix.
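The kite analogy can be checked with a toy example. In the hypothetical three-site table below each site holds its own species, so the raw data vary (nonzero level-2 variance), yet every pair of sites is equally dissimilar and the distances have zero variance at level 3. Bray-Curtis dissimilarity is an assumed choice; any measure that gives all three pairs the same value would do.

```python
def total_sum_of_squares(table):
    """Sum over species of squared deviations from species means (level 2)."""
    n = len(table)
    ss = 0.0
    for j in range(len(table[0])):
        col = [row[j] for row in table]
        mean = sum(col) / n
        ss += sum((c - mean) ** 2 for c in col)
    return ss


def bray_curtis(a, b):
    den = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical community: three sites, each with its own species.
sites = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
raw_ss = total_sum_of_squares(sites)
distances = [bray_curtis(sites[i], sites[j]) for i in range(3) for j in range(i)]
print(raw_ss, distances, variance(distances))
```

Here `raw_ss` is 2 (up to floating-point rounding), while all three distances equal 1 and their variance is 0.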
Legendre et al. (2005:441, 447) conclude from this that "a decomposition of the variance of the dissimilarity matrix among two or several explanatory tables represented by dissimilarity matrices cannot help us understand the causes of the variation of community composition (beta diversity) across an area" and "partitioning on distance matrices should not be used to study the variation in community composition among sites." Here we disagree with them. The words "study" and "understand" do not have established statistical meanings, so they can be interpreted to imply either "explaining" or "analyzing." Legendre et al. (2005) argue that only the "explaining beta diversity" interpretation is correct; we argue that "analyzing beta diversity" is also appropriate (see The difference between analyzing and explaining beta diversity, above).
The RDA/CCA-based variation partitioning is appropriate when one is interested in explaining beta diversity (level-2 question). This method models species abundances, and can tell us that in a particular case, variation in geographical location alone explained x%, variation in the measured environmental variables alone explained y%, and variation in the two jointly explained z% of the overall beta diversity (i.e., of the variation in species abundances). The distance-based variation partitioning is appropriate when one is interested in analyzing beta diversity (level-3 question). This method models beta diversity, and can tell us that in a particular case, variation in geographical distances alone explained a%, variation in the differences in the measured environmental variables alone explained b%, and variation in the two jointly explained c% of the variation in beta diversity.
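The fractions x, y, z (or a, b, c) are usually obtained from three regressions on the same response: one on the geographical predictors, one on the environmental predictors, and one on both. A minimal sketch of the arithmetic follows, using hypothetical R² values and unadjusted R² for brevity (adjusted R² values are often preferred in practice); the function name is hypothetical.

```python
def partition_variation(r2_geo, r2_env, r2_both):
    """Split explained variation into pure and shared fractions.

    The arithmetic is the same for both approaches; only the response
    differs (species table for RDA/CCA, unfolded floristic distances
    for the distance approach)."""
    return {
        "geographical alone": r2_both - r2_env,
        "environmental alone": r2_both - r2_geo,
        "joint": r2_geo + r2_env - r2_both,
        "unexplained": 1.0 - r2_both,
    }


# Hypothetical R2 values from three regressions on the same response.
fractions = partition_variation(r2_geo=0.20, r2_env=0.35, r2_both=0.45)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

The joint fraction is obtained by subtraction and can be negative. What decides whether the result is a raw-data or a distance-approach partitioning is solely where the three R² values come from.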
MOVING BETWEEN RAW DATA AND DISTANCES
The raw data and distance worlds are intimately interlinked, and it is quite easy to move between them. Any raw-data table can be used to compute a distance matrix, which in turn can be used in principal coordinates analysis (PCoA) to reconstruct raw-data-like principal coordinates (Fig. 3). The reconstructed raw data consist of the coordinates of the sites in the ordination space rather than actual spatial coordinates or species abundances, but the information on interplot relationships remains the same. If Euclidean distances are computed from the principal coordinates, a distance matrix identical to the original is obtained. However, if the originally chosen distance measure is not Euclidean (which is the case for all nonmetric measures and for some metric ones), only the Euclidean part of the distances is reconstructed (see Legendre and Legendre [1998] or Legendre and Anderson [1999] for details).
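The round trip from raw data to distances and back can be sketched with classical PCoA (double centring of the squared distances followed by an eigendecomposition). The sketch assumes NumPy and uses hypothetical planar site coordinates in kilometers, as in the UTM example of Fig. 3; the hypothetical function `pcoa` keeps only axes with positive eigenvalues, i.e., the Euclidean part of the distances.

```python
import numpy as np


def euclidean_distances(X):
    """Matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pcoa(D, tol=1e-9):
    """Classical principal coordinates analysis (Gower double centring).

    Returns site coordinates on the axes with positive eigenvalues,
    together with all eigenvalues in decreasing order."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    G = -0.5 * J @ (D ** 2) @ J               # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * np.abs(eigvals).max()
    return eigvecs[:, keep] * np.sqrt(eigvals[keep]), eigvals


# Hypothetical site coordinates in km.
sites = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [6.0, 8.0]])
D = euclidean_distances(sites)
coords, eigvals = pcoa(D)
print(coords.shape)                                 # (4, 2)
print(np.allclose(euclidean_distances(coords), D))  # True
```

The reconstructed coordinates are generally rotated and centred relative to the input, so only their mutual distances match. With a non-Euclidean measure such as Bray-Curtis, some eigenvalues can be negative, and distances recomputed from the retained axes then no longer match the input exactly.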
In spite of this close connection between the raw-data
table and the distance matrix, it is important to know
which of them is used in data analysis.
If the final analysis is based on raw data (reconstructed or original; level-1 data), it addresses a level-2 question, even if some earlier phase of the process also involved distance matrices (level-2 data). Remembering this is relevant, for example, when one considers the use of distance-based redundancy analysis (db-RDA; Legendre and Anderson 1999). In db-RDA, a distance matrix is first computed from the species abundance data (using any appropriate dissimilarity measure), and then PCoA is used to obtain principal coordinates (as in Fig. 3), which are finally used in RDA. In spite of its name, distance-based RDA is not a distance approach sensu Legendre et al. (2005). Like normal RDA, db-RDA is a raw-data approach; it analyzes data taken from level-of-abstraction 1 in order to explain its variation at level-of-abstraction 2. The very purpose of PCoA in the process is to reconstruct a raw-data table from a distance matrix, because the latter is a level-2 data set and hence cannot be used in RDA.
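The db-RDA steps described above can be sketched as follows, assuming NumPy, Bray-Curtis dissimilarities, and hypothetical data and function names. The sketch simply drops axes with negative eigenvalues; the published method also deals with negative eigenvalues, and that step is omitted here. The point to note is the last line: the response passed to the regression is the reconstructed table `Z` (level-1 data), not the distance matrix.

```python
import numpy as np


def bray_curtis_matrix(Y):
    """Bray-Curtis dissimilarities between the rows (sites) of Y."""
    Y = np.asarray(Y, dtype=float)
    num = np.abs(Y[:, None, :] - Y[None, :, :]).sum(axis=-1)
    den = (Y[:, None, :] + Y[None, :, :]).sum(axis=-1)
    return num / den  # assumes no site is empty


def pcoa(D, tol=1e-9):
    """Principal coordinates on the positive-eigenvalue axes of D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(-0.5 * J @ (D ** 2) @ J)
    keep = eigvals > tol * np.abs(eigvals).max()
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def rda_r2(Z, X):
    """Fraction of the total variance of Z explained by a multivariate
    linear regression on X (with intercept), i.e., the RDA R2."""
    Zc = Z - Z.mean(axis=0)
    X1 = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(X1, Zc, rcond=None)
    fitted = X1 @ B
    return (fitted ** 2).sum() / (Zc ** 2).sum()


# Hypothetical data: six sites, two species, one soil variable.
species = np.array([[1, 6], [2, 5], [3, 4], [4, 3], [5, 2], [6, 1]])
nitrogen = np.array([1.0, 2, 3, 4, 5, 6])
Z = pcoa(bray_curtis_matrix(species))  # level-1 data rebuilt from distances
print(rda_r2(Z, nitrogen))             # the response is Z, not the distances
```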
Another raw-data approach that is called "distance based" because it involves a distance matrix in an intermediate phase is nonlinear canonical analysis of principal coordinates (NCAP; Millar et al. 2005). Whereas db-RDA is restricted to linear-regression models, NCAP can incorporate more complex models, but the two methods are similar in that both analyze level-1 data.
CHOOSING AMONG STATISTICAL METHODS
Since ecological hypotheses may yield more than one
testable prediction, more than one kind of statistical
approach is often possible when testing them. This is the
case with ecological hypotheses A (uniformity) and C
(environmental control): both can be tested either with
the raw-data approach or with the distance approach.
So even if one confuses the two analysis approaches, the
results one gets are still relevant to the ecological
hypothesis of interest.
However, the same is not true of ecological hypothesis B (the neutral theory): as we saw in Testing ecological hypothesis B (above), its testable predictions are stated in terms of distances, not in terms of raw data, so it can only be tested with the distance approach. Therefore, attempting to test this ecological hypothesis using the raw-data approach may give quite misleading results.
At least two recent studies (Cottenie 2005, Legendre et
al. 2005) have attempted to test hypothesis B using the
raw-data approach. They assumed that the presence of
significant spatial patchiness (different from random) in
the distributions of species supports the neutral theory.
However, the presence of spatial autocorrelation (the
hallmark of hypothesis B) cannot be tested using the
raw-data approach; in fact, it is entirely possible to find significant patchiness that is in conflict with the predictions of the neutral theory. For example, a significant effect of the north–south coordinates but not of the east–west coordinates implies that space is not isotropic, contrary to the assumption of the theory. Similarly, if a significant term is found that includes the square of one of the coordinates, this may indicate that sites at the extremes of the study area are more similar to each other in community composition than to sites in the central parts of the study area, which conflicts with the prediction that similarity in community composition decreases with increasing geographical distance.
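Both kinds of pattern are easy to produce in a raw-data regression on coordinates. In the hypothetical 3 x 3 grid below, one species varies only from south to north, and another is abundant at the northern and southern edges; only R² values are computed, with no significance tests, and all names are hypothetical. Both patterns would be "explained" by spatial terms in a raw-data analysis, yet both conflict with the neutral predictions of isotropy and of similarity declining with distance.

```python
def r2_simple(y, x):
    """R2 of a simple linear regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical 3 x 3 grid of sites: x = east-west, y = north-south.
grid = [(x, y) for x in range(3) for y in range(3)]
xs = [float(x) for x, _ in grid]
ys = [float(y) for _, y in grid]

# Pattern 1: abundance changes only from south to north (anisotropy).
north = list(ys)
print(round(r2_simple(north, xs), 6), round(r2_simple(north, ys), 6))   # 0.0 1.0

# Pattern 2: abundance high at both edges; needs a squared (centred) coordinate.
quad = [(y - 1.0) ** 2 for y in ys]
edges = [2.0 + 3.0 * q for q in quad]
print(round(r2_simple(edges, ys), 6), round(r2_simple(edges, quad), 6))  # 0.0 1.0
```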
An analysis using the raw-data approach is concerned with modeling community composition (species abundances) at different points in space (compare with modeling the position of a kite at different points in time). A successful model can accurately (with a high R² value) predict community composition at a given site on the basis of its position along the environmental and geographical gradients and the community compositions of other sites whose positions along the same gradients are also known. An analysis using the distance approach, in contrast, is concerned with modeling beta diversity (the difference in community composition) between site pairs (compare with modeling the velocity of a kite between different points in time). A successful model can accurately (with a high R² value) predict the degree of beta diversity between two sites on the basis of their environmental and geographical distance and the degree of beta diversity between other site pairs whose environmental and geographical distances are also known. In the latter analysis, both geographical position and community composition are irrelevant and therefore not included in the model. A given change in geographical distance has the same effect on beta diversity no matter which species are actually involved, and no matter whether the sites of interest are actually situated east or west of the Greenwich meridian (compare with a kite hitting a tree: the impact on velocity is the same no matter what the position of the tree is).

FIG. 3. The derivation of distances from raw data and the reconstruction of raw-data-like principal coordinates from distances. Both the raw data and the reconstructed raw data are at level-of-abstraction 1, the distances are at level-of-abstraction 2 (PCoA, principal coordinates analysis). In this example, the raw data consist of UTM coordinates of study sites, so Euclidean distances indicate the distances between sites in kilometers. The original coordinates can be visualized in a scatterplot; because the UTM coordinates include information of absolute location, the orientation of the scatterplot in relation to compass bearings is known, and its location can be related to external landmarks. The information on absolute location is discarded when distances are computed, so the reconstructed raw data include the original information on the positions of the study sites in relation to each other, but no information on their positions in relation to external landmarks. With species-abundance data the situation is similar: species identities are lost when the distances are computed. If a distance measure other than the Euclidean distance is used to obtain the distance matrix from the raw data (as is usually the case with species-abundance data), then the relationships among plots will differ in the visualizations based on the raw data vs. the reconstructed raw data.
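The statement that geographical position is irrelevant in the distance approach can be checked numerically: rotating and shifting all sites together (for instance, moving them to the other side of the Greenwich meridian) changes every coordinate but no pairwise distance, consistent with the loss of absolute location when distances are computed. The sites, shift, angle, and function names below are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Unfolded (lower-triangle) geographical distances between sites."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords)) for j in range(i)]


def move(coords, dx, dy, angle):
    """Rotate all sites by `angle` radians, then shift them by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in coords]


# Hypothetical sites (km); the moved copy lies elsewhere and is rotated.
sites = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (6.0, 8.0)]
moved = move(sites, dx=-500.0, dy=120.0, angle=math.radians(35))

before, after = pairwise_distances(sites), pairwise_distances(moved)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))  # True
print(sites[1], moved[1])  # the raw coordinates themselves do change
```

A raw-data model that uses the coordinates as predictors would change its terms under such a move, whereas a model fitted to `before` and `after` would be the same.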
Legendre et al. (2005:438) used simulated data to "compare the statistical power of the two procedures [RDA and Mantel test] for partitioning the variation of raw data." They found that RDA (redundancy analysis) yielded higher R² values and more powerful significance tests, and recommended that it be used instead of the Mantel test.
We find this a very problematic comparison. The raw-data approach (RDA) and the distance approach (Mantel test) have fundamentally different null hypotheses, and only one of them (RDA) actually targets the stated question of interest. The simulations did not show that one method is better than the other, because the two methods are not alternative ways of analyzing the same statistical question. The only thing that can be concluded from the simulation results is that if you ask a different question, you may get a different answer. In this case, the environmental and spatial model, as implemented in RDA, explained a higher proportion of the variance in the raw species-data table (level-2 data) than the model of environmental and geographical distances, as implemented in Mantel tests, explained of the variance in the floristic distance matrix (level-3 data). A data set might be simulated where the distance approach yields a higher R² value than the raw-data approach, but this would not be a valid argument to recommend the use of the distance approach to analyze level-2 questions.
Another problem in trying to rank the performance of different methods according to their R² values or the power of their significance tests is that these measures depend heavily on several details in the analysis methods. Often the particular details that are relevant in the raw-data approach lack a counterpart in the distance approach and vice versa. For example, in the raw-data approach both P values and R² values will change depending on whether the regression that relates species abundances to environmental variables models a linear, unimodal symmetric, or unimodal skewed response, and whether space is modeled using just the x and y coordinates, the coordinates together with their polynomial terms, or PCNM (principal coordinates of neighbor matrices) variables. In the distance approach, none of these choices is relevant, because it is not the raw data that are regressed against each other. Instead, both P values and R² values will change depending on which dissimilarity measures are used, whether all environmental variables are combined into a single distance matrix or used in separate matrices, whether the geographical distances are ln-transformed or not, and whether the distance matrices are related to each other using a monotonic, linear, or more complex function.
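As one small illustration of how such choices move R², the sketch below regresses hypothetical floristic distances on geographical distances with and without ln-transformation. The data are invented so that floristic distance is linear in ln(distance); with real data the difference may go either way. Because site pairs are not independent, the R² values are descriptive only, and P values would require permutation.

```python
import math


def r2_simple(y, x):
    """R2 of a simple linear regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical site pairs: geographical distances (km) and floristic distances
# that happen to increase with the logarithm of geographical distance.
geo = [1.0, 2.0, 4.0, 8.0, 16.0]
floristic = [0.1 + 0.05 * math.log(g) for g in geo]

print(round(r2_simple(floristic, geo), 3))                         # 0.871
print(round(r2_simple(floristic, [math.log(g) for g in geo]), 3))  # 1.0
```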
If one needs a model to predict the position of a kite, one does not compare a model built to predict position with a model built to predict velocity, and choose the one that happens to yield higher R² values in a simulation study. One of the models can, and should, be discarded from the outset because it does not model the variable of interest. The situation is no different in the community-composition case. The first criterion for choosing a method of analysis is whether it is appropriate for testing the question at hand or not, and for making that decision it is irrelevant what R² values the available methods have obtained in simulation studies.
CONCLUSIONS
Throughout this paper we have emphasized that analyses based on the distance approach ask different statistical questions than analyses based on the raw-data approach. Therefore, they should be used for different purposes. When researchers evaluate the answers they get through statistical analysis, it is essential to understand what questions those analyses ask. One can only make justified statements about those predictions of the relevant ecological hypotheses that have actually been tested, so one should not claim to have tested one prediction when in fact the analysis method tested another. But if an ecological hypothesis yields predictions at different levels of abstraction, all of these can fruitfully be tested, if this can computationally be done. There is no reason to claim that testing one prediction is more valid than testing another prediction; rather, the different approaches should be viewed as complementing each other.
ACKNOWLEDGMENTS
Many of the ideas presented in this paper were developed during inspiring discussions with Pierre Legendre, Daniel Borcard, and Pedro Peres-Neto. We thank all of them, as well as Rune Økland and two anonymous reviewers, for constructive comments on the manuscript. Financial support was obtained from the Academy of Finland.
LITERATURE CITED
Borcard, D., and P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153:51–68.
Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73:1045–1055.
Condit, R., N. Pitman, E. G. Leigh, Jr., J. Chave, J. Terborgh, R. B. Foster, P. Núñez V., S. Aguilar, R. Valencia, G. Villa, H. C. Muller-Landau, E. Losos, and S. P. Hubbell. 2002. Beta-diversity in tropical forest trees. Science 295:666–669.
Cottenie, K. 2005. Integrating environmental and spatial processes in ecological community dynamics. Ecology Letters 8:1175–1182.
Duivenvoorden, J. F. 1995. Tree species composition and rain forest–environment relationships in the middle Caquetá area, Colombia, NW Amazonia. Vegetatio 120:91–113.
Duivenvoorden, J. F., J.-C. Svenning, and S. J. Wright. 2002. Beta diversity in tropical forests. Science 295:636–637.
Ferrier, S., M. Drielsma, G. Manion, and G. Watson. 2002. Extended statistical approaches to modelling spatial pattern in biodiversity in northeast New South Wales. II. Community-level modelling. Biodiversity and Conservation 11:2309–2338.
Gilbert, B., and M. J. Lechowicz. 2004. Neutrality, niches, and dispersal in a temperate forest understory. Proceedings of the National Academy of Sciences (USA) 101:7651–7656.
Hubbell, S. P. 2001. The unified neutral theory of biodiversity and biogeography. Princeton University Press, Princeton, New Jersey, USA.
Huisman, J., H. Olff, and L. F. M. Fresco. 1993. A hierarchical set of models for species response analysis. Journal of Vegetation Science 4:37–46.
Jones, M. M., H. Tuomisto, D. B. Clark, and P. Olivas Rojas. 2006. Effects of mesoscale environmental heterogeneity and dispersal limitation on floristic variation in rain forest ferns. Journal of Ecology 94:181–195.
Karadzic, B., S. Marinkovic, and D. Katarinowski. 2003. Use of the β-function to estimate the skewness of species responses. Journal of Vegetation Science 14:799–805.
Legendre, P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74:1659–1673.
Legendre, P., and M. J. Anderson. 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69:1–24.
Legendre, P., D. Borcard, and P. R. Peres-Neto. 2005. Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecological Monographs 75:435–450.
Legendre, P., and L. Legendre. 1998. Numerical ecology. Second English edition. Elsevier, Amsterdam, The Netherlands.
Millar, R. B., M. J. Anderson, and G. Zunun. 2005. Fitting nonlinear environmental gradients to community data: a general distance-based approach. Ecology 86:2245–2251.
Oksanen, J., and P. R. Minchin. 2002. Continuum theory revisited: what shape are species responses along ecological gradients? Ecological Modelling 157:119–129.
Pitman, N. C. A., J. W. Terborgh, M. R. Silman, P. Núñez V., D. A. Neill, C. E. Cerón, W. A. Palacios, and M. Aulestia. 2001. Dominance and distribution of tree species in two upper Amazonian terra firme forests. Ecology 82:2101–2117.
Svenning, J.-C., D. A. Kinner, R. F. Stallard, B. M. J. Engelbrecht, and S. J. Wright. 2004. Ecological determinism in plant community structure across a tropical forest landscape. Ecology 85:2526–2538.
ter Braak, C. J. F. 1983. Principal components biplots and alpha and beta diversity. Ecology 64:454–462.
Tuomisto, H., K. Ruokolainen, and M. Yli-Halla. 2003. Dispersal, environment, and floristic variation of western Amazonian forests. Science 299:241–244.
Vellend, M. 2001. Do commonly used indices of beta-diversity measure species turnover? Journal of Vegetation Science 12:545–552.
Vormisto, J., J.-C. Svenning, P. Hall, and H. Balslev. 2004. Diversity and dominance in palm (Arecaceae) communities in terra firme forests in the western Amazon basin. Journal of Ecology 92:577–588.
Whittaker, R. H. 1960. Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs 30:279–338.
Whittaker, R. H. 1972. Evolution and measurement of species diversity. Taxon 21:213–251.