Joint work between Eva Küster, Ingolf Kühn ~ UFZ

Analysing the link between traits & invasive spread in German flora: accounting for residence time. Joint work between Eva Küster, Ingolf Kühn ~ UFZ; Adam Butler, Stijn Bierman, Glenn Marion ~ BioSS. Athens ALARM meeting, January 2007.


Page 1

Analysing the link between traits & invasive spread in German flora: accounting for residence time

Joint work between
Eva Küster, Ingolf Kühn ~ UFZ

Adam Butler, Stijn Bierman, Glenn Marion ~ BioSS

Athens ALARM meeting, January 2007

Page 2

Introduction

• Direct data on the arrival, establishment & spread of invasive species are typically not available at the national or pan-European levels

• Indirect data about the traits & current spatial distribution of species that invaded in the past can be used to identify correlative relationships between traits and invasive success, accounting for phylogeny

• Data on traits are often missing or ambiguous, however, creating serious problems for the analysis – we look at how to address these using Bayesian methods

Page 3

Data

• We analyse data on German vascular plants

• Biolflor (www.ufz.de/biolflor): database with information on traits & phylogeny of 3660 species

• Florkart (www.floraweb.de): database with information on presence/absence of 4000+ species for 2995 grid cells within Germany

• We look at neophyte species (arrivals since 1490), excluding ephemerophytes: there are 388 such species

• We use the # of grid cells occupied as a measure of invasive success

Page 4

Niche breadth in Germany: # hemerobic levels; Urbanity; # of habitat types; # of vegetation formations; # phytosociological classes

Genetics: Ploidy; DNA content

Morphology: Life form; Growth form; Life span; Generative reproductive cycles

Propagation & dispersal: Types of storage organs; Existence of storage organs; Types of shoot metamorphoses; Types of root metamorphoses

Flowering phenology: Beginning of flowering season; Length of flowering season; End of flowering season

Floral & reproductive biology: Strategy types of reproduction; Mating strategy; Pollen vector; Flower colour; Floral UV pattern; Floral UV reflection; Blossom type

Diaspores & germinules: Types of diaspores; Weights of diaspores; Weights of germinules

Native global distribution: Floristic zones of native area; # floristic zones in native area; Continent of native area; # continents in native area; Native in old or new world?; Oceanity of native area; Amplitude of oceanity

Leaf traits: Leaf persistence; Leaf anatomy; Leaf form

Invasive history: Mode of introduction; Residence time

Life strategy: Ecological strategy; Ruderal life strategy

Page 5

Current analysis by UFZ

Küster, Kühn and Klotz (in prep.)

• Regress log(# grid cells occupied) onto each of the ~40 individual traits in turn, in the presence of phylogenetic variables

• Retain only traits that are significant at the 95% level, exclude non-predictive traits, & then use cluster analysis to further reduce the set of traits

• Use AIC to select the best model from within this set of traits, including interactions

• At all stages, use only those species that have complete data for all traits currently in the model
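The trait-by-trait screening step above can be sketched roughly as follows. This is only an illustration, not the UFZ code: the toy data, the helper names `simple_ols` and `screen_traits`, and the normal approximation (|estimate / SE| > 1.96) used in place of an exact test are all assumptions, and the phylogenetic covariates are left out for brevity.

```python
import math

def simple_ols(x, y):
    """Slope and its standard error for y = a + b*x (ordinary least squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

def screen_traits(cells, traits, z_crit=1.96):
    """Regress log(# grid cells) on each trait in turn, complete cases only.

    cells  : list of grid-cell counts, one per species
    traits : dict mapping trait name -> list of values (None = missing)
    Returns {trait: (slope, se, n_species_used, keep)}.
    """
    out = {}
    for name, values in traits.items():
        # Complete-case rule: drop species with a missing value for this trait.
        pairs = [(v, math.log(c)) for v, c in zip(values, cells) if v is not None]
        x, y = zip(*pairs)
        slope, se = simple_ols(x, y)
        out[name] = (slope, se, len(pairs), abs(slope / se) > z_crit)
    return out

# Hypothetical toy data (not taken from Biolflor or Florkart).
cells = [12, 40, 150, 300, 25, 800, 60, 500]
traits = {
    "flowering_length": [1, 2, 3, 4, 1, 5, 2, None],
    "polyploid": [0, 1, 0, 1, None, 1, 0, 0],
}
result = screen_traits(cells, traits)
for name, (slope, se, n_used, keep) in result.items():
    print(name, round(slope, 3), round(se, 3), n_used, keep)
```

Each trait is fitted on a different subset of species here, which is exactly why the number of usable species shrinks once several traits enter the model together.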

Page 6

Phylogenetic correction

Küster, Kühn and Klotz (in prep.)

• Compute the patristic distance matrix based on the phylogenetic codes given in Biolflor

• For the current set of species:
    • apply a principal coordinate analysis to the relevant part of the distance matrix
    • retain only axes associated with positive eigenvalues
    • then retain the axes that account for the first 80% of variation
    • then regress log(# grid cells occupied) onto the remaining axes and retain only those that are significant at the 95% level

• The phylogenetic variables need to be recomputed whenever the set of species is changed
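The first three steps of the phylogenetic correction can be sketched with a standard principal coordinate analysis (Gower double-centring followed by an eigendecomposition). The function name `pcoa_axes` and the small distance matrix are hypothetical, and the final regression of log(# grid cells occupied) onto the retained axes is not shown.

```python
import numpy as np

def pcoa_axes(dist, var_fraction=0.8):
    """Principal coordinate analysis of a symmetric distance matrix.

    Keeps axes with positive eigenvalues, then the leading axes that together
    account for `var_fraction` of the variation carried by those axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (d ** 2) @ centre        # Gower double-centring
    evals, evecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]              # re-sort to descending
    evals, evecs = evals[order], evecs[:, order]
    positive = evals > 1e-10                     # drop zero/negative eigenvalues
    evals, evecs = evals[positive], evecs[:, positive]
    cum = np.cumsum(evals) / evals.sum()
    k = min(int(np.searchsorted(cum, var_fraction)) + 1, len(evals))
    coords = evecs[:, :k] * np.sqrt(evals[:k])   # species scores on retained axes
    return coords, evals[:k]

# Hypothetical patristic distances between four species.
dist = [[0, 2, 6, 6],
        [2, 0, 6, 6],
        [6, 6, 0, 4],
        [6, 6, 4, 0]]
coords, evals = pcoa_axes(dist)
print(coords.shape, evals)
```

Because the double-centring depends on which species are in the matrix, the axes change whenever the species set changes, which is why the variables have to be recomputed.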

Page 7

Missing data

• A large number of species are currently excluded from the final analysis as data are missing on some of their traits

• This is inefficient, & could potentially lead to bias if the data are missing not at random

• The missing data arise from different sources:
    • there being no record in the Biolflor database
    • the qualifier in Biolflor suggesting that data quality is poor
    • multiple states being recorded for a particular trait
    • a very rare state being recorded

Page 8

Residence times

• Residence time is a particularly important variable because
    • it has good explanatory power to describe occupancy
    • it partly accounts for the dynamic nature of invasive processes
    • it allows us to make time-specific predictions about occupancy

• However, data on German residence times are only available for 171 species, & for 35 of these only to the nearest century

• Some auxiliary data are available for neighbouring countries

• How can we properly include residence time in the analysis, given the large proportion of missing data?

Page 9

| Species | Region | Time |
| --- | --- | --- |
| Amaranthus deflexus L. | Germany | 1889 |
| Aesculus hippocastanum L. | Germany | 16th century |
| Acer negundo L. | Czech Republic | 1699 |
| | Germany | 18th century |
| Oenothera depressa Greene | Germany | Early 19th century |
| Oxalis fontana Bunge | Central Europe | 17th century ? |
| | Germany | 1807 |
| Epilobium ciliatum Raf. | Central Europe | 1871 / since 1971 |
| | Germany | 1927 |
| Nepeta grandiflora M. Bieb. | Germany | ca. 1900 |
| Agrostis scabra Willd. | Central Europe | 1909 |
| | Germany | 1960 |

Page 10

Work at BioSS

• The aims of our research on this at BioSS:
    • to explore how sensitive the results of inferences are to the assumptions that we make about missing data
    • to analyse the data in such a way that species with missing data for some traits do not need to be excluded
    • to relate the outputs from the analysis to invasive risk

• We work with the Biolflor-Florkart data, and focus upon missing data for residence times; however, the methodological ideas are widely applicable

Page 11

Application to toolkit

• Application to the prediction of invasive risk
    • e.g. use traits & phylogeny to infer the number of cells that a recently arrived species is likely to occupy after N years of residence

• This number is uncertain, so it will be a probability distribution rather than a single number
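To make the idea of a predictive distribution concrete, here is a minimal simulation sketch. It assumes a log-normal model for the number of occupied cells with a linear effect of residence time; the posterior draws are invented for illustration, and `predict_cells`, `quantile` and `mu0` (which lumps together the intercept, trait and phylogeny terms) are hypothetical names. Only the figure of 2995 grid cells comes from the Data slide.

```python
import math
import random

N_CELLS = 2995  # number of Florkart grid cells, from the Data slide

def predict_cells(n_years, posterior_draws, seed=0):
    """Simulate the # of grid cells occupied after n_years of residence.

    posterior_draws: list of (mu0, delta, sigma) tuples, one per posterior draw.
    Returns the simulated values sorted in ascending order.
    """
    rng = random.Random(seed)
    sims = []
    for mu0, delta, sigma in posterior_draws:
        log_y = rng.gauss(mu0 + delta * n_years, sigma)
        sims.append(min(math.exp(log_y), N_CELLS))  # cannot exceed the grid
    return sorted(sims)

def quantile(sorted_values, q):
    """Simple empirical quantile of an already sorted list."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

# Hypothetical posterior draws, for illustration only.
rng = random.Random(42)
draws = [(rng.uniform(2.5, 3.5), rng.uniform(0.005, 0.015), rng.uniform(0.8, 1.2))
         for _ in range(2000)]

for n in (20, 100):
    sims = predict_cells(n, draws)
    print(n, [round(quantile(sims, q)) for q in (0.025, 0.5, 0.975)])
```

Reporting the 2.5%, 50% and 97.5% quantiles, rather than one number, carries both parameter uncertainty and residual variation through to the risk statement.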

Page 12

Bayesian methods

• An alternative approach to statistical modelling and inference, in which data are regarded as fixed and parameters are regarded as random

• Increasingly widely used: due to improvements in computational power it is now often possible to fit more advanced models using Bayesian inference than using classical statistical methods

• Particularly suitable for problems that involve missing data

• Implemented using free software called WinBUGS: extremely powerful but not particularly user-friendly…

Page 13

Bayesian modelling

Notation: for species i:

$y_i$ = # of grid cells occupied

$r_i$ = residence time

$x_i$ = other trait data

$z_i$ = phylogenetic variables

Basic model

$\log y_i \sim N(\alpha + \beta x_i + \gamma z_i + \delta r_i, \sigma^2)$

…just the same as a GLM

Prior distributions

We use uninformative priors

$\alpha, \beta, \gamma, \delta \sim N(0, 1000)$

$\sigma^2 \sim \mathrm{Gamma}(1/1000, 1/1000)$

• Recast the UFZ methodology in a Bayesian context, and implement this in WinBUGS

• Use this to explore potential refinements or extensions to the current analysis

• Assess sensitivity to the assumptions about missing data, phylogenetic dependence and distribution of the response variable (log-normal or Binomial)

• Implementation is in WinBUGS

• Develop ways of dealing more efficiently with missing data

• Bayesian

LPJ code: Ben Smith, Stephen Sitch, Sybil Schapoff

CRU data: David Viner

GCM data: PCMDI

Statistical methods: Jonathan Rougier, Chris Glasbey

Uncertainty analysis: Bjoern Reineking, Stijn Bierman

MCMC details:

Burn-in = 5000, Sample = 2000

Thinning ratio = 1:50
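As a rough guide to run length, the settings above imply the following arithmetic. This assumes that "Sample = 2000" is the number of draws retained after thinning and that 1:50 means keeping every 50th iteration; the slide does not say which convention is used, so treat the total as indicative only.

```python
burn_in = 5000
retained = 2000   # assumed: number of draws kept after thinning
thin = 50         # assumed: keep every 50th iteration

total_iterations = burn_in + retained * thin

# 1-based iteration numbers whose values would be stored under these assumptions.
kept = range(burn_in + thin, total_iterations + 1, thin)

print(total_iterations, len(kept))
```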

Page 14

Imputation

• When data on residence times are missing, we can treat them as random variables

• We can use data on the other traits, phylogeny & number of grid cells occupied to infer the distribution of the residence time for a particular species i

e.g.

$\log r_i \sim N(\exp\{a + b x_i + c z_i + d y_i\}, s^2)$

• Use of the cut function ensures this does not bias inferences about the main-model parameters $\alpha$, $\beta$, $\gamma$, $\delta$ and $\sigma^2$
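A small simulation may help show what the imputation model produces for one species. The sketch takes the model exactly as written on the slide, with the parameters held fixed at hypothetical values; in the actual analysis they would be sampled jointly in WinBUGS, and the cut function is not reproduced here. The name `impute_residence` is illustrative.

```python
import math
import random

def impute_residence(x, z, y, params, n_draws=1000, seed=1):
    """Draw residence times r from log r ~ N(exp{a + b*x + c*z + d*y}, s^2).

    Returns (mean, 2.5% quantile, 97.5% quantile) of the imputed draws.
    """
    a, b, c, d, s = params
    rng = random.Random(seed)
    mean_log_r = math.exp(a + b * x + c * z + d * y)
    draws = sorted(math.exp(rng.gauss(mean_log_r, s)) for _ in range(n_draws))
    mean = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return mean, lower, upper

# Hypothetical values of a, b, c, d, s (not estimates from the analysis).
params = (1.5, 0.0, 0.0, 0.0005, 0.6)
few_cells = impute_residence(x=1.0, z=0.0, y=200, params=params)
many_cells = impute_residence(x=1.0, z=0.0, y=600, params=params)
print([round(v) for v in few_cells])
print([round(v) for v in many_cells])
```

With a positive d, a species occupying more grid cells is imputed a longer residence time, which is the direction one would expect if spread takes time.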


Page 15

Results: Ploidy

Polyploid vs diploid

Estimate (SE) for trait effect:

| Model | Classical | Bayesian |
| --- | --- | --- |
| Trait | .580 (.226) | .587 (.225) |
| Trait + Phylogeny | .636 (.220) | .656 (.211) |
| Trait + Phylogeny + Residence | .790 (.347) | .630 (.216) [cut]; .761 (.199) [full] |

Pink result based on 124 species

Other results based on 345 species

42 species excluded

P(parameter > 0):

| Parameter | Main model | Imputation model |
| --- | --- | --- |
| Trait (b in imputation model) | > .99 | .14 |
| Phylogeny (c in imputation model) | 1, .94 | 1, .84 |
| Residence time in main model; grid cells (d) in imputation model | > .99 | .99 |

Page 16

Results: Ploidy

Imputed values

Page 17

Results: Ploidy

Predictions

Page 18

Results: Duration of flowering

Estimate (SE) for trait effect:

| Model | Classical | Bayesian |
| --- | --- | --- |
| Trait | .362 (.084) | .358 (.080) |
| Trait + Phylogeny | .329 (.083) | .326 (.081) |
| Trait + Phylogeny + Residence | .298 (.113) | .229 (.082) [cut]; .204 (.076) [full] |

Pink result based on 135 species

Other results based on 379 species

8 species excluded

P(parameter > 0):

| Parameter | Main model | Imputation model |
| --- | --- | --- |
| Trait (b in imputation model) | > .99 | .97 |
| Phylogeny (c in imputation model) | .99 | > .99 |
| Residence time in main model; grid cells (d) in imputation model | > .99 | > .99 |

Page 19

Results: End of flowering

Estimate (SE) for trait effect:

| Model | Classical | Bayesian |
| --- | --- | --- |
| Trait | .207 (.060) | .206 (.058) |
| Trait + Phylogeny | .167 (.060) | .166 (.059) |
| Trait + Phylogeny + Residence | .275 (.106) | .096 (.061) [cut]; .227 (.060) [full] |

Pink result based on 135 species

Other results based on 379 species

8 species excluded

P(parameter > 0):

| Parameter | Main model | Imputation model |
| --- | --- | --- |
| Trait (b in imputation model) | .96 | .17 |
| Phylogeny (c in imputation model) | .98 | > .99 |
| Residence time in main model; grid cells (d) in imputation model | > .99 | > .99 |

Page 20

Results: End of flowering

Page 21

Results: End of flowering

Page 22

Results: Pollen vector

Estimate (SE) for trait effect:

| Model | Wind vs Self: Classical | Wind vs Self: Bayesian | Insect vs Self: Classical | Insect vs Self: Bayesian |
| --- | --- | --- | --- | --- |
| Trait | -1.16 (.38) | -1.16 (.37) | -0.71 (.32) | -0.72 (.31) |
| Trait + Phylogeny | -0.79 (.38) | -0.81 (.36) | -0.72 (.32) | -0.72 (.31) |
| Trait + Phylogeny + Residence | -1.22 (.51) | -0.57 (.37) | -0.74 (.43) | -0.39 (.33) |
| | | -0.51 (.32) | | -0.56 (.27) |

P(parameter > 0):

| Parameter | Main model | Imputation model |
| --- | --- | --- |
| Trait (b in imputation model) | .06, .13 | .06, .82 |
| Phylogeny (c in imputation model) | < .01, < .01 | < .01, .08 |
| Residence time in main model; grid cells (d) in imputation model | > .99 | .99 |

Pink result: 108 species

Other results: 329 species

58 species excluded

Page 23

Results: Shoot metamorphoses

| Contrast | Model | Classical | Bayesian |
| --- | --- | --- | --- |
| a vs no | T | 0.64 (.34) | 0.64 (.34) |
| | T+P | 0.68 (.34) | 0.70 (.34) |
| | T+P+R | 0.61 (.62) | 0.82 (.35) |
| rh vs no | T | -1.06 (.35) | -1.05 (.34) |
| | T+P | -0.79 (.37) | -0.82 (.37) |
| | T+P+R | 0.26 (.63) | -0.70 (.35) |
| p vs no | T | 0.09 (.34) | 0.10 (.32) |
| | T+P | 0.05 (.34) | 0.08 (.37) |
| | T+P+R | -0.02 (.65) | 0.23 (.33) |
| z vs no | T | -1.12 (.65) | -1.04 (.65) |
| | T+P | -0.24 (.75) | -0.26 (.75) |
| | T+P+R | ? | -0.06 (.69) |

Page 24

Significance of trait effect in Bayesian model: posterior probability that the trait effect is > 0

| Trait | Contrast | Trait only | Trait + phylogeny | CUT: T + P + residence |
| --- | --- | --- | --- | --- |
| Ploidy | polyploid vs diploid | > .99 | > .99 | > .99 |
| Length of flowering season | | > .99 | > .99 | > .99 |
| End of flowering season | | > .99 | > .99 | .94 |
| Shoot | a vs none | .97 | .98 | .99 |
| | rh vs none | < .01 | .01 | .02 |
| | p vs none | .62 | .59 | .75 |
| | z vs none | .05 | .36 | .47 |
| Pollen vector | wind vs self | < .01 | .01 | .06 |
| | insect vs self | .01 | .01 | .12 |

(Note: the posterior probability that the residence time effect is > 0 is always > 0.99)

Page 25

Further work 1:

Data Not Missing at Random

• Our model assumes that the data on residence times are missing at random, as does the approach of excluding missing data

• We can also consider possible mechanisms by which the missing data might be related to the variables of interest

Let $o_i$ = 1 if residence time observed for species i, 0 otherwise

• We could assume that

$o_i \sim \mathrm{Binomial}(1, \mathrm{logit}^{-1}\{A + B x_i + C z_i + D y_i + E r_i\})$

• The parameter E cannot be estimated, but we can assess sensitivity to the value of it; we assume here that E is negative
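The effect of a negative E can be illustrated by reweighting draws of the residence time by the probability that they would be unobserved. This is only a sketch of the selection mechanism, not of the WinBUGS model: the residence times are hypothetical standardised values, `base` stands in for the whole term A + B x_i + C z_i + D y_i, and both function names are illustrative.

```python
import math
import random

def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))

def mean_residence_if_missing(r_draws, base, E):
    """Mean of r among species whose residence time is NOT observed.

    P(o = 1 | r) = inv_logit(base + E * r); each draw is weighted by
    P(o = 0 | r) = 1 - inv_logit(base + E * r).
    """
    weights = [1.0 - inv_logit(base + E * r) for r in r_draws]
    return sum(w * r for w, r in zip(weights, r_draws)) / sum(weights)

rng = random.Random(0)
# Hypothetical standardised residence times (mean 0, sd 1).
r_draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]

mar = mean_residence_if_missing(r_draws, base=0.0, E=0.0)   # E = 0: missing at random
nmar = {E: mean_residence_if_missing(r_draws, base=0.0, E=E) for E in (-1, -2, -3)}
print(round(mar, 3), {E: round(v, 3) for E, v in nmar.items()})
```

When E is negative, long-resident species are less likely to have a recorded arrival date, so the species with missing records are weighted towards longer residence times than a missing-at-random analysis would suggest.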

Page 26

Results: End of flowering

| Model | Trait effect: estimate (SE) | Imputed residence: mean (Q2.5%, Q97.5%) |
| --- | --- | --- |
| Trait only | .206 (.058) | - |
| + Phylogeny | .166 (.059) | - |
| + Residence, MAR, cut | .096 (.061) | 114 (34, 355) |
| + Residence, MAR, full | .227 (.060) | 104 (27, 351) |
| + Residence, NMAR, cut, E = -1 | .094 (.062) | 145 (44, 454) |
| + Residence, NMAR, cut, E = -2 | .096 (.064) | 191 (55, 619) |
| + Residence, NMAR, cut, E = -3 | .090 (.058) | 315 (73, 916) |

Page 27

Further work 2:

Multiple traits

• Relatively low proportions of missing data for the other key traits: we can just exclude these when we look at traits individually, but this is more problematic when we look at effects of multiple traits

• Most “missing data” for the other key traits arise because rare or duplicate trait states are recorded in Biolflor

• We would like to incorporate this information directly into the analysis, rather than attempting to impute the missing values

• We can deal with duplicate states either by:
    • assuming that the parameter for species that have both states is the average of the parameters for the two states; or
    • including a separate parameter for species that have duplicate traits
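The two options for duplicate states amount to two ways of coding the design matrix. The sketch below shows one possible coding for a single species; the function name `design_row`, the level names and the choice of "self" as baseline are assumptions made for illustration.

```python
def design_row(states, levels, baseline, method):
    """Dummy-code a species' recorded trait states relative to `baseline`.

    method="average":  a species with k recorded states gets weight 1/k on
                       each of them, so its effect is the average of the
                       parameters for those states (baseline parameter = 0).
    method="separate": each distinct combination of states gets its own column.
    Returns {column name: value}; an empty dict means "baseline only".
    """
    states = sorted(set(states))
    if method == "average":
        columns = [lv for lv in levels if lv != baseline]
        return {lv: states.count(lv) / len(states) for lv in columns}
    if method == "separate":
        label = "+".join(states)
        return {label: 1.0} if label != baseline else {}
    raise ValueError(f"unknown method: {method}")

levels = ["self", "wind", "insect"]

# A species recorded as both selfing and insect-pollinated.
avg = design_row(["self", "insect"], levels, baseline="self", method="average")
sep = design_row(["self", "insect"], levels, baseline="self", method="separate")
print(avg)
print(sep)
```

The averaging option adds no parameters but constrains duplicate-state species to sit halfway between the two states, whereas the separate-parameter option is more flexible but can be poorly estimated when few species share a combination.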

Page 28

| Trait | # treated as missing in current analysis | # with no record at all |
| --- | --- | --- |
| Ploidy | 42 | 13 |
| Length of flowering season | 8 | 8 |
| End of flowering season | 8 | 8 |
| Pollen vector | 58 | 37 |
| Shoot metamorphoses | 59 | 1 |
| Any of the above five traits | 134 | 54 |

Page 29

| Species | Pollen vector | Qualifier |
| --- | --- | --- |
| Acer negundo L. | Wind | Always |
| Adonis annua L. | Selfing | Unknown |
| | Insects | Unknown |
| Alcea rosea L. | Selfing | At failure of outcrossing |
| | Insects | The rule |
| Artemisia dracunculus L. | Wind | The rule |
| Diplotaxis muralis (L.) DC. | Selfing | The rule |
| | Selfing | At failure of outcrossing |
| | Insects | The rule |
| Elodea canadensis Michx. | Water | The rule |
| Epilobium ciliatum Raf. | Selfing | The rule |
| | Cleistogamy | The rule |

Missing data in current analysis

Page 30

Method to deal with duplicates:

| Trait | Contrast | Exclude | Average of parameters | Separate parameter |
| --- | --- | --- | --- | --- |
| Ploidy | Polyploid vs Diploid | .636 (.220) | .592 (.226) | .641 (.225) |
| | Both vs Diploid | - | .296 (.113) | .747 (.396) |
| Pollen vector | Wind vs Self | -.795 (.376) | -.508 (.376) | -.653 (.384) |
| | Insect vs Self | -.716 (.315) | -.683 (.310) | -.748 (.314) |
| | Water vs Self | - | -.094 (.935) | -.134 (.933) |
| | Insect+Self vs Self | - | -.342 (.155) | -.138 (.614) |
| | Wind+Self vs Self | - | -.254 (.188) | 2.18 (1.42) |
| | Wind+Insect vs Self | - | -.596 (.244) | .099 (1.99) |

Classical analysis, model = Traits + Phylogeny

Page 31

Further work 3:

Auxiliary residence time data

• The imputation model allows us to draw inferences about residence times for species where the arrival date is unknown

• The performance of the imputation model depends upon it containing regressors that are strongly correlated with residence time in Germany

• Possibility of using data on residence in a neighbouring country, $n_i$, as an explanatory variable:

$\log r_i \sim N(\exp\{a + b x_i + c z_i + d y_i + e n_i\}, s^2)$

Page 32

Further work 4:

Climate change

• UFZ are using the species-level model to identify key traits for invasive success, & then a spatial approach to estimate impact of environmental change on these

• A non-spatial approach might involve grouping cells according to environmental characteristics, & fitting the species-level model separately for each group of cells

• We are interested in comparing these approaches