The adephylo package

S. Dray

Univ. Lyon 1

2015, Lausanne

SD (Univ. Lyon 1) 2015, Lausanne 1 / 24

Introduction

The ade family

Analyse de Données Écologiques (analysis of ecological data)

Introduction

a package to analyse phylogenetic signal in trait data

Reimplementation and development of ade4 functionalities

use of phylo (ape), phylo4d (phylobase) classes instead of phylog

new methods (e.g., ppca) and functions

ade4 (multivariate) → adephylo ← ape, phylobase (phylogeny)

Introduction

Two ingredients

species, traits

Multivariate analysis

Summarizing data

variables

individuals

what are the relationships between the variables ?

what are the resemblances/differences between the individuals ?

Multivariate analysis

Summarizing data with multivariate methods

variables

individuals

[Figure: representation in d = 2 dimensions of the individuals (points labelled 1 to 30) and of the variables (dfs, alt, slo, flo, pH, har, pho, nit, amm, oxy, bdo)]

what are the relationships between the variables ?

what are the resemblances/differences between the individuals ?

Multivariate analysis

One table, two geometric viewpoints

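[Figure: the table X seen from two viewpoints. First panel: cloud of n rows (individuals), with axes variable 1, variable 2, ..., variable p (individuals hyperspace). Second panel: cloud of p columns (variables), with axes individual 1, individual 2, ..., individual n (variables hyperspace).]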

Multivariate methods aim to answer these two questions and seek small-dimension hyperspaces (few axes) where the representations of individuals and variables are as close as possible to the original ones.

Multivariate analysis

Principal Component Analysis

In ade4 : dudi.pca(df)

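\[ X = \left[ \frac{x_{ij} - \bar{x}_j}{s(x_j)} \right], \quad Q = I_p, \quad D = \frac{1}{n} I_n \]

Maximization of :

\[ Q(a) = a^T Q^T X^T D X Q a = \| XQa \|_D^2 = \mathrm{var}(XQa) \]

\[ S(k) = k^T D^T X Q X^T D k = \| X^T D k \|_Q^2 = \sum_{j=1}^{p} \mathrm{cor}^2(k, x_j) \]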
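The two criteria can be checked numerically. The following is a minimal base-R sketch, not a call to dudi.pca: the data are random and hypothetical, and the names `a1`, `k1`, `var_k1` and `sum_cor2` are introduced only for illustration. It assumes Q = I_p and D = (1/n) I_n as on the slide.

```r
# Hypothetical data: 10 individuals, 4 variables (not from the slides).
set.seed(123)
n <- 10; p <- 4
df <- as.data.frame(matrix(rnorm(n * p), n, p))

# X: centred and scaled table, using standard deviations computed with 1/n.
X <- as.matrix(df)
X <- sweep(X, 2, colMeans(X))
X <- sweep(X, 2, sqrt(colMeans(X^2)), "/")
D <- diag(1 / n, n)                      # uniform row weights; Q = I_p is implicit

# Axes a maximizing Q(a) are eigenvectors of X^T D X.
e <- eigen(t(X) %*% D %*% X, symmetric = TRUE)
a1 <- e$vectors[, 1]
k1 <- as.vector(X %*% a1)                # scores of the individuals on axis 1

var_k1 <- sum(k1^2) / n                  # var(XQa) with weights 1/n
sum_cor2 <- sum(cor(k1, X)^2)            # sum of squared correlations with the variables
print(c(lambda1 = e$values[1], var = var_k1, sum_cor2 = sum_cor2))
```

Under these assumptions the three printed values should agree: the first eigenvalue is both the variance of the scores and the sum of their squared correlations with the variables.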

Multivariate analysis

Lizards data

18 species, 8 traits :

mean.L : mean adult female length (mm)

matur.L : female length at maturity (mm)

max.L : maximum length of adult female (mm)

hatch.L : hatchling length (mm)

hatch.m : hatchling mass (g)

clutch.S : clutch size (n. eggs)

age.mat : age at maturity (months)

clutch.F : clutch frequency (n. per year)

Demo

Bauwens, D. and R. Díaz-Uriarte. 1997. Covariation of life-history traits in Lacertid lizards: a comparative study. American Naturalist, 149, 91–111.

Phylogeny

Management in R

The ape and phylobase packages provide functions, methods and classes to deal with phylogenetic data

Import : read.tree

Classes for a tree : phylo (ape), phylo4 (phylobase)

Class for a tree + data : phylo4d (phylobase)

Graphic : plot
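As an illustration of the import and plotting steps listed above, here is a small sketch that assumes the ape package is installed (the code is skipped otherwise). The Newick string is a made-up three-tip tree, not data from the slides, and phylobase is deliberately left out.

```r
# Assumes the ape package is installed; the Newick string is a made-up example.
if (requireNamespace("ape", quietly = TRUE)) {
  tre <- ape::read.tree(text = "((A:1,B:1):1,C:2);")  # import from a Newick string
  print(class(tre))                # "phylo"
  print(ape::Ntip(tre))            # number of tips
  d <- cophenetic(tre)             # pairwise branch-length distances between tips
  print(d)
  plot(tre)                        # dispatches to ape's plot method for phylo objects
}
```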

Demo

Phylogeny and traits

species

traits

Phylogenetic structures (i.e. phylogenetic autocorrelation or signal) : the values of biological traits observed in a set of taxa are not independent from their position in the phylogenetic tree.

positive : closely related taxa tend to share similar trait values

negative : strong contrasts between sister taxa

Need for mathematical representations of the phylogenetic relatedness

Phylogeny and traits

Phylogeny as a distance/similarity matrix

Function distTips computes distances. The argument method can take different values :

patristic : patristic distance, i.e. sum of branch lengths on the shortest path between two tips

nNodes : number of nodes on the shortest path between two tips

Abouheif : Abouheif’s distance

sumDD : sum of the number of direct descendants of all nodes on the shortest path between two tips

Function proxTips returns phylogenetic proximities \(w_{ij}\) based on a phylogenetic distance \(d_{ij}\) using \(w_{ij} = 1 / d_{ij}^{a}\)
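To make the patristic and nNodes definitions concrete, the sketch below computes both by hand in base R and then derives proximities with exponent a = 1. It does not call distTips or proxTips. The four-tip tree, the `parent`/`len` encoding and the helper names are hypothetical. Counting only internal nodes (most recent common ancestor included) for nNodes is an assumption of this sketch and may differ from the package's convention.

```r
# Hypothetical tree ((t1:1,t2:1):1,(t3:0.5,t4:0.5):1.5);
# nodes 1-4 are tips, 5 is the root, 6 and 7 are internal nodes.
parent <- c(6, 6, 7, 7, NA, 5, 5)        # parent of each node
len    <- c(1, 1, 0.5, 0.5, 0, 1, 1.5)   # length of the branch above each node
tips   <- 1:4

# Nodes from a given node up to the root (inclusive).
lineage <- function(i) {
  out <- i
  while (!is.na(parent[i])) { i <- parent[i]; out <- c(out, i) }
  out
}

# Shortest path between two tips: nodes below the MRCA, and internal nodes crossed.
path_info <- function(i, j) {
  li <- lineage(i); lj <- lineage(j)
  mrca <- li[li %in% lj][1]
  below <- c(li[seq_len(match(mrca, li) - 1)], lj[seq_len(match(mrca, lj) - 1)])
  list(below = below, internal = setdiff(c(below, mrca), c(i, j)))
}

n <- length(tips)
lab <- paste0("t", tips)
patristic <- nNodes <- matrix(0, n, n, dimnames = list(lab, lab))
for (i in tips) for (j in tips) {
  p <- path_info(i, j)
  patristic[i, j] <- sum(len[p$below])   # sum of branch lengths on the path
  nNodes[i, j] <- length(p$internal)     # internal nodes on the path
}

a <- 1                                   # exponent chosen for this example
W <- 1 / patristic^a                     # proximities w_ij = 1 / d_ij^a
diag(W) <- 0                             # no self-proximity
print(patristic); print(nNodes); print(W)
```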

Phylogeny and traits Measuring and testing the phylogenetic signal

Moran’s index

The n-by-1 vector \(x = [x_1 \cdots x_n]^T\) contains the measurements of a quantitative trait for n species and \(W = [w_{ij}]\) is the n-by-n phylogenetic proximity matrix.

\[ MC(x) = \frac{n \sum_{(i,j)} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{(i,j)} w_{ij} \, \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

see moran.idx, abouheif.moran
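The formula can be transcribed directly in base R. The sketch below is an illustration only: `moran_index` is a hypothetical helper, not the package's moran.idx, and the trait values and proximity matrix are made up so that the result can be checked by hand.

```r
# Moran's index for a trait x and a proximity matrix W (zero diagonal assumed).
moran_index <- function(x, W) {
  z <- x - mean(x)
  length(x) * sum(W * outer(z, z)) / (sum(W) * sum(z^2))
}

# Hypothetical example: four species, each close only to its adjacent neighbours.
x <- c(1, 2, 3, 4)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

mc <- moran_index(x, W)
print(mc)  # 1/3
```

Here the numerator is 4 × 2.5 = 10 and the denominator is 6 × 5 = 30, hence the positive value 1/3: neighbouring species carry similar trait values.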

Phylogeny and traits Measuring and testing the phylogenetic signal

Moran’s index and Abouheif’s Cmean

Abouheif’s test of phylogenetic signal is exactly a test of Moran’s index with phylogenetic proximities defined as :

\[ w_{ij} = \frac{a_{ij}}{\sum_{j,\, i \neq j} a_{ij}} \quad \text{with} \quad a_{ij} = \Big( \prod_{p \in P_{ij}} f(p) \Big)^{-1} \]

where \(P_{ij}\) is the set of nodes on the shortest path from tip i to tip j and \(f(p)\) is the number of direct descendants from node p.
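A small worked example may help to read the definition of a_ij. The base-R sketch below computes the off-diagonal a_ij for a made-up five-tip tree. The tree, its `parent` encoding and the helper `lineage` are hypothetical, and the normalization into w_ij is left out.

```r
# Hypothetical tree ((t1,t2),(t3,t4,t5)): nodes 1-5 are tips, 6 is the root,
# 7 is the parent of t1,t2 and 8 is the parent of t3,t4,t5.
parent <- c(7, 7, 8, 8, 8, NA, 6, 6)
tips <- 1:5

# f(p): number of direct descendants of each node (0 for tips).
n_children <- tabulate(parent[!is.na(parent)], nbins = length(parent))

# Nodes from a given node up to the root (inclusive).
lineage <- function(i) {
  out <- i
  while (!is.na(parent[i])) { i <- parent[i]; out <- c(out, i) }
  out
}

lab <- paste0("t", tips)
A <- matrix(0, 5, 5, dimnames = list(lab, lab))
for (i in tips) for (j in tips) if (i != j) {
  li <- lineage(i); lj <- lineage(j)
  mrca <- li[li %in% lj][1]
  # P_ij: ancestors of i and of j up to and including their common ancestor
  P <- unique(c(li[2:match(mrca, li)], lj[2:match(mrca, lj)]))
  A[i, j] <- 1 / prod(n_children[P])
}
print(round(A, 3))
```

For instance, the path from t1 to t3 crosses nodes 7, 6 and 8 with 2, 2 and 3 direct descendants, giving a_13 = 1/12, whereas the sister tips t1 and t2 only share node 7, giving a_12 = 1/2.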

Demo

Pavoine, S., Ollier, S., Pontier, D. and Chessel, D. 2008. Testing for phylogenetic signal in phenotypic traits: new matrices of phylogenetic proximities. Theoretical Population Biology, 73, 79–91.

Phylogeny and traits Describing the phylogenetic signal

Moran’s index makes it possible to test for phylogenetic autocorrelation

Phylogenetic structure is summarized by a single number

Different stories can lead to the same value

Measuring → Describing

How is the variance of a quantitative trait decomposed along the phylogenetic tree ?

Phylogeny and traits Describing the phylogenetic signal

Phylogeny as an orthonormal basis

Tools to represent the structure of a tree. An orthonormal basis allows a simple and unique decomposition of the variance.

Dummy variables

Moran’s eigenvectors

Phylogeny and traits Describing the phylogenetic signal

Dummy variables

They define partitions of tips reflecting the topology of the tree : each node (except the root) is translated into a dummy variable having one value for each tip (1 if the tip descends from this node and 0 otherwise).

Not an orthonormal basis

Only based on the topology

Ollier, S., Chessel, D. and Couteron, P. 2005. Orthonormal Transform to Decompose the Variance of a Life-History Trait across a Phylogenetic Tree. Biometrics, 62, 471–477.
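The construction of the dummy variables can be sketched in base R as follows. The five-tip tree and its `parent` encoding are hypothetical, and only internal nodes are translated here (tips are left out, an assumption of this sketch).

```r
# Hypothetical tree ((t1,t2),((t3,t4),t5)): nodes 1-5 are tips, 6 is the root,
# 7 = (t1,t2), 8 = ((t3,t4),t5), 9 = (t3,t4).
parent <- c(7, 7, 9, 9, 8, NA, 6, 6, 8)
tips <- 1:5
nodes <- c(7, 8, 9)                      # internal nodes, root excluded

# Nodes from a given node up to the root (inclusive).
lineage <- function(i) {
  out <- i
  while (!is.na(parent[i])) { i <- parent[i]; out <- c(out, i) }
  out
}

# One column per node: 1 if the tip descends from the node, 0 otherwise.
dummy <- sapply(nodes, function(nd)
  sapply(tips, function(tp) as.integer(nd %in% lineage(tp))))
dimnames(dummy) <- list(paste0("t", tips), paste0("node", nodes))
print(dummy)

# Non-zero off-diagonal cross-products: the columns are not orthogonal.
print(crossprod(dummy))
```

The columns for the nested nodes 8 and 9 share two tips, so their cross-product is 2 rather than 0, which illustrates why these variables do not form an orthonormal basis.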

Phylogeny and traits Describing the phylogenetic signal

Moran’s eigenvectors

The eigenvectors (B) of a doubly centred matrix of phylogenetic proximities :

\[ H \left( \tfrac{1}{2} (W^T + W) \right) H \quad \text{where} \quad H = I_n - 1_n 1_n^T / n \]

The n − 1 column-vectors of B (sorted by decreasing eigenvalue) are orthonormal variables ranging from the largest to the lowest possible phylogenetic autocorrelation as measured by Moran’s index.
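The following base-R sketch illustrates these properties on a made-up proximity matrix. It is not the package's implementation. To obtain exactly n − 1 centred eigenvectors it works in an orthonormal basis C of centred vectors (normalized Helmert contrasts, for which C Cᵀ = H), which is equivalent to diagonalizing the doubly centred matrix and discarding the constant vector. The names `D`, `C`, `moran_index` and `MC` are hypothetical.

```r
# Hypothetical distances between n = 5 tips, turned into proximities W = 1/D.
D <- matrix(c(0, 2, 4, 4, 4,
              2, 0, 4, 4, 4,
              4, 4, 0, 1, 3,
              4, 4, 1, 0, 3,
              4, 4, 3, 3, 0), 5, 5)
W <- 1 / D
diag(W) <- 0
n <- nrow(W)
Ws <- (W + t(W)) / 2                     # symmetrized proximities

# Orthonormal basis of the centred vectors; C %*% t(C) equals H.
C <- contr.helmert(n)
C <- sweep(C, 2, sqrt(colSums(C^2)), "/")

e <- eigen(t(C) %*% Ws %*% C, symmetric = TRUE)
B <- C %*% e$vectors                     # n - 1 eigenvectors, decreasing eigenvalues

moran_index <- function(x, W) {
  z <- x - mean(x)
  length(x) * sum(W * outer(z, z)) / (sum(W) * sum(z^2))
}
MC <- apply(B, 2, moran_index, W = W)    # Moran's index of each eigenvector
print(round(cbind(eigenvalue = e$values, MC = MC), 3))
```

In this setting each Moran's index equals the corresponding eigenvalue multiplied by n divided by the sum of the proximities, so the eigenvectors are indeed ordered from the largest to the lowest autocorrelation.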

Phylogeny and traits Describing the phylogenetic signal

Decomposition of a trait on an orthonormal basis

The vector of squared correlations \([\mathrm{cor}^2(x, b_1), \ldots, \mathrm{cor}^2(x, b_{n-1})]\) provides a decomposition of a quantitative trait on the phylogeny.

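[Figure: bar plot of the squared correlations r² (vertical axis, 0.00 to 0.30) for the successive Moran’s eigenvectors (horizontal axis, labelled ME 1, ME 3, ..., ME 15)]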
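The reason this is a decomposition is that the squared correlations of a trait with the n − 1 vectors of an orthonormal, centred basis sum to one. A minimal base-R check follows. The basis used here is a stand-in (normalized Helmert contrasts), not Moran's eigenvectors, and the trait is random, hypothetical data.

```r
set.seed(1)
n <- 8
x <- rnorm(n)                            # hypothetical trait values

# Stand-in orthonormal basis of centred vectors (n - 1 columns).
B <- contr.helmert(n)
B <- sweep(B, 2, sqrt(colSums(B^2)), "/")

r2 <- as.vector(cor(x, B))^2             # squared correlation with each basis vector
print(round(r2, 3))
print(sum(r2))                           # the shares of variance add up to 1
```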

Demo

Phylogeny and traits Describing the phylogenetic signal

Associated tests

The function orthogram provides different statistics for detecting phylogenetic signal :

The maximum squared correlation :

\[ R2Max(x) = \max(r_1^2, \ldots, r_{n-1}^2) \]

The deviation from an ordered uniform distribution (KS) :

\[ Dmax(x) = \max_{1 \le m \le n-1} \left( \sum_{i=1}^{m} r_i^2 - \frac{m}{n-1} \right) \]

The skewness (to the root or to the tips) of the variance decomposition :

\[ SkR2k(x) = \sum_{i=1}^{n-1} i \, r_i^2 \]

The average local variation :

\[ SCE(x) = \sum_{i=2}^{n-1} (r_i^2 - r_{i-1}^2)^2 \]
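The observed values of the four statistics follow directly from the vector of squared correlations. The sketch below is a plain transcription of the formulas, with a hypothetical helper name `orthogram_stats` and a made-up vector r². It does not reproduce the tests performed by the orthogram function.

```r
# Observed statistics computed from a vector of squared correlations r2
# (length n - 1, ordered as the basis vectors).
orthogram_stats <- function(r2) {
  m <- seq_along(r2)
  list(
    R2Max = max(r2),                          # maximum squared correlation
    Dmax  = max(cumsum(r2) - m / length(r2)), # deviation from uniform cumulation
    SkR2k = sum(m * r2),                      # skewness towards root or tips
    SCE   = sum(diff(r2)^2)                   # local variation between neighbours
  )
}

# Made-up decomposition for n - 1 = 3 basis vectors.
s <- orthogram_stats(c(0.5, 0.3, 0.2))
str(s)
```

With this vector the cumulated sums are 0.5, 0.8 and 1, against 1/3, 2/3 and 1 under uniformity, so Dmax = 0.5 − 1/3 = 1/6. The other values are R2Max = 0.5, SkR2k = 1.7 and SCE = 0.05.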

Phylogeny and traits Multivariate data

From univariate to multivariate data

Phylogenetic tools are mainly adapted to univariate data → indirect approach :

summarize multivariate data by PCA

apply phylogenetic analysis on PCA scores

Not optimal as PCA identifies the main resemblances/differences between the individuals but these differences are not constrained by the phylogenetic relatedness
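The indirect approach can be sketched in two steps with base R. Everything here is hypothetical: the traits are random, the proximity matrix simply encodes two clades of three species, and `moran_index` is a helper written for the example. The permutation test is one plausible way to test the scores, not necessarily the one used in the demo.

```r
set.seed(42)
n <- 6
traits <- matrix(rnorm(n * 3), n, 3)     # hypothetical table: 6 species, 3 traits

# Hypothetical proximities: species 1-3 form one clade, species 4-6 another.
clade <- rep(1:2, each = 3)
W <- outer(clade, clade, "==") * 1
diag(W) <- 0

moran_index <- function(x, W) {
  z <- x - mean(x)
  length(x) * sum(W * outer(z, z)) / (sum(W) * sum(z^2))
}

# Step 1: summarize the traits by PCA and keep the scores on the first axis.
pc1 <- prcomp(traits, scale. = TRUE)$x[, 1]

# Step 2: test the scores with a permutation test of Moran's index.
obs <- moran_index(pc1, W)
sim <- replicate(999, moran_index(sample(pc1), W))
p_value <- (sum(sim >= obs) + 1) / (length(sim) + 1)
print(c(observed = obs, p.value = p_value))
```

Note that W plays no role in step 1, which is the limitation pointed out above: the first axis is chosen for its variance only.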

Demo

Phylogeny and traits Multivariate data

From PCA to phylogenetic PCA

pPCA is an extension of PCA that includes the matrix of phylogenetic proximities in the algorithm. It modifies the criterion maximized by the analysis

PCA maximizes

\[ Q(a) = a^T Q^T X^T D X Q a = \mathrm{var}(XQa) \]

pPCA maximizes

\[ Q(a) = a^T Q^T X^T \tfrac{1}{2} (W^T D^T + D W) X Q a = \mathrm{var}(XQa) \cdot MC(XQa) \]

Jombart, T., Pavoine, S., Devillard, S., and Pontier, D. 2010. Putting phylogeny into the analysis of biological traits: A methodological approach. Journal of Theoretical Biology, 264(3), 693–701.
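The pPCA criterion can be verified numerically with a few lines of base R. This is a sketch under stated assumptions, not the package's ppca function: the traits are random, Q = I_p, D = (1/n) I_n, and W is a made-up proximity matrix that is row-normalized (rows sum to 1), which is what makes the product var × MC match the quadratic form in this example.

```r
set.seed(7)
n <- 6; p <- 3
X <- matrix(rnorm(n * p), n, p)          # hypothetical traits
X <- sweep(X, 2, colMeans(X))
X <- sweep(X, 2, sqrt(colMeans(X^2)), "/")
D <- diag(1 / n, n)

# Hypothetical proximities (two clades of three species), row-normalized.
clade <- rep(1:2, each = 3)
W <- outer(clade, clade, "==") * 1
diag(W) <- 0
W <- W / rowSums(W)

# Matrix diagonalized by pPCA: X^T (1/2)(W^T D^T + D W) X.
M <- t(X) %*% (0.5 * (t(W) %*% t(D) + D %*% W)) %*% X
e <- eigen(M, symmetric = TRUE)
a1 <- e$vectors[, 1]
u <- as.vector(X %*% a1)                 # scores on the first axis

moran_index <- function(x, W) {
  z <- x - mean(x)
  length(x) * sum(W * outer(z, z)) / (sum(W) * sum(z^2))
}
var_u <- sum(u^2) / n                    # variance with weights 1/n
mc_u <- moran_index(u, W)
print(c(eigenvalue = e$values[1], var_times_MC = var_u * mc_u))
```

The first eigenvalue should equal the variance of the scores multiplied by their Moran's index, so an axis is favoured when it combines a large variance with a strong autocorrelation.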

Demo

Conclusion

vignette("adephylo")
