Phylogenetics in R

Scott Chamberlain, November 18, 2011

A talk given on 18 November 2011 about doing phylogenetics in R.

What sorts of phylogenetic analyses can I do in R?

The rundown
• Get sequence data
• Align sequence data
• Phylogenetic inference
  – NJ, maximum likelihood, parsimony, Bayesian, UPGMA
• Visualize phylogenies
• Traits on trees
  – Phylogenetic signal
  – Trait evolution
  – Ancestral character state reconstruction
• Tree simulations
• Get trees
• Phylogenetic community structure
• Bonus: polytomy resolver

Basic trees in R

Example

require(ape)
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr1              # print tree summary
write.tree(tr1)  # print tree in Newick format: "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"

tr1$tip.label    # tip labels: "B" "C" "D" "A"

tr1$edge.length  # edge lengths: 0.04 0.01 0.05 0.05 0.06 0.10

tr1$node.label   # node labels: NULL (i.e., there are no node labels)

# Assign properties to trees
tr1$tip.label <- c('sleepy', 'happy', 'grumpy', 'frumpy')  # label tips
tr1$tip.label    # did it work? "sleepy" "happy" "grumpy" "frumpy"

And so on for other tree properties.
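A phylo object is just an R list, which is why the $ accessors above work. As a sketch, assuming ape's usual layout (tips numbered 1 to n, the root numbered n + 1, each row of edge holding a parent-child pair, and edge.length in the same row order), here is tr1 rebuilt by hand in base R, plus a hypothetical helper, root_to_tip(), that sums edge lengths from each tip back to the root:

```r
# Hand-built copy of tr1 (before relabeling); assumes ape's phylo layout:
# tips are 1..4, the root is 5, and each edge row is c(parent, child)
tr <- list(
  edge = matrix(c(5L, 6L,
                  6L, 7L,
                  7L, 1L,
                  7L, 2L,
                  6L, 3L,
                  5L, 4L), ncol = 2, byrow = TRUE),
  edge.length = c(0.04, 0.01, 0.05, 0.05, 0.06, 0.10),
  tip.label = c("B", "C", "D", "A"),
  Nnode = 3L
)
class(tr) <- "phylo"

# Hypothetical helper: sum the edge lengths on the path from each tip to the root
root_to_tip <- function(phy) {
  sapply(seq_along(phy$tip.label), function(tip) {
    total <- 0
    node <- tip
    repeat {
      i <- which(phy$edge[, 2] == node)  # the edge leading into this node
      if (length(i) == 0) break          # no parent edge: we are at the root
      total <- total + phy$edge.length[i]
      node <- phy$edge[i, 1]             # step up to the parent
    }
    total
  })
}

d <- setNames(root_to_tip(tr), tr$tip.label)
d  # every tip is 0.1 from the root, so the tree is ultrametric
```

If ape is loaded, plot(tr) should draw the same tree as plot(tr1).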

Get sequence data
# install and load ape
install.packages("ape"); require(ape)

# get data from GenBank
# make a vector of accession numbers for the ITS 1 and 2 region of Gossypium (cotton) species

cotton_acc <- c("U56806", "U12712", "U56810","U12732", "U12725", "U56786", "U12715","AF057758","U56790", "U12716", "U12729","U56798", "U12727", "U12713", "U12719","U56811", "U12728", "U12730", "U12731","U12722", "U56796", "U12714", "U56789","U56797", "U56801", "U56802", "U12718","U12710", "U56804", "U12734", "U56809","U56812", "AF057753", "U12711", "U12717","U12723", "U12726")

# download the sequences from GenBank (requires an internet connection)
cotton <- read.GenBank(cotton_acc, species.names = TRUE)

# name the sequences with species names instead of accession numbers
names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
names(cotton) <- attr(cotton, "species")

Align sequence data
Run an external aligner, e.g., Clustal or MAFFT

# multiple sequence alignment
### Get ClustalW here, and install it: http://www.clustal.org/

# set your working directory to the folder containing ClustalW2
setwd("/path on your computer to/ClustalW2")

# write a FASTA file to that directory
write.dna(cotton, "cotton.fas", format = "fasta")

# run the Clustal multiple alignment; prints Clustal output to the console
system('"./clustalw2" cotton.fas') # should work on OS X or Windows

# read the alignment back into R
cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")

Manual alignment may have to be done, dare I say it, outside of R.
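Distance methods such as nj(dist.dna(...)) start from pairwise distances between aligned sequences. As a base-R sketch of the two simplest measures (what we understand dist.dna() to compute with model = "raw" and model = "JC69"), here are two hypothetical helpers, p_distance() and jc69(). Note that this sketch skips gapped sites pair by pair, while dist.dna() has its own rules for gaps and ambiguous bases (see its pairwise.deletion argument):

```r
# Hypothetical helper: p-distance, the proportion of compared sites that differ
# between two aligned sequences; sites with a gap ("-") in either one are skipped
p_distance <- function(s1, s2) {
  a <- strsplit(toupper(s1), "")[[1]]
  b <- strsplit(toupper(s2), "")[[1]]
  stopifnot(length(a) == length(b))  # sequences must already be aligned
  keep <- a != "-" & b != "-"
  mean(a[keep] != b[keep])
}

# Hypothetical helper: Jukes-Cantor (JC69) correction for multiple hits,
# d = -3/4 * log(1 - 4p/3)
jc69 <- function(p) -0.75 * log(1 - 4 * p / 3)

s1 <- "ACGTACGTAC"
s2 <- "ACGTACGTAA"
p <- p_distance(s1, s2)
p        # 1 difference out of 10 sites: 0.1
jc69(p)  # a little larger than p, correcting for unseen substitutions
```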

Get and align sequences: DIY

• Get together with a few other people… or not
  – Choose some species to investigate
  – Get their accession numbers on GenBank
  – Download sequence data from GenBank
  – If you are really adventurous, also align the sequences

Phylogenetic inference: tools

R Packages: ape, phangorn, phyclust, phytools, scaleboot

• ape has the most functionality for phylogenetic inference

• You should be able to call MrBayes from R, but I don’t know how; perhaps with the package phyloch?

Phylogenetic inference
• Fitting evolutionary models: see function modelTest in package phangorn

• NJ
install.packages("ape"); require(ape)
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw)

• Maximum likelihood
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
njtree <- NJ(dm)
MLfit <- pml(njtree, Laurasiatherian) # likelihood of the NJ starting tree
MLfit_ <- optim.pml(MLfit, model = "GTR") # optimize edge lengths under a GTR model
MLfit_$tree
plot(MLfit_$tree)

• Parsimony
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
tree <- NJ(dm)
treepars <- optim.parsimony(tree, Laurasiatherian)
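UPGMA is in the rundown but has no example here. It is average-linkage hierarchical clustering, so base R's hclust(method = "average") performs it; phangorn also has an upgma() function, and ape's as.phylo() can turn an hclust result into a phylo object. A minimal sketch on a made-up four-taxon distance matrix:

```r
# UPGMA = average-linkage hierarchical clustering on a distance matrix
taxa <- c("a", "b", "c", "d")
D <- matrix(c( 0,  2,  6, 10,
               2,  0,  6, 10,
               6,  6,  0, 10,
              10, 10, 10,  0),
            nrow = 4, byrow = TRUE, dimnames = list(taxa, taxa))

hc <- hclust(as.dist(D), method = "average")
hc$merge   # which taxa (negative) or earlier clusters (positive) join at each step
hc$height  # distance at each merge: 2, 6, 10
# in the ultrametric UPGMA tree, each internal node sits at half its merge distance
hc$height / 2
```

The second merge happens at 6 because the average distance from c to the cluster {a, b} is (6 + 6) / 2.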

Phylogenetic inference (continued)
• Bayesian

– You can (maybe) do this by calling MrBayes from R with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html)…

– …however, MrBayes is giving way to RevBayes (http://sourceforge.net/projects/revbayes/), FYI

Phylogenetic inference: DIY

• With your partners… or not
  – Use the sequence data you got from GenBank earlier (if you didn’t align the sequences, don’t worry about it), OR use a data set provided with ape or another package
  – Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)

Visualize phylogenies

R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo

# visualize phylogenies
install.packages("ape")
require(ape)
tree <- rcoal(10)
tree
plot(tree)
plot(tree, type = "cladogram")
plot(tree, type = "unrooted")
plot(tree, type = "radial")
plot(tree, type = "fan")

Visualize phylogenies: DIY

• Get together with a few other people… or not
  – Use the tree you made, or use one provided with ape or other packages
  – Do basic plotting, e.g.: plot(mytree)
  – Then see if you can
    • color the branches
    • label the branches with the edge lengths
    • change the tip labels
    • etc.

Traits on trees: phylogenetic signal

R Packages: ape, picante, caper, phytools
Examples from picante and phytools:
# phylogenetic signal
install.packages("picante")
require(picante)
randtree <- rcoal(20)
randtraits <- rTraitCont(randtree)
Kcalc(randtraits[randtree$tip.label], randtree)

install.packages("phytools")
require(phytools)
tree <- rbdtree(1, 0, Tmax = 4) # make a tree
x <- fastBM(tree) # simulate traits
phylosig(tree, x, method = "lambda", test = TRUE) # calculate phylogenetic signal, lambda
phylosig(tree, x, method = "K", test = TRUE) # calculate phylogenetic signal, K
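Kcalc() and phylosig(method = "K") both report Blomberg's K: the ratio of a trait's raw to its phylogenetically corrected mean squared error, divided by the value of that ratio expected under Brownian motion. So K = 1 matches BM, and K > 1 means close relatives are more similar than BM predicts. Below is a base-R sketch of the point estimate only (no significance test), written from the standard formula rather than copied from either package. blomberg_k() is hypothetical, and the covariance matrix C (what ape's vcv() would give you) is hand-coded for a made-up tree:

```r
# Hypothetical helper: Blomberg's K from trait values x and the phylogenetic
# covariance matrix C (C[i, j] = shared branch length from root to MRCA of i and j)
blomberg_k <- function(x, C) {
  n <- length(x)
  invC <- solve(C)
  one <- rep(1, n)
  # GLS estimate of the root (phylogenetic mean) state
  a <- as.numeric((t(one) %*% invC %*% x) / (t(one) %*% invC %*% one))
  e <- x - a
  observed <- as.numeric((t(e) %*% e) / (t(e) %*% invC %*% e))
  expected <- (sum(diag(C)) - n / sum(invC)) / (n - 1)
  observed / expected
}

# Made-up tree ((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5): every tip is 1 from the root
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0), nrow = 4, byrow = TRUE)
x <- c(1, 1, -1, -1)  # sister species share trait values
blomberg_k(x, C)      # 1.8: more signal than expected under BM
```

On a star tree (C proportional to the identity matrix) this formula always gives K = 1, whatever the trait values, since there is no phylogenetic structure to detect.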

Traits on trees: modeling trait evolution

R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot

Between them, the packages above can model the evolution of both discrete and continuous traits, under Brownian motion (BM) or Ornstein-Uhlenbeck (OU) models.

See also:
• Rbrownie
• Various in-development evolutionary modeling frameworks to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa, here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
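The simplest of these models, Brownian motion, says the tip values are multivariate normal with mean equal to the root state and covariance sigma^2 * C, where C holds the shared branch lengths. Under that assumption the ML estimates have closed forms: the root state is the GLS mean, and sigma^2 is the C-weighted residual sum of squares divided by n. A base-R sketch with a hypothetical fit_bm() and a hand-coded C for a made-up tree; the packages above handle this (and OU) for you, and some may report REML rather than ML rate estimates, so their numbers can differ:

```r
# Hypothetical helper: ML fit of Brownian motion, x ~ MVN(root * 1, sigma2 * C)
fit_bm <- function(x, C) {
  n <- length(x)
  invC <- solve(C)
  one <- rep(1, n)
  # ML (= GLS) estimate of the root state
  root <- as.numeric((t(one) %*% invC %*% x) / (t(one) %*% invC %*% one))
  e <- x - root
  sigma2 <- as.numeric(t(e) %*% invC %*% e) / n  # ML (not REML) rate estimate
  # multivariate normal log-likelihood at the ML estimates; the quadratic term
  # e' (sigma2 C)^-1 e equals n at the optimum
  logdetC <- as.numeric(determinant(C, logarithm = TRUE)$modulus)
  loglik <- -0.5 * (n * log(2 * pi) + n * log(sigma2) + logdetC + n)
  list(root = root, sigma2 = sigma2, loglik = loglik)
}

# Made-up tree ((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5)
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0), nrow = 4, byrow = TRUE)
x <- c(1, 1, -1, -1)
fit <- fit_bm(x, C)
fit$root    # essentially 0
fit$sigma2  # 2/3
fit$loglik
```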

Ancestral state reconstruction

R Packages: ape, ouch, phytools

Function ‘ace’ in the ape package works nicely, but it is very sensitive to parameters

Example
require(ape)
data(bird.orders)
x <- rnorm(23) # one random continuous trait value per tip (bird.orders has 23 tips)
out <- ace(x, bird.orders)

out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
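For a continuous trait under Brownian motion, the joint ML ancestral states are the internal-node values that minimize the sum, over all edges, of (child value - parent value)^2 / branch length, i.e., weighted squared-change parsimony. Setting the gradient to zero gives a linear system in the internal-node values. Here is a base-R sketch with a hypothetical bm_ancestral(), assuming ape's numbering convention (tips 1 to n, internal nodes after that). We would expect ace(x, tree, method = "ML") to give similar values, but have not checked that it matches exactly:

```r
# Hypothetical helper: joint ML ancestral states for a continuous trait under BM
bm_ancestral <- function(edge, edge.length, tip_values) {
  n_tips <- length(tip_values)
  n_nodes <- max(edge)
  # weighted graph Laplacian, with weight 1 / branch length on each edge
  L <- matrix(0, n_nodes, n_nodes)
  for (k in seq_len(nrow(edge))) {
    i <- edge[k, 1]
    j <- edge[k, 2]
    w <- 1 / edge.length[k]
    L[i, i] <- L[i, i] + w
    L[j, j] <- L[j, j] + w
    L[i, j] <- L[i, j] - w
    L[j, i] <- L[j, i] - w
  }
  tips <- seq_len(n_tips)
  internal <- (n_tips + 1):n_nodes
  # zero gradient at internal nodes:
  # L[internal, internal] %*% x_internal = -L[internal, tips] %*% x_tips
  est <- solve(L[internal, internal, drop = FALSE],
               -L[internal, tips, drop = FALSE] %*% tip_values)
  setNames(est[, 1], internal)
}

# Made-up tree ((A:1,B:1):1,C:2): tips 1-3, root = node 4, node 5 = MRCA of A and B
edge <- matrix(c(4L, 5L,
                 5L, 1L,
                 5L, 2L,
                 4L, 3L), ncol = 2, byrow = TRUE)
edge.length <- c(1, 1, 1, 2)
anc <- bm_ancestral(edge, edge.length, tip_values = c(A = 0, B = 2, C = 6))
anc  # node 4 (root) = 22/7, node 5 = 12/7
```

Here the root value, 22/7, is also the GLS estimate of the root state for these tip values.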

Tree simulations
R Packages: TreeSim, geiger, ape, phybase
Example
require(ape)
tree <- rcoal(10) # make a random tree
trait <- rTraitCont(tree, model = "BM") # simulate a trait on that tree

# Write a function to make a tree, simulate a BM trait, and take the mean of that trait
myfunc <- function(n) {
  tree <- rcoal(n)
  trait <- rTraitCont(tree, model = "BM")
  mean(trait)
}

# do it 100 times and make the data.frame that ggplot2 needs for plotting
dat <- replicate(100, myfunc(10))
dat2 <- data.frame(dat)

# plot results
require(ggplot2)
ggplot(dat2, aes(dat)) + geom_histogram()
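rcoal() draws random trees from the coalescent. As a base-R sketch of the timing part only, assuming the standard Kingman coalescent, in which the wait until the next merger among k lineages is exponential with rate choose(k, 2) (ape's time scaling may differ), we can replicate() the time to the most recent common ancestor (TMRCA), in the same spirit as myfunc(), and compare its mean with the theoretical value 2 * (1 - 1/n):

```r
# Hypothetical helpers: coalescent waiting times for n lineages, going back in
# time from n lineages down to 2; with k lineages the rate is choose(k, 2)
coalescent_times <- function(n) {
  k <- n:2
  rexp(length(k), rate = choose(k, 2))
}
tmrca <- function(n) sum(coalescent_times(n))

set.seed(42)
sims <- replicate(20000, tmrca(10))
mean(sims)        # should be close to the theoretical expectation below
2 * (1 - 1 / 10)  # E[TMRCA] for n = 10 is 1.8
```

The expectation follows from summing the mean waiting times, 1 / choose(k, 2) = 2 / (k(k - 1)), over k = 2..n, which telescopes to 2 * (1 - 1/n).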

Get trees

rOpenSci’s treeBASE package, on CRAN: http://cran.r-project.org/web/packages/treebase/

install.packages("treebase") # install
require(treebase) # load
tree <- search_treebase("Derryberry", "author")[[1]] # search
metadata(tree$S.id) # metadata for tree
plot(tree) # plot the tree

Phylogenetic community structure

R Packages: picante (includes phylocom functionality, although not bladj for some reason; talk to me if you want to run bladj from R)

Example

Function ‘comdistnt’ calculates the inter-community mean nearest taxon distance

require(picante)
data(phylocom)
comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)

Also, a new approach to phylogenetic community structure in R (PGLMMs) from Matt Helmus; code here: http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
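comdistnt() works on a whole community-by-species matrix and supports abundance weighting. The sketch below covers only the unweighted case for a single pair of communities and follows our reading of the definition: for every species in each community, take the distance to its closest relative in the other community, then average over all of those species. inter_mntd() is hypothetical, so check it against comdistnt() before relying on it:

```r
# Hypothetical helper: unweighted inter-community mean nearest taxon distance
inter_mntd <- function(pd, comm_a, comm_b) {
  d_ab <- pd[comm_a, comm_b, drop = FALSE]
  nearest <- c(apply(d_ab, 1, min),  # each species in A to its nearest in B
               apply(d_ab, 2, min))  # each species in B to its nearest in A
  mean(nearest)
}

# Cophenetic distances for the made-up tree ((s1:1,s2:1):1,(s3:1,s4:1):1)
sp <- c("s1", "s2", "s3", "s4")
pd <- matrix(c(0, 2, 4, 4,
               2, 0, 4, 4,
               4, 4, 0, 2,
               4, 4, 2, 0), nrow = 4, byrow = TRUE, dimnames = list(sp, sp))

inter_mntd(pd, c("s1", "s2"), c("s3", "s4"))  # 4: the two clades share no close relatives
inter_mntd(pd, c("s1", "s2"), c("s2", "s3"))  # 1.5: s2 occurs in both communities
```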

Bonus: Polytomy resolver

MEE paper: “A simple polytomy resolver for dated phylogenies” by Kuhn, Mooers, and Thomas

– Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract

– Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo

Resources
• Bodega Phylogenetics Wiki:
  – Home: http://bodegaphylo.wikispot.org/Front_Page
  – BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
  – Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
• R phylo-wiki (from NESCent): http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
• CRAN task view, Phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html
• rmesquite: https://r-forge.r-project.org/R/?group_id=213
• R phylogenetics mailing list (R-sig-phylo): https://stat.ethz.ch/mailman/options/r-sig-phylo/