Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe...

High-throughput comparative genomics 24th October 2013 Joe Parker, Queen Mary University London

description

Invited research seminar given to MSc students at University College Dublin on 24th October 2013. I introduce the discipline of phylogenomics - comparative phylogenetic analyses of DNA sequences across genomes - and some of the applications and recent breakthroughs in the field. As an in-depth case study I explain the methods and significance of our 2013 Nature paper on adaptive genotypic molecular convergence in echolocating mammals. I then highlight some of the avenues of study on the frontiers of current research.

Transcript of Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe...

Page 1: Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe Parker - 24th October 2013

High-throughput comparative genomics

24th October 2013

Joe Parker, Queen Mary University London

Page 2:

Topics

1. Introduction
2. Background: why phylogenomics?
3. Examples
4. Practice
5. Case study
6. On the horizon
7. Over the horizon

Page 3:

Aims

• Context of phylogenomics: Next-generation sequencing (NGS)
• Why phylogenomics?
• Practical analyses
• Future developments

Page 4:

1. Our Research

Page 5:

Lab Interests

• Ecology and evolution of traits
• Echolocation, sociality
• NGS data for population genetics and phylogenomics

Page 6:

Activities

• Phylogeny estimation/comparison
• Molecular correlates of evolution:
– site substitutions, dN/dS, composition
• Simulation
• Dataset limitations

(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey

Page 7:

2. Background

Page 8:

Next-generation sequencing

Page 9:

Why phylogenomics, not -genetics?

• Causes of discordant signal
– Incomplete lineage sorting
– Lateral transfer
– Recombination
– Introgression

Page 10:

Quantitative biology

• Multiple configurations
• Hyperparameters empirically investigated
• Determine sensitivity of results
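
The sensitivity check on this slide can be sketched in a few lines of Python. This is only an illustration and not the pipeline used in the talk: the filter settings `min_length` and `max_missing`, the toy locus table `LOCI`, and the function names are all assumptions invented for the example.

```python
from itertools import product

# Hypothetical loci: (alignment length in codons, proportion of missing data).
LOCI = [(300, 0.05), (450, 0.20), (1200, 0.10), (900, 0.40)]


def loci_retained(min_length, max_missing):
    """Toy 'analysis': count loci passing two hypothetical filters."""
    return sum(1 for length, missing in LOCI
               if length >= min_length and missing <= max_missing)


def sweep(analysis, grid):
    """Run analysis(**config) for every combination of settings in grid."""
    names = sorted(grid)
    results = {}
    for combo in product(*(grid[name] for name in names)):
        results[combo] = analysis(**dict(zip(names, combo)))
    return names, results


grid = {"min_length": [200, 500], "max_missing": [0.15, 0.5]}
names, results = sweep(loci_retained, grid)
for combo, value in sorted(results.items()):
    print(dict(zip(names, combo)), "->", value)

# A crude sensitivity summary: how far the result moves across configurations.
spread = max(results.values()) - min(results.values())
```

Keeping every configuration's result, rather than only the preferred one, is what lets the spread be reported alongside the headline figure.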

Page 11:

Distributions

• Genome-scale data provides context
• Identify outliers: genes / taxa / trees
• Compare values across biological systems
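
One simple way to identify outliers against a genome-wide distribution is Tukey's interquartile-range rule, sketched below. The rule is swapped in here purely for illustration (the slide does not say which criterion was used), and the gene names and values are invented.

```python
from statistics import quantiles


def flag_outliers(scores, k=1.5):
    """Return names whose score lies outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    """
    q1, _median, q3 = quantiles(scores.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, value in scores.items()
                  if value < low or value > high)


# Hypothetical per-gene statistic (for example dN/dS); values are invented.
per_gene = {"geneA": 0.10, "geneB": 0.12, "geneC": 0.11,
            "geneD": 0.09, "geneE": 0.13, "geneF": 0.95}
outliers = flag_outliers(per_gene)
print(outliers)
```

The same function works for taxa or trees, since it only needs a mapping from a name to a number.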

Page 12:

Integration with ‘Omics

• Multiple databases
• Functional data
• Bibliographic information

Page 13:

3. Example studies

Page 14:

Tsagkogeorga et al. (in press)

Page 15:

Salichos & Rokas (2013)

Page 16:

Backström et al. (2013)

Page 17:

Lindblad-Toh et al. (2011)

Page 18:

4. Practice

Page 19:

Source material

• Samples
• Storage
• Purification
• Library prep

Page 20:

Sequencing

• Genome
– Sanger
– Illumina
– Pyro / 454
– SOLiD
– PacBio
• Transcriptome / RNA-seq
– MyBAITS
• HiSeq / MiSeq
• IonTorrent

Page 21:

Infrastructure

• Desktop machines
• Computing clusters
• Grid systems
• Cloud-based computation

Page 22:

Assembly, Annotation

• Assembly
– To reference (mapping)
– De novo
• Annotation
– By homology
– De novo

• SOAPdenovo
• MAKER
• Velvet
• Bowtie / Cufflinks / Tophat
• Trinity

Page 23:

Alignment

• PRANK
• MUSCLE
• MAFFT
• Clustal

Page 24:

Phylogeny inference

• MrBayes
• RAxML
• BEAST
• MP-EST
• STAR

Page 25:

Phylogenetic analysis

• BEAST
• HYPHY
• PAML
• Pipelines
• LRT
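
The last item, the likelihood ratio test (LRT), compares the log-likelihoods of two nested models. A minimal sketch follows, assuming the models differ by exactly one free parameter so that the statistic is referred to a chi-square distribution with one degree of freedom. The function name and the log-likelihood values are invented for illustration, and the appropriate null distribution depends on the models actually being compared.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """LRT for nested models that differ by exactly one free parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    from a chi-square distribution with one degree of freedom.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log-likelihoods, purely for illustration.
statistic, p_value = likelihood_ratio_test(lnl_null=-1234.5, lnl_alt=-1230.0)
print(f"2*delta(lnL) = {statistic:.2f}, p = {p_value:.4f}")
```

When thousands of loci are tested this way, the resulting p-values would normally also need a multiple-testing correction.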

Page 26:

5. Case study

Page 27:

Parker et al. (2013)

• De novo genomes:
– four taxa
– 2,321 protein-coding loci
– 801,301 codons
• Published:
– 18 genomes
• ~69,000 simulated datasets
• ~3,500 cluster cores

Page 28:

Our pipeline for detecting genome-wide convergence

Pages 29-35:

Page 36:

mean = 0.05

Page 37:

mean = 0.05; mean = -0.01; mean = -0.08

Page 38:

Development cycle

• Design
• Wireframe & specify tests
• Implement
– Alignment: loadSequences(), getSubstitutions()
– Phylogeny: trimTaxa(), getMRCA()
– DataSeries: calculateECDF(), randomise()
– Regression: getResiduals(), predictInterval()
• Review, refine & refactor
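
As an illustration of specifying tests alongside the implementation, here is a minimal Python sketch of the DataSeries class named on the slide. The slide gives only the method names; the snake_case spellings, signatures and behaviour below are assumptions, not the original code.

```python
import bisect
import random


class DataSeries:
    """A minimal stand-in for the DataSeries class named on the slide."""

    def __init__(self, values):
        self.values = sorted(values)

    def calculate_ecdf(self, x):
        """Empirical CDF: proportion of observations less than or equal to x."""
        return bisect.bisect_right(self.values, x) / len(self.values)

    def randomise(self, seed=None):
        """Return the observations in random order, leaving the series unchanged."""
        shuffled = list(self.values)
        random.Random(seed).shuffle(shuffled)
        return shuffled


# Tests specified up front, in the spirit of the development cycle.
series = DataSeries([3, 1, 2, 4])
assert series.calculate_ecdf(0) == 0.0
assert series.calculate_ecdf(2) == 0.5
assert series.calculate_ecdf(4) == 1.0
assert sorted(series.randomise(seed=1)) == [1, 2, 3, 4]
```

Returning a shuffled copy rather than shuffling in place keeps the sorted values intact, which the ECDF lookup relies on.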

Page 39:

Parker et al. (2013)

Page 40:

Parker et al. (2013)

Page 41:

6. On the horizon

Page 42:

Environmental metagenomics

Page 43:

Models of computation

• Cloud resources: unlimited flexibility, finite time
• Development trade-off
– Off-the-shelf
– Bespoke
• Exploratory work
– Real-time genomic transects?
• Essential fundamental data missing from nearly every system:
– Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer

Page 44:

Serialisation

• Process data remotely
• Freeze-dry objects, download to desktop
• Implement new methods directly on previously-analysed data
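
The "freeze-dry" idea can be sketched with Python's standard `pickle` module. This is an assumption-laden illustration rather than the serialisation mechanism used in the talk: the helper names `freeze` and `thaw`, the file name, and the stored result are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def freeze(obj, path):
    """Serialise an analysis object to disk so it can be copied elsewhere."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Reload a previously frozen object. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical result of a remote run; the keys and values are invented.
result = {"locus": "gene0001", "site_lnl": [-3.2, -1.7, -4.4]}

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "gene0001.pkl"
    freeze(result, archive)
    restored = thaw(archive)

# A new summary applied to the stored data without re-running the analysis.
total_lnl = sum(restored["site_lnl"])
print(restored["locus"], total_lnl)
```

Because the stored object carries the intermediate values, a method written later can be run on it directly on a desktop, which is the point of the third bullet.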

Page 45:

7. Over the horizon

• Real-time phylogenetics
• Field phylogenetics
• Alignment-free analyses

Page 46:

Conclusions

• Why phylogenomics?
• Practice
• Comparative approach
• Statistical context

Page 47:

Thanks

Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹

¹School of Biological and Chemical Sciences, Queen Mary, University of London
²Wellcome Trust Sanger Institute
³Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan

Chris Walker & Dan Traynor: Queen Mary GridPP High-throughput Cluster
Chaz Mein & Anna Terry: Barts and The London Genome Centre
Mahesh Pancholi: School of Biological and Chemical Sciences

BBSRC (UK); Queen Mary, University of London

Page 48:

Resources

• My email: Joe Parker (Queen Mary University of London): [email protected]

• Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231 doi:10.1038/nature12511.

• Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol., in the press.

• Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331 doi:10.1038/nature12130

• Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033

• Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476–482 doi:10.1038/nature10530

• Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340 doi:10.1016/j.tree.2009.01.009

• The Tree Of Life: http://phylogenomics.blogspot.co.uk/

• RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html

• Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/

• OpenHelix: http://blog.openhelix.eu/

• Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)