Survey of softwares for phylogenetic analysis

ag1805x

Page 1: Survey of softwares for phylogenetic analysis

Survey of softwares for phylogenetic analysis

ag1805x

Page 2: Survey of softwares for phylogenetic analysis

Phylogenetics

Phylogenetics, in biology, is the study of the evolutionary history and relationships among individuals or groups of organisms (e.g. species or populations). These relationships are discovered through phylogenetic inference methods that evaluate observed heritable traits, such as DNA sequences or morphology, under a model of evolution of these traits. The result of these analyses is a phylogeny (also known as a phylogenetic tree): a hypothesis about the history of evolutionary relationships.

Page 3: Survey of softwares for phylogenetic analysis

Phylogenetic tree

A phylogenetic tree, also known as an "evolutionary tree", is the graphical representation of the evolutionary relationships among the taxa or genes in question.

A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

Page 4: Survey of softwares for phylogenetic analysis

The cladogram is a dendrogram that shows only the genealogy (branching order) of the taxa; it says nothing about branch lengths or the time of divergence.

Page 5: Survey of softwares for phylogenetic analysis

The phylogram (additive tree) is a phylogenetic tree whose branch lengths explicitly represent the number of character changes (nucleotide or amino acid substitutions). In a phylogram, the evolutionary distance between any two taxa is given by the sum of the lengths of the branches connecting them. These trees may be rooted or unrooted, but they often lack a root.
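The additive property can be illustrated with a small sketch. The tree below, including its node names and branch lengths, is made up for illustration; the function simply sums branch lengths along the unique path between two taxa.

```python
from collections import defaultdict

def build_tree(edges):
    """Return an adjacency map {node: {neighbor: branch_length}}."""
    adj = defaultdict(dict)
    for a, b, length in edges:
        adj[a][b] = length
        adj[b][a] = length
    return adj

def patristic_distance(adj, start, goal):
    """Sum branch lengths along the unique path between two nodes of a tree."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in adj[node].items():
            if nxt != parent:  # never walk back along the branch we came from
                stack.append((nxt, node, dist + length))
    raise ValueError(f"no path from {start} to {goal}")

# Hypothetical unrooted phylogram: taxa A-D, internal nodes n1 and n2.
edges = [("A", "n1", 0.1), ("B", "n1", 0.2), ("n1", "n2", 0.05),
         ("C", "n2", 0.3), ("D", "n2", 0.15)]
tree = build_tree(edges)
d_ac = patristic_distance(tree, "A", "C")  # 0.1 + 0.05 + 0.3
print(round(d_ac, 3))
```

Because a tree has exactly one path between any two nodes, a simple depth-first walk is enough; no shortest-path algorithm is needed.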

Page 6: Survey of softwares for phylogenetic analysis

A chronogram (ultrametric tree) is a rooted phylogenetic tree that has all the characteristics of an additive tree and, in addition, assumes a molecular clock, which makes it possible to estimate the molecular divergence time between taxa. The molecular clock hypothesis assumes that every site in a protein or coding nucleotide sequence evolves at a constant rate in all species. As a result, all taxa in a chronogram are placed equidistant from the common ancestor, which is not the case in a phylogram.
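The "equidistant from the ancestor" property can be checked mechanically. The sketch below assumes a rooted tree stored as a dictionary from each internal node to its (child, branch length) pairs; both example trees and their branch lengths are invented for illustration.

```python
import math

def root_to_tip_distances(children, root):
    """Return {tip: distance from root}.

    `children` maps each internal node to a list of (child, branch_length);
    nodes that do not appear as keys are treated as tips.
    """
    distances = {}
    stack = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        kids = children.get(node, [])
        if not kids:
            distances[node] = dist
        for child, length in kids:
            stack.append((child, dist + length))
    return distances

def is_ultrametric(children, root, tol=1e-9):
    """True if every tip lies at the same distance from the root."""
    tips = list(root_to_tip_distances(children, root).values())
    return all(math.isclose(d, tips[0], abs_tol=tol) for d in tips)

# Hypothetical trees: the first is clock-like, the second is not.
chronogram = {"root": [("n1", 2.0), ("C", 5.0)], "n1": [("A", 3.0), ("B", 3.0)]}
phylogram = {"root": [("n1", 2.0), ("C", 1.0)], "n1": [("A", 0.5), ("B", 3.0)]}
print(is_ultrametric(chronogram, "root"), is_ultrametric(phylogram, "root"))
```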

Page 7: Survey of softwares for phylogenetic analysis

Phenetics (taximetrics) infers the relationships among taxa, usually using morphology or other observable traits as phylogenetically informative markers.

Page 8: Survey of softwares for phylogenetic analysis

A tree that shows the evolution of genes is known as a gene tree, while a tree that shows the evolution of species is known as a species tree.

Page 9: Survey of softwares for phylogenetic analysis

Process of construction of the phylogenetic tree

The whole process of constructing a phylogenetic tree is divided into five steps:
Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignment
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree

Page 10: Survey of softwares for phylogenetic analysis

Step 1: Choosing appropriate markers for the phylogenetic analysis

Any biological information that can be used to infer the evolutionary relationships among taxa is known as a phylogenetic information marker. It can be DNA, RNA, protein, RFLP, AFLP, ISSR, allozymes, conserved intronic positions, and so on. Identification of conserved genetic loci is the first step in analyzing phylogenetic relationships; both coding (gene) and non-coding regions can be used.

Page 11: Survey of softwares for phylogenetic analysis

However, the selected sequence(s) must satisfy the following rules:

(a) The sequence should have a long evolutionary history of conservation. This preserves the record of long episodes of evolution and selection, and it makes the target sequence easier to amplify from distant taxa.

(b) Conserved, slowly evolving genes may be used to resolve the evolutionary relationships between distantly related species, while fast-evolving genes should be chosen for recently diverged species or for intra-species comparisons.

(c) Amino acid sequences are more informative for inferring relationships among distantly related taxa; conversely, nucleotide sequences are more informative for recently evolved or closely related species.

Page 12: Survey of softwares for phylogenetic analysis

(d) The sequences to be used in the phylogenetic analysis should be tested for their usability in the given lineage. For instance, mitochondrial genes (cytochrome c oxidase subunits I and II, CoxI and CoxII), chloroplast regions (trnH-psbA, matK, rpoC, rpoB, rbcL), and the 16S ribosomal RNA gene are the preferred conserved markers for animal, plant, and microbial species, respectively; these are called "barcode genes".

(e) Finally, if the objective is to estimate divergence times between taxa, the selected gene or protein sequences should follow the molecular clock hypothesis, although relaxed molecular clock models have also been proposed more recently. Marker selection is followed by polymerase chain reaction (PCR) amplification of the target gene, sequencing, and editing of the sequences for further analysis.

Page 13: Survey of softwares for phylogenetic analysis

Step 2: Multiple sequence alignment

● The main aim of multiple sequence alignment is to compare three or more nucleotide or protein sequences and to provide the basis for calculating the sequence divergence used to infer the evolutionary relationships among the taxa.

● Different models (discussed below) have been proposed, based on various assumptions, to calculate the divergence between sequences or taxa. A correct sequence alignment is therefore essential for obtaining a phylogeny that truly represents the evolutionary relationships among the taxa.
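As a minimal sketch of how an alignment feeds a divergence calculation, the code below computes the uncorrected proportion of differing sites (the p-distance) between aligned sequences. The three short sequences are made-up fragments, and skipping gapped columns is one common convention rather than the only one.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def distance_matrix(alignment):
    """Pairwise p-distances for an alignment given as {name: aligned_sequence}."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}

# Hypothetical aligned fragments (not real data).
toy = {"s1": "MFADRW-LFS", "s2": "MFINRW-LFS", "s3": "MTITRWFFFS"}
matrix = distance_matrix(toy)
print(round(matrix[("s1", "s2")], 3), round(matrix[("s1", "s3")], 3))
```

A misaligned column changes which residues are compared, so any error in the alignment propagates directly into these distances.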

Page 14: Survey of softwares for phylogenetic analysis

Step 3: Selection of an evolutionary model

Evolutionary models are sets of assumptions about the process of nucleotide or amino-acid substitution. They describe the different probabilities of change from one nucleotide or amino acid to another, with the aim of correcting for unseen changes along the phylogeny.
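To make "correcting for unseen changes" concrete, here is a sketch using two of the simplest models: the Jukes-Cantor (JC69) correction for nucleotides and the Poisson correction for amino acids. These models are chosen here only as examples; the slides do not say which model was used. Both take an observed proportion of differing sites p and return an estimated number of substitutions per site.

```python
import math

def jukes_cantor_distance(p):
    """JC69 distance: d = -(3/4) * ln(1 - 4p/3), for nucleotide sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def poisson_distance(p):
    """Poisson-corrected distance: d = -ln(1 - p), for amino-acid sequences."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)

for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 4), round(poisson_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p grows, because more of the substitutions are hidden by repeated changes at the same site.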

Page 15: Survey of softwares for phylogenetic analysis

Step 4: Phylogenetic reconstruction

Presently available programs use two different methodologies to generate dendrograms:
(a) Clustering methods, in which the two most closely related taxa are placed under a single internode, and a third taxon is then added by treating the taxa within that internode as a single group. The program progressively adds the remaining taxa in this way to yield the final phylogenetic tree.
(b) Methods that generate many candidate trees, a number that grows with the number of taxa in the analysis, and then select the best-fitting tree topology (highest likelihood or probability) under a given evolutionary model.
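The clustering approach in (a) can be sketched with UPGMA, one well-known clustering method, picked here as an example; the slides do not name a specific algorithm. The taxa and distances below are invented, and the function returns only the topology as nested tuples, without branch lengths.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between taxa a and b.
    """
    # Each cluster is (topology, tuple of member taxa).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with the smallest average pairwise distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Hypothetical distances between four taxa.
taxa = ["A", "B", "C", "D"]
d = {frozenset(p): v for p, v in {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}.items()}
topology = upgma(taxa, d)
print(topology)
```

A and B are joined first, then C is added to that group, and D joins last, which mirrors the progressive addition described in (a).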

Page 16: Survey of softwares for phylogenetic analysis

Step 5: Evaluating the phylogenetic tree

The reliability of the tree can be evaluated with two methods: the bootstrap method and the interior-branch test.
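The bootstrap idea can be sketched as follows: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and the support for a clade is the fraction of replicates in which it appears. In this sketch `infer_clades` is a hypothetical callback standing in for any tree-building method, and `closest_pair` is a deliberately crude toy stand-in, not a real inference method.

```python
import random
from collections import Counter
from itertools import combinations

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, infer_clades, replicates=100, seed=0):
    """Fraction of replicates in which each clade (a frozenset of taxa) appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(infer_clades(bootstrap_alignment(alignment, rng)))
    return {clade: n / replicates for clade, n in counts.items()}

def closest_pair(alignment):
    """Toy stand-in for tree inference: report the least-different pair as a clade."""
    def mismatches(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))
    best = min(combinations(sorted(alignment), 2), key=mismatches)
    return [frozenset(best)]

# Hypothetical alignment: A and B are identical, C differs at every site.
toy = {"A": "ACGTACGT", "B": "ACGTACGT", "C": "TGCATGCA"}
support = bootstrap_support(toy, closest_pair, replicates=50, seed=1)
print(support)
```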

Page 17: Survey of softwares for phylogenetic analysis

Practicals

Page 18: Survey of softwares for phylogenetic analysis

Phylogenetic information marker: Cytochrome c oxidase subunit I (CoxI)

Organisms:
Homo sapiens (Human)
Bos taurus (Bovine)
Danio rerio (Zebrafish)
Sus scrofa (Pig)
Ovis aries (Sheep)

Software used: Clustal Omega (MSA) and ClustalW2 Phylogeny

Page 19: Survey of softwares for phylogenetic analysis

Downloading the COX1 sequence from UniProt

Page 20: Survey of softwares for phylogenetic analysis

>sp|P00395|COX1_HUMAN Cytochrome c oxidase subunit 1 OS=Homo sapiens GN=MT-CO1 PE=1 SV=1
MFADRWLFSTNHKDIGTLYLLFGAWAGVLGTALSLLIRAELGQPGNLLGNDHIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSLLLLLASAMVEAGAGTGWTVYPPLAGNYSHPGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMTQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGSNMKWSAAVLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFIHWFPLFSGYTLDQTYAKIHFTIMFIGVNLTFFPQHFLGLSGMPRRYSDYPDAYTTWNILSSVGSFISLTAVMLMIFMIWEAFASKRKVLMVEEPSMNLEWLYGCPPPYHTFEEPVYMKS

Page 21: Survey of softwares for phylogenetic analysis

>sp|P00396|COX1_BOVIN Cytochrome c oxidase subunit 1 OS=Bos taurus GN=MT-CO1 PE=1 SV=1
MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVMITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMMWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFVHWFPLFSGYTLNDTWAKIHFAIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTMWNTISSMGSFISLTAVMLMVFIIWEAFASKREVLTVDLTTTNLEWLNGCPPPYHTFEEPTYVNLK

Page 22: Survey of softwares for phylogenetic analysis

>sp|Q9MIY8|COX1_DANRE Cytochrome c oxidase subunit 1 OS=Danio rerio GN=mt-co1 PE=2 SV=1
MTITRWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGALLGDDQIYNVIVTAHAFVMIFFMVMPILIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSGVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTTINMKPPTISQYQTPLFVWAVLVTAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGIISHVVAYYAGKKEPFGYMGMVWAMMAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGAIKWETPMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMAGFVHWFPLFTGYTLNSVWTKIHFGVMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYALWNTVSSIGSLISLVAVIMFLFILWEAFTAKREVLSVELTATNVEWLHGCPPPYHTFEEPAFVQIQSN

Page 23: Survey of softwares for phylogenetic analysis

>sp|O79876|COX1_PIG Cytochrome c oxidase subunit 1 OS=Sus scrofa GN=MT-CO1 PE=3 SV=2
MFVNRWLYSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFVHWFPLFSGYTLNQAWAKIHFVIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTAWNTISSMGSFISLTAVMLMIFIIWEAFASKREVSAVELTSTNLEWLHGCPPPYHTFEEPTYINLK

Page 24: Survey of softwares for phylogenetic analysis

>sp|O78749|COX1_SHEEP Cytochrome c oxidase subunit 1 OS=Ovis aries GN=MT-CO1 PE=3 SV=1
MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMMWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFVHWFPLFSGYTLNDTWAKIHFAIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTMWNTISSMGSFISLTAVMLMIFIIWEAFASKREVLTVDLTTTNLEWLNGCPPPYHTFEEPTYVNLK
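Before alignment, records like the five above are usually read into memory. The sketch below assumes they are stored in standard FASTA layout (a header line starting with ">" followed by one or more sequence lines) and that headers follow the `db|accession|entry-name` pattern seen above. The example record is truncated to its first 20 residues for brevity.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

def uniprot_fields(header):
    """Split a header such as 'sp|P00395|COX1_HUMAN ...' into (accession, entry name)."""
    _db, accession, rest = header.split("|", 2)
    return accession, rest.split()[0]

# Truncated example record (first 20 residues only), wrapped over two lines.
example = """>sp|P00395|COX1_HUMAN Cytochrome c oxidase subunit 1 OS=Homo sapiens GN=MT-CO1 PE=1 SV=1
MFADRWLFST
NHKDIGTLYL
"""
records = parse_fasta(example)
for header, sequence in records.items():
    print(uniprot_fields(header), len(sequence))
```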

Page 25: Survey of softwares for phylogenetic analysis

Multiple Sequence Alignment using Clustal Omega

Page 26: Survey of softwares for phylogenetic analysis

MSA result page

Page 27: Survey of softwares for phylogenetic analysis

Phylogenetic tree construction using ClustalW2 Phylogeny

Page 28: Survey of softwares for phylogenetic analysis

Result page of tree construction

Page 29: Survey of softwares for phylogenetic analysis
Page 30: Survey of softwares for phylogenetic analysis

Other software for phylogenetic analysis

Page 31: Survey of softwares for phylogenetic analysis

PHYLIP

Description: Phylogenetic inference package
Methods: Maximum parsimony, distance matrix, maximum likelihood

Page 32: Survey of softwares for phylogenetic analysis

MEGA

Description: Molecular Evolutionary Genetics Analysis
Methods: Distance, parsimony, and maximum composite likelihood methods

Page 33: Survey of softwares for phylogenetic analysis

T-REX

Description: Tree inference and visualization, horizontal gene transfer detection, multiple sequence alignment
Methods: Distance (neighbor joining), parsimony, and maximum likelihood (PhyML, RAxML) tree inference; MUSCLE, MAFFT, and ClustalW sequence alignments and related applications

Page 34: Survey of softwares for phylogenetic analysis

Description: Fast and free multiplatform tree editor
Methods: Based on PHYLIP 3.6 package algorithms

Page 35: Survey of softwares for phylogenetic analysis

BEAST

Description: Bayesian Evolutionary Analysis Sampling Trees
Methods: Bayesian inference, relaxed molecular clock, demographic history

Page 36: Survey of softwares for phylogenetic analysis

And many more......

Page 37: Survey of softwares for phylogenetic analysis

ag1805x