Applied Bioinformatics Week 8 Jens Allmer. Theory I.


Applied Bioinformatics

Week 8

Jens Allmer


Theory I


Phylogeny

• Sources
– Sequences
– Clades
– Organisms

• Why
– Understand evolution
– Strain diversity
– Epidemiology
– Gene prediction


[Figure: globin example; labels: plants, Ath-g, analogs]


Dendrogram

http://en.wikipedia.org/wiki/Dendrogram


Phylogenetic Tree


Tree Terminology

• All circled elements (e.g., a) are called nodes

• The connections between them are called edges or branches

• The node from which the whole tree descends is called the root (here abcdef)

• Terminal nodes that have only one connection are called leaves (e.g., a)

Unrooted trees (obtained by removing the red root)
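The terminology can be made concrete with a small Python sketch. The topology below is hypothetical (the slide's figure is not reproduced); internal nodes are named after the leaves beneath them, echoing the root label abcdef. The edge counts illustrate a handy check: a rooted binary tree with n leaves has 2n-2 edges, its unrooted version 2n-3.

```python
# Hypothetical rooted binary tree over the leaves a-f (topology chosen for
# illustration only). Each internal node maps to its two children.
TREE = {
    "abcdef": ["ab", "cdef"],
    "ab": ["a", "b"],
    "cdef": ["c", "def"],
    "def": ["d", "ef"],
    "ef": ["e", "f"],
}


def edges(tree):
    """Every parent-child connection (edge/branch) in the tree."""
    return [(parent, child) for parent, kids in tree.items() for child in kids]


def nodes(tree):
    """All nodes that take part in at least one edge."""
    return {n for edge in edges(tree) for n in edge}


def root(tree):
    """The single node that is nobody's child."""
    children = {child for kids in tree.values() for child in kids}
    (r,) = nodes(tree) - children
    return r


def leaves(tree):
    """Terminal nodes: nodes without children (one connection each)."""
    return sorted(n for n in nodes(tree) if n not in tree)


def unroot(tree):
    """Remove the root and join its two children by a single edge."""
    r = root(tree)
    left, right = tree[r]
    kept = [e for e in edges(tree) if r not in e]
    return kept + [(left, right)]


print(root(TREE), leaves(TREE))
print(len(edges(TREE)), len(unroot(TREE)))
```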


Branch Length

• Arbitrary (only the branching pattern is meaningful)

• Similarity (e.g., number of differences)

• Evolutionary time


Tree types

• A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

• A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths represent neither time nor the amount of change.

• A phylogram is a phylogenetic tree that explicitly represents the number of character changes through its branch lengths.

• A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
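Trees are commonly exchanged in Newick format, where each clade is written in parentheses and an optional ":length" follows a node. The minimal parser below (assuming no whitespace, quoted labels, or comments) shows how the same topology reads as a cladogram (no lengths), a phylogram (lengths as amounts of change), and a chronogram (lengths as time, so every tip ends at the same depth). The three example strings are made up.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples.

    Assumes no whitespace, quoted labels, or comments inside the string.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
        name = read_until(",():")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            length = float(read_until(",()"))
        return name, length, children

    return parse_node()


def tip_depths(node, depth=0.0, out=None):
    """Sum of branch lengths from the root to every leaf (missing lengths count as 0)."""
    if out is None:
        out = {}
    name, _, children = node
    if not children:
        out[name] = depth
    for child in children:
        tip_depths(child, depth + (child[1] or 0.0), out)
    return out


examples = {
    "cladogram": "(a,(b,c));",
    "phylogram": "(a:0.1,(b:0.2,c:0.3):0.4);",
    "chronogram": "(a:2,(b:1,c:1):1);",
}
for kind, newick in examples.items():
    print(kind, tip_depths(parse_newick(newick)))
```

In the chronogram all tips sit at depth 2, whereas the phylogram's tips end at different depths because lineages accumulated different amounts of change.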


Sequences

• DNA
– Sensitive but quite divergent at longer distances
– Use for very closely related organisms

• cDNA
– Still sensitive but less divergent (e.g., no introns)
– Use for closely related families

• Protein
– Least sensitive but most useful for more distant relationships
– Use for distantly related species

• 16S rRNA
– Present in all prokaryotes; its small-subunit homolog (18S rRNA) is found in eukaryotes
– Highly conserved


Overall Process

• Get sequences

• Construct an MSA

• Compute pairwise distances (for some methods)

• Build the tree
– Topology
– Branch lengths

• Estimate accuracy and reliability
– Build several different trees for that (e.g., by bootstrapping)

• Visualize the tree
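For the distance step of this process, a minimal Python sketch is shown here. It assumes the MSA is a dict of equally long, gapped strings and uses the plain p-distance (fraction of differing sites, skipping columns where either sequence has a gap); real tools typically apply a model-based correction on top. The sequences are hypothetical.

```python
from itertools import combinations


def p_distance(s1, s2, gap="-"):
    """Fraction of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(msa):
    """Return (labels, symmetric matrix) of pairwise p-distances for an MSA dict."""
    labels = list(msa)
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = p_distance(msa[labels[i]], msa[labels[j]])
    return labels, matrix


# Hypothetical toy alignment
labels, matrix = distance_matrix({"s1": "ACGT", "s2": "ACGA", "s3": "A-GA"})
for name, row in zip(labels, matrix):
    print(name, [round(v, 2) for v in row])
```

The resulting labels and matrix are the kind of input that distance methods such as neighbor joining or UPGMA start from.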


Computational Tree Formation

• Distance Methods
– Neighbor-Joining
– Least-Squares
– UPGMA

• Parsimony
– Prefers the tree that needs the least number of evolutionary steps (changes)

• Maximum Likelihood
– Constructs the tree under which the observed data are most probable, given a model of evolution
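To illustrate how parsimony counts evolutionary steps, here is a sketch of Fitch's algorithm, which scores one fixed rooted binary tree column by column. It assumes trees written as nested 2-tuples of leaf names and an ungapped alignment; the data are hypothetical, and the search over many candidate topologies (the expensive part of parsimony) is not shown.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for one alignment column.

    tree:   rooted binary tree as nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d"))
    states: leaf name -> character in this column
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):  # a leaf
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # no shared state: one change is needed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, msa):
    """Sum of Fitch changes over all columns of an (ungapped) alignment."""
    length = len(next(iter(msa.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in msa.items()})[1]
        for i in range(length)
    )


msa = {"a": "AAG", "b": "AAG", "c": "CAT", "d": "CAT"}  # hypothetical data
print(parsimony_score((("a", "b"), ("c", "d")), msa))  # 2 changes
print(parsimony_score((("a", "c"), ("b", "d")), msa))  # 4 changes
```

Parsimony would prefer the first topology, since it explains the same data with fewer changes.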


Neighbor Joining

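• Bottom-up clustering method
1. Compute a distance matrix

2. Join the closest pair of nodes (NJ's criterion corrects each distance for the nodes' average distance to all other nodes, so the joined pair is not always the one with the smallest raw distance)

3. Repeat steps 1-2 until all nodes are joined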

http://en.wikipedia.org/wiki/Neighbor_joining
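The loop above can be sketched in Python as follows, using the standard neighbor-joining formulation of Saitou and Nei with its Q criterion. The five-taxon matrix is an illustrative, perfectly tree-like example, so the recovered branch lengths reproduce the input distances exactly. Internal node names such as node1 are hypothetical labels generated by the sketch.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Neighbor joining on a symmetric distance matrix (needs at least 3 taxa).

    Returns a list of (internal_node, child, branch_length) edges of an unrooted tree.
    """
    d = {(a, b): float(dist[i][j]) for i, a in enumerate(labels) for j, b in enumerate(labels)}
    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x, y] for y in nodes) for x in nodes}  # row sums
        # Q criterion: join the pair minimizing (n - 2) * d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        counter += 1
        u = f"node{counter}"
        length_i = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, length_i), (u, j, d[i, j] - length_i)]
        # distances from the new node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # three nodes left: attach them to one final internal node
    a, b, c = nodes
    center = f"node{counter + 1}"
    edges += [
        (center, a, 0.5 * (d[a, b] + d[a, c] - d[b, c])),
        (center, b, 0.5 * (d[a, b] + d[b, c] - d[a, c])),
        (center, c, 0.5 * (d[a, c] + d[b, c] - d[a, b])),
    ]
    return edges


labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(labels, dist):
    print(parent, child, length)
```

In the first round, a and b are joined with branch lengths 2 and 3. This pair has the smallest Q value, which here coincides with the smallest raw distance, but that need not always be the case.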


Least Squares

• Standard approximation approach
– Minimizes the sum of squared errors (for trees: between observed pairwise distances and the distances along the tree)

• Example: PGLS (Phylogenetic Generalized Least Squares)
– Needs additional data (traits)

http://www.dynamicgeometry.com/General_Resources/Advanced_Sketch_Gallery/Other_Explorations/Statistics_Collection/Least_Squares.html
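To make the least-squares criterion concrete for trees: given a candidate tree with branch lengths, sum the squared differences between the observed pairwise distances and the path lengths along the tree. A least-squares method looks for the tree (and branch lengths) that minimizes this sum. The sketch below only evaluates the criterion for one fixed, hypothetical tree; fitting branch lengths and searching topologies are left out.

```python
from collections import defaultdict


def path_lengths(edges):
    """Pairwise path lengths between all nodes of a tree given as (u, v, length) edges."""
    adjacent = defaultdict(list)
    for u, v, w in edges:
        adjacent[u].append((v, w))
        adjacent[v].append((u, w))
    dist = {}
    for start in adjacent:
        stack = [(start, None, 0.0)]  # depth-first walk; a tree has no cycles
        while stack:
            node, came_from, total = stack.pop()
            dist[start, node] = total
            for nxt, w in adjacent[node]:
                if nxt != came_from:
                    stack.append((nxt, node, total + w))
    return dist


def least_squares_error(edges, observed):
    """Sum over taxon pairs of (observed distance - tree distance) ** 2."""
    tree = path_lengths(edges)
    return sum((d - tree[a, b]) ** 2 for (a, b), d in observed.items())


# Hypothetical unrooted tree: a and b hang off x, c and d hang off y
tree_edges = [("x", "a", 1), ("x", "b", 2), ("x", "y", 1), ("y", "c", 3), ("y", "d", 1)]
observed = {("a", "b"): 3, ("a", "c"): 5, ("a", "d"): 3,
            ("b", "c"): 6, ("b", "d"): 4, ("c", "d"): 4}
print(least_squares_error(tree_edges, observed))  # 0.0: the tree fits these distances exactly
```

Changing any observed distance (e.g., a-b from 3 to 4) makes the error positive. Choosing branch lengths that minimize this error is the least-squares fit.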


UPGMA

• Unweighted Pair Group Method with Arithmetic Mean
– Agglomerative hierarchical clustering method
– Assumes a constant rate of evolution (molecular clock)
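A compact UPGMA sketch follows. Cluster names double as Newick strings, so the last remaining cluster is the tree. The distance from a merged cluster to any other cluster is the size-weighted arithmetic mean, which equals the average over all pairs of original taxa. Each merge is placed at half the joining distance, which is where the constant-rate assumption enters. The example matrix is illustrative.

```python
def upgma(labels, dist):
    """Build a rooted, ultrametric tree; returns (newick, merge_heights)."""
    size = {name: 1 for name in labels}
    height = {name: 0.0 for name in labels}
    d = {(a, b): float(dist[i][j]) for i, a in enumerate(labels) for j, b in enumerate(labels)}
    merges = []
    while len(size) > 1:
        names = list(size)
        a, b = min(
            ((x, y) for i, x in enumerate(names) for y in names[i + 1:]),
            key=lambda pair: d[pair],
        )
        h = d[a, b] / 2  # constant rate: the new node sits halfway
        new = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        for k in names:
            if k not in (a, b):
                # arithmetic mean over all original taxa in both clusters
                d[new, k] = d[k, new] = (size[a] * d[a, k] + size[b] * d[b, k]) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        height[new] = h
        merges.append(h)
    (root,) = size
    return root + ";", merges


labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]
newick, heights = upgma(labels, dist)
print(newick)
print(heights)
```

Because every leaf ends up at the same distance from the root, UPGMA trees are ultrametric. When rates differ strongly between lineages, this can give a wrong topology, which is one reason neighbor joining is often preferred.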


Similarity Measures

• Sequence
– Number of differing positions
– Weighted differences (substitution matrices)
– Pairwise alignments (Needleman-Wunsch, Smith-Waterman, ...)

• Additional measurements or knowledge
– Traits

• Parsimony
– Number of changes along the tree paths
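The first two sequence measures can be contrasted in a few lines. The weighting here is a made-up two-level scheme (transitions cost 1, transversions cost 2) that stands in for a real substitution matrix. It assumes ungapped, equal-length, upper-case DNA.

```python
# Hypothetical weighting scheme, not a published substitution matrix:
# transitions (purine<->purine, pyrimidine<->pyrimidine) cost 1, transversions cost 2.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_cost(x, y):
    """Cost of replacing nucleotide x by y under the hypothetical scheme above."""
    if x == y:
        return 0
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return 1 if same_class else 2


def differences(s1, s2):
    """Unweighted: number of positions at which the sequences differ."""
    return sum(x != y for x, y in zip(s1, s2))


def weighted_differences(s1, s2):
    """Weighted: sum of substitution costs over all positions."""
    return sum(substitution_cost(x, y) for x, y in zip(s1, s2))


print(differences("ACGT", "GCTT"), weighted_differences("ACGT", "GCTT"))
```

Both sequences differ at two positions. The A/G transition costs 1 and the G/T transversion costs 2, so the weighted score is 3.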


Tree Accuracy

• Bootstrapping
– Resample (alignment columns, with replacement)
– Recompute the tree
– Do this many times
– Compare results (how often does each clade reappear?)

http://www.sciencedirect.com/science/article/pii/S0191814107000156
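The resample/recompute/compare loop can be sketched as follows. Tree building is passed in as a function. The closest_pair_clade helper is a hypothetical toy stand-in (it only reports the most similar pair as a clade), so a real tree builder would take its place. The alignment is made up, and the seed makes the run reproducible.

```python
import random
from itertools import combinations


def resample_columns(msa, rng):
    """One bootstrap pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(msa.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


def bootstrap_support(msa, build_clades, clade, replicates=100, seed=0):
    """Fraction of replicate trees that contain `clade`.

    build_clades: any function turning an alignment into a set of clades
    (frozensets of sequence names), e.g. a wrapper around a tree builder.
    """
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = sum(
        target in build_clades(resample_columns(msa, rng))
        for _ in range(replicates)
    )
    return hits / replicates


def closest_pair_clade(msa):
    """Toy stand-in for a tree builder: the most similar pair is the only clade."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(msa[a], msa[b]))
    return {frozenset(min(combinations(sorted(msa), 2), key=differences))}


msa = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "CCCCCCCC", "D": "GGGGGGGG"}
print(bootstrap_support(msa, closest_pair_clade, {"A", "B"}))  # 1.0
```

Here A and B are identical, so every replicate recovers the clade {A, B} and its support is 100 %. With real data, support values below 100 % show which groupings depend on only a few alignment columns.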



End Theory I

• Mindmap

• Break


Practice I


Where to get Trees

• Most MSA servers also provide at least the guide tree that was used to construct the alignment

• If that’s all you are interested in, you don’t need to go any further


Edit your MSA

• Remove blocks consisting mostly of gaps (using Jalview)

• Remove the N- and C-termini if they are not well conserved
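The two editing steps can also be scripted. In this sketch, "mostly gaps" means more than 50 % gaps in a column. A terminal column counts as conserved if its most common residue occurs in at least 50 % of the sequences. Both the thresholds and this conservation measure are assumptions, and the sequences are hypothetical.

```python
def drop_gappy_columns(msa, max_gap_fraction=0.5, gap="-"):
    """Remove columns in which more than max_gap_fraction of the sequences have a gap."""
    names = list(msa)
    length = len(msa[names[0]])
    keep = [
        i for i in range(length)
        if sum(msa[n][i] == gap for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(msa[n][i] for i in keep) for n in names}


def trim_termini(msa, min_identity=0.5, gap="-"):
    """Cut leading and trailing columns that are not conserved.

    A column counts as conserved if its most common residue occurs in at least
    min_identity of all sequences (an assumed, simple conservation measure).
    """
    names = list(msa)
    length = len(msa[names[0]])

    def conserved(i):
        residues = [msa[n][i] for n in names if msa[n][i] != gap]
        if not residues:
            return False
        return max(residues.count(r) for r in set(residues)) / len(names) >= min_identity

    start, end = 0, length
    while start < end and not conserved(start):
        start += 1
    while end > start and not conserved(end - 1):
        end -= 1
    return {n: msa[n][start:end] for n in names}


print(drop_gappy_columns({"a": "A-CG", "b": "A-CT", "c": "--CG"}))
print(trim_termini({"a": "MKACGW", "b": "LRACGY", "c": "SQACGF"}))
```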


Easy Tree

• www.ebi.ac.uk/clustalw/

• Paste your alignment

• Select a tree type

• Set the other options as needed

• Press Run

• Take a screenshot

• Paste it wherever you need it


Phylip (More elaborate tree)

• http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html

• Choose protdist from the page

• Paste the MSA

• Set the bootstrapping options


Phylip

• Run the query

• Click further analysis


Click Run

Select full screen view

There is your tree


Ugly Tree

• Let’s face it: the tree is quite ugly

• http://iubio.bio.indiana.edu/treeapp/treeprint-form.html

• Copy the contents of consense.outtree (the consensus tree in Newick format) from the previous website and paste them into the box

• Select submit to create the tree

• Play around with the formats and settings


Tree Topologies


Other Resources

• http://en.wikipedia.org/wiki/List_of_phylogenetics_software

• http://itol.embl.de/


MSA

• http://www.ebi.ac.uk/clustalw

• http://www.tcoffee.org

• http://www.drive5.com/muscle

• Try all the above and compare the resulting MSAs
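One simple way to compare MSAs from different programs is the sum-of-pairs (SP) idea. Collect every pair of residues that an alignment places in the same column, then check what fraction of a reference alignment's pairs the other alignment reproduces. The sketch assumes both alignments contain the same sequences (same names, same ungapped residues) and uses "-" as the gap character. The two tiny alignments are hypothetical.

```python
from itertools import combinations


def aligned_pairs(msa, gap="-"):
    """Set of (name1, index1, name2, index2) residue pairs sharing a column."""
    names = sorted(msa)
    index = {name: -1 for name in names}  # position within the ungapped sequence
    pairs = set()
    for col in range(len(msa[names[0]])):
        present = {}
        for name in names:
            if msa[name][col] != gap:
                index[name] += 1
                present[name] = index[name]
        for a, b in combinations(sorted(present), 2):
            pairs.add((a, present[a], b, present[b]))
    return pairs


def sp_score(candidate, reference, gap="-"):
    """Fraction of the reference's aligned residue pairs that the candidate reproduces."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(candidate, gap)) / len(ref_pairs)


msa_1 = {"x": "AC-", "y": "A-C"}
msa_2 = {"x": "AC", "y": "AC"}
print(sp_score(msa_2, msa_1), sp_score(msa_1, msa_2))
```

The score is asymmetric. msa_2 reproduces the single pair aligned in msa_1 (score 1.0), while msa_1 reproduces only one of msa_2's two pairs (score 0.5).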


Editing Alignments

• http://www.jalview.org

• Start the applet

• Choose File > Input Alignment > from Textbox

• Copy and paste the ClustalW alignment


Playtime

• Be creative

• Explore the functions

• To save your work, you need to install Jalview locally

• Java applets are not allowed to save files to your computer