Phylogeny Workshop
By Eyal Privman
The Bioinformatics Unit
G.S. Wise Faculty of Life Science
Tel Aviv University, Israel
November 2009
http://ibis.tau.ac.il/twiki/bin/view/Bioinformatics/Phylogeny2009
Why should we care about phylogeny?
"Nothing in biology makes sense except in the light of evolution"
(Theodosius Dobzhansky, 1973)
Alignment and phylogeny are mutually dependent
[Figure: unaligned sequences, sequence alignment, MSA, phylogeny reconstruction. Label: inaccurate tree building. Scale bar: 0.4]
Alignment and phylogeny are both challenging
• 25% of residues are aligned incorrectly (based on BAliBASE: a large representative set of proteins)
• 5% of tree branches are wrong (based on simulations of 100 protein sequences)
Multiple sequence alignment (MSA)
Progressive alignment
[Figure: progressive alignment of sequences A to E. Labels: pairwise distance table, guide tree, MSA, iterative]
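The first two steps of progressive alignment named in the figure (pairwise distance table, then guide tree) can be sketched in Python. This is only an illustration under stated assumptions: the distance is a plain edit distance divided by the length of the longer sequence, and the tree is built by average-linkage (UPGMA-style) clustering. Real programs such as MAFFT use their own, much faster distance measures. The function names and the toy sequences are made up for this sketch.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def distance_table(seqs):
    """Pairwise distances, normalized by the longer sequence of each pair."""
    table = {}
    for x, y in combinations(sorted(seqs), 2):
        d = edit_distance(seqs[x], seqs[y]) / max(len(seqs[x]), len(seqs[y]))
        table[frozenset((x, y))] = d
    return table


def guide_tree(seqs):
    """Average-linkage clustering; returns the tree as nested tuples."""
    dist = distance_table(seqs)
    size = {name: 1 for name in seqs}  # number of leaves in each cluster
    while len(size) > 1:
        # Join the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair, key=str)
        merged = (a, b)
        new_size = size[a] + size[b]
        for other in size:
            if other in pair:
                continue
            # Size-weighted average of the distances to the merged clusters.
            d = (dist[frozenset((a, other))] * size[a]
                 + dist[frozenset((b, other))] * size[b]) / new_size
            dist[frozenset((merged, other))] = d
        # Drop every entry that still refers to a or b on their own.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del size[a], size[b]
        size[merged] = new_size
    return next(iter(size))


# Hypothetical toy input: two pairs of near-identical sequences.
toy_seqs = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQK",
    "C": "GGSPLLWWEE",
    "D": "GGSPLLWWED",
}
print(guide_tree(toy_seqs))
```

A progressive aligner would then align the sequences in the order given by this tree, starting from the closest pair.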
Multiple sequence alignment (MSA)
Several advanced MSA programs are available. Today we will use two:
• MAFFT – fastest and one of the most accurate
• PRANK – distinct from all other MSA programs because of its correct treatment of insertions/deletions
MAFFT
• Web server & download: http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
• Efficiency-tuned variants: quick & dirty, or slow but accurate
Reference: Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, Vol. 30, No. 14, 3059-3066. © 2002 Oxford University Press
Choosing a MAFFT strategy
[Figure: MAFFT strategies on a scale from "quick & dirty" to "slow but accurate"]
L-INS-i
ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------
G-INS-i
XXXXXXXXXXX-XXXXXXXXXXXXXXX
XX-XXXXXXXXXXXXXXX-XXXXXXXX
XXXXX----XXXXXXXX---XXXXXXX
XXXXX-XXXXXXXXXX----XXXXXXX
XXXXXXXXXXXXXXXX----XXXXXXX
E-INS-i
oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------
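The workshop uses the MAFFT web server, but the same strategies can be requested from a locally installed MAFFT. The sketch below builds the command line for each strategy using the option sets given in the MAFFT manual. It assumes that a `mafft` executable is on the PATH, which these slides do not cover. The function names and file names are hypothetical.

```python
import subprocess

# Option sets listed in the MAFFT manual for each named strategy.
MAFFT_OPTIONS = {
    "auto": ["--auto"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
}


def mafft_command(strategy, fasta_path):
    """Return the argument list for running MAFFT with the given strategy."""
    if strategy not in MAFFT_OPTIONS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return ["mafft", *MAFFT_OPTIONS[strategy], fasta_path]


def run_mafft(strategy, fasta_path, out_path):
    """Run MAFFT (assumed to be installed) and save its output to out_path."""
    # MAFFT writes the alignment to standard output, so redirect it to a file.
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(strategy, fasta_path),
                       stdout=out, check=True)


print(" ".join(mafft_command("L-INS-i", "input.fas")))
```

Calling `run_mafft("auto", "fahA.fas", "fahA.mafft.fas")` would let MAFFT pick a strategy itself, much like the "automatic" option on the web server.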
MAFFT output
Saving the output
• Choose a format: Clustal, Fasta, or click "Reformat" to convert to a selection of other formats
• Save page as a text file
[Figure: a colored view of the alignment]
PRANK
Classical alignment errors for HIV env
PRANK
• Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/
PRANK output
If you need a different format – copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/
Downloadable PRANK
• http://www.ebi.ac.uk/goldman-srv/prank/prank/
– PRANK: a command-line program
– PRANKSTER: a program with a graphical user interface
1. Download and unzip the sequence files from my homepage (Google "Eyal Privman" and look for the workshop materials under "Teaching"). Open "fahA.fas" in Notepad – these are 65 protein sequences in FASTA format.
2. Run PRANKSTER, open the "fahA.fas" file, and run "Alignment" → "Make alignment"
3. While you wait: copy the sequences into the MAFFT web server and run the "automatic" "moderate" strategy – which strategy did MAFFT choose for you? Click "Reformat", choose "phylip|phylip4", and save as "fahA.mafft.phylip"
4. When PRANKSTER finishes, click File → Save, and save the MSA in Phylip format under the name "fahA.prank.phylip"
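Before aligning, it can be worth checking that the FASTA file holds what you expect. The following is a minimal sketch of a FASTA reader, not a full parser. It assumes that each record starts with a ">" header line, that the first word of the header is the sequence name, and that names are unique. The function name is made up for this example.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = words[0] if words else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            # Sequence lines of one record are concatenated.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1 first toy protein
MKTAYIAK
QRQISFVK
>seq2
MKTAYLAK
"""
for seq_name, seq in read_fasta(example).items():
    print(seq_name, len(seq))
```

For the exercise file, `len(read_fasta(open("fahA.fas").read()))` gives the number of sequences, which can be compared with the 65 stated above.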
Phylogeny reconstruction
Different approaches (algorithms / programs):
• Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but inaccurate
• Maximum parsimony (e.g. MEGA)
• Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower
• Bayesian methods (e.g. MrBayes): most accurate but very slow
[Figure: sequences A to E. Labels: pairwise distance table, guide tree, MSA]
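Distance-based methods start from a pairwise distance table computed on the aligned sequences. The sketch below uses the simplest such measure, the p-distance (the fraction of compared columns in which two sequences differ). It assumes that gaps are written as "-" and that columns with a gap in either sequence are skipped. Real programs usually apply a substitution model on top of this. The function names and the toy alignment are made up.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    different = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            different += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return different / compared


def distance_matrix(msa):
    """p-distance for every pair of sequences in an alignment dict."""
    names = list(msa)
    return {
        (x, y): p_distance(msa[x], msa[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy alignment.
toy_msa = {"A": "MKTAY-AK", "B": "MKTAYIAK", "C": "MRTSYIAK"}
for names, d in distance_matrix(toy_msa).items():
    print(names, round(d, 3))
```

A method such as neighbor-joining would take this table as its only input, which is why it is fast but loses the information held in the individual alignment columns.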
PhyML
The most widely used maximum likelihood (ML) program
• Web server & download: http://www.atgc-montpellier.fr/phyml/
Accepts input MSA in PHYLIP format only:
• Interleaved
• Sequential
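The difference between the two PHYLIP layouts is easiest to see by writing the same alignment both ways. The following sketch assumes the classic layout: a header line with the number of sequences and the number of columns, then names padded to 10 characters. Names longer than 10 characters or containing spaces are rejected rather than truncated. The function name and the toy alignment are made up, and the PhyML documentation remains the authority on what the program accepts.

```python
def to_phylip(seqs, interleaved=False, width=60):
    """Format an alignment dict {name: aligned sequence} as PHYLIP text."""
    names = list(seqs)
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    if any(len(n) > 10 or " " in n for n in names):
        raise ValueError("names must be at most 10 characters, no spaces")
    nsites = lengths.pop()
    lines = [f"{len(names)} {nsites}"]
    if not interleaved:
        # Sequential: each sequence is written in full before the next one.
        for n in names:
            lines.append(f"{n.ljust(10)} {seqs[n]}")
        return "\n".join(lines) + "\n"
    # Interleaved: blocks of `width` columns; names only in the first block.
    for start in range(0, nsites, width):
        if start > 0:
            lines.append("")  # blank line between blocks
        for n in names:
            chunk = seqs[n][start:start + width]
            lines.append(f"{n.ljust(10)} {chunk}" if start == 0 else chunk)
    return "\n".join(lines) + "\n"


# Hypothetical toy alignment, written both ways with very short blocks.
toy = {"seqA": "MKTAYIAK", "seqB": "MKT-YIAK"}
print(to_phylip(toy))
print(to_phylip(toy, interleaved=True, width=4))
```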
Downloadable PhyML
Less user-friendly, but allows using local computer power
• Run "phyml.bat"
• Drag the file from Windows Explorer to the blue window
• Enter "d" to switch from DNA to AA
• Enter "y" to run
1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the PhyML web server (don't forget to choose "Amino-acids" and enter your email)
2. Run it with the local installation of "phyml.bat"
You should end up with a file: "fahA.prank.phylip_phyml_tree.txt"
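The tree file is plain text in Newick format, so a quick check that every sequence made it into the tree takes only a few lines. This sketch pulls the leaf names out of a Newick string with a regular expression. It assumes unquoted names without spaces or punctuation, and it is not a general Newick parser. The function names and the toy tree are made up.

```python
import re


def newick_leaves(newick):
    """Return leaf names from a Newick string (unquoted names assumed)."""
    # A leaf name follows "(" or ","; labels after ")" are internal nodes
    # or support values, and numbers after ":" are branch lengths.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def missing_taxa(newick, expected_names):
    """Names that are expected but do not appear as leaves in the tree."""
    return sorted(set(expected_names) - set(newick_leaves(newick)))


# Hypothetical toy tree with branch lengths and support values.
toy_tree = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.1)1.0:0.2);"
print(newick_leaves(toy_tree))
print(missing_taxa(toy_tree, ["A", "B", "C", "D", "E"]))
```

For the exercise, the tree text would come from `open("fahA.prank.phylip_phyml_tree.txt").read()` and the expected names from the alignment you gave as input.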
RAxML
• Web server: http://phylobench.vital-it.ch/raxml-bb/
• Similar maximum likelihood (ML) methodology to PhyML, but much faster: faster results, or better results in the same run time
Downloadable RAxML
• A command-line program: http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm (on that page you will also find instructions for running on Windows, and the RAxML manual)
• easyRAx takes care of some of the RAxML options for you: http://projects.exeter.ac.uk/ceem/easyRAx.html but installation is somewhat more complex
1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML web server (don't forget to tick "Protein sequences" and enter your email)
Save the resulting tree file as: "fahA.prank.phylip.raxml"
FigTree: tree visualization and figure creation
Manipulate a node
Manipulate a clade
Manipulate a taxon
1. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree
2. Play around with the different options and make a pretty figure!
   1. Find out how to color specific clades, as below
   2. Try each of the three options under "Layout"
   3. Export a figure in PDF format (File → Export Graphic…)
Thanks for your attention
and
happy phylogeny…