Remarks About Homework
![Page 1: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/1.jpg)
Remarks About Homework
Write detailed answers
Pay attention to details in the questions
“… nor can the shy man learn…”
![Page 2: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/2.jpg)
Multiple Sequence Alignment (MSA) and Phylogeny
![Page 3: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/3.jpg)
One of the options for getting a multiple-sequence FASTA file
![Page 4: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/4.jpg)
One of the options for getting a multiple-sequence FASTA file
![Page 5: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/5.jpg)
MSA input: multiple-sequence FASTA file
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
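A multiple-sequence FASTA file is just a series of records, each made of a ">" header line followed by one or more sequence lines. The following is a minimal sketch of reading such a file in Python; the function name `parse_fasta` and the toy records are illustrative assumptions, not part of the course material or of Clustal X.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record are joined into a single string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (hypothetical records, much shorter than the CD4 proteins above).
example = """>seq1 toy record
MNRGVPF
RHLLLVL
>seq2 toy record
MDPGTSL
"""
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Joining the wrapped lines of each record gives one string per sequence, which is the form an alignment program works with internally.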
![Page 6: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/6.jpg)
Clustal X
![Page 7: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/7.jpg)
Step 1: Load the sequences
![Page 8: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/8.jpg)
Uploaded sequences
A little unclear…
![Page 9: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/9.jpg)
Edit FASTA headers
Replace each long header with a short, readable name (the sequences themselves stay unchanged):
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] → >Homo_sapiens_CD4
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes] → >Pan_troglodytes_CD4
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa] → >Sus_scrofa_CD4
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus] → >Rattus_norvegicus_CD4
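Editing the headers by hand is fine for four sequences, but the same renaming can be scripted. The sketch below assumes that the species name sits in square brackets at the end of the header, as it does in the four headers on this slide; the function name `short_header` and the `gene` parameter are hypothetical.

```python
import re


def short_header(header, gene="CD4"):
    """Turn an NCBI-style header into '>Genus_species_GENE'.

    Assumes the species name is given in square brackets at the end of
    the header. Headers without brackets are returned unchanged.
    """
    match = re.search(r"\[([^\]]+)\]\s*$", header)
    if match is None:
        return header
    species = match.group(1).strip().replace(" ", "_")
    return f">{species}_{gene}"


headers = [
    ">gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]",
    ">gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]",
    ">gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]",
    ">gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]",
]
for old in headers:
    print(short_header(old))
```

Short names without spaces make the sequence labels in the alignment and tree views easier to read.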
![Page 10: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/10.jpg)
Uploaded sequences
Much better
![Page 11: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/11.jpg)
Step 2: Perform alignment
![Page 12: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/12.jpg)
Multiple Sequence Alignment and conservation view
![Page 13: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/13.jpg)
Step 3: Create tree
![Page 14: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/14.jpg)
The Newick tree format is used to represent trees as strings
[Figure: a tree with leaves A, C, B, and D, in which A groups with C and B groups with D]
In Newick format: ((A,C),(B,D));
• Each pair of parentheses () encloses a clade in the tree
• A comma “,” separates the members of the corresponding clade
• A semicolon “;” is always the last character
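The three rules above are enough to read a simple Newick string back into a nested structure. The following is a minimal sketch, assuming leaf names only (no branch lengths, bootstrap labels, or quoted names, all of which real Newick files may contain); `parse_newick` is an illustrative name, not a function of Clustal X or NJPlot.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested lists of leaf names."""
    text = "".join(text.split())  # drop all whitespace
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "(": a new clade starts here
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ",": next member of the same clade
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")": the clade is complete
            return children
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"expected ';' at position {pos}")
    return tree


tree = parse_newick("((A,C),(B,D));")
print(tree)  # [['A', 'C'], ['B', 'D']]
```

Each inner list corresponds to one pair of parentheses, that is, to one clade.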
![Page 15: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/15.jpg)
Step 4: View tree with NJPlot
Note: unrooted tree
![Page 16: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/16.jpg)
[Figure: four drawings of the same unrooted tree with leaves A, B, and C; all four drawings are equivalent]
![Page 17: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/17.jpg)
Rooted vs. unrooted trees
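[Figure: an unrooted tree with leaves A, B, and C and branches numbered 1 to 3; placing the root on branch 1, 2, or 3 gives three rooted trees (leaf orders C B A, B C A, and A B C) that are not equal to one another]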
![Page 18: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/18.jpg)
How would each tree look in Newick format?
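[Figure: the same unrooted tree with branches numbered 1 to 3, and the three rooted trees obtained by rooting on branch 1, 2, or 3 (leaf orders C B A, B C A, and A B C); the rooted trees are not equal to one another]
Unrooted tree: (A,B,C)
Rooted trees: ((C,B),A) ((A,C),B) ((A,B),C)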
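The three rooted trees can also be generated mechanically: rooting an unrooted three-leaf tree on the branch leading to one leaf makes that leaf the outgroup and groups the other two. A minimal sketch follows; the function name `rootings` is hypothetical, and it handles only the three-leaf case shown on the slide.

```python
def rootings(leaves):
    """Root a three-leaf unrooted tree on each of its three branches.

    Returns one Newick string per choice of outgroup leaf.
    """
    leaves = list(leaves)
    if len(leaves) != 3:
        raise ValueError("this sketch handles exactly three leaves")
    rooted = []
    for outgroup in leaves:
        # The two remaining leaves form a clade; the outgroup joins at the root.
        ingroup = [leaf for leaf in leaves if leaf != outgroup]
        rooted.append(f"(({ingroup[0]},{ingroup[1]}),{outgroup});")
    return rooted


for newick in rootings(["A", "B", "C"]):
    print(newick)
```

Up to the order of members inside each pair of parentheses, these are the same three rooted trees listed above.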
![Page 19: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/19.jpg)
Step 4.5: Defining an outgroup
![Page 20: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/20.jpg)
Step 4: View tree with NJPlot
Note: The order inside a split doesn’t matter
![Page 21: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/21.jpg)
[Figure: four equivalent drawings of the Human, Chimp, Gorilla tree that differ only in the order in which the leaves are drawn]
(Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human))
= ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
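One way to check that these four strings describe the same tree is to compare the sets of clades they contain, since a set ignores the order of its members. The sketch below treats the strings as rooted trees with plain leaf names; the function name `clades` is an illustrative assumption.

```python
def clades(newick):
    """Return the clades of a names-only Newick string as a set of frozensets."""
    open_clades = []  # one set of leaf names per currently open "("
    found = set()
    name = ""
    for ch in newick:
        if ch in "(),;":
            if name:
                # A finished leaf name belongs to every clade that is still open.
                for members in open_clades:
                    members.add(name)
                name = ""
            if ch == "(":
                open_clades.append(set())
            elif ch == ")":
                found.add(frozenset(open_clades.pop()))
        elif not ch.isspace():
            name += ch
    return found


drawings = [
    "(Gorilla,(Human,Chimp))",
    "(Gorilla,(Chimp,Human))",
    "((Human,Chimp),Gorilla)",
    "((Chimp,Human),Gorilla)",
]
same = all(clades(d) == clades(drawings[0]) for d in drawings)
different = clades("((Gorilla,Human),Chimp)") != clades(drawings[0])
print(same, different)  # True True
```

All four drawings contain the clade {Human, Chimp}, whereas a tree grouping Gorilla with Human does not, so only the latter is a different topology.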
![Page 22: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/22.jpg)
![Page 23: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/23.jpg)
How robust is our tree?
![Page 24: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/24.jpg)
How robust is our tree?
We need some statistical way to estimate the confidence in the tree topology (like we need the E-value to estimate the confidence of a BLAST hit)
But we don’t know anything about the distribution of tree topologies
The only data source we have is our data (MSA)
So, we must rely on our own resources: “pull up by your own bootstraps”
![Page 25: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/25.jpg)
Bootstrap
![Page 26: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/26.jpg)
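Bootstrap
1. Create n (100-1000) new MSAs (pseudo-datasets) by randomly sampling K positions from our original MSA with replacement
Original MSA (positions 1 2 3 4 5 … K):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T
Pseudo-dataset (sampled positions 1 1 2 4 4 … 3):
1 : AATTT…C
2 : AATTT…C
3 : AACTT…T
4 : AACTT…C
Pseudo-dataset (sampled positions 9 7 4 7 8 … 10):
1 : TTTTA…T
2 : CATAC…A
3 : CATAC…T
4 : AGTGG…A
Pseudo-dataset (sampled positions 5 1 5 7 8 … 12):
1 : GAGTA…T
2 : GAGAC…G
3 : AAAAC…A
4 : AAAGG…C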
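The resampling step can be sketched in a few lines of Python. The key point is that whole columns are sampled, so the same column indices are applied to every sequence. The function name `bootstrap_msa` and the toy alignment (only the first five columns of the original MSA on this slide) are assumptions made for illustration.

```python
import random


def bootstrap_msa(msa, rng):
    """Build one pseudo-dataset by sampling K columns with replacement.

    msa: dict mapping sequence name -> aligned sequence (all of length K).
    rng: a random.Random instance, passed in so that runs are reproducible.
    Returns (sampled column indices, pseudo-dataset).
    """
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    k = lengths.pop()
    # Draw K column indices (0-based); the same index may be drawn repeatedly.
    columns = [rng.randrange(k) for _ in range(k)]
    pseudo = {name: "".join(seq[i] for i in columns) for name, seq in msa.items()}
    return columns, pseudo


# Toy alignment: first five columns of the original MSA shown above.
toy_msa = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}
rng = random.Random(0)  # fixed seed, chosen arbitrarily
for _ in range(3):
    columns, pseudo = bootstrap_msa(toy_msa, rng)
    print([c + 1 for c in columns], pseudo)
```

Because sampling is with replacement, some columns appear several times in a pseudo-dataset and others not at all, exactly as in the sampled positions listed on the slide.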
![Page 27: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/27.jpg)
Bootstrap
2. Reconstruct a pseudo-tree from each pseudo-dataset using the same method used for reconstructing the original tree
[Figure: each of the three pseudo-datasets (sampled positions 1 1 2 4 4 … 3, 9 7 4 7 8 … 10, and 5 1 5 7 8 … 12) shown next to the pseudo-tree of Sp1, Sp2, Sp3, and Sp4 reconstructed from it]
![Page 28: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/28.jpg)
Bootstrap
3. For each node in our original tree, we count the number of times it appeared in the pseudo-trees
[Figure: three pseudo-trees of Sp1, Sp2, Sp3, and Sp4, and the original tree with its nodes annotated with support values of 67% and 100%]
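The counting step can be sketched as follows. The trees are written as nested tuples, and both the original topology and the three pseudo-trees are hypothetical, chosen only so that the result reproduces the 67% and 100% values on the slide. For simplicity the sketch compares clades of rooted trees; programs that work on unrooted trees compare bipartitions instead.

```python
def collect(tree, out):
    """Return the leaf set of `tree` and add every internal clade to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])  # a leaf
    members = frozenset().union(*(collect(child, out) for child in tree))
    out.add(members)
    return members


def clade_set(tree):
    """Return all clades of a nested-tuple tree, including the root clade."""
    out = set()
    collect(tree, out)
    return out


def bootstrap_support(original, pseudo_trees):
    """Percentage of pseudo-trees containing each non-root clade of the original."""
    pseudo_clades = [clade_set(t) for t in pseudo_trees]
    root = collect(original, set())
    support = {}
    for clade in clade_set(original):
        if clade == root:
            continue  # the root clade is present in every tree by definition
        hits = sum(clade in clades for clades in pseudo_clades)
        support[clade] = 100.0 * hits / len(pseudo_trees)
    return support


# Hypothetical topologies for illustration only.
original = ("Sp1", ("Sp2", ("Sp3", "Sp4")))
pseudo_trees = [
    ("Sp1", ("Sp2", ("Sp3", "Sp4"))),
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    ("Sp1", ("Sp2", ("Sp4", "Sp3"))),
]
support = bootstrap_support(original, pseudo_trees)
for clade, percent in sorted(support.items(), key=lambda item: len(item[0])):
    print(sorted(clade), f"{percent:.0f}%")
```

Here the clade {Sp3, Sp4} appears in all three pseudo-trees, while {Sp2, Sp3, Sp4} appears in two of the three.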
![Page 29: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/29.jpg)
Step 3.5 - Bootstrap
![Page 30: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/30.jpg)
Bootstrap values on NJPlot
Note: ClustalX saves trees with the .ph extension. Trees with bootstrap values are saved with the .phb extension
![Page 31: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/31.jpg)
Reconstructing the tree of life
![Page 32: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/32.jpg)
Darwin’s vision of the tree of life from the Origin of Species
![Page 33: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/33.jpg)
Based on molecular data (SSU rRNA), the branching of several kingdoms remains in dispute
![Page 34: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/34.jpg)
Lateral Gene Transfer (LGT) Challenges the Conceptual Basis of Phylogenetic Classification
![Page 35: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/35.jpg)
Science, 3 March 2006: Vol. 311, no. 5765, pp. 1283-1287
Toward Automatic Reconstruction of a Highly Resolved Tree of Life
![Page 36: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/36.jpg)
Methodology
Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
Constructed an MSA for each of the 31 orthogroups
Concatenated all 31 MSAs to a super-MSA of 8090 columns
The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach
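The concatenation step amounts to gluing the per-gene alignments side by side, species by species. A minimal sketch, assuming every species is present in every alignment (as with the universally present genes described above); the function name `concatenate_msas` and the two toy alignments are hypothetical and are not the data used in the paper.

```python
def concatenate_msas(msas, species):
    """Concatenate per-gene alignments into one super-MSA.

    msas: list of dicts mapping species name -> aligned sequence.
    species: names of the species to include, in output order.
    """
    parts = {name: [] for name in species}
    for index, msa in enumerate(msas):
        widths = {len(seq) for seq in msa.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for name in species:
            if name not in msa:
                raise KeyError(f"{name} is missing from alignment {index}")
            parts[name].append(msa[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Two toy alignments standing in for the 31 orthogroup MSAs.
gene1 = {"Human": "ATG-", "Chimp": "ATGC", "Rat": "A-GC"}
gene2 = {"Human": "TTA", "Chimp": "TTA", "Rat": "TCA"}
super_msa = concatenate_msas([gene1, gene2], ["Human", "Chimp", "Rat"])
for name, row in super_msa.items():
    print(name, row)
```

The width of the super-MSA is the sum of the widths of the individual alignments, which is how 31 alignments add up to 8090 columns.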
![Page 37: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/37.jpg)
[Figure: the reconstructed tree of life, with the three domains labeled: Archaea, Eukaryota, and Bacteria]
![Page 38: Remarks About Homework](https://reader036.fdocuments.net/reader036/viewer/2022062422/5681336f550346895d9a82d9/html5/thumbnails/38.jpg)
Tree support
81.7% of the branches show bootstrap support of over 80%
65% of the branches show bootstrap support of 100%
However, several deep branchings show low support