Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning...
Transcript of Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning...
![Page 1: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/1.jpg)
Jason WilliamsCold Spring Harbor Laboratory, DNA Learning Center
[email protected]@JasonWilliamsNY
DNALC Live
Barcoding BioinformaticsPart III
![Page 2: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/2.jpg)
DNALC LiveThis is an experiment; give us feedback
on what you would like to see!
![Page 3: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/3.jpg)
DNALC Live• Provide genetics, molecular biology, and bioinformatics learning
resources
• Laboratory and computer demos, short online courses for middle school, high school, and the general public
• Interviews with scientists, help for teachers
• At-home activities, social media contests, and more
![Page 4: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/4.jpg)
dnalc.cshl.edu
DNALC Website and Social Media
dnalc.cshl.edu/dnalc-live
![Page 5: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/5.jpg)
youtube.com/DNALearningCenter
DNALC Website and Social Media
facebook.com/cshldnalc
@dnalc
@dna_learning_center
![Page 6: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/6.jpg)
Barcoding BioinformaticsPart III
![Page 7: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/7.jpg)
Who is this course for?
• Audience(s): US AP Biology (high school grades 10-12) AND Intro undergraduate biology
• Format: 3 sessions (1 per week); ~ 45 minutes each
• Exercises: Follow along with our online bioinformatics tool DNA Subway
• Learning resources: Slides and packet available (teachers can also request the teacher edition)
![Page 8: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/8.jpg)
Course Learning Goals
• Learn how DNA can be used to identify unknown organisms
• Understand how we obtain DNA Sequence and access its quality
• Use BLAST* to compare an unknown DNA Sequence to known sequences
• Compare DNA Sequences using phylogenetics
*AP Bio (Lab 3 – Comparing DNA Sequences)
![Page 9: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/9.jpg)
Lab Setup• We will be using DNA Subway – You can get a free account at cyverse.org
(optional)
![Page 10: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/10.jpg)
Barcoding BioinformaticsPart III
(Sequence alignment and phylogenetics)
![Page 11: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/11.jpg)
Steps for today’s session
• Recap on our experimental dataset
• Review of BLAST
• Introduction to multiple alignment
• Introduction to phylogenetic trees
![Page 12: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/12.jpg)
Recap of the dataset
![Page 13: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/13.jpg)
![Page 14: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/14.jpg)
Aedes adult Anopheles adult Culex adult
By Muhammad Mahdi Karim - Own work, GFDL 1.2, https://commons.wikimedia.org/w/index.php?curid=11185617
By Jim Gathany - (PHIL), ID #5814. https://commons.wikimedia.org/w/index.php?curid=799284 By Muhammad Mahdi Karim - Own work, GFDL 1.2, https://commons.wikimedia.org/w/index.php?curid=7673048
Aedes larva Anopheles larva Culex larva
Photograph by Michele M. Cutwa, University of Florida.Photograph by Michelle Cutwa-Francis, University of Florida.
![Page 15: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/15.jpg)
Steps to DNA Barcoding
ACGAGTCGGTAGCTGCCCTCTGACTGCATCGAATTGCTCCCCTACTACGTGCTATATGCGCTTACGATCGTACGAAGATTTATAGAATGCTGCTACTGCTCCCTTATTCGATAACTAGCTCGATTATAGCTACGATG
Organism is sampled DNA is extracted “Barcode” amplified
Sequenced DNA is compared with DNA in a barcode database
![Page 16: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/16.jpg)
Let’s do a BLAST
![Page 17: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/17.jpg)
Basic Local Alignment Search Tool
• An algorithm for searching a database of sequences
• “Google for DNA” (although works with any biologicalsequence, and started before Google ~1990 vs 1998)
• NCBI is the most popular interface, but this is software thatcan be run anywhere (including Subway)
![Page 18: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/18.jpg)
BLAST algorithm analogy
Query sequenceACTGACATCGGGGTGCTACG
Database
TGCTAACTGACTGCTAACTGAGACTG
TGCTAAC
TGCTAACTAC
TGCTAACTGAG
TGCTAACTGAG
ACTGTGCTAAC
![Page 19: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/19.jpg)
Where in the DNA should we look for the “Barcode”?
![Page 20: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/20.jpg)
Where in the DNA should we look?
Photo Credit:https://www.broadinstitute.org/files/styles/landing_page/public/generic-pages/images/circle/MPG-mosaic_385x300.png?itok=B5zrVgYz
Humans share greater than 99% of their DNA with other humans
![Page 21: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/21.jpg)
Where in the DNA should we look?
Photo credit:https://www.cnbc.com/2015/09/02/can-you-find-the-forgery.html
![Page 22: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/22.jpg)
Where in the DNA should we look?
Photo credit:https://www.cnbc.com/2015/09/02/can-you-find-the-forgery.html
COUNTERFEIT
![Page 23: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/23.jpg)
Where in the DNA should we look?>Human Tubulin –Black? GCAGGTTCTCTTACATCGACCGCCTAAGAGTCGCGCTGTAAGAAGCAACAACCTCTCCTCTTCGTCTCCG CCATCAGCTCGGCAGTCGCGAAGCAGCAACCATGCGTGAGTGCATCTCCATCCACGTTGGCCAGGCTGGT GTCCAGATTGGCAATGCCTGCTGGGAGCTCTACTGCCTGGAACACGGCATCCAGCCCGATGGCCAGATGC CAAGTGACAAGACCATTGGGGGAGGAGATGATTCCTTCAACACCTTCTTCAGTGAGACGGGGGCTGGCAA GCATGTGCCCCGGGCAGTGTTTGTAGACTTGGAACCCACAGTCATTGATGAAGTTCGCACTGGCACCTAC CGCCAGCTCTTCCACCCTGAGCAACTTATCACAGGCAAAGAAGATGCTGCCAATAACTATGCCCGAGGGC ACTACACCATTGGCAAGGAGATCATTGACCTCGTGTTGGACCGAATTCGCAAGCTGGCCGACCAGTGCAC Almost everywhere we
look in our (human) genome, we all look the same
>Human Tubulin –White?GCAGGTTCTCTTACATCGACCGCCTAAGAGTCGCGCTGTAAGAAGCAACAACCTCTCCTCTTCGTCTCCG CCATCAGCTCGGCAGTCGCGAAGCAGCAACCATGCGTGAGTGCATCTCCATCCACGTTGGCCAGGCTGGT GTCCAGATTGGCAATGCCTGCTGGGAGCTCTACTGCCTGGAACACGGCATCCAGCCCGATGGCCAGATGC CAAGTGACAAGACCATTGGGGGAGGAGATGATTCCTTCAACACCTTCTTCAGTGAGACGGGGGCTGGCAA GCATGTGCCCCGGGCAGTGTTTGTAGACTTGGAACCCACAGTCATTGATGAAGTTCGCACTGGCACCTAC CGCCAGCTCTTCCACCCTGAGCAACTTATCACAGGCAAAGAAGATGCTGCCAATAACTATGCCCGAGGGC ACTACACCATTGGCAAGGAGATCATTGACCTCGTGTTGGACCGAATTCGCAAGCTGGCCGACCAGTGCAC
>Human Tubulin –Asian? GCAGGTTCTCTTACATCGACCGCCTAAGAGTCGCGCTGTAAGAAGCAACAACCTCTCCTCTTCGTCTCCG CCATCAGCTCGGCAGTCGCGAAGCAGCAACCATGCGTGAGTGCATCTCCATCCACGTTGGCCAGGCTGGT GTCCAGATTGGCAATGCCTGCTGGGAGCTCTACTGCCTGGAACACGGCATCCAGCCCGATGGCCAGATGC CAAGTGACAAGACCATTGGGGGAGGAGATGATTCCTTCAACACCTTCTTCAGTGAGACGGGGGCTGGCAA GCATGTGCCCCGGGCAGTGTTTGTAGACTTGGAACCCACAGTCATTGATGAAGTTCGCACTGGCACCTAC CGCCAGCTCTTCCACCCTGAGCAACTTATCACAGGCAAAGAAGATGCTGCCAATAACTATGCCCGAGGGC ACTACACCATTGGCAAGGAGATCATTGACCTCGTGTTGGACCGAATTCGCAAGCTGGCCGACCAGTGCAC
![Page 24: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/24.jpg)
Cytochrome c oxidase I (COI)
Photo credit:Structure: https://en.wikipedia.org/wiki/Cytochrome_c_oxidase_subunit_I#/media/File:PDB_1occ_EBI.jpgGenome map:Emmanuel Douzery; https://en.wikipedia.org/wiki/Cytochrome_c_oxidase_subunit_I#/media/File:Map_of_the_human_mitochondrial_genome.svg
![Page 25: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/25.jpg)
Choosing a barcoding locus
There are many criteria that go in to selecting an appropriate locus (location in the genome) that can serve as a barcode.
Three of them include:
• Universality• Discrimination • Robustness
![Page 26: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/26.jpg)
Universality
Since barcoding protocols (typically) amplify a region of DNA by PCR, you need to choose a DNA sequence that every species has
![Page 27: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/27.jpg)
Universality
Since barcoding protocols (typically) amplify a region of DNA by PCR, you need to choose a DNA sequence that every species has
![Page 28: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/28.jpg)
Universality
Since barcoding protocols (typically) amplify a region of DNA by PCR, you need to choose a DNA sequence that every species has
Photo credit:Dental formulahttps://slideplayer.com/slide/10972461/Animal diagramshttps://vet-science.blogspot.com/2012/01/dentition-in-sheep-and-goat.html
![Page 29: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/29.jpg)
Discrimination
Barcoding regions must be different for each species. Ideally you are looking for a single DNA locus which differs in each species
![Page 30: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/30.jpg)
Discrimination
Barcoding regions must be different for each species. Ideally you are looking for a single DNA locus which differs in each species
Photo creditHuman hairhttps://lewigs.com/human-hair-color-101/
![Page 31: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/31.jpg)
Discrimination
Barcoding regions must be different for each species. Ideally you are looking for a single DNA locus which differs in each species
Photo credithttp://loreal-dam-videos-corp-en-cdn.brainsonic.com/corpen/20160330pm/20160330-170533-c4bb4cb7/picture_photo_3eadc4.jpg
![Page 32: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/32.jpg)
Discrimination (but not too much)
Fail: Sequence is completely conserved, good for PCR, but uninformative as barcode
![Page 33: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/33.jpg)
Discrimination (but not too much)
Fail: Sequence shows no conservation, impossible for PCR, but good as barcode
![Page 34: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/34.jpg)
Discrimination (but not too much)
Win: Sequence shows some (ideally ~70%) conservation, good for PCR, good as barcode
![Page 35: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/35.jpg)
Robustness
Since barcoding protocols (typically) amplify a region of DNA by PCR, also need to select a locus that amplifies reliably and sequences well
Photo credithttps://www.bio-rad.com/es-mx/applications-technologies/pcr-troubleshooting?ID=LUSO3HC4S
![Page 36: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/36.jpg)
Choosing a barcoding locus
Cytochrome Oxidase C subunit 1 (COI):
• Universal• Discriminating• Robust
Photo credit:https://en.wikipedia.org/wiki/Cytochrome_c_oxidase#/media/File:Cmplx4.PNG
![Page 37: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/37.jpg)
Reference data and phylogenetics
![Page 38: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/38.jpg)
Experimental components/design
• We have DNA from unknown mosquito samples• We can obtain DNA from known samples
Materials
Hypothesis• We can use computational methods (BLAST/phylogenetic analysis) to
infer the species
Controls• We have sensitivity controls (sequence quality, BLAST parameters)• We have outgroup sequences (non-mosquito, negative controls) and
known samples (positive controls)
![Page 39: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/39.jpg)
Intro to Multiple Sequence Alignment
![Page 40: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/40.jpg)
Multiple sequence alignment
Photo credit:https://www.pexels.com/photo/jigsaw-puzzle-1586950/
To compare sequence features (nucleotides, amino acids, etc.) we need to line them up
![Page 41: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/41.jpg)
Warning: Analogy(useful for discussion but not the whole picture)
![Page 42: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/42.jpg)
Multiple sequence alignment
THEFISHISINTHEWATERTH’FESHISINTH’WATERDEVISISINHETWATERDIEVISISINDIEWATERDERFISCHISTIMWASSERFISKINERÍVATNI
![Page 43: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/43.jpg)
Multiple sequence alignment
![Page 44: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/44.jpg)
Multiple sequence alignment
Colored at 60% identity
![Page 45: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/45.jpg)
Multiple sequence alignment
There is more information here …
![Page 46: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/46.jpg)
Multiple sequence alignment
There is more information here …
![Page 47: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/47.jpg)
Multiple sequence alignment
There is more information here …
![Page 48: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/48.jpg)
Multiple sequence alignment
There is more information here …
![Page 49: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/49.jpg)
Multiple sequence alignment
Photo credit:https://www.pexels.com/photo/jigsaw-puzzle-1586950/
!" !"! !
≈ !!"
$"
Number of (global) alignments for 2 sequences of length n
![Page 50: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/50.jpg)
Multiple sequence alignment
Photo credit:https://www.pexels.com/photo/jigsaw-puzzle-1586950/
!" !"! !
≈ !!"
$"
Number of (global) alignments for 2 sequences of length n
So, for 2 sequences (n=100) ≈ 1077 *
- We need software!
*(some are trivial)
![Page 51: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/51.jpg)
CLUSTAL alignment algorithm
Photo credit:https://slideplayer.com/slide/5155996/
![Page 52: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/52.jpg)
Multiple sequence alignment
![Page 53: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/53.jpg)
Making phylogenetic trees
![Page 54: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/54.jpg)
Tree relationships
![Page 55: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/55.jpg)
Tree relationshipsThis is not a phylogenetic tree
![Page 56: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/56.jpg)
Phylogenetic trees
A phylogenetic tree is a hypothesis about howspecies may be related
Photo credithttps://en.wikipedia.org/wiki/File:Phylogenetic_tree.svg
![Page 57: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/57.jpg)
Phylogenetic trees
Trees can be created from(and therefore reflect)characteristics/traits
Photo credithttps://kids.britannica.com/students/assembly/view/235364
![Page 58: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/58.jpg)
Phylogenetic trees
Trees can be created from(and therefore reflect)characteristics/traits
Photo credithttps://wildlifesnpits.wordpress.com/2014/03/22/understanding-phylogenies-terminology/
![Page 59: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/59.jpg)
Phylogenetic treesPh
oto
cred
ithttps://wildlifesnp
its.wordp
ress.com
/2014/03/22/un
derstand
ing-ph
ylogen
ies-term
inology/
![Page 60: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/60.jpg)
Phylogenetic trees
Photo credithttps://wildlifesnpits.wordpress.com/2014/03/22/understanding-phylogenies-terminology/
Tree Vocabulary
• Taxa: Individual species• Tip: The endpoints of a tree• Node: A point where branches split• Branch: Any collection after a node
![Page 61: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/61.jpg)
Tree experiments• Experiment 1: Unknown Mosquitos and outgroup• Method: neighbor joining
• Experiment 2: Unknown Mosquitos and outgroup• Method: maximum likelihood
• Experiment 3: Unknown Mosquitos + BLAST hits and outgroup• Method: neighbor joining
• Experiment 4: Unknown Mosquitos + BLAST hits and outgroup• Method: maximum likelihood
![Page 62: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/62.jpg)
Trees are also computationally intensive
Photo credit:https://www.slideshare.net/sebastiendelandtsheer/phylogenetics1
![Page 63: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/63.jpg)
Tree building methods
• There is no one “best” method
![Page 64: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/64.jpg)
Tree building methods
• There is no one “best” method
Neighbor joining:
• “distance-based” method of tree building
![Page 65: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/65.jpg)
Tree building methods
• There is no one “best” method
Neighbor joining:
• “distance-based” method of tree building • a matrix of the sequences is created based on distance between
each pair of sequences in multiple alignment
![Page 66: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/66.jpg)
Tree building methods
• There is no one “best” method
Neighbor joining:
• “distance-based” method of tree building • a matrix of the sequences is created based on distance between
each pair of sequences in multiple alignment• distance is related to the number of mismatches between the
sequences
![Page 67: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/67.jpg)
Tree building methods
Bootstrap values (how we evaluate NJ trees):
• columns in the sequence alignment randomly resampled to make many new alignments (DNA subway does this 100x)
![Page 68: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/68.jpg)
Tree building methods
Bootstrap values (how we evaluate NJ trees):
• columns in the sequence alignment randomly resampled to make many new alignments (DNA subway does this 100x)• a matrix of the sequences is created
![Page 69: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/69.jpg)
Tree building methods
Bootstrap values (how we evaluate NJ trees):
• columns in the sequence alignment randomly resampled to make many new alignments (DNA subway does this 100x)• a matrix of the sequences is created• each bootstrap value is the number of times that particular
relationship appears in the 100 resampled trees
![Page 70: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/70.jpg)
Tree building methods
Bootstrap values (how we evaluate NJ trees):
• columns in the sequence alignment randomly resampled to make many new alignments (DNA subway does this 100x)• a matrix of the sequences is created• each bootstrap value is the number of times that particular
relationship appears in the 100 resampled trees • Values of 70 and above are “plausible”; above 95 considered
highly supported
![Page 71: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/71.jpg)
Tree building methods
Maximum likelihood:
• Attempts to take into account observed patterns of how nucleotides and amino acids change over time. For instance, mutations from C to T are more common than mutations of C to A.
![Page 72: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/72.jpg)
Tree building methods
Maximum likelihood:
• Attempts to take into account observed patterns of how nucleotides and amino acids change over time. For instance, mutations from C to T are more common than mutations of C to A.
• The tree with highest overall likelihood score is accepted as the best estimate of the relationships between sequences.
![Page 73: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/73.jpg)
Tree building methods
Maximum likelihood:
• Attempts to take into account observed patterns of how nucleotides and amino acids change over time. For instance, mutations from C to T are more common than mutations of C to A.
• The tree with highest overall likelihood score is accepted as the best estimate of the relationships between sequences.
• The length of the branches between nodes is also a measure of the confidence of the relationships. Very short branches separating sequences have lower confidence and the relationships are less certain, while longer branches are better supported.
![Page 74: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/74.jpg)
Summary
![Page 75: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/75.jpg)
Course Learning Goals
• Learn how DNA can be used to identify unknown organisms
• Understand how we obtain DNA Sequence and access its quality
• Use BLAST* to compare an unknown DNA Sequence to known sequences
• Compare DNA Sequences using phylogenetics
*AP Bio (Lab 3 – Comparing DNA Sequences)
![Page 76: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/76.jpg)
See the handouts for even more explanations and activities
![Page 77: Barcoding Bioinformatics Part III · Jason Williams Cold Spring Harbor Laboratory, DNA Learning Center williams@cshl.edu @JasonWilliamsNY e Barcoding Bioinformatics Part III](https://reader035.fdocuments.net/reader035/viewer/2022081607/5edf3024ad6a402d666a89ce/html5/thumbnails/77.jpg)
dnalc.cshl.edu
DNALC Website and Social Media
dnalc.cshl.edu/dnalc-live