MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu
http://cs173.stanford.edu [BejeranoWinter12/13] 1
CS173
Lecture 11: Repeats II, Mutations
Announcements
• TA HW1 Comments
TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG
Transcription
Transcription Regulation
Chromatin / Proteins
DNA / Proteins
Extracellular signals
Repeats
Sequences that repeat many times in the genome
• Cumulatively take up a whopping half of the genome
• Come in two major, very different flavors
I
II
I. Interspersed Repeats
They get a copy of themselves out of the genome and into a new location.
II. Simple Repeats
• Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.
• These are called microsatellites. Longer repeating units are called minisatellites, and the really long ones are called satellites.
• Highly polymorphic in the human population.
• Highly heterozygous in a single individual.
• As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.
• There is no clear definition of how many repetitions make a simple repeat, nor of how imperfect the different copies can be.
• Highly variable between species: e.g., using the same search criteria, the mouse and rat genomes have 2-3 times more microsatellites than the human genome. They are also longer in mouse and rat.
AAAAAAAAACACACACACCAACAACAA
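A minimal sketch of how perfect microsatellites could be flagged in a sequence such as the one above. It assumes a repeat unit of 1-4 nt and a hypothetical threshold of at least 3 copies; since there is no agreed definition, both numbers are illustrative choices. The function name `find_microsatellites` is ours, and unlike real repeat finders this sketch does not tolerate imperfect copies.

```python
def find_microsatellites(seq, max_unit=4, min_copies=3):
    """Return (start, end, unit) for perfect tandem repeats of a 1..max_unit nt unit.

    Scans left to right. At each position it tries the shortest unit first and
    reports the first one repeated at least min_copies times, then jumps past it.
    min_copies=3 is an illustrative threshold, not a standard definition.
    """
    hits = []
    i, n = 0, len(seq)
    while i < n:
        hit = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:            # not enough sequence left for this unit length
                break
            j = i + k
            while seq[j:j + k] == unit:  # extend while the next copy matches exactly
                j += k
            if (j - i) // k >= min_copies:
                hit = (i, j, unit)
                break
        if hit:
            hits.append(hit)
            i = hit[1]
        else:
            i += 1
    return hits


print(find_microsatellites("AAAAAAAAACACACACACCAACAACAA"))
# → [(0, 9, 'A'), (9, 17, 'CA'), (18, 27, 'CAA')]
```

The greedy scan reports the dinucleotide run as `CA` because the mononucleotide run already claimed the leading A; a different scan order could report it as `AC`.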
DNA Replication
Simple Repeats Create Funky DNA structures
These Bumps Give The DNA Polymerase Hiccups
Expandable Repeats and Disease
Restriction Enzymes
• Restriction enzymes recognize and cut within specific DNA sequences, known as restriction sites.
• A restriction site is usually a 4-6 base pair palindromic sequence.
• They are naturally found in different types of bacteria.
• Bacteria use restriction enzymes to protect themselves from foreign DNA.
• Many have been isolated and sold for use in lab work.
Figure: a cut that leaves a blunt end vs. a cut that leaves a sticky end.
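To make "palindromic" concrete: a restriction site is called palindromic when it reads the same on both strands, i.e. it equals its own reverse complement. The sketch below checks that property and scans a sequence for a site. EcoRI's recognition site GAATTC is used as a well-known example; the helper names and the test sequence are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """The other strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site reads the same on both strands."""
    return site == reverse_complement(site)


def find_sites(seq, site):
    """0-based start of every occurrence of site on the given strand (overlaps allowed)."""
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == site]


print(is_palindromic("GAATTC"))                   # EcoRI site → True
print(is_palindromic("GAATTG"))                   # → False
print(find_sites("TTGAATTCAAGAATTCG", "GAATTC"))  # → [2, 10]
```

For a palindromic site, scanning one strand is enough: the opposite strand contains the same site at the same location.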
DNA Fingerprint Basics
A restriction enzyme that cuts at the points shown by the arrows produces DNA fragments of different sizes.
DNA fragments are then separated based on size using gel electrophoresis.
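The two steps (cut, then separate by size) can be mimicked on a single strand as follows. This is a simplified sketch: it assumes EcoRI-like cutting one base into the site (G^AATTC), ignores the second strand and sticky-end overhangs, and treats migration as depending on length alone. `digest` and `gel_bands` are hypothetical names.

```python
def digest(seq, site, cut_offset):
    """Cut seq at (start of each site occurrence + cut_offset); return fragments in order."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def gel_bands(fragments):
    """Fragment lengths from the top of the gel (long, slow) to the bottom (short, fast)."""
    return sorted((len(f) for f in fragments), reverse=True)


frags = digest("TTGAATTCAAGAATTCG", "GAATTC", cut_offset=1)
print(frags)             # → ['TTG', 'AATTCAAG', 'AATTCG']
print(gel_bands(frags))  # → [8, 6, 3]
```

Two individuals whose DNA differs in where, or how often, the site occurs would give different band patterns, which is what the fingerprint compares.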
DNA Fingerprinting can be used in paternity testing or murder cases.
There are Tracks for it
Interspersed vs. Simple Repeats
From an evolutionary point of view, transposons and simple repeats are very different.
Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).
Different instances of the same simple repeat most often do not.
Genome Content, Genome Function: DONE
• Transcripts
  – Protein coding genes
  – Non-coding RNAs
• Gene regulatory elements
  – Promoters
  – Enhancers
  – Repressors
  – Insulators
• Epigenomics
  – Nucleosomes, open chromatin
  – Histone modifications
• Repeats
  – Interspersed repeats / mobile elements
  – Simple repeats
Categories are NOT mutually exclusive
• We already discussed repeat instances that became
  – Coding exons
  – Enhancers
• There are known genomic loci that
  – Code for protein coding exons and act as enhancers
  – Ditto for non-coding RNA + enhancer
• There are bi-directional exons
  – Coding in both directions
  – Coding and antisense
  – Both non-coding
Figure: sequenced genomes compared: human, chimp, macaque, mouse, rat, cow, dog, opossum, platypus, chicken, tetra, fugu, zfish.
Comparative Genomics
“Nothing in Biology Makes Sense Except in the Light of Evolution” Theodosius Dobzhansky
The genome is constantly replicated
Every cell holds 2 copies of all its DNA (= its genome).
The human body is made of ~10¹³ cells.
All originate from a single cell through repeated cell divisions.
Figure: a chicken develops from an egg through repeated cell division; chicken (DNA) ≈ 10¹³ copies of egg (DNA). Each cell's genome = all its DNA; the DNA strings are the chromosomes.
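A back-of-the-envelope reading of these numbers, under the idealized assumption (ours, not the slide's) that every cell divides in lockstep and none die: reaching ~10¹³ cells from one cell takes at least g rounds of division, where

$$
2^{g} \ge 10^{13} \quad\Longrightarrow\quad g \ge 13\,\log_2 10 \approx 13 \times 3.32 \approx 43.2,
$$

so at least 44 rounds, each of which copies the entire genome. This is only a lower bound.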
Evolution = Mutation + Selection
Mistakes can happen during DNA replication. Mistakes are oblivious to DNA segment function. But then selection kicks in.
Figure: the egg's DNA (...ACGTACGACTGACTAGCATCGACTACGA...) is copied into the chicken with occasional mistakes (labels: TT, CAT). In junk DNA “anything goes”; in functional DNA many changes are not tolerated.
This has bad implications (disease) and good implications (adaptation).
Mutation
Chromosomal (i.e., big) Mutations
• Five types exist:
  – Deletion
  – Inversion
  – Duplication
  – Translocation
  – Nondisjunction
Deletion
• Due to breakage
• A piece of a chromosome is lost
Inversion
• Chromosome segment breaks off
• Segment flips around backwards
• Segment reattaches
Duplication
• Occurs when a genomic region is repeated
Whole Genome Duplication at the Base of the Vertebrate Tree
Xenopus laevis WGD
Translocation
• Involves two chromosomes that aren't homologous
• Part of one chromosome is transferred to another chromosome
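A toy model of the four structural rearrangements above, representing a chromosome as a string of segment labels (A, B, C, ...). This is an illustrative sketch only: the labels, function names and coordinates are hypothetical, indices are 0-based and half-open, and the inversion here only reverses segment order (a real inversion also flips each segment onto the other strand).

```python
def deletion(chrom, start, end):
    """A piece of the chromosome, chrom[start:end], is lost."""
    return chrom[:start] + chrom[end:]


def inversion(chrom, start, end):
    """chrom[start:end] breaks off, flips around and reattaches (segment order only)."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def duplication(chrom, start, end):
    """chrom[start:end] is repeated in tandem."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def translocation(donor, recipient, start, end, insert_at):
    """donor[start:end] is transferred to a non-homologous chromosome at insert_at."""
    piece = donor[start:end]
    return donor[:start] + donor[end:], recipient[:insert_at] + piece + recipient[insert_at:]


chrom = "ABCDEFGH"
print(deletion(chrom, 2, 4))                    # → ABEFGH
print(inversion(chrom, 2, 5))                   # → ABEDCFGH
print(duplication(chrom, 1, 3))                 # → ABCBCDEFGH
print(translocation(chrom, "MNOPQR", 5, 8, 2))  # → ('ABCDE', 'MNFGHOPQR')
```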
Nondisjunction
• Failure of chromosomes to separate during meiosis
• Causes gametes to have too many or too few chromosomes
• Disorders:
  – Down Syndrome: three copies of chromosome 21
  – Turner Syndrome: a single X chromosome
  – Klinefelter's Syndrome: XXY chromosomes
Genomic (i.e., small) Mutations
• Six types exist:
  – Substitution (e.g., G→T)
  – Deletion
  – Insertion
  – Inversion
  – Duplication
  – Translocation
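At the base level the same kinds of edit can be written directly on a DNA string. A minimal sketch with hypothetical helper names and 0-based, half-open coordinates, covering substitution, insertion, deletion and inversion. Note that a DNA inversion puts the segment back as its reverse complement, since the flipped piece is reattached on the opposite strand; duplication and translocation are omitted for brevity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def substitute(seq, i, base):
    """Point substitution at position i (e.g. G -> T)."""
    return seq[:i] + base + seq[i + 1:]


def insert(seq, i, piece):
    """Insert piece before position i."""
    return seq[:i] + piece + seq[i:]


def delete(seq, i, length):
    """Delete `length` bases starting at position i."""
    return seq[:i] + seq[i + length:]


def invert(seq, start, end):
    """Replace seq[start:end] by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


s = "AGGCCTC"
print(substitute(s, 3, "A"))   # → AGGACTC
print(insert(s, 3, "G"))       # → AGGGCCTC
print(delete(s, 3, 1))         # → AGGCTC
print(invert("AACCGT", 1, 4))  # → AGGTGT
```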
Example: Human-Chimp Genomic Differences
Figure: number of events by type: nucleotide substitutions; indels < 10 Kb; microinversions < 100 Kb; deletions/duplications; microinversions > 100 Kb; pericentric inversions; fusion.
Inferring Genomic Mutations From Alignments of Genomes
A Gene tree evolves with respect to a Species tree
Figure: a species tree and a gene tree (legend: speciation, duplication, loss).
By “gene” we mean any piece of DNA.
Terminology
• Orthologs: genes related via speciation (e.g., C, M, H3)
• Paralogs: genes related through duplication (e.g., H1, H2, H3)
• Homologs: genes that share a common origin (e.g., C, M, H1, H2, H3)
Figure: a species tree and a gene tree (legend: speciation, duplication, loss), all descending from a single ancestral gene.
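The terminology reduces to one question: what event sits at the last common ancestor (LCA) of two genes in the gene tree? Speciation makes them orthologs, duplication makes them paralogs, and any two genes in the same tree are homologs. A sketch with a hypothetical gene tree using the slide's leaf names; the topology is our assumption (a duplication above the M/H3 speciation, plus an H1/H2 duplication), not a transcription of the figure, and losses are not modeled.

```python
# A gene tree node is either a leaf name (str) or a tuple (event, left, right),
# where event is "speciation" or "duplication". Hypothetical topology.
GENE_TREE = (
    "speciation", "C",
    ("duplication",
        ("speciation", "M", "H3"),
        ("duplication", "H1", "H2")),
)


def leaves(node):
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, a, b):
    """Event at the lowest node whose subtree contains both (distinct) leaves a and b."""
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca_event(child, a, b)
    return event


def relationship(tree, a, b):
    return "orthologs" if lca_event(tree, a, b) == "speciation" else "paralogs"


for pair in [("C", "M"), ("M", "H3"), ("H1", "H2"), ("H1", "H3")]:
    print(pair, relationship(GENE_TREE, *pair))
```

With this assumed tree, (C, M) and (M, H3) come out as orthologs and (H1, H2) and (H1, H3) as paralogs, matching the slide's examples.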
Typical Molecular Distances
If they were only evolving neutrally:
• To which is H1 closer in sequence, H2 or H3?
• To which H is M closest?
• And C?
(Selection may change distances.)
Gene trees and even species trees are figments of our (scientific) imagination
Species trees and gene trees can be wrong. All we really have are extant observations and fossils.
Figure: a species tree and a gene tree (legend: speciation, duplication, loss) descending from a single ancestral gene; the leaves are observed, everything else is inferred.
Gene Families
Sequence Alignment
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings $x = x_1 x_2 \ldots x_M$ and $y = y_1 y_2 \ldots y_N$, an alignment is an assignment of gaps to positions $0, \ldots, M$ in $x$ and $0, \ldots, N$ in $y$, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
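The definition can be checked mechanically: two gapped rows form an alignment of x and y if they have equal length, deleting the gaps gives back x and y, and no column pairs a gap with a gap. A small sketch (the function name is ours), applied to the example above:

```python
def is_alignment(x, y, row_x, row_y, gap="-"):
    """True if row_x/row_y is a valid alignment of x and y."""
    if len(row_x) != len(row_y):
        return False
    if row_x.replace(gap, "") != x or row_y.replace(gap, "") != y:
        return False
    # a column of two gaps lines up no letter at all, so it is not allowed
    return not any(a == gap and b == gap for a, b in zip(row_x, row_y))


x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
print(is_alignment(x, y,
                   "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---",
                   "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"))  # → True
```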
Scoring Function
• Sequence edits, starting from AGGCCTC:
  – Mutations: AGGACTC
  – Insertions: AGGGCCTC
  – Deletions: AGG.CTC
Scoring function: Match: +m, Mismatch: −s, Gap: −d

$$F = (\#\text{matches})\, m - (\#\text{mismatches})\, s - (\#\text{gaps})\, d$$
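The score F can be computed column by column. This sketch assumes each column containing a gap counts as one gap (a linear gap penalty); the default values of m, s and d are arbitrary placeholders, not values from the course.

```python
def alignment_score(row_x, row_y, m=1, s=1, d=1, gap="-"):
    """F = (#matches)*m - (#mismatches)*s - (#gaps)*d over the columns of an alignment."""
    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches * m - mismatches * s - gaps * d


row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(alignment_score(row_x, row_y))  # 24 matches, 2 mismatches, 11 gaps → 11
```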
Alternative definition: minimal edit distance
“Given two strings x, y, find the minimum # of edits (insertions, deletions, mutations) to transform one string into the other.”
The cost of edit operations needs to be biologically inspired (e.g., deletion length).
Solve via Dynamic Programming
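A minimal dynamic-programming sketch of the global (Needleman-Wunsch style) version, using the same linear scoring as F: `F[i][j]` holds the best score for aligning the first i letters of x with the first j letters of y. Setting m = 0, s = 1, d = 1 turns the best score into minus the minimal edit distance, which connects the two definitions. Traceback is omitted for brevity, and the parameter defaults are illustrative.

```python
def global_alignment_score(x, y, m=1, s=1, d=1):
    """Best F over all global alignments of x and y (linear gap penalty d)."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = -i * d          # x[:i] aligned entirely against gaps
    for j in range(1, cols):
        F[0][j] = -j * d          # y[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            F[i][j] = max(diag,             # x[i-1] lined up with y[j-1]
                          F[i - 1][j] - d,  # x[i-1] lined up with a gap
                          F[i][j - 1] - d)  # y[j-1] lined up with a gap
    return F[-1][-1]


def edit_distance(x, y):
    """Minimal # of insertions, deletions and mutations: the m=0, s=1, d=1 special case."""
    return -global_alignment_score(x, y, m=0, s=1, d=1)


print(edit_distance("AGGCCTC", "AGGACTC"))   # one mutation  → 1
print(edit_distance("AGGCCTC", "AGGGCCTC"))  # one insertion → 1
print(edit_distance("kitten", "sitting"))    # → 3
```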
Are two sequences homologous?
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Given an (optimal) alignment between two genome regions, you can ask: what is the probability that they are (not) related by homology?
Note that (when known) the answer is a function of the molecular distance between the two (e.g., between two species).
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
DP matrix:
Chaining Alignments
Chaining highlights homologous regions between genomes, bridging the gulf between syntenic blocks and base-by-base alignments.
Local alignments tend to break at transposon insertions, inversions, duplications, etc.
Global alignments tend to force non-homologous bases to align.
Chaining is a rigorous way of joining together local alignments into larger structures.
Figures: dot plots; DP matrix.
“Raw” (B)lastz track (no longer displayed)
Protease Regulatory Subunit 3
Alignment = homologous regions
Chains & Nets: How they’re built
• Step 1: Blastz one genome to another
  – Local alignment algorithm
  – Finds short blocks of similarity
Hg18: AAAAAACCCCCAAAAA
Mm8: AAAAAAGGGGG
Hg18.1-6 + AAAAAA    Mm8.1-6 + AAAAAA
Hg18.7-11 + CCCCC    Mm8.1-5 - CCCCC
Hg18.12-16 + AAAAA   Mm8.1-5 + AAAAA
Chains & Nets: How they’re built
• Step 2: “Chain” alignment blocks together
  – Links blocks that preserve order and orientation
  – Not single coverage in either species
Hg18: AAAAAACCCCCAAAAA
Mm8: AAAAAAGGGGGAAAAA
Figure: Mm8 chains drawn along Hg18 (AAAAAACCCCCAAAAA): Mm8.1-6 +, Mm8.7-11 -, Mm8.12-16 +, Mm8.12-15 +, Mm8.1-5 +
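Another Chain Example
Figure: an ancestral sequence (segments A B C D E), the human and mouse sequences derived from it (the mouse panel includes a segment labeled B′), and the resulting chains as drawn in each browser: mouse chains against the implicit human sequence (In Human Browser) and human chains against the implicit mouse sequence (In Mouse Browser).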
The Use of an Outgroup
Figure: an outgroup sequence (segments A B C D E) alongside the human and mouse sequences (the mouse panel includes a segment labeled B′), with mouse chains shown against the implicit human sequence (In Human Browser) and human chains shown against the implicit mouse sequence (In Mouse Browser).
Chains join together related local alignments
Figure (Protease Regulatory Subunit 3): labels mark the likely ortholog, and likely paralogs or a shared domain?
Chains
• A chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain.
• Within a chain, target and query coords are monotonically non-decreasing (i.e., always increasing or flat).
• Double-sided gaps are a new capability (blastz can't do that) that allows extremely long chains to be constructed.
• Not just orthologs but paralogs too can result in good chains. But that's useful!
• Chains should be symmetrical: swapping human-mouse chains into mouse-human chains should give approximately the same chains as chaining the swapped mouse-human blastz alignments.
• Chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
• Chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs). [Angie Hinrichs, UCSC wiki]
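The first two properties can be written as a check on a list of blocks. In this sketch (our own representation, not the UCSC chain file format) a block is a gapless aligned pair of 0-based, half-open intervals `(t_start, t_end, q_start, q_end)` in target and query, listed in chain order. Gaps on one side only, or on both sides at once, are allowed; overlaps and backwards steps are not. The examples reuse the Hg18/Mm8 toy blocks, converted from 1-based inclusive coordinates.

```python
def is_valid_chain(blocks):
    """blocks: list of (t_start, t_end, q_start, q_end), 0-based half-open, in chain order."""
    for ts, te, qs, qe in blocks:
        if te <= ts or te - ts != qe - qs:  # gapless: equal, positive length on both sides
            return False
    for (_, te1, _, qe1), (ts2, _, qs2, _) in zip(blocks, blocks[1:]):
        if ts2 < te1 or qs2 < qe1:          # overlap, or a step backwards in target or query
            return False
    return True


print(is_valid_chain([(0, 6, 0, 6), (11, 16, 11, 16)]))  # double-sided gap → True
print(is_valid_chain([(0, 6, 0, 6), (11, 16, 0, 5)]))    # query steps back → False
```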
Before and After Chaining
Chaining Algorithm
Input: blocks of gapless alignments from blastz.
Dynamic program based on the recurrence relationship:

$$\mathrm{score}(B_i) = \max_{j<i}\big(\mathrm{score}(B_j) + \mathrm{match}(B_i) - \mathrm{gap}(B_i, B_j)\big)$$

Uses Miller’s KD-tree algorithm to minimize how much of the dynamic programming graph is traversed. Timing is O(N log N), where N is the number of blocks (in the hundreds of thousands).
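A simplified sketch of the recurrence in its plain O(N²) form: it checks every earlier block instead of using Miller’s KD-tree, so it shows the logic but not the O(N log N) performance. Our assumptions: blocks are `(t_start, t_end, q_start, q_end, match)` tuples on a single strand, a chain may start at any block (so score(B_i) is at least match(B_i)), and `gap_cost` is a hypothetical penalty of 1 point per 5 unaligned bases, a placeholder rather than the cost used by the UCSC tools.

```python
def gap_cost(bi, bj):
    """Hypothetical penalty: 1 point per 5 unaligned bases between bj and bi (target + query)."""
    return ((bi[0] - bj[1]) + (bi[2] - bj[3])) // 5


def best_chain(blocks, gap=gap_cost):
    """score(B_i) = max(match(B_i), max over compatible j<i of score(B_j) + match(B_i) - gap(B_i, B_j))."""
    B = sorted(blocks, key=lambda b: (b[0], b[2]))  # order by target start, then query start
    score = [b[4] for b in B]                       # a chain may start at any block
    prev = [None] * len(B)
    for i, bi in enumerate(B):
        for j in range(i):
            bj = B[j]
            if bj[1] <= bi[0] and bj[3] <= bi[2]:   # bj ends before bi starts in both sequences
                cand = score[j] + bi[4] - gap(bi, bj)
                if cand > score[i]:
                    score[i], prev[i] = cand, j
    k = max(range(len(B)), key=lambda i: score[i])
    best, chain = score[k], []
    while k is not None:                            # trace back through the predecessors
        chain.append(B[k])
        k = prev[k]
    return best, chain[::-1]


# Plus-strand toy blocks in the spirit of the Hg18/Mm8 example; match = block length.
blocks = [(0, 6, 0, 6, 6), (11, 16, 11, 16, 5), (11, 16, 0, 5, 5)]
print(best_chain(blocks))  # → (9, [(0, 6, 0, 6, 6), (11, 16, 11, 16, 5)])
```

Sorting by target start is enough for the j < i restriction: any block that can precede B_i ends before B_i starts, so it also starts earlier and appears earlier in the sorted list.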