Biochemistry and Molecular Genetics
Computational Bioscience Program
Consortium for Comparative Genomics
University of Colorado School of Medicine
Evolution of Proteins and Genomes
select subset of slides
Evolution of Proteins
Jason de Koning
Description: Focus on protein structure, sequence, and functional evolution.
Subjects: structural comparison and prediction, biochemical adaptation, evolution of protein complexes, probabilistic methods for detecting patterns of sequence evolution, effects of population structure on protein evolution, lattice and other computational models of protein evolution, protein folding and energetics, mutagenesis experiments, directed evolution, coevolutionary interactions within and between proteins, and detection of adaptation, diversifying selection, and functional divergence.
Reconstruction of Ancestral Function
Comparative Sequence Analysis: Looking at sets of sequences
Mouse:  …TLSPGLKIVSNPL…
Rat:    …TLTPGLKLVSDTL…
Baboon: …TVSPGLRIVSDGV…
Chimp:  …TISPGLVIVSENL…
Conserved proline
Variable (“high entropy”) positions
A common but wrong assumption: sequences are a random sample from the set of all possible sequences.
In reality, proteins are related by an evolutionary process.
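One common way to put a number on "conserved" versus "variable" is the Shannon entropy of each alignment column. The sketch below is illustrative, not the method from these slides: it assumes the four fragments are already aligned without gaps and weights every sequence equally. That is exactly the random-sample assumption criticized above, since it ignores that mouse/rat and baboon/chimp are close relatives. The helper names are hypothetical.

```python
import math
from collections import Counter

# Alignment fragments from the slide (flanking "…" dropped); assumed aligned and gap-free.
ALIGNMENT = {
    "Mouse": "TLSPGLKIVSNPL",
    "Rat": "TLTPGLKLVSDTL",
    "Baboon": "TVSPGLRIVSDGV",
    "Chimp": "TISPGLVIVSENL",
}


def column_entropy(column):
    """Shannon entropy in bits: 0 for a fully conserved column, higher when more variable."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_profile(seqs):
    """Entropy of each alignment column; every sequence must have the same length."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in seqs]) for i in range(length)]


profile = entropy_profile(list(ALIGNMENT.values()))
for i, h in enumerate(profile, start=1):
    residues = "".join(s[i - 1] for s in ALIGNMENT.values())
    print(f"column {i:2d} {residues} H = {h:.2f} bits")
```

The conserved proline column (PPPP) has zero entropy, while a column such as KKRV scores higher. A phylogeny-aware method would down-weight the redundant close relatives instead of counting each sequence once.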
[Figure: selective pressure acting on stability, folding, and function (which together determine fitness) produces the observed sequences (Mouse, Rat, Baboon, Chimp) as stochastic realizations; fitting a model of that selective pressure to the data leads to understanding.]
Mutations result in genetic variation
Original:     …UGUACAAAG…
Genetic changes:
Substitution: …UGUAUAAAG…
Insertion:    …UGUUACAAAG…
Deletion:     …UGUAAAAG…
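When a single event separates two sequences, the three kinds of change can be told apart mechanically. A minimal sketch, assuming exactly one substitution, insertion, or deletion (real sequences would need an alignment first); `classify_change` is a hypothetical helper name.

```python
def classify_change(original, mutant):
    """Name the single event that turns `original` into `mutant`.

    Assumes at most one substitution, insertion, or deletion separates them.
    """
    if original == mutant:
        return "no change"
    if len(mutant) == len(original):
        diffs = sum(a != b for a, b in zip(original, mutant))
        return "substitution" if diffs == 1 else "unknown"
    if len(mutant) == len(original) + 1:
        # Insertion: removing one base from the mutant recovers the original.
        if any(mutant[:i] + mutant[i + 1:] == original for i in range(len(mutant))):
            return "insertion"
    if len(mutant) == len(original) - 1:
        # Deletion: removing one base from the original gives the mutant.
        if any(original[:i] + original[i + 1:] == mutant for i in range(len(original))):
            return "deletion"
    return "unknown"


ORIGINAL = "UGUACAAAG"
for mutant in ("UGUAUAAAG", "UGUUACAAAG", "UGUAAAAG"):
    print(mutant, classify_change(ORIGINAL, mutant))
```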
Substitutions can be:
Purines: A, G
Pyrimidines: C, T (U in RNA)
Transitions: purine ↔ purine or pyrimidine ↔ pyrimidine (A ↔ G, C ↔ T)
Transversions: purine ↔ pyrimidine
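The purine/pyrimidine rule translates directly into code. A small sketch, assuming uppercase, aligned, gap-free sequences over A, C, G, T/U; the helper names are illustrative. The C → U change in the earlier …UGUACAAAG… → …UGUAUAAAG… example is a transition.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")  # T in DNA, U in RNA


def substitution_type(a, b):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    if a == b:
        raise ValueError("identical bases: not a substitution")
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a != b:
            if substitution_type(a, b) == "transition":
                ts += 1
            else:
                tv += 1
    return ts, tv


print(substitution_type("C", "U"))  # the C -> U change from the mutation example
print(ts_tv_counts("UGUACAAAG", "UGUAUAAAG"))
```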
Substitutions in coding regions can be:
Original: UGU/AGA/AAG → Cys Arg Lys
Silent:   UGU/CGA/AAG → Cys Arg Lys
Nonsense: UGU/UGA/AAG → Cys STOP Lys
Missense: UGU/GGA/AAG → Cys Gly Lys
First position: 4% of all changes silent
Second position: no changes silent
Third position: 70% of all changes silent (wobble position)
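Both the coding examples and the per-position percentages can be checked against the standard genetic code. This sketch assumes NCBI translation table 1, single-base substitutions, and counting every possible change from each of the 61 sense codons; the slide does not say how its percentages were counted, so the numbers need not match exactly. Function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_coding_substitution(ref, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    Stop-loss changes are not distinguished and fall under "missense".
    """
    for i in range(0, len(ref) - 2, 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon != alt_codon:
            if CODE[ref_codon] == CODE[alt_codon]:
                return "silent"
            if CODE[alt_codon] == "*":
                return "nonsense"
            return "missense"
    return "no change"


def silent_fraction_by_position():
    """Per codon position, the fraction of single-base changes from sense codons that are silent."""
    silent, total = [0, 0, 0], [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    total[pos] += 1
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    silent[pos] += CODE[mutant] == aa
    return [s / t for s, t in zip(silent, total)]


ref = "UGU/AGA/AAG".replace("/", "")
for alt in ("UGU/CGA/AAG", "UGU/UGA/AAG", "UGU/GGA/AAG"):
    print(alt, classify_coding_substitution(ref, alt.replace("/", "")))
for pos, frac in enumerate(silent_fraction_by_position(), start=1):
    print(f"position {pos}: {frac:.1%} silent")
```

Under these counting assumptions the sketch reports about 4.4%, 0%, and 68.9% for positions 1, 2, and 3, in line with the slide's 4%, none, and 70%.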
Unequal crossover leading to gene deletion and duplication
Homologous crossover
Gene conversion
Fate of a duplicated gene
Keep on doing whatever it originally was doing
Lose the ability to do anything (become a pseudogene)
Learn to do something new (neofunctionalization)
Split old functions among new genes (subfunctionalization)
Homologies
[Figure: gene duplication and speciation produce hemoglobin (Hb) genes in mouse and rat; genes separated by speciation are orthologs, genes separated by gene duplication are paralogs.]
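Whether two homologs are orthologs or paralogs depends only on the event at their most recent common ancestor in the gene tree: speciation gives orthologs, gene duplication gives paralogs. A minimal sketch on a hypothetical two-copy hemoglobin tree; the node and gene names are made up for illustration.

```python
# Hypothetical gene tree: an ancestral Hb gene duplicates at node "dup";
# each copy then passes through the mouse/rat speciation at "spec1" and "spec2".
PARENT = {
    "spec1": "dup",
    "spec2": "dup",
    "Mouse Hb-1": "spec1",
    "Rat Hb-1": "spec1",
    "Mouse Hb-2": "spec2",
    "Rat Hb-2": "spec2",
}
EVENT = {"dup": "duplication", "spec1": "speciation", "spec2": "speciation"}


def lineage(node):
    """The node followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def last_common_ancestor(a, b):
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def homology(a, b):
    """Orthologs if the genes last shared an ancestor at a speciation, paralogs if at a duplication."""
    event = EVENT[last_common_ancestor(a, b)]
    return "orthologs" if event == "speciation" else "paralogs"


for pair in [("Mouse Hb-1", "Rat Hb-1"), ("Mouse Hb-1", "Mouse Hb-2"), ("Mouse Hb-1", "Rat Hb-2")]:
    print(pair, homology(*pair))
```

Note the last pair: mouse and rat genes from different duplicate copies are paralogs even though they sit in different species.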
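Probability of fixation of a new mutation (Kimura's diffusion approximation, diploid population of size N):
$$u(s) = \frac{1 - e^{-2s}}{1 - e^{-4Ns}}$$
$$u \approx \frac{1}{2N} \quad \text{when } |s| \ll \frac{1}{2N} \text{ (effectively neutral)}$$
$$u \approx 2s \quad \text{when } 0 < s \ll 1 \text{ and } Ns \gg 1 \text{ (selection dominates drift)}$$
[Figure: fixation probability (log scale, 10^-14 to 1) versus selective advantage s (-0.01 to 0.02), with curves for N = 10, 100, 1000, and 10,000.]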
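The fixation formula can be evaluated directly to see both limits. A minimal sketch, assuming Kimura's diploid approximation u = (1 - e^(-2s)) / (1 - e^(-4Ns)) for a single new mutant (initial frequency 1/(2N)) with effective size equal to N; `fixation_probability` is a hypothetical helper name. It uses `math.expm1` so that tiny s keeps its precision, and switches to an asymptotic form when e^(-4Ns) would overflow for strongly deleterious mutations.

```python
import math


def fixation_probability(s, N):
    """Kimura's approximation: u = (1 - exp(-2s)) / (1 - exp(-4Ns)) for a diploid population of size N."""
    if s == 0:
        return 1.0 / (2 * N)  # neutral limit
    x = -4.0 * N * s
    if x > 700:
        # Strongly deleterious: expm1(x) would overflow, but u ~ (exp(-2s) - 1) * exp(-x).
        return math.expm1(-2 * s) * math.exp(-x)
    # expm1 avoids cancellation in 1 - exp(...) when s is tiny.
    return math.expm1(-2 * s) / math.expm1(x)


for N in (10, 100, 1000, 10000):
    row = [fixation_probability(s, N) for s in (-0.01, 0.0, 0.01)]
    print(N, ["%.3g" % u for u in row])
```

Comparing rows shows the plot's message: as N grows, the same small s moves u far from the neutral value 1/(2N), toward 2s for advantageous mutations and toward zero for deleterious ones.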
The Rate of Evolution Depends on Constraints
Human vs. Rodent Comparison
Highest substitution rates: pseudogenes; introns; 3′ flanking regions (not transcribed to mature mRNA); 4-fold degenerate sites
Intermediate substitution rates: 5′ flanking regions (contain the promoter); 3′ and 5′ untranslated regions (transcribed to mRNA); 2-fold degenerate sites
Lowest substitution rates: nondegenerate sites
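Which sites count as 4-fold, 2-fold, or nondegenerate follows from the genetic code. A sketch assuming the standard code (NCBI table 1) and defining a site's degeneracy as the number of bases at that position that leave the encoded amino acid unchanged, counting the original base; a change to a stop codon counts as a change. Helper names are illustrative. The third position of Ile codons comes out 3-fold, a rarer class.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

LABELS = {1: "nondegenerate", 2: "2-fold degenerate", 3: "3-fold degenerate", 4: "4-fold degenerate"}


def site_degeneracy(codon, pos):
    """How many of the four bases at `pos` (0-based) encode the same amino acid as `codon`."""
    aa = CODE[codon]
    return sum(CODE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES)


def classify_sites(cds):
    """Label every site of an in-frame coding sequence by its degeneracy class."""
    labels = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        labels.extend(LABELS[site_degeneracy(codon, pos)] for pos in range(3))
    return labels


print(classify_sites("UGUAGAAAG"))
```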
Selection of Species for DNA comparisons
Human versus… (time since divergence; genome size; sequence conservation in coding regions; aids identification of…)
Pufferfish: ~450 MYA; 0.4 Gbp; ~65%; primarily coding sequences
Opossum: ~150 MYA; 4.2 Gbp; ~70-75%; both coding and non-coding sequences
Mouse: ~65 MYA; 2.5 Gbp; ~80%; both coding and non-coding sequences
Chimpanzee: ~5 MYA; 3.0 Gbp; >99%; recently changed sequences and genomic rearrangements
UCSC Genome Browser
Comparative analysis of multi-species sequences from targeted genomic regions
Nature, 2003
Looking backward from the human genome: how much is still there after 450 my (Fugu)?
Transposable Elements Gone Wild!
Using 12 species, 561 multi-species conserved sequences (MCSs) were found.
How many can be found using just the mouse genome (rather than all 12)?
Identifying Functionally Important Regions
How many comparative genomes do we need? Can’t we just use the mouse?
[Figure: true positives, false positives, and false negatives]
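Treating the 12-species MCS set as the reference, a mouse-only set can be scored by its true positives, false positives, and false negatives. A minimal sketch with made-up region IDs, not the data behind these slides; real MCSs are genomic intervals and would be matched by overlap, which this set-of-IDs version skips. `score_against_reference` is a hypothetical helper name.

```python
def score_against_reference(found, reference):
    """Compare regions found with fewer genomes (`found`) to a reference set of regions."""
    tp = len(found & reference)  # found and in the reference
    fp = len(found - reference)  # found but not in the reference
    fn = len(reference - found)  # in the reference but missed
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Made-up region IDs for illustration only.
twelve_species = {"MCS1", "MCS2", "MCS3", "MCS4", "MCS5"}
mouse_only = {"MCS1", "MCS2", "MCS3", "extra1"}
print(score_against_reference(mouse_only, twelve_species))
```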
Interpreting Evolutionary Changes Requires a Model
…IGTLS…
…IGRLS…
In evolution: what is the rate R(T→R) at which Ts become Rs?
e.g., 0.00005 / my, one entry of a 20 × 20 substitution matrix
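A rate such as R(T→R) is one off-diagonal entry of a rate matrix Q, and the probability of seeing R after time t, starting from T, is the (T, R) entry of P(t) = exp(Qt). The sketch below is a hypothetical simplification, not a real amino-acid model: all 380 off-diagonal rates are set to the example 0.00005 per million years, so the result can be checked against the closed form (1 - e^(-20μt))/20. The time t = 1000 my is an arbitrary choice. Real substitution matrices have unequal rates and would use a library matrix exponential such as `scipy.linalg.expm`.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # one-letter codes for the 20 amino acids


def equal_rates_matrix(mu):
    """Hypothetical 20 x 20 rate matrix: every i -> j change (i != j) has rate mu; rows sum to zero."""
    n = len(AMINO_ACIDS)
    return [[mu if i != j else -(n - 1) * mu for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transition_probabilities(Q, t, squarings=10, terms=20):
    """P(t) = exp(Qt) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    A = [[q * t / 2 ** squarings for q in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    term = [row[:] for row in P]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]  # A^k / k!
        P = [[p + x for p, x in zip(prow, trow)] for prow, trow in zip(P, term)]
    for _ in range(squarings):
        P = matmul(P, P)  # exp(A) squared 2^squarings times gives exp(Qt)
    return P


mu = 0.00005  # per million years, the example rate
t = 1000.0    # million years (arbitrary illustration)
P = transition_probabilities(equal_rates_matrix(mu), t)
T, R = AMINO_ACIDS.index("T"), AMINO_ACIDS.index("R")
print(P[T][R])
print((1 - math.exp(-20 * mu * t)) / 20)  # closed form for this equal-rates model
```

For short times P_TR(t) ≈ R(T→R)·t, so the rate alone suffices; the matrix exponential matters once multiple substitutions at the same site become likely.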