Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation...

7
B - 1 Mutations B - 2 Tree Of Life B - 3 Tree Of Life B - 4 Mutations vs. Substitutions Mutations are changes in DNA Substitutions are mutations that evolution has tolerated Which rate is greater? Replicative proofreading and DNA repair constrain the mutation rate

Transcript of Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation...

Page 1: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 1

Mutations

B - 2

Tree Of Life

B - 3

Tree Of Life

B - 4

Mutations vs. Substitutions

• Mutations are changes in DNA

• Substitutions are mutations that evolution has tolerated

Which rate is greater?

Replicative proofreading and DNA repair constrain the mutation rate

Page 2: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 5

Selectionist Evolution

• Most mutations are deleterious; removed via negative selection

• Advantageous mutations positively selected

• Variability arises via selection

deleterious

beneficialneutral

B - 6

Why Are Mutations Important?

Mutations can be deleterious

Mutations drive evolution

B - 7

UV Damage to DNA

Thymine dimers

UV

What happens if damage is not repaired?B - 8

Radiation Resistant

• 10 Gray will kill a human• 60 Gray will kill an E. coli culture• Deinococcus can survive 5000 Gray

Page 3: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 9

A Sequence Mutating at Random

1

Multiple substitutions at one site can cause underestimation of number of actual substitutions

121

3 *

5 *6 *7 *8 *9 *10 *11 *12

2 *

4

9 actual substitutions

p=5 observed substitutions

B - 10

Simulating Random Mutations

observed substitutions

Substitutions

Sequ

ence

dis

tanc

e

actual substitutions

B - 11

Measuring Sequence Divergence:Why Do We Care?

• Inferring phylogenetic relationships

• Dating divergence, correlating with fossil record

• Use in sequence alignments and homology searches of databases*

* Comparative genomics is an important field. Determining not only how many substitutions exist between two sequences but how similar two sequences are. B - 12

DNA Structure

AT

GC

AT

C G

OH

AT

OH

5’

5’

3’

3’

G-C: 3 hydrogen bondsA-T: 2 hydrogen bonds

Two base types:- Purines (A, G)- Pyrimidines (T, C)

Page 4: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 13

Not All Base Substitions Are Created Equal

• Transitions

• Transversions

Transition rate ~2x transversion rate

• Purine to purine (A ➔ G or G ➔ A)• Pyrimidine to pyrimidine (C ➔ T or T ➔ C)

• Purine to pyrimidine (A ➔ C or T; G ➔ C or T )• Pyrimidine to purine (C ➔ A or G; T ➔ A or G)

B - 14

Substition Rates Differ Across Genomes

Alignment of 3,165 human-mouse pairs

Splice sites

Start of transcription Polyadenylation site

B - 15

The PAM Model of Protein Sequence Evolution

• Empirical data-based substitution matrix

• Global alignments of 71 families of closely related proteins.

• Constructed hypothetical evolutionary trees

• Built matrix of 1572 amino acid point accepted mutations

B - 16

Original PAM Substitution Matrix

Dayhoff, 1978

Count number of times residue i was replaced with residue j

Page 5: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 17

Deriving PAM Matrices

mj = # times amino acid j mutatedtotal occurrences of amino acid j

For each amino acid, calculate its relative mutability, i.e.,

the likelihood that the amino acid will mutate:

B - 18

Deriving PAM Matrices

Calculate mutation probabilities for each possible substitution

Mi,j = relative mutability x proportion of all substitutions to j by changing to i

mj x Ai,jMi,j =

∑Ai,ji

B - 19

PAM1 Mutation Probability Matrix

Dayhoff, 1978

B - 20

Deriving PAM Matrices

Calculate log odds ratio to convert mutation probability to substitution score

(Mi,j)

Frequency of residue i(Probability of amino acid i

occurring by chance)

Mutation probability(Prob. substitution from j to i

is an accepted mutation)Si,j = 10 x log10 fi

( )

Page 6: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 21

Deriving PAM Matrices

Scoring in log odds ratio:- Allows addition of scores for residues in alignments

Interpretation of score:- Positive: non-random (accepted mutation) favored- Negative: random model favored

B - 22

PAM Matrix

B - 23

Using PAM Scoring Matrices

PAM1: 1% difference (99% identity)

Can “evolve” the mutation probability matrix by multiplying it by itself, then take log odds ratio

(PAMn = PAM matrix multiplied by itself n times)

B - 24

BLOSUM = BLOCKS Substitution Matrix

• Like PAM, empirical proteins substitution matrices, use log odds ratio to calculate substitution scores

• Large database: local alignments of conserved regions of distantly related proteins

Gapless alignment

blocks

Page 7: Mutationscs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14 · PAM1 Mutation Probability Matrix Dayhoff, 1978 B -20 Deriving PAM Matrices Calculate log odds ratio to

B - 25

BLOSUM Uses Clustering To Reduce Sequence Bias

• Cluster the most similar sequences together

• Reduce weight of contribution of clustered sequences

• BLOSUM number refers to clustering threshold used (e.g. 62% for BLOSUM 62 matrix)

B - 26

BLOSUM and PAM Substitution Matrices

BLOSUM 30

BLOSUM 62

BLOSUM 90

% identity

PAM 250 (80)

PAM 120 (66)

PAM 90 (50)

% change

change

BLAST algorithm uses BLOSUM 62 matrix

B - 27

PAM and BLOSUM

• Smaller set of closely related proteins - short evolutionary period

• Use global alignment• More divergent matrices

extrapolated• Errors arise from

extrapolation

• Larger set of more divergent proteins-longer evolutionary period

• Use local alignment• Each matrix calculated

separately• Clustering to avoid bias• Errors arise from

alignment errors

PAM BLOSUM

B - 28

Importance of Scoring Matrices

• Scoring matrices appear in all analyses involving sequence comparison

• The choice of matrix can strongly influence the outcome of the analysis