Evidence that natural selection acts on silent mutation


BioSystems, 16 (1983) 101-111. Elsevier Scientific Publishers Ireland Ltd.

EVIDENCE THAT NATURAL SELECTION ACTS ON SILENT MUTATION

MICHAEL CONRAD, CARL FRIEDLANDER and MORRIS GOODMAN(a)

Department of Computer Science and (a)Department of Anatomy, Wayne State University, Detroit, MI 48202, U.S.A.

(Received October 7th, 1982) (Revision received March 3rd, 1983)

Analysis of nucleic acid sequence data of mammalian hemoglobin, yeast cytochrome c, and human interferon reveals strong biases in favor of specific codons. These biases do not appear to dissipate over time, suggesting that an indirect form of selection acts on silent mutations. The data are compatible with the "bootstrapping" hypothesis that silent mutations which alter the rate of evolution can hitchhike with traits whose appearance they facilitate. Selection involving modulating effects of codon usage on gene expression may also be involved, but the data appear to exclude simple maximization of gene expression.

Key words: Codon usage; Bootstrapping; Silent mutation

Introduction

Differential codon usage for the same amino acid is silent with respect to the amino acid sequence. Direct natural selection on a particular codon catalog is therefore not expected, although indirect effects are possible. The investigation of codon usage statistics for different mRNAs by Grantham et al. (1980) indicates that each gene in a genome adheres to its species' usage of the codon catalog. Our investigation exploits the fact that silent mutations, being invisible at the amino acid level, are isolated from direct natural selection on the protein. We investigate the hypothesis that codon usage is shaped by evolutionary factors which determine the amenability of the resulting protein to evolution (Conrad, 1977, 1978, 1979a, 1979b, 1983; Conrad and Rizki, 1980; Conrad and Volkenstein, 1981). We focus on the statistical distribution of codon choices and suggest some reasons for the distributions observed.

The mammalian hemoglobin data are those used for parsimony reconstructions by Goodman (1981) and Czelusniak (unpublished data, 1982). The cytochrome c and interferon data were obtained from the Dayhoff Nucleic Acid Sequence Database (1981).

Tabulating the codons in the DNA sequence that codes for a particular protein shows whether the synonymous codons for each amino acid are used equally. Table 1 lists the results of such a tabulation for codon sequences from 29 different hemoglobins. Although random use of the synonymous codons for each amino acid might be expected, we found that some codons are strongly favored over others. For example, any of GCA, GCC, GCG or GCU can code for alanine, yet these codons are not distributed according to chance within the coding sequences of particular proteins. If codon choice were a random process, each of the four would be chosen one fourth of the time. As Table 1 illustrates, the selection

0303-2647/83/$03.00 © Elsevier Scientific Publishers Ireland Ltd. Printed and Published in Ireland


TABLE 1

Codon frequencies for 29 mammalian hemoglobin sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 21 (5), GCC 178 (50), GCG 24 (6), GCU 129 (36); χ² = 0.20 × 10³
ARG: AGA 11 (12), AGG 49 (56), CGA 1 (1), CGC 12 (13), CGG 6 (6), CGU 8 (9); χ² = 0.11 × 10³
ASN: AAC 101 (70), AAU 43 (29); χ² = 0.23 × 10²
ASP: GAC 102 (58), GAU 73 (41); χ² = 0.11 × 10²
CYS: UGC 17 (39), UGU 26 (60); χ² = 0.18 × 10¹
GLN: CAA 12 (16), CAG 59 (83); χ² = 0.31 × 10²
GLU: GAA 65 (36), GAG 113 (63); χ² = 0.12 × 10²
GLY: GGA 30 (12), GGC 137 (55), GGG 22 (8), GGU 58 (23); χ² = 0.13 × 10³
HIS: CAC 126 (63), CAU 71 (36); χ² = 0.16 × 10²
ILE: AUA 12 (15), AUC 40 (52), AUU 24 (31); χ² = 0.15 × 10²
LEU: CUA 20 (4), CUC 79 (18), CUG 265 (62), CUU 24 (5), UUA 4 (0), UUG 35 (8); χ² = 0.67 × 10³
LYS: AAA 51 (18), AAG 224 (81); χ² = 0.10 × 10³
MET: AUG 39 (100)
PHE: UUC 109 (57), UUU 81 (42); χ² = 0.41 × 10¹
PRO: CCA 14 (11), CCC 49 (39), CCG 12 (9), CCU 49 (39); χ² = 0.41 × 10²
SER: AGC 41 (19), AGU 45 (21), UCA 3 (1), UCC 63 (29), UCG 5 (2), UCU 54 (25); χ² = 0.89 × 10²
THR: ACA 11 (6), ACC 105 (61), ACG 3 (1), ACU 51 (30); χ² = 0.15 × 10³
TRP: UGG 46 (100)
TERM: UAA 16 (57), UAG 4 (14), UGA 8 (28)
TYR: UAC 43 (72), UAU 16 (27); χ² = 0.12 × 10²
VAL: GUA 10 (3), GUC 58 (17), GUG 212 (65), GUU 44 (13); χ² = 0.27 × 10³

is biased toward GCC and GCU. A non-parametric Chi-Square analysis shows that the probability that such a selection would occur randomly is less than 0.001. (Non-parametric statistics were used for all analyses for which probabilities are reported in this paper.)

Computing such probability measures for each of the amino acids in the protein (excluding the terminator codons as well as the two amino acids encoded by only one codon), we find that 16 of the 18 remaining amino acids are encoded in a significantly biased manner (Chi-Square probability < 0.05). If each amino acid had a 50% chance of showing a biased coding, some of the 18 would be expected to appear biased, but 16 out of 18 would occur with a binomial probability of less than 0.001. One might expect silent mutations of single bases to lead to a more uniform distribution of codons within each amino acid; instead, strong preferences are evident. One possible hypothesis is that these particular codon choices have been selected because the cells of the organisms studied contain a restricted set of transfer-RNA molecules. The scarcity of a particular tRNA would make translation of the corresponding codons difficult. Natural selection would then reject mutations to such codons, since they would lower the individual's ability to survive.
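The chi-square figure for alanine can be checked with a short calculation. The paper describes its test only as non-parametric; the Pearson goodness-of-fit statistic against equal use of synonymous codons used in this sketch is our assumption, although it reproduces the value tabulated for alanine in Table 1.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic for the null hypothesis that all
    synonymous codons of one amino acid are used equally often."""
    n = sum(counts)
    expected = n / len(counts)  # equal share for each synonymous codon
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Alanine codon counts from Table 1, in the order GCA, GCC, GCG, GCU.
alanine = [21, 178, 24, 129]
stat = chi_square_uniform(alanine)
print(f"chi-square = {stat:.1f} with {len(alanine) - 1} degrees of freedom")
```

The statistic, about 208.7, agrees with the tabulated 0.20 × 10³ and lies far above the 0.1% critical value of roughly 16.3 for 3 degrees of freedom, consistent with the reported probability below 0.001.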

To test the validity of this hypothesis, we divided the hemoglobin sequence data into three functional groups: pseudo-genes, alpha-hemoglobin sequences and beta-hemoglobin sequences. Table 2 lists the codon frequencies for the pseudo-genes; Table 3, for alpha-hemoglobin; and Table 4, for beta-hemoglobin. Since alpha- and beta-hemoglobin genes are translated into proteins, their codon usage is exposed to the effects of tRNA concentrations within the cells. Pseudo-genes may once have been functional genes but are now nucleotide sequences that are never translated into proteins; therefore, they should not be affected by cellular tRNA concentrations. Note that freedom from tRNA concentrations has not removed the biases in the codon choices of the pseudo-genes, though such pseudo-genes originated relatively late (less than 100 million years ago). The data in Table 2 show 9 out of 18 possible amino acids coded in a biased manner; Table 3, 12 out of 18; Table 4, 14 out of 18. Although the pseudo-genes behave somewhat more randomly (9 of 18 as compared to 12 of 18 or 14 of 18), the biases remain strong in all cases. Further support for rejecting the tRNA hypothesis comes from the human interferon data summarized in Table 5. Although these data exhibit strong biases, with 14 out of 18 amino acids coded in a biased manner, the preferred codons are not the same as those of the hemoglobin data. The data are inconsistent with a maximization of gene expression, though they are compatible with the use of codon frequencies to modulate gene expression.
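The comparison behind this argument can be sketched by picking the most frequently used codon for each amino acid in two tables. The counts below are a small subset copied from Table 1 (29 mammalian hemoglobins) and Table 5 (human interferon); the dictionary layout and the helper `preferred` are illustrative, not taken from the paper.

```python
# Codon counts for a few amino acids, copied from Table 1 and Table 5.
hemoglobin = {
    "ALA": {"GCA": 21, "GCC": 178, "GCG": 24, "GCU": 129},
    "LEU": {"CUA": 20, "CUC": 79, "CUG": 265, "CUU": 24, "UUA": 4, "UUG": 35},
    "LYS": {"AAA": 51, "AAG": 224},
    "PHE": {"UUC": 109, "UUU": 81},
}
interferon = {
    "ALA": {"GCA": 35, "GCC": 30, "GCG": 3, "GCU": 31},
    "LEU": {"CUA": 29, "CUC": 56, "CUG": 53, "CUU": 65, "UUA": 60, "UUG": 41},
    "LYS": {"AAA": 70, "AAG": 44},
    "PHE": {"UUC": 48, "UUU": 96},
}

def preferred(table):
    """Most frequently used codon for each amino acid in a usage table."""
    return {aa: max(counts, key=counts.get) for aa, counts in table.items()}

hb, ifn = preferred(hemoglobin), preferred(interferon)
for aa in hb:
    print(aa, hb[aa], ifn[aa], "same" if hb[aa] == ifn[aa] else "differs")
```

If a single tRNA pool set the optimum for all genes of a genome, the preferred codons would be expected to coincide; in this subset they differ for every amino acid listed.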

Another hypothesis is that historical accidents control codon frequency. The hemoglobin genes may not have had sufficient time to evolve very far from their original form, and hence have not had time to "randomize". If this were true, genes coding for phylogenetically older proteins would display a more uniform distribution of codon choices. Accordingly, we examined two nucleotide sequences for yeast cytochrome c, an ancient molecule.
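The randomization expected under this hypothesis can be sketched with a simple neutral model. The model is our assumption, not one from the paper: mutation is symmetric among synonymous codons, there is no selection, and the per-step rate `mu` is arbitrary. Under these assumptions the deviation of each codon frequency from equal usage shrinks by a factor (1 - k·mu) per step.

```python
def neutral_drift(freqs, mu, steps):
    """Codon frequencies after `steps` rounds of symmetric silent mutation.

    Each round, a fraction mu of every codon mutates to each of the other
    k - 1 synonymous codons, with no selection:
        p_i' = p_i * (1 - (k - 1) * mu) + mu * (1 - p_i) = p_i * (1 - k * mu) + mu
    """
    k = len(freqs)
    if not 0 < k * mu < 1:
        raise ValueError("need 0 < k * mu < 1")
    p = list(freqs)
    for _ in range(steps):
        p = [pi * (1 - k * mu) + mu for pi in p]
    return p

# Start from the alanine codon usage of Table 1 (GCA, GCC, GCG, GCU).
counts = [21, 178, 24, 129]
start = [c / sum(counts) for c in counts]
for steps in (0, 50, 500):
    print(steps, [round(x, 3) for x in neutral_drift(start, mu=0.01, steps=steps)])
```

In such a model any initial bias decays toward one fourth per codon, so biases that persist in an old molecule are what the cytochrome c comparison tests for.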

As summarized in Table 6, the cytochrome c data also exhibit particular biases for specific codons, and the codons favored in cytochrome c are quite different from those favored in hemoglobin. Here 15 of the 18 amino acids are encoded in a biased fashion. The binomial probability of such an occurrence is small, less than 0.09.
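The binomial figures quoted for the different tables can be checked in the same way. As in the text, each of the 18 amino acids is assumed to appear biased independently with probability 0.5; reading "binomial probability" as the upper tail P(X ≥ k) is our assumption.

```python
from math import comb

def binomial_upper_tail(k, n=18, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more of n
    amino acids look biased if each does so independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Counts of biased amino acids (out of 18) as reported in the text.
reported = {
    "all hemoglobins (Table 1)": 16,
    "pseudo-genes (Table 2)": 9,
    "alpha-hemoglobin (Table 3)": 12,
    "beta-hemoglobin (Table 4)": 14,
    "interferon (Table 5)": 14,
    "cytochrome c (Table 6)": 15,
}
for label, k in reported.items():
    print(f"{label}: {k}/18, P(X >= {k}) = {binomial_upper_tail(k):.4f}")
```

For 16 of 18 the tail is about 0.00066, below the 0.001 quoted for Table 1; for 15 of 18 it is about 0.0038, within the bound quoted for cytochrome c.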

A third possibility is that the codon choices are related to the amenability of the protein to evolution. As pointed out by Zuckerkandl and Pauling (1965), the evolutionary conservatism of protein tertiary structure may be due to structural similarities among different types of amino acids. The amenability hypothesis is that some codon choices make point mutations to structurally less similar amino acids more likely than others do. To test this hypothesis, we define a measure of the replaceability of codons related to the amount of change in a protein that would occur if a single point mutation were to occur at a given base


TABLE 2

Codon frequencies for pseudo-gene hemoglobin sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 8 (14), GCC 26 (45), GCG 6 (10), GCU 17 (29); χ² = 0.17 × 10²
ARG: AGA 5 (33), AGG 7 (46), CGA 1 (6), CGC 0 (0), CGG 0 (0), CGU 2 (13); χ² = 0.16 × 10²
ASN: AAC 14 (60), AAU 9 (39); χ² = 0.10 × 10¹
ASP: GAC 20 (55), GAU 16 (44); χ² = 0.44 × 10⁰
CYS: UGC 8 (80), UGU 2 (20); χ² = 0.36 × 10¹
GLN: CAA 3 (25), CAG 9 (75); χ² = 0.30 × 10²
GLU: GAA 12 (42), GAG 16 (57); χ² = 0.51 × 10⁰
GLY: GGA 6 (15), GGC 19 (48), GGG 3 (7), GGU 11 (28); χ² = 0.16 × 10²
HIS: CAC 27 (71), CAU 11 (28); χ² = 0.67 × 10¹
ILE: AUA 3 (27), AUC 5 (45), AUU 3 (27); χ² = 0.72 × 10⁰
LEU: CUA 2 (2), CUC 12 (16), CUG 40 (56), CUU 7 (9), UUA 2 (2), UUG 8 (11); χ² = 0.87 × 10²
LYS: AAA 13 (33), AAG 26 (66); χ² = 0.43 × 10¹
MET: AUG 8 (100)
PHE: UUC 23 (65), UUU 12 (34); χ² = 0.34 × 10¹
PRO: CCA 3 (11), CCC 13 (48), CCG 1 (3), CCU 10 (37); χ² = 0.14 × 10²
SER: AGC 6 (15), AGU 8 (20), UCA 1 (2), UCC 6 (15), UCG 4 (10), UCU 14 (35); χ² = 0.16 × 10²
THR: ACA 2 (6), ACC 20 (60), ACG 3 (9), ACU 8 (24); χ² = 0.24 × 10²
TRP: UGG 8 (100)
TERM: UAA 7 (53), UAG 4 (30), UGA 2 (15)
TYR: UAC 8 (72), UAU 3 (27); χ² = 0.22 × 10¹
VAL: GUA 6 (10), GUC 12 (21), GUG 32 (57), GUU 6 (10); χ² = 0.33 × 10²

of the codon. For example, the possible single point mutations for AAA are AAC, AAG, AAU, ACA, AGA, AUA, CAA, GAA and UAA. Given a distance measure between any two amino acids, we can define the distance between two codons to be the same as the distance between the amino acids for which they code. The distance sum, Ds, is defined as the sum of the distances from the amino acid coded by the initial codon to all


TABLE 3

Codon frequencies for alpha-hemoglobin sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 1 (1), GCC 42 (58), GCG 11 (15), GCU 18 (25); χ² = 0.50 × 10²
ARG: AGA 0 (0), AGG 4 (33), CGA 0 (0), CGC 1 (8), CGG 2 (16), CGU 5 (41); χ² = 0.11 × 10²
ASN: AAC 13 (92), AAU 1 (7); χ² = 0.10 × 10²
ASP: GAC 24 (77), GAU 7 (22); χ² = 0.93 × 10¹
CYS: UGC 5 (100), UGU 0 (0); χ² = 0.50 × 10¹
GLN: CAA 1 (20), CAG 4 (80); χ² = 0.18 × 10¹
GLU: GAA 8 (38), GAG 13 (61); χ² = 0.11 × 10¹
GLY: GGA 1 (2), GGC 26 (72), GGG 3 (8), GGU 6 (16); χ² = 0.44 × 10²
HIS: CAC 35 (83), CAU 7 (16); χ² = 0.18 × 10²
ILE: AUA 0 (0), AUC 10 (83), AUU 2 (16); χ² = 0.14 × 10²
LEU: CUA 1 (1), CUC 12 (17), CUG 47 (70), CUU 2 (2), UUA 0 (0), UUG 5 (7); χ² = 0.14 × 10³
LYS: AAA 7 (15), AAG 39 (84); χ² = 0.22 × 10²
MET: AUG 5 (100)
PHE: UUC 26 (89), UUU 3 (10); χ² = 0.18 × 10²
PRO: CCA 1 (3), CCC 14 (53), CCG 4 (15), CCU 7 (26); χ² = 0.14 × 10²
SER: AGC 15 (35), AGU 2 (4), UCA 1 (2), UCC 13 (30), UCG 0 (0), UCU 11 (26); χ² = 0.22 × 10²
THR: ACA 0 (0), ACC 35 (92), ACG 0 (0), ACU 3 (7); χ² = 0.91 × 10²
TRP: UGG 3 (100)
TERM: UAA 4 (100), UAG 0 (0), UGA 0 (0)
TYR: UAC 8 (61), UAU 5 (38); χ² = 0.69 × 10⁰
VAL: GUA 3 (6), GUC 8 (17), GUG 32 (71), GUU 2 (4); χ² = 0.50 × 10²

the amino acids reachable by single point mutations of the individual bases within the codon. The irreplaceability, Ir, of a given codon is then defined by Volkenstein (1979) using the equation

Ir = 100/((Ds/3) + 50),

where the choice of constants determines the scaling of the irreplaceability values but does not alter their order.
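The construction of Ds and Ir can be sketched in Python. The genetic code is the standard one; the pairwise score `toy_distance` is a hypothetical placeholder (0 for the same amino acid, 1 otherwise, with a stop codon treated as just another symbol), not the Bachinski or Goodman-Moore values behind Table 7, so the numbers it yields do not reproduce that table. Note that, as written, Ir decreases as Ds grows.

```python
BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def point_mutants(codon):
    """The 9 codons reachable by a single base substitution."""
    return [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]

def distance_sum(codon, dist):
    """Ds: summed distance from the codon's amino acid to those of its mutants."""
    aa = CODE[codon]
    return sum(dist(aa, CODE[m]) for m in point_mutants(codon))

def irreplaceability(codon, dist):
    """Ir = 100 / ((Ds / 3) + 50), as defined in the text."""
    return 100.0 / (distance_sum(codon, dist) / 3.0 + 50.0)

def toy_distance(a, b):
    """Hypothetical placeholder score: 0 for the same residue, 1 otherwise."""
    return 0 if a == b else 1

print(sorted(point_mutants("AAA")))  # the nine mutants of AAA listed in the text
for codon in ("AAA", "GCC"):
    print(codon, distance_sum(codon, toy_distance),
          round(irreplaceability(codon, toy_distance), 4))
```

Swapping in a real amino acid score only requires replacing `toy_distance` with a function of two one-letter residue codes.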

Table 7 lists the irreplaceability scores for the codons under two different measures. The first measure was computed by Volkenstein (1979) and is based on Bachinski's measure


TABLE 4

Codon frequencies for beta-hemoglobin sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 12 (5), GCC 110 (49), GCG 7 (3), GCU 94 (42); χ² = 0.15 × 10³
ARG: AGA 6 (10), AGG 38 (63), CGA 0 (0), CGC 11 (18), CGG 4 (6), CGU 1 (1); χ² = 0.81 × 10¹
ASN: AAC 74 (69), AAU 33 (30); χ² = 0.15 × 10²
ASP: GAC 58 (53), GAU 50 (46); χ² = illegible
CYS: UGC 4 (14), UGU 24 (85); χ² = 0.14 × 10²
GLN: CAA 8 (14), CAG 46 (85); χ² = 0.32 × 10²
GLU: GAA 45 (34), GAG 84 (65); χ² = 0.11 × 10²
GLY: GGA 23 (13), GGC 92 (53), GGG 16 (9), GGU 41 (23); χ² = 0.82 × 10²
HIS: CAC 64 (54), CAU 53 (45); χ² = 0.10 × 10¹
ILE: AUA 9 (16), AUC 25 (47), AUU 19 (35); χ² = 0.73 × 10¹
LEU: CUA 17 (5), CUC 55 (19), CUG 178 (61), CUU 15 (5), UUA 2 (0), UUG 22 (7); χ² = 0.45 × 10³
LYS: AAA 31 (16), AAG 159 (83); χ² = 0.86 × 10²
MET: AUG 26 (100)
PHE: UUC 60 (47), UUU 66 (52); χ² = 0.28 × 10⁰
PRO: CCA 10 (14), CCC 22 (30), CCG 7 (9), CCU 32 (45); χ² = 0.22 × 10²
SER: AGC 20 (15), AGU 35 (26), UCA 1 (0), UCC 44 (33), UCG 1 (0), UCU 29 (22); χ² = 0.73 × 10²
THR: ACA 9 (9), ACC 50 (50), ACG 0 (0), ACU 40 (40); χ² = 0.69 × 10²
TRP: UGG 35 (100)
TERM: UAA 5 (45), UAG 0 (0), UGA 6 (54)
TYR: UAC 27 (77), UAU 8 (22); χ² = 0.10 × 10²
VAL: GUA 1 (0), GUC 38 (17), GUG 148 (66), GUU 36 (16); χ² = 0.21 × 10³

of amino acid similarity. Similarity is defined a posteriori in terms of the mutual replaceability of amino acid residues in isofunctional proteins. The second measure is a distance measure (Goodman and Moore, 1977) based on Chou-Fasman (1974) occurrence data for specific secondary conformational structures. In this case, differing amino acids are regarded as more similar if they appear with similar frequencies in the


TABLE 5

Codon frequencies for human interferon sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 35 (35), GCC 30 (30), GCG 3 (3), GCU 31 (31); χ² = 0.26 × 10²
ARG: AGA 57 (50), AGG 42 (36), CGA 5 (4), CGC 4 (3), CGG 4 (3), CGU 2 (1); χ² = 0.41 × 10²
ASN: AAC 28 (29), AAU 66 (70); χ² = 0.15 × 10²
ASP: GAC 38 (48), GAU 41 (51); χ² = 0.10 × 10⁰
CYS: UGC 37 (26), UGU 47 (63); χ² = 0.11 × 10¹
GLN: CAA 52 (49), CAG 54 (50); χ² = 0.03 × 10⁰
GLU: GAA 53 (58), GAG 38 (41); χ² = 0.24 × 10¹
GLY: GGA 47 (43), GGC 19 (17), GGG 20 (18), GGU 21 (19); χ² = 0.20 × 10²
HIS: CAC 29 (36), CAU 51 (63); χ² = 0.60 × 10¹
ILE: AUA 34 (23), AUC 48 (33), AUU 63 (43); χ² = 0.87 × 10¹
LEU: CUA 29 (9), CUC 56 (18), CUG 53 (17), CUU 65 (21), UUA 60 (19), UUG 41 (13); χ² = 0.17 × 10²
LYS: AAA 70 (61), AAG 44 (38); χ² = 0.59 × 10¹
MET: AUG 58 (100)
PHE: UUC 48 (33), UUU 96 (66); χ² = 0.16 × 10²
PRO: CCA 40 (31), CCC 22 (17), CCG 3 (2), CCU 63 (49); χ² = 0.61 × 10²
SER: AGC 28 (11), AGU 28 (11), UCA 63 (25), UCC 47 (19), UCG 1 (0), UCU 77 (31); χ² = 0.92 × 10²
THR: ACA 57 (40), ACC 30 (21), ACG 4 (2), ACU 49 (35); χ² = 0.47 × 10²
TRP: UGG 49 (100)
TERM: UAA 37 (27), UAG 16 (11), UGA 84 (61)
TYR: UAC 28 (34), UAU 54 (65); χ² = 0.82 × 10¹
VAL: GUA 11 (12), GUC 18 (20), GUG 29 (32), GUU 32 (35); χ² = 0.12 × 10²

same secondary structures. Using these measures, we computed the irreplaceability of a random protein whose amino acid distribution is similar to that of the real protein but whose codons are distributed at random. We also computed the irreplaceability of the actual codon sequences in the data.
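The paper does not spell out how codon scores are combined into a protein score. A plausible reading, assumed in this sketch, is a plain average of the Table 7 score over all codons, with the random sequence using each amino acid's synonymous codons evenly (as footnote a to Table 8 describes). The example uses the Goodman measure and the Table 1 counts for alanine and leucine only, so it illustrates the calculation rather than reproducing Table 8.

```python
def mean_irreplaceability(usage, ir, even=False):
    """Average codon irreplaceability over all residues in `usage`.

    usage: {amino_acid: {codon: count}}, listing every synonymous codon.
    ir:    {codon: irreplaceability score}.
    even:  if True, replace the observed codon usage by an even split among
           the synonymous codons of each amino acid (the random sequence).
    """
    total, residues = 0.0, 0
    for codons in usage.values():
        n = sum(codons.values())
        if even:
            total += n * sum(ir[c] for c in codons) / len(codons)
        else:
            total += sum(count * ir[c] for c, count in codons.items())
        residues += n
    return total / residues

# Goodman-measure scores from Table 7 and codon counts from Table 1.
IR_GOODMAN = {
    "GCA": 0.632, "GCC": 0.602, "GCG": 0.632, "GCU": 0.602,
    "CUA": 0.763, "CUC": 0.724, "CUG": 0.751, "CUU": 0.724,
    "UUA": 1.098, "UUG": 0.980,
}
usage = {
    "ALA": {"GCA": 21, "GCC": 178, "GCG": 24, "GCU": 129},
    "LEU": {"CUA": 20, "CUC": 79, "CUG": 265, "CUU": 24, "UUA": 4, "UUG": 35},
}
given = mean_irreplaceability(usage, IR_GOODMAN)
random_codons = mean_irreplaceability(usage, IR_GOODMAN, even=True)
print(f"given {given:.3f}  random {random_codons:.3f}")
```

For this subset the observed usage scores below the even split, the same direction the text reports for hemoglobin.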

Table 8 summarizes the computations for the proteins used in this study. An examination of the data reveals that the irreplaceability of


TABLE 6

Codon frequencies for yeast cytochrome c sequences

Amino acid: codon, number found (percent found); χ²

ALA: GCA 7 (28), GCC 6 (24), GCG 4 (16), GCU 8 (32); χ² = 0.14 × 10¹
ARG: AGA 11 (39), AGG 11 (39), CGA 2 (7), CGC 1 (3), CGG 0 (0), CGU 3 (10); χ² = 0.27 × 10²
ASN: AAC 10 (52), AAU 9 (47); χ² = 0.50 × 10⁻¹
ASP: GAC 7 (46), GAU 8 (53); χ² = 0.60 × 10⁻¹
CYS: UGC 4 (19), UGU 17 (80); χ² = 0.80 × 10¹
GLN: CAA 10 (52), CAG 9 (47); χ² = 0.50 × 10⁻¹
GLU: GAA 8 (53), GAG 7 (46); χ² = 0.60 × 10⁻¹
GLY: GGA 3 (16), GGC 3 (16), GGG 2 (11), GGU 10 (55); χ² = 0.62 × 10¹
HIS: CAC 7 (31), CAU 15 (68); χ² = 0.29 × 10¹
ILE: AUA 21 (58), AUC 3 (8), AUU 12 (33); χ² = 0.13 × 10²
LEU: CUA 7 (14), CUC 4 (8), CUG 6 (12), CUU 7 (14), UUA 15 (30), UUG 11 (22); χ² = 0.11 × 10²
LYS: AAA 23 (56), AAG 18 (43); χ² = 0.60 × 10⁰
MET: AUG 11 (100)
PHE: UUC 8 (24), UUU 25 (75); χ² = 0.87 × 10¹
PRO: CCA 6 (30), CCC 1 (5), CCG 4 (20), CCU 9 (45); χ² = 0.68 × 10¹
SER: AGC 2 (6), AGU 7 (23), UCA 5 (16), UCC 2 (6), UCG 4 (13), UCU 10 (33); χ² = 0.96 × 10¹
THR: ACA 15 (40), ACC 2 (5), ACG 10 (27), ACU 10 (27); χ² = 0.10 × 10²
TRP: UGG 5 (100)
TERM: UAA 8 (40), UAG 7 (35), UGA 5 (25)
TYR: UAC 8 (36), UAU 14 (63); χ² = 0.16 × 10¹
VAL: GUA 5 (27), GUC 5 (27), GUG 4 (22), GUU 4 (22); χ² = 0.33 × 10⁻¹

the codon sequences for hemoglobin is everywhere lower than that of a protein with a random codon distribution. In contrast, cytochrome c and interferon are coded with more irreplaceable codon sequences, having higher values than a random protein would. The codons used to encode cytochrome c and interferon appear to be chosen so that small changes in the codons would cause rejection of the resultant proteins; that is, cytochrome c and interferon are fixed proteins which will not evolve from their current configurations. Hemoglobin is coded with highly replaceable codons, allowing it to evolve more freely by creating fewer rejectable proteins. This is an advantage for a protein that needs to evolve rapidly in order to accommodate slight changes in partial pressures and atmospheric conditions.

TABLE 7

Codon irreplaceability scores

Codon   Volkenstein   Goodman      Codon   Volkenstein   Goodman
        measure       measure              measure       measure
AAA     0.806         0.689        GAA     0.757         0.578
AAC     0.800         0.653        GAC     0.769         0.507
AAG     0.806         0.729        GAG     0.757         0.578
AAU     0.800         0.653        GAU     0.769         0.507
ACA     0.534         0.709        GCA     0.515         0.632
ACC     0.520         0.714        GCC     0.518         0.602
ACG     0.534         0.709        GCG     0.515         0.632
ACU     0.520         0.714        GCU     0.518         0.602
AGA     0.699         0.877        GGA     0.574         0.684
AGC     0.793         0.781        GGC     0.552         0.724
AGG     0.699         0.775        GGG     0.574         0.684
AGU     0.793         0.781        GGU     0.552         0.724
AUA     0.645         0.757        GUA     0.526         0.719
AUC     0.666         0.719        GUC     0.546         0.684
AUG     1.265         0.662        GUG     0.549         0.680
AUU     0.666         0.719        GUU     0.546         0.684
CAA     0.854         0.694        UAA     0.840         --
CAC     0.943         0.588        UAC     0.980         0.781
CAG     0.854         0.694        UAG     1.162         --
CAU     0.943         0.588        UAU     0.980         0.781
CCA     0.598         0.436        UCA     0.561         0.847
CCC     0.606         0.436        UCC     0.558         0.714
CCG     0.598         0.436        UCG     0.561         0.847
CCU     0.606         0.436        UCU     0.558         0.714
CGA     0.512         0.862        UGA     1.162         --
CGC     0.617         0.757        UGC     1.149         0.781
CGG     0.512         0.735        UGG     1.785         0.564
CGU     0.617         0.757        UGU     1.149         0.781
CUA     0.487         0.763        UUA     0.684         1.098
CUC     0.571         0.724        UUC     0.862         0.884
CUG     0.495         0.751        UUG     0.694         0.980
CUU     0.571         0.724        UUU     0.862         0.884

In all these comparisons, the magnitude of the difference is not a significant factor, since the scaling of the measures is arbitrary.
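Because only the direction of each difference is interpreted, the reading of Table 8 reduces to a sign comparison between the given and random sequences. A minimal sketch, with the values copied from Table 8 and the helper names chosen for illustration:

```python
# Table 8 rows: (given Volkenstein, given Goodman, random Volkenstein, random Goodman)
TABLE_8 = {
    "Pseudo-hemoglobin": (0.707, 0.708, 0.711, 0.715),
    "Alpha-hemoglobin": (0.687, 0.694, 0.693, 0.703),
    "Beta-hemoglobin": (0.701, 0.690, 0.705, 0.700),
    "Cytochrome c": (0.756, 0.739, 0.751, 0.733),
    "Interferon": (0.755, 0.781, 0.752, 0.779),
}

def direction(given, random):
    """Sign of the difference only; magnitudes depend on arbitrary scaling."""
    if given > random:
        return "higher"
    if given < random:
        return "lower"
    return "equal"

def compare(table):
    """Direction of given vs. random irreplaceability under both measures."""
    return {
        name: (direction(gv, rv), direction(gg, rg))
        for name, (gv, gg, rv, rg) in table.items()
    }

for name, (volk, good) in compare(TABLE_8).items():
    print(f"{name:18s} Volkenstein: {volk:6s} Goodman: {good}")
```

Both measures agree in every row: the three hemoglobin groups fall below their random counterparts, while cytochrome c and interferon lie above them.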

TABLE 8

Protein irreplaceability scores

                    Irreplaceability of        Irreplaceability of
                    given sequence             random sequence(a)
Sequence            Volkenstein   Goodman      Volkenstein   Goodman
Pseudo-hemoglobin   0.707         0.708        0.711         0.715
Alpha-hemoglobin    0.687         0.694        0.693         0.703
Beta-hemoglobin     0.701         0.690        0.705         0.700
Cytochrome c        0.756         0.739        0.751         0.733
Interferon          0.755         0.781        0.752         0.779

(a) A random sequence is a neutral sequence of codons containing the same number of codons as the given sequence but with an even distribution among silent mutations.

In sum, this investigation revealed that the evolution of codon sequences does not progress in a purely random manner. There is an extremely high degree of bias in the distribution of different codon sequences coding for the same amino acid. This is true for each of the amino acid types in each of the proteins examined. It is possible that these biases are a consequence of past evolutionary accidents. However, this cannot be the major cause, since the biases appear to be fixed over time, an unexpected feature if silent mutations are neutral to natural selection. A second possibility is that codon usage is selected to optimize the rate of protein synthesis by matching tRNA frequencies in the cell. Aside from possible tissue-specific effects, the difference in codon distribution among genes coding for different proteins is incompatible with this hypothesis. It is still possible that codon usage serves as a means of controlling the degree of gene expression: genes whose codon distributions more closely match tRNA distributions would be more strongly expressed. The data currently available are consistent with the hypothesis that codon usage is correlated with the amenability of a protein to evolution in those cases where the codons coding for the same amino acid have different replaceabilities. In this case the selective effects are mediated by an indirect mechanism sometimes called hitchhiking (cf. Conrad, 1982). Codons having an enhancing effect on the rate of evolution could hitchhike along with advantages at the level of the individual organism whose likelihood they increase.

Acknowledgement

M.C. acknowledges partial support from the National Science Foundation (Grant No. MCS-82-05423).

References

Chou, P.Y. and Fasman, G.D., 1974, Conformational parameters for amino acids in helical, beta-sheet and random coil regions calculated from proteins. Biochemistry 13, 211-244.

Conrad, M., 1977, Evolutionary adaptability of biological macromolecules. J. Mol. Evol. 10, 87-91.

Conrad, M., 1978, Evolution of the adaptive landscape, in: Theoretical Approaches to Complex Systems, R. Heim and G. Palm (Eds.) (Springer, New York).

Conrad, M., 1979a, Bootstrapping on the adaptive landscape. BioSystems 11, 167-182.

Conrad, M., 1979b, Mutation-absorption model of the enzyme. Bull. Math. Biol. 41, 387-405.

Conrad, M., 1982, Natural selection and the evolution of neutralism. BioSystems 15, 83-85.

Conrad, M., 1983, Adaptability: The Significance of Variability from Molecule to Ecosystem (Plenum Press, New York).

Conrad, M. and Rizki, M.M., 1980, Computational illustrations of the bootstrap effect. BioSystems 13, 57-64.

Conrad, M. and Volkenstein, M.V., 1981, Replaceability of amino acids and the self-facilitation of evolution. J. Theor. Biol. 92, 293-299.

Dayhoff, M.O., Chen, H.R., Hunt, L.T., Barker, W.C., Yeh, L.S., George, D.G. and Orcutt, B.C., 1981, Nucleic Acid Sequence Database (Georgetown University Medical Center, National Biomedical Research Foundation, Washington).

Goodman, M. and Moore, G.W., 1977, Use of Chou-Fasman amino acid conformational parameters to analyze the organization of the genetic code and to construct protein genealogies. J. Mol. Evol. 10, 7-47.

Goodman, M., 1981, Decoding the pattern of protein evolution. Prog. Biophys. Mol. Biol. 38, 105-164.

Grantham, R., Gautier, C., Gouy, M., Mercier, R. and Pavé, A., 1980, Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49-r62.

Volkenstein, M.V., 1979, Mutations and the value of information. J. Theor. Biol. 80, 155-169.

Zuckerkandl, E. and Pauling, L., 1965, Evolutionary divergence and convergence in proteins, in: Evolving Genes and Proteins, V. Bryson and H.J. Vogel (Eds.) (Academic Press, New York) p. 97.