
Low Codon Bias and High Rates of Synonymous Substitution in Drosophila hydei and D. melanogaster Histone Genes¹

David H. A. Fitch* and Linda D. Strausbaugh† *Department of Molecular Genetics, Albert Einstein College of Medicine; and †Department of Molecular and Cell Biology, The University of Connecticut

We have evaluated codon usage bias in Drosophila histone genes and have obtained the nucleotide sequence of a 5,161-bp D. hydei histone gene repeat unit. This repeat contains genes for all five histone proteins (H1, H2a, H2b, H3, and H4) and differs from the previously reported one by a second EcoRI site. These D. hydei repeats have been aligned to each other and to the 5.0-kb (i.e., long) and 4.8-kb (i.e., short) histone repeat types from D. melanogaster. In each species, base composition at synonymous sites is similar to the average genomic composition and approaches that in the small intergenic spacers of the histone gene repeats. Accumulation of synonymous changes at synonymous sites after the species diverged is quite high. Both of these features are consistent with the relatively low codon usage bias observed in these genes when compared with other Drosophila genes. Thus, the generalization that abundantly expressed genes in Drosophila have high codon bias and low rates of silent substitution does not hold for the histone genes.

Introduction

In most sequenced genes, codons within synonymous groups are not utilized with equal frequency (Ikemura 1985; Sharp and Li 1986; Shields et al. 1988). Depending on the organism, this bias has been correlated with variation in either presumed mutational biases or selectional constraints as reflected by local G+C nucleotide composition (Aota and Ikemura 1986; Bernardi and Bernardi 1986; Shields et al. 1988; Wolfe et al. 1989) or with presumed fitness differences among synonymous codons (Gouy and Gautier 1982; Ikemura 1985; Bulmer 1987; Shields et al. 1988). For example, in human genes, the G+C composition of synonymous sites correlates significantly with that in associated introns, suggesting that relative synonymous codon usage is not influenced by selectional differences between synonymous codons but reflects mutational forces governing local G+C nucleotide composition (Shields et al. 1988). Synonymous codon usage appears to be determined similarly in other mammals (Aota and Ikemura 1986; Bernardi and Bernardi 1986; Wolfe et al. 1989). However, in bacteria and unicellular eukaryotes, synonymous codon usage appears to be determined largely by selection for efficient translation, especially in highly expressed genes (Gouy and Gautier 1982; Ikemura 1985; Sharp et al. 1986; Bulmer 1987; Shields and Sharp 1987; Andersson and Kurland 1990).

1. Key words: synonymous codon usage, codon optimization, histone genes, Drosophila hydei, Drosophila melanogaster.

Address for correspondence and reprints: Linda D. Strausbaugh, Department of Molecular and Cell Biology, The University of Connecticut, Storrs, Connecticut 06268.

Mol. Biol. Evol. 10(2):397-413. 1993. © 1993 by The University of Chicago. All rights reserved. 0737-4038/93/1002-0011$02.00

In Drosophila melanogaster, genes with high codon usage bias have higher G+C composition at synonymous sites than do genes with low codon bias (Shields et al. 1988). In addition, no correlation exists between G+C composition at synonymous sites and that in associated introns. These observations suggest that codon bias in Drosophila is not simply the result of mutational biases (reflected by local G+C nucleotide composition) but may be influenced by selection (Shields et al. 1988) as hypothesized for Escherichia coli and Saccharomyces cerevisiae. One explanation for how selection may influence codon usage in these organisms and Drosophila but not in vertebrates is that the large effective population sizes of the former allow the slight fitness differences between synonymous codons to overcome genetic drift (Sharp and Li 1986; Bulmer 1987; Shields et al. 1988).

Two major hypotheses have been proposed to explain how fitness differences between synonymous codons might arise. First, selection drives tRNA abundance and codon frequencies to compatible quantities such that codons recognized by low-abundance tRNAs are infrequent. Second, there is a “preference” among the codons translated by the most abundant tRNA (Sharp and Li 1986; Bulmer 1987). Both hypotheses assume that inefficient translation elongation can influence an organism’s reproductive fitness. Although these assumptions have not been directly tested, the bias in synonymous codon usage correlates well with cognate tRNA abundance in prokaryotes and unicellular eukaryotes (reviewed in Ikemura 1985; Sharp et al. 1986; Andersson and Kurland 1990). Also, more highly expressed genes show greater codon usage bias (“optimization”) than do less abundantly expressed genes (Gouy and Gautier 1982; Ikemura 1985; Sharp and Li 1986, 1987; Sharp et al. 1986; Shields and Sharp 1987). Such a trend is also apparent in Drosophila (Shields et al. 1988).

To further explore this correlation between codon usage bias and expressivity, we have studied the nucleosomal core histone genes of D. hydei and D. melanogaster. In both species, moderately repetitive tandem units each contain five genes, encoding the linker protein H1 and the nucleosomal core histones H2a, H2b, H3, and H4 (Goldberg 1979; Fitch 1986; Fitch et al. 1990; Kremer and Hennig 1990). The genes are coordinately regulated and very abundantly translated, especially during embryogenesis and early development (Anderson and Lengyel 1984). We have found that the base composition at synonymous sites in Drosophila histone genes is similar to the average genomic composition and approaches that in the small intergenic spacers of the histone repeat. In addition, rates of synonymous substitution are relatively high, and codon usage bias is relatively low, compared with other Drosophila genes. These data do not support the hypothesis (Wells and Herrmann 1989) that codon usage bias in these highly expressed genes is optimized by selection for translation efficiency.

Material and Methods

DNA Sequencing

Drosophila hydei histone repeat plasmid DNA (pDhH5.1-1a; Fitch 1986) was purified in two CsCl-ethidium bromide isopycnic gradients (Maniatis et al. 1982, pp. 93-94). Chemical sequencing was performed according to a method described elsewhere (Chang and Slightom 1984). Dideoxy chain-termination sequencing was performed on double-stranded template by using T7 DNA polymerase (Sequenase, version 1.0) according to recommendations of the manufacturer (U.S. Biochemicals). Primers were synthesized by using an Applied Biosystems synthesizer and were based either on the highly conserved sequences (Wells and McBride 1989) of the D. melanogaster genes or on sequences obtained from earlier sets of reactions. All of the sequence except positions 3105-3199 (within H4) was determined from both strands.

Sequence Alignment and Analysis

Coding sequences and flanking conserved regulatory elements were aligned by eye, on the basis of alignments of the very highly conserved open reading frames. For the H1 genes, occasional stretches required gaps to improve the alignment, implying that some differences were due to frameshift changes. The very low level of interspecific sequence similarity in most regions of the intergenic spacers precluded a meaningful alignment. Nevertheless, alignments were made by using a dot matrix approach to identify possible conserved elements (EMBL accession no. DS8200). Similar alignments were independently obtained by Kremer and Hennig (1990).

Synonymous and nonsynonymous sites and corresponding base compositions were determined according to the method of Nei and Gojobori (1986), except that initiation and termination codons were excluded. Divergences between histone genes were estimated from numbers of synonymous substitutions, according to the method of Li et al. (1985) or Lewontin (1989). Intergenic spacer divergences were determined by the method of Jukes and Cantor (1969) and counted indels (gaps) at single positions as single events, regardless of length; these divergences were consistently larger than those obtained (data not shown) by the two-parameter method of Kimura (1980), which does not consider indels.
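The Jukes and Cantor (1969) correction and the Kimura and Ohta (1972) standard deviation mentioned above can be sketched as follows. This is a minimal illustration in Python, not the CODEVOLV program; the function names are hypothetical, and the inputs (41% observed difference over 296 compared positions) are illustrative values chosen only to show the arithmetic.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected divergence for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jukes_cantor_sd(p, n_sites):
    """Large-sample standard deviation of the corrected divergence."""
    variance = p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)
    return math.sqrt(variance)

# Illustrative input (assumed): 41% observed difference over 296 positions
d = jukes_cantor(0.41)
sd = jukes_cantor_sd(0.41, 296)
print(f"{d:.2f} +/- {sd:.2f}")  # prints 0.59 +/- 0.06
```

The correction grows rapidly as the observed difference approaches 75%, which is why spacer differences near 50% translate into corrected divergences with wide standard deviations.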

Codon bias was calculated according to methods described elsewhere [i.e., relative synonymous codon usage (RSCU) and codon “adaptation” index (CAI); Sharp and Li 1986; Shields and Sharp 1987 (χ²/n); Shields et al. 1988]. The CAI was originally designed to measure how similar a particular codon bias pattern is to a reference pattern of genes that are “very highly expressed” (Sharp and Li 1986). Because CAI is actually based on codon bias and is unrelated to actual measures of fitness or expression, CAI is just a measure of bias similarity; the higher the bias in the reference set, the more discriminating the index (D. H. A. Fitch, unpublished data). A computer program, CODEVOLV, used for most calculations presented in the present paper, is available by sending a formatted IBM DOS-compatible 3.5-inch or 5.25-inch diskette to D.H.A.F. (specify whether you have a math chip).
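As a rough sketch of how a CAI-type index works, the Python below derives the relative adaptiveness w of each codon from a reference set of RSCU values and takes the geometric mean of w over the codons of a gene. The function names, the two-group reference set, and the toy gene are assumptions made for illustration; this is not the CODEVOLV implementation, and codons with a reference RSCU of zero would need a small substitute value before taking logarithms.

```python
import math

def relative_adaptiveness(reference_rscu_by_group):
    """Map codon -> w, where w = RSCU / (largest RSCU in the same synonymous group)."""
    w = {}
    for group in reference_rscu_by_group:
        top = max(group.values())
        for codon, value in group.items():
            w[codon] = value / top
    return w

def cai(codons, w):
    """Geometric mean of w over the codons of a gene (codons absent from w are skipped)."""
    logs = [math.log(w[c]) for c in codons if c in w]
    return math.exp(sum(logs) / len(logs))

# Hypothetical reference set with two synonymous groups (Phe and Tyr)
reference = [{"TTT": 0.16, "TTC": 1.84}, {"TAT": 0.37, "TAC": 1.63}]
w = relative_adaptiveness(reference)
gene = ["TTC", "TTT", "TAC", "TAC"]  # made-up toy gene
print(round(w["TTT"], 2), round(cai(gene, w), 2))
```

Because w is defined relative to the reference set, the index only says how closely a gene follows the reference pattern, which is the sense in which it measures bias similarity.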

Results and Discussion

Comparisons between Drosophila hydei and D. melanogaster Histone Repeat Units

Kremer and Hennig (1990) published the nucleotide sequence of a histone gene repeat from a different D. hydei strain. Genomic blots show that the histone gene arrays in both strains contain two major types of repeats that grossly differ in the presence or absence of a second EcoRI site (Fitch 1986; Kremer and Hennig 1990). Since the repeat of Kremer and Hennig (1990) lacks this second site, and since our repeat bears this site in the H4-H2a intergenic spacer, these clones represent these two major repeat types. We refer to the sequence of Kremer and Hennig (1990) as “Dhy 1E” and to our sequence as “Dhy 2E,” to reflect this difference. Comparison of Dhy 1E and Dhy 2E reveals that they differ at a total of 56 (1.09%) of 5,158 shared nucleotide positions [3 nonsynonymous transversions, 1 synonymous transversion, and 1 synonymous transition in the H1 gene; 4 synonymous transitions and 1 synonymous transversion in the nucleosomal genes; and 15 transitions, 18 transversions, and 13 indels (one encompassing an EcoRI site) in noncoding regions].
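The tally in the preceding paragraph can be checked with a few lines of Python. The category labels are paraphrased from the text; only the counts and the 5,158 shared positions come from the source.

```python
# Differences between Dhy 1E and Dhy 2E, as itemized in the text
differences = {
    "H1 nonsynonymous transversions": 3,
    "H1 synonymous transversions": 1,
    "H1 synonymous transitions": 1,
    "nucleosomal synonymous transitions": 4,
    "nucleosomal synonymous transversions": 1,
    "noncoding transitions": 15,
    "noncoding transversions": 18,
    "noncoding indels": 13,
}
shared_positions = 5158

total = sum(differences.values())
percent = 100.0 * total / shared_positions
print(total, f"{percent:.2f}%")  # prints 56 1.09%
```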


Hybridization matrices show that the patterns of histone gene organization in D. hydei and D. melanogaster are colinear (Fitch 1986), despite a high average sequence divergence of 45% ± 0.2% (corrected for superimposed substitutions; Jukes and Cantor 1969; Kimura and Ohta 1972). The nucleotide sequence alignment (not shown) shows that the size difference between Dhy 2E (5.1 kb) and D. melanogaster repeat Dme S (short) (4.8 kb) is due to differences in the sizes of each intergenic spacer (table 1). The D. hydei-D. melanogaster sequence differences within each spacer (table 1) approach those expected for random sequences (Doolittle 1986, p. 5). Conserved segments are nevertheless apparent in these spacers; most define known regulatory elements (Kremer and Hennig 1990). The D. melanogaster 5.0-kb repeat [Dme L (long)] differs from the 4.8-kb repeat (Dme S) by a 244-bp insertion of a tRNA-related element (Matsuo and Yamazaki 1989b) into the large H1-H3 intergenic spacer; this insert is absent in D. hydei. In the H1-H3 spacer, the amount of interspecific identity at shared A+T-rich positions is 77%, notably higher than the overall identity of 51%-52% in this spacer. This relatively high identity may be due to the conserved A+T-rich tracts that encompass at least one of the scaffold-attachment regions in the D. melanogaster spacer (Mirkovitch et al. 1984; Gasser and Laemmli 1986), although the proposed element itself is not highly conserved in D. hydei. We suggest that the combination of these conserved features indicates a functional role for some sequences within the H1-H3 noncoding region.

Table 1 Comparisons between Noncoding Regions of Drosophila melanogaster and D. hydei Histone Repeats

Values are given for each intergenic spacer [a].

Parameter and repeat      H1-H3           H3-H4           H4-H2a          H2a-H2b         H2b-H1

Spacer length:
  Dme L                   1,399 nt        296 nt          474 nt          226 nt          406 nt
  Dme S                   ~1,195 nt [b]   296 nt          ~469 nt [b]     221 nt          407 nt
  Dhy 2E                  1,254 nt        315 nt          603 nt          247 nt          505 nt
  Dhy 1E                  1,253 nt        316 nt          607 nt          247 nt          509 nt
G+C composition:
  Dme L                   28%             40%             33%             41%             33%
  Dme S                   ~27% [b]        40%             ~32% [b]        41%             33%
  Dhy 2E                  27%             38%             31%             33%             41%
  Dhy 1E                  27%             38%             33%             34%             41%
Pairwise difference [c]:
  Dme L/Dme S             1.7%            0.0%            2.5%            4.9%            0.8%
  Dme L/Dhy 2E            48% [d]         41%             44%             46%             43%
  Dme L/Dhy 1E            49%             42%             44%             47%             44%
  Dme S/Dhy 2E            48%             41%             52%             46%             43%
  Dme S/Dhy 1E            48%             42%             53%             47%             44%
  Dhy 2E/Dhy 1E           1.1%            1.0%            2.6%            0.8%            2.2%
Pairwise divergence ± SD [e]:
  Dme L/Dme S             1.7% ± 0.7%     0.0% ± 0.0%     2.6% ± 1.3%     5.1% ± 1.5%     0.8% ± 0.5%
  Dme L/Dhy 2E            78% ± 4%        60% ± 6%        66% ± 6%        72% ± 9%        64% ± 6%
  Dme L/Dhy 1E            79% ± 4%        61% ± 6%        66% ± 6%        73% ± 9%        66% ± 6%
  Dme S/Dhy 2E            76% ± 6%        60% ± 6%        88% ± 13%       72% ± 8%        64% ± 6%
  Dme S/Dhy 1E            78% ± 7%        61% ± 6%        93% ± 14%       73% ± 9%        67% ± 6%
  Dhy 2E/Dhy 1E           1.1% ± 0.3%     1.0% ± 0.6%     2.7% ± 0.7%     0.8% ± 0.6%     2.2% ± 0.7%

[a] Spacers are designated by their flanking genes.
[b] Value is estimated from incompletely sequenced regions of the Dme S repeat (Goldberg 1979); over regions with missing data, estimates were based on the Dme L sequence.
[c] Values are based only on shared positions and include not only substitutions but also indels, counted as single events at single positions, regardless of length.
[d] The difference between Dme L and Dhy 2E at shared positions containing A or T is 23%, much lower than that for the total spacer.
[e] Divergence values were corrected for superimposed substitutions according to the method of Jukes and Cantor (1969); SD was determined according to the method of Kimura and Ohta (1972).

Nonsynonymous positions in regions coding for the nucleosomal core show very few differences (KA; table 2). This result is not surprising, since these proteins, especially H4, are among the most conserved proteins known (Hunt and Dayhoff 1982). However, there is considerable divergence in the linker histone, H1, at both the nucleotide level and the amino acid level (table 2). The 69 replacement differences are not randomly distributed; the largest conserved stretch is also conserved across large phylogenetic distances (Wells and McBride 1989). The H1 genes of D. hydei and D. melanogaster also differ in size by six codons (table 2) and several frameshift differences.

Base Composition at Synonymous Sites

In a survey of D. melanogaster genes, Shields et al. (1988) found considerable variation in (G+C)s, i.e., the G+C composition of synonymous sites. Unlike the situation in vertebrate genes, (G+C)s in D. melanogaster does not correlate with either (G+C)i, i.e., the G+C composition of introns, or (G+C)n, i.e., the composition of nonsynonymous sites. However, (G+C)s in D. melanogaster genes correlates significantly with the degree of codon bias (Shields et al. 1988). Genes with higher codon bias have higher (G+C)s than do genes with less codon bias (table 3). The (G+C)s in more highly biased genes also differs more from both the (G+C)i and (G+C)n in the same gene than in genes with less bias (Shields et al. 1988).

In histone genes of both D. hydei and D. melanogaster, (G+C)s values are lower than the average for low-bias D. melanogaster genes and differ greatly from the average value for high-bias genes (table 3). Furthermore, (G+C)s values are close to (G+C)n values (table 3) and approach the composition of the intergenic spacers (table 1). From another viewpoint, the (G+C)s values of histone genes from both species are close to the presumed equilibrium composition of their respective genomes [40% and 44% G+C for D. melanogaster and D. hydei, respectively (Hess 1986; Shapiro 1976)]. Histone gene (G+C)s values in both species approach the average (G+C)i of D. melanogaster introns, ~37% (Shields et al. 1988). In fact, (G+C)4, the composition at fourfold-degenerate sites, is slightly lower than (G+C)s in the histone genes (table 3). Since fourfold-degenerate sites are probably under less constraint than are twofold-degenerate sites, which are included in the (G+C)s value, this result suggests that G+C composition tends to approach the noncoding average composition when constraints are relaxed.
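To make the (G+C)4 quantity concrete, the sketch below counts G or C at the third position of codons belonging to the eight fourfold-degenerate families of the standard genetic code. It is a simplified illustration, not the Nei and Gojobori (1986) site-counting procedure used in the paper; the function name and the toy sequence are assumptions.

```python
# First two bases of the eight codon families in the standard genetic code
# whose third position is fourfold degenerate (Leu CTN, Val GTN, Ser TCN,
# Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def gc4(coding_sequence):
    """Fraction of G or C at the third position of fourfold-degenerate codons."""
    seq = coding_sequence.upper()
    thirds = [
        seq[i + 2]
        for i in range(0, len(seq) - len(seq) % 3, 3)
        if seq[i:i + 2] in FOURFOLD_PREFIXES
    ]
    if not thirds:
        raise ValueError("no fourfold-degenerate codons found")
    return sum(base in "GC" for base in thirds) / len(thirds)

# Made-up toy sequence: GCT GGC AAA CTG ACA
print(gc4("GCTGGCAAACTGACA"))  # prints 0.5
```

In the toy sequence, AAA is skipped because Lys codons are only twofold degenerate, so four codons contribute and two of them end in G or C.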

Codon Usage Bias in Drosophila Histone Genes

In an analysis of codon bias in D. melanogaster, Shields et al. ( 1988) excluded genes encoding proteins with highly biased amino acid compositions (i.e., with high representation by one amino acid). However, such bias constitutes no a priori reason for excluding sequences that have significant representation of the other amino acids (e.g., if the sequences are long enough). Also, the metrics used in both their study and ours are relatively insensitive to differential representation among different syn- onymous codon groups, as long as each group is represented by a reasonable number

Page 6: Low Codon Bias and High Rates of Synonymous Substitution ......divergence of 45% + 0.2% (corrected for superimposed substitutions; Jukes and Cantor 1969; Kimura and Ohta 1972). The

Table 2 Comparisons between Drosophila hydei and D. melunoguster Histone Coding Regions

Gene Size” (nt) Indels f @(O/O) Rh KAi (SE’) Sk KS’ (SE’) Kd”’ (SE’)

HI . . H2a . . H2b . . . H3 . . . . H4 . . . . Cores’ . . Group I Group II Group III

. . . 768 (256)“, 747 (249)o 99 105 1.08 0.79 16 29.7 69 0.2 I (0.02) 123 1.50 (0.25) 1.17 (0.21) . . 372 (124) 42 23 2.07 1.67 0 17.5 0 0.01 (0.01) 63 1.36 (0.33) 1.25 (0.39)

. . 369 (123) 40 18 2.49 1.69 0 15.7 1* <o.o 1 57 1.40 (0.34) 1.24 (0.44)

. . 408 (136) 41 27 1.68 0.91 0 16.7 19 0.01 (0.01) 65 1.19 (0.22) 1.01 (0.23) . . 309 (103) 22 20 1.17 0.60 0 13.6 0 0.00 42 0.83 (0.16) 0.74 (0.17)

. . . 1,458 (486) 145 88 1.83 1.11 0 16.0 2 <o.o 1 227 1.17 (0.12) 1.02 (0.13)

. . . . . . . . . . . * * . . . . . . . . . * . . . . . . . . . 0.89 (0.07)” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.34 (0.06)’ . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.05 (0.00)” . . .

NOTE.-Sequences used in this comparison are from the Dhy 2E and Dme L repeats. a Does not include termination codons. Nos. in parentheses are no. of amino acids. b No. of total observed transitional differences. c No. of total observed transversional differences. d Ratio of total no. of observed transitional differences to total no. of observed transversional differences. ’ Ratio of estimated synonymous transitional changes to synonymous transversional changes occurring only at fourfold synonymous sites, where, presumably, both transitions and transversions

are equally unconstrained (Li et al. 1985). ‘No. of indel (gap) differences, as suggested by the alignment. g Percent difference (i.e., the proportion of differences between the sequences over shared positions X 100) when each indel is counted as a single change at a single site, regardless of length, and

is not corrected for superimposed substitutions. h No. of observed amino acid replacements. i No. of nonsynonymous substitutions estimated per nonsynonymous site (Li et al. 1985). j Standard error. Ir Total no. of synonymous substitutions. ' No. of synonymous substitutions estimated per synonymous site (Li et al. 1985). m No. of synonymous substitutions estimated per fourfold-degenerate site (Li et al. 1985). n Size of Dme L H 1 coding region only. The alignment proposes that Dhy 2E and Dme L H 1 genes share 74 1 nucleotide (and indel) positions; this number is used for the calculation of d between

Dme L and Dhy 2E Hl sequences. o Size of Dhy 2E H 1 coding region only (see previous footnote). P The corresponding nucleotide differences occur at alignment positions 4901 and 4902; Dme L encodes Thr and Dhy 2E encodes Asn. q The corresponding nucleotide difference occurs at alignment position 2567; Dme L encodes Ile, and Dhy 2E encodes Val. r Values are determined for all of the nucleosomal core histone genes, concatenated. s Mean value, from a comparison between seven group I genes (including H&2) with low levels of synonymous substitution and from species of subgenera Drosophila and Sophophoru (Moriyama

and Gojobori 1992) the subgenera to which, respectively, D. hydei and D. melanogaster belong (Throckmorton 1975). t Mean value, from comparisons between five group II genes with moderate to high levels of synonymous substitution and from species of subgenera Sophophoru and Drosophila (Moriyama and

Gojobori 1992). ” From a comparison between the P2-tub&r genes from D. melunogasfer and D. hydei, representing the group III genes with very high levels of synonymous substitution (Moriyama and Gojobori 1992).

Page 7: Low Codon Bias and High Rates of Synonymous Substitution ......divergence of 45% + 0.2% (corrected for superimposed substitutions; Jukes and Cantor 1969; Kimura and Ohta 1972). The

Low Codon Bias in Drosophila Histone Genes 403

Table 3 Histone Gene Base Composition

Repeat and Gene (G+C)s= G+C),b G+C),’

@) (%) VJ)

Dme L. HI ............ H2a ........... H2b ........... H3 ............ H4 ............ Coresd .........

Dme S:” Hl ............ H2a ........... H2b ........... H3 ............ H4 ............ Coresd .........

Dhy 2E: HI ............ H2a ........... H2b ........... H3 ............ H4 ............ Coresd .........

Dhy 1E: Hl ............ H2a ........... Hlb ........... H3 ............ H4 ............ Coresd .........

Other genes:f 15Highbias .... 15 Medium bias . 15Lowbias ....

TESTSEQp ...

45.5 43.4 48.4 47.7 43.7 55.7 57.8 53.7 47.8 55.2 52.7 56.4 52.1 50.0 55.1 53.1 49.8 53.8

45.5 43.4 48.4 47.0 42.9 55.7 57.9 53.8 48.3 53.5 50.7 56.4 49.1 46.5 56.4 51.9 48.3 54.2

36.9 32.4 48.7 45.3 41.8 54.3 50.8 44.2 46.6 42.2 37.0 56.3 38.9 35.6 52.9 44.3 39.4 52.6

37.1 32.4 48.8 43.1 38.8 54.3 50.0 44.2 46.5 41.6 37.0 56.2 38.9 35.6 52.9 43.3 38.6 52.5

80.0 zk 3.9 75.3 k 4.5 62.1 f 7.0 49.2

. . .

. . .

. . . 50.0

48.4 f 4.9 47.6 + 4.2 48.1 -t 3.0 52.0

’ G+C composition of synonymous positions. b G+C composition of fourfold-degenerate positions. ’ G+C composition of nonsynonymous positions. d Values for the nucleosomal core histone genes, concatenated. e Composition information for these genes is based on incomplete sequences (Goldberg 1979;

Drosophila melanogaster H 1 sequence is from Murphy and Blumenfeld 1986). f Data are mean and SD values for three groups of I5 genes each, determined by Shields et al.

(1988) to have high, medium, or low codon bias, depending on their rank in a correspondence analysis of RSCU values.

g A hypothetical sequence with zero bias (equal usage of synonymous codons).

of codons. Because expression and function of the histone genes are coordinated, and because separation of the coding regions by intergenic spacers is somewhat analogous to an exon-intron arrangement, we have considered them as a single functional unit for measuring codon bias. Several frameshift and nonsynonymous differences have occurred between the D. hydei and D. melanogaster Hl genes; associated shifts in codon usage would reflect these mutational differences rather than possible fitness

Page 8: Low Codon Bias and High Rates of Synonymous Substitution ......divergence of 45% + 0.2% (corrected for superimposed substitutions; Jukes and Cantor 1969; Kimura and Ohta 1972). The

Table 4 Codon Usage in Nucleosomal Core Histone Genes and Other Drosophila Genes

AMINO ACID AND CODON

NUCLEOSOMAL CORE HISTONE GENES

Dme L Dhy 2E

Usage RSCU Usage RSCU

OTHER Drosophila GENES’

Medium Low High Bias Bias Bias

- -

RSCU WY RSCU RSCU

Phe: TTT ....... TTC .......

Leu: TTA ....... TTG ....... CTT ....... CTC ....... CTA ....... CTG .......

Ile: ATT ....... ATC ....... ATA .......

Val: GTT ....... GTC ....... GTA ....... GTG .......

Ser: TCT ....... TCC ....... TCA ....... TCG ....... AGT ....... AGC .......

Pro: CCT ....... ccc ....... CCA ....... CCG .......

Thr: ACT ....... ACC ....... ACA ....... ACG .......

Ala: GCT ....... GCC ....... GCA ....... GCG .......

Tyr: TAT ....... TAC .......

His: CAT ....... CAC .......

3 0.67 4 0.89 0.16 0.09 0.40 0.84 6 1.33 5 1.11 1.84 1 .oo 1.60 1.16

2 0.29 1 0.15 0.05 0.01 0.08 0.37 12 1.76 20 2.93 0.68 0.16 1.03 1.09

3 0.44 9 1.32 0.25 0.06 0.33 0.61 7 1.02 1 0.14 0.75 0.18 1.03 0.89 3 0.44 3 0.44 0.12 0.03 0.33 0.60

14 2.05 7 1.02 4.14 1 .oo 3.22 2.44

12 1.24 12 1.29 0.67 0.29 0.76 1.16 12 1.24 16 1.71 2.33 1 .oo 1.92 1.21

5 0.52 0 0.00 0.00 0.002 b 0.33 0.63

12 1.66 12 1.60 0.62 0.34 0.42 0.78 4 0.55 10 1.33 1.46 0.80 1.29 0.97 3 0.41 0 0.00 0.10 0.06 0.28 0.37

10 1.38 8 1.07 1.82 1.00 2.00 1.88

4 1.04 4 1.04 0.64 0.24 0.27 0.56 2 0.52 1 0.26 2.69 1 .oo 1.89 1.29 1 0.26 4 1.04 0.11 0.04 0.23 0.62 5 1.30 3 0.78 1.21 0.45 1.33 1.29 4 1.04 4 1.04 0.02 0.01 0.63 0.96 7 1.83 7 1.83 1.34 0.50 1.65 1.28

8 2.00 7 1.75 0.41 0.15 0.39 0.43 3 0.75 4 1 .oo 2.71 1 .oo 1.68 1.12 3 0.75 3 0.75 0.61 0.23 0.69 0.92 2 0.50 2 0.50 0.27 0.10 1.25 1.53

6 0.77 12 1.60 0.5 1 0.16 0.42 0.66 17 2.19 6 0.80 3.24 1 .oo 1.77 1.45 3 0.39 9 1.20 0.07 0.02 0.68 0.69 5 0.65 3 0.40 0.18 0.06 1.13 1.20

25 20

5 5

3 12

5 4

1.82 24 2.18 0.88 0.32 0.56 0.87 1.45 19 1.73 2.77 1 .oo 2.09 1.76 0.36 10 0.91 0.17 0.06 0.49 0.74 0.36 2 0.18 0.18 0.07 0.85 0.62

0.40 1.60

0.67 0.37 0.23 0.46 1.33 1.63 1 .oo 1.54

1.11 0.89

5 10

5 4

1.11 0.31 0.18 0.63 0.89 1.69 1 .oo 1.37

0.90 1.10

0.88 1.12

404

Page 9: Low Codon Bias and High Rates of Synonymous Substitution ......divergence of 45% + 0.2% (corrected for superimposed substitutions; Jukes and Cantor 1969; Kimura and Ohta 1972). The

Table 4 (Continued)

            NUCLEOSOMAL CORE HISTONE GENES      OTHER Drosophila GENES (a)
                     Dme L          Dhy 2E       High Bias  Medium     Low
AMINO ACID                                                    Bias    Bias
AND CODON    Usage    RSCU   Usage    RSCU    RSCU       w    RSCU    RSCU

Gln:
  CAA            9    1.06       6    0.71    0.14    0.08    0.35    0.66
  CAG            8    0.94      11    1.29    1.86    1.00    1.65    1.34
Asn:
  AAT            3    0.43       7    0.93    0.17    0.09    0.79    1.04
  AAC           11    1.57       8    1.07    1.83    1.00    1.21    0.96
Lys:
  AAA           12    0.41      17    0.59    0.07    0.04    0.34    0.67
  AAG           46    1.59      41    1.41    1.93    1.00    1.66    1.33
Asp:
  GAT            5    0.83       7    1.17    0.88    0.79    0.91    1.14
  GAC            7    1.17       5    0.83    1.12    1.00    1.09    0.86
Glu:
  GAA            9    0.75      17    1.42    0.18    0.10    0.31    0.68
  GAG           15    1.25       7    0.58    1.82    1.00    1.69    1.32
Arg:
  CGT           22    2.64      28    3.36    2.42    0.77    0.92    0.95
  CGC           18    2.16      15    1.80    3.13    1.00    2.66    1.69
  CGA            4    0.48       1    0.12    0.13    0.04    0.59    1.02
  CGG            2    0.24       1    0.12    0.00 0.005(b)   0.83    1.18
  AGA            1    0.12       2    0.24    0.03    0.01    0.33    0.50
  AGG            3    0.36       3    0.36    0.29    0.09    0.67    0.66
Gly:
  GGT           13    1.21      17    1.58    1.39    0.81    0.69    0.97
  GGC           15    1.40      17    1.58    1.72    1.00    1.99    1.47
  GGA           15    1.40       9    0.84    0.89    0.52    1.04    1.21
  GGG            0    0.00       0    0.00    0.00 0.003(b)   0.28    0.36
Cys:
  TGT            1    2.00       1    2.00    0.11    0.06    0.38    0.65
  TGC            0    0.00       0    0.00    1.89    1.00    1.62    1.35
Met (c):
  ATG           10     ...      10     ...     ...     ...     ...     ...
Trp (c):
  TGG            0     ...       0     ...     ...     ...     ...     ...
Ter (c):
  TAA            4    3.00       4    3.00    2.80     ...    2.00    1.25
  TAG            0    0.00       0    0.00    0.20     ...    0.75    1.00
  TGA            0    0.00       0    0.00    0.00     ...    0.25    0.75

(a) Values are from three groups of 15 genes each, determined by Shields et al. (1988) to have high, medium, or low codon bias, depending on their rank in a correspondence analysis of RSCU values.

(b) For calculating w, a value of 0.50 is arbitrarily assigned to the observed usage of codons that are not represented in the high-bias sample, as suggested by Sharp and Li (1987).

(c) Codon was eliminated from calculations of the codon-bias indices.


differences between synonymous codons. Codon bias is therefore presented only for the highly conserved core histones, but not for H1.

Codon bias in the core histone genes was measured first as the degree to which the frequency of codon synonyms differed from equality. RSCU values (Sharp et al. 1986) for each codon were calculated (table 4). The RSCU value for each codon in the histone genes can be compared with that calculated for the same codon in other Drosophila genes, since RSCU values are insensitive to either length or amino acid composition differences (Sharp and Li 1986). Table 4 shows that the codons of the core histone genes of both species have RSCU values that are closest to values from the medium- or low-bias genes of Shields et al. (1988). Consistent with decreased bias in these genes is the observation that, although D. melanogaster genes usually “prefer” NNC-type codons over their NNT synonyms (Shields et al. 1988), T-ending codons in the histone genes are about as frequent as their C-ending synonyms (table 4).
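The RSCU calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the function name `rscu` is hypothetical, and the only data used are the Gln and Arg codon counts for the Dme L core histones listed in table 4.

```python
def rscu(counts):
    """Return RSCU values for one synonymous codon group.

    counts maps each synonymous codon of a single amino acid to its
    observed count. RSCU is the observed count divided by the count
    expected if all synonyms were used equally.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("amino acid absent; RSCU is undefined")
    expected = total / len(counts)
    return {codon: n / expected for codon, n in counts.items()}


# Gln and Arg usage in the Dme L core histones, copied from table 4.
gln_dme = {"CAA": 9, "CAG": 8}
arg_dme = {"CGT": 22, "CGC": 18, "CGA": 4, "CGG": 2, "AGA": 1, "AGG": 3}

gln_rscu = rscu(gln_dme)
arg_rscu = rscu(arg_dme)
print({codon: round(value, 2) for codon, value in gln_rscu.items()})
print({codon: round(value, 2) for codon, value in arg_rscu.items()})
```

Rounded to two decimals, these values reproduce the Dme L RSCU column of table 4 for the two amino acids. Within a group the RSCU values sum to the number of synonyms, which is why they can be compared across genes of different length and amino acid composition.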

An index that measures average codon usage bias over an entire gene is χ²/n, i.e., a χ² statistic calculated for deviation from equal usage of codons within synonymous groups that is divided by the total number, n, of codons in the gene, less Trp, Met, and termination codons (Shields and Sharp 1987; Shields et al. 1988). The bias away from equal usage of synonymous codons is low to intermediate in Dme L core histones and is intermediate in the Dhy 2E core histones (table 5).
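As a rough sketch of how such an index could be computed from the description above, the following Python function sums the χ² contribution of each synonymous group and divides by the number of codons counted. The function name `chi2_per_codon` is an assumption, the caller is assumed to have already removed Met, Trp, and termination codons, and the example uses only the Gln and Lys counts from table 4, so its result is not comparable to the whole-gene values in table 5.

```python
def chi2_per_codon(groups):
    """Chi-square for deviation from equal synonym usage, divided by n.

    groups is a list of dicts, one per amino acid, mapping each
    synonymous codon to its observed count. Met, Trp, and stop codons
    are assumed to have been left out by the caller.
    """
    chi2 = 0.0
    n = 0
    for counts in groups:
        total = sum(counts.values())
        if total == 0:
            continue  # amino acid not used; contributes nothing
        expected = total / len(counts)
        chi2 += sum((obs - expected) ** 2 / expected for obs in counts.values())
        n += total
    if n == 0:
        raise ValueError("no codons to score")
    return chi2 / n


# Illustrative subset only: Gln and Lys counts for Dme L from table 4.
dme_subset = [{"CAA": 9, "CAG": 8}, {"AAA": 12, "AAG": 46}]
# A hypothetical unbiased group, analogous to an equal-usage test sequence.
unbiased = [{"CAA": 5, "CAG": 5}]

print(round(chi2_per_codon(dme_subset), 3))
print(chi2_per_codon(unbiased))
```

Equal usage of synonyms gives a value of zero, which is the behavior expected of an unbiased control sequence.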

In determining whether codon bias is due to selectional forces or mutational forces, the χ²/n value is of limited use, since both types of forces could cause nonrandomness of codon usage at quantitatively similar levels. Even genes with “low bias” (table 4) are significantly biased from equal usage of synonyms (G = 1,006, df = 41, P < 0.001). More likely than not, however, qualitatively different sets of codons would be favored by the different forces. We have calculated the CAI (Sharp and Li 1986, 1987) to measure the degree to which synonymous codon usage bias is both quantitatively and qualitatively similar to that of 15 highly biased D. melanogaster genes (Shields et al. 1988). Using this index, we find that the core histones from both species have codon biases that are qualitatively and quantitatively most like the D. melanogaster genes with the low or medium RSCU bias (table 5).
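A minimal sketch of a CAI-style calculation is given below, assuming the usual formulation in which each codon's relative adaptiveness w is its RSCU in the reference (high-bias) set divided by the largest RSCU in its synonymous group, and the index is the geometric mean of w over the codons of the gene. The function names are hypothetical, the reference RSCU values are the high-bias Gln and Lys entries of table 4, and the two-amino-acid example is for illustration only.

```python
import math


def relative_adaptiveness(reference_rscu):
    """w = RSCU / max RSCU within one synonymous group of the reference set."""
    best = max(reference_rscu.values())
    return {codon: value / best for codon, value in reference_rscu.items()}


def cai(codon_counts, w):
    """Geometric mean of w over all codons counted in the gene."""
    log_sum = 0.0
    n = 0
    for codon, count in codon_counts.items():
        if count == 0:
            continue
        if w[codon] <= 0:
            # Zero reference usage needs a pseudo-count first (see table 4, footnote b).
            raise ValueError("w must be positive for codon " + codon)
        log_sum += count * math.log(w[codon])
        n += count
    if n == 0:
        raise ValueError("no codons to score")
    return math.exp(log_sum / n)


# High-bias RSCU values for Gln and Lys, copied from table 4.
w = {}
w.update(relative_adaptiveness({"CAA": 0.14, "CAG": 1.86}))
w.update(relative_adaptiveness({"AAA": 0.07, "AAG": 1.93}))

dme_subset = {"CAA": 9, "CAG": 8, "AAA": 12, "AAG": 46}  # Dme L counts, table 4
optimal_only = {"CAG": 17, "AAG": 58}  # hypothetical gene using only favored codons

print(round(cai(dme_subset, w), 3))
print(cai(optimal_only, w))
```

A gene that uses only the codons favored in the reference set scores 1, and any use of disfavored synonyms pulls the index toward 0, so the index captures both how strong the bias is and whether it points in the same direction as the reference set.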

Last, codon biases of the D. melanogaster and D. hydei core histone genes (Dme L and Dhy 2E) were compared in a goodness-of-fit test, by using the null hypothesis that histone gene codon bias was the same in both species. Histone gene codon biases between the two species are significantly different (G = 120, df = 39, P < 0.001; two codon classes, GGG and TGC, were excluded because their expected frequencies were <1.0).
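The exact layout of the test is not spelled out here, so the following is only one plausible sketch: a G statistic for a two-row contingency table (species by codon) within a single synonymous group, using G = 2 Σ O ln(O/E). The function name `g_statistic` is hypothetical, and the example uses the Glu counts for Dme L and Dhy 2E from table 4.

```python
import math


def g_statistic(counts_a, counts_b):
    """G statistic and degrees of freedom for a 2 x k table of codon counts.

    Expected counts come from the row and column totals, i.e., the null
    hypothesis is that both gene sets use the synonyms in the same
    proportions. Cells with an observed count of zero contribute nothing.
    """
    codons = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    g = 0.0
    for codon in codons:
        column = counts_a.get(codon, 0) + counts_b.get(codon, 0)
        for obs, row_total in ((counts_a.get(codon, 0), total_a),
                               (counts_b.get(codon, 0), total_b)):
            if obs > 0:
                expected = row_total * column / grand
                g += obs * math.log(obs / expected)
    return 2.0 * g, len(codons) - 1


# Glu codon counts from table 4.
glu_dme = {"GAA": 9, "GAG": 15}
glu_dhy = {"GAA": 17, "GAG": 7}

g, df = g_statistic(glu_dme, glu_dhy)
print(round(g, 2), df)
```

Summing such per-group statistics and their degrees of freedom over all amino acids is one way to obtain an overall comparison, but whether the published test was assembled this way is an assumption.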

Table 5
Codon Usage-Bias Indices for Histone Genes and Other Drosophila Genes

Genes                      CAI      χ²/n

15 High bias               0.723    0.886
15 Medium bias             0.442    0.417
15 Low bias                0.310    0.153
Dme L core histones        0.387    0.377
Dhy 2E core histones       0.350    0.440
TEST.SEQ (a)               0.186    0.000

(a) An artificial sequence with unbiased (equal) codon usage.

Rates of Synonymous Changes in Drosophila Histone Genes

Because Drosophila histone genes demonstrate comparatively low bias in codon usage (possibly because of relaxed selection at synonymous sites), we predicted that they should have high rates of synonymous changes. We estimated the numbers of synonymous changes occurring between Dme L and Dhy 2E by using two methods (Li et al. 1985; Lewontin 1989) that assume different evolutionary models, in order to compare the estimates and evaluate their robustness. When the first method (Li et al. 1985) is used, the KS value for the concatenated core histone genes of Dhy 2E and Dme L is 1.17 ± 0.12 (tables 2 and 6). For these genes, K4, the number of synonymous changes per fourfold synonymous site, is not significantly different from KS (tables 2 and 6).

In the second method (Lewontin 1989), the number of changes occurring per codon is estimated by assuming one of several different models. One model assumes that the equilibrium proportions of synonyms within a codon group equal the average codon usage observed between the two sequences. This model is inappropriate for the histone genes because the codon usage bias is significantly different between the two species (discussed above). A second model uses an assumed transition/transversion bias to calculate substitution probabilities. We estimated the ratio of transition/transversion probabilities (Ps/Pv) as the observed proportion of transitions and transversions at all sites (rT; table 2) or at fourfold synonymous sites (r4; table 2). This model is insensitive to a range of transition/transversion probability ratios (Lewontin 1989). The numbers of synonymous changes per codon that are estimated from this method were converted to K'S values (synonymous changes per synonymous site) by using the number of synonymous sites calculated according to the method of Li et al. (1985). When the second model was used, the K'S value obtained (with 95% confidence limits) was 1.20 (0.87-1.72) where Ps/Pv = rT = 1.83 and was 1.17 (0.87-1.68) where Ps/Pv = r4 = 1.11. The K'S and KS values are not significantly different.
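The observed ratio of transitions to transversions used for Ps/Pv can be tallied directly from an alignment. The sketch below is illustrative only: the function name `count_ts_tv` and the two short sequences are hypothetical, gaps and ambiguous bases are simply skipped, and no correction for multiple substitutions at a site is attempted.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


# Hypothetical aligned fragments, not taken from the histone repeats.
fragment_a = "ATGGCCAAGCGT"
fragment_b = "ATGGCTAAACGA"

ts, tv = count_ts_tv(fragment_a, fragment_b)
ratio = ts / tv if tv else float("inf")
print(ts, tv, ratio)
```

Restricting the same count to fourfold-degenerate third positions would give the analogue of r4 rather than rT.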

Conclusions and Speculations

As in unicellular organisms, genes for very abundant proteins show a high codon usage bias in Drosophila (Shields et al. 1988). Also, pairs of genes with known differences in relative expression levels differ in codon bias such that the more highly expressed gene has higher bias (Shields et al. 1988). This trend suggests that change at synonymous sites in highly expressed genes may be constrained in D. melanogaster, presumably by selection for translational efficiency (Shields et al. 1988).

The nucleosomal core histone genes of D. melanogaster and D. hydei stand in contrast to this trend. Evidence presented herein suggests that evolutionary changes at synonymous sites in these abundantly expressed genes are under constraints similar to those in genes showing low codon bias. The G+C composition at synonymous sites is considerably lower than that in genes with high bias, and often is lower than that in genes with low bias (see Shields et al. 1988). At these sites, the composition is similar to the species' average genomic equilibrium value and approaches that of the intergenic spacers. Core histone codon usage is qualitatively and quantitatively more like that in low- or medium-bias genes than like that in high-bias genes. Last, the number of synonymous changes between D. (Sophophora) melanogaster and D. (Drosophila) hydei histone genes is relatively high when compared with that in other genes shared between species of the same two subgenera.

Table 6
Divergence Matrix for Concatenated Coding Regions of Nucleosomal Histone Genes

           Dme L            Dme S            Dhy 2E           Dhy 1E

Dme L      ...              0.006 (0.004)    1.166 (0.116)    1.125 (0.111)

Dme S      0.004 (0.004)    ...              1.226 (0.130)    1.178 (0.123)
           0.001 (0.001)

Dhy 2E     1.018 (0.131)    1.042 (0.142)    ...              0.015 (0.007)
           0.005 (0.002)    0.007 (0.003)

Dhy 1E     0.986 (0.125)    1.006 (0.134)    0.012 (0.007)    ...
           0.005 (0.002)    0.007 (0.003)    0 (0)

NOTE: Above the diagonal, values are KS (SE); below the diagonal, the upper value of each pair is K4 (SE), and the lower value is KA (SE).

The observations that we have made for the histone genes are consistent with the general trends observed in these species, i.e., that the rate of synonymous change is inversely correlated with both the G+C content at synonymous positions of codons and the degree of codon bias (Sharp and Li 1989; Moriyama and Gojobori 1992). In a recent study (Moriyama and Gojobori 1992), the KS values of genes in a comparison of species belonging to the same subgenera suggested that these genes could be placed into three broad groups on the basis of the level of accumulation of synonymous changes (which should nevertheless be recognized as a continuum of values). The KS for the concatenated core histone genes of Dhy 2E and Dme L (table 2) is slightly greater than that for the engrailed genes (intermediate codon bias), is 60% greater than that for the high-bias Hsp82 genes (see Shields et al. 1988), and is closest to the mean KS of the “group II” genes with moderate to high levels of synonymous substitution (Moriyama and Gojobori 1992).

The high rate of synonymous change in the nucleosomal histone genes is even more striking in light of their very low rate of nonsynonymous change compared with that in other genes (Li et al. 1985). Rates of synonymous change are often lower in genic regions with lower rates of nonsynonymous change than they are in regions with higher nonsynonymous rates (Lipman and Wilbur 1985; Schaeffer and Aquadro 1987). This trend is also evident among the histone genes; H4 is the most conserved (there are no nonsynonymous substitutions between D. hydei and D. melanogaster) and has the lowest synonymous substitution rate. In a comparison between D. melanogaster and D. simulans, divergence at synonymous sites in the H3 gene was found to be greater than that in the Adh gene (Matsuo and Yamazaki 1989a), which has a nonsynonymous substitution rate considerably higher than that in the histones (see Schaeffer and Aquadro 1987). These results suggest that the synonymous rates in Drosophila histone genes are high for such conserved proteins and that codon optimization is probably not a major determinant of codon usage in these genes. Either the rule that codon usage is optimized in highly expressed genes is not generally true in Drosophila, or else the histone genes represent a special case. Here we present several arguments in support of the latter.

Tissue-specific or stage-specific tRNA pools could pose a selective constraint on codon usage of genes expressed at high levels. Such a case exists in the silk gland of Bombyx mori, in which fibroin and sericin are abundantly expressed and preferentially utilize codons recognized by abundant tRNAs (Garel 1974). However, D. melanogaster does not show major changes in relative tRNA abundances during development (White et al. 1973). Regardless of whether limiting tRNA pools occur in some temporal or spatial compartment in Drosophila development, we favor the interpretation that histones may escape such constraints, because of their somewhat atypical translational environments.

Histones are rapidly and abundantly translated early in embryogenesis (Anderson and Lengyel 1984). The very high level of de novo synthesis of histones during this period reflects (1) the requirement of histones and DNA in equal mass to form chromatin, (2) the extremely fast rate of DNA replication in the Drosophila embryo, a rate that is among the most rapid in any eukaryote, and (3) the lack of maternally stored histone protein (Anderson and Lengyel 1984). While effects due to selection on codon usage would be expected to occur at this time of maximal translation, it is unlikely that codon optimization plays a role in this environment. Early embryos are packed with maternally derived components for protein synthesis (Davidson 1986, p. 75 ff.), and we speculate that tRNA pools are not sufficiently limiting at this stage to have noticeable effects on translation efficiency.

In Drosophila, the histone genes expressed during early embryogenesis are the same ones expressed during S-phase of the cell cycle (Hampikian 1990). Expression of most other genes is repressed during S-phase (Edgar and Schubiger 1986). We speculate that histone message has relatively little competition for protein synthesis components at this time and that tRNA availability may not be a limiting factor for cell cycle-regulated histone translation (see Davidson 1986, p. 77 ff.).

The absence of selective constraint at synonymous sites is further supported by a possible correlation between G+C composition at these sites and the chronology of histone gene replication. Replication of the histone genes in D. melanogaster is restricted to the last fifth of S-phase (Grell 1978). During S-phase in mammalian cells, the pools of precursor nucleotides change from being abundant in G+C to being A+T rich (Leeds et al. 1985). Synonymous sites, and especially fourfold-degenerate sites, are much more A+T rich (especially T rich) in the histone genes than they are in most Drosophila genes (table 3; see Shields et al. 1988). If the composition of precursor nucleotide pools changes similarly in the Drosophila cell cycle and influences mutational direction, as has been proposed for mammals (Wolfe et al. 1989), then the codon usage of Drosophila histone genes would seem more likely to be determined by mutational forces than by selection for translational efficiency.
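Base composition at silent positions is often approximated by the G+C fraction at third codon positions. The short sketch below shows that approximation only; it is not the synonymous-site or fourfold-degenerate-site calculation behind table 3, the function name `gc3` is hypothetical, and the example sequence is invented.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    gc = sum(1 for base in third_positions if base in "GC")
    return gc / len(third_positions)


# Hypothetical four-codon sequence: ATG AAG GCT TAA
example_cds = "ATGAAGGCTTAA"
print(gc3(example_cds))
```

In practice the Met, Trp, and termination codons would usually be dropped before counting, since their third positions are not free to vary synonymously.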

Alternatively, selection pressure for translational efficiency may be just as great on the histone genes as it is on other highly expressed genes, but genetic drift may be greater. One genetic process that probably accelerates genetic drift and that is unique to duplicated and tandemly repeated genes is concerted evolution, in which repeats do not evolve independently but often share derived changes (Coen et al. 1982; Dover et al. 1982). Both the D. melanogaster and D. hydei histone genes are tandemly repeated and demonstrate concerted evolution (Lifton et al. 1978; Coen et al. 1982; Fitch 1986). Rapid rates of concerted evolution are indicated by the high similarities between paralogous repeats within each species. In a study of one D. melanogaster population, Matsuo and Yamazaki (1989a) found that interchromosomal divergence between histone repeats within a 1-kb segment (5' end of H4 through H3 and part of the large intergenic spacer) was approximately two to six substitutions. If we assume that this segment is representative with respect to substitutions, we would predict 10-30 substitutions over an entire 5-kb repeat. Similar comparisons from another D. melanogaster population reveal even greater levels of differences between paralogues (M. Bourke and L. D. Strausbaugh, unpublished data). Also, strains of D. melanogaster often differ in the presence, absence, or relative abundances of repeats with sizes of 4.0-5.5 kb (Strausbaugh and Weinberg 1982). The two D. hydei histone repeats compared in the present paper represent the only two different types of repeat to be distinguished so far in this species (both types being present in the two strains represented). They differ by a total of 56 substitutions plus indels and, in length, by only 8 bp. Because divergences between paralogues within these species are similar, rates of concerted evolution at the histone loci are probably similar in these species.

Concerted evolution (e.g., through gene conversion) has been implicated in causing both repeat identity (i.e., “homogenization” by conversion across an entire repeat sequence) and a greater diversity of repeats than would be possible by mutation alone (i.e., by conversion across small segments of divergent repeats) (Dover 1987; Basten and Ohta 1992). Together, these homogenization and diversification processes probably lead to rapid rates of fixation of different repeat types at different times, a process often referred to as “molecular drive” (Coen et al. 1982; Dover et al. 1982). We speculate that, in their genetic context as a tandemly repeated multigene family, Drosophila histone genes probably undergo fundamentally different evolutionary dynamics than do nonrepeated genes, such that increased rates of genetic drift through molecular-drive mechanisms may swamp out effects of small fitness differences among synonymous codons.

As another alternative, selection may actually result in increased diversification of codon usage. This might be the case, for example, if histone mRNA stability or function requires a particular compositional balance (Huynen et al. 1992), or if homonucleotide runs were disadvantageous. We performed a statistical test (the “NT test” of Perrin and Grantham 1988) to determine whether such runs were significantly avoided or prevalent in the nucleosomal histone genes. Whereas runs of C's and T's are not significant, runs of G's are avoided [2.4 standard deviations (SD) from the expected probability], and runs of A's are prevalent (by 4.9 SD from the expected probability). The runs of A's may be due to the high percentage of lysine residues in the histones (codons AAA and AAG). The avoidance of runs of G's is reflected in the lack of GGG (Gly) codons. Thus, selective forces acting at a level other than tRNA-codon recognition may have resulted in the low codon bias observed for Drosophila histone genes. This hypothesis, however, is weakened by the high rates of change at synonymous sites.
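The observed side of such a comparison, i.e., how many homonucleotide runs of a given base occur in a sequence, can be tallied as sketched below. This is not an implementation of the NT test of Perrin and Grantham (1988): it produces only the observed counts, and the expected counts and standard deviations needed for a significance statement would have to come from that method. The function names and the example sequence are hypothetical.

```python
from itertools import groupby


def run_lengths(seq):
    """Map each base to the lengths of its maximal homonucleotide runs."""
    runs = {}
    for base, group in groupby(seq.upper()):
        runs.setdefault(base, []).append(sum(1 for _ in group))
    return runs


def count_runs_at_least(seq, base, min_len=3):
    """Number of maximal runs of `base` that are at least `min_len` long."""
    return sum(1 for length in run_lengths(seq).get(base, []) if length >= min_len)


# Hypothetical sequence: AAA GG C AAAA GGG T
example = "AAAGGCAAAAGGGT"
print(run_lengths(example))
print(count_runs_at_least(example, "A"), count_runs_at_least(example, "G"))
```

The threshold of three bases is an arbitrary choice for the illustration; any minimum run length could be substituted.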

In conclusion, the histone genes are an exception to the rule that codon selection in abundantly expressed Drosophila genes is optimized for translational efficiency (Shields et al. 1988). However, by being an exception to the rule, the case of the histone genes emphasizes the importance of considering the complexity of genetic and developing systems in generalized models.

Sequence Availability

The complete Drosophila hydei histone repeat sequence reported in this paper (Dhy 2E; fig. 2) has been submitted to the EMBL data base and has accession number X52576. The alignment is also available on the EMBL file server by sending to [email protected] a mail message that includes the line GET ALIGN: DS8200.DAT.

Acknowledgments

We thank Dr. M. Goodman for use of lab facilities during part of this project, Dr. H. Krider for Drosophila hydei flies, Dr. R. Lewontin for his computer program SYNSUB, Dr. M. Riley for stimulating discussions and comments on the manuscript, two anonymous reviewers for their many insightful suggestions, and Drs. J. Powell and R. DeSalle for many helpful discussions and for providing a stimulating environment during a sabbatical visit (by L.D.S.) supported by an Alfred P. Sloan Foundation Sabbatical Supplement Award in Molecular Studies of Evolution. This work was supported in part by National Science Foundation grant BSR-9009938 to L.D.S. and in part by National Institutes of Health Postdoctoral Fellowship GM 13652 to D.H.A.F.

LITERATURE CITED

ANDERSON, K. V., and J. A. LENGYEL. 1984. Histone gene expression in Drosophila development: multiple levels of gene regulation. Pp. 135-161 in G. S. STEIN, J. L. STEIN, and W. F. MARZLUFF, eds. Histone genes: structure, organization, and regulation. John Wiley & Sons, New York.

ANDERSSON, S. G. E., and C. G. KURLAND. 1990. Codon preferences in free-living microorganisms. Microbiol. Rev. 54:198-210.

AOTA, S.-I., and T. IKEMURA. 1986. Diversity in G+C content at the third positions of codons in vertebrate genes and its cause. Nucleic Acids Res. 14:6345-6355.

BASTEN, C. J., and T. OHTA. 1992. Simulation study of a multigene family, with special reference to the evolution of compensatory advantageous mutations. Genetics 132:247-252.

BERNARDI, G., and G. BERNARDI. 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24:1-11.

BULMER, M. 1987. Coevolution of codon usage and tRNA abundance. Nature 325:728-730.

CHANG, L.-Y. E., and J. L. SLIGHTOM. 1984. Isolation and nucleotide sequence analysis of the β-type globin pseudogene from human, gorilla and chimpanzee. J. Mol. Biol. 180:767-784.

COEN, E., T. STRACHAN, and G. DOVER. 1982. Dynamics of concerted evolution of ribosomal DNA and histone gene families in the melanogaster species subgroup of Drosophila. J. Mol. Biol. 158:17-35.

DAVIDSON, E. H. 1986. Gene activity in early development, 3d ed. Academic Press, New York.

DOOLITTLE, R. F. 1986. Of URFs and ORFs: a primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley, Calif.

DOVER, G. 1987. DNA turnover and the molecular clock. J. Mol. Evol. 26:47-58.

DOVER, G., S. BROWN, E. COEN, J. DALLAS, T. STRACHAN, and M. TRICK. 1982. The dynamics of genome evolution and species differentiation. Pp. 343-372 in G. A. DOVER and R. B. FLAVELL, eds. Genome evolution. Academic Press, New York.

EDGAR, B. A., and G. SCHUBIGER. 1986. Parameters controlling transcriptional activation during early Drosophila development. Cell 44:871-877.

FITCH, D. H. A. 1986. Characterization of the histone genes of Drosophila hydei. Ph.D. diss., University of Connecticut, Storrs.

FITCH, D. H. A., L. D. STRAUSBAUGH, and V. BARRETT. 1990. On the origins of tandemly repeated genes: does histone gene copy number in Drosophila reflect chromosome location? Chromosoma 99:118-124.

GAREL, J.-P. 1974. Functional adaptation of tRNA population. J. Theor. Biol. 43:211-225.

GASSER, S. M., and U. K. LAEMMLI. 1986. Cohabitation of scaffold binding regions with upstream enhancer elements of three developmentally regulated genes of D. melanogaster. Cell 46:521-530.

GOLDBERG, M. L. 1979. Sequence analysis of Drosophila histone genes. Ph.D. diss., Stanford University, Stanford.

GOUY, M., and C. GAUTIER. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074.

GRELL, R. F. 1978. High frequency recombination in centromeric and histone regions of Drosophila genomes. Nature 272:78-79.

HAMPIKIAN, G. 1990. Cell-cycle expression of Drosophila melanogaster histone genes. Ph.D. thesis, University of Connecticut, Storrs.


HESS, O. 1976. Genetics of Drosophila hydei Sturtevant. Pp. 1343-1363 in M. ASHBURNER and E. NOVITSKI, eds. The genetics and biology of Drosophila. Vol. 1c. Academic Press, New York.

HUNT, L. T., and M. O. DAYHOFF. 1982. Evolution of chromosomal proteins. Pp. 193-239 in M. GOODMAN, ed. Macromolecular sequences in systematics and evolutionary biology. Plenum, New York.

HUYNEN, M. A., D. A. M. KONINGS, and P. HOGEWEG. 1992. Equal G and C contents in histone genes indicate selection pressures on mRNA secondary structure. J. Mol. Evol. 34:280-291.

IKEMURA, T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13-34.

JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Vol. 3. Academic Press, New York.

KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

KIMURA, M., and T. OHTA. 1972. On the stochastic model for estimation of mutational distance between homologous proteins. J. Mol. Evol. 2:87-90.

KREMER, H., and W. HENNIG. 1990. Isolation and characterization of a Drosophila hydei histone DNA repeat unit. Nucleic Acids Res. 18:1573-1580.

LEEDS, J. M., M. B. SLABOURGH, and C. K. MATTHEWS. 1985. DNA precursor pools and ribonucleotide reductase activity: distribution between the nucleus and cytoplasm of mammalian cells. Mol. Cell. Biol. 5:3443-3450.

LEWONTIN, R. C. 1989. Inferring the number of evolutionary events from DNA coding sequence differences. Mol. Biol. Evol. 6:15-32.

LI, W.-H., C.-I. WU, and C.-C. LUO. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

LIFTON, R. P., M. L. GOLDBERG, R. W. KARP, and D. S. HOGNESS. 1978. The organization of the histone genes in Drosophila melanogaster: functional and evolutionary implications. Cold Spring Harb. Symp. Quant. Biol. 42:1047-1051.

LIPMAN, D. J., and W. J. WILBUR. 1985. Interaction of silent and replacement changes in eukaryotic coding sequences. J. Mol. Evol. 21:161-167.

MANIATIS, T., E. F. FRITSCH, and J. SAMBROOK. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

MATSUO, Y., and T. YAMAZAKI. 1989a. Nucleotide variation and divergence in the histone multigene family in Drosophila melanogaster. Genetics 122:87-97.

MATSUO, Y., and T. YAMAZAKI. 1989b. tRNA derived insertion element in histone gene repeating unit of Drosophila melanogaster. Nucleic Acids Res. 17:225-238.

MIRKOVITCH, J., M.-E. MIRAULT, and U. K. LAEMMLI. 1984. Organization of the higher-order chromatin loop: specific DNA attachment sites on nuclear scaffold. Cell 39:223-232.

MORIYAMA, E. N., and T. GOJOBORI. 1992. Rates of synonymous substitution and base composition of nuclear genes in Drosophila. Genetics 130:855-864.

MURPHY, T. J., and M. BLUMENFELD. 1986. Nucleotide sequence of a Drosophila melanogaster H1 histone gene. Nucleic Acids Res. 14:5563.

NEI, M., and T. GOJOBORI. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

PERRIN, P., and R. GRANTHAM. 1988. Avoidance of base runs in switch regions of immune-system genes. Mol. Biol. Evol. 5:141-153.

SCHAEFFER, S. W., and C. F. AQUADRO. 1987. Nucleotide sequence of the Adh gene region of Drosophila pseudoobscura: evolutionary change and evidence for an ancient gene duplication. Genetics 117:61-73.

SHAPIRO, H. S. 1976. Distribution of purines and pyrimidines in deoxyribonucleic acids. Pp. 241-281 in G. D. FASMAN, ed. CRC handbook of biochemistry and molecular biology, 3d ed. Section B. Nucleic acids, vol. 2. CRC, Cleveland.

SHARP, P. M., and W.-H. LI. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38.

SHARP, P. M., and W.-H. LI. 1987. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222-230.

SHARP, P. M., and W.-H. LI. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402.

SHARP, P. M., T. M. F. TUOHY, and K. R. MOSURSKI. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14:5125-5143.

SHIELDS, D. C., and P. M. SHARP. 1987. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 15:8023-8040.

SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT. 1988. “Silent” sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716.

STRAUSBAUGH, L. D., and E. S. WEINBERG. 1982. Polymorphism and stability in the histone gene cluster of Drosophila melanogaster. Chromosoma 85:489-505.

THROCKMORTON, L. H. 1975. The phylogeny, ecology, and geography of Drosophila. Pp. 421-469 in R. C. KING, ed. Handbook of genetics. Vol. 3: Invertebrates of genetic interest. Plenum, New York.

WELLS, D., and J. HERRMANN. 1989. Functionally constrained codon usage in histone genes. Int. J. Biochem. 21:1-6.

WELLS, D., and C. MCBRIDE. 1989. A comprehensive compilation and alignment of histones and histone genes. Nucleic Acids Res. 17:r311-r346.

WHITE, B. N., G. M. TENER, J. HOLDEN, and D. T. SUZUKI. 1973. Analysis of tRNAs during the development of Drosophila. Dev. Biol. 33:185-195.

WOLFE, K. H., P. M. SHARP, and W.-H. LI. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

BARRY G. HALL, reviewing editor

Received June 2, 1992; revision received September 24, 1992

Accepted September 24, 1992