Proc. Natl. Acad. Sci. USA
Vol. 90, pp. 7475-7479, August 1993
Evolution

Evidence for adaptive evolution of the G6pd gene in the Drosophila melanogaster and Drosophila simulans lineages

(molecular evolution/amino acid sequence)

WALTER F. EANES*, MICHELE KIRCHNER, AND JEANNE YOON

Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794

Communicated by Robert R. Sokal, April 23, 1993 (received for review December 7, 1992)

Abbreviations: G6PD, glucose-6-phosphate dehydrogenase; HKA, Hudson-Kreitman-Aguade.
*To whom reprint requests should be addressed.
†The sequences reported in this paper have been deposited in the GenBank data base (accession nos. L13876-L13920).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Proponents of the neutral theory argue that evolution at the molecular level largely reflects a process of random genetic drift of neutral mutations. Under this theory, levels of interspecific divergence and intraspecific polymorphism are expected to be correlated across classes of nucleotide or amino acid sequences with different degrees of functional constraint, such as synonymous and replacement sites. Nucleotide sites with reduced polymorphism should show comparably reduced levels of interspecific divergence. To examine this hypothesis, we have sequenced 32 and 12 copies of the glucose-6-phosphate dehydrogenase (G6pd) gene in Drosophila melanogaster and Drosophila simulans, respectively. Both species exhibit similar levels of nucleotide polymorphism at synonymous sites. D. melanogaster shows two amino acid polymorphisms, one associated with the cosmopolitan allozyme polymorphism and a second with an allozyme polymorphism endemic to European and North African populations. In contrast, D. simulans shows no replacement polymorphism. While synonymous divergence between species is 10%, which is typical of other genes, there are 21 replacement differences. This level of amino acid sequence divergence, when contrasted with levels of amino acid polymorphism, silent polymorphism, and divergence, is in 10-fold excess over that expected under the neutral model of molecular evolution. We propose that this excess divergence reflects episodes of natural selection on G6pd resulting in fixation of advantageous amino acid mutations in these two recently separated lineages.

The extent that intra- and interspecific variation in DNA and amino acid sequences reflects a process of adaptation or is simply molecular noise remains one of the enduring questions in evolutionary biology. The alternative model to simple adaptive change, or neutral theory, assumes the action of natural selection at the molecular level, but only acting in a purifying fashion. Thus, most amino acid mutations are assumed to be eliminated by natural selection. A small minority, those satisfying the criterion that they minimally disrupt protein function and thus confer no fitness loss on the individual, face eventual extinction or fixation through a stochastic process whose transition time depends on population size. Insofar as observed patterns of DNA sequence divergence between species and standing levels of molecular polymorphism within species are concerned, the theory and its proponents dismiss a significant positive role for selection, either in adaptive substitution or in the maintenance of molecular polymorphism (1, 2).

Two types of quantitative analyses have been used to examine the hypothesis that protein sequence divergence at the interspecific level is neutral. One argument for adaptive amino acid change uses the index of dispersion to infer heterogeneous rates of substitution across phylogenies (3). An alternative approach, made possible with the introduction of DNA sequence polymorphism data, contrasts the relative levels of divergence between, and polymorphism within, species for replacement and silent sites (4, 5). Under the neutral theory, within and between species sequence divergence is expected to covary because intraspecific polymorphism is the transient phase of the stochastic process culminating in either fixation or loss of alleles. The expected rate of fixation of neutral mutations is simply the neutral mutation rate, whereas the steady-state level of neutral polymorphism depends on the neutral mutation rate and effective population size. Both measures of divergence are correlated because each depends on the same neutral mutation parameter. Therefore, genes or nucleotide sequences with relatively high levels of interspecific divergence should show corresponding high levels of intraspecific polymorphism. In principle, this expected concordance presents an opportunity for a direct test among genes, or between parts of genes, where observed numbers of polymorphisms in a sample are contrasted with the observed number of divergent sites between species. This approach has been introduced in the study of balancing polymorphism for the Adh F/S electrophoretic polymorphism (4) and a study of amino acid divergence at the Adh locus across the Drosophila melanogaster, Drosophila simulans, and Drosophila yakuba lineages (5).

The Eanes laboratory has studied a number of features of the electrophoretic polymorphism for glucose-6-phosphate dehydrogenase (G6PD) in D. melanogaster (6-9) as an empirical model. The function of G6PD in D. melanogaster is well established as the initial enzymatic step in the pentose shunt pathway, and its evolutionary and functional homologies with yeast and mammalian G6PD are clear (10). The role of the pentose shunt is to maintain the NADP/NADPH balance of the cell in the face of demands that draw on the NADPH pool, such as fatty acid synthesis and detoxification. In D. melanogaster, it has been shown that 40% of the reduced NADPH is provided by this pathway (11). Glucose flux through the pentose shunt is very sensitive to activity variation at G6PD, as might be expected for a regulatory enzyme located at the branch point between metabolic pathways (6, 8, 9). Because of this important regulatory role, G6PD should be expected to be relatively conservative and resistant to both amino acid polymorphism and fixation.

As part of our studies, we have sequenced 32 copies of the coding region of the G6pd locus in D. melanogaster and 12 copies in its close relative D. simulans, a species without any common electrophoretic polymorphism for G6PD.† In this report, we describe the pattern of amino acid polymorphism and divergence within and between the D. melanogaster and D. simulans lineages. We examine the concordance of levels


of amino acid polymorphism and divergence with the same measures for silent or synonymous sites that are presumed to be neutral. We conclude that the divergence at the amino acid level is in significant excess relative to the levels of intraspecific amino acid polymorphism.

MATERIALS AND METHODS

Wild and Mutant Lines. The origin and genetic extraction of X chromosomes for D. melanogaster have been described (7). The 32 lines come from Watsonville, CA (9 lines), Mt. Sinai, NY (11 lines), Turingen, Germany (6 lines), Orchid, FL (1 line), Okavango Delta, Botswana (4 lines), and Ménétréol, France (1 line). The D. simulans lines were collected at Davis Peach Farm, Mt. Sinai, NY, in 1990 (4 lines), Montpelier, France, in 1991 (4 lines), and Vera Cruz, Mexico, in 1990 (4 lines). Single X chromosomes from D. simulans were genetically isolated using an attached-X stock from J. A. Coyne (University of Chicago).

PCR Amplification and DNA Sequencing. The DNA sequence representing 1673 nt of the G6pd locus (nt 545-2216 in ref. 12) in D. melanogaster was amplified via PCR (13) from prepared genomic DNA (14). Approximately 10 ng of genomic DNA was amplified in 50 µl of 10 mM Tris·HCl, pH 8.3/50 mM KCl/0.01% gelatin/1 mM MgCl2/2 units of AmpliTaq polymerase (Perkin-Elmer)/30 ng of each primer. The resulting amplified 1.67-kb fragment was excised from a 3% NuSieve agarose gel and used as template to amplify two smaller segments (nt 545-1436 and nt 1408-2216). Single-strand template for sequencing was generated by kinase-treated primer/λ exonuclease digestion (15). DNA template was separated from PCR primers using Millipore filters. Primers for Sanger dideoxynucleotide sequencing (16) using Sequenase (United States Biochemical) were spaced about every 300 bp. Reaction products from each sequencing reaction mixture were electrophoresed on both standard polyacrylamide gels with an electrolyte gradient (17) and Long Ranger polyacrylamide gels (AT Biochem, Malvern, PA). Both strands were completely sequenced for each allele, with rare gaps of 5-10 bases (<1-2% of the total sequence), where only one strand produced readable sequence. All polymorphisms and differences were confirmed on both strands, and no errors were observed.

Statistical Analysis. To contrast the relative levels of divergence and polymorphism at both replacement and silent sites, we used two statistical analyses that depend on different assumptions. First, we use the likelihood ratio, or G test, as proposed by McDonald and Kreitman (5). This test is a 2 × 2 contingency test, with replacement and silent classes designated as columns, and "fixed" and polymorphic sites designated as row categories. Fixed sites are those diverged sites at the interspecific level that are invariant in our samples. Use of the G test assumes complete linkage between replacement and silent sites; thus both classes of sites have identical genealogies and share stochastic deviations. We also apply a version of the test outlined in Hudson et al. (4), termed the Hudson-Kreitman-Aguade (HKA) test. This test assumes a Wright-Fisher population model, neutral mutation, free recombination between classes of sites, and complete linkage within classes. Under this neutral model, the expected levels of polymorphism within, and divergence between, hypothetical species A and B and their stochastic variances are determined by the four parameters given by equations 1-4 in ref. 5. These parameters are $\theta = 4N\mu$ (where $N$ is the effective population size of species A, and $\mu$ is the neutral mutation rate; one value for each class of sites, $\theta_R$ and $\theta_S$), $T$, the time in $2N$ generations since separation of species A and B, and $f$, the relative proportion of population size of species B to species A. Table 1 provides, as a function of these parameters, the expected number of polymorphic sites within, and number of site differences

Table 1. Expected numbers of polymorphic and diverged sites as a function of the parameters for the modified HKA test of two species A and B

                          Replacement                     Silent
Species A polymorphism    $\theta_R C(n_A)$               $\theta_S C(n_A)$
Species B polymorphism    $f \theta_R C(n_B)$             $f \theta_S C(n_B)$
Divergence                $\theta_R [T + (1 + f)/2]$      $\theta_S [T + (1 + f)/2]$

Note, $C(n) = \sum_{j=1}^{n-1} 1/j$, where $n$ is the sample size for each species.

between random sequences of species A and B. Note, under this model the ratio of replacement to silent sites for both polymorphism and divergence will simply reflect the ratio of neutral mutation rates, $\theta_R/\theta_S$. The test is more conservative than the G test, and a test statistic that is approximately distributed as $\chi^2$ with two degrees of freedom has been proposed (4). In the absence of free recombination this test becomes increasingly conservative.
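The expectations in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the values of $T$ and $f$ are the estimates reported in Results, and the two $\theta$ values are hypothetical per-locus numbers picked to show that every row shares the same replacement-to-silent ratio $\theta_R/\theta_S$.

```python
def c_n(n):
    """C(n) = sum_{j=1}^{n-1} 1/j for a sample of n sequences (Table 1 note)."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / j for j in range(1, n))


def hka_expectations(theta, t, f, n_a, n_b):
    """Expected counts for one class of sites, following the rows of Table 1."""
    return {
        "poly_A": theta * c_n(n_a),
        "poly_B": f * theta * c_n(n_b),
        "divergence": theta * (t + (1.0 + f) / 2.0),
    }


# Sample sizes from this study: 32 D. melanogaster (A), 12 D. simulans (B).
N_A, N_B = 32, 12
# T and f are the estimates reported in Results; the theta values are
# HYPOTHETICAL per-locus numbers used only to illustrate the calculation.
T_HAT, F_HAT = 8.47, 0.763
THETA_R, THETA_S = 1.0, 5.0

replacement = hka_expectations(THETA_R, T_HAT, F_HAT, N_A, N_B)
silent = hka_expectations(THETA_S, T_HAT, F_HAT, N_A, N_B)

for row in ("poly_A", "poly_B", "divergence"):
    ratio = replacement[row] / silent[row]
    print(f"{row:>10}: replacement={replacement[row]:.2f} "
          f"silent={silent[row]:.2f} ratio={ratio:.3f}")
```

Because $\theta$ enters every cell as a simple multiplier, the printed ratio is identical for polymorphism in either species and for divergence, which is the neutral prediction the tests are designed to challenge.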

RESULTS

Designation of a G6PD Consensus Sequence. We have determined likely errors in the original Fouts et al. (12) sequence that was derived from strains Oregon-R and Canton-S. Our reference sequence is line OK93 (Botswana) shown in Fig. 1. The sequence spans 1943 nt and does not include most of intron 1, which is highly (2.7-12 kb) variable for size in natural populations due to insertion variants (7). To be as consistent as possible with the numbering used in the original sequence, exon 2 is started at nt 543. Our sequence has 2 additional bases added at position 807 (seen in all lines), 23 bases added in intron 3, and 3 bases added in exon 4 that result in an additional amino acid residue. The amino acid residues differing from the sequence in Fouts et al. (12) are underlined.

Sequence Polymorphism and Divergence. There are 85 positions within the G6pd 1548-nt coding region that are either polymorphic within or different between the two species. These sites and their states are listed in Table 2. The levels of silent polymorphism within each species are similar, and no polymorphisms are shared. Assuming a Wright-Fisher model, the estimated neutral parameter $\theta$ for exons 2-4, excluding introns, is 0.0036 and 0.0029 for the 32 D. melanogaster and 12 D. simulans lines, respectively. The corresponding pair-wise differences ($\pi$) are 0.38 and 0.34%. There are 19 amino acid differences between the species. Changes at nt 766 and 767 and at nt 1258 and 1260 result in Val → Asn and Gln → Asn amino acid changes, respectively, but each involves two replacement events (two double changes). Overall, 21 replacement events can be inferred from the nucleotide sequence. Our D. melanogaster sample shows two replacement-site polymorphisms. These correspond to the well-characterized A/B allozyme polymorphism distributed throughout the global population (18) involving a Pro → Leu change at position 1817, and the unique AFI electrophoretic allele seen in European populations that results from a Gly → Cys change at nt 619. D. simulans has no replacement polymorphisms in the sample of 12 copies, and no allozyme polymorphism has ever been reported or seen by this laboratory for this species.

The summary numbers for polymorphism, "fixed sites," and divergence (between lines OK93 and DPF88S) are given in Table 3. For the G test, we pooled the number of polymorphisms for the two species, yielding counts of 2 and 36 polymorphic replacement and silent sites, respectively. The corresponding numbers of fixed replacement and silent sites are 21 and 26. The 2 × 2 contingency G test of this data configuration is highly significant (G = 18.%; P << 0.001). For the HKA test, we have counts of 2 and 22 and of 0 and 14 replacement and silent polymorphisms in D. melanogaster


[Fig. 1: sequence listing for line OK93, nt 1-2300 (most of intron 1 omitted), with the single-letter amino acid translation beneath the coding sequence.]

FIG. 1. Nucleotide sequence of the G6pd gene in the OK93 line of D. melanogaster. Underlined regions represent amino acid differences from the original Fouts et al. (12) Oregon-R sequence. In natural populations, intron 1 is highly variable in size (7).

and D. simulans, respectively. Using the D. melanogaster OK93 and D. simulans DPF88S lines as random single sequences, we observe 21 replacement and 35 silent differences. Substituting these values for $S_i$ and $D_i$ in the four simultaneous equations presented in ref. 4, the parameters for the HKA test are estimated as $\theta_R = 1.56 \times 10^{-3}$; $\theta_S = 8.69 \times 10^{-3}$; $T = 8.47$; and $f = 0.763$. (In our treatment, D. melanogaster and D. simulans are designated as species A and B, respectively.) Using these four parameters to estimate expected numbers and variances in our adaptation of the HKA test, we reject the null hypothesis ($\chi^2 = 7.90$ with 2 df; P < 0.020).

DISCUSSION

The most significant feature of the G6PD sequence comparison is the large number of amino acid differences between these two closely related species. When the level of interspecific replacement divergence (between OK93 and DPF88S) is compared to the level of silent divergence and replacement polymorphism is compared to silent site polymorphism, a clear discordance emerges. The ratio of number of silent to replacement differences between species is 1.67; the comparable ratio for polymorphism is 18. This discordance is confirmed by both the likelihood ratio G test, which


Table 2. List of all polymorphic and diverged nucleotide positions in the sample of 32 D. melanogaster and 12 D. simulans lineages

Site    MEL    SIM    Type
588     T      G      4-syn
591     A      C      4-syn
606     A      C      4-syn
615     T      C      2-syn
619     G/T    G      Gly/Cys poly
627     T      C      3-syn
642     C/G    C      4-syn
648     T      T/C    2-syn
657     G      C      4-syn
672     G/A    G      2-syn
766     G      A      Replacement
767     T      C      Replacement
771     T      A      Asp/Glu
773     G      C      Ser/Thr
775     A      C      Ile/Leu
887     G      T      Lys/Asn
914     T      C      2-syn
917     G      T      Glu/Asp
935     T      C      2-syn
948     G      A      Gly/Ser
957     C      G      Leu/Val
962     T      T/G    4-syn
967     A      G      Gln/Arg
968     G/A    G      2-syn
980     T      G      Ile/Met
1001    C      G      4-syn
1016    T      C      2-syn
1080    G      C      Val/Leu
1188    T/C    T      3-syn
1200    T      C      4-syn
1206    C/A    A      4-syn
1258    C      A      Replacement
1260    G      C      Replacement
1341    G      A      2-syn
1410    C/T    T      4-syn
1434    C/T    C      2-syn
1461    T/G    C      4-syn
1479    G      C      4-syn
1488    C      G      4-syn
1500    C      G/C    4-syn
1536    T      C      3-syn
1539    T      A/C    4-syn
1572    G      A      2-syn
1626    C/T    C      4-syn
1631    A      C      Asn/Thr
1635    T      G      Asp/Glu
1680    T      C      2-syn
1683    G      G/A    4-syn
1686    C      C/T    2-syn
1713    C/T    C      4-syn
1752    C      T/C    3-syn
1797    C/T    C      4-syn
1817    C/T    C      Pro/Leu poly
1833    G/A    G      2-syn
1838    A      G      Asn/Ser
1839    T      C      2-syn
1863    C/A    C      3-syn
1893    T/C    C      2-syn
1920    C/A    C      3-syn
1923    G      C      4-syn
1947    G/A    G      4-syn
1956    C/A    C      4-syn
1965    C      C/T    2-syn
1992    C      T      2-syn
1995    G      C      4-syn
2004    T      G      4-syn
2046    C      C/T    4-syn
2079    A/C    A      3-syn
2082    T/C    T      2-syn
2085    G      G/A    4-syn
2107    A      C      Lys/Arg
2108    A      G      4-syn?
2114    A      G      His/Arg
2115    C      A/G    4-syn
2116    A      G      Ile/Val
2118    T      G      4-syn
2121    G      G/C    4-syn
2121    G/T    G      4-syn
2151    T      T/C    4-syn
2169    T/C    T      4-syn
2187    T      C      2-syn
2220    C      C/T    4-syn
2243    A      T      His/Leu
2244    C/T    C      2-syn

syn, synonymous; MEL, D. melanogaster; SIM, D. simulans; poly, polymorphism.

assumes effectively complete linkage between replacement and silent sites, and our variation of the conservative HKA test. A causal explanation for this discordance is that replacement substitutions are not neutral but have been periodically selected through the populations of one, or both species, as advantageous amino acid mutations. McDonald and Kreitman (5) reached the same conclusion about the evolution of Adh based on sequence comparisons from D. melanogaster, D. simulans, and D. yakuba.

To place the extent of amino acid replacement at G6PD in perspective, it can be compared to other genes sequenced in these species. Table 4 shows the amount of sequence divergence at silent and replacement sites in five genes (G6pd, Sod, Adh, Adh-dup, and ci) that have been sequenced in both species. Four genes show about 11% divergence at silent sites, with an overall 2-fold variation due to the low value for Adh. The high rate of replacement substitution for cubitus interruptis (ci) is difficult to assess, since no silent polymorphism is seen, and only a single replacement polymorphism was observed. The low level of polymorphism at ci can be

Table 3. Summary data for number of segregating nucleotide polymorphisms in D. melanogaster and D. simulans, as well as the number of fixed differences between the samples of the two species

                                No.
                                Replacement    Silent
Polymorphism
  D. melanogaster               2              22
  D. simulans                   0              14
  Pooled                        2              36
Fixed                           21             26
Divergence OK93 vs. DPF88S      21             35

Divergence OK93 vs. DPF88S shows the number of site differences between two randomly selected lines (OK93 and DPF88S).
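The 2 × 2 likelihood ratio test described in Materials and Methods can be re-derived from the pooled and fixed counts in Table 3. The sketch below is a minimal illustration, not the authors' procedure: it assumes no continuity or Williams correction (the text does not say whether one was applied), and the function name is ours. It also recomputes the silent-to-replacement ratios quoted in the Discussion.

```python
import math


def g_test_2x2(a, b, c, d):
    """Likelihood ratio (G) test of independence for the table [[a, b], [c, d]].

    Returns (G, p) with p taken from a chi-square distribution with 1 df.
    No continuity or Williams correction is applied (an assumption).
    """
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += o * math.log(o / expected)
    g *= 2.0
    # For 1 df, P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Counts from Table 3: rows are fixed and pooled polymorphic sites,
# columns are replacement and silent sites.
g_stat, p_value = g_test_2x2(21, 26, 2, 36)
print(f"G = {g_stat:.2f}, P = {p_value:.1e}")

# Ratios quoted in the Discussion: silent/replacement for divergence
# between OK93 and DPF88S, and for pooled polymorphism.
divergence_ratio = 35 / 21
polymorphism_ratio = 36 / 2
print(f"divergence ratio = {divergence_ratio:.2f}, "
      f"polymorphism ratio = {polymorphism_ratio:.0f}")
```

With these counts the uncorrected statistic comes out near 19, consistent with the leading digits and the significance level reported in Results.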

attributed to a hitchhiking effect of the fourth chromosome (20), making it difficult to assess the possibility that the high level of divergence at replacement sites is due to either low functional constraint on the ci protein or fixation of adaptive mutations. The rate of amino acid replacement at G6pd is twice that of Adh-dup and five times that of Adh. The corresponding likelihood ratio tests for these two genes are not significant. A test for Sod is not possible, since no polymorphism data are available, although it is known that this locus carries a rare electrophoretic polymorphism.

From the joint configuration of divergence and polymorphism for the two species, it is possible to estimate population parameters using the sampling theory and estimators proposed by Sawyer and Hartl (22). These estimators are scaled against the haploid effective population size, and because G6pd is X chromosome-linked, all scaled values have been adjusted relative to Adh using an X chromosome-linked haploid population size that is 75% of the autosomal value. From the distribution of monomorphic and polymorphic nucleotide frequencies of the four bases at regular silent sites (see ref. 22) in each species, it is estimated that the scaled synonymous mutation rate is $0.013/N_e$ mutations per site per generation compared to $0.014/N_e$ for Adh. By using the joint configuration of monomorphic and polymorphic sites, the scaled time of divergence $t_{div}$ is estimated to be $5.11N_e$ generations for G6pd and $1.21N_e$ for Adh. From the Hawaiian Drosophila, a silent rate of 0.015 changes per site per million years has been proposed, and from this, it has been suggested that $N_e$ generations equal 0.645 million years (22, 23). Therefore, we estimate the D. melanogaster-simulans time of divergence at 3.3 million years. It should be noted that, while the configuration of silent site polymorphism estimates similar neutral mutation rates for both G6pd and Adh (which is consistent with their similar levels of codon-usage bias, see ref. 24), G6pd exhibits a substantially higher level of divergence at silent sites. It is also possible to estimate the parameter $\gamma$, which is a scaled selection coefficient ($\gamma = \sigma N$), reflecting the average amount of selection ($\sigma$) on replacement changes. When $\gamma > 0$, positive selection is involved; when $\gamma$

Table 4. Summary data for percent divergence between D. melanogaster and D. simulans (OK93 and DPF88S) for replacement and effectively silent sites for G6pd and four other loci examined in other studies

              % divergence
Locus      Replacement   Silent   Ref.
G6pd          1.73        10.5
Adh           0.35         5.2     19
Adh-dup       0.95        13.3     19
ci            2.80        10.6     20
Sod           0.00        10.4     21


< 0, selection is deleterious. For the D. melanogaster-simulans lineages, the Adh estimate of γ is 1.72, whereas the adjusted X chromosome-linked estimate for G6pd is 7.45. By using an effective haploid X chromosome-linked population size of 2 × 10^6 (20), this reflects a positive selection coefficient of α = 3.72 × 10^-6, associated with replacement changes.

Are there explanations for this discordance other than adaptive amino acid substitutions? Our null hypothesis assumes all differences that we observe are completely neutral. One concern that could be raised is that most amino acid changes are not neutral but are instead slightly deleterious, falling in the range of so-called nearly neutral variation (25). If the distinction of becoming slightly deleterious differentially affects the relative probability of fixation and contribution to polymorphism of a mutation (relative to the neutral case), then this might explain the distortion of ratios. For instance, slightly deleterious mutations have a lower probability of being fixed relative to the neutral case, but reduced sojourn times in the transient polymorphic phase may result in even greater reduction in the contribution to polymorphism. However, theoretical studies indicate that, relative to the neutral case, the contribution of a slightly deleterious mutation to polymorphism decreases more slowly than its contribution to fixed differences (1, 26). Thus, our observations for G6PD, while providing evidence against the neutral model, provide stronger evidence against the argument that most amino acid polymorphisms and fixations are slightly deleterious. Finally, as McDonald and Kreitman (5) have pointed out, the consequences of varying population size are difficult to evaluate. However, Sawyer and Hartl (22) have recently argued that the 2 × 2 homogeneity test appears in general to be statistically appropriate and (in the absence of adaptive selection on the locus) should be robust to historical population size variation.
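The conversion of the scaled G6pd estimates into absolute units, reported before this paragraph, can be retraced with simple arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the reading that one unit of scaled time (Ne generations) corresponds to 0.645 million years and that the selection coefficient follows from γ = αN. It is not the Sawyer and Hartl estimation program.

```python
# Minimal sketch: retrace two unit conversions quoted in the text.
# Input values are those reported in the paper; all names are illustrative.

MYR_PER_NE_GENERATIONS = 0.645  # million years per Ne generations (refs. 22, 23)
T_DIV_G6PD = 5.11               # scaled divergence time for G6pd, in Ne generations
GAMMA_G6PD = 7.45               # scaled selection coefficient for G6pd (gamma = alpha * N)
N_HAPLOID_X = 2e6               # effective haploid X-linked population size (ref. 20)


def divergence_time_myr(t_div_scaled, myr_per_ne=MYR_PER_NE_GENERATIONS):
    """Convert a scaled divergence time (units of Ne generations) to million years."""
    return t_div_scaled * myr_per_ne


def selection_coefficient(gamma, n):
    """Recover the per-generation selection coefficient alpha from gamma = alpha * N."""
    return gamma / n


t_myr = divergence_time_myr(T_DIV_G6PD)
alpha = selection_coefficient(GAMMA_G6PD, N_HAPLOID_X)

print(f"divergence time: {t_myr:.1f} million years")
print(f"alpha: {alpha:.3e}")
```

Under these assumptions the divergence time rounds to the 3.3 million years given in the text, and α comes out within rounding of the reported 3.72 × 10^-6.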

It was proposed by Whittam and Nei (27) in a critique of McDonald and Kreitman (5) that the excess of silent polymorphisms at Adh could be inflated by balancing selection on the F/S allozyme polymorphism, and the amino acid polymorphism at G6PD might be under balancing selection as well. However, while a balanced polymorphism will, because of hitchhiking, inflate the number of linked synonymous (neutral) site polymorphisms (28), its effect should be the same on neutral replacement mutations.

These tests assume that sequences have been sampled at random with respect to different classes of variation. Our D. melanogaster sequences represent a quasirandom sampling of G6pd sequences from three continents. Lines were screened for electrophoretic phenotype and selected to ensure a representative sampling of electrophoretic alleles by locality. The A/B allozyme polymorphism is common and widespread and would have a very low probability of being missed in any global sample of 32 sequences (18). However, we have also included the polymorphism for a second common electrophoretic allele, AFI, unique to Europe and northern Africa (Tunisia). Its frequency in those populations varies between 3 and 20%, and its exclusion would only have increased the significance of the outcome.

This study shows that, relative to the levels of contemporary amino acid polymorphism in the two species, the number of amino acid changes that have arisen in these lineages is extraordinarily high, consistent with the model of episodic natural selection envisioned by Gillespie (3). Either these amino acid substitutions represent changes incorporated by natural selection to maintain constant pentose shunt flux in the face of changing environments or, alternatively, flux is under selection to vary over evolutionary time. In some human populations, natural selection has favored low pentose shunt function in environments associated with the malarial parasite, and the responding target of this natural selection on the shunt has been G6PD (29). In D. melanogaster, the replacement-site polymorphisms at both positions are associated with reduced in vivo activity (6, 8, 9), and the A/B allozyme is polymorphic worldwide with reciprocating latitudinal clines in both northern and southern hemispheres (18). These results forcibly argue that much amino acid substitution at G6PD is the result of natural selection.

We thank Marty Kreitman for his advice and interest in this project and for introducing us to PCR-facilitated DNA sequencing. Thanks are extended to Cedric Wesley in the early phases of the project and to Joanne Labate, Dan Dykhuizen, and members of Chip Aquadro's lab for reading early versions of the manuscript. Stanley Sawyer provided us with an updated copy of his computer program to estimate population parameters, and Dave Guttman kindly assisted in further programming. This study was supported, in part, by Grants BSR-907391 from the National Science Foundation and GM-45247 from the National Institutes of Health. This is contribution no. 833 from the Graduate Program in Ecology and Evolution, State University of New York at Stony Brook.

1. Kimura, M. (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, New York).
2. Kimura, M. (1991) Jpn. J. Genet. 66, 367-386.
3. Gillespie, J. H. (1991) The Causes of Molecular Evolution (Oxford Univ. Press, New York).
4. Hudson, R. R., Kreitman, M. & Aguadé, M. (1987) Genetics 116, 153-159.
5. McDonald, J. H. & Kreitman, M. (1991) Nature (London) 351, 652-654.
6. Eanes, W. F. (1984) Genetics 106, 95-107.
7. Eanes, W. F., Ajioka, J. W., Hey, J. & Wesley, C. (1989) Mol. Biol. Evol. 6, 384-397.
8. Eanes, W. F., Katona, L. & Longtine, M. (1990) Genetics 125, 845-853.
9. Labate, J. & Eanes, W. F. (1992) Genetics 132, 783-787.
10. Persson, B., Jörnvall, H., Wood, I. & Jeffrey, J. (1991) Eur. J. Biochem. 198, 485-491.
11. Geer, W., Lindel, D. L. & Lindel, D. M. (1979) Biochem. Genet. 17, 881-896.
12. Fouts, D., Ganguly, R., Gutierrez, A. G., Lucchesi, J. C. & Manning, J. E. (1988) Gene 35, 261-275.
13. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R. G., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Science 239, 487-491.
14. McGinnis, W., Shermoen, A. W. & Beckendorf, S. K. (1983) Cell 34, 75-84.
15. Higuchi, R. G. & Ochman, H. (1989) Nucleic Acids Res. 17, 5865.
16. Sanger, F. S., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
17. Sheen, J. & Seed, B. (1988) BioTechniques 6, 942-944.
18. Oakeshott, J. G., Chambers, G. K., Gibson, J. B., Eanes, W. F. & Willcocks, D. A. (1983) Heredity 50, 67-72.
19. Kreitman, M. & Hudson, R. R. (1991) Genetics 127, 565-582.
20. Berry, A. J., Ajioka, J. W. & Kreitman, M. (1991) Genetics 129, 1111-1117.
21. Kwiatowski, J., Skarecky, D. & Ayala, F. J. (1992) Mol. Phyl. Evol. 1, 72-82.
22. Sawyer, S. A. & Hartl, D. L. (1992) Genetics 132, 1161-1176.
23. Rowan, R. G. & Hunt, J. A. (1991) Mol. Biol. Evol. 8, 49-70.
24. Shields, D. C., Sharp, P. M., Higgins, D. G. & Wright, F. (1988) Mol. Biol. Evol. 5, 704-716.
25. Ohta, T. (1973) Nature (London) 246, 96-98.
26. Ohta, T. & Tachida, H. (1990) Genetics 126, 219-229.
27. Whittam, T. S. & Nei, M. (1991) Nature (London) 354, 115-116.
28. Kaplan, N. L., Darden, T. & Hudson, R. R. (1988) Genetics 120, 819-829.
29. Beutler, E. (1991) N. Engl. J. Med. 324, 169-174.
