triose-phosphate isomerasegene

4
Proc. Natl. Acad. Sci. USA Vol. 92, pp. 8503-8506, August 1995 Evolution Evidence against the exon theory of genes derived from the triose-phosphate isomerase gene (intron evolution/origin of genes/exon shuffling/insect genes/phylogenetic inference) JAN KWIATOWSKI*t, MICHAL KRAWCZYK*t, MACIEJ KORNACKI*, KEVIN BAILEYt, AND FRANcIsco J. AYALAtt *Institute of Botany, Warsaw University, Al. Ujazdowskie 4, 00-478 Warsaw, Poland; and tDepartment of Ecology and Evolutionary Biology, University of California, Irvine, CA 92717 Contributed by Francisco J. Ayala, June 1, 1995 ABSTRACT The exon theory of genes proposes that the introns of protein-encoding nuclear genes are remnants of the DNA spacers between ancient minigenes. The discovery of an intron at a predicted position in the triose-phosphate isomer- ase (EC 5.3.1.1) gene of Cukx mosquitoes has been hailed as an evidential pillar of the theory. We have found that that intron is also present in Aedes mosquitoes, which are closely related to Culex, but not in the phylogenetically more distant Anopheks, nor in the fly CaUliphora vicina, nor in the moth Spodoptera littoralis. The presence of this intron in Culex and Aedes is parsimoniously explained as the result of an insertion in a recent common ancestor of these two species rather than as the remnant of an ancient intron. The absence of the intron in 19 species of very diverse organisms requires at least 10 independent evolutionary losses in order to be consistent with the exon theory. The exon theory of genes, also known as the "introns-early" theory, proposes that the exons of protein-coding genes are remnants of ancient minigenes and the introns derive from the spacers between them (1-3). There are two ways of testing the exon theory: functional and evolutionary. Functional tests investigate whether introns divide genes into segments that code for functional subunits of protein structure. Evolutionary tests ascertain whether the pattern of intron distribution across taxa conforms to the phylogenetic relationships among the taxa. Functional tests that yield results consistent with the theory may not be definitive, since constraints may exist at the protein or genome levels that restrict, or make more likely, the insertion of introns at certain sites. A bulwark of the exon theory is the triose-phosphate isomerase (EC 5.3.1.1) gene (Tpi), a gene of ancient origin. The intron distribution of Tpi has been argued to be nonran- dom (4-6) and, moreover, had elicited the prediction that an additional intron would be found at a site that would split the long exon III (5). The discovery of such an intron in the mosquito, Culex tarsalis (7), was, accordingly, hailed as strong support for the exon theory. We have investigated the presence of this intron in other dipterans and one lepidopteran and conclude that its taxonomic distribution is not consistent with the exon theory.§ MATERIALS AND METHODS The four insects studied are, in increasing phylogenetic dis- tance from Culex, the mosquitoes Aedes sp. and Anopheles sp., the fly Calliphora vicina, and the moth Spodoptera littoralis. The DNA of these species was isolated, amplified, and se- quenced by described methods (8). The PCR products were cloned into the PCRII vector according to the manufacturer's (Invitrogen) protocol. Two primers, 5'-CGTKGGNG- GNAACTGGAAGATGAAYGG-3' (sense) and 5'-CGCT- CYGAGTGBCCCAGGAYSACCCA-3' (antisense) (K = G, T; S = G, C; Y = C, T; B = G, C, T; N = G, A, C, T), were derived from the conserved protein fragments VGGNWK- MNG and WVILGHSER, respectively. The DNA sequences of other organisms were obtained from the GenBank data base maintained at IUBio archives at Indiana University. The DNA sequences were aligned with the CLUSTAL V program (9). A maximum parsimony consensus tree (100 bootstrap resamplings) was obtained by consecutive execution of the SEQBOOT, PROTPARS, and CONSENSE programs of the PHYLIP 3.5 phylogenetic inference package (10). The branch lengths of the tree were based on accepted point mutation values (ref. 11; obtained with the PROTDIST program of PHYLIP) and were calculated with the FITCH program of PHYLIP. RESULTS Fig. 1 gives the alignment of 19 triose-phosphate isomerase protein sequences. The sequences for the mosquitoes Aedes andAnopheles, the f ly Calliphora, and the moth Spodoptera are only partial, corresponding to DNA fragments that include the region where intron 5 is found in Culex mosquitoes. This intron is present inAedes, a close relative of Culex, but not in the more distantly related Anopheles mosquitoes, nor in the flies Dro- sophila and Calliphora, nor in the moth Spodoptera. The absence of intron 5 from mosquitoes other than Culex and Aedes has been reported earlier (ref. 24; M. Tyshenko and V. K. Walker, personal communication). Fig. 2 is a consensus tree of the 19 protein sequences, which corresponds well with phylogenies obtained by various meth- ods, including ribosomal RNA gene sequences (25, 26). As noted earlier by Palmer and Logsdon (27), species in the early branches of the eukaryotic tree (Giardia, Trypanosoma, and Leishmania) have no Tpi introns. Plasmodium has one intron, the yeasts Saccharomyces and Schizosaccharomyces have none, but the mold Aspergillus has five. Introns are more abundant in animals and plants, with introns 3, 7, 8, 10, and 14 found in both, while introns 2, 13, and 15 are found in plants but not animals, and intron 12 is found only in animals. Drosophila has only one intron (at position 12). Culex also has only one intron, at position 5, shared with Aedes but not Anopheles, as noted. DISCUSSION The exon theory of genes proposes that modern protein- encoding genes were assembled from ancient minigenes, which correspond to modern exons, whereas the introns would be remnants of DNA spacers that separated the ancestral mini- Abbreviation: Tpi, triose-phosphate isomerase gene. tTo whom reprint requests should be addressed. §The Tpi fragment sequences of Anopheles sp., Aedes sp., Calliphora vicina, and Spodoptera littoralis have been deposited in the GenBank data base (accession nos. L38617, L42109, L38975, and L39011, respectively). 8503 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Transcript of triose-phosphate isomerasegene

Page 1: triose-phosphate isomerasegene

Proc. Natl. Acad. Sci. USAVol. 92, pp. 8503-8506, August 1995Evolution

Evidence against the exon theory of genes derived from thetriose-phosphate isomerase gene

(intron evolution/origin of genes/exon shuffling/insect genes/phylogenetic inference)

JAN KWIATOWSKI*t, MICHAL KRAWCZYK*t, MACIEJ KORNACKI*, KEVIN BAILEYt, AND FRANcIsco J. AYALAtt*Institute of Botany, Warsaw University, Al. Ujazdowskie 4, 00-478 Warsaw, Poland; and tDepartment of Ecology and Evolutionary Biology,University of California, Irvine, CA 92717

Contributed by Francisco J. Ayala, June 1, 1995

ABSTRACT The exon theory of genes proposes that theintrons of protein-encoding nuclear genes are remnants of theDNA spacers between ancient minigenes. The discovery of anintron at a predicted position in the triose-phosphate isomer-ase (EC 5.3.1.1) gene of Cukx mosquitoes has been hailed asan evidential pillar of the theory. We have found that thatintron is also present in Aedes mosquitoes, which are closelyrelated to Culex, but not in the phylogenetically more distantAnopheks, nor in the fly CaUliphora vicina, nor in the mothSpodoptera littoralis. The presence of this intron in Culex andAedes is parsimoniously explained as the result of an insertionin a recent common ancestor of these two species rather thanas the remnant ofan ancient intron. The absence of the intronin 19 species of very diverse organisms requires at least 10independent evolutionary losses in order to be consistent withthe exon theory.

The exon theory of genes, also known as the "introns-early"theory, proposes that the exons of protein-coding genes areremnants of ancient minigenes and the introns derive from thespacers between them (1-3). There are two ways of testing theexon theory: functional and evolutionary. Functional testsinvestigate whether introns divide genes into segments thatcode for functional subunits of protein structure. Evolutionarytests ascertain whether the pattern of intron distribution acrosstaxa conforms to the phylogenetic relationships among thetaxa. Functional tests that yield results consistent with thetheory may not be definitive, since constraints may exist atthe protein or genome levels that restrict, or make more likely,the insertion of introns at certain sites.A bulwark of the exon theory is the triose-phosphate

isomerase (EC 5.3.1.1) gene (Tpi), a gene of ancient origin.The intron distribution of Tpi has been argued to be nonran-dom (4-6) and, moreover, had elicited the prediction that anadditional intron would be found at a site that would split thelong exon III (5). The discovery of such an intron in themosquito, Culex tarsalis (7), was, accordingly, hailed as strongsupport for the exon theory. We have investigated the presenceof this intron in other dipterans and one lepidopteran andconclude that its taxonomic distribution is not consistent withthe exon theory.§

MATERIALS AND METHODS

The four insects studied are, in increasing phylogenetic dis-tance from Culex, the mosquitoes Aedes sp. and Anopheles sp.,the fly Calliphora vicina, and the moth Spodoptera littoralis.The DNA of these species was isolated, amplified, and se-quenced by described methods (8). The PCR products werecloned into the PCRII vector according to the manufacturer's(Invitrogen) protocol. Two primers, 5'-CGTKGGNG-

GNAACTGGAAGATGAAYGG-3' (sense) and 5'-CGCT-CYGAGTGBCCCAGGAYSACCCA-3' (antisense) (K = G,T; S = G, C; Y = C, T; B = G, C, T; N = G, A, C, T), werederived from the conserved protein fragments VGGNWK-MNG and WVILGHSER, respectively. The DNA sequencesof other organisms were obtained from the GenBank data basemaintained at IUBio archives at Indiana University.The DNA sequences were aligned with the CLUSTAL V

program (9). A maximum parsimony consensus tree (100bootstrap resamplings) was obtained by consecutive executionof the SEQBOOT, PROTPARS, and CONSENSE programs of thePHYLIP 3.5 phylogenetic inference package (10). The branchlengths of the tree were based on accepted point mutationvalues (ref. 11; obtained with the PROTDIST program of PHYLIP)and were calculated with the FITCH program of PHYLIP.

RESULTSFig. 1 gives the alignment of 19 triose-phosphate isomeraseprotein sequences. The sequences for the mosquitoes AedesandAnopheles, the fly Calliphora, and the moth Spodoptera areonly partial, corresponding to DNA fragments that include theregion where intron 5 is found in Culex mosquitoes. This intronis present inAedes, a close relative of Culex, but not in the moredistantly related Anopheles mosquitoes, nor in the flies Dro-sophila and Calliphora, nor in the moth Spodoptera. Theabsence of intron 5 from mosquitoes other than Culex andAedes has been reported earlier (ref. 24; M. Tyshenko andV. K. Walker, personal communication).

Fig. 2 is a consensus tree of the 19 protein sequences, whichcorresponds well with phylogenies obtained by various meth-ods, including ribosomal RNA gene sequences (25, 26). Asnoted earlier by Palmer and Logsdon (27), species in the earlybranches of the eukaryotic tree (Giardia, Trypanosoma, andLeishmania) have no Tpi introns. Plasmodium has one intron,the yeasts Saccharomyces and Schizosaccharomyces have none,but the mold Aspergillus has five. Introns are more abundantin animals and plants, with introns 3, 7, 8, 10, and 14 found inboth, while introns 2, 13, and 15 are found in plants but notanimals, and intron 12 is found only in animals. Drosophila hasonly one intron (at position 12). Culex also has only one intron,at position 5, shared with Aedes but not Anopheles, as noted.

DISCUSSIONThe exon theory of genes proposes that modern protein-encoding genes were assembled from ancient minigenes, whichcorrespond to modern exons, whereas the introns would beremnants of DNA spacers that separated the ancestral mini-

Abbreviation: Tpi, triose-phosphate isomerase gene.tTo whom reprint requests should be addressed.§The Tpi fragment sequences of Anopheles sp., Aedes sp., Calliphoravicina, and Spodoptera littoralis have been deposited in the GenBankdata base (accession nos. L38617, L42109, L38975, and L39011,respectively).

8503

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement" inaccordance with 18 U.S.C. §1734 solely to indicate this fact.

Page 2: triose-phosphate isomerasegene

8504 Evolution: Kwiatowski et al. Po.Ni.Aa.Si S 2(95

Cu LexAedesAnopheLesCaL LiphoraDrosophi LaSpodopteraHumanMonkeyChickenSch istosomaMaizeRiceAspergi LLusSaccharomycesSch izosaccharomycesP LasmodiurnLeishmaniaTrypanosomaGi ardi a

1(2),2 3(1),4M--CRKFCVC CNWKMNCDKA SIADLCKVLT TCPLNADT-- EVVVCCPAPY

... .TE.... ..........

.. .TE. .. .T.S A....DPN........* ..Q ..EIA.T.. CAT.DDN... .I.L.. .FI.

.... O....K ...EIA.T.S SAA.DPN ....I.... I.*.K QVTEIVET.K K. ..DSNV.. :..V.1

.APS.. F. ..F. ..R.0 .LCE.ICT.N AAKVP. .CAP.TA.

.APS. ..F......R.O NLGE.ICT.N AAKvP.*L.....CAP.TA..A-.F. .....K .LGE.IHT.N CAK.S. .CAS1

.SCS....F......SRD DNDK.L.L.S EAKFDDN... ...LIAP.SVF.....F.....C;:.TTD QVEKIV.T.N E.OVPPSDVV .. ..SP.YVF.....F... ..-:. TTD OVDKIV.I.N E.QIASTDVV :.:...SP.YVF.P.-...F.. ..F.;.NAE .TTS1I.N.N SAN.DKS-.V ....SP..L..A.-.T.F.. ..F.L..S.Q ..KEIVER.N .ASIPEN-.V ... ICP..T..A.-. ..F.. ..F. .....SLE .MKTIIEC.N .1K..VCDV. .T.IFPQNM..A.-. .YF.A A... C. .TLE . .KS.TNSFN NLDFDPSKL. LI.. .FPVSVH.SAKPOPIAA A... C. .TT. . .EK.VO.FN EHTISH. -.V OC. .APTFVK.S.KPQPIAA A... C. .SOO .LSE.IDLFN STSI.H.-.V OC. .ASTFVH..PA.RPFI. ..F.C..SLD F.KSHVAAIA AHKIPDS-.V D..IAPS.VH

5LTLARSQLP- DSVCVAAQN:C....L... A...D....

.5.....L... ETIC....

.SY. .ELF.. AKFNIS...

.MY. .NL... CELCL.G. .A

.EYVQ.IV.. NTIN. .CT..IIDF. .OK.D. PKIA....IDF. .QK.D. PKIA.....DF. .OK.D. AKIC.....HEI.KS.K. KEIH.....PvvK. ..R. OEFH.....PVVK. .. .R. PEIQO.....L0. .EVAN. KEIC...VI.DYSV.LVKK PO.T.C. .. .A.ITT.Q.VK. KDIC.C ..V.YDHT.KL.OS -KFSTCI..VIP.VOAK.RN PKYVIS.E.A.AMTKER.SH PKFVI. ....A.ST.IAANTS KOLRI. ....V

6(1)7(2)YKVPKCAFIC EISPAMLKDL NICWVILCHS

A...A . ...... L....A .. . . . . . . . L..... .... ... .... V CAN

A.... .......ICGAD ....

W.S..I..I CAN...TN.... G.1..C CAT..V.......TN ... C.1::....C CAT..V........... .. ..GAA ....

......~...IR.I CCD....WVKKC... V.:AE..VN. CVP....WVKKC..;..:AE. .VN. S.P....FDK.N......VQO.REA ..D.T....LKAS.....N.VDQI..V CAK....FDKKN..Y.. .N.AQS.I.A A.TYTLT...S.FCN.SY.. .V.AEIA....EY..I..FIAKS-.....V.MPI ...1 CVH....IAKS-.....V.LPI ...F CVN.IV.....LECN. .W.. .T.VE. .Q.M CLKH. .V...

CuLexDrosophilaHumanMonkeyChickenSch istosomaMaizeRiceAspergi ILusSaccharomycesSch izosaccharomycesPLasmodiumLeishmaniaTrypanosomaCiardia

8 9 10(1) 11(2) 128 13ERRAIFCESD ELIADKVAHA LAECLKVIAC ICETLOEREA COTEAVCFRO TKAIADKV-K ---DWSNVVI AYEPVWAICT GKTASPEQAQ EVHAALRKWF......A ...E.AE ..........E...K.NE.VA.. MC.Y.Q.I.....K...V.T.... ..5.0.

...HV.. Co....C...G....K.D.. 1.K.V.E...V . K..L ..... .. TOQ" EKC..LHV C.O .... C...G....K.D.. 1.K.V.E. ..V..,N ......K..L ..... .. T.'0 EK..C.L

.HV CO.GQ.C.....G....K.D.. 1.K.V.E ....:N......K..L ...... ..T. 0:i:i: .EK..C.LE... ..Q....S......S...S NK..E..v.. L....N.l.. SADE.KR..V ...... .V.T.0. . NF...

...LL..N ::fVGC. Y- ....V. V.... EQ... .S.MD.VAA....E.I....... V ...... .V.T.A.. S. D.L511SL .N FVC....Y.S ....V....EQ..S .S.MD.VAA. .SERI .....T...V ...... .V.T1D.. :~:DGC..LV. LK. T. P*R.TKA. IEC. .Q..F.P..... NK.ID.VT.. LN.A.KEL.- sKEO.AK...... V.TT ......S1..L

...SY.H.D. KF....TKF. GCO.VC..L....E.KK. .K.LD.VE.. LN.VLEE......T...V .......LA.T..D.. DJ..SI..FL...K. V.TKF. .EQ..T.v.....AD... NE.IT.vv.. LN.. -.. ..OQN..KI .C ..... .T..E....EI...A

...KY.H.T. .DVRE.LOAS .KNN..AVV. F..S.EQ..O NK.IE.ITK. V..FV.LI.- ..DNFD..IL ....L......T....L..KEI..iv

...TYY..T. .IV.Q..SE. CKO.FM .0....Q... N..AK.VLS. .S...A.L.T .KDA.NO..L ...... .V.T......LL....V

....YY..TN .IV...A. V.S.FM ........S .R.AV.VLT. IA...K.L.. .KA..AK ........ .V.T.Q. LI...S.V

...R.M..T. .OS.K.AKR. .EK.MT..F. V....D..K. NR.ME.NIA. LE.LCKELCE SKML.KE......S... .VV.T....E ...VC...

Culex TENVSADVSADrosophiLa SD.I.KE...Human KS....DA.AOmonkey KS....EA.AQChicken KSH. .DA.AOSchistosoma KT.APNC.DEMaize KT.A.PE.AERice AA. . ..E.AEAspergiLtus KDAI..EAAESaccharomyces ASKLCDKAASSchi zosaccharomyces .NKLCG.S.AEP LasmodiurnLeishmaniaTrypanosomaCiardia

KDTCCEKOANS.. ICT. .A.SSKICG .. .ACA. K.C. ECAQ

AIRIQYCCSV TAANCRELAA KPDIDCFLVC CASLKPEF-1 QIVNA---RO -SL ..... ..AK...K ...... .... D.I .....STG..T.K...S O..V .........V D.I...K.

ST..I. C.T.K...S Q..V .........V 0.1...K.ST. .1I. G. K...S OH.V .........V D.I....KH

K.. .....K...Q OH.V .........T E.CK.... RST..lI......K.... Q..V..... ..... D.1..ATVKS AST. .1 .G..K... V...v... ...... D.I.SATVKS ANT..-I....SEK..KD..K EA ..... ...A'.~'.V D....R.L--EL..L....A NCS.AVTFKD .A.'V..........V D.I.SRN.--CL.VI .. G.K.FLK FH ..... ....PT N...VHSL--

NTE..SS.IQ QE.....N....ES..V D.IKSAM.--KL..L....N...AAT.Y...N.........R D.ID.T...-EL..L....NCK.A.T.YO OR.VN.........V D.IK.T..-.H...lI....A NCS..EK.CO C.N..........M TMIDILTKTR T

FIG. 1. Alignment of the amino acid sequence of triose-phosphate isomerase in 19 species. Positions identical to those of Culex are marked bydots. Intron positions are numbered and denoted by shading over two residues, when the site falls between two codons, or over one residue, whenthe codon is split, in which case the number in parentheses refers to the base in the triplet just before the intron. Proximal intron sites separatedby 10 or fewer nucleotides are spaced by commas. Sources for published sequences are as follows: Culex (7), Drosophila (12), human (GenBankaccession no. X69723), rhesus monkey (13), chicken (4), Schistosoma (14), rice (15), maize (16), Aspergillus (17), Saccharomyces (18),Schizosaccharomyces (19), Plasmodium (20), Leishmania (21), Trypanosoma (22), and Giardia (23).

genes. The absence of introns in eubacteria, archaebacteria,and several protist phyla would be due to their complete lossthrough evolutionary time in these groups of organisms. Manyintrons would also have been lost in genes from other protistsand multicellular organisms. An alternative hypothesis (the"insertional" or "introns-late" theory) proposes that splitgenes have arisen from continuous genes by the insertion ofintrons. According to the introns-late theory, spliceosomalintrons were never present in the ancestors of groups oforganisms that now lack them; in other groups, introns havebecome inserted and occasionally deleted throughout theirevolutionary history.The claim that introns and exons derive from minigenes

preexisting the divergence of bacteria and eukaryotes demandsthat the exon theory be tested in ancient genes, present in allsorts of organisms, such as Tpi-hence, the significance at-tached to the predicted discovery of Tpi intron 5 in Culexmosquitoes, splitting a DNA segment too long to correspondto only one ancient minigene.Our results are not, however, consistent with the exon

theory. The presence of intron 5 in Culex and Aedes but not in

any of the other species represented in Fig. 2 is parsimoniouslyinterpreted as the result of an evolutionary insertion at theposition indicated by the arrow in Fig. 2. If this intron werepresent in the ancestor of all species shown in the figure, itsabsence from 17 of the 19 species would require a minimum of10 independent evolutionary deletions of the intron.More generally, the pattern of presence/absence of introns

in the Tpi gene (Fig. 1) is more consistent with a dynamicprocess of occasional evolutionary insertions and deletionsthan with the exon theory.The exon theory implies that introns at all 16 positions were

present in the ancestral organisms, but each one was lost innumerous evolutionary events independently occurring ineach of numerous lineages. The number of postulated eventscan be reduced by claiming that the pairs 1/2, 3/4, 12/13, and15/16 represent each only one original intron that has slid insome lineages (17). The occurrence of intron slippage remainsto be demonstrated and, in the case of Tpi, implies that it hasoccurred along as many as 7 or 9 nucleotides (for introns 15/16and 12/13, respectively) that are as many as occur in someexons found in protein-coding genes (28).

Proc. Natl. Acad. Sci. USA 92 (1995)

Page 3: triose-phosphate isomerasegene

Proc. Natl. Acad. Sci. USA 92 (1995) 8505

+

L;CulexLAedesAnophelesCalliphora

. Drosophila

.SpodopteraHuman

LMonkeyChicken. SchistosomaMaize

Aspergi 11 us_ Saccharomyces

. SchizosaccharomycesPlasmodi um

_ Leishmania. Trypanosoma- Giardia

FIG. 2. Phylogenetic tree of 19 species derived from the Tpi proteinsequences. The arrow indicates the postulated evolutionary insertionof intron 5. Branch lengths reflect genetic distances between se-

quences. The top six (insect) branches are proportionally based on only77 amino acids. We first obtained a phylogeny that only included allcomplete sequences. The insect phylogeny, based on the 77 aminoacids available for all six insect species, was separately obtained andthen incorporated into the larger phylogeny.

The insertional theory accounts for introns as the result ofa dynamic process of occasional insertions and deletions. Aprediction made by this theory is that introns will be predom-inantly distributed in phylogenetic clusters, which allows forconsiderable elasticity and makes the theory all but untestable.Nevertheless, the number of independent evolutionary eventsrequired by this theory for Tpi is much smaller than the numberrequired by the exon theory. In the case of intron 5, theinsertional theory requires only one event (insertion in theCulex/Aedes ancestor) rather than at least 10 independentdeletions. Other introns are shared by phylogenetically relatedorganisms and can be explained by insertions in a commonancestor. Introns at sites 3, 7, 8, 10, and 14 are present in plantsand animals and may have been inserted before the divergenceof these two kingdoms. Intron 8 is present in the moldAspergillus as well, which is consistent with its insertion beforethe divergence of multicellular organisms (although its ab-sence from the two yeast species is also consistent with an

independent insertion in the mold lineage). Introns 2, 13, and15 are specific to plants, but only two species are representedin Fig. 1. Intron 12 may have been inserted early in theevolution of metazoans, since it is present in Drosophila as wellas in the vertebrates. The mold Aspergillus has four introns inaddition to the one at site 8, none of which is present in the twoyeasts. Whether any or all of these represent early insertion in thefungi followed by loss in yeasts, or later insertion only in the modlineage, is equally plausible on the basis of the data in Fig. 1.

Insects usually have fewer introns in protein-coding genesthan vertebrates, which often share introns at specific sites,such as 3, 7, 8, 10, 12, and 14 for Tpi. This difference may beaccounted for as a consequence of a distinctive high incidenceof insertions early in the evolution of chordates. But it may alsobe, as often assumed, that many of these introns predate thedivergence of insects and vertebrates but were mostly lost ininsects (8, 28). The matter can only be settled after extensivedata are collected for other animal phyla. In any case, Tpi

conforms to the general pattern. Only one intron occurs inDrosophila, at site 12 where an intron is also present invertebrates; and only one intron occurs in Culex, but at site 5,shared with Aedes but not with any other organisms. Introninsertions have rarely been detected in insects, and thus the Tpigain of intron 5 in the Aedes/Culex lineage is particularlynoteworthy. In dipterans, an intron insertion has been reportedin a globin gene of the midge Chironomus (29), and we haveuncovered in our laboratory an Xdh intron inserted in theDrosophila willistoni lineage, which is absent in other lineagesof the subgenus Sophophora to which D. willistoni belongs, aswell as in other Drosophila subgenera and related genera, suchas Chymomyza and the medfly Ceratitis capitata (F. Rodriguez-Trelles and R. Tarrio, personal communication).Logsdon et al. (30) have recently sequenced the Tpi gene in

the insect Heliothis virescens, the nematode Caenorhabditiselegans, and the fungus Coprinus cinereus and have foundintrons at seven novel positions. They argue that their analysisshowing the distribution of 21 intron sites in 20 differentspecies is inconsistent not only with the exon theory of genesbut also with the derivative notion that protein-coding genesare assembled by exon "shuffling." They argue that the largenumber of intron sites and their distribution conform to arandom model of intron insertion. The exon-shuffling hypoth-esis maintains that introns are likely sites for intragenicrecombination and thus contribute to the evolution of chimericgenes from preexisting sets of exons (31). This hypothesis isconsistent with the exon theory of genes, but it does notspecifically support it, since it is also consistent with theinsertional theory. Exon shuffling may be a method of creatingnew genes, whether or not introns derive from the spacersbetween the minigenes of primeval organisms. Support forexon shuffling has been derived from statistical analysis ofexon sequences (32) and from the tendency of intron positionsto correspond to the recombinant junctions in some chimericsequences, even though these may not be ancient (31, 33-35).Our analysis showing 16 intron sites within a limited number

of species, as well as the similar results of Logsdon et al. (30),does not particularly favor the existence at the genome orprotein level of constraints that would allow for the insertionof introns at only a few specific sites within a gene. As Logsdonet al. (30) have argued, the finding of seven novel intronpositions in the Tpi genes of just three, although diverse,organisms implies that more and more intron positions will befound as more Tpi genes are sequenced from remotely relatedeukaryotes. Stoltzfus et al. (31) have, moreover, concludedfrom the analysis of Tpi and three other ancient genes thatthere is no statistically significant correspondence betweenexons and units of protein structure, contrary to earlier claims(4-6).

We thank Elzbieta Wegner and Maciej Pszczolkowski for insectspecies; Joseph Felsenstein for making available the PHYLIP package;Jeffrey Palmer for making available the paper by J. M. Logsdon et al.(30) prior to publication; and Walter Fitch, Richard Hudson, JeffreyPalmer, and Virginia Walker for comments about the manuscript. Thiswork was supported by Grant 6P20303504 from the Committee forScientific Research (Poland) to J.K. and GM42397 from the NationalInstitutes of Health to F.J.A.

1. Darnell, J. E. J. (1978) Science 202, 1257-1260.2. Doolittle, W. F. (1978) Nature (London) 272, 581-582.3. Gilbert, W. (1987) Cold Spring Harbor Symp. Quant. Bio. 52,

901-905.4. Straus, D. & Gilbert, W. (1985) Mol. Cell. Biol. 5, 3497-3506.5. Gilbert, W., Marchionni, M. & McKnight, G. (1986) Cell 46,

151-154.6. Gilbert, W. & Glynias, M. (1993) Gene 135, 137-144.7. Tittiger, C., Whyard, S. & Walker, V. K. (1993) Nature (London)

361, 470-472.

Evolution: Kwiatowski et aL

Page 4: triose-phosphate isomerasegene

8506 Evolution: Kwiatowski et al.

8. Kwiatowski, J., Skarecky, D. & Ayala, F. J. (1992) Moi. Phylo-genet. Evol. 1, 72-82.

9. Higgins, D. G., Bleasby, A. J. & Fuchs, R. (1992) Comput. Appl.Biosci. 8, 189-191.

10. Felsenstein, J. (1989) Cladistics, 5, 164-166.11. Dayhoff, M. D. (1978) Atlas of Protein Sequences and Structure

(Natl. Biomed. Res. Found., Washington, DC).12. Shaw-Lee, R. L., Lissemore, J. L. & Sullivan, D. T. (1991) Mol.

Gen. Genet. 230, 225-229.13. Old, S. E. & Mohrenweiser, H. W. (1988) Nucleic Acids Res. 16,

9055.14. Reis, M. G. I., Davis, R. E., Singh, H. & Shoemaker, C. B. (1993)

Mol. Biochem. Parasitol. 59, 235-242.15. Xu, Y., Harris-Haller, L. W., McCollum, J. C., Hardin, S. H. &

Hall, T. C. (1993) Plant Physiol. 102, 697.16. Marchionni, M. & Gilbert, W. (1986) Cell 46, 133-141.17. McKnight, G. L., O'Hara, P. J. & Parker, M. L. (1986) Cell 46,

143-147.18. Albert, T. & Kawasaki, G. (1982)J. Mol. Appl. Genet. 1, 419-434.19. Russell, P. R. (1985) Gene 40, 125-130.20. Ranie, J., Kumar, V. P. & Balaram, H. (1993) Mol. Biochem.

Parasitol. 61, 159-169.21. Kohl, L., Callens, M., Wierenga, R. K., Oppeerdoes, F. R. &

Michels, P. A. M. (1994) Eur. J. Biochem. 220, 331-338.

Proc. Natl. Acad. Sci. USA 92 (1995)

22. Swinkels, B. W., Gibson, W. C., Osinga, K. A., Kramer, R.,Veeneman, G. H., van Boom, J. H. & Borst, P. (1986) EMBO J.5, 1291-1298.

23. Mowatt, M. R., Weinbach, E. C., Howard, T. C. & Nash, T. E.(1994) Exp. Parasitol. 78, 85-92.

24. Hurst, L. D. (1994) Nature (London) 371, 381-382.25. Smothers, J. F., von Dohlen, C. D., Smith, L. H. J. & Spall, R. D.

(1994) Science 265, 1719-1721.26. Sogin, M. L. (1991) Curr. Opin. Genet. Dev. 1, 457-463.27. Palmer, J. D. & Logsdon, J. M. J. (1991) Curr. Opin. Genet. Dev.

1, 470-477.28. Hawkins, J. D. (1988) Nucleic Acids Res. 16, 9893-9908.29. Kao, W.-Y., Trewitt, P. M. & Bergtrom, G. (1994) J. Mol. Evol.

38, 241-249.30. Logsdon, J. M., Jr., Tyshenko, M. G., Dixon, C., D.-Jafari, J.,

Walker, V. K. & Palmer, J. D. (1995) Proc. Natl. Acad. Sci. USA92, 8507-8511.

31. Stoltzfus, A., Spencer, D. F., Zuker, M., Logsdon, J. M. J. &Doolittle, W. F. (1994) Science 265, 202-207.

32. Dorit, R. L., Schoenbach, L. & Gilbert, W. (1990) Science 250,1377-1382.

33. Doolittle, R. F. (1985) Trends Biochem. Sci. 10, 233-237.34. Doolittle, R. F. (1991) Science 253, 677-680.35. Patthy, L. (1991) BioEssays 13, 187-192.