PII: S0022-2836(03)00057-3


Characterisation of a Non-canonical Genetic Code in the Oxymonad Streblomastix strix

Patrick J. Keeling* and Brian S. Leander

Department of Botany, Canadian Institute for Advanced Research, University of British Columbia, 3529-6270 University Boulevard, Vancouver, BC, Canada V6T 1Z4

The genetic code is one of the most highly conserved characters in living organisms. Only a small number of genomes have evolved slight variations on the code, and these non-canonical codes are instrumental in understanding the selective pressures maintaining the code. Here, we describe a new case of a non-canonical genetic code from the oxymonad flagellate Streblomastix strix. We have sequenced four protein-coding genes from S. strix and found that the canonical stop codons TAA and TAG encode the amino acid glutamine. These codons are retained in S. strix mRNAs, and the legitimate termination codons of all genes examined were found to be TGA, supporting the prediction that this should be the only true stop codon in this genome. Only four other lineages of eukaryotes are known to have evolved non-canonical nuclear genetic codes, and our phylogenetic analyses of α-tubulin, β-tubulin, elongation factor-1α (EF-1α), heat-shock protein 90 (HSP90), and small subunit rRNA all confirm that the variant code in S. strix evolved independently of any other known variant. The independent origin of each of these codes is particularly interesting because the code found in S. strix, where TAA and TAG encode glutamine, has evolved in three of the four other nuclear lineages with variant codes, but this code has never evolved in a prokaryote or a prokaryote-derived organelle. The distribution of non-canonical codes is probably the result of a combination of differences in translation termination, tRNAs, and tRNA synthetases, such that the eukaryotic machinery preferentially allows changes involving TAA and TAG.

© 2003 Elsevier Science Ltd. All rights reserved

Keywords: genetic code; oxymonad; glutamine; translation; evolution

*Corresponding author

E-mail address of the corresponding author: [email protected]

Abbreviations used: SSU, small subunit; EF, elongation factor; RT, reverse transcriptase; UTR, untranslated region; RACE, rapid amplification of cDNA ends; DIC, differential interference contrast.

0022-2836/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved

doi:10.1016/S0022-2836(03)00057-3 J. Mol. Biol. (2003) 326, 1337–1349

Introduction

The genetic code is the key to information flow in cellular life, and therefore one of the most highly conserved characteristics of all living systems. Virtually all genetic systems, including bacteria, archaebacteria, eukaryotic nuclei, organelles, and viruses, use the same genetic code, which has been called the Universal genetic code. It is now widely accepted that this standard or canonical genetic code originated early in cellular evolution, prior to the common ancestor of all extant life. However, despite the widespread and near-absolute conservation of the genetic code, the code is not quite universal. A small number of genomes have been found to possess slightly different genetic codes, referred to here as non-canonical genetic codes. All non-canonical codes differ from the standard code only slightly, showing beyond any doubt that they have evolved by making small changes to the standard code. Various models have been suggested to explain how changes to the genetic code can be made, the most widely discussed being the codon-capture1 and ambiguous intermediate2 models. In codon-capture, shifts in the frequency of various codons, typically due to a strong bias in AT content of the genome, lead to the total loss of a particular codon. Once lost, the translational machinery responsible for recognising that codon can be lost or altered, leaving the codon “unassigned”. Unassigned codons may then be reclaimed by the appearance of a mutant tRNA that can recognise them, which in turn allows the codon to reappear in the genome, potentially encoding a new amino acid.1,3 The ambiguous intermediate model postulates that mutant tRNAs appear to claim active codons, for a time resulting in codons with two possible meanings. Eventually, the tRNA or termination factor originally recognising the codon is lost or altered so that it no longer recognises the codon, resulting in a new code.2,4 While the order of events in these two models may be different, the basic mechanism is similar in some respects.

Although the differences between non-canonical genetic codes and the standard code are typically slight, these differences are important, since they reflect rare escapes from the strong selective pressures that constrain the code; namely, the activity of the protein translational machinery. To understand these pressures, it is necessary to have some appreciation for the variant codes that have evolved, and the frequency with which different non-canonical codes have appeared in different lineages. Mitochondrial genomes are a particularly fertile source of genetic code variations, likely because they are small genomes with unusual population dynamics.5,6 In contrast, plastid genomes, which are similar in many respects to mitochondrial genomes, use the standard genetic code. Among eubacteria, a variant code has been documented in mycoplasmas,7 but no variant code has been observed in any archaebacterium. In eukaryotic nuclear genomes, four lineages have been documented to alter their genetic code, and the distribution and nature of these changes are quite interesting. One unique variant is found in several species of the yeast Candida, where CUG encodes serine rather than leucine.8 In ciliates, no fewer than three non-canonical genetic codes have evolved: in Euplotes, UGA encodes cysteine rather than stop,9 while in Blepharisma and Colpoda, UGA encodes tryptophan,10 and in most other ciliates TAA and TAG encode glutamine rather than stop.11–13 This last variation is most interesting, since it appears to have evolved within ciliates more than once,10,14 and has been documented in dasycladacean green algae15 and in hexamitid diplomonads.16,17 These non-canonical codes have yielded some intriguing insights into the stability of the genetic code in different lineages, but the number of variant codes characterised to date is still small. Therefore, the debates as to how these changes take place and how they may be a reflection of the translational apparatus have not abated.18–20

The oxymonads are a group of flagellates that rank among the most poorly studied eukaryotes known. They are anaerobes or microaerophiles that have no recognisable mitochondrion. The group is morphologically very diverse, but its members are not abundant in nature, being restricted to the guts of animals, predominantly termites.21 Living in complex associations with other microorganisms and their animal hosts, oxymonads are not available in cultivation and accordingly it is only recently that molecular data have been characterised from any oxymonad.22–24 The molecular data remain sparse and restricted to two closely related genera, so we sought to characterise several gene sequences from Streblomastix strix, an unusual oxymonad found in the hindgut of the damp-wood termite Zootermopsis angusticollis. Unexpectedly, S. strix was found to employ a non-canonical genetic code, where the amino acid glutamine is encoded by the canonical CAA and CAG codons, and by TAA and TAG codons.

This is only the fifth lineage of eukaryotes known to employ a non-canonical genetic code in the nuclear genome, but already the fourth lineage where this particular variant is found. The recurrent evolution of this variant code in nuclear genomes suggests that these codons are particularly free to evolve a new meaning in eukaryotes, and the different constraints on the genetic code in eukaryotes and prokaryotes likely reflect basic differences in translation.

Results and Discussion

Characterisation of molecular sequences from Streblomastix, and in situ hybridization

Fractionated Z. angusticollis hindgut material enriched with S. strix (see Figure 1(a) and (b) for morphology) was used to amplify several different gene sequences. Initially, the small subunit ribosomal RNA (SSU rRNA) gene was amplified to evaluate the proposed relationship between S. strix and oxymonads, as oxymonads are very poorly studied and morphologically diverse, and S. strix is a relatively unusual flagellate.25–27 Amplifications of SSU rRNA genes from fractionated gut material yielded two products: one small (1574 bp), faint product corresponding to the size expected of parabasalia (which also are found in the Z. angusticollis hindgut), and a second, abundant and very large product (2471 bp). Both products were cloned and sequenced. The sequence of the smaller product corresponded to the recently reported sequence of Trichomitopsis termopsidis,28 as expected, while the larger product corresponded to an SSU gene most similar to the large homologue (2012 bp) from the oxymonad Pyrsonympha. Preliminary phylogenetic analysis (not shown) confirmed this relationship. The putative S. strix SSU rRNA gene contains numerous large insertions, some at positions shared by insertions in the Pyrsonympha SSU, and others at unique positions.

To confirm the origin of the SSU rRNA gene, one unique insertion was used as a target for an S. strix-specific probe in fluorescent in situ hybridisation. When hybridised to whole gut contents, the probe specifically recognised S. strix (Figure 1(e) and (f)). Some wood-eating parabasalia were observed to fluoresce weakly at this wavelength, but these cells autofluoresced at the same intensity in controls that were not exposed to the rhodamine-labeled probe. In these controls, Streblomastix cells did not autofluoresce (Figure 1(c) and (d)). Moreover, in the preparations exposed to the probe, the degree of fluorescence in Streblomastix was noticeably higher than in parabasalids. As further confirmation, exact-match SSU rRNA primers were used to amplify the same SSU gene from manually isolated Streblomastix cells.

Fractionated gut material was used to amplify Streblomastix α-tubulin genes. A single product of the expected size was observed and three individual clones were sequenced. All three clones encoded distinct but similar α-tubulin genes that varied predominantly at synonymous positions. In preliminary phylogenetic analyses, these genes showed a very close relationship with the previously characterised α-tubulins from the oxymonads Pyrsonympha and Dinenympha, suggesting that the genes were indeed from Streblomastix. Moreover, the PCR-based approach using exact-match primers on 50 manually isolated Streblomastix cells produced a fragment of the same α-tubulin gene. Sequencing this product confirmed that these sequences are derived from Streblomastix.

The inferred translation of the PCR products revealed that all three copies of the α-tubulin that were sequenced encoded TAG codons at position 234, and two of the three copies encoded TAA codons at position 69. The third copy encoded a CAA glutamine codon at position 69 (Figure 2(a)). TAA and TAG are termination codons in the canonical genetic code, and positions 69 and 234 are both highly conserved for glutamine in other eukaryotes, suggesting that TAA and TAG perhaps encode glutamine in the Streblomastix genome rather than acting as termination codons.
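The effect of reading TAA and TAG as glutamine can be illustrated with a minimal sketch. The sequence `toy` and the names `make_code`, `STANDARD`, `VARIANT` and `translate` are hypothetical and are not taken from the study; the sketch only assumes the standard codon table plus the two reassignments proposed here, with TGA left as the sole stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G at each position).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_code(reassign=None):
    """Return a codon -> amino acid dict, optionally with reassigned codons."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    code = dict(zip(codons, STANDARD_AA))
    code.update(reassign or {})
    return code


STANDARD = make_code()
# Assumed variant: TAA and TAG read as glutamine (Q); TGA remains a stop codon.
VARIANT = make_code({"TAA": "Q", "TAG": "Q"})


def translate(dna, code):
    """Translate an in-frame DNA string, halting at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = code[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy = "ATGTAACAGTAGGGATGA"  # hypothetical sequence, not from the paper
print(translate(toy, STANDARD))  # M
print(translate(toy, VARIANT))   # MQQQG
```

Under the standard table the toy reading frame is truncated at the first TAA, whereas under the variant table it continues to the TGA codon, which is the pattern the inferred α-tubulin translations suggest.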

To determine whether this characteristic is common to other Streblomastix genes, genes encoding β-tubulin, translation elongation factor-1α (EF-1α), and heat-shock protein 90 (HSP90) were amplified, and several individual clones sequenced for each gene (five for β-tubulin and HSP90, and three for EF-1α). As with α-tubulin, the individual clones for each of these genes encoded nearly identical proteins, but displayed a great deal of variation at synonymous positions (in HSP90, an extremely variable acidic domain was found to vary in sequence and length at the inferred amino acid level). Considering that the source of the material is a natural population and not a culture, this is perhaps not surprising: a natural population would be expected to have considerably more variability than normally seen in cultivated protozoa. As with α-tubulin, preliminary phylogenetic analysis confirmed that the Streblomastix EF-1α branched with other known oxymonad genes (not shown). No other oxymonad β-tubulins or HSP90 genes are known, but oxymonads are known to be close relatives of the flagellate Trimastix.22 Phylogenies of all four protein-coding genes including homologues from Trimastix (unpublished Trimastix marina sequences were made available by A. B. G. Simpson & A. J. Roger) showed Streblomastix and Trimastix to be closely related (not shown). To further confirm the origin of these sequences, Streblomastix was manually isolated once more, and fragments of β-tubulin and HSP90 were amplified from two pools of 50 isolated cells using exact-match primers. In each case, a product of the expected size was obtained and sequencing confirmed that the product matched the sequences of the Streblomastix genes.

Figure 1. Representative light and scanning electron micrographs of Streblomastix strix; all scale bars represent 10 μm. (a) Light micrograph using differential interference contrast (DIC) showing the general appearance of a single cell. This appearance matches previous descriptions of Streblomastix.26,27 (b) Scanning electron micrograph showing the rod-shaped epibiotic bacteria oriented longitudinally along the surface of a cell, a feature that is diagnostic of S. strix.25 (c) DIC micrograph of a cell prepared for in situ hybridisation but without exposure to a rhodamine-labeled probe. (d) The same cell as in (c) observed under a 546 nm excitation wavelength; there was no evidence of autofluorescence. (e) DIC micrograph of a cell exposed to a rhodamine-labeled probe specific to S. strix. (f) Fluorescence emitted from the same cell as in (e) (under a 546 nm excitation wavelength) provides qualitative evidence for specific binding of the rhodamine-labeled SSU rDNA probe to S. strix.

More importantly, every copy of each of these four genes encoded TAA and TAG codons at positions otherwise highly conserved for glutamine (Figure 2(b)–(d)). This confirms that the presence of TAA and TAG codons in α-tubulin is not gene-specific, and is a general feature of the genome. Indeed, a total of 116 TAA and TAG codons were observed at 32 positions throughout the various copies of the four genes, which is a substantial representation. Significantly, of the 32 positions where TAA or TAG codons were observed, CAA or CAG codons were used at the same position in one or more of the other copies of the gene in 26 cases (over 81% of the time), suggesting that these codons are interchangeable.
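The interchangeability count can be sketched as a column-by-column comparison of gene copies. This is only an illustration under stated assumptions: the function name `tar_car_summary` and the three toy sequences are hypothetical, and the copies are assumed to be in frame, gap-free and of equal length, which real alignments may not be.

```python
TAR = {"TAA", "TAG"}  # non-canonical glutamine codons
CAR = {"CAA", "CAG"}  # canonical glutamine codons


def tar_car_summary(copies):
    """Count codon positions with a TAR codon in any copy, and how many of
    those positions also carry a CAR codon in at least one other copy."""
    n_codons = len(copies[0]) // 3
    tar_positions = 0
    shared_with_car = 0
    for k in range(n_codons):
        column = {seq[3 * k:3 * k + 3].upper() for seq in copies}
        if column & TAR:
            tar_positions += 1
            if column & CAR:
                shared_with_car += 1
    return tar_positions, shared_with_car


# Hypothetical copies of one gene fragment (not data from the paper).
copies = ["ATGTAAGGATAG", "ATGCAAGGATAG", "ATGTAAGGATAG"]
print(tar_car_summary(copies))  # (2, 1)

# The proportion reported in the text: 26 of 32 positions.
print(26 / 32)  # 0.8125
```

In the toy data, two codon positions carry TAR codons and one of them is CAA in another copy; the text's figure of 26 out of 32 corresponds to 81.25%, consistent with "over 81%".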

Characteristics of Streblomastix mRNA, termination codons, and 3′ UTRs

If Streblomastix uses a non-canonical genetic code where TAA and TAG encode glutamine, then all protein-coding genes in the genome are predicted to terminate with TGA codons. To test this, we performed 3′ rapid amplification of cDNA ends (RACE) on all four protein-coding genes. This method is extremely informative for two additional reasons. First, protein-coding gene sequences from oxymonads were, until now, restricted to α-tubulin and EF-1α from two closely related genera, Pyrsonympha and Dinenympha, and an EF-1α from an unidentified oxymonad. In total, these sequences include 24 positions encoding glutamine, and in none of these has a TAA or TAG codon been observed, suggesting that these oxymonads use the canonical genetic code. However, every one of these genes was acquired by reverse transcriptase (RT)-PCR from mRNA, leaving open the formal, albeit unlikely, possibility that oxymonads use a T-to-C RNA editing mechanism, and that we are observing pre-edited genes in Streblomastix. Second, general characteristics of protein carboxy termini and mRNA 3′ untranslated regions (UTRs) hold information about potentially important characteristics of the genome, and whether different transcripts are from different alleles or loci.

Figure 2. Examples of position and variability of TAA and TAG codons in Streblomastix genes. (a) α-Tubulin; (b) β-tubulin; (c) EF-1α; (d) HSP90. Each section shows small blocks from a protein alignment with TAA or TAG codons in Streblomastix represented as an asterisk (*). Numbers above the sequence indicate the codon number in the Streblomastix sequence. In cases where different Streblomastix sequences encoded different codons at the position in question, all sequences are shown.

For each of the four genes, 3′ RACE products of the expected size were amplified, cloned, and five individual clones sequenced. As with the genomic clones, a substantial degree of synonymous variation was observed between different RACE clones, and between the RACE clones and the genomic PCR products, such that only three redundant sequences were found. In β-tubulin and HSP90, all 3′ UTRs were clearly distinct but related, while the 3′ UTRs of α-tubulin and EF-1α fell into two distinct classes that shared no detectable sequence similarity. This suggests that the variation observed in coding regions might reflect different alleles (those with similar UTRs) as well as different loci (those with different UTRs). In either case, each distinct RACE product is clearly derived from a unique coding sequence, and the termination codon of each product is accordingly evolving independently. It is significant, therefore, that every one of the 16 unique RACE products terminated at the expected position with a TGA codon (many genes terminate with two TGA codons spaced 3 bp apart). In addition, all RACE products contained TAA and TAG codons at positions otherwise conserved for glutamine, and the proportion of CAR to TAR matches closely the pattern observed in genomic sequences (2:1). Altogether, there is no indication of RNA editing. This suggests either that this non-canonical code is restricted to Streblomastix, or that Pyrsonympha and Dinenympha use the non-canonical codons at a much lower frequency. To determine the distribution of this code in oxymonads, it will be necessary to characterise the 3′ ends of protein-coding genes from Pyrsonympha and Dinenympha, and to characterise more protein-coding genes from diverse oxymonads. Given the poor sampling of molecular data from oxymonads, it is reasonable to suspect that this character may be widespread among species not yet examined at the molecular level.

Codon usage in Streblomastix

Comparing the sequences of both genomic DNAs and mRNAs from all copies of all four protein-coding genes reveals that virtually every amplification product characterised is unique (nearly all of the variation being synonymous). The 3′ UTR sequences suggest that most of this variation can be attributed to variation between alleles, since in most cases the 3′ UTR sequences for a given gene are clearly homologous. In EF-1α and α-tubulin mRNAs, however, two entirely non-homologous classes of 3′ UTR were observed. This suggests that, in at least these two instances, two distinct loci were characterised.

If TAA and TAG encode glutamine, then TAR/CAR variation should closely resemble other synonymous variation (this would not be observed if there was a very strong bias toward one or the other, but this is not the case). Taking a representative amino acid sequence for each gene, the potential number of synonymous and non-synonymous substitutions can be calculated (Table 1), and compared with the actual number observed between those copies of the gene that have been sequenced. For each gene, these ratios are similar (some difference being attributable to the number of copies of the gene that have been characterised: see Materials and Methods), and the difference between the two is about an order of magnitude (Table 1, rows 2 and 3). If substitutions at both the first and third positions of glutamine codons are examined, the ratios are found to be in line with ratios calculated for synonymous positions for that gene (Table 1, rows 4 and 5), suggesting that C-T substitutions at the first position are synonymous.
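One way to read the "potential" count is sketched below, following the wording of footnote a to Table 1: the frequency of each amino acid multiplied by the number of positions within its codons at which a change can be synonymous. The exact procedure is given in Materials and Methods, which is not reproduced here, so this is an interpretation rather than the authors' implementation; all function names and the toy peptide `MQL` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), STANDARD_AA))
# Assumed variant code: TAA and TAG encode glutamine.
VARIANT = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def synonymous_positions(code):
    """For each amino acid, count the codon positions (0 to 3) at which some
    single-base change converts one of its codons into another of its codons."""
    positions = {}
    for codon, aa in code.items():
        found = positions.setdefault(aa, set())
        for pos in range(3):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if base != codon[pos] and code[mutant] == aa:
                    found.add(pos)
    return {aa: len(found) for aa, found in positions.items()}


def potential_synonymous(protein, code):
    """Sum, over all residues, the synonymous positions of each amino acid."""
    per_aa = synonymous_positions(code)
    return sum(per_aa[aa] for aa in protein)


def observed_to_potential(observed, protein, code):
    """Ratio of observed synonymous substitutions to the potential number."""
    return observed / potential_synonymous(protein, code)


print(synonymous_positions(STANDARD)["Q"])  # 1
print(synonymous_positions(VARIANT)["Q"])   # 2
print(potential_synonymous("MQL", VARIANT))  # 4
```

Under the standard table only the third position of a glutamine codon can change synonymously, while under the variant table the first position (C to T) also can, which is why first-position C-T changes at glutamine codons are compared with other synonymous changes in rows 4 and 5 of Table 1.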

Table 1. Synonymous substitution patterns in Streblomastix

| | α-Tubulin | β-Tubulin | EF-1α | HSP90 |
|---|---|---|---|---|
| Total TAR + CAR | 12 | 19 | 19 | 18 |
| Synonymous substitutions (observed:potential^a) | 0.1845 | 0.3904 | 0.3657 | 0.2448 |
| Non-synonymous substitutions (observed:potential^b) | 0.0103 | 0.0066 | 0.0369 | 0.0205 |
| YAA-YAG conversions (observed:potential) | 0.2500 | 0.2632 | 0.3158 | 0.2222 |
| TAR-CAR conversions (observed:potential) | 0.1667 | 0.3158 | 0.5789 | 0.3889 |

a The frequency of an amino acid multiplied by the number of positions within its codons that can change resulting in a synonymous substitution (see Materials and Methods).

b The frequency of an amino acid multiplied by the number of positions within its codons that can change resulting in a non-synonymous substitution (see Materials and Methods).

If the variation in codon use between different alleles and loci is considered to be at least partly independent, a table of codon frequencies for 30,804 bp of coding sequence can be compiled from all unique sequences. While this does not represent the codon biases throughout the Streblomastix genome, because this is a skewed set of genes (three being constitutively highly expressed), it does reveal some trends. First, there is an obvious bias in favour of the canonical CAR glutamine codons over the non-canonical TAR codons, as they appear at a ratio of approximately 2:1. There is no consistent AT or GC bias (for instance, asparagine codons are biased to C at the third position, aspartate codons are biased to T). Similarly, there is no consistent purine or pyrimidine bias (for instance, there is an A bias in serine, proline, threonine, and alanine and a T bias in valine). The most conspicuous bias is towards A residues at the third position of codon quartets. While it is true that existing sequences cannot reveal all past mutation biases in a genome, the existing data from Streblomastix reveal no strong bias that could be interpreted as a driving force to change the genetic code.
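Compiling a codon frequency table of this kind amounts to tallying in-frame triplets across the unique sequences. The sketch below assumes in-frame coding sequences with no introns or gaps; the names `codon_counts` and `car_to_tar` and the two toy sequences are hypothetical and do not reproduce the 30,804 bp data set.

```python
from collections import Counter


def codon_counts(seqs):
    """Tally in-frame codons across a collection of coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return counts


def car_to_tar(counts):
    """Return (canonical CAR count, non-canonical TAR count)."""
    car = counts["CAA"] + counts["CAG"]
    tar = counts["TAA"] + counts["TAG"]
    return car, tar


# Hypothetical in-frame fragments (not data from the paper).
toy = ["CAACAGTAAGGA", "CAGCAATAGGGA"]
counts = codon_counts(toy)
print(car_to_tar(counts))  # (4, 2)
```

The toy fragments were chosen to give the same 2:1 CAR to TAR proportion described in the text; third-position biases for other amino acids could be read from the same `Counter` by grouping codons that share their first two bases.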

Phylogeny and the distribution of non-canonical genetic codes in eukaryotes

Traditionally,29,30 and more recently,31 oxymonads have most often been considered to be related to retortamonads and diplomonads. Recent molecular data have suggested that this is not the case22–24 but, nevertheless, the occurrence of the same non-canonical genetic code in an oxymonad and the hexamitid diplomonads16,17 does require some careful attention to determine whether the two variant codes really evolved independently. To this end, phylogenetic trees were inferred using the five genes characterised to determine the distribution of this code among eukaryotes. Trees based on the four protein-coding genes are shown in Figure 3, and a phylogeny of SSU rRNA genes is shown in Figure 4, where all lineages with non-canonical nuclear genetic codes are indicated. In all cases where other oxymonad sequences are known (SSU rRNA, α-tubulin, and EF-1α), Streblomastix branches with the other oxymonads, as expected. In no case does Streblomastix or the oxymonads in general show a specific relationship to any other lineage that uses this or any other non-canonical code. In particular, no phylogeny shows a close relationship between Streblomastix and the hexamitid diplomonads, although the β-tubulin phylogeny does show Streblomastix as a close relative of the parabasalia and diplomonads as a whole. Even in this case, the two lineages sharing the same divergent code are separated by the parabasalia and the diplomonad Giardia intestinalis, both of which use the standard code.16,17 Altogether, the phylogenies do not show strong support for any particular position for Streblomastix in the eukaryotic tree, other than being related to other oxymonads and Trimastix. Regardless of the precise evolutionary origin of Streblomastix and the oxymonads, the phylogenies demonstrate unambiguously that the non-canonical code in Streblomastix evolved independently of other non-canonical codes, including that found in diplomonads.

The evolution of the genetic code

Figure 4 shows an SSU rRNA phylogeny including all eukaryotic lineages known to use a non-canonical genetic code, and indicates how each differs from the standard code. It is difficult to determine exactly how many times the code has changed in eukaryotes because the phylogeny of ciliates is uncertain and the code has changed many times in ciliates (likely a result of ciliates adopting their unique system of differentiated germ-line and somatic nuclei32), but it is clear that the code has changed in five distinct lineages. It is interesting that in four of these lineages, the same non-canonical code has appeared, and there are strong reasons to believe it has appeared more than once independently in ciliates,10,14 while the other three codes known in nuclear genomes are each found only once (although a good case can be made for two origins of UGA encoding tryptophan in Blepharisma and Colpoda10).

Code changes involving stop codons are detected far more easily than changes where codons shift from encoding one amino acid to another, so it is possible that these dominate our understanding of code changes simply because additional changes have gone undetected in poorly studied genomes. However, there are reasons to believe that stop codons could be reassigned more easily than other codons. Most importantly, stop codons are rare in any genome (each represented a maximum of once per gene, and more likely much less), and the fewer the number of codons in question the easier it is to reassign them, since it demands fewer events and fewer potentially deleterious intermediates. Nevertheless, it would appear that TAA and TAG are especially prone to reassignments in eukaryotic genomes, and when they are reassigned, they always encode glutamine. Why such a bias would exist, and why it would be phylogenetically restricted are interesting questions, and the answer is probably a complex mix of the various factors that constrain the genetic code. We will consider three such factors: the translation termination apparatus, tRNAs and tRNA synthetases, and mutation frequencies.

First, the translation termination systems of eubacteria, archaebacteria, and eukaryotes have now been characterised partially, and there are substantial differences between eubacterial and nuclear systems. In particular, eubacteria use two termination factors: RF1 recognises UAA and UAG, and RF2 recognises UAA and UGA.33–35 In contrast, eukaryotes and archaebacteria possess a single protein factor, eRF1, which recognises all three stop codons.36–38 It has been argued that changes to the codon recognition site of eukaryotic eRF1 can lead to the loss of UAA and UAG recognition.10,39 Conversely, it is probably very difficult to reassign UAA in eubacteria, since there are two independent factors that recognise this codon. This could, partly, explain why UAA and UAG have never been reassigned together in eubacteria or eubacterial organelles (although UAG has been reassigned alone as either alanine or leucine in the mitochondria of various green algae40).
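The argument about release factors can be laid out as simple bookkeeping of which factor recognises which stop codon, using only the assignments stated above. The dictionary `RELEASE_FACTORS` and the function `factors_recognising` are hypothetical names, and the sketch says nothing about the molecular detail of recognition.

```python
# Stop codon recognition as described in the text.
RELEASE_FACTORS = {
    "eubacteria": {
        "RF1": {"UAA", "UAG"},
        "RF2": {"UAA", "UGA"},
    },
    "eukaryotes": {
        "eRF1": {"UAA", "UAG", "UGA"},
    },
}


def factors_recognising(codon, system):
    """List the release factors in a system that recognise a stop codon."""
    return sorted(
        name
        for name, codons in RELEASE_FACTORS[system].items()
        if codon in codons
    )


for codon in ("UAA", "UAG", "UGA"):
    print(codon,
          factors_recognising(codon, "eubacteria"),
          factors_recognising(codon, "eukaryotes"))
```

In this table UAA is the only codon recognised by two separate eubacterial factors, so freeing it would require changes to both, whereas in eukaryotes UAA and UAG recognition reside in the single factor eRF1.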

Figure 3. Phylogenetic trees of protein-coding genes. (a) α-Tubulin; (b) β-tubulin; (c) EF-1α; (d) HSP90. All trees are unrooted γ-corrected, weighted neighbor-joining trees (diplomonads are shown arbitrarily as the outgroup for consistency, and so the relative position of diplomonads and oxymonads can be seen easily). Numbers at nodes correspond to bootstrap support from weighted neighbor-joining (top), Fitch-Margoliash (centre), and protein maximum likelihood (bottom). Major eukaryotic groups are bracketed and named to the right.

Second, a critical part of ensuring the fidelity of the genetic code is the specificity of tRNAs and the enzymes that charge them with the appropriate amino acid, the tRNA synthetases. In virtually every system that has been examined, CAR glutamine codons are decoded using two tRNAs, one with anticodon CUG and one with UmUG (where Um is a modified uridine base). Interestingly, both isoacceptors have been shown to act as natural suppressors of UAG and UAA in various eukaryotes,41,42 providing a natural bias for UAR codons being reassigned to glutamine.
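The consequence of reading UAR codons as glutamine can be illustrated with a minimal translation sketch. The table below is the standard code with TAA and TAG remapped to Q and TGA left as the only stop, as described for S. strix; the function names and the example sequence are hypothetical and chosen purely for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(reassign=None):
    """Build a codon -> amino acid table, optionally overriding some codons."""
    codons = (a + b + c for a in BASES for b in BASES for c in BASES)
    table = dict(zip(codons, STANDARD_AAS))
    table.update(reassign or {})
    return table


STANDARD = build_code()
# Variant code assumed from the text: TAA and TAG encode glutamine.
TAR_GLN = build_code({"TAA": "Q", "TAG": "Q"})


def translate(dna, table):
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    seq = "ATGTAACAGTAGTGA"  # made-up sequence, not from S. strix
    print(translate(seq, STANDARD))  # M
    print(translate(seq, TAR_GLN))   # MQQQ
```

Read with the standard code, the made-up sequence terminates at the first TAA; read with the variant table, translation continues through both TAR codons and stops only at TGA.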

Figure 4. The distribution of non-canonical genetic codes in eukaryotes. γ-Corrected ML tree (−ln L = 25353.473) inferred from an alignment containing 60 SSU rDNA sequences and 1140 sites showing the position of Streblomastix among the oxymonads and other eukaryotic lineages with non-canonical genetic codes (all boxed). Numbers at relevant nodes correspond to 100 bootstrap replicates using γ-corrected weighted neighbor-joining (above) and Fitch-Margoliash (below). Major eukaryotic lineages are labeled to the right. The transition/transversion ratio was 1.69; the scale bar represents 0.1 substitution (corrected) per site.

While the fidelity of the code is generally thought of at the level of tRNA binding to mRNA, it is equally important that the tRNA be charged with the correct amino acid. The recognition of the appropriate tRNA by its cognate aminoacyl-tRNA synthetase takes place through the so-called identity elements of the tRNA.43 In most cases (including glutamine), the anticodon makes up an important part of the identity set, so one necessary step in the origin of a non-canonical genetic code is the relaxation of the tRNA-specificity of the tRNA synthetase. In most cases, there is a single tRNA synthetase specific for each amino acid, but there are some exceptions, and glutamine is one of note. Glutaminyl-tRNAs are charged by a glutaminyl-tRNA synthetase in all known nucleo-cytoplasmic systems of eukaryotes, but such synthetases are otherwise found only in certain eubacteria, where they are considered to have originated by lateral gene transfer.44 All other eubacteria, eubacterial-derived organelles, and archaebacteria lack a specific glutaminyl-tRNA synthetase, and instead charge glutaminyl-tRNAs with glutamic acid using the glutamyl-tRNA synthetase.43–45 The glutamate residue is subsequently transamidated by a specific Glu-tRNAGln amidotransferase,46 yielding glutaminyl-tRNA. It is conceivable that the specific recognition of tRNAGln by not one, but two enzymes in these organisms (the tRNA synthetase and the amidotransferase) could limit the probability of charging mutant tRNAs in these eubacteria, organelles, and archaebacteria.

Taking together the translation termination apparatus, the presence of natural suppressor tRNAs and a discrete glutaminyl-tRNA synthetase, the eukaryotic nucleo-cytosolic environment could be a unique intersection of several factors that favour this particular non-canonical genetic code.

Lastly, however, the natural variation in the frequencies of different kinds of mutations should be considered, as these too could play an important role in the evolution of non-canonical genetic codes. A single transition is by far the most common mutation, and six different codons can be converted into TAA or TAG by single transitions. A transition mutation at the first position (C-to-T) converts glutamine codons CAA and CAG to TAA and TAG. At the second position, a single G-to-A transition converts the lone tryptophan codon (TGG) to TAG, while the TGA-to-TAA mutation is a synonymous opal-to-ochre change. At the third position, TAA and TAG mutate to one another, and are therefore also synonymous. Accordingly, if TAG and TAA are to be involved in a codon reassignment, it is most likely to involve either glutamine or tryptophan. A change to tryptophan is considerably less likely than to glutamine, since tryptophan is the rarest amino acid: for a codon reassignment to be fixed in the genome, it is necessary for canonical codons to mutate to the new codon, and there are relatively few tryptophan codons in a genome available to do this. Overall, the impact of mutation frequencies might be slight, but the potentially frequent mutation between TAR and CAR, together with the presence of an additional synonymous codon (opal, TGA), makes the TAR pair especially flexible, and perhaps prone to reassignment when other factors are also favourable.
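The set of codons one transition away from TAA or TAG can be enumerated directly. The following sketch assumes the standard code and treats A↔G and C↔T as the only transitions; all names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), STANDARD_AAS))

# Transitions exchange purine with purine and pyrimidine with pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_neighbours(codon):
    """Return the three codons that differ from `codon` by one transition."""
    return [codon[:i] + TRANSITION[codon[i]] + codon[i + 1:] for i in range(3)]


def sources_of(targets):
    """Map each codon that can reach a target by one transition to its meaning."""
    return {src: STANDARD[src] for t in targets for src in transition_neighbours(t)}


if __name__ == "__main__":
    for codon, meaning in sorted(sources_of(["TAA", "TAG"]).items()):
        print(codon, meaning)
```

Run on TAA and TAG, this yields six source codons: two glutamine codons (CAA, CAG), one tryptophan codon (TGG), and the three standard stop codons themselves.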

The exact course of events that leads to the evolution of a non-canonical genetic code has been modeled in several different ways, and many attempts have been made to take into account the nature of the genomes involved, the tRNAs that result, and the effects on translation termination factors to come up with a unified explanation for how the genetic code evolves.3–6,10,18–20,39,47,48 However, the evolution of the genetic code might better be regarded as a balance between many factors: in this case these may include the nature of the translation termination equipment, characteristics of tRNAs, the system used to charge glutaminyl-tRNAs, and natural biases in mutation frequencies. Indeed, it might be more informative to treat each novel genetic code individually, since each new code has evolved under a unique set of circumstances, potentially for very different reasons and by very different means. There is no reason to believe that a single explanation or a single model for the evolution of the genetic code exists, but rather a set of outcomes whose frequencies reveal important aspects of the nature of the translational machinery, mutation, and the flexibility of the code.

Materials and Methods

Collection, identification, and isolation of Streblomastix strix

The termite Z. angusticollis was collected from damp and rotting wood at Jericho Beach, Vancouver, BC, Canada. Termites were maintained in the laboratory in plastic containers with damp wood from their original collection site. Termite hindgut contents were diluted in Trager’s medium U (ref. 49) and inspected by differential interference contrast (DIC) light microscopy for the presence of Streblomastix, which was identified in all termites examined. As the hindgut of Z. angusticollis contains several other flagellates (but only one oxymonad), the material was enriched with Streblomastix by a simple density fractionation. Hindgut contents from ten termites were drawn into a 10 cm glass Pasteur pipette and allowed to settle for five minutes. The contents of the pipette were expelled slowly into three equal fractions, and the process repeated three times on the last fraction from each round. Microscopic examination revealed that this material was predominantly Streblomastix, while the remainder was largely the parabasalian flagellate Trichomitopsis, and a very small number of large trichonymphid parabasalia. Fractionated hindgut material was harvested by centrifugation for the extraction of DNA and RNA. DNA was extracted by resuspending pelleted material in 100 µl of CTAB extraction buffer (1.12 g of Tris, 8.18 g of NaCl, 0.74 g of EDTA, 2 g of cetyltrimethylammonium bromide (CTAB), 2 g of polyvinylpyrrolidone, 0.2 ml of 2-mercaptoethanol in 100 ml of water) in a Kontes Duall 20 tissue-homogeniser and incubating at 65 °C with periodic grinding. The lysate was extracted twice with chloroform/isoamyl alcohol (24:1, v/v), and the aqueous phase precipitated in 95% (v/v) ethanol.

Individual cells of Streblomastix were also manually isolated from gut contents diluted in Trager’s medium, using a Pasteur pipette stretched to a diameter of approximately 75 µm. Isolated cells were re-diluted in clean medium and re-isolated a total of three times until no contaminating eukaryotes could be observed. DNA was prepared from isolated cells by extracting homogenised cells with chloroform/isoamyl alcohol (24:1, v/v) and precipitation in ethanol. PCR-based approaches using exact-match primers (described later) were used to confirm that gene sequences were derived from the genome of Streblomastix.

Amplification and sequencing

The SSU rRNA genes from Streblomastix were amplified from DNA extracted from the fractionated hindgut material using the primers TGATCCTTCTGCAGGTTCACCTAC and CTGGTTGATCCTGCCAGT. Protein-coding genes from Streblomastix could not be amplified using a simple protocol, and were amplified using a two-step nested PCR procedure. In each case a reaction was carried out using a pair of outer primers on the DNA extracted from the fractionated hindgut material, and the product of that reaction used as the template for a secondary reaction using a pair of inner primers. Primers used were: for α-tubulin GGGCCCCAGGTCGGCAAYGCNTGYTGG and GGGCCCCGAGAACTCSCCYTCYTCCAT (outer primers) and CGCGGCCTCARGTNGGNAAYGCNTGYTGGGA and CGCGCCATNCCYTCNCCNACRTACCA (inner primers); for β-tubulin GCCTGCAGGNCARTGYGGNAAYCA and TCCTCGAGTRAAYTCCATYTCRTCCAT (outer primers) and CAGATCGGCGCGAARTTYTGGGARAT and CTCGTCCATGCCYTCNCCNGTRTACCA (inner primers); for EF-1α AACATCGTCGTGATHGGNCAYGTNGA and CTTGATCACNCCNACNGCNACNGT (outer primers) and CAACATCGTCGTCATCGGNCAYGTNGA and GCCGCGCACGTTGAANCCNACRTTRTC (inner primers); and for HSP90 GGAGCCTGATHATHAAYACNTTYTA and CGCCTTCATDATNCKYTCCATRTTNGC (outer primers) and ACGTTYTAYWSNAAYAARGARAT and CGCCTTCATDATNCKYTCCATRTTNGC (inner primers). HSP90 sequences were also obtained using the inner primers ACGTTYTAYWSNAAYAARGARAT and GATGACYTTNARDATYTTRTTYTGYTG. The HSP90 gene from Hexamita inflata was also amplified using GGAGCCTGATHATHAAYACNTTYTA and CGCCTTCATDATNCKYTCCATRTTNGC so that a representative diplomonad could be included in the phylogeny of HSP90. For each gene, several independent clones were sequenced completely.
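The primers above use IUPAC ambiguity codes (for example R = A/G, Y = C/T, N = any base), so each one stands for a pool of exact sequences. As a minimal sketch, the size of that pool can be computed as follows; the function names are illustrative and the calculation is not part of the published protocol.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct exact sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """Yield every exact sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ("".join(combo) for combo in product(*choices))


if __name__ == "__main__":
    for name, seq in (
        ("beta-tubulin outer", "GCCTGCAGGNCARTGYGGNAAYCA"),
        ("EF-1 alpha outer", "CTTGATCACNCCNACNGCNACNGT"),
        ("HSP90 outer", "CGCCTTCATDATNCKYTCCATRTTNGC"),
    ):
        print(name, degeneracy(seq))
```

`expand` is only practical for short or weakly degenerate primers, since the pool grows multiplicatively with every ambiguous position.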

PCR-based experiments using exact-match primers on manually isolated cells of Streblomastix were used to help confirm that the clones were not derived from a different eukaryotic genome. Exact-match primers (available on request) were designed to amplify small fragments (400–500 bases) of the SSU rRNA, α-tubulin, β-tubulin and HSP90 genes. Amplified products were cloned, sequenced in one direction, and compared to the original sequences.

Microscopy and in situ hybridisation

Cells observed with DIC light microscopy were suspended in Trager’s medium, fixed in 1% (w/v) glutaraldehyde, and secured under a cover-slip with VALAP (vaseline/lanolin/paraffin; 1:1:1 by wt; ref. 50). Images were produced with a Zeiss Axioplan 2 Imaging microscope connected to a Q-Imaging, Microimager II, black and white digital camera.

Cells for scanning electron microscopy were prepared with an osmium vapor fixation protocol.51 A small volume (10 µl) of cells suspended in Trager’s medium was transferred into a Petri dish that contained a piece of filter-paper mounted on the inner surface of the lid. The filter-paper was saturated with 4% (w/v) OsO4 and the lid placed over the dish exposing the cells to OsO4 vapors. After 30 minutes of exposure, the cells were fixed for an additional 30 minutes with six drops of 4% OsO4 added directly to the medium. Cells were transferred onto a 3 µm polycarbonate membrane filter (Corning Separations Div., Acton, MA), dehydrated with a graded series of ethyl alcohol, and critical point dried with CO2. Filters were mounted on stubs, sputter-coated with gold, and viewed under a Hitachi S4700 scanning electron microscope (SEM). The SEM image was presented on a black background using Adobe Photoshop 6.0 (Adobe Systems, San Jose, CA).

In situ hybridisation using a rhodamine-labeled oligonucleotide probe was conducted to help demonstrate that the PCR-amplified SSU rDNA sequence was derived from the genome of Streblomastix. The probe consisted of 24 bases taken from an insertion unique to the putative Streblomastix rRNA gene: Rhodamine-CTATTGGTCATCAGCTGCAGTGCG (Tm in 1 M Na+ = 76 °C). Gut contents of Z. angusticollis were suspended in Trager’s medium, pelleted with gentle microcentrifugation (8 rpm), and pre-fixed in 4% paraformaldehyde in 5× Tris buffer (2.5 M NaCl, 0.1 M Tris, pH 7.5) for 30 minutes. Following dehydration in a graded series of ethyl alcohol, cells were gradually rehydrated using 3:1, 1:1, 1:3, and 0:1 ratios of ethyl alcohol and TBSt buffer (Tris buffer, 0.1% (v/v) Tween 20). Cells were partially digested with proteinase K for 15 minutes and washed three times with TBSt (ten minutes each). Cells were post-fixed with 4% paraformaldehyde in TBSt buffer.

A pre-hybridisation step involved washing the cells three times (ten minutes each) in a hybridisation mix (5× SSC, 0.2% Tween 20, 5 mM EDTA). Cells were heated in the hybridisation mix at 70 °C for 45 seconds before the fluorescent-labeled probe was added. Once the probe was added (3 µl of probe to cells suspended in 120 µl of the hybridisation mix), cells were immediately exposed to a denaturing step for five minutes at 95 °C followed by an annealing step for five minutes at 60 °C. Unbound probe was washed from the cells five times (ten minutes each) with the hybridisation mix. Rhodamine-labeled cells were viewed under a Zeiss Axioplan 2 Imaging microscope using a 546 nm excitation wavelength (emission wavelength 590 nm). Cells were also prepared as described, but without exposure to the rhodamine-labeled probe, to examine autofluorescence (i.e. false positives). Negative controls for non-specific binding were available by examining the degree of fluorescence in parabasalids such as Trichomitopsis and Trichonympha in the same preparations.

Characterisation of mRNAs by 3′ RACE

RNA was isolated from fractionated Streblomastix (see above) by resuspending harvested material in 1 ml of Trizol (Gibco-BRL) and transferring it to a Kontes Duall 20 tissue homogeniser. Material was ground for five minutes and incubated for five minutes at room temperature without grinding. Lysate was extracted with 200 µl of chloroform/isoamyl alcohol (24:1, v/v), and the aqueous phase precipitated with 500 µl of isopropanol. RT-PCR was carried out using 600 ng of total RNA as a template with the M-MLV reverse transcriptase and poly(T) adapter primer from the RLM-RACE kit (Ambion). 3′ RACE was performed on the first-strand DNA using the common anchor primer GCGAGCACAGAATTAATACGACT together with gene-specific primers: CTTTGTTAGAGCACACTGATGTTGC for α-tubulin, GCTTCAAGACACTCAAGCTGTCGAACC for β-tubulin, ATGGATATACACCAGTGCTTGATTG or TGTGGAGATTCTAAATAAGATCCAC for EF-1α, and CAGTTCAAACATCTCCATTCTTAGA or TCGTGTTCTTATTATGGAGAATTGCG for HSP90. Reactions were carried out according to the RLM-RACE protocol using AmpliTaq Gold polymerase (Applied Biosystems). In all cases, a product was recovered after the first round of amplification, cloned, and sequenced as above.

Phylogenetic and codon analysis

The synonymous nature of first-position mutations at glutamine codons was examined by two simple ratios. For each gene, a representative (consensus) amino acid sequence was taken, and the number of potential synonymous and non-synonymous substitutions was calculated according to the following formulae. Potential non-synonymous substitutions are the frequency of each amino acid multiplied by the number of positions within its codons that yield a non-synonymous change when mutated (these factors are: A,G,L,P,R,T,V = 2; C,D,E,F,H,I,K,M,N,S,W,Y = 3). Potential synonymous substitutions are the frequency of each amino acid multiplied by the number of positions within its codons that yield a synonymous change when mutated (these factors are: M,W = 0; A,C,D,E,F,G,H,I,K,N,P,T,V,Y = 1; L,R = 2; S = 3). The numbers of observed synonymous and non-synonymous substitutions were counted using the variation observed between different copies of each gene sequenced from DNA or cDNA. This is an oversimplification, since it does not take into account multiple substitutions, and the number is an arbitrary result of the number of alleles and loci characterised (e.g. if more copies of α-tubulin were characterised, the number of potential substitutions would remain the same, but the number observed would have to increase). This has little effect on the conclusion, however, since the actual value of the ratio is not important, but rather the relationship of the value for glutamine codons compared with that of all other codons.
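The counting scheme for potential substitutions can be sketched in a few lines. The per-residue factors are taken directly from the text; the function names are illustrative, and skipping residues that have no listed factor (glutamine is not listed in either set) is an assumption of this sketch, made because glutamine codons are assessed separately.

```python
from collections import Counter

# Positions per codon yielding a non-synonymous change (factors from the text).
NONSYN_SITES = {**dict.fromkeys("AGLPRTV", 2),
                **dict.fromkeys("CDEFHIKMNSWY", 3)}
# Positions per codon yielding a synonymous change (factors from the text).
SYN_SITES = {**dict.fromkeys("MW", 0),
             **dict.fromkeys("ACDEFGHIKNPTVY", 1),
             **dict.fromkeys("LR", 2),
             "S": 3}


def potential_substitutions(protein):
    """Return (potential synonymous, potential non-synonymous) totals.

    Residues without a listed factor (e.g. Q) are skipped; this is an
    assumption of the sketch, not a statement of the published method.
    """
    counts = Counter(protein.upper())
    syn = sum(n * SYN_SITES[aa] for aa, n in counts.items() if aa in SYN_SITES)
    nonsyn = sum(n * NONSYN_SITES[aa] for aa, n in counts.items()
                 if aa in NONSYN_SITES)
    return syn, nonsyn


def observed_to_potential(observed, potential):
    """Ratio of observed to potential substitutions (None if potential is 0)."""
    return observed / potential if potential else None


if __name__ == "__main__":
    print(potential_substitutions("MLSQ"))  # (5, 8)
```

The resulting ratios are only meaningful relative to one another, which is how the text uses them: the value for glutamine codons is compared with the value for all other codons.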

To infer the phylogenetic position of Streblomastix genes, representative sequences were aligned with homologues from public databases and phylogenetic trees inferred using distance and maximum likelihood methods. For protein data, distances were inferred using TREE-PUZZLE 5.0 (ref. 52) with the WAG substitution matrix, and site-to-site rate variation modeled on a γ-distribution with eight rate categories, and the α parameter and fraction of invariable sites estimated from the data. The estimated shape parameter (α) and proportion of invariable sites (i) were: for α-tubulin, α = 0.40 and i = 0; for β-tubulin, α = 0.43 and i = 0; for EF-1α, α = 0.53 and i = 0; and for HSP90, α = 0.70 and i = 0. Trees were constructed using weighted neighbor-joining with WEIGHBOR 1.0.1a (ref. 53), and Fitch-Margoliash using FITCH 3.6a.† Bootstraps were carried out by the same methods using PUZZLEBOOT (shell script available from www.tree-puzzle.de). Protein maximum likelihood trees were inferred using ProML 3.6a with global rearrangements and ten random addition replicates. Site-to-site rate variation was modeled using the -R option, with invariable sites and six categories of variable sites estimated by TREE-PUZZLE. Protein ML bootstrapping was carried out under the same parameters except that four categories of variable sites were considered. SSU rRNA phylogenies were inferred by maximum likelihood using PAUP 4.0b10‡ with the HKY substitution frequency matrix using a heuristic search starting with a neighbor-joining tree and site-to-site rate variation modeled as above. The α-parameter and proportion of invariable sites were estimated by TREE-PUZZLE (α = 0.43 and i = 0). Bootstraps were calculated in the same way. Distance trees and distance bootstraps were inferred using the same parameters.

Data Bank accession numbers

All new sequences have been deposited in GenBank under accession numbers AY138769–AY138805.

Acknowledgements

This work was supported by a grant (227301-00) from the Natural Sciences and Engineering Research Council of Canada (NSERC). P.J.K. is a scholar of the Canadian Institute for Advanced Research, the Michael Smith Foundation for Health Research, and the Canadian Institutes for Health Research. B.S.L. is supported by a postdoctoral fellowship from the National Science Foundation (USA). We thank Alastair Simpson, Jeff Silberman, and Andrew Roger for access to unpublished sequences from Trimastix, and J. M. Archibald, J. T. Haper, and an anonymous reviewer for critical reading of the manuscript.

References

1. Osawa, S. & Jukes, T. H. (1989). Codon reassignment (codon capture) in evolution. J. Mol. Evol. 28, 271–278.

2. Schultz, D. W. & Yarus, M. (1994). Transfer RNA mutation and the malleability of the genetic code. J. Mol. Biol. 235, 1377–1380.

3. Osawa, S., Jukes, T. H., Watanabe, K. & Muto, A. (1992). Recent evidence for evolution of the genetic code. Microbiol. Rev. 56, 229–264.

4. Schultz, D. W. & Yarus, M. (1996). On malleability in the genetic code. J. Mol. Evol. 42, 597–601.

5. Knight, R. D., Landweber, L. F. & Yarus, M. (2001). How mitochondria redefine the code. J. Mol. Evol. 53, 299–313.

6. Kurland, C. G. (1992). Evolution of mitochondrial genomes and the genetic code. BioEssays, 14, 709–714.

7. Yamao, F., Muto, A., Kawauchi, Y., Iwami, M., Iwagami, S., Azumi, Y. & Osawa, S. (1985). UGA is read as tryptophan in Mycoplasma capricolum. Proc. Natl Acad. Sci. USA, 82, 2306–2309.

† http://evolution.genetics.washington.edu/phylip/doc/fitch.html
‡ http://paup.csit.fsu.edu/problems.html

8. Kawaguchi, Y., Honda, H., Taniguchi-Morimura, J. & Iwasaki, S. (1989). The codon CUG is read as serine in an asporogenic yeast Candida cylindracea. Nature, 341, 164–166.

9. Meyer, F., Schmidt, H. J., Plumper, E., Hasilik, A., Mersmann, G., Meyer, H. E. et al. (1991). UGA is translated as cysteine in pheromone 3 of Euplotes octocarinatus. Proc. Natl Acad. Sci. USA, 88, 3758–3761.

10. Lozupone, C. A., Knight, R. D. & Landweber, L. F. (2001). The molecular basis of nuclear genetic code change in ciliates. Curr. Biol. 11, 65–74.

11. Caron, F. & Meyer, E. (1985). Does Paramecium primaurelia use a different genetic code in its macronucleus? Nature, 314, 185–188.

12. Helftenbein, E. (1985). Nucleotide sequence of a macronuclear DNA molecule coding for alpha-tubulin from the ciliate Stylonychia lemnae. Special codon usage: TAA is not a translation termination codon. Nucl. Acids Res. 13, 415–433.

13. Horowitz, S. & Gorovsky, M. A. (1985). An unusual genetic code in nuclear genes of Tetrahymena. Proc. Natl Acad. Sci. USA, 82, 2452–2455.

14. Baroin-Tourancheau, A. B., Tsao, N., Klobutcher, L. A., Pearlman, R. E. & Adoutte, A. (1995). Genetic code deviations in the ciliates: evidence for multiple and independent events. EMBO J. 14, 3262–3267.

15. Schneider, S. U., Leible, M. B. & Yang, X.-P. (1989). Strong homology between the small subunit of ribulose-1,5-bisphosphate carboxylase-oxygenase of two species of Acetabularia and the occurrence of unusual codon usage. Mol. Gen. Genet. 218, 445–452.

16. Keeling, P. J. & Doolittle, W. F. (1996). A non-canonical genetic code in an early diverging eukaryotic lineage. EMBO J. 15, 2285–2290.

17. Keeling, P. J. & Doolittle, W. F. (1997). Widespread and ancient distribution of a non-canonical genetic code in diplomonads. Mol. Biol. Evol. 14, 895–901.

18. O’Sullivan, J. M., Davenport, J. B. & Tuite, M. F. (2001). Codon reassignment and the evolving genetic code: problems and pitfalls in post-genome analysis. Trends Genet. 17, 20–22.

19. Ribas de Pouplana, L. & Schimmel, P. (2001). Aminoacyl-tRNA synthetases: potential markers of genetic code development. Trends Biochem. Sci. 26, 591–596.

20. Syvanen, M. (2002). Recent emergence of the modern genetic code: a proposal. Trends Genet. 18, 245–248.

21. Brugerolle, G. & Lee, J. J. (2000). Order Oxymonadida. In The Illustrated Guide to the Protozoa (Lee, J. J., Leedale, G. F. & Bradbury, P., eds), 2nd edit., pp. 1186–1195, Society of Protozoologists, Lawrence, KS.

22. Dacks, J. B., Silberman, J. D., Simpson, A. G., Moriya, S., Kudo, T., Ohkuma, M. & Redfield, R. J. (2001). Oxymonads are closely related to the excavate taxon Trimastix. Mol. Biol. Evol. 18, 1034–1044.

23. Moriya, S., Tanaka, K., Ohkuma, M., Sugano, S. & Kudo, T. (2001). Diversification of the microtubule system in the early stage of eukaryote evolution: elongation factor 1 alpha and alpha-tubulin protein phylogeny of termite symbiotic oxymonad and hypermastigote protists. J. Mol. Evol. 52, 6–16.

24. Moriya, S., Ohkuma, M. & Kudo, T. (1998). Phylogenetic position of symbiotic protist Dinenympha exilis in the hindgut of the termite Reticulitermes speratus inferred from the protein phylogeny of elongation factor 1 alpha. Gene, 210, 221–227.

25. Hollande, A. & Carruette-Valentin, J. (1970). La lignee des pyrsonymphines et les caracteres infrastructuraux communs aux genres Opisthomitus, Oxymonas, Saccinobaculus, Pyrsonympha, et Streblomastix. Compt. Rend. Acad. Sci. ser. D, 270, 1587–1590.

26. Kidder, G. W. (1929). Streblomastix strix, morphology and mitosis. Univ. Calif. Publ. Zool. 33, 109–124.

27. Kofoid, C. A. & Swezy, O. (1919). Studies on the parasites of the termites. I. On Streblomastix strix, a polymastigote flagellate with a linear plasmodial phase. Univ. Calif. Publ. Zool. 20, 21–40.

28. Keeling, P. J. (2002). Molecular phylogenetic position of Trichomitopsis termopsidis (Parabasalia) and evidence for the Trichomitopsiinae. Eur. J. Protistol. 38, 279–286.

29. Brugerolle, G. (1991). Flagellar and cytoskeletal systems in amitochondrial flagellates Archamoebae, Metamonada and Parabasalia. Protoplasma, 164, 70–90.

30. Brugerolle, G. & Taylor, F. J. R. (1977). Taxonomy, cytology and evolution of the Mastigophora. In Proceedings of the Fifth International Congress of Protozoology (Hutner, S. H., ed.), pp. 14–28, Pace University, New York.

31. Cavalier-Smith, T. (1998). A revised six-kingdom system of life. Biol. Rev. 73, 203–266.

32. Cohen, J. & Adoutte, A. (1995). Why does the genetic code deviate so easily in ciliates? Biol. Cell. 85, 105–108.

33. Caskey, C. T., Tompkins, R., Scolnick, E., Caryk, T. & Nirenberg, M. (1968). Sequential translation of trinucleotide codons for the initiation and termination of protein synthesis. Science, 162, 135–138.

34. Scolnick, E., Tompkins, R., Caskey, T. & Nirenberg, M. (1968). Release factors differing in specificity for terminator codons. Proc. Natl Acad. Sci. USA, 61, 768–774.

35. Craigen, W. J., Lee, C. C. & Caskey, C. T. (1990). Recent advances in peptide chain termination. Mol. Microbiol. 4, 861–865.

36. Dontsova, M., Frolova, L., Vassilieva, J., Piendl, W., Kisselev, L. & Garber, M. (2000). Translation termination factor aRF1 from the archaeon Methanococcus jannaschii is active with eukaryotic ribosomes. FEBS Letters, 472, 213–216.

37. Frolova, L., Goff, X. L., Rasmussen, H. H., Cheperegin, S., Drugeon, G., Kress, M. et al. (1994). A highly conserved eukaryotic protein family possessing properties of polypeptide chain release factor. Nature, 372, 701–703.

38. Inagaki, Y. & Doolittle, W. F. (2000). Evolution of the eukaryotic translation termination system: origins of release factors. Mol. Biol. Evol. 17, 882–889.

39. Inagaki, Y. & Doolittle, W. F. (2001). Class I release factors in ciliates with variant genetic codes. Nucl. Acids Res. 29, 921–927.

40. Hayashi-Ishimaru, Y., Ohama, T., Kawatsu, Y., Nakamura, K. & Osawa, S. (1996). UAG is a sense codon in several chlorophycean mitochondria. Curr. Genet. 30, 29–33.

41. Beier, H. & Grimm, M. (2001). Misreading of termination codons in eukaryotes by natural nonsense suppressor tRNAs. Nucl. Acids Res. 29, 4767–4782.

42. Grimm, M., Nass, A., Schull, C. & Beier, H. (1998). Nucleotide sequences and functional characterization of two tobacco UAG suppressor tRNA(Gln) isoacceptors and their genes. Plant Mol. Biol. 38, 689–697.

43. Ibba, M. & Soll, D. (2000). Aminoacyl-tRNA synthesis. Annu. Rev. Biochem. 69, 617–650.

44. Brown, J. R. & Doolittle, W. F. (1999). Gene descent, duplication, and horizontal transfer in the evolution of glutamyl- and glutaminyl-tRNA synthetases. J. Mol. Evol. 49, 485–495.

45. Freist, W., Gauss, D. H., Ibba, M. & Soll, D. (1997). Glutaminyl-tRNA synthetase. Biol. Chem. 378, 1103–1117.

46. Curnow, A. W., Hong, K., Yuan, R., Kim, S., Martins, O., Winkler, W. et al. (1997). Glu-tRNAGln amidotransferase: a novel heterotrimeric enzyme required for correct decoding of glutamine codons during translation. Proc. Natl Acad. Sci. USA, 94, 11819–11826.

47. Lehman, N. (2001). Molecular evolution: please release me, genetic code. Curr. Biol. 11, R63–R66.

48. Moreira, D., Kervestin, S., Jean-Jean, O. & Philippe, H. (2002). Evolution of eukaryotic translation elongation and termination factors: variations of evolutionary rate and genetic code deviations. Mol. Biol. Evol. 19, 189–200.

49. Trager, W. (1934). The cultivation of a cellulose-digesting flagellate, Trichomonas termopsidis, and of certain other termite protozoa. Biol. Bull. 66, 182–190.

50. Kuznetsov, S. A., Langford, G. M. & Weiss, D. G. (1992). Actin-dependent organelle movement in squid axoplasm. Nature, 356, 722–725.

51. Leander, B. S., Witek, R. P. & Farmer, M. A. (2001). Trends in the evolution of the euglenid pellicle. Evolution, 55, 2215–2235.

52. Strimmer, K. & von Haeseler, A. (1996). Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, 964–969.

53. Bruno, W. J., Socci, N. D. & Halpern, A. L. (2000). Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17, 189–197.

Edited by J. Karn

(Received 24 October 2002; received in revised form 2 January 2003; accepted 3 January 2003)
