Nucleotide sequence of the genome of an Australian isolate of turnip yellow mosaic tymovirus


VIROLOGY 172, 536-546 (1989)

Nucleotide Sequence of the Genome of an Australian Isolate of Turnip Yellow Mosaic Tymovirus

PAUL KEESE,¹ ANNE MACKENZIE, AND ADRIAN GIBBS²

Research School of Biological Sciences, Australian National University, Canberra, A.C.T. 2601, Australia

Received May 27, 1988; accepted May 22, 1989

The nucleotide sequence of the Club Lake isolate of turnip yellow mosaic virus (TYMV-CL) genomic RNA has been determined. The genome is 6319 nucleotide residues in length and has three major open reading frames (ORFs), two of which overlap. The smallest ORF is proximal to the 3’ terminus and is the virion protein gene, which has 98% sequence similarity with the virion protein gene reported for the type strain of TYMV. The largest ORF extends from nucleotide residues 96 to 5630 and encodes a protein, parts of which show sequence similarities to the possible RNA replicases and nucleotide binding proteins of other viruses. The third ORF extends from nucleotide residues 89 to 1975 and overlaps the 5’ end of the largest ORF in a manner similar to that found in several animal viral genomes. The function of the protein encoded by this ORF is unknown. The genomes of tymoviruses characteristically have an unusually large cytosine content and small guanosine content. This compositional bias is mirrored in the codon and dinucleotide frequencies of the TYMV-CL genome, but is only partially reflected in the amino acid sequences encoded by the genome. © 1989 Academic Press, Inc.

INTRODUCTION

Turnip yellow mosaic virus (TYMV) is the type member of the tymovirus group of plant viruses. All members have small, icosahedral particles that contain a positive-sense single-stranded RNA genome of about 6000 nucleotide residues. The tymoviruses are distinguished from viruses of other groups by the characteristic vesicles they induce at the periphery of the chloroplasts of infected plants (Hatta and Matthews, 1974) and by having a large genomic cytosine content of 34-42% (Symons et al., 1963; Gibbs et al., 1966).

TYMV causes a mosaic disease in domesticated and wild species of the Brassicaceae in Europe, and has been found in the wild in Australia. The Australian isolates of TYMV have only been found in a sward-forming endemic plant, Cardamine lilacina Hooker, that is restricted to glacial cirques of the Kosciusko alpine region of Australia (Guy and Gibbs, 1981). The particle composition and antigenic specificity of all Australian isolates of TYMV are closest to those of the group 1 strains of TYMV, which include the type strain (Paul et al., 1980).

The genomic RNA of the TYMV type strain has a 5’ m7G cap structure (Klein et al., 1976; Pleij et al., 1976) and a 3’ valine-accepting tRNA-like structure (Giegé et al., 1978), and yields two major translation products in rabbit reticulocyte lysates (Mellema et al., 1979). The larger, Mr 195,000, product is produced in vitro by translation of an open reading frame that uses over 90% of the genome. The smaller product, Mr 150,000, is the same as the N-terminal portion of the larger protein, and is thought to arise from a functionally significant but unidentified leaky translation termination signal. Only the 195K protein undergoes post-translational cleavage (Morch and Benicourt, 1980), and one of the products is associated with the viral replicase activity found in infected Chinese cabbage (Mouches et al., 1984).

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. J04373.

¹ Present address: CSIRO Division of Plant Industry, Black Mountain, Canberra, A.C.T. 2601, Australia.
² To whom reprint requests should be addressed.

The virion protein gene at the 3’ end of the genomic RNA is expressed from a 5’ m7G-capped subgenomic RNA, not directly from the genome. Its nucleotide sequence encodes a 189 amino acid residue protein (Guilley and Briand, 1978). The sequence of the 5’-proximal 110 nucleotide residues of the TYMV type strain has also been reported (Briand et al., 1978).

We report in this paper the complete genomic sequence of TYMV-CL, an Australian isolate of TYMV from the vicinity of Club Lake in the Kosciusko alpine area. Since this paper was prepared, the genomic sequence of a European isolate of TYMV has been reported (Morch et al., 1988). A report comparing this sequence with those of TYMV-CL and another Australian isolate of TYMV is in preparation (A. D. Meek, personal communication).

MATERIALS AND METHODS

The enzymes that were used included avian myeloblastosis virus reverse transcriptase (Life Sciences), RNase H and calf intestinal phosphatase (Boehringer-Mannheim), T4 polynucleotide kinase (New England Biolabs), RNA ligase (Pharmacia), bacteriophage T4 DNA ligase, Escherichia coli DNA polymerase I and Klenow fragment (Bresatec), tobacco acid pyrophosphatase (Promega Biotec), ribonucleases A and T1 (Sigma Chemical Co.), RNase U2 (Sankyo), and various restriction endonucleases (New England Biolabs and Boehringer-Mannheim). Phy M RNase was prepared from culture supernatants of Physarum polycephalum and kindly provided by Dr. Jim Haseloff. The radioisotopes [α-32P]dATP (3000 Ci/mmol), [α-32P]dCTP (3000 Ci/mmol), and [γ-32P]ATP (>5000 Ci/mmol) were purchased from Amersham. The TYMV-CL-specific oligodeoxynucleotide primers were very kindly synthesized and provided by Dr. Jan Blok and Karin Harrison.

0042-6822/89 $3.00. Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

TYMV-CL was propagated in Chinese cabbage (Brassica chinensis var. pekinensis, Wong Nga Baak). Viral RNA was extracted from particles purified as described previously (Gibbs et al., 1966; Guy and Gibbs, 1985). Complementary DNA (cDNA) was synthesized by the method of Gubler and Hoffman (1983) using a synthetic primer, 5’-labeled using T4 polynucleotide kinase and ATP. The primer was complementary in sequence to the 3’ end of the RNA (5’-dTGGTTCCGATGACCCTC-3’), and partially hydrolyzed salmon sperm DNA (Taylor et al., 1976) was used as a random primer for first-strand cDNA synthesis with reverse transcriptase. The double-stranded DNA was hydrolyzed with the restriction endonucleases AluI, HaeIII, and RsaI. The AluI fragments were ligated into the SmaI site of pGEM1 and transformed into competent cells of E. coli JM101. The recombinant pGEM1 clones were transferred into M13mp18 and M13mp19 and sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1980). The HaeIII and RsaI restriction endonuclease fragments were fractionated and purified by electrophoresis in a 5% polyacrylamide gel, then cloned into the SmaI site of M13mp18. Some M13mp18 recombinant clones were obtained in the opposite orientation by subcloning them into the EcoRI-HindIII site of M13mp19.
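Because the 3’ primer is stated to be complementary to the 3’ end of the RNA, the RNA sequence it anneals to can be read off by complementing and reversing it. The sketch below (the helper name `rna_target_of_primer` is hypothetical) shows that step; under the paper's statement, the result is the 3’-terminal 17 nucleotides of the genomic RNA, which end in -CCA as tRNA-like 3’ termini typically do.

```python
# Map each DNA base in the primer to the RNA base it pairs with.
PAIR = str.maketrans("ACGT", "UGCA")


def rna_target_of_primer(primer_5to3):
    """Return the RNA sequence (written 5'->3') that a DNA primer anneals to.

    The primer is written 5'->3'. The RNA strand it binds is antiparallel,
    so every base is complemented and the result is reversed.
    """
    return primer_5to3.upper().translate(PAIR)[::-1]


print(rna_target_of_primer("TGGTTCCGATGACCCTC"))  # GAGGGUCAUCGGAACCA
```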

The sequences of the recombinant M13 clones were compiled in a computer and checked for sequence homology. Most of them unequivocally formed parts of longer concatenations. The sequences joining these longer sequences were determined using synthetic oligonucleotide primers complementary to terminal parts of those sequences. Primers with the sequences 5’-dCGCCAATGTTGCCTC-3’, 5’-dGGAGGTGTTGCTGGA-3’, 5’-dGTGAACGGAGCCTCG-3’, 5’-dCCCTCTTGGCATTTTG-3’, 5’-dTCCCTGCCAGGCATT-3’, and 5’-dGTGTCTTGCCGCAGT-3’ were annealed to genomic RNA, and cDNA was synthesized on them using reverse transcriptase. This double-stranded cDNA was hydrolyzed with Sau3AI, TaqI, MspI, HaeIII, AluI, or RsaI restriction endonucleases and ligated into appropriate M13 vectors. These six synthetic primers were also used for direct sequencing of genomic RNA by the dideoxynucleotide chain termination method (Ou et al., 1981).

The synthetic oligodeoxynucleotides 5’-dAAGTTTGAGTTCACCTGG-3’ and 5’-dGTAATCAACTACCAATTC-3’ were phosphorylated using T4 polynucleotide kinase and ATP, and used for synthesizing cDNA and its complement corresponding to the 5’-terminal 424 nucleotide residues of TYMV-CL genomic RNA. This double-stranded DNA was then ligated into the SmaI site of M13mp18 and used to infect competent JM101 cells. The DNA obtained was sequenced by the dideoxynucleotide chain termination method.

The 5’- and 3’-terminal 40 nucleotide residues of end-labeled TYMV-CL genomic RNA were sequenced by direct enzymatic RNA sequencing (Haseloff and Symons, 1981). The 5’ methylguanosine cap of the TYMV-CL genome was removed using tobacco acid pyrophosphatase; the RNA was then treated with calf alkaline phosphatase as directed by the suppliers, and finally its 5’ terminus was labeled using [γ-32P]ATP and polynucleotide kinase. The 3’ end was labeled with [32P]pCp using RNA ligase (England and Uhlenbeck, 1978).

The sequences were compiled using the Staden (1982) library of computer programs for "shotgun sequencing", and analyzed using the SEQ program package of the Research School of Biological Sciences on a VAX 11/750.

RESULTS AND DISCUSSION

TYMV-CL sequence

The sequence of TYMV-CL genomic RNA was determined by sequencing short overlapping cDNA clones in M13. Each element of the sequence was determined from two or more clones in each orientation, from a total databank of about 70,000 nucleotide residues. The 5’- and 3’-terminal nucleotide sequences were confirmed by direct enzymatic RNA sequencing. Figure 1 shows the sequence of the 6319 nucleotide residues that comprise the genome.
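The acceptance criterion above (every position read from two or more clones in each orientation) can be checked mechanically once clone placements are known. The sketch below is an illustration only: it assumes each clone has already been placed on the genome as a hypothetical `(start, end, orientation)` tuple, which is not how the Staden programs represent data, and the function name is made up.

```python
def undercovered_positions(genome_length, clones, min_per_strand=2):
    """Return 1-based positions covered by fewer than `min_per_strand`
    clones in either orientation.

    `clones` is an iterable of (start, end, orientation) tuples with
    1-based, inclusive coordinates and orientation "+" or "-".
    """
    # Index 0 is unused so that list indices match 1-based positions.
    depth = {"+": [0] * (genome_length + 1), "-": [0] * (genome_length + 1)}
    for start, end, orientation in clones:
        track = depth[orientation]
        for pos in range(start, end + 1):
            track[pos] += 1
    return [pos for pos in range(1, genome_length + 1)
            if depth["+"][pos] < min_per_strand
            or depth["-"][pos] < min_per_strand]


clones = [(1, 10, "+"), (1, 6, "+"), (1, 10, "-"), (1, 10, "-")]
print(undercovered_positions(10, clones))  # [7, 8, 9, 10]
```

Positions 7-10 are flagged because only one "+" clone covers them; adding a further "+" clone spanning that region would clear the list.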

Sequence heterogeneity was found at positions 886, 2219, 2275, and 3071, and at two of these, positions 886 and 2275, the changes give rise, somewhat surprisingly, to nonconservative amino acid differences. Where two cDNA clones differed in nucleotide sequence, additional cDNA clones of the same region were also examined. The nucleotide residue differences given at the above four positions are those for which both alternatives were found in two or more clones; hence they are probably viable population sequence variants of TYMV-CL. By contrast, the sequence differences that were identified in only one cDNA clone may have been random unselected viral mutants (Domingo et al., 1978), or may have resulted from errors in copying the RNA template by AMV reverse transcriptase.

FIG. 1. Nucleotide sequence of TYMV-CL genomic RNA together with the encoded amino acid sequences of the three major ORFs. Dots over the sequence occur every ten bases. RP is a postulated replicase polyprotein; OP is a protein encoded by a gene overlapping that of the RP; and VP is the virion protein. This nomenclature is used because equivalent proteins encoded by different tymoviral genomes are of slightly different Mr (Ding et al., 1989; Osorio-Keese et al., 1989).

FIG. 2. Diagram illustrating the positions of open reading frames (open rectangles) in the three codon phases of the viral (+) RNA strand and the complementary (-) strand of TYMV-CL genomic RNA; the upper band represents codons starting with the 5’-terminal nucleotide and subsequent triplets, the second band those in phase with the second nucleotide, and the third band those in phase with the third nucleotide. The left border of each open rectangle indicates where an AUG occurs, and its right border indicates the position of the first in-phase UGA, UAG, or UAA triplet. The scale units are kb.
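The rule used above to separate population variants from likely single-clone errors can be written as a small classifier. This is a sketch under stated assumptions: the clone sequences are taken to be already aligned to the same length (a simplification), the function name is hypothetical, and the toy reads are invented rather than TYMV-CL data.

```python
from collections import Counter


def classify_variable_sites(clones, min_support=2):
    """Classify variable columns of equal-length, pre-aligned clone sequences.

    A column is called a population variant when at least two different
    bases are each seen in >= min_support clones. Any other variable column
    is flagged as a possible single-clone change (random mutant or
    reverse-transcription error). Positions are reported 1-based.
    """
    length = len(clones[0])
    if any(len(c) != length for c in clones):
        raise ValueError("clones must be aligned to the same length")
    variants, singletons = [], []
    for i in range(length):
        counts = Counter(c[i] for c in clones)
        if len(counts) < 2:
            continue  # every clone agrees at this column
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            variants.append(i + 1)
        else:
            singletons.append(i + 1)
    return variants, singletons


print(classify_variable_sites(["ACGU", "ACGU", "AUGC", "AUGU"]))  # ([2], [4])
```

Column 2 carries C and U twice each and is kept as a variant; column 4 has a single C and is flagged.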

Comparisons of the TYMV-CL sequence and partial sequences of TYMV type strain

The 5’ noncoding region of TYMV-CL genomic RNA differs from that reported for the type strain of TYMV (Briand et al., 1978) at position 49 and has an additional uridine residue at position 53. The 3’-terminal sequence of TYMV-CL genomic RNA is closely similar to that of the virion protein messenger of the TYMV type strain reported by Guilley and Briand (1978), but 32 of the 695 nucleotide residues are different. These differences include 30 in the virion protein coding region, though only four of these result in amino acid differences, and three of those four occur in the isolate of the TYMV type strain studied by Morch et al. (1988). Interestingly, three of the four differences are alanine to threonine conversions and one is leucine to phenylalanine; thus residue size, rather than charge, seems to be conserved at these four sites.
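The comparison above separates nucleotide differences from the smaller number that change the encoded amino acid. The sketch below (helper names hypothetical) counts both for two aligned, in-frame coding sequences using the standard genetic code. The toy sequences are invented to produce an Ala-to-Thr and a Leu-to-Phe change like those described; they are not the TYMV sequences.

```python
# Standard genetic code, built from the usual UCAG-ordered amino acid string.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(rna):
    """Translate in frame 1; '*' marks a stop codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def compare_coding(a, b):
    """Return (nucleotide differences, [(codon number, aa_a, aa_b), ...])."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    nt_diffs = sum(x != y for x, y in zip(a, b))
    aa_changes = [(n + 1, x, y)
                  for n, (x, y) in enumerate(zip(translate(a), translate(b)))
                  if x != y]
    return nt_diffs, aa_changes


print(compare_coding("GCUCUUUAA", "ACCUUCUAA"))
# (4, [(1, 'A', 'T'), (2, 'L', 'F')])
```

Four nucleotide substitutions yield only two amino acid changes here, because the third-position change in codon 1 and both changes in codon 2 fall within the same codons.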

Open reading frames and encoded viral proteins

Figure 2 is a diagram showing, in both the positive (virion) and complementary strands of the TYMV-CL genome, all ORFs that have standard initiation and termination codons. Three of the ORFs, all in the positive strand, are also found in the genomes of two other tymoviruses, eggplant mosaic tymovirus (Osorio-Keese et al., 1989) and ononis yellow mosaic tymovirus (Ding et al., 1989). These conserved ORFs are the virion protein (VP) (Mr 20,152) gene near the 3’ end and two overlapping genes with initiation codons close to the 5’ end. The larger of the overlapping ORFs initiates at nucleotide residue 96 and encodes a protein, which we call the replicase protein (RP) (Mr 206,509), that presumably corresponds to the 195K protein found in in vitro translation studies (Mellema et al., 1979). The smaller overlapping ORF begins at nucleotide residue 89, seven nucleotide residues to the 5’ side of the start of the RP ORF, and encodes an out-of-phase, overlapping polypeptide (OP) (Mr 68,740). This latter protein was not reported from in vitro translation studies of the TYMV genome (Mellema et al., 1979; Morch and Benicourt, 1980). None of the other large ORFs in the genome of TYMV-CL is conserved in size, position, or sequence with other tymoviral genomes (Osorio-Keese et al., 1989; Ding et al., 1989), and therefore these may not be functional.
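The ORFs in Fig. 2 are defined as running from an AUG to the first in-phase UGA, UAG, or UAA. The sketch below applies that definition to one strand; the complementary strand is scanned by passing its reverse complement. The helper names are hypothetical, and two details are assumptions of the sketch: downstream AUGs inside an already open ORF are not reported separately, and ORFs without a stop codon are ignored. `frame_of` also confirms that ORFs starting at residues 89 and 96 lie in different phases.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, min_codons=1):
    """Return sorted (start, end, frame) triples for ORFs on one RNA strand.

    start is the A of the AUG and end is the last base of the first in-phase
    stop codon (1-based, inclusive); frame is 1, 2 or 3.
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return sorted(orfs)


def reverse_complement(rna):
    """Complementary (-) strand, written 5'->3'."""
    return rna[::-1].translate(str.maketrans("ACGU", "UGCA"))


def frame_of(position):
    """Codon phase (1, 2 or 3) of a 1-based position on the (+) strand."""
    return (position - 1) % 3 + 1


print(find_orfs("CAUGCAUGCCUAAGUAG"))  # [(2, 13, 2), (6, 17, 3)]
print(frame_of(89), frame_of(96))      # 2 3
```

In the toy sequence, two ORFs in different phases overlap between residues 6 and 13, which is the arrangement described for the OP and RP ORFs; both published spans (89-1975 and 96-5630) are also whole numbers of codons.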

It has been shown that the UAG codon of the RP ORF can be suppressed, in an in vitro translation experiment, by a yeast amber suppressor tRNA to yield an Mr 221K protein (Morch et al., 1982). Comparisons with other tymoviruses indicate that this potential readthrough product is unlikely to have a functional significance in vivo, because its features are not conserved: the stop codon is different in OYMV-Tin (Ding et al., 1989), the sizes and sequences of the possible readthrough products differ considerably, and even the reading frame relative to the VP ORF is not conserved. Furthermore, the postulated Mr 221K protein product, or a proteolytic fragment of it, has not been found either in infected plant cells or in in vitro translation experiments with TYMV-type RNA using reticulocyte lysates in the absence of suppressor tRNAs (Morch et al., 1982). In contrast, in similar translation studies of TMV RNA, the readthrough product of the replicase internal amber termination codon has been detected in the absence of suppressor tRNAs (Pelham, 1978).

The presence of overlapping ORFs has been shown for a number of bacteriophages (Barrell et al., 1976), animal viruses (see Kozak, 1986), and plant viruses (Wu et al., 1987). Kozak (1984) has suggested that expression of overlapping genes in animal viruses is regulated by leaky translational initiation at the 5’-proximal AUG initiation codon, because this codon appears to lie in an unfavorable sequence context compared with the optimal vertebrate initiation sequence, -CCACCAUGG- (Kozak, 1984). A similar situation may exist for tymoviruses such as TYMV-CL (Osorio-Keese et al., 1989).
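One way to make "unfavorable sequence context" concrete is to check the two positions that Kozak's work emphasizes most: a purine at -3 and a G at +4, where the A of the AUG is +1. The sketch below does only that. The three-level labels are a common simplification and an assumption here, not the paper's scoring, and the function name is hypothetical. Because the TYMV-CL sequence around residues 89 and 96 is not reproduced in this text, the example uses the consensus quoted above and invented variants.

```python
def kozak_context(rna, aug_position):
    """Classify the context of an AUG whose A is at 1-based `aug_position`.

    Only two positions are checked: a purine (A or G) at -3 and G at +4.
    Returns "strong" (both present), "adequate" (one), or "weak" (neither).
    """
    i = aug_position - 1
    if rna[i:i + 3] != "AUG":
        raise ValueError(f"no AUG at position {aug_position}")
    minus3 = rna[i - 3] if i >= 3 else None
    plus4 = rna[i + 3] if i + 3 < len(rna) else None
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


print(kozak_context("CCACCAUGG", 6))  # strong
print(kozak_context("CCUCCAUGC", 6))  # weak
```

A leaky-scanning model would predict that ribosomes often bypass a "weak" upstream AUG and initiate at the next one downstream.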

Codon usage

Tymovirus genomes have a greater cytosine content than those of other plant viruses so far examined. Thus, as expected, the codons used by the three encoded viral proteins of the TYMV-CL genome have a predominance of cytosines in codon position III (Table 1), as do many plant genes (Grantham et al., 1986). In contrast, uridine is the most common base in the genomes of many other plant viruses, and these also favor uridine in codon position III (Dasgupta and Kaesberg, 1982; Meyer et al., 1986; Rezaian et al., 1985; van Wezenbeek et al., 1983).
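Codon-position base composition of the kind tabulated for the three ORFs can be computed directly from an in-frame ORF. A minimal sketch follows (the function name is hypothetical); it assumes the ORF starts at its first codon and drops any incomplete trailing codon.

```python
def codon_position_composition(orf):
    """Percentage of U, C, A and G at codon positions I, II and III.

    The ORF is assumed to be in frame from its first base; an incomplete
    trailing codon is ignored.
    """
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3
    composition = {}
    for offset, label in enumerate(("I", "II", "III")):
        bases = orf[offset:usable:3]
        composition[label] = {b: 100.0 * bases.count(b) / len(bases)
                              for b in "UCAG"}
    return composition


comp = codon_position_composition("AUGCCCGCC")
print(comp["III"])  # C is about 66.7% and G about 33.3% in position III
```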

The large cytosine content of the TYMV-CL genome is also reflected in the distinctive amino acid compositions of the encoded proteins when these are compared with either nonvirion proteins or the mean amino acid composition of all proteins in the NEWAT databank of 300,000 residues (Table 2). For example, the relative amounts of 11 amino acids in the RP and 14 amino acids in the OP fall outside the range observed for the nonvirion proteins of a number of other plant and animal viruses. All three proteins encoded by the TYMV-CL genome have a composition that favors amino acids with cytosine-rich codons, such as leucine, proline, and serine, and are correspondingly depauperate in amino acids with purine-rich codons, such as lysine, glutamic acid, and glycine.
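The counts of 11 (RP) and 14 (OP) amino acids outside the viral range can be checked against Table 2. The sketch below transcribes the OP and RP molar percentages and the range for viral nonvirion proteins from that table and counts out-of-range residues. Treating the range limits as inclusive is an assumption; with it, the counts match those stated in the text (Trp in the OP sits exactly on the lower limit of 0.5).

```python
# Molar percentages transcribed from Table 2 (values outside parentheses).
AMINO_ACIDS = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly",
               "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser",
               "Thr", "Trp", "Tyr", "Val"]
OP = [4.9, 10.5, 2.1, 4.8, 0.8, 3.7, 3.0, 5.3, 5.4, 3.7,
      9.9, 1.3, 0.5, 2.5, 18.6, 12.1, 6.7, 0.5, 0.6, 3.2]
RP = [6.2, 5.1, 3.4, 4.9, 1.2, 4.3, 3.3, 3.3, 5.0, 4.6,
      12.4, 3.3, 1.4, 4.4, 10.5, 10.7, 7.6, 1.4, 2.9, 4.0]
# (lower, upper) molar percentages for nonvirion proteins of other viruses.
VIRAL_RANGE = [(5.7, 9.1), (3.5, 6.9), (2.8, 5.1), (4.5, 9.4), (1.5, 3.0),
               (2.2, 4.2), (5.0, 9.8), (3.8, 8.5), (1.2, 3.6), (3.9, 6.9),
               (7.1, 11.7), (5.5, 8.0), (1.6, 4.0), (3.2, 6.8), (2.9, 6.1),
               (4.6, 9.5), (4.2, 8.0), (0.5, 2.5), (2.4, 5.1), (5.1, 8.3)]


def outside_viral_range(composition):
    """Amino acids whose percentage lies outside the (inclusive) viral range."""
    return [aa for aa, pct, (low, high)
            in zip(AMINO_ACIDS, composition, VIRAL_RANGE)
            if not low <= pct <= high]


print(len(outside_viral_range(RP)), len(outside_viral_range(OP)))  # 11 14
```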

The unusual amino acid composition of the RP, which represents most of the information encoded by the genome, does not, however, strictly conform to that expected from the base ratio of the TYMV-CL genome. For example, its 10.5% proline content is greater than that of other viral proteins but less than the 15.0% encoded by the RP ORF when it is randomized and translated. Conversely, the molar percentages of glutamic acid, lysine, glycine, methionine, and valine in the RP, although less than in proteins encoded by other viruses, are greater than those obtained from the randomized sequence (Table 2). Thus it seems that the nucleotide sequence is biased at the codon level to modify the levels of particular amino acids, while still maintaining a high cytosine, low guanine ratio.
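The "randomized and translated" values (the figures in parentheses in Table 2) come from shuffling each ORF and translating the result. The table footnote does not specify the shuffling procedure, so the sketch below assumes a simple mononucleotide shuffle with Python's `random.shuffle`, 25 replicates, and an arbitrary seed. Translation runs straight through any stop codons, and '*' is counted like the Ter row. The function name is hypothetical.

```python
import random
from collections import Counter

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(rna):
    """Translate in frame 1 without stopping; '*' marks a stop codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def randomized_composition(orf, n=25, seed=0):
    """Mean percentage amino acid composition over `n` shuffles of an ORF.

    Each shuffle keeps the ORF's base composition but destroys its order.
    """
    rng = random.Random(seed)
    bases = list(orf.upper().replace("T", "U"))
    totals = Counter()
    for _ in range(n):
        rng.shuffle(bases)
        protein = translate("".join(bases))
        for aa, count in Counter(protein).items():
            totals[aa] += 100.0 * count / len(protein)
    return {aa: total / n for aa, total in totals.items()}


print(randomized_composition("CCC" * 10))  # {'P': 100.0}
```

Comparing such means with the observed composition shows which amino acids are over- or under-represented beyond what base composition alone would produce.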

When compared with other nonvirion proteins, OP can be seen to have an even more unusual amino acid composition than RP (Table 2); it contains, for example, 18.6% proline. It is also noteworthy that the OPs of TYMV-CL, eggplant mosaic virus, and ononis yellow mosaic virus have less sequence similarity to one another than do their RPs (Ding et al., 1989).

Base ratios

Chemical analyses reported by Markham and Smith (1951) were the first to show the unusual base ratio of the TYMV genome, and in particular its large cytosine content. The sequence reported here confirms that estimate (Table 3).

TABLE 2

PERCENTAGE MOLAR AMINO ACID COMPOSITIONS OF VIRAL AND OTHER PROTEINS

Amino acid   TYMV-CL OP     TYMV-CL RP     TYMV-CL VP     Viral proteinsᵃ   All proteinsᵇ
Ala          4.9 (7.2)ᶜ     6.2 (6.5)      7.4 (6.3)      5.7-9.1           7.8
Arg          10.5 (8.3)     5.1 (8.1)      1.6 (7.9)      3.5-6.9           5.1
Asn          2.1 (2.8)      3.4 (3.2)      2.1 (3.0)      2.8-5.1           4.3
Asp          4.8 (2.4)      4.9 (2.3)      3.7 (2.6)      4.5-9.4           5.3
Cys          0.8 (2.3)      1.2 (2.1)      2.1 (2.0)      1.5-3.0           1.9
Gln          3.7 (3.5)      4.3 (3.3)      4.2 (3.1)      2.2-4.2           4.2
Glu          3.0 (1.5)      3.3 (1.5)      3.2 (1.6)      5.0-9.8           6.3
Gly          5.3 (3.3)      3.3 (2.8)      4.2 (2.5)      3.8-8.5           7.2
His          5.4 (5.3)      5.0 (5.4)      1.6 (6.0)      1.2-3.6           2.3
Ile          3.7 (3.6)      4.6 (4.1)      8.4 (4.5)      3.9-6.9           5.3
Leu          9.9 (9.9)      12.4 (9.9)     8.4 (10.7)     7.1-11.7          9.1
Lys          1.3 (1.9)      3.3 (2.0)      3.7 (2.0)      5.5-8.0           5.9
Met          0.5 (1.0)      1.4 (0.9)      2.1 (0.7)      1.6-4.0           2.3
Phe          2.5 (2.6)      4.4 (2.9)      3.2 (2.6)      3.2-6.8           3.9
Pro          18.6 (16.3)    10.5 (15.0)    10.5 (15.7)    2.9-6.1           5.2
Ser          12.1 (10.8)    10.7 (10.6)    9.0 (11.0)     4.6-9.5           6.8
Thr          6.7 (8.5)      7.6 (9.1)      14.2 (8.2)     4.2-8.0           5.9
Trp          0.5 (0.6)      1.4 (0.6)      1.1 (0.4)      0.5-2.5           1.4
Tyr          0.6 (2.7)      2.9 (3.0)      1.6 (2.8)      2.4-5.1           3.2
Val          3.2 (3.4)      4.0 (3.7)      7.4 (3.5)      5.1-8.3           6.6
Terᵈ         0.2 (2.3)      0.1 (2.8)      0.5 (2.9)      -                 -

ᵃ The amino acid composition range (upper and lower molar percentages) of viral nonvirion polypeptides was compiled from amino acid sequences encoded by bromegrass mosaic bromovirus (BMV) RNA1 (nucleotide residues 75-2960; Ahlquist et al., 1984); BMV RNA2 (104-2572; Ahlquist et al., 1984); alfalfa mosaic virus (AlMV) RNA1 (101-3481; Cornelissen et al., 1983a); AlMV RNA2 (55-2427; Cornelissen et al., 1983b); tobacco mosaic tobamovirus (TMV) RNA (69-4919; Goelet et al., 1982); Sindbis alphavirus RNA (5751-7598; Strauss et al., 1984); polio enterovirus RNA (3386-7372; Kitamura et al., 1981); cowpea mosaic comovirus B RNA (207-5807; Lomonossoff and Shanks, 1983); tobacco etch potyvirus RNA (6981-8516; Allison et al., 1986); black beetle nodavirus RNA1 (39-2732; Dasmahapatra et al., 1985); yellow fever flavivirus RNA (3680-10,354; Rice et al., 198…
ᵇ The average composition of proteins in the NEWAT database of approximately 300,000 residues (Table XI of Doolittle, 1986).
ᶜ The molar compositions in parentheses are the means of 25 values, each obtained by randomizing the nucleotide sequence of the appropriate ORF and then translating it.
ᵈ Termination codon.

Although the reason for the large cytosine content of tymovirus genomes is unknown, it is not merely determined by the encoded amino acid sequences, because cytosine is also the most common base in the noncoding regions of the genome (Table 3) and the viral codon usage is biased toward cytosine in the third position. For example, the RP ORF contains 38.6% cytosine, but only 29.2% would be required if all codons for each amino acid in the RP were used equally frequently.
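The 29.2% figure is the cytosine content expected if every synonymous codon for each residue were used equally often. A sketch of that calculation under the standard genetic code is given below; the function names are hypothetical, and the RP sequence itself is not reproduced here, so the examples use short toy inputs. For proline (CCU, CCC, CCA, CCG) the four codons carry 9 C out of 12 bases, giving 75%.

```python
from collections import defaultdict

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Group codons by the amino acid (or stop, '*') they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def expected_c_equal_usage(protein):
    """Percent C expected if synonymous codons were used equally often."""
    c_per_residue = sum(
        sum(codon.count("C") for codon in SYNONYMS[aa]) / len(SYNONYMS[aa])
        for aa in protein)
    return 100.0 * c_per_residue / (3 * len(protein))


def observed_c(orf):
    """Percent C actually present in a coding sequence."""
    return 100.0 * orf.count("C") / len(orf)


print(expected_c_equal_usage("P"))   # 75.0
print(expected_c_equal_usage("PM"))  # 37.5
```

Applied to a real ORF and its translation, an observed value well above the equal-usage expectation indicates that synonymous codon choice, not only amino acid content, drives the cytosine excess.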

The base ratio also differs considerably between the three codon positions of each ORF (Table 3), although cytosine is the most abundant base in all positions except codon position I of the VP. Similar nonuniform base ratios are also found in the genomes of other viruses, but there is no conserved pattern between viruses (Table 3). For example, the TYMV-CL genome and BMV RNA1 have more purine in codon position I and more pyrimidine in codon position III, a pattern that conforms to the primitive RNA code postulated by Shepherd (1981). However, codon position III of AlMV RNA1 is dominated by purines, even though AlMV is clearly related to BMV and other “tricornaviruses” (Haseloff et al., 1984).
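The purine/pyrimidine comparison can be checked directly against the percentages in Table 3. The Python sketch below does this. The flattened BMV1 and AlMV1 rows of Table 3 are read here as alternating BMV and AlMV values. That reading is an assumption, but it is the one under which every column sums to about 100%. The dictionary layout and function names are hypothetical.

```python
# Base percentages transcribed from Table 3, listed for codon positions (I, II, III).
TABLE3 = {
    "TYMV-CL RP": {"G": (21.7, 12.7, 16.0), "A": (23.4, 27.1, 18.6),
                   "U": (20.1, 26.8, 17.7), "C": (34.8, 33.4, 47.7)},
    "BMV RNA1": {"G": (34.5, 19.2, 25.2), "A": (25.8, 33.7, 17.8),
                 "U": (21.0, 26.2, 33.8), "C": (18.6, 20.8, 23.2)},
    "AlMV RNA1": {"G": (16.4, 18.9, 32.0), "A": (32.2, 24.1, 29.5),
                  "U": (26.9, 36.9, 21.6), "C": (24.5, 20.1, 16.8)},
}


def purine_by_position(table):
    """Purine (G + A) percentage at codon positions I, II and III."""
    return tuple(round(g + a, 1) for g, a in zip(table["G"], table["A"]))


def pyrimidine_by_position(table):
    """Pyrimidine (U + C) percentage at codon positions I, II and III."""
    return tuple(round(u + c, 1) for u, c in zip(table["U"], table["C"]))


for name, table in TABLE3.items():
    print(name, "purine", purine_by_position(table),
          "pyrimidine", pyrimidine_by_position(table))
```

Under this reading, BMV RNA1 is purine-rich at position I and pyrimidine-rich at position III. By contrast, position III of AlMV RNA1 is roughly 60% purine.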

The maintenance of two out-of-phase overlapping ORFs may also influence the base ratio. This possibility was examined by comparing the 5’ terminal portion of the RP gene, which overlaps the OP ORF (nucleotide residues 96-1975), with the 3’ portion (nucleotide residues 1976-5630). It can be seen (Table 3) that the 5’ and 3’ portions of the RP ORF resemble one another more than either resembles the OP ORF.
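Position-specific base ratios like those in Table 3 can be computed for any residue range. The OP ORF starts at residue 89 and the RP ORF at residue 96, so the two frames are offset by 7 residues (7 mod 3 = 1). RP codon position I therefore falls on OP codon position II. This is consistent with the RP 5’ column of Table 3 matching the OP column shifted by one codon position. The Python sketch below is a minimal version of the calculation. The function name is hypothetical, and the genome is assumed to be a U-containing string with 1-based, inclusive coordinates as in the text.

```python
from collections import Counter


def composition_by_codon_position(genome, start, end):
    """Percentage of G, A, U and C at codon positions I-III of genome[start..end].

    Hypothetical helper. `start` and `end` are 1-based, inclusive residue numbers
    as used in the text, and `start` is assumed to be the first base of a codon.
    """
    region = genome[start - 1:end]
    result = {}
    for offset, label in enumerate(("I", "II", "III")):
        counts = Counter(region[offset::3])
        total = sum(counts.values())
        result[label] = {base: 100 * counts[base] / total for base in "GAUC"}
    return result
```

For example, `composition_by_codon_position(genome, 96, 1975)` and `composition_by_codon_position(genome, 1976, 5630)` would correspond to the RP 5’ and RP 3’ columns. Starting one residue later rotates the codon positions, which is the frame relationship described above.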

Dinucleotide frequencies

Convolution analysis of the sequence revealed significant local variations in base composition that mostly recur every third nucleotide (results not shown). The frequencies of nucleotide pairs in the TYMV-CL genome are mostly those predicted from the general base ratio and the biased base ratios of each codon position. The most deviant doublet frequencies are those of UA, which occur less than half as frequently as expected in all three ORFs, and those of GA, which occur 40% more frequently than expected in the OP and RP ORFs.
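Doublet deviations such as those reported for UA and GA can be expressed as observed/expected ratios. Here the expectation accounts for the biased base ratio at each codon position. The Python sketch below shows one way to compute this. It assumes the input is an in-frame ORF string, and that the expected count of a doublet starting at codon position p is f_p(first) × f_(p+1)(second), summed over all doublet start sites. Function names are hypothetical.

```python
import math
from collections import Counter


def base_ratios_by_position(orf):
    """Fraction of each base at codon positions I, II, III (indices 0, 1, 2)."""
    ratios = []
    for pos in range(3):
        counts = Counter(orf[pos::3])
        total = sum(counts.values())
        ratios.append({b: counts[b] / total for b in "GAUC"})
    return ratios


def doublet_obs_exp(orf, first, second):
    """Observed/expected ratio of the doublet `first` + `second` in an in-frame ORF.

    The expected count uses the ORF's own position-specific base ratios;
    returns NaN if the doublet is not expected at all.
    """
    ratios = base_ratios_by_position(orf)
    observed = 0
    expected = 0.0
    for i in range(len(orf) - 1):
        p = i % 3  # codon position of the first residue
        if orf[i] == first and orf[i + 1] == second:
            observed += 1
        expected += ratios[p][first] * ratios[(p + 1) % 3][second]
    return observed / expected if expected else math.nan
```

On this definition, a ratio below 0.5 for "U", "A" in each ORF and around 1.4 for "G", "A" in the OP and RP ORFs would correspond to the deviations described above.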

UA doublets are uncommon in the genomes of all organisms and viruses, both DNA and RNA (Nussinov, 1981; Grantham et al., 1985), probably because UAA and UAG are terminator codons. However, this cannot explain why the UA content is reduced in all three reading frames while the UG frequency is not, even though UG is also part of a terminator codon. The bias against UA doublets in all codon positions may be a vestige of a time when UA was the termination signal (Balasubramanian, 1982). If that were true, small UA doublet frequencies would allow out-of-phase ORFs, such as the RP and OP ORFs of the TYMV-CL genome, to be tested for their ability to encode functionally useful proteins.
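Every UAA and UAG codon contains a UA doublet. A sequence that avoids UA in all frames can therefore be interrupted only by UGA, whichever frame is read. The Python sketch below makes this concrete. It counts terminators in each forward frame and measures how far each frame runs without one. The helper names are hypothetical.

```python
# The three standard termination codons; UAA and UAG both contain a UA doublet.
STOPS = {"UAA", "UAG", "UGA"}


def stop_codons_by_frame(seq):
    """Number of termination codons in each of the three forward reading frames."""
    return [
        sum(1 for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS)
        for frame in range(3)
    ]


def longest_open_stretch(seq, frame):
    """Longest run of consecutive sense codons (in codons) in the given frame (0, 1 or 2)."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            current = 0
        else:
            current += 1
            best = max(best, current)
    return best
```

Applied to a UA-depleted sequence, the off-frame counts from `stop_codons_by_frame` would come only from UGA. This is the property that would leave out-of-phase ORFs open.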

Examination of the longer range frequency deviations of nucleotide pairs separated by one or more nucleotide residues in the RP gene revealed few anomalies (Table 4). The UxA pair is less abundant than expected, but only when the U is at codon position I and the A at codon position III, owing to the absence of the UGA and UAA terminator codons. The most intriguing anomaly in nucleotide pairing is the unexpectedly large number of CxxC pairs. This pattern of cytosines occurs not only as CxxC pairs but also as CxxxxxC pairs, and is unlikely to reflect a coding function, as both occur more frequently than expected in all


TABLE 3

PERCENTAGE BASE COMPOSITIONS OF TYMV-CL, BMV RNA1, AND AlMV RNA1

Base  Codon      OP     RP     VP     RP 5’    RP 3’    BMV1     AlMV1
      position   ORF    ORF    ORF    ORF(a)   ORF(b)   ORF(c)   ORF(d)

G     I          21.1   21.7   26.1   20.5     22.3     34.5     16.4
      II         20.5   12.7    9.8   10.9     13.6     19.2     18.9
      III        10.9   16.0   13.0   21.2     13.4     25.2     32.0
      All        17.6   16.8   16.1   17.6     16.4     26.3     22.4

A     I          18.8   23.4   30.4   20.7     24.8     25.8     32.2
      II         20.7   27.1   20.1   24.9     28.2     33.7     24.1
      III        24.9   18.6   16.8   18.8     18.5     17.8     29.5
      All        21.6   23.0   23.0   21.6     23.7     25.8     28.6

U     I          14.4   20.1   17.4   19.7     20.3     21.0     26.9
      II         19.7   26.8   29.3   27.6     26.4     26.2     36.9
      III        27.6   17.7   18.5   14.4     19.4     33.8     21.6
      All        20.7   21.5   21.8   20.7     21.9     27.0     28.5

C     I          45.6   34.8   26.1   39.0     32.7     18.6     24.5
      II         39.0   33.4   40.8   36.6     31.8     20.8     20.1
      III        36.6   47.7   51.6   45.6     48.8     23.2     16.8
      All        40.1   38.6   39.1   40.1     38.9     20.1     20.5

Base  TYMV-CL total sequence  TYMV-CL noncoding
G     16.8                    17.4
A     23.1                    25.1
U     21.7                    25.6
C     38.4                    31.9

a Base ratios calculated for nucleotide residues 96-1975 of the RP ORF, which overlap the OP ORF of TYMV-CL RNA.
b Base ratios calculated for nucleotide residues 1976-5630 of TYMV-CL RNA.
c Base ratios calculated for nucleotide residues 75-2960 of BMV RNA1 ORF.
d Base ratios calculated for nucleotide residues 101-3481 of AlMV RNA1 ORF.

three codon positions. This pattern of cytosines may be linked to the overall large cytosine content of tymovirus genomes. It has been suggested by Jonard et al. (1976) and Guilley and Briand (1978) that the cytosine residues may aid in stabilizing RNA-coat protein interactions by hydrogen bonding with carboxyl groups of aspartate and glutamic acids. However, no part of the genome showed a larger scale pattern of cytosine frequency with a periodicity that could involve the 180, 60, 32, 20, or 12 symmetries of the virion.

In summary, it seems that selective constraints operate on the TYMV-CL genome at both the nucleotide

TABLE 4

DINUCLEOTIDE FREQUENCIES(a) OF TYMV-CL RP ORF

                            Codon position of first residue
Nucleotide pair             I       II      III     All

UxA           obs           43      112     82      237
              exp           69      116     89      274
              χ²            9.8     0.1     0.5     4.9

GxxxxxxG      obs           43      21      57      121
              exp           51      38      64      156
              χ²            1.2     7.3     0.8     7.9

CxxC          obs           234     222     451     907
              exp           223     206     418     825
              χ²            0.5     1.3     2.6     8.2

CxxxxxC       obs           233     208     423     864
              exp           223     206     418     825
              χ²            0.4     0       0.1     1.9

a The expected dinucleotide frequencies were calculated from the base ratios determined for each codon position (Table 3). The χ² was calculated as (obs - exp)²/exp (one degree of freedom).


[Fig. 3 alignment panels: (A) TYMV-CL residues 2997-3059, 3204-3245, 3288-3332, and 3660-3704, aligned with equivalent segments of BMV RNA1, TMV, and Sindbis virus; (B) TYMV-CL residues 4815-4853, 4974-5021, 5067-5099, and 5184-5258, aligned with equivalent segments of BMV RNA2, TMV, and Sindbis virus.]

FIG. 3. Alignments of parts of the TYMV-CL RP amino acid sequence with the sequences of equivalent proteins of other viruses. The position in each genome of the nucleotide sequence that encodes each segment is given in brackets. Double asterisks indicate where all four sequences have the same amino acid residue, single asterisks where the amino acid residues of three of the four sequences are the same. (A) Parts of the possible nucleotide-binding fold. (B) Parts of the possible RNA replicase domain.

sequence and protein coding levels. Even the supposedly redundant third codon position of the ORFs of the TYMV-CL genome seems to be constrained to maintain a high overall cytosine content.
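The Table 4 footnote states that the expected pair counts come from the position-specific base ratios of Table 3. The Python sketch below recomputes them under one explicit assumption. The expected count is taken to be the number of codons in the RP ORF (residues 96-5630, i.e. 1845 codons) multiplied by the frequency of the first base at its codon position and of the second base at the codon position it falls on. This reading is inferred, not stated in the source, and the function names are hypothetical.

```python
# Base percentages of the RP ORF from Table 3; index 0, 1, 2 = codon position I, II, III.
RP_RATIOS = [
    {"G": 21.7, "A": 23.4, "U": 20.1, "C": 34.8},
    {"G": 12.7, "A": 27.1, "U": 26.8, "C": 33.4},
    {"G": 16.0, "A": 18.6, "U": 17.7, "C": 47.7},
]
RP_OVERALL = {"G": 16.8, "A": 23.0, "U": 21.5, "C": 38.6}  # "All" row of Table 3
RP_LENGTH = 5630 - 96 + 1   # residues 96-5630
N_CODONS = RP_LENGTH // 3   # 1845 codons


def expected_pair(first, second, gap, position):
    """Expected count of `first`, then `gap` residues, then `second`, with `first` at `position`.

    The second residue lies gap + 1 bases downstream, so its codon position is
    (position + gap + 1) % 3. Pairs running off the end of the ORF are ignored.
    """
    second_position = (position + gap + 1) % 3
    return (N_CODONS * RP_RATIOS[position][first] / 100
            * RP_RATIOS[second_position][second] / 100)


def expected_pair_all(first, second):
    """Expected count over the whole ORF from the overall RP base ratio."""
    return RP_LENGTH * RP_OVERALL[first] / 100 * RP_OVERALL[second] / 100


def chi_square(observed, expected):
    """(obs - exp)^2 / exp, as in the Table 4 footnote (one degree of freedom)."""
    return (observed - expected) ** 2 / expected


# Observed UxA counts for codon positions I, II, III, taken from Table 4.
for position, observed in enumerate([43, 112, 82]):
    expected = expected_pair("U", "A", gap=1, position=position)
    print(f"UxA, position {position + 1}: obs {observed}, "
          f"exp {expected:.0f}, chi2 {chi_square(observed, expected):.1f}")
```

Under these assumptions, the sketch gives about 69, 116 and 88 expected UxA pairs, against 69, 116 and 89 in Table 4. It also reproduces the CxxC and GxxxxxxG expectations to within about two counts. The residual differences presumably come from rounding of the published percentages. The All column appears to follow `expected_pair_all`, which uses the overall RP base ratio and the full ORF length, giving about 274 UxA and 825 CxxC pairs.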

Sequence similarities to proteins of other viruses

Studies of the large and increasing number of viral genomic sequences have shown that many viruses with different particle structures and genome organizations share similar amino acid sequences. The sequences that have attracted most interest are those that include a GxxGxGK(S/T) sequence and the tripeptide GDD (see Goldbach, 1987).

The GxxGxGK(S/T) motif is presumed to function as a nucleotide-binding fold (Gorbalenya et al., 1985) because it has sequence similarity with the amino-terminal sequence of prokaryotic and eukaryotic nucleotide-binding proteins (Walker et al., 1982; Möller and Amons, 1985; Higgins et al., 1986). The GDD sequence with neighboring hydrophobic amino acid residues has been detected as a common element in viral proteins (Franssen et al., 1984; Haseloff et al., 1984; Kamer and Argos, 1984). In bacteriophage MS2 (Fiers et al., 1976) and poliovirus (Van Dyke and Flanegan, 1980; Kitamura et al., 1981), the GDD-containing polypeptide is part of the viral RNA replicase. In contrast, virus-specific polymerase activity from infections with BMV (Bujarski et al., 1982) or West Nile fever virus (Grun and Brinton, 1987) appears to be associated with the peptide that has the nucleotide-binding motif.
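Both motifs can be located in a translated sequence by simple pattern matching. The Python sketch below assumes the nucleotide-binding motif is G, two residues, G, one residue, G, K, and then S or T. The garbled final letter of the motif in the source is read here as S/T, its usual form. The sketch reports every occurrence, including overlapping ones, with 1-based positions. The motif labels and the function are hypothetical.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
MOTIFS = {
    "GxxGxGK[ST]": re.compile(r"(?=(G..G.GK[ST]))"),
    "GDD": re.compile(r"(?=(GDD))"),
}


def find_motifs(protein):
    """Return {motif name: [(1-based start, matched residues), ...]} for `protein`."""
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }
```

A match of this kind only flags candidates. The similarities discussed in the text rest on the residues surrounding the motifs, not on the motifs alone.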

Both the GxxGxGK(S/T) and GDD motifs occur in the RP gene of TYMV-CL. The amino acid sequences surrounding these two motifs are more similar (Fig. 3) to those of members of the tobamovirus, tricornavirus, and alphavirus supergroup of RNA viruses than to those of the picornavirus, comovirus, potyvirus, sobemovirus, and carmovirus groups.
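Fig. 3 scores each alignment column by how many of the four viral sequences share the same residue. The Python sketch below applies that scoring. It assumes the segments are already aligned to equal length and that gaps ("-") never count as a shared residue. The function name is hypothetical.

```python
from collections import Counter


def conservation_marks(sequences):
    """Per-column marks for four aligned sequences, following the Fig. 3 convention.

    "**" where all four residues are identical, "*" where three of the four are,
    "" otherwise. Gap characters ("-") are never counted as matching residues.
    """
    if len(sequences) != 4 or len({len(s) for s in sequences}) != 1:
        raise ValueError("expected four aligned sequences of equal length")
    marks = []
    for column in zip(*sequences):
        residues = [r for r in column if r != "-"]
        shared = Counter(residues).most_common(1)[0][1] if residues else 0
        marks.append("**" if shared == 4 else "*" if shared == 3 else "")
    return marks
```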

Mellema et al. (1979) reported that in vitro translation studies with rabbit reticulocyte lysates indicate that the


RP gene contains a signal which results in the premature release of an Mr 150K polypeptide. The precise location of this leaky termination signal, as well as the sites of in vitro proteolytic cleavage observed for the larger RP product, has not been identified. Nevertheless, the GDD peptide is found in the carboxyl region of the complete RP product, and the nucleotide-binding fold motif occurs towards the carboxyl terminus of the Mr 150K polypeptide. A similar arrangement of these two amino acid motifs is found in the TMV polymerase gene (Goelet et al., 1982), except that the leaky termination signal in the case of TMV is an in-phase amber codon.

One major difference in the genome organization of TYMV-CL and TMV is that the latter encodes an Mr 30K polypeptide between the polymerase and 3’ terminal coat protein genes. A similar ORF is found in an equivalent position in the genomes of other viruses of the same supergroup. The Mr 30K protein is involved in the intercellular transport of virus infection (Deom et al., 1987). The functionally equivalent gene in TYMV-CL may be the smaller of the overlapping genes, OP, as there is no additional ORF between those of the RP and VP. Morch and Bénicourt (1980) suggested that a similar polypeptide might be produced by readthrough of the termination codon of the RP of the type strain of TYMV; however, there is no ORF equivalent in size or composition in the genomes of two other tymoviruses (Osorio-Keese et al., 1989; Ding et al., 1989).

In conclusion, the TYMV-CL genome is similar, both in terms of its organization and of the amino acid sequences it encodes, to a wide range of genomes of both plant and animal viruses.

ACKNOWLEDGMENTS

We thank M. Torronen and J. Howe for skilled technical assistance.

REFERENCES

AHLQUIST, P., DASGUPTA, R., and KAESBERG, P. (1984). Nucleotide sequence of the brome mosaic virus genome and its implications for viral replication. J. Mol. Biol. 172, 369-383.

ALLISON, R., JOHNSTON, R. E., and DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: Evidence for the synthesis of a single polyprotein. Virology 154, 9-20.

BALASUBRAMANIAN, R. (1982). Origin of life: A hypothesis for the origin of adaptor-mediated ordered synthesis of proteins and an explanation for the choice of terminating codons in the genetic code. BioSystems 15, 99-104.

BARRELL, B. G., AIR, G. M., and HUTCHISON, C. A. (1976). Overlapping genes in bacteriophage ΦX174. Nature (London) 264, 34-40.

BRIAND, J. P., KEITH, G., and GUILLEY, H. (1978). Nucleotide sequence at the 5’ extremity of turnip yellow mosaic virus genome RNA. Proc. Natl. Acad. Sci. USA 75, 3168-3172.

BUJARSKI, J. J., HARDY, S. F., MILLER, W. A., and HALL, T. C. (1982). Use of dodecyl-β-D-maltoside in the purification and stabilization of RNA polymerase from brome mosaic virus-infected barley. Virology 119, 465-473.

CORNELISSEN, B. J. C., BREDERODE, F. T., MOORMANN, R. J. M., and BOL, J. F. (1983a). Complete nucleotide sequence of alfalfa mosaic virus RNA 1. Nucleic Acids Res. 11, 1253-1265.

CORNELISSEN, B. J. C., BREDERODE, F. T., VEENEMAN, G. H., VAN BOOM, J. H., and BOL, J. F. (1983b). Complete nucleotide sequence of alfalfa mosaic virus RNA 2. Nucleic Acids Res. 11, 3019-3025.

DASGUPTA, R., and KAESBERG, P. (1982). Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus. Nucleic Acids Res. 10, 703-713.

DASMAHAPATRA, B., DASGUPTA, R., GHOSH, A., and KAESBERG, P. (1985). Structure of the black beetle virus genome and its functional implications. J. Mol. Biol. 182, 183-189.

DEOM, C. M., OLIVER, M. J., and BEACHY, R. N. (1987). The 30-kilodalton gene product of tobacco mosaic virus potentiates virus movement. Science 237, 389-394.

DING, S., KEESE, P., and GIBBS, A. (1989). Nucleotide sequence of the genome of ononis yellow mosaic tymovirus. Virology 172, 555-563.

DOMINGO, E., SABO, D., TANIGUCHI, T., and WEISSMANN, C. (1978). Nucleotide heterogeneity of an RNA phage population. Cell 13, 735-744.

DOOLITTLE, R. F. (1986). “Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences.” University Science Books, Mill Valley, CA.

ENGLAND, T. E., and UHLENBECK, O. C. (1978). 3’-Terminal labelling of RNA with T4 RNA ligase. Nature (London) 275, 560-561.

FIERS, W., CONTRERAS, R., DUERINCK, F., HAEGEMAN, G., ISERENTANT, D., MERREGAERT, J., MIN JOU, W., MOLEMANS, F., RAEYMAEKERS, A., VANDENBERGHE, A., VOLCKAERT, G., and YSEBAERT, M. (1976). Complete nucleotide sequence of bacteriophage MS2 RNA: Primary and secondary structure of the replicase gene. Nature (London) 260, 500-507.

FRANSSEN, H., LEUNISSEN, J., GOLDBACH, R., LOMONOSSOFF, G., and ZIMMERN, D. (1984). Homologous sequences in non-structural proteins from cowpea mosaic virus and picornaviruses. EMBO J. 3, 855-861.

GIBBS, A. J., HECHT-POINAR, E., and WOODS, R. D. (1966). Some properties of three related viruses: Andean potato latent, dulcamara mottle, and ononis yellow mosaic. J. Gen. Microbiol. 44, 177-193.

GIEGÉ, R., BRIAND, J. P., MENGUAL, R., EBEL, J.-P., and HIRTH, L. (1978). Valylation of the two RNA components of turnip-yellow mosaic virus and specificity of the tRNA aminoacylation reaction. Eur. J. Biochem. 84, 251-256.

GOELET, P., LOMONOSSOFF, G. P., BUTLER, P. J. G., AKAM, M. E., GAIT, M. M., and KARN, J. (1982). Nucleotide sequence of tobacco mosaic virus RNA. Proc. Natl. Acad. Sci. USA 79, 5818-5822.

GOLDBACH, R. (1987). Genome similarities between plant and animal RNA viruses. Microbiol. Sci. 4, 197-202.

GORBALENYA, A. E., BLINOV, V. M., and KOONIN, E. V. (1985). Prediction of nucleotide-binding properties of virus-specific proteins from their primary structure. Mol. Genet. Mikrobiol. Virusol. 11, 30-36.

GRANTHAM, R., GREENLAND, T., LOUAIL, S., MOUCHIROUD, D., PRATO, J. L., GOUY, M., and GAUTIER, C. (1985). Molecular evolution of viruses as seen by nucleic acid sequence study. Bull. Inst. Pasteur (Paris) 83, 95-148.

GRANTHAM, R., PERRIN, P., and MOUCHIROUD, D. (1986). Patterns in codon usage of different kinds of species. Oxford Surveys Evol. Biol. 3, 48-81.

GRUN, J. B., and BRINTON, M. A. (1987). Dissociation of NS5 from cell fractions containing West Nile virus-specific polymerase activity. J. Virol. 61, 3641-3644.


GUBLER, U., and HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.

GUILLEY, H., and BRIAND, J. P. (1978). Nucleotide sequence of turnip yellow mosaic virus coat protein mRNA. Cell 15, 113-122.

GUY, P. L., and GIBBS, A. (1981). A tymovirus of Cardamine sp. from alpine Australia. Aust. Plant Pathol. 10, 12-13.

GUY, P. L., and GIBBS, A. J. (1985). Further studies on turnip yellow mosaic tymovirus isolates from an endemic Australian Cardamine. Plant Pathol. 34, 532-544.

HASELOFF, J., and SYMONS, R. H. (1981). Chrysanthemum stunt viroid: Primary sequence and secondary structure. Nucleic Acids Res. 9, 2741-2752.

HASELOFF, J., GOELET, P., ZIMMERN, D., AHLQUIST, P., DASGUPTA, R., and KAESBERG, P. (1984). Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization. Proc. Natl. Acad. Sci. USA 81, 4358-4362.

HATTA, T., and MATTHEWS, R. E. F. (1974). The sequence of early cytological changes in Chinese cabbage leaf cells following systemic infection with turnip yellow mosaic virus. Virology 59, 383-396.

HIGGINS, C. F., HILES, I. D., SALMOND, G. P. C., GILL, D. R., DOWNIE, J. A., EVANS, I. J., HOLLAND, I. B., GRAY, L., BUCKEL, S. D., BELL, A. W., and HERMODSON, M. A. (1986). A family of related ATP-binding subunits coupled to many distinct biological processes in bacteria. Nature (London) 323, 448-450.

JONARD, G., BRIAND, J. P., BOULEY, J. P., WITZ, J., and HIRTH, L. (1976). Nature and specificity of the RNA-protein interaction in the case of the tymoviruses. Philos. Trans. R. Soc. London Ser. B 276, 123-129.

KAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7281.

KITAMURA, N., SEMLER, B., ROTHBERG, P. G., LARSEN, G. P., ADLER, C. J., DORNER, A. J., EMINI, E. A., HANECAK, R., LEE, J. J., VAN DER WERF, S., ANDERSON, C. W., and WIMMER, E. (1981). Primary structure, gene organization and polypeptide expression of poliovirus RNA. Nature (London) 291, 547-553.

KLEIN, C., FRITSCH, C., BRIAND, J. P., RICHARDS, K. E., JONARD, G., and HIRTH, L. (1976). Physical and functional heterogeneity in TYMV RNA: Evidence for the existence of an independent messenger coding for the coat protein. Nucleic Acids Res. 3, 3043-3061.

KOZAK, M. (1984). Compilation and analysis of sequences upstream from the translational start site in eucaryotic mRNAs. Nucleic Acids Res. 12, 857-872.

KOZAK, M. (1986). Regulation of protein synthesis in virus-infected animal cells. Adv. Virus Res. 31, 229-292.

LOMONOSSOFF, G. P., and SHANKS, M. (1983). The nucleotide sequence of cowpea mosaic virus RNA. EMBO J. 2, 2153-2158.

MARKHAM, R., and SMITH, J. D. (1951). Chromatographic studies of nucleic acids. 4. The nucleic acid of the turnip yellow mosaic virus, including a note on the nucleic acid of the tomato bushy stunt virus. Biochem. J. 49, 401-407.

MELLEMA, J.-R., BÉNICOURT, C., HAENNI, A.-L., NOORT, A., PLEIJ, C. W. A., and BOSCH, L. (1979). Translational studies with turnip yellow mosaic virus RNAs isolated from major and minor virus particles. Virology 96, 38-46.

MEYER, M., HEMMER, O., MAYO, M. A., and FRITSCH, C. (1986). The nucleotide sequence of tomato black ring virus RNA-2. J. Gen. Virol. 67, 1257-1271.

MÖLLER, W., and AMONS, R. (1985). Phosphate-binding sequences in nucleotide-binding proteins. FEBS Lett. 186, 1-7.

MORCH, M.-D., and BÉNICOURT, C. (1980). Post-translational proteolytic cleavage of in vitro synthesized turnip yellow mosaic virus RNA-coded high-molecular-weight proteins. J. Virol. 34, 85-94.

MORCH, M.-D., BOYER, J.-C., and HAENNI, A.-L. (1988). Overlapping open reading frames revealed by complete nucleotide sequencing of turnip yellow mosaic virus genomic RNA. Nucleic Acids Res. 16, 6157-6173.

MORCH, M.-D., DRUGEON, G., and BÉNICOURT, C. (1982). Analysis of the in vitro coding properties of the 3’ region of turnip yellow mosaic virus genomic RNA. Virology 119, 193-198.

MOUCHES, C., CANDRESSE, T., and BOVÉ, J. M. (1984). Turnip yellow mosaic virus RNA-replicase contains host and virus encoded subunits. Virology 134, 78-S 1.

NUSSINOV, R. (1981). The universal dinucleotide asymmetry rules in DNA and the amino acid codon choice. J. Mol. Evol. 17, 237-244.

OSORIO-KEESE, M. E., KEESE, P., and GIBBS, A. (1989). Nucleotide sequence of the genome of eggplant mosaic tymovirus. Virology 172, 547-554.

OU, J.-H., STRAUSS, E. G., and STRAUSS, J. H. (1981). Comparative studies of the 3’-terminal sequences of several alphavirus RNAs. Virology 109, 281-289.

PAUL, H. L., GIBBS, A. J., and WITTMANN-LIEBOLD, B. (1980). The relationships of certain tymoviruses assessed from the amino acid composition of their coat proteins. Intervirology 13, 99-109.

PELHAM, H. R. B. (1978). Leaky UAG termination codon in tobacco mosaic virus RNA. Nature (London) 272, 469-471.

PLEIJ, C. W. A., NEELEMAN, A., VAN VLOTEN-DOTING, L., and BOSCH, L. (1976). Translation of turnip yellow mosaic virus RNA in vitro: A closed and open coat protein cistron. Proc. Natl. Acad. Sci. USA 73, 4437-4441.

REZAIAN, M. A., WILLIAMS, R. H. V., and SYMONS, R. H. (1985). Nucleotide sequence of cucumber mosaic virus RNA 1. Eur. J. Biochem. 150, 331-339.

RICE, C. M., LENCHES, E. M., EDDY, S. R., SHIN, S. J., SHEETS, R. L., and STRAUSS, J. H. (1985). Nucleotide sequence of yellow fever virus: Implications for flavivirus gene expression and evolution. Science 229, 726-733.

SANGER, F., COULSON, A. R., BARRELL, B. G., SMITH, A. J. H., and ROE, B. A. (1980). Cloning in a single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143, 161-178.

SHEPHERD, J. C. W. (1981). Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. Proc. Natl. Acad. Sci. USA 78, 1596-1600.

STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10, 4731-4751.

STRAUSS, E. G., RICE, C. M., and STRAUSS, J. H. (1984). Complete nucleotide sequence of genome RNA of Sindbis virus. Virology 133, 92-110.

SYMONS, R. H., REES, M. W., SHORT, M. N., and MARKHAM, R. (1963). Relationships between the ribonucleic acid and protein of some plant viruses. J. Mol. Biol. 6, 1-15.

TAYLOR, J. M., ILLMENSEE, R., and SUMMERS, J. (1976). Efficient transcription of RNA into DNA by avian sarcoma virus polymerase. Biochim. Biophys. Acta 442, 324-330.

VAN DYKE, T. A., and FLANEGAN, J. B. (1980). Identification of poliovirus polypeptide p63 as a soluble RNA-dependent polymerase. J. Virol. 35, 732-740.

VAN WEZENBEEK, P., VERVER, J., HARMSEN, J., VOS, P., and VAN KAMMEN, A. (1983). Primary structure and gene organization of the middle component RNA of cowpea mosaic virus. EMBO J. 2, 941-946.

WALKER, J. E., SARASTE, M., RUNSWICK, M. J., and GAY, N. J. (1982). Distantly related sequences in the α- and β-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1, 945-951.

WU, S., RINEHART, C. A., and KAESBERG, P. (1987). Sequence and organization of southern bean mosaic virus genomic RNA. Virology 161, 73-80.