The Primary Structure of Rat Ribosomal Protein L23a



THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1993 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 268, No. 4, Issue of February 5, pp. 2755-2761, 1993. Printed in U.S.A.

The Primary Structure of Rat Ribosomal Protein L23a
THE APPLICATION OF HOMOLOGY SEARCH TO THE IDENTIFICATION OF GENES FOR MAMMALIAN AND YEAST RIBOSOMAL PROTEINS AND A CORRELATION OF RAT AND YEAST RIBOSOMAL PROTEINS*

(Received for publication, August 17, 1992)

Katsuyuki Suzuki and Ira G. Wool‡
From the Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637

The amino acid sequence of the rat 60 S ribosomal subunit protein L23a was deduced from the sequence of nucleotides in a recombinant cDNA. Ribosomal protein L23a has 156 amino acids and a molecular weight of 17,684. Hybridization of the L23a cDNA to digests of nuclear DNA suggests that there are 18-20 copies of the L23a gene. The mRNA for the protein is about 600 nucleotides in length. Rat L23a is related to the yeast Saccharomyces cerevisiae L25, to the archaebacterial Methanococcus vannielii L23, to eubacterial Escherichia coli L23, and to other members of the L23 family of ribosomal proteins. A novel application of a routine homology search procedure was employed to identify a nucleotide sequence that could be used to design an oligodeoxynucleotide probe to screen a library for a cDNA that encodes rat L23a; this same procedure uncovered a number of previously unidentified genes for yeast ribosomal proteins in the GenBank DNA data base. In a correlation of rat and yeast ribosomal proteins 48 pairs are shown to be related.

An attempt is being made to solve the structure of eukaryotic ribosomes. The motivation for this undertaking derives from the belief that knowledge of the structure is essential for a rational, coherent description of the biochemistry of protein synthesis. Solution of the structure of ribosomes requires the sequence of nucleotides and of amino acids in the constituent nucleic acids and proteins. A commitment has been made to acquire these data for mammalian (rat) ribosomes (1). As part of this exercise we report the sequence of amino acids in rat ribosomal protein L23a. A novel application of a common homology search procedure was used to uncover, in a DNA data base, a sequence that encodes a previously unidentified ribosomal protein. The search was made with the amino acid sequence of a yeast ribosomal protein, and the gene for a human ribosomal protein was discovered; the deduced amino acid sequence of the latter served, in turn, as the basis for the design of an oligodeoxynucleotide probe to screen a rat cDNA library.

Rat L23a is related to Saccharomyces cerevisiae L25 and to Escherichia coli L23, ribosomal proteins that bind to a specific site in domain III of 26 S and of 23 S rRNAs (2, 3); these proteins participate in the initiation of the assembly of the large ribosomal subunits. Yeast L25 and E. coli L23 also recognize the homologs of their cognate binding sites in the heterologous rRNAs (4). Finally, the sequence of amino acids at the carboxyl terminus of yeast L25 responsible for specific binding to 26 S rRNA has been identified (5). Ribosomal protein L23a has a similar amino acid sequence that is likely to be important for binding to the homologous site in rat 28 S rRNA.

* This work was supported by National Institutes of Health Grant GM-21769. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X65228.

‡ To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, University of Chicago, 920 E. 58th St., Chicago, IL 60637. Tel.: 312-702-1341; Fax: 312-702-0439.

EXPERIMENTAL PROCEDURES

The recombinant DNA procedures and the methods used to determine the sequences of nucleotides in nucleic acids have been described or cited (6-8). Two separate degenerate oligodeoxynucleotide probes for the cDNA encoding rat ribosomal protein L23a were synthesized; the design of the probes was based on the sequence of amino acids encoded in the DNA downstream from the human leukocyte antigen F (HLA-F) gene (9); this region harbors a human ribosomal protein L23a pseudogene, which was identified by a homology search (the procedure is described in detail later). The probes were mixtures of 384 different oligodeoxynucleotides, each 26 bases in length; probe 1 was complementary to the sequence encoding the amino acids DHHVIIKFP in the human gene; the sequence in rat L23a is DHYAIIKFP (residues 72-80). Probe 2 was complementary to the sequence encoding amino acids KANKHQIKQ (residues 103-111 in rat L23a). The oligodeoxynucleotides were synthesized on a solid support by the methoxyphosphoramidite method using an Applied Biosystems model 380B DNA synthesizer (10). The protein encoded in the open reading frame in the cDNA insert in the plasmid pcD-L23a-1 was characterized by in vitro transcription and translation and electrophoresis of the radioactive product in two dimensions in polyacrylamide gels (11). The computer program TFASTA (12) was used to search the GenBank DNA data base; the computer programs RELATE and ALIGN (13) were used to assess possible evolutionary relationships between rat L23a and other ribosomal proteins. The scoring matrix was Dayhoff's MDM '78 (13).

RESULTS AND DISCUSSION

TABLE I
A correlation of homologous rat and yeast ribosomal proteins
The correlation was compiled with the assistance of A. Gluck and Y. L. Chan. The amino acid sequences are in GenBank or are available on request from the authors. The letter N designates an NH2-terminal amino acid sequence. Note: several of the correlations are from the present study.

Rat: S2; S3; S4; S5/S5a; S6; S7; S8; S9; S10; S11; S12; S13; S14; S15; S15a-N; S16; S17; S18; S19; S20; S21; S24; S25; S26; S27; S28; S27a; S29; S30

Yeast: S4; YS3-N; S7; S10; YS9-N; S13; RP41-N; YS15-N; RP59; S21; S24; RP61R-N; RP51; S16a; YS25; S31; RP50-N; S37; S33; YS29; RP14-N; RP30-N; RP37

Rat: L3; L4-N; L5; L7; L7a; L8; L9; L11; L12; L15-N; L17; L18; L18a; L19; L21; L23; L23a; L26; L27; L27a; L28; L30; L31; L32; L34; L35; L35a; L36; L36a; L37; L37a; L38; L39; P0; P1; P2

Yeast: L3; L2; YL3; YL8; L4; SP-K37, (YL6-N); YL11; L16; L15; YL10-N; YL17-N; RP28; SP-L17-N; YL14-N; L1la; L25; YL33-N; L29; L32; L34; YL37; YL39-N; L41; YL35; L46; A0; A1, L44'; A2, L45; YL31-N; YL43-N; L17; L30; L47; UBI1; RP23-N

The Use of Homology Search to Identify Genes That Encode Mammalian Ribosomal Proteins-It has been apparent for some time that ribosomal proteins from different eukaryotic species derive from common ancestral genes (1). With the accretion of amino acid sequences, the earlier presumption has become a conviction that is perhaps exemplified best by a comparison of individual rat and yeast ribosomal proteins. The data for these two species are particularly valuable because the set is so large; there are 62 complete and three partial amino acid sequences for rat ribosomal proteins and 41 complete and 17 partial sequences for yeast (Table I). Even though the two species are distant, 48 pairs can be correlated (see Refs. 1, 14, and Table I). All of the pairwise comparisons give highly significant scores when evaluated with the RELATE and ALIGN programs (13); the extent of the amino acid identities in the alignments of the yeast and rat ribosomal protein homologs ranges from 40 to 80% for the comparisons where both sequences are complete. The data are sufficient to provide confidence that most if not all of the proteins in the ribosomes of the two species are homologous and that it will be possible to establish a complete protein-to-protein correlation. There were, however, at the time this study was begun, nine yeast ribosomal proteins whose rat homologs had not been identified. To find the genes for the unidentified mammalian ribosomal proteins, the amino acid sequences of the nine yeast ribosomal proteins were used to search the GenBank DNA data base using the computer program TFASTA (12). TFASTA translates, into amino acid sequences, the six possible reading frames of the nucleotides in double-stranded DNAs; the data for the homology search were the amino acid sequences generated by TFASTA.
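The six-frame translation step described above can be illustrated with a minimal Python sketch. It is not TFASTA itself: it only produces the six conceptual translations and does not score them against a query protein. The function names are hypothetical, and the 27-nucleotide input is the start of the sequence downstream of the HLA-F gene as it is transcribed in Fig. 1.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one reading frame (offset 0, 1, or 2); unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate the three forward and three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {f"+{f + 1}": translate(dna, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames


frames = six_frame_translations("AAGCTTGACCACCATGTTATCATCAAG")
print(frames["+1"])  # KLDHHVIIK
```

Frame +1 of this fragment contains DHHVIIK, the start of the peptide on which probe 1 was based. A search program would then compare each translated frame with the query protein, which is the part this sketch leaves out.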

A significant relationship was found between the amino acid sequence in a portion of the carboxyl-terminal half of S. cerevisiae L25 (15), one of the nine yeast ribosomal proteins for which no rat homolog had been identified, and that encoded in the DNA immediately downstream from the human leukocyte antigen F (HLA-F) gene (9). Despite a shift in the reading frame and the presence of a termination codon, there are 47 identities and 13 conservative changes in a sequence of 86 consecutive residues (Fig. 1). Thus, it seemed likely that the region 3' to the HLA-F gene harbored a human ribosomal protein pseudogene related to yeast L25, and we undertook to use the amino acid sequence encoded in the DNA to design probes to isolate the rat homolog and to determine its identity.

The Sequence of Nucleotides in a Recombinant cDNA Encoding Rat Ribosomal Protein L23a-A random selection of 20,000 colonies from two cDNA libraries of 20,000 and 30,000 independent transformants that had been constructed from regenerating rat liver poly(A)+ mRNA (6) was screened for clones that hybridized to two oligodeoxynucleotide probes for the rat ribosomal protein cDNA related to yeast L25 (15). Three clones gave a positive signal on hybridization with the probes. The DNA from the plasmids of the three transformants was isolated and digested with restriction endonucleases. The sequences of nucleotides were determined in both strands of the cDNA inserts in the plasmids, designated pL23a-1, pL23a-3, and pL23a-4, and in overlapping sequences for each restriction site. The cDNA insert in pL23a-3, apart from a long poly(A) tail, has 517 nucleotides that include a single open reading frame of 470 and a 3'-noncoding region of 47. The open reading frame encodes 155 amino acids but lacks an initiation codon and a 5'-noncoding sequence. The cDNA inserts in pL23a-1 and pL23a-4 are shorter than pL23a-3 and also lack the 5'-terminal region.

The cDNA insert in pL23a-3 was made radioactive and used to screen a third rat cDNA library (kindly provided by Dr. M. Brownstein of the National Institutes of Health). Three colonies gave a positive hybridization signal with this probe. The sequence of nucleotides in the cDNA insert in the longest, pcD-L23a-1, was determined. The cDNA, apart from a long poly(A) sequence, has 533 nucleotides: a 5'-noncoding sequence of 15, a single open reading frame of 471, and a 3'-noncoding region of 47. The overlapping sequences in pL23a-3 and pcD-L23a-1 are identical, and, hence, they are likely to be derived from the same gene. The open reading frame in pcD-L23a-1 starts at an ATG codon at a position that we designate +1 and ends with a termination codon (TAA) at position 469; it encodes 156 amino acids (Fig. 2). The initiation codon occurs in a context, AAGATGG, which does not depart significantly from the optimum ACCATGG (16). There is a hexamer, AATATA, at positions 496-501, 17 nucleotides upstream of the start of the poly(A) stretch. This is an infrequent variant from the canonical AATAAA, which, nonetheless, directs posttranscriptional cleavage-polyadenylation of the 3' end of the precursor of the mRNA, albeit perhaps with decreased efficiency (17). The first nine nucleotides in the 5'-noncoding sequence are pyrimidines, i.e. CCTTTTTTC (Fig. 2). Pyrimidine sequences are present at the 5' end of most if not all eukaryotic ribosomal protein mRNAs (1) and presumably play a role in the regulation of their translation (18).
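The coordinates quoted for the pcD-L23a-1 insert can be checked against one another with a few lines of arithmetic. The sketch below uses only numbers stated in the text and the 15-nucleotide 5'-noncoding sequence shown in Fig. 2; the variable and function names are illustrative, and the numbering assumes, as the text does, that the A of the ATG codon is position +1.

```python
# Segment lengths of the pcD-L23a-1 insert (poly(A) excluded), as given in the text.
FIVE_PRIME_NONCODING = 15
OPEN_READING_FRAME = 471      # includes the TAA termination codon
THREE_PRIME_NONCODING = 47

total_length = FIVE_PRIME_NONCODING + OPEN_READING_FRAME + THREE_PRIME_NONCODING

# With the A of ATG at +1, the reading frame occupies positions 1-471.
stop_codon_start = OPEN_READING_FRAME - 2            # first base of TAA
encoded_amino_acids = (OPEN_READING_FRAME - 3) // 3  # stop codon is not translated

# The last nucleotide before the poly(A) stretch, in the same numbering.
last_nucleotide = OPEN_READING_FRAME + THREE_PRIME_NONCODING

# The AATATA hexamer ends at position 501.
HEXAMER_END = 501
spacer_to_poly_a = last_nucleotide - HEXAMER_END


def leading_pyrimidines(seq):
    """Count the uninterrupted run of C/T at the 5' end of a sequence."""
    count = 0
    for base in seq.upper():
        if base not in "CT":
            break
        count += 1
    return count


leader = "CCTTTTTTCGCCAAG"  # 5'-noncoding sequence from Fig. 2

print(total_length, stop_codon_start, encoded_amino_acids)
print(spacer_to_poly_a, leading_pyrimidines(leader))
```

The figures agree with the text: 533 nucleotides in all, a termination codon beginning at position 469, 156 encoded residues, 17 nucleotides between the hexamer and the poly(A) stretch, and a run of nine pyrimidines at the 5' end.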

FIG. 1. The sequence of amino acids encoded in the nucleotides in the 3'-flanking region of the human HLA-F gene. The positions of the nucleotides in the 3' region of the HLA-F gene (9) are given above the residues. The amino acids that can be derived from the sequence of nucleotides in the three reading frames (a, b, and c) that are identical to residues in the carboxyl-terminal portion of S. cerevisiae ribosomal protein L25 (15) are boxed; the conservative changes are underlined. The amino acid sequence in the relevant region of L25 is given.

Nucleotides -15 to -1: CCTTTTTTCGCCAAG

Nucleotides 1-90: ATGGCGCCGAAAGCGAAGAAGGAAGCTCCTGCTCCTGCCCCTCCC~GCCG~GCC~GCGAAGGCCTTGAAAGCTAAG~~AGTGCTG~
Amino acids 1-30: MAPKAKKEAP APPKAEAKAK ALKAKKAVLK

Nucleotides 91-180: GGTGTCCACAGTCACAAAAAGAAGAAGATCCGATCCGAACGTCACCCACTTTCC~GGCCCAAGACCCTGCGGCTCC~AGGCAGCC~TA~
Amino acids 31-60: GVHSHKKKKI RTSPTFRRPK TLRLRRQPKY

Nucleotides 181-270: CCTCGAAAGAGTGCACCCAGGAGAAACAAGCTTGACCACTAT~TATCATC~TTCCCACTGACCACCGAGTCAGCTATGAAG~TA
Amino acids 61-90: PRKSAPRRNK LDHYAIIKFP LTTESAMKKI

Nucleotides 271-360: GAGGACAACAACACGCTTGTGTTCATTGT~ATGTTAA~C~CAA~ACCAGATC~CA~CGTG~CTCTATGATATAGAT
Amino acids 91-120: EDNNTLVFIV DVKANKHQIK QAVKKLYDID

Nucleotides 361-450: GTGGCCAAAGTCAATACTCTGATACGGCCTGACGGAC~AGAGAAGAA~ATATGTTC~TT~TCCTGATTATGAT~TCTAGATGTT~C
Amino acids 121-150: VAKVNTLIRP DGEKKAYVRL APDYDALDVA

Nucleotides 451 to the poly(A) stretch: AACAAGATTGCGATCATCTATCCA~T~TTAATTCT~TATATACTTTTTTTCCACCAT(A)n
Amino acids 151-156: NKIGII*

FIG. 2. The sequence of nucleotides in the cDNA insert in plasmid pcD-L23a-1 and the amino acid sequence encoded in the open reading frame. The positions of the nucleotides in the cDNA are given above the residues; the positions of amino acids in protein L23a are below the residues.

The Primary Structure of Rat Ribosomal Protein L23a-An attempt to identify the rat ribosomal protein encoded in the open reading frame in pcD-L23a-1 was made first by transcription of the cDNA, translation of the RNA transcript in a nuclease-treated reticulocyte lysate containing [35S]methionine, and inspection of the migration of the radioactive product in two-dimensional polyacrylamide gels containing urea (Fig. 3). A definitive identification could not be made in this way because L23a is not easily distinguished from L21 and L23 in two-dimensional gels. However, the amino acid sequences of rat L23 (19) and of L21 (20) had already been determined; since they are different from that deduced from the sequence of nucleotides in pcD-L23a-1, the cDNA must encode L23a.

The molecular weight of rat ribosomal protein L23a, calculated from the sequence of amino acids deduced from pcD-L23a-1, is 17,684. We do not know whether the NH2-terminal methionine encoded in the L23a mRNA is removed after translation. However, the residue next to the initial methionyl in L23a is alanyl, which has been reported (21) to favor NH2-terminal processing. Protein L23a has a large excess of basic residues (11 arginyl, 30 lysyl, and 4 histidyl) over acidic ones (9 aspartyl and 5 glutamyl). The basic residues tend to be clustered: for example, 7 of 9 residues at positions 33-41 and 7 of 13 at positions 47-59 (Fig. 2). There are also a number of hydrophilic regions: for example, 29 of 60 residues at positions 14-73 are charged. L23a lacks cysteine and tryptophan.
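The composition figures in this paragraph can be recomputed from the deduced sequence. The Python sketch below is illustrative only: the sequence is the amino acid line of Fig. 2, with residue 6 read as lysine so that it agrees with its AAG codon and with the stated composition, and the helper name `count_in_window` is hypothetical. Histidine is counted as charged, which is an assumption needed to reach the 29 of 60 quoted for positions 14-73.

```python
from collections import Counter

# Deduced rat L23a sequence (156 residues), from the amino acid line of Fig. 2.
L23A = (
    "MAPKAKKEAPAPPKAEAKAKALKAKKAVLK"
    "GVHSHKKKKIRTSPTFRRPKTLRLRRQPKY"
    "PRKSAPRRNKLDHYAIIKFPLTTESAMKKI"
    "EDNNTLVFIVDVKANKHQIKQAVKKLYDID"
    "VAKVNTLIRPDGEKKAYVRLAPDYDALDVA"
    "NKIGII"
)

BASIC = set("RKH")
ACIDIC = set("DE")


def count_in_window(seq, first, last, residues):
    """Count residues of a given class between 1-based positions first..last."""
    window = seq[first - 1:last]
    return sum(aa in residues for aa in window), len(window)


composition = Counter(L23A)
basic_total = sum(composition[aa] for aa in BASIC)
acidic_total = sum(composition[aa] for aa in ACIDIC)

print(len(L23A), basic_total, acidic_total)
print(count_in_window(L23A, 33, 41, BASIC))
print(count_in_window(L23A, 47, 59, BASIC))
print(count_in_window(L23A, 14, 73, BASIC | ACIDIC))
```

With this sequence the counts match the text: 45 basic against 14 acidic residues, 7 of 9 and 7 of 13 basic residues in the two clusters, and no cysteine or tryptophan.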

FIG. 3. Electrophoresis in polyacrylamide gels of the product of the translation of the pcD-L23a-1 transcript. A nuclease-treated rabbit reticulocyte lysate (50 µl) was incubated with [35S]methionine and the RNA transcript (1 µg) from the cDNA insert in pcD-L23a-1. A sample (15 µl) of the lysate containing the product of the translation reaction was extracted with 67% acetic acid, and the protein was precipitated with 90% acetone and then supplemented with 80 µg of all the proteins of the 60 S ribosomal subunit. Electrophoresis was in two dimensions in a polyacrylamide gel containing urea. In a, the gel was stained with Coomassie Brilliant Blue; in b, the radioautograph of the same gel was visualized by fluorography. The less prominent spots in b are aggregates of L23a.

The Number of Copies of the L23a Gene-The cDNA insert in pL23a-3 was made radioactive and used to probe digests of rat liver DNA made with the restriction endonucleases BamHI, EcoRI, or HindIII (7). The number of hybridization bands suggests that there are 18-20 copies of the L23a gene (data not shown). Many other mammalian ribosomal protein genes have been found to be present in multiple copies (cf. Ref. 1 for references and discussion). However, in no instance has it been shown that more than one of the genes is functional; the presumption is that the other copies are retroposon pseudogenes. The gene located in the 3'-flanking region of the HLA-F gene appears to be a human ribosomal protein L23a pseudogene. Although the nucleotide sequence of an active human L23a gene is not available for comparison, it is reasonable to assume that the pseudogene has accumulated nucleotide changes that have produced a number of amino acid substitutions, a termination codon, and a shift in the original reading frame (Fig. 1).

The Size of the mRNA Encoding Rat Ribosomal Protein L23a-To determine the size of the mRNA coding for L23a, poly(A)+ mRNA from rat liver was separated by electrophoresis and screened for hybridization bands using radioactive pL23a-3 cDNA. One distinct band of about 600 nucleotides was detected (data not shown).

Comparison of the Sequence of Amino Acids in Rat L23a with Ribosomal Proteins from Other Species-The sequence of amino acids in rat ribosomal protein L23a was compared, using the computer programs RELATE and ALIGN (13), to the sequences of amino acids in more than 1,000 other ribosomal proteins contained in a library that we have compiled. The comparison that yielded the highest RELATE score (32.3 S.D. units) was with S. cerevisiae L25 (15). In an alignment there are 88 identities in 142 possible matches (the ALIGN score is 55.0); the two proteins are likely then to have derived from a common ancestral gene. Rat L23a is also related to Candida utilis L25 (15); the RELATE score is 29.4 and the ALIGN score is 56.0 with 89 identities in 141 possible matches. In addition, a group of archaebacterial and eubacterial members of the L23 family are related to rat L23a. Examples from the family with the RELATE scores in S.D. units include: Methanococcus vannielii L23 (22), 18.2; Bacillus stearothermophilus L23 (23), 9.9; and E. coli L23 (24), 6.8.
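The identity counts quoted above are easier to compare as percentages. The following Python sketch shows how identities are tallied in a gapped pairwise alignment and converts the counts from the text. It does not reproduce the RELATE or ALIGN scores, which depend on a scoring matrix and on randomized comparisons; the function names are hypothetical, and "-" as the gap character is an assumption.

```python
def count_identities(aligned_a, aligned_b, gap="-"):
    """Return (identities, possible matches) for two aligned, equal-length strings.

    Columns in which either sequence has a gap are not counted as possible matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = 0
    possible = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        possible += 1
        if a == b:
            identities += 1
    return identities, possible


def percent_identity(identities, possible):
    """Express an identity count as a percentage of the possible matches."""
    return 100.0 * identities / possible


# Rat L23a residues 72-80 against the peptide encoded in the human pseudogene.
print(count_identities("DHYAIIKFP", "DHHVIIKFP"))  # (7, 9)

# Counts reported in the text for the two yeast L25 proteins.
print(round(percent_identity(88, 142), 1))  # S. cerevisiae L25: 62.0
print(round(percent_identity(89, 141), 1))  # C. utilis L25: 63.1
```

Both values fall inside the 40 to 80% range reported for the rat and yeast pairs in Table I.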

The sequence of amino acids in L23a was searched for internal duplications but none were found.

The Use of Homology Search to Identify Yeast Genes That Encode Ribosomal Proteins-The yeast homologs of 48 rat ribosomal proteins are known (Table I); nonetheless, there are 17 rat ribosomal proteins whose amino acid sequences have been determined but for which a related yeast protein has not been identified. Since there can be confidence that there are yeast homologs of these 17 rat ribosomal proteins, an attempt was made to locate the genes for the missing yeast ribosomal proteins by searching the GenBank DNA data base. The search was predicated on the possibility that the sequences of nucleotides in genes for some yeast ribosomal proteins had been determined but not recognized (aspects of this procedure were suggested to us by Dr. B. Baum). There are, in addition, 12 rat ribosomal proteins for which a related yeast protein has been identified from a partial amino acid sequence only. On the chance that the genes for these yeast proteins might be identified also, the amino acid sequences of the 12 related rat ribosomal proteins were used as well in the search. Several candidate sequences were identified in this way (cf. Table II for examples), and we describe in detail the identification of two of them.

TABLE II
Candidate yeast ribosomal protein genes found in the GenBank DNA data base with a TFASTA homology search

| Rat ribosomal proteinsᵃ | Yeast ribosomal proteins | Accession no. | Yeast sequencesᵇ | Location | Identitiesᶜ |
|---|---|---|---|---|---|
| S8 (25) | YS9 | J05637 | 70-kDa heat shock protein gene (30) | 5' region | 77/131 |
| S13 (26) | YS15 | M58330 | C. maltosa autonomously replicating sequenceᵈ | Unknown | 114/144 |
| S26 (27) | | M72716 | CDC55 gene (31) | 5' region | 32/47 |
| L18a (28) | SP-L17 | None | ZRC1 gene (32) | 3' region | 50/92 |
| L19 (29) | YL14 | J03724 | Mitochondrial C1-tetrahydrofolate synthase gene (33) | 3' region | 81/150 |
| L21 (20) | | M21696 | Phosphoglucoisomerase 1 gene (34) | 5' region | 24/37 |

ᵃ The rat ribosomal proteins that were used for the search; references are in parentheses.
ᵇ The species is S. cerevisiae unless otherwise indicated; references are in parentheses.
ᶜ Number of identical residues in an alignment of the sequence of amino acids in the rat ribosomal protein with that encoded in the yeast gene.
ᵈ The sequence is not yet published.

FIG. 4. The sequence of amino acids encoded in the sequence of nucleotides in the 3'-flanking region of S. cerevisiae mitochondrial C1-tetrahydrofolate synthase gene that are related to residues in rat ribosomal protein L19. The position of the nucleotides in the 3' region of the mitochondrial C1-tetrahydrofolate synthase gene (33) are given above the residues. The amino acids that can be derived from the sequence of nucleotides in the three reading frames (a, b, and c) that are identical to residues in the carboxyl-terminal region of rat ribosomal protein L19 (29) are boxed; the conservative changes are underlined.

An amino acid sequence was deduced from a nucleotide sequence in the DNA downstream from the S. cerevisiae mitochondrial C1-tetrahydrofolate synthase gene (33) that has significant identity with residues in the carboxyl-terminal part of rat ribosomal protein L19 (29). Although there is a shift in the reading frame, there are 81 identities and 16 conservative changes in a sequence of 150 residues (Fig. 4). Since yeasts, in contrast to mammals, have few if any pseudogenes, the frame shift might reflect a sequencing error. The codon usage for the amino acid sequence is biased (data not shown) and similar to that for frequently expressed S. cerevisiae genes (35), which increases the likelihood that it encodes a yeast ribosomal protein. From a comparison of the entire amino acid sequence of rat L19 and the residues in an NH2-terminal fragment of YL14, it was surmised that the two ribosomal proteins are related (1, 14, 36). We conclude then that the gene in the 3'-flanking region of the mitochondrial C1-tetrahydrofolate synthase gene encodes yeast ribosomal protein YL14.


An amino acid sequence that can be derived from a nucleotide sequence in a portion of Candida maltosa autonomously replicating DNA¹ has significant identity with the carboxyl-terminal half of rat ribosomal protein S13 (26). The amino acid sequence encoded in the C. maltosa DNA also contains a frameshift; however, in an alignment with rat S13 there are 114 identities and 12 conservative amino acid changes in a sequence of 144 residues (Fig. 5). The amino acid sequence of rat S13 had earlier been related to residues in an NH2-terminal fragment of S. cerevisiae ribosomal protein YS15 (26, 37); this is consistent with the finding that the sequence of the NH2-terminal 34 amino acids of C. maltosa YS15 is very similar to the 34 residues at positions 7-40 of S. cerevisiae YS15 (37). The putative coding sequence for C. maltosa YS15 is preceded by the yeast intron consensus sequence TACTAACA...TAG (38) (Fig. 5). Yeast ribosomal protein genes commonly have a single intron near the 5' end (39). Thus it is very likely that the gene in the flanking region of C. maltosa autonomously replicating DNA encodes that organism's ribosomal protein YS15.
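The intron consensus TACTAACA...TAG can be located with a simple pattern search. The sketch below is one way to do so, under two assumptions: the input is the first row of Fig. 5 as transcribed from the printed figure, and the intron is taken to end at the first TAG downstream of the TACTAACA box. That second rule is a simplification for illustration and is not stated in the paper.

```python
import re

# TACTAACA box, then the shortest run of bases up to the next TAG.
INTRON_3PRIME = re.compile(r"TACTAACA[ACGT]*?TAG")


def find_intron_end(dna):
    """Return (start, end, matched text) as 0-based offsets, or None if absent."""
    match = INTRON_3PRIME.search(dna.upper())
    if match is None:
        return None
    return match.start(), match.end(), match.group(0)


# First row of Fig. 5 as printed (transcription is an assumption of this sketch).
ROW_748 = "TGGGATAATTAATACTAACATTTCTTCTTCTTCTTATAGGGGTAAAGGTATTTCTTCCTC"
hit = find_intron_end(ROW_748)
print(hit)
```

In this row the box begins at offset 12 and the first downstream TAG ends at offset 39, so the sequence after that point would be the start of the putative coding region.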

The determination of the sequence of amino acids in rat L23a is a contribution to a data set which, it is hoped, will eventually include the structure of all the proteins in the ribosomes of this mammalian species. The primary purpose for the accumulation of these data is their use in arriving at a

¹ K. Sasnauskas, unpublished observations.


FIG. 5. The sequence of amino acids encoded in the sequence of nucleotides in the flanking region of the autonomously replicating DNA of C. maltosa that are related to residues in rat ribosomal protein S13. The positions of the nucleotides in the GenBank locus YSFARSAA are given above the residues. The amino acids that can be derived from the sequence of nucleotides in the three possible reading frames (a, b, and c) that are identical to residues in the carboxyl-terminal region of rat ribosomal protein S13 (26) are boxed; conservative changes are underlined. The yeast intron consensus sequence is underlined, and the end of the putative intron is indicated by the vertical of the bent arrow.

[Alignment panel of Fig. 5: eight 60-nucleotide rows labeled 748 to 328, each with translations in reading frames a, b, and c; the boxing and underlining are not recoverable from the scan.]

solution of the structure of the organelle. However, the information may also help in understanding the evolution of ribosomes, in unraveling the function of the proteins, in uncovering the amino acid sequences that direct the proteins to the nucleolus for assembly on nascent rRNA, and in defining the rules that govern the interaction of the proteins and the rRNAs. Indeed, this study provides an example of how the primary structure of one ribosomal protein bears on another and hence helps in this undertaking. Yeast ribosomal protein L25 and E. coli L23 bind to a conserved site in domain III of 26 S and 23 S rRNAs (2, 3). Moreover, yeast L25 and E. coli L23 recognize cognate binding sites on the heterologous rRNAs (4). The structural elements in yeast L25 for binding to 26 S rRNA are in the region bounded by residues 62-126 (5). Within this sequence is found the highly conserved motif KKAYVRL (it occurs in E. coli L23, yeast L25, and rat L23a); the motif, and most particularly the terminal leucyl residue, appears to be critical for association with rRNA (5). The related region at the carboxyl terminus of rat L23a is very similar to the amino acid sequence in yeast L25; moreover, rat L23a has the conserved motif KKAYVRL (positions 134-140); there must be a strong presumption then that it is involved in binding to the homologous site in domain III of rat 28 S rRNA.
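Locating a conserved motif such as KKAYVRL in a protein sequence, and reporting its position in the 1-based residue numbering used above, takes only a few lines. In the sketch below the function name is illustrative and the test sequence is a synthetic placeholder of 166 residues built so that the motif falls at positions 134-140; it is not the sequence of rat L23a.

```python
def find_motif(protein, motif):
    """Return a list of (first, last) 1-based residue positions for each occurrence."""
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = protein.find(motif, start + 1)
    return hits


MOTIF = "KKAYVRL"
# Synthetic placeholder only: 133 alanines, the motif, then 26 alanines.
placeholder = "A" * 133 + MOTIF + "A" * 26
hits = find_motif(placeholder, MOTIF)
print(len(placeholder), hits)  # 166 [(134, 140)]
```

Running the same function over the deduced sequences of E. coli L23, yeast L25, and rat L23a would give the motif positions in each protein.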

Acknowledgments-We are grateful to Yuen-Ling Chan and Anton Gluck for advice and for fruitful discussions, to Joe Olvera for technical assistance, and to Arlene Timosciek for aid in the preparation of the manuscript.

REFERENCES

1. Wool, I. G., Endo, Y., Chan, Y. L., and Gluck, A. (1990) in The Ribosome: Structure, Function, and Evolution (Hill, W. E., Dahlberg, A., Garrett, R. A., Moore, P. B., Schlessinger, D., and Warner, J. R., eds) pp. 203-214, American Society for Microbiology, Washington, D. C.
2. El-Baradi, T. T. A. L., Raué, H. A., de Regt, V. C. H. F., Verbree, E. C., and Planta, R. J. (1985) EMBO J. 4, 2101-2107
3. Vester, B., and Garrett, R. A. (1984) J. Mol. Biol. 179, 431-452
4. El-Baradi, T. T. A. L., de Regt, V. C. H. F., Planta, R. J., Nierhaus, K. H., and Raué, H. A. (1987) Biochimie 69, 939-948
5. Rutgers, C. A., Rientjes, J. M. J., van't Riet, J., and Raué, H. A. (1991) J. Mol. Biol. 218, 375-385
6. Chan, Y. L., Lin, A., McNally, J., and Wool, I. G. (1987) J. Biol. Chem. 262, 12879-12886
7. Chan, Y. L., and Wool, I. G. (1988) J. Biol. Chem. 263, 2891-2896
8. Gluck, A., Chan, Y. L., Lin, A., and Wool, I. G. (1989) Eur. J. Biochem. 182, 105-109
9. Geraghty, D. E., Wei, X., Orr, H. T., and Koller, B. H. (1990) J. Exp. Med. 171, 1-18
10. Beaucage, S. L., and Caruthers, M. H. (1981) Tetrahedron Lett. 22, 1859-1862
11. Chan, Y. L., Devi, K. R. G., Olvera, J., and Wool, I. G. (1990) Arch. Biochem. Biophys. 283, 546-550
12. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
13. Dayhoff, M. O. (1978) in Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed) Vol. 5, Suppl. 3, pp. 1-8, National Biomedical Research Foundation, Washington, D. C.
14. Wool, I. G., Chan, Y. L., Gluck, A., and Suzuki, K. (1991) Biochimie 73, 861-870
15. Woudt, L. P., Mager, W. H., Beek, J. G., Wassenaar, G. M., and Planta, R. J. (1987) Curr. Genet. 12, 193-198
16. Kozak, M. (1986) Cell 44, 283-292
17. Sheets, M. D., Ogg, S. C., and Wickens, M. P. (1990) Nucleic Acids Res. 18, 5799-5805
18. Levy, S., Avni, D., Hariharan, N., Perry, R. P., and Meyuhas, O. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 3319-3323
19. Chan, Y. L., Paz, V., and Wool, I. G. (1991) Biochem. Biophys. Res. Commun. 178, 1153-1159
20. Devi, K. R. G., Chan, Y. L., and Wool, I. G. (1989) Biochem. Biophys. Res. Commun. 162, 364-370
21. Flinta, C., Persson, B., Jörnvall, H., and von Heijne, G. (1986) Eur. J. Biochem. 154, 193-196
22. Köpke, A. K. E., and Wittmann-Liebold, B. (1988) FEBS Lett. 239, 313-318
23. Kimura, M., Kimura, J., and Ashman, K. (1985) Eur. J. Biochem. 150, 491-497
24. Zurawski, G., and Zurawski, S. M. (1985) Nucleic Acids Res. 13, 4521-4526
25. Chan, Y. L., Lin, A., Paz, V., and Wool, I. G. (1987) Nucleic Acids Res. 15, 9451-9459
26. Suzuki, K., Olvera, J., and Wool, I. G. (1990) Biochem. Biophys. Res. Commun. 171, 519-524
27. Kuwano, Y., Nakanishi, O., Nabeshima, Y., Tanaka, T., and Ogata, K. (1985) J. Biochem. (Tokyo) 97, 983-992
28. Aoyama, Y., Chan, Y. L., Meyuhas, O., and Wool, I. G. (1989) FEBS Lett. 247, 242-246
29. Chan, Y. L., Lin, A., McNally, J., Peleg, D., Meyuhas, O., and Wool, I. G. (1987) J. Biol. Chem. 262, 1111-1115
30. Boorstein, W. R., and Craig, E. A. (1990) J. Biol. Chem. 265, 18912-18921
31. Healy, A. M., Zolnierowicz, S., Stapleton, A. E., Goebl, M., DePaoli-Roach, A. A., and Pringle, J. R. (1991) Mol. Cell. Biol. 11, 5767-5780
32. Kamizono, A., Nishizawa, M., Teranishi, Y., Murata, K., and Kimura, A. (1989) Mol. Gen. Genet. 219, 161-167
33. Shannon, K. W., and Rabinowitz, J. C. (1988) J. Biol. Chem. 263, 7717-7725
34. Tekamp-Olson, P., Najarian, R., and Burke, R. L. (1988) Gene (Amst.) 73, 153-161
35. Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H., and Wright, F. (1988) Nucleic Acids Res. 16, 8207-8211
36. Otaka, E., Higo, K., and Itoh, T. (1983) Mol. Gen. Genet. 191, 519-524
37. Otaka, E., Higo, K., and Osawa, S. (1982) Biochemistry 21, 4545-4550
38. Woolford, J. L. (1989) Yeast 5, 439-457
39. Fink, G. R. (1987) Cell 49, 5-6