Letters
When ELFs are ORFs, but don’t act like them
Jeffrey Lawrence
Pittsburgh Bacteriophage Institutes and Dept of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
When they are very small, open reading frames
(ORFs) are among the most difficult features of a
newly sequenced genome to annotate. Although it has
been suggested that degree of conservation of these
sequences among closely related genomes might assist
this process, there are some classes of ORF that will
defy identification because little or none of the protein
sequence is under selection.
In a commentary on the difficulties of proper annotation of small open reading frames (termed ELFs, for ‘evil little fellows’) in bacterial genomes, Ochman [1] suggested using the differential rates of evolution between nonsynonymous sites (Ka, where alteration changes the encoded amino acid) and synonymous sites (Ks, where alteration leaves the encoded protein unchanged) to assist in identification of genes in bacterial genomes. Because nonsynonymous sites are under stronger selection than synonymous sites, the ratio of Ka/Ks can provide a barometer for the likelihood that a particular region of DNA is under selection as a protein-coding region. Ochman argues that if the Ka/Ks ratio approaches 1.0, it is unlikely that the region encodes a protein, observing that 10 of 14 ORFs in the Escherichia coli genome showing Ka/Ks > 1 are denoted ‘hypothetical’, as are 90% of ORFs <300 bp in length. This method can clearly be of use when the genome sequences of two or more closely related organisms are available for analysis, increasing the reliability of annotation of small ORFs.
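The distinction between synonymous and nonsynonymous changes can be illustrated with a minimal Python sketch. It only counts differing codons between two aligned, in-frame sequences and classifies each by whether the encoded amino acid changes. It does not compute Ka or Ks themselves, which are per-site rates that also require counting the available synonymous and nonsynonymous sites and correcting for multiple substitutions. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, gap-free and of equal length.
    This is a raw count of codon differences, not an estimate of Ka or Ks.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid: nonsynonymous change
    return syn, nonsyn


# Hypothetical 4-codon example: CTG -> CTA keeps Leu, AAA -> AGA is Lys -> Arg.
print(count_codon_changes("ATGCTGAAATGG", "ATGCTAAGATGG"))  # (1, 1)
```

For a short ORF, counts like these are tiny, which is one reason ratio estimates for ELFs are noisy.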
However, two caveats to this approach should be raised. Both stem from the fact that although a low Ka/Ks ratio can be strong evidence that a proposed ORF is protein coding, a high Ka/Ks ratio does not necessarily mean that an ORF is not protein coding. First, the threshold of Ka/Ks useful for separating ORFs from ELFs is not necessarily 1.0, and varies with the divergence between the genomes being compared. For example, the average Ka/Ks for genes shared between Escherichia coli and Salmonella enterica is 0.07, and few genes (especially those >400 bp long) have Ka/Ks > 0.25 (Fig. 1a); therefore, a threshold of 1.0 is far too conservative. To identify which ELFs encode proteins, Ochman used a threshold of two standard deviations (2σ) above the mean (μ), or 0.389. Because the distribution of Ka/Ks is not Gaussian, this threshold is too conservative in this case. However, if one compares two more-closely related genomes (e.g. S. enterica serovars Typhimurium and Typhi) the threshold must be set much higher (Fig. 1b); here, the μ + 2σ threshold appears too liberal. The broader distribution of Ka/Ks in this comparison reflects both the smaller number of substitutions available to infer rates of evolution, and the lack of time available for natural selection to remove deleterious mutations. Therefore, more flexible guidelines must be followed.
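The μ + 2σ rule discussed above can be sketched in a few lines of Python. The first function reproduces the mean-plus-two-standard-deviations threshold; whether a sample or population standard deviation was used in the original analysis is not stated, so the sample form here is an assumption. The second function, an empirical percentile cut-off, is included only as one illustration of a threshold that does not assume a Gaussian distribution. It is not a method proposed by Ochman or in this letter, and the input values are hypothetical.

```python
import statistics


def mean_plus_2sd(ratios):
    """Threshold at two standard deviations above the mean (mu + 2*sigma).

    Uses the sample standard deviation (an assumption made for this sketch).
    """
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios)
    return mu + 2 * sigma


def upper_percentile(ratios, pct=97.5):
    """Empirical cut-off: the value at the given percentile of the data.

    Illustrative alternative only; it makes no Gaussian assumption.
    """
    ordered = sorted(ratios)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


# Hypothetical right-skewed Ka/Ks values: most genes low, a few high outliers.
ratios = [0.02] * 50 + [0.05] * 30 + [0.10] * 15 + [0.40, 0.60, 0.80, 1.00, 1.20]
print("mu + 2 sigma:", round(mean_plus_2sd(ratios), 3))
print("97.5th percentile:", upper_percentile(ratios))
```

With skewed data, a handful of outliers inflates σ and moves the μ + 2σ threshold, so the two cut-offs can differ substantially for the same set of genes.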
Second, this approach is not useful in the identification of many small ORFs that do not evolve as one would expect protein-coding regions to evolve. That is, this method will erroneously dismiss small ORFs that are indeed
Fig. 1. The distribution of Ka/Ks values in comparisons between Salmonella enterica serovar Typhimurium and Escherichia coli (a) and between S. enterica serovars Typhimurium and Typhi (b). Black bars, genes >400 nucleotides; white bars, all genes. Green arrows indicate the Ka/Ks values for the four leader peptides shown (leuL, trpL, thrL and pheL); red arrows, the mean (μ); blue arrows, the threshold suggested by Ochman [1].
[Figure: histograms of number of genes (y-axis) against Ka/Ks ratio (x-axis). Panel (a): μ = 0.073, threshold = 0.389. Panel (b): μ = 0.16, threshold = 0.85.]
Corresponding author: Jeffrey Lawrence ([email protected]).
Update TRENDS in Genetics Vol.19 No.3 March 2003 131
http://tigs.trends.com
translated, but where the entire primary sequence of the protein is not the information that is under selection. In many cases, the number of amino acids under selection could be few or none, which can lead to the appearance of a high Ka/Ks ratio, even though there is selection for expression of the small protein sequence. For example, leader peptides are small proteins that have critical roles in gene expression through translational control [2]. Here, pausing of the ribosome during translation of a leader protein may allow for the formation of an anti-terminator RNA structure, thereby allowing transcription of the downstream genes in an operon. If the ribosome does not pause, a rho-independent terminator will form and the operon is not expressed. Leader peptides in the Escherichia coli genome control the thr, leu, trp and phe operons, and their Ka/Ks ratios are far greater than expected for protein-coding genes (three of them are even greater than Ochman’s conservative μ + 2σ threshold, Fig. 1a), which would seem to indicate that these small ORFs do not encode proteins. In reality, only amino acids responsible for ribosome pausing when a charged tRNA becomes depleted (e.g. two tryptophan codons in the trpL gene) might be under selection for amino acid identity. The additional residues of the leader peptide are not under selection for the function of the protein (although they can participate in mRNA secondary structure formation). Similar leader peptides are found upstream of the tnaA tryptophanase gene [3,4] and chloramphenicol resistance genes [5] and operate by different methods. Similar small upstream ORFs occur in eukaryotic genomes (e.g. the 25-codon arginine-sensing peptide cotranscribed with the Saccharomyces cerevisiae CPA1 gene [6]). But the end result is the same: the short length, poor conservation and often unusual composition of these leader peptides can easily lead one to dismiss the coding potential of their genes.
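A rough worked example may help show why a translated ORF with few constrained residues can look unselected. The simple mixture below is an illustrative assumption, not a model taken from this letter: suppose a peptide has n codons, of which k are under selection with ratio ω_c, while the remaining n − k evolve neutrally (ratio 1), and suppose every codon contributes equally to the gene-wide estimate.

```latex
\[
\left(\frac{K_a}{K_s}\right)_{\text{gene}}
  \approx \frac{k\,\omega_c + (n-k)\cdot 1}{n}
\]
% Hypothetical values: n = 14 codons, k = 2 fully constrained codons (omega_c = 0)
\[
\left(\frac{K_a}{K_s}\right)_{\text{gene}}
  \approx \frac{2 \cdot 0 + 12}{14} \approx 0.86
\]
```

Under these assumed values, even perfect conservation of the two selected codons leaves the gene-wide ratio close to 1, well above a threshold such as 0.389.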
In some cases, the majority of a peptide could be disposable, also leading to high Ka/Ks ratios. Although small disposable portions are commonly seen among leader sequences for secreted proteins, or pro-proteins made in inactive states (e.g. Bacillus pro-σ factors), the most extreme case might be the pqqA peptide, which is overproduced relative to other proteins in the Klebsiella pqq operon and may serve as the substrate for the synthesis of the cofactor PQQ [7,8]. Here, the amino acids glutamate and tyrosine may be cleaved from the peptide backbone and serve as the substrate for cofactor biosynthesis, and the remaining residues may serve as a scaffold. Some transcribed regions may not encode polypeptides at all. The Ka/Ks test is useless in the identification of small functional RNAs, which can have important roles in cellular metabolism, and may be great in number [9]. Some of the regions designated as small ORFs – even those with genetic evidence for their importance – may act through an RNA product.
The manifold ways small protein products affect cellular metabolism make their identification onerous, and sometimes even comparisons with closely related genomes cannot aid in their unambiguous identification. In these cases, hands-on experimentation could be the only route towards gene discovery and the potentially fascinating insights in molecular biology that can result.
References
1 Ochman, H. (2002) Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes. Trends Genet. 18, 335–337
2 Yanofsky, C. (1988) Transcription attenuation. J. Biol. Chem. 263, 609–612
3 Stewart, V. and Yanofsky, C. (1986) Role of leader peptide synthesis in tryptophanase operon expression in Escherichia coli K-12. J. Bacteriol. 167, 383–386
4 Gong, F. and Yanofsky, C. (2002) Analysis of tryptophanase operon expression in vitro: accumulation of TnaC-peptidyl-tRNA in a release factor 2-depleted S-30 extract prevents Rho factor action, simulating induction. J. Biol. Chem. 277, 17095–17100
5 Lovett, P.S. and Rogers, E.J. (1996) Ribosome regulation by the nascent peptide. Microbiol. Rev. 60, 366–385
6 Pierard, A. and Schroter, B. (1978) Structure–function relationships in the arginine pathway carbamoylphosphate synthase of Saccharomyces cerevisiae. J. Bacteriol. 134, 167–176
7 Velterop, J.S. et al. (1995) Synthesis of pyrroloquinoline quinone in vivo and in vitro and detection of an intermediate in the biosynthetic pathway. J. Bacteriol. 177, 5088–5098
8 Meulenberg, J.J.M. et al. (1992) Nucleotide sequence and structure of the Klebsiella pneumoniae pqq operon. Mol. Gen. Genet. 232, 284–294
9 Wassarman, K.M. et al. (2001) Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev. 15, 1637–1651
0168-9525/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)00038-0
cGMP signalling: different ways to create a pathway
Jeroen Roelofs (1), Janet L. Smith (2) and Peter J.M. Van Haastert (3)
(1) Department of Cell Biology, Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115-5730, USA
(2) Boston Biomedical Research Institute, 64 Grove Street, Watertown, Massachusetts 02472-2829, USA
(3) Dept of Biochemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
Recently, a novel cGMP signalling cascade was uncovered in Dictyostelium, a eukaryote that diverged from the lineage leading to metazoa after plants and before yeast. In both Dictyostelium and metazoa, the ancient cAMP-binding (cNB) motif of bacterial CAP has been modified and assembled with other domains into cGMP-target proteins. The domain structures of these cGMP targets, as well as the enzymes responsible for cGMP synthesis and degradation, are entirely different between Dictyostelium
Corresponding author: Peter J.M. Van Haastert