INFECTION AND IMMUNITY, 0019-9567/01/$04.00+0 DOI: 10.1128/IAI.69.9.5905–5907.2001

Sept. 2001, p. 5905–5907 Vol. 69, No. 9

Copyright © 2001, American Society for Microbiology. All Rights Reserved.

Proteomics Reveals Open Reading Frames in Mycobacterium tuberculosis H37Rv Not Predicted by Genomics

PETER R. JUNGBLUT,1* EVA-CHRISTINA MÜLLER,2 JENS MATTOW,3

AND STEFAN H. E. KAUFMANN3

Core Facility for Protein Analysis1 and Department of Immunology,3 Max Planck Institute for Infection Biology, and Protein Chemistry, Max Delbrück Center,2 Berlin, Germany

Received 23 February 2001/Returned for modification 27 March 2001/Accepted 25 May 2001

Genomics revealed the sequence of 3,924 genes of the H37Rv strain of Mycobacterium tuberculosis. Proteomics complements genomics by showing which genes are actually expressed. Here we show the expression of six genes not predicted by genomics, as demonstrated by two-dimensional electrophoresis combined with matrix-assisted laser desorption ionization and nano-electrospray mass spectrometry.

Each year, tuberculosis causes eight million new cases and two million deaths (5). The World Health Organization (WHO) has therefore declared tuberculosis a global emergency, and new strategies for its prevention and therapy are urgently required. Six years after the first publication of a complete bacterial genome (3), the complete genomes of 38 microorganisms have been sequenced (http://www-fp.mcs.anl.gov/~gaasterland/genomes.html and http://www.tigr.org/tdb/mdb/mdbcomplete.html), including Mycobacterium tuberculosis strain H37Rv (1). Sequencing of the genome of a clinical isolate of M. tuberculosis, CDC1551, is also nearly complete (http://www.tigr.org/tdb/CMR/gmt/htmls/SplashPage.html).

The proteome reflects the functional status of a cell in response to environmental stimuli and thus serves as a valuable complement to genomics. In searching for novel strategies for immune intervention, we have initiated a systematic proteome investigation by comparing the protein compositions of virulent M. tuberculosis strains with those of attenuated vaccine strains (4). Approximately 1,800 protein spots were separated by two-dimensional electrophoresis (2-DE), and, despite the similarity of the overall patterns, distinct and reproducible differences were detected between the strains. Only +/− (presence/absence) variants that occurred in all gels of independent preparations of six virulent and six attenuated strains were accepted. A total of 263 proteins were identified by matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS), and a bioinformatics platform was constructed to store our data and link them by hyperlinks to the genomics data (10) (http://www.mpiib-berlin.mpg.de/2D-PAGE/). Using this proteome approach, namely, a combination of 2-DE (6) and MS, we detected six genes not previously predicted in the genome of M. tuberculosis H37Rv. Our data demonstrate the value of proteomics in identifying gene products that the genomics approach missed.

FIG. 1. Sector 5 of the M. tuberculosis H37Rv 2-DE pattern. Proteins were stained with silver nitrate. The Mr range from 6 to 15 kDa and the pI range from 4 to 6 are shown. The numbered spots were sequenced de novo by nanospray MS/MS and revealed ORFs not predicted previously.

FIG. 2. MS analysis of spot 5_98. (a) Spectrum of the trypsinized protein. Labeled peptides were fragmented to obtain sequence information. (b) Fragmentation pattern of the peptide with an m/z of 708.36, identified as VEIEVDDDLIQK.

* Corresponding author. Mailing address: Max Planck Institute for Infection Biology, Core Facility for Protein Analysis, Schumannstr. 21-22, D-10117 Berlin, Germany. Phone: 49-30-28460133. Fax: 49-30-2846-0507. E-mail: [email protected].
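Fig. 2b assigns the precursor at m/z 708.36 to the peptide VEIEVDDDLIQK. The Python sketch below shows the arithmetic behind such an assignment. It assumes an unmodified peptide and standard monoisotopic residue masses; the charge states tried (1+ and 2+) are illustrative assumptions, since the text does not state the charge.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
# Assumption: unmodified residues (no oxidation, no cysteine alkylation).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O contributed by the free N- and C-termini
PROTON = 1.00728   # mass of one proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass M of an unmodified linear peptide."""
    return sum(MONO[residue] for residue in seq) + WATER


def mz(seq: str, charge: int) -> float:
    """m/z of the protonated ion [M + zH]^z+."""
    return (peptide_mass(seq) + charge * PROTON) / charge


if __name__ == "__main__":
    peptide = "VEIEVDDDLIQK"  # peptide assigned in Fig. 2b
    for z in (1, 2):
        print(z, round(mz(peptide, z), 2))
```

Run as a script, it prints 1415.73 for the 1+ ion and 708.37 for the 2+ ion. The 2+ value lies within about 0.01 of the reported m/z 708.36, which is consistent with a doubly charged precursor.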

M. tuberculosis H37Rv was grown in Middlebrook medium for 6 to 8 days to a cell density of 1 × 10⁸ to 2 × 10⁸ cells/ml. The cells were washed and sonicated in the presence of proteinase inhibitors, and the proteins were treated with urea, dithiothreitol, and Triton X-100 at final concentrations of 9 M, 70 mM, and 2%, respectively (4). Up to 900 μg of protein was separated in preparative 2-DE gels (23 by 30 cm) and stained with Coomassie brilliant blue (CBB) G-250 (2). Spot positions were assigned to the standard 2-DE pattern, in which proteins are detected by silver staining. For proteins that are detectable by CBB, sequence coverage is higher when CBB-stained rather than silver-stained spots are used as the starting material (11). We therefore started identification with CBB-stained spots. Peptide mass fingerprints were obtained by tryptic in-gel digestion and MALDI MS (Voyager Elite; PerSeptive Biosystems, Framingham, Mass.) (7). Sequence information was obtained by nano-electrospray tandem MS (nano-ESI-MS/MS) (Q-TOF; Micromass, Manchester, United Kingdom). The sequence tag method (8) was used to search for the proteins in a translated protein sequence database (http://195.41.108.38/PA_PeptidePatternForm.html). If no protein matched, de novo sequencing was performed, and the tBLASTN program of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov:80/blast.cgi?Jform=1) and the sequence search program of The Institute for Genomic Research (TIGR) (http://www.tigr.org/tdb/CMR/gmt/htmls/SeqSearch.html) were then used to search the entire genomes of M. tuberculosis H37Rv and the clinical isolate CDC1551.

Detailed investigations focused on 190 spots in the pI range from 4 to 6 and the Mr range from 6 to 15 kDa, representing about one-sixth of the whole 2-DE gel and one-tenth of all spots on the complete gel (9). Sixty-two 2-DE spots were identified by their peptide mass fingerprints, and ten further spots required sequence information from nano-ESI-MS/MS for their identification. Eleven spots contained more than one protein, and ten genes gave rise to more than one protein species. Within this sector of the gel (Fig. 1), the sequences of six proteins could not be assigned to genes of M. tuberculosis H37Rv. As an example of the MS analysis, the identification of spot 5_98 is shown in Fig. 2: Fig. 2a shows the MS spectrum of the peptide mixture after digestion with trypsin, and Fig. 2b shows the MS/MS spectrum obtained by fragmentation of one peptide. Open reading frames (ORFs) were found in the genome of strain CDC1551 for five of the six spots, and no ORF was found for one spot (Table 1).

TABLE 1. Protein identification by nano-ESI-MS/MS (boldface residues) and MALDI MS (underlined residues) of previously unpredicted ORFs of M. tuberculosis H37Rv

Spot   Sanger EMBL accession no.   ORF detected in CDC1551   Comparison of H37Rv and CDC1551   Mr      pI
5_37   Z80226 (32260–32508)        None                      100% identity                     8,872   4.5
5_53   Z95120 (5517–5311)          03128                     100% identity                     10,118  4.9
5_98   Z92772 (17111–17359)        D0043                     Leu-1→Met-1 in CDC1551            9,403   4.9
5_115  Z95584 (24791–24486)        06120                     Pro-1-Val-2→Met-1 in CDC1551      11,309  5.9
5_123  AL021646 (44673–44494)      03103                     Val-1→Met-1 in CDC1551            7,253   4.9
5_139  Z79701 (17944–17735)        00401                     100% identity                     7,629   5.8

H37Rv sequences:
5_37   GGAPVARVVV HVMPKAEILD POGQAIVGAL GRLGHLGISD VRQGKRFELE VDDTVDDTTL AEIAESLLAN TVIEDWTISR DPQ
5_53   MPMEGATVEV KIGITDSPRE LVFSSAQTPS EVEELVSNAL RDDSGLLTLT DERGRRFLIH TARIAYVEIG VADARRVGFG VGVDAAAGSA GKVATSG
5_98   LGSDCGCGGY LWSMLKRVEI EVDDDLIQKV IRRYRVKGAR EAVNLALRTL LGEADTAEHG HDDEYDEFSD PNAWVPRRSR DTG
5_115  PVTVYRRGMA VLTDEQVDAA LHDLNGWQRA GGVLRRSIKF PTFMAGIDAV RRVAERAEEV NHHPDIDIRW RTVTFALVTH AVGGITENDI AMAHDIDAMF GA
5_123  VQEGGPQETM SARSTQHDAA DALFRAIIET LDKHRNERTL TEDVLDTLAR AYASISTNVP EQGRLG
5_139  MSNHTYRVIE IVGTSPDGVD AAIQGGLARA AQTMRALDWF EVQSIRGHLV DGAVAHFQVT MKVGFRLEDS

5906 NOTES INFECT. IMMUN.

A search in the genome of M. tuberculosis H37Rv revealed the presence of these DNA sequences, suggesting that the ORFs were not recognized by the search algorithms used by Cole et al. (1). The Mr values predicted from the theoretical gene sequences are in the same range as those estimated by 2-DE. Three of the gene sequences are completely identical between H37Rv and CDC1551 (5_37, 5_53, and 5_139); the reasons why these ORFs were not detected in H37Rv remain elusive. In contrast, for 5_98, 5_123, and 5_115, the replacement of the methionine at position 1 by leucine, valine, and proline-valine, respectively, may have prevented detection of the start codon. Spot 5_53 contains two further proteins: the 14-kDa antigen (SwissProt: 14KD_MYCTU) and hypothetical protein Rv2626c (PIR: A70573). The protein of spot 5_37 has so far been predicted in neither the H37Rv nor the CDC1551 genome. A hypothetical M. leprae protein (SwissProt: Y525_MYCLE) shows 83.5% similarity to the new ORF. Recently, a sequence was published as part of a U.S. patent (EMBLNEW: AX023830) that is identical to the sequence of spot 5_53, except that it lacks residues 1 to 7 and has methionine instead of valine as residue 8.
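The start-codon explanation above can be illustrated with a deliberately simplified ORF scanner. This Python sketch is not the gene-finding procedure used by Cole et al. (1). It assumes a forward-strand-only scan, a hypothetical minimum length of 30 codons, and GTG and TTG as the alternative start codons. Bacteria can initiate translation at these codons, but which alternatives a given gene finder accepts is an assumption here. On a hypothetical toy gene that begins with GTG, a scan restricted to ATG reports nothing, even though the coding sequence is present in the DNA.

```python
# Standard stop codons of the bacterial genetic code.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, starts: set[str], min_codons: int = 30) -> list[tuple[int, int]]:
    """Return ORFs on the forward strand as 0-based, half-open (start, end) pairs.

    In each reading frame, an ORF opens at the first start codon seen after the
    previous stop and closes at the next in-frame stop codon (included in `end`).
    ORFs shorter than `min_codons` codons (stop included) are discarded.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in starts:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical toy gene: GTG start, 35 GCT (Ala) codons, TAA stop.
    toy_gene = "GTG" + "GCT" * 35 + "TAA"
    print(find_orfs(toy_gene, {"ATG"}))                # ATG-only scan
    print(find_orfs(toy_gene, {"ATG", "GTG", "TTG"}))  # alternative starts allowed
```

The first call prints an empty list, and the second finds the 37-codon ORF spanning positions 0 to 111. Real gene finders also scan the reverse complement and choose among several in-frame start codons; both steps are omitted here for brevity.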

MALDI MS proved highly effective in the rapid identification of the main components of a 2-DE gel, provided that the proteins are present in a sequence database. A more detailed analysis of spots in 2-DE gels by nano-ESI-MS/MS revealed additional proteins per spot and additional genes not predicted from genome investigations. Our findings illustrate the value of proteomics in complementing genomics in both functional and genomic analyses. Proteomics is a further building block in unraveling the molecular network of bacterium-host interactions, a prerequisite for the development of new vaccines against infectious diseases such as tuberculosis.
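Peptide mass fingerprinting, as noted above, identifies a protein only if its sequence is in the searched database. The Python sketch below illustrates the matching step under stated assumptions:

- complete tryptic cleavage after K or R except before P, with no missed cleavages;
- unmodified residues and monoisotopic [M+H]+ masses;
- a hypothetical tolerance of 0.1 Da;
- a two-entry toy database containing the Table 1 sequences of spots 5_98 and 5_139.

The only "observed" peak is the [M+H]+ value implied by the m/z 708.36 ion in Fig. 2, assuming that ion is doubly charged. Real fingerprints contain many peaks and are scored statistically, not by a simple match count.

```python
import re

# Monoisotopic residue masses (Da); residues assumed unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein: str) -> list[str]:
    """Cleave C-terminal to K or R unless the next residue is P; no missed cleavages."""
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


def pmf_score(observed: list[float], protein: str, tol: float = 0.1) -> int:
    """Number of observed [M+H]+ peaks matched by at least one tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


# Toy database: H37Rv sequences of spots 5_98 and 5_139 from Table 1.
DATABASE = {
    "5_98": "LGSDCGCGGYLWSMLKRVEIEVDDDLIQKVIRRYRVKGAREAVNLALRTLLGEADTAEHG"
            "HDDEYDEFSDPNAWVPRRSRDTG",
    "5_139": "MSNHTYRVIEIVGTSPDGVDAAIQGGLARAAQTMRALDWFEVQSIRGHLVDGAVAHFQVT"
             "MKVGFRLEDS",
}

# [M+H]+ implied by the Fig. 2 ion at m/z 708.36, assuming charge 2+.
OBSERVED = [2 * 708.36 - PROTON]

if __name__ == "__main__":
    for spot, sequence in DATABASE.items():
        print(spot, pmf_score(OBSERVED, sequence))
```

Here the single peak matches the tryptic peptide VEIEVDDDLIQK of spot 5_98 and nothing in spot 5_139. A protein missing from the database would score zero no matter how good the spectrum is, which is why the unpredicted ORFs required de novo sequencing and a genome-level tBLASTN search.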

This work was supported by Chiron Behring, Marburg, Germany, and the WHO (Global Programme for Vaccines and Immunization–Vaccine Research and Development).

REFERENCES

1. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, and B. G. Barrell. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544.

2. Doherty, N. S., B. H. Littman, K. Reilly, A. C. Swindell, J. M. Buss, and N. L. Anderson. 1998. Analysis of changes in acute-phase plasma proteins in an acute inflammatory response and in rheumatoid arthritis using two-dimensional gel electrophoresis. Electrophoresis 19:355–363.

3. Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, K. Mckenney, G. Sutton, W. Fitzhugh, C. Fields, J. D. Gocayne, J. Scott, R. Shirley, L. I. Liu, A. Glodek, J. M. Kelley, J. F. Weidman, C. A. Phillips, T. Spriggs, E. Hedblom, M. D. Cotton, J. C. Venter, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae RD. Science 269:496–511.

4. Jungblut, P. R., U. E. Schaible, H.-J. Mollenkopf, U. Zimny-Arndt, B. Raupach, J. Mattow, P. Halada, S. Lamer, K. Hagens, and S. H. E. Kaufmann. 1999. Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: towards functional genomics of microbial pathogens. Mol. Microbiol. 33:1103–1117.

5. Kaufmann, S. H. E. 2000. Is the development of a new tuberculosis vaccine possible? Nat. Med. 6:955–960.

6. Klose, J., and U. Kobalz. 1995. Two-dimensional electrophoresis of proteins: an updated protocol and implications for a functional analysis of the genome. Electrophoresis 16:1034–1059.

7. Lamer, S., and P. R. Jungblut. 2001. Matrix-assisted laser desorption-ionization mass spectrometry peptide mass fingerprinting for proteome analysis: identification efficiency after on-blot or in-gel digestion with and without desalting procedures. J. Chromatogr. B 752:311–322.

8. Mann, M., and M. Wilm. 1994. Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66:4390–4399.

9. Mattow, J., P. R. Jungblut, E.-C. Müller, and S. H. E. Kaufmann. 2001. Identification of acidic, low molecular mass proteins of Mycobacterium tuberculosis strain H37Rv by MALDI- and ESI-mass spectrometry. Proteomics 1:494–507.

10. Mollenkopf, H.-J., P. R. Jungblut, B. Raupach, J. Mattow, S. Lamer, U. Zimny-Arndt, U. E. Schaible, and S. H. E. Kaufmann. 1999. A dynamic two-dimensional polyacrylamide gel electrophoresis database: the mycobacterial proteome via the internet. Electrophoresis 20:2172–2180.

11. Scheler, C., S. Lamer, Z. Pan, X.-P. Li, J. Salnikow, and P. Jungblut. 1998. Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis 19:918–927.

Editor: R. N. Moore
