Using Ion Mobility Data to Improve Peptide Identification ......Amino Acid Size Parameters ......



  • Published: March 21, 2011

    © 2011 American Chemical Society 2318 dx.doi.org/10.1021/pr1011312 | J. Proteome Res. 2011, 10, 2318–2329

    ARTICLE

    pubs.acs.org/jpr

    Using Ion Mobility Data to Improve Peptide Identification: Intrinsic Amino Acid Size Parameters

    Stephen J. Valentine,† Michael A. Ewing,† Jonathan M. Dilger,† Matthew S. Glover,† Scott Geromanos,‡ Chris Hughes,§ and David E. Clemmer*,†

    †Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
    ‡Waters Corporation, Milford, Massachusetts 01757, United States
    §Waters Corporation, Manchester, United Kingdom

    Supporting Information

    INTRODUCTION

    Since the inception of methods to identify peptide ions by tandem mass spectrometry (MS/MS) techniques,1–3 there has been a rapid advance in mass spectrometric instrumentation development. These advances are in large part spurred by the need to increase the overall numbers of identified peptides and proteins in proteomics experiments to provide the necessary increased protein complement coverage for accurate and relevant comparative analyses. Over the last 15 years, improvements in mass spectrometry (MS) instrumentation have resulted in increased numbers of assigned peptide ions obtained from liquid chromatography (LC)–MS/MS experiments for complex proteomics samples;4–6 in the characterization of human plasma digests,7–12 numbers of assigned peptide ions in a given experiment have increased by nearly 2 orders of magnitude over this time period.

    Although improvements in instrumentation sensitivity and speed have enabled increased numbers of peptides to be identified, a problem of false identification has persisted. The problem is so pervasive in the field that there has been a push to standardize proteomics reporting, consisting of the establishment of guidelines for disclosure of statistical analyses used to establish the accuracy of assignments.13 In part, instrumentation improvements lead to the intransigence of the false identification problem as lower-signal species move into identification range with increased analytical performance capabilities. Typically such species produce lower-quality spectra, leading to suspect assignments. There is a constant need to develop methods to improve the confidence of peptide ion assignments.

    Received: November 10, 2010

    ABSTRACT: A new method for enhancing peptide ion identification in proteomics analyses using ion mobility data is presented. Ideally, direct comparisons of experimental drift times (tD) with a standard mobility database could be used to rank candidate peptide sequence assignments. Such a database would represent only a fraction of sequences in protein databases, and significant difficulties associated with the verification of data for constituent peptide ions would exist. A method that employs intrinsic amino acid size parameters to obtain ion mobility predictions that can be used to rank candidate peptide ion assignments is proposed. Intrinsic amino acid size parameters have been determined for doubly charged peptide ions from an annotated yeast proteome. Predictions of ion mobilities using the intrinsic size parameters are more accurate than those obtained from a polynomial fit to tD versus molecular weight data. More than a 2-fold improvement in prediction accuracy has been observed for a group of arginine-terminated peptide ions 12 residues in length. The use of this predictive enhancement as a means to aid peptide ion identification is discussed, and a simple peptide ion scoring scheme is presented.

    KEYWORDS: ion mobility spectrometry, mass spectrometry, peptide identification

    To dramatically improve the accuracy of assignments in proteomics studies, the measurement of new characteristics attributable to data set features is required. As an example, consider the enabling effect of MS/MS experiments. Whereas the precursor ion mass is insufficient to allow identification of peptide ions in complex proteomics samples, the addition of MS/MS information allows accurate assignments in many cases. A question that arises is how the new distinguishing characteristics will be produced. Some advocate chemometric approaches to elucidate distinguishing characteristics buried in proteomics data sets. For example, ongoing work consists of efforts to predict ion fragmentation distributions (including ion intensities)14–25 as well as LC retention26–31 in order to provide increased identification accuracy. Finally, improved separations of data set components can be used to enhance peptide ion assignments. Examples include the use of increased mass accuracy permitting more stringent mass matching thresholds for protein database searches32–34 as well as precursor and fragment ion intensity matching that includes the use of LC retention time profiles.35,36

    The work presented here describes the use of an additional precursor ion trait, ion mobility, to evaluate peptide ion assignments. Specifically, the use of mobility data obtained from LC–MS/MS analyses of the yeast proteome is evaluated as a means for improving peptide ion identification. Briefly, similar to ion mobility spectrometry (IMS) experiments performed previously,37–40 peptide ion composition is related to measured ion mobilities to determine the general effect that the presence of specific amino acid residues has on the overall mobilities of database ions. Upon establishing this relationship for groups of peptide ions, the ability to match drift times (tD) with peptide ions based solely on amino acid composition has been evaluated. A simple peptide ion identification scoring scheme for data that can be produced on current commercial instrumentation (Synapt HDMS, Waters) is discussed. Finally, it is noted that this work is related to that attempting to predict tDs of peptide ions using artificial neural networks (ANNs).41

    EXPERIMENTAL SECTION

    General

    Data from the analysis of a yeast proteome were provided by Waters Corporation. IMS techniques,42–46 instrumentation,47–54 and theory,55–59 as well as the combination of LC with IMS-MS instrumentation,60–65 have been discussed elsewhere. Here only a brief description of methods related to the collection of the tryptic digest data is presented.

    Eight hundred nanograms of a tryptic digest of S. cerevisiae was injected onto a Trapping and Nanoscale column configuration using a nanoACQUITY (Waters) UPLC system. Peptides were separated on the UPLC prior to being electrosprayed into the entrance orifice of the Synapt HDMS (Waters) instrument. Peptide ions were stored in the Trap Traveling Wave (T-Wave) located at the front of the IMS (Ion Mobility Separation) T-Wave device. Periodically, ion packets from the Trap T-Wave were pulsed into the IMS T-Wave cell, where ions were separated according to their mobilities through a buffer gas (N2 for these experiments) under the influence of a drift voltage that is rapidly transmitted along adjacent electrostatic lenses in the IMS T-Wave cell. The repetition of this voltage transmission (wave) provides periodic separation of ions according to their mobilities. Most ions have mobilities that are lower than the transmission rate of the T-Wave voltage, causing them to “roll” back and be separated in subsequent waves. The residence times in the T-Wave cell can be calibrated to ion mobilities and thus to collision cross sections. After exiting the IMS T-Wave cell, ions are transmitted through a Transfer T-Wave collision cell into a time-of-flight (TOF) MS device for mass analysis. The collision energy of the Transfer T-Wave is increased on an alternate scan basis, producing approximately 10 low energy and 10 high energy spectra across each chromatographic peak.

    Yeast Digest Samples

    Yeast strain W303 (MATa ura3-52 leu2-3 leu2-112 trp1-1 ade2-1 his3-11 can1-100) was grown at 30 °C to exponential phase (A600 = 0.8) in rich YEPD medium (2% w/v glucose, 2% w/v bactopeptone, 1% w/v yeast extract). Cells were harvested by centrifugation and washed with water to remove any traces of growth medium. Cells were resuspended in ice-cold water and broken with glass beads using a Minibead beater (Biospec Products, Bartlesville, OK) for 40 s at 4 °C. Cell debris was pelleted in a microcentrifuge for 15 min (13 000 rpm; 4 °C) and supernatants collected for further analysis.

    Four hundred micrograms of protein was suspended in 44 μL of 50 mM ammonium bicarbonate solution containing 0.1% Rapigest (Waters Corporation) and heated at 80 °C for 15 min. Disulfide bonds were reduced by addition of DTT (5 mM) and incubation at 60 °C for 30 min. Protein samples were then alkylated by addition of iodoacetamide (10 mM) and incubation at 23 °C for 1 h in the dark. Trypsin (1:50 trypsin:protein) was added to the protein solution and the sample was incubated for 16 h at 37 °C. Rapigest was then removed by adding TFA to a final concentration of 0.5%, incubating at 37 °C for 45 min, and spinning down at 13 000 rpm for 20 min.

    UPLC Settings

    Eight hundred nanograms of the tryptic sample was loaded onto a 180 μm × 20 mm Trapping column and washed with 30 column volumes of solvent A (99.9% H2O, 0.1% formic acid). Peptides were separated on this column and a 75 μm × 200 mm column using 1.3, 0.7, and 0.44% per minute gradient increases in solvent B (99.9% ACN, 0.1% formic acid) starting from an initial mixture of 99:1 solvent A:solvent B. A total separation time of 60, 90, and 120 min was used for the LC separation, resulting in a total experimental time of 180, 270, and 360 min for the replicate runs. A flow rate of 300 nL min⁻¹ was used to perform the LC separation, and the eluent was directed into a capillary ESI tip for direct electrospray into the mass spectrometer.

    Mass Spectrometer Settings

    To perform the mobility separation, the IMS T-Wave height was set to 40 V during transmission. The wave velocity was set at 600 m/s. These settings resulted in a total separation time of 13.7 ms. Nitrogen gas pressure in the IMS T-Wave was maintained at 3.27 mbar. The TOF mass spectrometer was operated in “V” mode with a resolving power of >2 × 10⁴ fwhm and a mass accuracy of 3 ppm rms. MS/MS experiments were performed using the IdentityE mode.66 Here conditions in the Transfer T-Wave located behind the IMS T-Wave cell are alternated between those that favor transmission of precursor ions (Collision Energy 0 V) and those that induce precursor ion dissociation (Collision Energy ramped from 19 to 45 V). Product ions produced under these conditions have the same chromatographic retention time and the same ion mobility as their precursor. Precursor and product ion mass spectra were acquired over the mass range 50–2000 amu with an acquisition rate of 0.9 s per spectrum. A total of 10 000 MS/MS spectra were generated and subjected to protein database searches using the Waters ProteinLynx Global Server (PLGS) and IdentityE software suite.

    Peptide Ion Identification

    Ion mobility enhanced MSE spectra were submitted to the PLGS software suite for protein database searches. Mass tolerances used for database searches were 5 and 10 ppm for precursor and product ions, respectively. At least two unique peptides of greater than a 95% probability were required for a protein to be reported. A forward/reverse protein database search strategy was implemented to limit the number of proteins reported. For the data sets utilized in this study, the protein false discovery rate was set to 1%.


    Data Analysis

    To provide the best estimation of intrinsic amino acid size parameters, it was necessary to filter the data sets to group ions into those that may contain structural similarities. For the work performed here, the first filter requirement was that peptide ions be doubly charged. The second filter criterion removes all peptides with missed cleavages to allow the use of only peptide ions where the location of the protons is known. Next, peptide ions were divided into those containing a C-terminal arginine or lysine residue. Finally, within these two subgroups, the peptides were further divided by length (number of amino acids). Size parametrization was performed as described below for each of these groups of peptide ions. Matrix manipulation was achieved using the MATLAB software suite.67
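The filtering and grouping steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB workflow: the function names are hypothetical, the input is assumed to be a list of (sequence, charge) pairs, and a missed cleavage is assumed to mean any lysine or arginine before the final residue (the proline rule for trypsin is ignored).

```python
from collections import defaultdict


def has_missed_cleavage(sequence):
    """Assumption: any K or R before the last residue is a missed cleavage."""
    return any(residue in "KR" for residue in sequence[:-1])


def group_peptides(peptides):
    """Group (sequence, charge) pairs by C-terminal residue and length.

    Keeps only doubly charged peptides that end in K or R and contain
    no missed cleavages, mirroring the filters described in the text.
    """
    groups = defaultdict(list)
    for sequence, charge in peptides:
        if charge != 2:
            continue  # filter 1: doubly charged ions only
        if sequence[-1] not in "KR":
            continue  # keep only lysine- or arginine-terminated peptides
        if has_missed_cleavage(sequence):
            continue  # filter 2: no missed cleavages
        groups[(sequence[-1], len(sequence))].append(sequence)
    return dict(groups)


# Sequences named in the text, plus two hypothetical entries that get filtered out.
example = [
    ("NTTIPTK", 2),
    ("QAYAVSEK", 2),
    ("EAYVPATK", 2),
    ("LNLFLSTK", 2),
    ("AAKAAR", 2),    # hypothetical: internal K, treated as a missed cleavage
    ("NTTIPTK", 3),   # hypothetical: wrong charge state
]
for key, members in sorted(group_peptides(example).items()):
    print(key, members)
```

Each resulting group (for example, all 8-residue, lysine-terminated peptides) would then serve as one parametrization set.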

    Figure 1. (A) Intrinsic amino acid size parameters derived from a group of doubly charged peptide ions that are 12 residues in length and each contain a single, C-terminal arginine residue. Intrinsic size parameters have been grouped by types of amino acids. Size parameters for proline and glycine are presented separately in light of their propensity to disrupt α-helical structure in solution. Histidine and cysteine size parameters are also shown separately because of their relatively infrequent occurrence in the proteome database. Error bars represent one standard deviation about the mean. (B) Average size parameters obtained from doubly charged peptide ions of different residue lengths. Solid and open diamonds represent average values for lysine- and arginine-terminated peptide ions, respectively. Error bars represent one standard deviation from the mean.


    RESULTS AND DISCUSSION

    Derivation of Size Parameters

    To determine the contribution of each amino acid residue to the overall size of the peptide ions, those sequences estimated to exhibit similar gas-phase structures are selected (see selection criteria above and discussion below). As described previously, from ion mobility measurements for the peptide ions within a parametrization set,37–40 it is possible to establish a system of equations relating size (ion mobility) to the amino acid composition using eq 1.

    $$\sum_{j=1}^{n} X_{ij}\, p_j = y_i \qquad (1)$$

    In eq 1, i and j represent a given peptide ion in the parametrization set (i = 1 to m, where m is the total number of peptides in the set) and the given amino acid residue (j = 1 to n, where n is the number of separate amino acids), respectively. X represents the frequency of occurrence of the jth amino acid in the ith peptide of the parametrization set. The variable y is related to the ion mobility (represented here by a calibrated tD) of the ith peptide ion. For these experiments y is calibrated to obtain a reduced tD. Because peptide ion size is correlated to mass, it is necessary to calibrate the system such that differences in y within a subset of peptide ions are associated with peptide composition and sequence rather than differences in mass alone. That is, dividing the tD of a peptide ion by that of a “model” peptide ion of the same mass (obtained from a second-order polynomial fit to the tD versus molecular weight data) captures the variability in y at given masses. This variability is presumably determined largely by differences in peptide amino acid composition and sequence. Finally, because the ratio of tD values is the same as the ratio that would be obtained for collision cross sections, values of p are referred to as intrinsic “size” parameters. In eq 1, p represents the intrinsic size parameter of the jth amino acid.
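As a minimal sketch of the calibration just described, the reduced tD is the measured drift time divided by the drift time of a “model” peptide taken from a second-order polynomial in molecular weight. The function names and the polynomial coefficients below are hypothetical placeholders; the actual fit coefficients are not given in this section.

```python
def model_td(molecular_weight, coeffs):
    """Drift time of a 'model' peptide from a second-order polynomial fit.

    coeffs = (a, b, c) for a*MW**2 + b*MW + c; the values used below are
    placeholders, not the fit reported in the paper.
    """
    a, b, c = coeffs
    return a * molecular_weight ** 2 + b * molecular_weight + c


def reduced_td(measured_td, molecular_weight, coeffs):
    """Reduced drift time: measured tD relative to the model peptide tD."""
    return measured_td / model_td(molecular_weight, coeffs)


# Hypothetical coefficients chosen only so the example runs.
HYPOTHETICAL_COEFFS = (-1.0e-5, 0.05, 4.0)
print(reduced_td(36.31, 774.4, HYPOTHETICAL_COEFFS))
```

A reduced tD above 1 indicates an ion that drifts more slowly than a typical peptide of the same mass, and a value below 1 indicates one that drifts faster.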

    Because eq 1 represents a linear system of m equations with n coefficients, it can be written in matrix form as40,68,69

    $$X p = y \qquad (2)$$

    where X is an m × n matrix, p is a vector of n components, and y is a vector of m components. It is straightforward to solve for individual intrinsic size parameters using68,69

    $$p = (X^{T} X)^{-1} X^{T} y \qquad (3)$$

    The m/n = 1 diagonal of the variance-covariance matrix of the size parameters (Mp) provides the variance for the size parameter pn where69

    $$M_p = \frac{S}{m - n}\,(X^{T} X)^{-1} \qquad (4)$$

    and69

    $$S = y^{T} r \qquad (5)$$

    In eq 5, rk corresponds to the residuals (r = y − Xp) of the individual equations.70 Errors representing one standard deviation can be obtained as the square root of the variance for each intrinsic size parameter. For the study presented here, the size parameters have been determined for groups of peptide ions having the same length within the lysine- and arginine-terminated subgroups (see above). The size parameters for the C-terminal residues have been maintained at the previously reported values of 1.230 and 1.150 for lysine and arginine, respectively.37,39 This has been performed in order to remove any effect that might treat these parameters as “compensating” residues due to their single occurrence in every peptide ion sequence.
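The least-squares solution of eqs 2 to 5 can be illustrated with a small, dependency-free Python sketch. It is not the authors' MATLAB code: the function names are hypothetical, the three-residue data set is invented, and, unlike the study, no parameter is held fixed (the lysine and arginine values were fixed at 1.230 and 1.150 in the paper).

```python
def solve_linear(a, b):
    """Solve a z = b for a square matrix a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_size_parameters(X, y):
    """Return (p, variances) from eqs 3-5 for frequency matrix X and reduced tDs y."""
    m, n = len(X), len(X[0])
    cols = list(zip(*X))
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    p = solve_linear(xtx, xty)                      # eq 3, via the normal equations
    residuals = [yi - sum(x * pj for x, pj in zip(row, p))
                 for row, yi in zip(X, y)]
    s = sum(yi * ri for yi, ri in zip(y, residuals))  # eq 5
    variances = []
    for j in range(n):
        unit = [1.0 if k == j else 0.0 for k in range(n)]
        inverse_jj = solve_linear(xtx, unit)[j]     # jth diagonal of (X^T X)^-1
        variances.append(s / (m - n) * inverse_jj)  # eq 4, diagonal element
    return p, variances


# Invented example: 5 peptides, 3 residue types, rows are residue frequencies.
X_demo = [[0.50, 0.25, 0.25],
          [0.25, 0.50, 0.25],
          [0.25, 0.25, 0.50],
          [0.50, 0.50, 0.00],
          [0.00, 0.50, 0.50]]
y_demo = [0.976, 0.999, 1.027, 0.949, 1.050]  # invented reduced tDs with small scatter
p_demo, var_demo = fit_size_parameters(X_demo, y_demo)
print(p_demo)
print(var_demo)
```

Solving the normal equations directly is adequate for small, well-conditioned sets like this one; for larger parametrization sets a library least-squares routine would be the more robust choice.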

    Figure 1A shows the values of the intrinsic amino acid size parameters obtained for doubly charged, arginine-terminated peptide ions containing 12 amino acid residues. Several trends are worth noting. First, nonpolar aliphatic residues generally have larger intrinsic size parameters (i.e., they have a greater contribution to peptide ion size) than polar aliphatic residues. This is very similar to the trend observed for singly charged, lysine-terminated peptides, and it has been suggested that stronger interactions between the charge site and polar residues may account for the difference in size.37 Another similarity is that the size parameters for the aromatic residues are intermediate in value to those of the nonpolar aliphatic and the polar aliphatic residues. Additionally, the size parameters for proline and glycine are relatively small. When compared to the previous work,37 the size parameter for valine obtained from this peptide ion group is relatively large. The intrinsic size parameters for histidine and cysteine are the smallest determined for this parametrization set. Finally, it should be noted that the size parameter errors for cysteine, histidine, methionine, and tryptophan are relatively larger than those of other residues. This can be attributed to the relatively low level of occurrence of these amino acids in the peptide ion group used to obtain parameters. For example, the numbers of occurrences of these amino acids in the 102 peptides in this group are 7, 3, 12, and 14, respectively. In comparison, alanine occurs 118 times within the same parametrization set.

    Previously it has been demonstrated that intrinsic size parameters can be used to predict peptide ion collision cross sections.37–40 The studies showed that predictions improved upon restricting the sizes and types of peptide ions used to obtain the parameters. The reasoning for the improvement is that ions exhibiting similarities in length, composition (i.e., no missed cleavages), charge, and C-terminal residue (R or K) are more likely to adopt related gas-phase conformations; these similarities would be reflected in the intrinsic amino acid size parameters and thus lead to greater prediction accuracy for peptides within a subset. Indeed, in a previous study collision cross section prediction accuracy decreased by as much as a factor of 2 when size parameters from one parametrization set were used in cross section calculations for another set.39 For the present study, seventeen peptide subgroups have been extracted from the annotated proteome data set. Figure 1B shows the average size parameters obtained from each of the parametrization sets (peptides of different length) for arginine- and lysine-terminated peptides. For the former, average values were obtained from intrinsic size parameters determined for peptides having lengths of 7, 8, 9, 10, 11, 12, 13, and 14 to 15 amino acid residues. The last grouping is required because of an insufficient number of peptide ions containing either 14 or 15 amino acid residues. For lysine-terminated peptides, size parameters from peptide groups with lengths of 7, 8, 9, 10, 11, 12, 13, 14, and 15 residues were obtained. Figure 1B shows that similar trends in size parameters are obtained for the different peptide ion subgroups.

    Predicting Peptide Ion Drift Times

    Size parameters can be used with amino acid composition to predict reduced tDs using eq 1. Because peptide ion tD values are calculated for the ions used to obtain parameters, the calculations can be termed retrodictions. Previously we have shown that retrodictions are very similar in accuracy to bona fide predictions, and therefore we shall use the term predictions throughout this work.39 The predicted tDs can be compared with experimental values to assess the quality of the intrinsic size parameter determination for each data set. As an example consider the peptide ion [NTTIPTK + 2H]2+ from the heat shock protein SSC1. This seven-residue peptide ion has a tD peak centered at 36.31 bins. From a polynomial fit to the tD versus molecular weight data, it is observed that a “model” peptide of the same m/z (774.4 Da) would have a peak centered at a tD of 36.65 bins. Thus the reduced tD for [NTTIPTK + 2H]2+ would be 0.991 (36.31/36.65). The predicted reduced tD would be calculated according to eq 1 as $X_N p_N + X_T p_T + X_I p_I + X_P p_P + X_K p_K$ (0.143 × 0.883 + 0.429 × 0.967 + 0.143 × 1.003 + 0.143 × 0.936 + 0.143 × 1.23). The calculated reduced tD for this peptide is 0.993, corresponding to a drift bin value of 36.40. This is within 0.25% of the 36.31 value associated with the peak. This is significantly more accurate than the 36.65 value (0.94%) obtained from the polynomial fit to the tD versus molecular weight data. Supplementary Table 1 (Supporting Information) shows a comparison of experimental and theoretical tD values for all peptides used in this study. On average, experimental and theoretical tDs agree to within ±1.8%.
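The arithmetic in the [NTTIPTK + 2H]2+ example can be reproduced with a short Python sketch. The five size parameters and the two drift-bin values are the ones quoted in the paragraph above; the function name `predict_reduced_td` is hypothetical.

```python
from collections import Counter

# Size parameters quoted in the worked example above.
SIZE_PARAMS = {"N": 0.883, "T": 0.967, "I": 1.003, "P": 0.936, "K": 1.23}


def predict_reduced_td(sequence, params):
    """Eq 1 for one peptide: sum over residues of frequency x size parameter."""
    counts = Counter(sequence)
    length = len(sequence)
    return sum(counts[residue] / length * params[residue] for residue in counts)


MODEL_TD_BINS = 36.65         # "model" peptide tD from the polynomial fit
EXPERIMENTAL_TD_BINS = 36.31  # measured peak center

reduced = predict_reduced_td("NTTIPTK", SIZE_PARAMS)
predicted_bins = reduced * MODEL_TD_BINS

print(f"predicted reduced tD: {reduced:.3f}")
print(f"predicted tD (bins):  {predicted_bins:.2f}")
print(f"size-parameter error: {abs(predicted_bins / EXPERIMENTAL_TD_BINS - 1):.2%}")
print(f"polynomial-fit error: {abs(MODEL_TD_BINS / EXPERIMENTAL_TD_BINS - 1):.2%}")
```

Threonine occurs three times in the seven-residue sequence, which is where the 0.429 frequency in the text comes from; every other residue occurs once (0.143).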

    Figure 2. Dot plots showing prediction accuracies of tDs for individual peptide ions as a function of molecular weight. Prediction accuracy is depicted as the ratio of the predicted tD to the experimental tD. The top plot shows the prediction accuracy obtained by employing a polynomial fit to tD versus molecular weight data. The bottom plot shows the prediction accuracy obtained by using intrinsic size parameters. Data in these plots are obtained from peptide sequences containing 12 residues and a single C-terminal arginine residue.

    To better understand the efficacy of a size parameter prediction of the data, it is instructive to make comparisons to the polynomial fit for a group of peptide ions. Figure 2 shows the ratios of predicted and experimental tDs obtained for both the size parameter fit and the polynomial fit. These have been performed for arginine-terminated peptide ions of 12 amino acid residues in length using the size parameter values depicted in Figure 1A. In comparison, all predicted tD values are within 8% of experimental values using the polynomial fit. All predicted tDs are within 5% of experimental values using the size parameters. Additionally, the data for the size parameter fit are more compressed around the unity line, indicating a higher level of accuracy. This increased density of data points in this region is an indication of the tD prediction improvement obtained when using size parameters. Another way to visualize this improvement is to compare the number of ions in the parametrization group that are accurately predicted to within ±1%. The 1% accuracy threshold has been selected as being representative of the typical experimental accuracy of ion mobility measurements.43 Use of size parameters results in ∼40% of all predictions meeting this accuracy threshold; the use of a polynomial fit to tD versus molecular weight data results in ∼18% of all predictions reaching this same level of accuracy. Thus there is more than a 2-fold improvement in predictive capabilities using the size parameters compared to the polynomial fit. This advantage exists for higher accuracy thresholds as well. For example, an improvement of a factor of ∼1.7 is observed for predictions that are within 2% of experimental values. Here we note that size parameters obtained from peptide ions of this size provide the most accurate predictions. That said, the average improvement for arginine-terminated peptides of all sizes using the 1% accuracy threshold is ∼50%. For all comparisons reported here, a second-order polynomial fit has been used because it has been shown to provide greater prediction accuracy than higher-order polynomials and a linear least-squares fit.
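The threshold comparison described above amounts to counting how many predicted-to-experimental tD ratios fall within a given fraction of unity. A minimal Python sketch follows; the function name and the drift-time values are hypothetical and serve only to show the calculation.

```python
def fraction_within(predicted, experimental, threshold):
    """Fraction of predictions whose ratio to experiment is within 1 +/- threshold."""
    hits = sum(
        1 for p, e in zip(predicted, experimental)
        if abs(p / e - 1.0) <= threshold
    )
    return hits / len(experimental)


# Hypothetical drift times (bins) for four peptide ions.
experimental_tds = [36.31, 40.10, 44.75, 51.20]
size_param_tds = [36.40, 40.00, 45.60, 51.00]
polynomial_tds = [36.65, 41.20, 44.80, 52.90]

for threshold in (0.01, 0.02, 0.05):
    by_size = fraction_within(size_param_tds, experimental_tds, threshold)
    by_poly = fraction_within(polynomial_tds, experimental_tds, threshold)
    print(f"within {threshold:.0%}: size parameters {by_size:.2f}, polynomial {by_poly:.2f}")
```

Evaluating the same function over a range of thresholds gives curves of the kind plotted in Figure 3.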

    Although the discussion has focused on the superiority of the size parameters in predicting tDs to within 1 and 2% of experimental values, it is worthwhile to consider the range of accuracy over which this advantage holds. Consider Figure 3, which shows the average fraction of the peptides correctly predicted as a function of accuracy threshold. Again a comparison is drawn between the prediction capabilities of the size parameter fit and those of the polynomial fit to tD versus molecular weight data. The data shown in Figure 3 suggest that a significant advantage in predictive capabilities is attainable using intrinsic size parameters over an accuracy threshold range of ±0.5 to ±6%. At higher accuracy threshold values, both models do nearly as well in predicting tD values.

    Peptide Ion Assignments

    To determine how intrinsic size parameters would aid peptide identification efforts, it is useful to consider two factors influencing the quality of the fit. This is accomplished by comparing the predictions obtained for specific peptide ions with those that would be obtained for nearly all peptide ion sequences at the same m/z values. Consider the peptide ion [QAYAVSEK + 2H]2+ from the 60S ribosomal protein L4 A. Using the polynomial fit to the tD versus molecular weight data for the eight-residue peptides, a reduced tD for the peptide ion [QAYAVSEK + 2H]2+ is determined to be 1.037. The predicted reduced tD obtained using the appropriate intrinsic size parameters is 0.995. Thus, the prediction accuracy is ∼0.041 or ∼4.1%. A sampling of the complete list of lysine-terminated peptide ions ranging in length from 7 to 10 amino acids and within 0.01 Da of the precursor ion mass (894.45 Da) yields ∼7.13 × 10⁵ separate sequences. Predicted tDs for all possible peptide sequences have been computed using the intrinsic size parameters obtained from the 7-, 8-, 9-, and 10-residue, lysine-terminated peptide ion groups. It is observed that ∼4% of all isobaric sequences have predicted tDs that are within the prediction accuracy (±4.1%) of the experimental sequence. In a sense, this prediction accuracy for incorrect peptide ion assignments can be considered a false discovery rate and will be useful in formulating a peptide ion identification scoring scheme outlined below. Thus, for this peptide ion, the predicted reduced tD outperforms those obtained for ∼96.0% of nearly all possible sequences at the same m/z.

    From such an analysis of interfering sequences, one candetermine the degree of overlap at different prediction accuracythresholds. This is shown in Figure 4A. Here consider only thetrace with the solid square symbols as this represents data forpeptide sequences matching the mass (894.45 Da) of the peptideion [QAYAVSEKþ 2H]2þ. As the prediction accuracy thresholdincreases from 0.005 to 0.030 the fraction of total peptide ionsequences within the required threshold value for a match withthe experimental value increases slowly from ∼0.00 to ∼0.02.Going from a prediction accuracy threshold of 0.030�0.040, thefraction of total sequences predicted accurately doubles to∼0.04. Above this value, the fraction of predicted sequencesincreases dramatically to 0.21, 0.63, and 0.88 at accuracy thresh-olds of 0.050, 0.060, and 0.070, respectively. Above an accuracythreshold of 0.070, the fraction of predicted sequences begins tolevel off approaching a value of 1 resembling a sigmoidaldependence. The data can be fitted with an expression for thesigmoidal curve intensity (I) according to71

    $$ I = A + \frac{B - A}{1 + e^{-(x - x_0)/w}} \qquad (6) $$

    Figure 3. Plots showing the fraction of database peptides predicted accurately for given prediction threshold values. Open and solid circles represent data obtained from predictions with intrinsic size parameters and predictions obtained from a polynomial fit to tD versus molecular weight data, respectively. Data points represent average values obtained from peptide sequences ranging in length from 7 to 15 amino acid residues. Error bars represent one standard deviation from the mean. Prediction accuracy threshold (x-axis) represents the deviation from the experimental values expressed as a fraction; a prediction accuracy threshold of x = 0.03 would provide the fraction of all database peptides predicted to within ±3% of the experimental values.

  • 2324 dx.doi.org/10.1021/pr1011312 |J. Proteome Res. 2011, 10, 2318–2329

    Journal of Proteome Research ARTICLE

    where the variables A and B represent the minimum and maximum values of the sigmoidal curve (0 and 1 in this case), respectively. The variables x0 and w represent the prediction accuracy threshold value at the inflection point and a width factor of the sigmoidal curve, respectively. Using a prediction accuracy threshold value of ∼0.060 to represent x0 and a value of ∼0.006 for w, the data for competitive assignments to the peptide ion [QAYAVSEK + 2H]2+ can be fit as shown in Figure 4A.
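As a rough illustration of eq 6, the sketch below evaluates the sigmoid with the approximate parameters quoted for [QAYAVSEK + 2H]2+ (x0 ≈ 0.060, w ≈ 0.006, A = 0, B = 1). The function name and the choice of thresholds are assumptions made for this example only.

```python
import math


def sigmoid_fraction(x, x0, w, a=0.0, b=1.0):
    """Eq 6: fraction of competing sequences matched at accuracy threshold x."""
    return a + (b - a) / (1.0 + math.exp(-(x - x0) / w))


# Approximate fit parameters quoted in the text for [QAYAVSEK + 2H]2+
X0, W = 0.060, 0.006

for threshold in (0.030, 0.040, 0.050, 0.060, 0.070):
    print(f"{threshold:.3f}  {sigmoid_fraction(threshold, X0, W):.3f}")
```

With these rounded parameters the curve gives about 0.16, 0.50, and 0.84 at thresholds of 0.050, 0.060, and 0.070, which follows the sampled fractions (0.21, 0.63, and 0.88) only approximately.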

    The comparison of overlapping competitive peptide ion assignments can be performed for other assigned peptide ions from the proteome database. For example, Figure 4A also shows data for accurately predicted interfering sequences having the same masses as the peptide ions [EAYVPATK + 2H]2+ and [LNLFLSTK + 2H]2+ from the proteins suppressor protein STM1 and isocitrate dehydrogenase, respectively. The data for competitive assignments of the former peptide ion also reveal a

    Figure 4. (A) Fraction of all possible 7-, 8-, 9-, and 10-residue sequences meeting or exceeding the prediction thresholds (x-axis) for tDs of the select peptide ions [EAYVPATK + 2H]2+ (solid circles), [QAYAVSEK + 2H]2+ (solid squares), and [LNLFLSTK + 2H]2+ (solid triangles). Because these values show the overlap between all possible sequences of the same m/z and the correct prediction, the value can be thought of as a false discovery rate (see text for details). Unique peptide ions from the complete list of all possible peptide ions that are within 0.01 Da of the select peptides are used in this analysis. The three different data sets have been fitted with sigmoidal curves (solid traces) using eq 6. (B) w values (eq 6) for the three different curves as a function of reduced tD deviation, d (see text for details), as well as a zero value. Also shown is a linear least-squares fit of the data (solid line). (C) x0 values (eq 6) for the three different curves as a function of d (see text for details), as well as a zero value. Also shown is a linear least-squares fit of the data (solid line).

    sigmoidal dependence, albeit x0 and w values are shifted to higher values (∼0.120 and ∼0.009, respectively). The curve obtained for the latter peptide ion reveals a pseudosigmoidal dependence where the x0 and w values are shifted to lower values (∼0.007 and ∼0.003, respectively). The reduced tDs for the peptide ions [EAYVPATK + 2H]2+ and [LNLFLSTK + 2H]2+ are 1.104 and 1.023. Thus it is observed that as the reduced tD increases, values for x0 and w providing the best fit to the data increase as well. This observation is somewhat intuitive as a histogram of reduced tDs at a given m/z value reveals a Gaussian distribution centered about 1.000. That is, the majority of the reduced tDs are close to unity. Therefore, higher prediction accuracy thresholds would be required to obtain matches between competitive ion assignments and experimental features exhibiting reduced tDs that are significantly removed from 1.000.

    To obtain a mathematical expression for a simple scoring scheme it is possible to use the data presented in Figure 4A. Examination of this data shows the dependence of a false discovery rate on two factors. The first factor is the overall prediction accuracy and the second factor is the magnitude of the reduced tD of the experimental peak. As described above and demonstrated in Figure 4A, these two factors are correlated. One way to estimate potential false discovery rates for data set features is to reconstruct sigmoidal curves (Figure 4A) for given reduced tD values. As a first approximation this can be accomplished by examining the dependence of w and x0 on reduced tD. In Figure 4B and C, this dependence is depicted for w and x0, respectively. Here the dependence is derived as a function of the deviation of the reduced tD from unity (d). The deviation is the fraction difference of the reduced tD from the “model” peptide ion obtained from the polynomial fit to tD versus molecular weight data. For the peptide ions [QAYAVSEK + 2H]2+, [EAYVPATK + 2H]2+, and [LNLFLSTK + 2H]2+ having reduced tDs of 1.037, 1.104, and 1.023, the deviation values are 0.037, 0.104, and 0.023, respectively. A linear least-squares fit of the data in Figure 4B and C provides the dependence of the sigmoidal curve variables on d. For the w and x0 variables this dependence is 0.0803 × d + 0.0013 and 1.1489 × d − 0.0022, respectively.
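The linear fits can be approximately reproduced from the rounded x0 and w values quoted in the text. The sketch below is an illustration under two assumptions: the "zero value" in Figure 4B and C is taken to be the point (0, 0), and ordinary (unweighted) least squares is used. Because the inputs are rounded, the coefficients come out close to, but not identical to, those reported.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Deviations d for the zero point, LNLFLSTK, QAYAVSEK, and EAYVPATK
d_values = [0.0, 0.023, 0.037, 0.104]
# Rounded fit parameters from the text; the leading 0.0 entries are an assumption
x0_values = [0.0, 0.007, 0.060, 0.120]
w_values = [0.0, 0.003, 0.006, 0.009]

x0_slope, x0_intercept = linear_fit(d_values, x0_values)  # roughly 1.20 and -0.0025
w_slope, w_intercept = linear_fit(d_values, w_values)     # roughly 0.082 and 0.0012
```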

    With the prediction accuracy dependencies on reduced tD deviation established, it is possible to construct estimated false discovery rate curves that are specific for data set features of given reduced tDs. This is accomplished by substituting the w and x0 dependencies as well as values for A and B into eq 6, yielding

    $$ I = \frac{1}{1 + e^{-\left(x - (1.1489d - 0.0022)\right)/(0.0803d + 0.0013)}} \qquad (7) $$

    It is instructive to consider the false discovery rate at the limits of high- and low-confidence matches to experimental reduced tDs. A low-confidence assignment would consist of a small reduced tD deviation and a large prediction accuracy threshold. Using values of d = 0 and x = 0.15 (a worst case scenario based on examination of database values), the exponential expression in eq 7 would approach zero and the fraction of competitive peptides predicted accurately becomes 1. A high-confidence assignment, where d = 0.15 and x = 0, would result in a fraction of accurately predicted competitive peptides approaching 0 as the exponential expression approaches 3.44 × 10^5.
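The two limiting cases can be checked numerically. The following is a minimal sketch of eq 7 using the fitted coefficients from the text; the function names are illustrative only.

```python
import math


def sigmoid_params(d):
    """Linear fits from Figure 4B and C: inflection point x0 and width w versus d."""
    x0 = 1.1489 * d - 0.0022
    w = 0.0803 * d + 0.0013
    return x0, w


def exponent(x, d):
    """Power of e in eq 7, that is, -(x - x0) / w."""
    x0, w = sigmoid_params(d)
    return -(x - x0) / w


def estimated_fdr(x, d):
    """Eq 7: estimated fraction of competing sequences matched at threshold x."""
    return 1.0 / (1.0 + math.exp(exponent(x, d)))


low_conf = estimated_fdr(x=0.15, d=0.0)   # exponent near -117.08, fraction near 1
high_conf = estimated_fdr(x=0.0, d=0.15)  # exponent near 12.75, fraction near 0
```

At the high-confidence limit, `math.exp(exponent(0.0, 0.15))` evaluates to roughly 3.44 × 10^5, in agreement with the value quoted above.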

    A simple scoring scheme for aiding peptide ion identification can be devised based on eq 7. Because the power in the exponential expression in eq 7 essentially determines the false discovery rate, this expression can be used to provide a score for potential sequence matches. For example, the power expression ranges from −117.07 to 12.74 for low- and high-confidence matches, respectively. A scoring scheme can be set up of the form

    $$ S = \left[ k - \frac{x - (1.1489d - 0.0022)}{0.0803d + 0.0013} \right] \times L \qquad (8) $$

    Here, k is an arbitrary variable used to shift the scoring range onto a positive scale. L is an arbitrary variable used to scale the output score. Values of 117.08 and 0.7703 for k and L, respectively, provide output scores that range from 0 to 100 for nearly all peptide sequences.
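One way to arrive at values close to the quoted k and L is to map the two limiting exponents of eq 7 onto 0 and 100. The sketch below does this under that assumption; it is not stated in the text that k and L were obtained in exactly this way, and the variable names are illustrative.

```python
def exponent(x, d):
    """Power of e in eq 7: -(x - x0) / w with the fitted x0(d) and w(d)."""
    x0 = 1.1489 * d - 0.0022
    w = 0.0803 * d + 0.0013
    return -(x - x0) / w


lowest = exponent(0.15, 0.0)    # low-confidence limit, about -117.08
highest = exponent(0.0, 0.15)   # high-confidence limit, about 12.75

k = -lowest                       # shifts the lowest exponent to zero
scale = 100.0 / (highest - lowest)  # stretches the range to 0..100 (the paper's L)


def score(x, d):
    """Eq 8 written in terms of the eq 7 exponent: S = (k + exponent) * L."""
    return (k + exponent(x, d)) * scale
```

Under this reading, k comes out at about 117.08 and the scale factor at about 0.7703, matching the values given for k and L.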

    To evaluate the new scoring approach, consider the peptide ion [VSGVSLLALWK + 2H]2+ from the 40S ribosomal protein S23, which has a reduced tD of 1.047. The predicted reduced tD for this peptide ion is 1.036. Using x = 0.011 and d = 0.047, S is determined to be 96.38. In the yeast proteome database used to derive the intrinsic size parameters (both arginine- and lysine-terminated peptide ions), there are 7 different peptide ions that are within ∼1 Da of the molecular weight (1171.702 Da) of the peptide ion [VSGVSLLALWK + 2H]2+. None have higher scores than the correct peptide; scores for these sequences range from 90.15 to 95.80. Here we note that caution should be used with such a scoring scheme especially when comparing values for

    Table 1. Scores for Assigned Peptide Ions: Comparison to Sequences of Similar Mass

    protein accession(a)  protein(b)                                peptide(c)    S(d)   rank(e)  number of sequences(f)
    P39741                60S ribosomal protein L35 OS Saccharomyc  QIAFPQR       92.65  1        6
    P16862                6 phosphofructokinase subunit beta OS Sa  AVAEAIQAK     87.57  1        7
    P53622                Coatomer subunit alpha OS Saccharomyces   IWDISGLR      95.57  1        6
    P12398                Heat shock protein SSC1 mitochondrial OS  IIENAEGSR     93.23  3        7
    P08524                Farnesyl pyrophosphate synthase OS Sacch  IGTDIQDNK     96.31  4        9
    P38972                Phosphoribosylformylglycinamidine syntha  VLNLPSVGSK    92.63  2        5
    P38910                10 kDa heat shock protein mitochondrial   TASGLYLPEK    90.00  3        5
    P10614                Lanosterol 14 alpha demethylase OS Sacch  GVIYDCPNSR    90.91  1        7
    Q03161                Glucose 6 phosphate 1 epimerase OS Sacch  GGIPLVFPVFGK  84.70  2        6
    P48570                Homocitrate synthase cytosolic isozyme O  SDLVDLLNIYK   96.73  1        7

    (a) Protein accession number obtained from the Protein Knowledgebase (UniProt KB at http://www.uniprot.org/uniprot/).
    (b) First 40 characters of the protein name in the Protein Knowledgebase.
    (c) Tryptic peptide sequence obtained from the LC–MS/MS analysis.
    (d) Peptide sequence score using predicted tD values and eq 8 (see text for details).
    (e) Peptide sequence score rank compared with all other parametrization peptide sequences within ∼1 Da of the precursor ion mass.
    (f) Total number of sequences within ∼1 Da of the precursor ion mass.

    species for which reduced tD deviations are significantly different. That said, the results shown above for a peptide ion exhibiting moderate prediction accuracy and reduced tD deviation are encouraging and suggest that in the future, a similar approach may be useful in helping to weed out false positive identifications by indicating more probable matches to experimental data.
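The score quoted for [VSGVSLLALWK + 2H]2+ can be reproduced from eq 8 with the stated k and L. In the sketch below, x is taken as the absolute difference between the experimental and predicted reduced tDs rounded to three decimals (0.011, as used in the text); the competitor scores are hypothetical placeholders chosen within the reported 90.15 to 95.80 range, since the individual values are not listed.

```python
def score(x, d, k=117.08, scale=0.7703):
    """Eq 8 with the k and L values given in the text."""
    x0 = 1.1489 * d - 0.0022
    w = 0.0803 * d + 0.0013
    return (k - (x - x0) / w) * scale


experimental = 1.047  # reduced tD of [VSGVSLLALWK + 2H]2+
predicted = 1.036     # reduced tD predicted from intrinsic size parameters

x = round(abs(experimental - predicted), 3)  # 0.011, as used in the text
d = round(abs(experimental - 1.0), 3)        # 0.047
s = score(x, d)                              # about 96.38

# Hypothetical competitor scores spanning the reported range
competitors = [90.15, 92.40, 95.80]
rank = 1 + sum(1 for c in competitors if c > s)
```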

    Additional comparisons of peptide ion scores are presented in Table 1. Here, the scores for 10 peptide sequences (selected at random) are listed. For half of the comparisons, the assigned peptide sequence yields the highest score when compared to other database peptide sequences within ∼1 Da in mass. In two other instances the assigned peptide sequence yields the second highest score. In the remaining three instances, the assigned peptide score is the median score or higher. It is instructive to consider the cases where the assigned peptide ion did not score as highly as other sequences. For example, the peptide ion [IGTDIQDNK + 2H]2+ yields the relatively high score of 96.31. However, it is the fourth highest score within a group containing nine total peptide sequences. Scores of the three other peptide sequences range from 96.51 to 97.07. These values are very similar to that obtained for the assigned peptide ion. In this situation, several peptides that are within ∼1 Da of the assigned peptide ion in mass have predicted tDs that are similar to the experimental tD. As such, the clustering of such high scores does not warrant discarding the assigned peptide ion sequence. Rather, additional evidence would be required to confirm the peptide ion assignment.
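The rank summary in this paragraph can be tallied directly from Table 1. The sketch below transcribes the peptide, score, rank, and group-size columns and assumes that the "number of sequences" includes the assigned peptide itself, so that a rank counts as median or better when it does not exceed (n + 1)/2.

```python
# (peptide, score S, rank, number of sequences) transcribed from Table 1
table1 = [
    ("QIAFPQR", 92.65, 1, 6),
    ("AVAEAIQAK", 87.57, 1, 7),
    ("IWDISGLR", 95.57, 1, 6),
    ("IIENAEGSR", 93.23, 3, 7),
    ("IGTDIQDNK", 96.31, 4, 9),
    ("VLNLPSVGSK", 92.63, 2, 5),
    ("TASGLYLPEK", 90.00, 3, 5),
    ("GVIYDCPNSR", 90.91, 1, 7),
    ("GGIPLVFPVFGK", 84.70, 2, 6),
    ("SDLVDLLNIYK", 96.73, 1, 7),
]

top_ranked = [p for p, _, rank, _ in table1 if rank == 1]
second_ranked = [p for p, _, rank, _ in table1 if rank == 2]
others = [(p, rank, n) for p, _, rank, n in table1 if rank > 2]

# Assumption: a rank is at or above the median when rank <= (n + 1) / 2
median_or_better = all(rank <= (n + 1) / 2 for _, rank, n in others)
```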

    It is instructive to consider the peptide ion sequences that have a higher score rank than the sequence associated with the correct identification (Table 1). Three of the assigned peptide ions have scores yielding a rank of 3 (or lower). Two of these peptide ions have the highest mass fraction of polar residues compared with all other sequences in Table 1. The third peptide ion is one of the top five ions with respect to mass fraction of polar residues. Overall, these three peptide ions have a higher average mass fraction of polar residues (52.5 ± 15.2%) compared to the other sequences (30.9 ± 12.7%) in Table 1. Currently, no sequence correlation can be drawn between incorrect peptide ion assignments and the identified ions, presumably because of the limited number of comparisons available. Additionally, no correlation can be made to exact peptide composition. However, it is noted that the incorrect sequences of higher rank for all three peptides also contain a higher mass fraction of polar residues than those sequences of lower rank. Consider the peptide ion [IIENAEGSR + 2H]2+ having a mass fraction of polar residues of 46.4%. The two database peptide ions with scores of higher rank are [AQELAEATR + 2H]2+ and [VLQDSGLEK + 2H]2+. These peptide ions have mass fractions of polar residues of 49.3 and 59.4%, respectively. The average mass fraction of polar residues for the other scored peptide ions is 34.4 ± 14.8%. This weak correlation suggests that the fraction of polar residues in peptide ion sequences can influence the scoring capability of the approach. That said, because of the limited amount of data, only a note of caution can be suggested in the scoring of such peptides. A greater elucidation of the effect of peptide ion sequence and composition on overall ion scores (and size parameters) requires the development of much larger proteome databases.

    Improving Peptide Identification Capabilities Using Ion Mobilities

    Several factors need to be addressed in order to improve the ability to aid peptide ion identification with ion mobility data. These include improvements in ion mobility instrumentation as well as in the method employed to determine intrinsic size parameters for different amino acids. As mentioned above, the development of instrumentation of higher resolving power would provide greater accuracy in the determination of ion mobilities and by association increased accuracy of intrinsic size parameters for different amino acids. In a related manner, higher resolving power may also allow the removal of interfering species affecting the mobility determination of peaks in proteomics mixtures. It is noted that a newer version of the Synapt HDMS system has recently been commercialized affording ∼3 times greater resolving power. Additionally, careful studies of T-Wave separation parameters should be explored. It may be possible that many high-mobility species are traveling at the velocity of the voltage wave and are thus not separated as efficiently as other species.

    Improvements in the determination of intrinsic size parameters may be enhanced by instrumentation developments in a different manner. For example, higher resolving power may allow the resolution of peptide ion conformer types (e.g., helices, partial helices, globules, and elongated structures). The resolution of structural types should allow increased parametrization of peptide ion subgroups. This would require the determination of correlations between peptide ion composition and sequence to conformer types. It may also be necessary to employ other methods such as molecular dynamics simulations to assign structural types. Another factor that should aid the determination of conformer types is the construction of much larger databases. Many more sequence measurements would be required. As a note of caution, with larger databases comes the problem of increasing numbers of false positives. This is particularly problematic for data to be used in the determination of intrinsic size parameters. It is noted that a weighting factor can be incorporated into eq 3 (ref 69). Such a weighting factor may include the probability score obtained from protein database searches.

    It is instructive to consider the relevance of using intrinsic amino acid size parameters to validate peptide ion assignments. In a recent publication, Zubarev and Gorshkov and their co-workers described how they addressed a basic tenet of both analytical and engineering sciences, the tenet being a requirement for “...the use of a technique for a model validation materially different (complementary) from the one employed in the model creation” (ref 72). For peptide ion identification in proteomics analyses, a verification model that is not based on a fragment ion interpretation is required. Employing retention-time modeling algorithms, the authors found many peptide sequences, even those with high scores, illustrating significant deviations from the theoretical retention times. The observation was largely attributed to the effects of chimeric spectra as opposed to bias in the independent retention-time models. Similarly, the present work illustrates how predicting the mobility and the use of a statistical strategy can provide increased specificity of database search results. Therefore, the use of accurate mass, retention time, and ion mobility all as independent metrics of peptide validation should significantly reduce the false positives in complex mixture analysis.

    CONCLUSIONS

    Intrinsic size parameters for amino acid residues have been determined for a variety of peptide groups obtained from a yeast proteome database. In general, the size parameters are very similar to those obtained from singly charged, lysine-terminated peptide ions, indicating a degree of similarity between the types of structures (or elements of structure) formed by singly and doubly charged peptide ions. Additionally, the size parameters are very similar for peptides of very different lengths (from 7 to 15 residues). These size parameters have been used to predict ion mobilities (tDs). Predictions of tDs using intrinsic size parameters are more accurate than predictions obtained from polynomial fits to tD versus molecular weight data. This ability is proposed as a means to aid peptide ion identification and a simple scoring scheme has been introduced.

    ASSOCIATED CONTENT

    Supporting Information. Supplemental Table 1. This material is available free of charge via the Internet at http://pubs.acs.org.

    AUTHOR INFORMATION

    Corresponding Author. *E-mail: [email protected].

    ACKNOWLEDGMENT

    We acknowledge support for the development of new instrumentation by a grant from the National Institutes of Health (1RC1GM090797-01). We are also grateful to Professor Rob Beynon from the University of Liverpool and Professor Chris Grant from the University of Manchester for providing the yeast samples. Finally, we are grateful to Tim Riley of the Waters Corporation for coordinating the sharing of data and helpful discussions.

    REFERENCES

    (1) Scoble, H. A.; Martin, S. A.; Biemann, K. Peptide sequencing by magnetic deflection tandem mass spectrometry. Biochem. J. 1987, 245, 621–622.

    (2) Johnson, R. S.; Martin, S. A.; Biemann, K. Collision induced fragmentation of (M+H)+ ions of peptides. Side chain specific sequence. Int. J. Mass Spectrom. Ion Processes 1998, 86, 137–154.

    (3) Hunt, D. F.; Yates, J. R., III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Protein sequencing by tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233–6237.

    (4) Wolters, D. A.; Washburn, M. P.; Yates, J. R. An Automated Multidimensional Protein Identification Technology for Shotgun Proteomics. Anal. Chem. 2001, 73, 5683–5690.

    (5) Washburn, M. P.; Wolters, D.; Yates, J. R. Large-Scale Analysis of the Yeast Proteome by Multidimensional Protein Identification Technology. Nat. Biotechnol. 2001, 19, 242–247.

    (6) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC–MS/MS) for Large-Scale Protein Analysis: The Yeast Proteome. J. Proteome Res. 2003, 2, 43–50.

    (7) Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G. Toward a Human Blood Serum Proteome: Analysis by Multidimensional Separation Coupled with Mass Spectrometry. Mol. Cell. Proteomics 2002, 1, 947–955.

    (8) Wu, S. L.; Choudhary, G.; Ramstrom, M.; Bergquist, J.; Hancock, W. S. Evaluation of Shotgun Sequencing for Proteomic Analysis of Human Plasma Using HPLC Coupled with either Ion Trap or Fourier Transform Mass Spectrometry. J. Proteome Res. 2003, 2, 383–393.

    (9) Zhou, M.; Lucas, D. A.; Chan, K. C.; Issaq, H. J.; Petricoin, E. F.; Liotta, L. A.; Veenstra, T. D.; Conrads, T. R. An Investigation into the Human Serum “Interactome”. Electrophoresis 2004, 25, 1289–1298.

    (10) Shen, Y.; Jacobs, J. M.; Camp, D. G., II; Fang, R.; Moore, R. J.; Smith, R. D.; Xiao, W.; Davis, R. W.; Tompkins, R. G. Ultra-High-Efficiency Strong Cation Exchange LC/RPLC/MS/MS for High Dynamic Range Characterization of the Human Plasma Proteome. Anal. Chem. 2004, 76, 1134–1144.

    (11) Rose, K.; Bougueleret, L.; Baussant, T.; Böhm, G.; Botti, P.; Colinge, J.; Cusin, I.; Gaertner, H.; Gleizes, A.; Heller, M.; Jimenez, S.; Johnson, A.; Kussmann, M.; Menin, L.; Menzel, C.; Ranno, F.; Rodriguez-Tomé, P.; Rogers, J.; Saudrais, C.; Villain, M.; Wetmore, D.; Bairoch, A.; Hochstrasser, D. Industrial-Scale Proteomics: From Liters of Plasma to Chemically Synthesized Proteins. Proteomics 2004, 4, 2125–2150.

    (12) Omenn, G. S.; States, D. J.; Adamski, M.; Blackwell, T. W.; Menon, R.; Hermjakob, H.; Apweiler, R.; Haab, B. B.; Simpson, R. J.; Eddes, J. S.; Kapp, E. A.; Moritz, R. L.; Chan, D. W.; Rai, A. J.; Admon, A.; Aebersold, R.; Eng, J.; Hancock, W. S.; Hefta, S. A.; Meyer, H.; Paik, Y.; Yoo, J.; Ping, P.; Pounds, J.; Adkins, J.; Qian, X.; Wang, R.; Wasinger, V.; Wu, C. Y.; Zhao, X.; Zeng, R.; Archakov, A.; Tsugita, A.; Beer, I.; Pandey, A.; Pisano, M.; Andrews, P.; Tammen, H.; Speicher, D. W.; Hanash, S. M. Overview of the HUPO Plasma Proteome Project: Results from the Pilot Phase with 35 Collaborating Laboratories and Multiple Analytical Groups, Generating a Core Dataset of 3020 Proteins and a Publicly-Available Database. Proteomics 2005, 5, 3226–3245.

    (13) Tabb, D. L. What’s driving the false discovery rate? J. Proteome Res. 2008, 7, 45–46.

    (14) Barton, S.; Richardson, S.; Perkins, D.; Bellahn, I.; Bryant, T.; Whittaker, J. Using Statistical Models To Identify Factors That Have a Role in Defining the Abundance of Ions Produced by Tandem MS. Anal. Chem. 2007, 79, 5601–5607.

    (15) Schutz, F.; Kapp, E.; Simpson, R.; Speed, T. Deriving statistical models for predicting peptide tandem MS product ion intensities. Biochem. Soc. Trans. 2003, 31, 1479–1483.

    (16) Elias, J.; Gibbons, F.; King, O.; Roth, F.; Gygi, S. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 2004, 22, 214–219.

    (17) Frank, A.; Pevzner, P. PepNovo: De Novo Peptide Sequencing via Probabilistic Network Modeling. Anal. Chem. 2005, 77, 964–973.

    (18) Tanner, S.; Shu, H.; Frank, A.; Mumby, M.; Pevzner, P.; Bafna, V. InsPecT: Fast and accurate identification of post-translationally modified peptides from tandem mass spectra. Anal. Chem. 2005, 77, 4626–4639.

    (19) Wan, Y.; Chen, T. PepHMM: A hidden Markov model based scoring function for tandem mass spectrometry. Anal. Chem. 2006, 78, 432–437.

    (20) Bern, M.; Cai, Y.; Goldberg, D. Lookup Peaks: A Hybrid of de Novo Sequencing and Database Search for Protein Identification by Tandem Mass Spectrometry. Anal. Chem. 2007, 79, 1393–1400.

    (21) Colinge, J. Peptide Fragment Intensity Statistical Modeling. Anal. Chem. 2007, 79, 7286–7290.

    (22) Klammer, A.; Reynolds, S.; Bilmes, J.; MacCoss, M.; Noble, W. Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification. Bioinformatics 2008, 24, i348–356.

    (23) Zhang, Z. Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides. Anal. Chem. 2004, 76, 3908–3922.

    (24) Zhang, Z. Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides with Three or More Charges. Anal. Chem. 2005, 77, 6364–6373.

    (25) Frank, A. M. Predicting Intensity Ranks of Peptide Fragment Ions. J. Proteome Res. 2009, 8, 2226–2240.

    (26) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039–1048.

    (27) Strittmatter, E. F.; Kangas, L. J.; Petritis, K.; Mottaz, H. M.; Anderson, G. A.; Shen, Y.; Jacobs, J. M.; Camp, D. G., II; Smith, R. D. Application of Peptide LC Retention Time Information in a Discriminant Function for Peptide Identification by Tandem Mass Spectrometry. J. Proteome Res. 2004, 3 (4), 760–769.

    (28) Krokhin, O. V.; Craig, R.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. An Improved Model for Prediction of Retention Times of Tryptic Peptides in Ion Pair Reversed-phase HPLC: Its Application to Protein Peptide Mapping by Off-Line HPLC-MALDI MS. Mol. Cell. Proteomics 2004, 3.9, 908–919.

    (29) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal. Chem. 2003, 75, 1039–1048.

    (30) Palmblad, M.; Ramstrom, M.; Markides, K. E.; Hakansson, P.; Bergquist, J. Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry. Anal. Chem. 2002, 74, 5826–5830.

    (31) Oh, C.; Zak, S. H.; Mirzaei, H.; Buck, C.; Regnier, F. E.; Zhang, X. Neural network prediction of peptide separation in strong anion exchange chromatography. Bioinformatics 2007, 23 (1), 114–118.

    (32) Hsieh, E. J.; Hoopmann, M. R.; MacLean, B.; MacCoss, M. J. Comparison of Database Search Strategies for High Precursor Mass Accuracy MS/MS Data. J. Proteome Res. 2010, 9 (2), 1138–1143.

    (33) Xu, H.; Freitas, M. A. A mass accuracy sensitive probability based scoring algorithm for database searching of tandem mass spectrometry data. BMC Bioinform. 2007, 8, 133.

    (34) Mann, M.; Kelleher, N. L. Precision proteomics: The case for high resolution and high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 18132–18138.

    (35) Levin, Y.; Jaros, J. A.; Schwarz, E.; Bahn, S. Multidimensional protein fractionation of blood proteins coupled to data-independent nanoLC–MS/MS analysis. J. Proteomics 2010, 73 (3), 689–695.

    (36) Blackburn, K.; Mbeunkui, F.; Mitra, S. K.; Mentzel, T.; Goshe, M. B. Improving Protein and Proteome Coverage through Data-Independent Multiplexed Peptide Fragmentation. J. Proteome Res. 2010, 9 (7), 3621–3637.

    (37) Valentine, S. J.; Counterman, A. E.; Hoaglund-Hyzer, C. S.; Clemmer, D. E. Intrinsic Amino Acid Size Parameters from a Series of 113 Lysine-Terminated Tryptic Digest Peptide Ions. J. Phys. Chem. B 1999, 103, 1203–1207.

    (38) Henderson, S. C.; Li, J.; Counterman, A. E.; Clemmer, D. E. Intrinsic Size Parameters for Val, Ile, Leu, Gln, Thr, Phe, and Trp Residues from Ion Mobility Measurements of Polyamino Acid Ions. J. Phys. Chem. B 1999, 103, 8780–8785.

    (39) Valentine, S. J.; Counterman, A. E.; Clemmer, D. E. A Database of 660 Peptide Ion Cross Sections: Use of Intrinsic Size Parameters for Bona Fide Predictions of Cross Sections. J. Am. Soc. Mass Spectrom. 1999, 10, 1188–1211.

    (40) Counterman, A. E.; Clemmer, D. E. Volumes of Individual Amino Acid Residues in Gas-Phase Peptide Ions. J. Am. Chem. Soc. 1999, 121, 4031–4039.

    (41) Wang, B.; Valentine, S.; Plasencia, M.; Raghuraman, S.; Zhang, X. Artificial neural networks for the prediction of peptide drift time in ion mobility mass spectrometry. BMC Bioinform. 2010, 11, 182.

    (42) For a review of IMS techniques see (and references therein): St. Louis, R. H.; Hill, H. H. Ion Mobility Spectrometry in Analytical Chemistry. Crit. Rev. Anal. Chem. 1990, 21, 321–355.

    (43) For a review of IMS techniques see (and references therein): Clemmer, D. E.; Jarrold, M. F. Ion Mobility Measurements and their Applications to Clusters and Biomolecules. J. Mass Spectrom. 1997, 32, 577–592.

    (44) Liu, Y.; Clemmer, D. E. Characterizing Oligosaccharides Using Injected-Ion Mobility/Mass Spectrometry. Anal. Chem. 1997, 69, 2504–2509.

    (45) Liu, Y.; Valentine, S. J.; Counterman, A. E.; Hoaglund, C. S.; Clemmer, D. E. Injected-ion Mobility Analysis of Biomolecules. Anal. Chem. 1997, 69, 728A.

    (46) For a review of IMS techniques see (and references therein): Bohrer, B. C.; Merenbloom, S. I.; Koeniger, S. L.; Hilderbrand, A. E.; Clemmer, D. E. Biomolecule Analysis by Ion Mobility Spectrometry. Annu. Rev. Anal. Chem. 2008, 1 (10), 1–10.

    (47) Wittmer, D.; Luckenbill, B. K.; Hill, H. H.; Chen, Y. H. Electrospray Ionization Ion Mobility Spectrometry. Anal. Chem. 1994, 66, 2348–2355.

    (48) Hoaglund, C. S.; Valentine, S. J.; Sporleder, C. R.; Reilly, J. P.; Clemmer, D. E. Three-Dimensional Ion Mobility/TOFMS Analysis of Electrosprayed Biomolecules. Anal. Chem. 1998, 70, 2236–2242.

    (49) Gillig, K. J.; Ruotolo, B.; Stone, E. G.; Russell, D. H.; Fuhrer, K.; Gonin, M.; Schultz, A. J. Coupling High-Pressure MALDI with Ion Mobility/Orthogonal Time-of Flight Mass Spectrometry. Anal. Chem. 2000, 72, 3965–3971.

    (50) Hoaglund-Hyzer, C. S.; Li, J.; Clemmer, D. E. Mobility Labeling for Parallel CID of Ion Mixtures. Anal. Chem. 2000, 72, 2737–2740.

    (51) Hoaglund-Hyzer, C. S.; Clemmer, D. E. Ion Trap/Ion Mobility/Quadrupole/Time-of-Flight Mass Spectrometry for Peptide Mixture Analysis. Anal. Chem. 2001, 73, 177–184.

    (52) Hoaglund-Hyzer, C. S.; Lee, Y. J.; Counterman, A. E.; Clemmer, D. E. Coupling Ion Mobility Separations, Collisional Activation Techniques, and Multiple Stages of MS for Analysis of Complex Peptide Mixtures. Anal. Chem. 2002, 74, 992–1006.

    (53) Tang, K.; Shvartsburg, A. A.; Lee, H. N.; Prior, D. C.; Buschbach, M. A.; Li, F. M.; Tolmachev, A. V.; Anderson, G. A.; Smith, R. D. High-Sensitivity Ion Mobility Spectrometry/Mass Spectrometry Using Electrodynamic Ion Funnel Interfaces. Anal. Chem. 2005, 77, 3330–3339.

    (54) Koeniger, S. L.; Merenbloom, S. I.; Valentine, S. J.; Jarrold, M. F.; Udseth, H. R.; Smith, R. D.; Clemmer, D. E. An IMS-IMS Analogue of MS-MS. Anal. Chem. 2006, 78, 4161.

    (55) Revercomb, H. E.; Mason, E. A. Theory of Plasma Chromatography Gaseous Electrophoresis - Review. Anal. Chem. 1975, 47, 970–983.

    (56) Mason, E. A.; McDaniel, E. W. Transport Properties of Ions in Gases; Wiley: New York, 1988.

    (57) Shvartsburg, A. A.; Jarrold, M. F. An exact hard-spheres scattering model for the mobilities of polyatomic ions. Chem. Phys. Lett. 1996, 261, 86–91.

    (58) Mesleh, M. F.; Hunter, J. M.; Shvartsburg, A. A.; Schatz, G. C.; Jarrold, M. F. Structural information from ion mobility measurements: effects of the long-range potential. J. Phys. Chem. 1996, 100, 16082–16086.

    (59) Wyttenbach, T.; von Helden, G.; Batka, J. J.; Carlat, D.; Bowers, M. T. Effect of the long-range potential on ion mobility measurements. J. Am. Chem. Soc. 1997, 8, 275–282.

    (60) Valentine, S. J.; Kulchania, M.; Srebalus Barnes, C. A.; Clemmer, D. E. Multidimensional separations of complex peptide mixtures: a combined high-performance liquid chromatography/ion mobility/time-of-flight mass spectrometry approach. Int. J. Mass Spectrom. 2001, 212, 97–109.

    (61) Srebalus Barnes, C. A.; Hilderbrand, A. E.; Valentine, S. J.; Clemmer, D. E. Resolving Isomeric Peptide Mixtures: A Combined HPLC/Ion Mobility-TOFMS Analysis of a 4000-Component Combinatorial Library. Anal. Chem. 2002, 74, 26–36.

    (62) Moon, M. H.; Myung, S.; Plasencia, M.; Hilderbrand, A. E.; Clemmer, D. E. Nanoflow LC/Ion Mobility/CID/TOF for Proteomics: Analysis of a Human Urinary Proteome. J. Proteome Res. 2003, 2, 589–597.

    (63) Taraszka, J. A.; Kurulugama, R.; Sowell, R.; Valentine, S. J.; Koeniger, S. L.; Arnold, R. J.; Miller, D. F.; Kaufman, T. C.; Clemmer, D. E. Mapping the Proteome of Drosophila melanogaster: Analysis of Embryos and Adult Heads by LC-IMS-MS Methods. J. Proteome Res. 2005, 4, 1223–1237.

    (64) Valentine, S. J.; Plasencia, M. D.; Liu, X.; Krishnan, M.; Naylor, S.; Udseth, H. R.; Smith, R. D.; Clemmer, D. E. Toward Plasma Proteome Profiling with Ion Mobility-Mass Spectrometry. J. Proteome Res. 2006, 5, 2977–2984.

    (65) Liu, X.; Valentine, S. J.; Plasencia, M. D.; Trimpin, S.; Naylor, S.; Clemmer, D. E. Mapping the Human Plasma Proteome by SCX-LC-IMS-MS. J. Am. Soc. Mass Spectrom. 2007, 18, 1249–1264.

    (66) Silva, J. C.; Denny, R.; Dorschel, C. A.; Gorenstein, M.; Kass, I. J.; Li, G. Z.; McKenna, T.; Nold, M. J.; Richardson, K.; Young, P.; Geromanos, S. Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 2005, 77 (7), 2187–2200.

    (67) The Mathworks, Inc. http://www.mathworks.com/products/matlab/.

    (68) Leon, S. J. Linear Algebra with Applications, 3rd ed.; Macmillan: New York, 1990; p 208.

    (69) http://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares.

    (70) Bethea, R. M.; Duran, B. S.; Boullion, T. L. Statistical Methods for Engineers and Scientists, 2nd ed.; Marcel Dekker: New York, 1985.

    (71) Ledvij, M. Curve fitting made easy. http://physik.uibk.ac.at/hephy/muon/origin_curve_fitting_primer.pdf.

    (72) Goloborodko, A. A.; Mayerhofer, C.; Zubarev, A. R.; Tarasova, I. A.; Gorshkov, A. V.; Zubarev, R. A.; Gorshkov, M. G. Empirical approach to false discovery rate estimation in shotgun proteomics. Rapid Commun. Mass Spectrom. 2010, 24, 454–462.