Invited Speaker Cristina C. Clement, Ph.D.

58
The Student Research and Scholarship Center Grove School of Engineering, And Pathways Bioinformatics Center , CCNY Present Winter Bioinformatics Workshop 10am-6pm, January 20 – 21, 2009 Location: Marshak, Room MR-044 Contact info: [email protected] , 212-650-8870 Invited Speaker Cristina C. Clement, Ph.D. Albert Einstein College of Medicine, Pathology Department Topic: Bioinformatics tools for proteins identification from primary sequence databases using mass spectrometry data January 21, Marshak Building, Room MR-044

description

The Student Research and Scholarship Center Grove School of Engineering, And Pathways Bioinformatics Center , CCNY Present Winter Bioinformatics Workshop 10am-6pm, January 20 – 21, 2009 Location: Marshak , Room MR-044 Contact info: [email protected] , 212-650-8870. Invited Speaker - PowerPoint PPT Presentation

Transcript of Invited Speaker Cristina C. Clement, Ph.D.

Page 1: Invited Speaker Cristina C. Clement, Ph.D.

The Student Research and Scholarship CenterGrove School of Engineering,

And Pathways Bioinformatics Center , CCNYPresent

Winter Bioinformatics Workshop 10am-6pm, January 20 – 21, 2009

Location: Marshak, Room MR-044

Contact info: [email protected], 212-650-8870

Invited Speaker

Cristina C. Clement, Ph.D. Albert Einstein College of Medicine, Pathology Department

Topic: Bioinformatics tools for proteins identification from primary sequence databases

using mass spectrometry data

January 21, Marshak Building, Room MR-044

1:00pm-2:00pm: Presentation 2:30pm-4:00pm: Online practice

Page 2: Invited Speaker Cristina C. Clement, Ph.D.
Page 3: Invited Speaker Cristina C. Clement, Ph.D.
Page 4: Invited Speaker Cristina C. Clement, Ph.D.
Page 5: Invited Speaker Cristina C. Clement, Ph.D.

Accuracy & Resolution in Mass SpectroscopyWhen low molecular weight samples are being analysed using relatively low resolution mass spectrometers, it is common to work with "nominal" mass values, calculated from integer atomic weights. That is, H=1, C=12, N=14, O=16, etc. Nominal mass is rarely used in peptide and protein work because the cumulative error of approximating atomic weights with integers becomes unacceptable.

The presence of isotopes at their natural abundances makes it essential to define whether an experimental mass value is an "average" value, equivalent to taking the centroid of the complete isotopic envelope, or a "monoisotopic" value, the mass of the first peak of the isotope distribution.

For peptides and proteins, the difference between an average and a monoisotopic weight is approximately 0.06%. This is a significant difference when even the most modest instruments are capable of measuring the mass of a small peptide with an accuracy of a fraction of a Dalton. For example, peptide HLKTEAEMK has an average molecular weight of 1086.28 and a monoisotopic weight of 1085.55. At a mass resolution of 5000, the isotopic envelope has this appearance:

m/z

Page 6: Invited Speaker Cristina C. Clement, Ph.D.

Mass resolution is the dimensionless ratio of the mass of the peak divided by its width. Usually, the peak width is taken as the full width at half maximum intensity, (fwhm). However, this definition of peak width is only a convention, and you may also encounter data acquired on magnetic sector instruments where the resolution has been calculated using the peak width at 5% maximum intensity.

To measure a monoisotopic molecular weight requires (i) sufficient mass resolution to resolve the the isotopic distribution (ii) sufficient signal to noise to be able to identify the first peak of the envelope with confidence. For a small peptide, the first peak (often referred to as the 12C peak) is also the most intense peak. This is not the case for larger molecules. The following two examples show the isotopic envelopes for a small protein (insulin) and a larger protein (BSA):

It would be extremely difficult to measure a monoisotopic mass for BSA, and it is far from routine to measure a monoisotopic mass for insulin. In practice, most instruments report monoisotopic molecular weights up to a certain cut-off point. Above this

cut-off, isotopic envelopes are centroided as a whole to provide average mass values.

Accuracy & Resolution in Mass Spectroscopy

m/z m/z

Page 7: Invited Speaker Cristina C. Clement, Ph.D.

Accuracy & Resolution in Mass Spectroscopy

The factor which complicates any general discussion of resolution optimisation is that some types of mass analyser have a trade-off between resolution and sensitivity, while others do not.

Where a monoisotopic peak for a single molecular species can be resolved, mass accuracy tends to follow resolution. This is because the narrower the peak, the less the significance of errors due to variations in the peak shape. So, if unit mass resolution is possible, then the more resolution the better ... unless there is a sensitivity trade-off.

If unit mass resolution is not possible, then there is little benefit to exceeding the instrument resolution at which the isotopic envelope can be defined without significant broadening. For example, the following figure shows the molecular ion of glucagon at resolutions of 1000 (blue), 3000 (red), 10,000 (green) and 30,000 (black).

For an average mass measurement, and where there is no trade-off between sensitivity and resolution, the accuracy at 3000 resolution (red) will be just as good as at higher resolution. On an instrument where a trade-off exists, using a resolution greater than 3000 is very likely to degrade mass accuracy.

m/z

Page 8: Invited Speaker Cristina C. Clement, Ph.D.

In this MS-MS approach (depicted above) a first mass spectrometer (MS-1) that employs a quadrupole mass filter is tuned to allow only the analyte ion of interest (e.g. red above) through.

This is then taken into a collision cell where Argon is used to fragment the analyte, and the so-called daughter ions are then swept into a second time-of-flight MS (MS-2) where they are separated and detected.

Page 9: Invited Speaker Cristina C. Clement, Ph.D.
Page 10: Invited Speaker Cristina C. Clement, Ph.D.
Page 11: Invited Speaker Cristina C. Clement, Ph.D.
Page 12: Invited Speaker Cristina C. Clement, Ph.D.
Page 13: Invited Speaker Cristina C. Clement, Ph.D.
Page 14: Invited Speaker Cristina C. Clement, Ph.D.
Page 15: Invited Speaker Cristina C. Clement, Ph.D.
Page 16: Invited Speaker Cristina C. Clement, Ph.D.
Page 17: Invited Speaker Cristina C. Clement, Ph.D.

The most common peptide fragments observed in low energy collisions are a, b and y ions, as described in the figure above. The b ions appear to extend from the amino terminus, sometimes called the N-terminus, and y ions appear to extend from the carboxyl terminus, or C-terminus. While readily observed and diagnostic for b ions, a ions occur at a lower frequency and abundance in relation to b ions. The a ions are often used as a diagnostic for b ions, such that a-b pairs are often observed in fragment spectra. The a-b pairs are separated by 28u, the mass for the carbonyl, C=O. The fragment types listed above are the most common fragments observed with ion trap, triple quadrupole, and q-TOF mass spectrometers.

Page 18: Invited Speaker Cristina C. Clement, Ph.D.
Page 19: Invited Speaker Cristina C. Clement, Ph.D.

De novo sequencing of peptides

This is an MS/MS spectrum of the tryptic peptide GLSDGEWQQVLNVWGK. This data was collected on an ion trap mass spectrometer. This spectrum will be the subject of our first unblinded de novo sequencing example.

The sequence of the peptide is determined by the mass difference between these peaks.

Page 20: Invited Speaker Cristina C. Clement, Ph.D.

The b fragment peaks are labeled from the amino to the carboxyl terminus. The fragment containing only the amino terminal amino acid is termed b1. The fragment containing the first two amino terminal amino acids is termed the b2 ion, and so forth. The nomenclature is very simple to follow.

You can calculate the mass of any b ion, basically it is the mass of the shortened peptide (M)-17 (OH) = b ion m/z or more simply M-17 = b ion m/z.

Page 21: Invited Speaker Cristina C. Clement, Ph.D.

Shows the first six b ions in a little bit more detail. The b ion m/z value is basically the mass of the peptide minus OH, or -17u.

Page 22: Invited Speaker Cristina C. Clement, Ph.D.
Page 23: Invited Speaker Cristina C. Clement, Ph.D.
Page 24: Invited Speaker Cristina C. Clement, Ph.D.

Loss of Ammonia and Water

y and b ion fragments containing the amino acid residues R, K, Q, and N may appear to lose ammonia, -17.-

y and b ion fragments containing the amino acid residues S, T, and E may appear to lose water, -18. In the case of glutamic acid, E must be at the N-terminus of the fragment for this observation to be made.

Spectral Intensity Rules

b ion intensity will drop when the next residue is P, G or also H, K, and R.-

Internal cleavages can occur at P and H residues. An internal cleavage fragment is a fragment that appears to be a shortened peptide with P and or H at its amino terminus, for example the peptide EFGLPGLQNK may display the b ions PGLQNK, PGLQN, PGLQ, etc. These are the result of a double cleavage event. The y ion intensity will often be the most

prominent peak in the spectrum. -

It is common for b and y ions or y and b ions to swap intensity when a P is encountered in a sequence. This can also be true when the basic residues H, K, or R are encountered in the sequence.

- When a cleavage appears before or after R, the -17 (loss of ammonia) peak can be more prominent than the

corresponding y or b ion.-

When encountering aspartic acid in a sequence, the ion series can die out.

Amino Acid Composition

It is possible to observe immonium ions at the low end of the spectrum that can give a clue to the amino acid composition of a peptide. One caveat is that if you do not see an immonium ion for a particular amino acid, this does

not mean that that amino acid is absent from the sequence.

Page 25: Invited Speaker Cristina C. Clement, Ph.D.

Isobaric MassLeucine and Isolucine have isobaric masses and cannot be differentiated in a low energy collision.

When we see this mass difference in a spectrum we will label it X or Lxx, adopting the Hunt nomenclature.

- Lysine and Glutamine have near isobaric masses, 128.09496 and 128.05858 respectively. The

delta mass is 0.03638 this difference can be used to differentiate K from Q on a mass spectroneter capable of higher mass accuracy and resolution, such as a  q-TOF mass spectrometer. Usually triple quadrupole or ion trap mass spectrometers are incapable of this feat.  On a lower mass

accuracy mass spectrometers an acetylation can be performed to shift the mass of lysine by 42u.  If you like to live dangerously, and we do not, one can assume that a 128 mass shift internally on a

tryptic peptide is a glutamine unless followed by a proline or sometimes aspartic acid.  Other instances of internal lysines left standing after a tryptic digest (this is our personal observation) is

when double lysines occur in a sequence, so be careful.-

There are instances where two residues will nearly equal the mass of a single residue, or a modified residue will nearly equal the mass of another amino acid.

Page 26: Invited Speaker Cristina C. Clement, Ph.D.

More Rules

When starting a de novo sequencing project, start at the high mass end of the spectrum; the lower number of peaks at this end often makes it easier to start sequencing.

 - The region 60 u below the parent mass can be confounded by multiple water and ammonia losses,

be careful.  Realize that glycine may be your first amino acid and may fall in this region. -

Do you want to know if your tryptic peptide ends in a K or an R?  Look for the diagnostic y1 ions at the low end of the spectrum, you may observe 147 for K or 175 for R.

- The b1 fragment is seldom observed making it difficult to determine the order of the first two N-terminal amino acids in a peptide sequence.  Solutions for this problem can include a one step

Edman degradation or an acetylation.-

Once you know the mass of a b or y ion the corresponding y or b ion can be calculated using the following formulas.

- y = (M+H)1+ - b +1

- b = (M+H)1+ - y +1

Page 27: Invited Speaker Cristina C. Clement, Ph.D.
Page 28: Invited Speaker Cristina C. Clement, Ph.D.
Page 29: Invited Speaker Cristina C. Clement, Ph.D.

http://www.matrixscience.com/

Page 30: Invited Speaker Cristina C. Clement, Ph.D.

Mascot Search OverviewMascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases. While a number of similar programs available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:

Peptide Mass Fingerprint in which the only experimental data are peptide mass values.Sequence Query in which peptide mass data are combined with amino acid sequence and composition information. A super-set of a sequence tag query.MS/MS Ion Search using uninterpreted MS/MS data from one or more peptides.

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.The sequence databases that can be searched on this server are:MSDB is a comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London. MSDB is designed specifically for mass spectrometry applications. NCBInr is a comprehensive, non-identical protein database maintained by NCBI for use with their search tools BLAST and Entrez. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB. SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical, so you may get fewer matches for an MS/MS search than you would from a comprehensive database, such as MSDB or NCBInr. SwissProt is ideal for peptide mass fingerprint searches. dbEST is the division of GenBank that contains "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. dbEST is a very large database, and is divided into three sections: EST_human, EST_mouse, and EST_others. Even so, searches of these databases take far longer than a search of one of the non-redundant protein databases. You should only search an EST database if a search of a protein database has failed to find a match.

http://www.matrixscience.com/

Page 31: Invited Speaker Cristina C. Clement, Ph.D.

http://www.matrixscience.com/

Page 32: Invited Speaker Cristina C. Clement, Ph.D.
Page 33: Invited Speaker Cristina C. Clement, Ph.D.
Page 34: Invited Speaker Cristina C. Clement, Ph.D.
Page 35: Invited Speaker Cristina C. Clement, Ph.D.
Page 36: Invited Speaker Cristina C. Clement, Ph.D.
Page 37: Invited Speaker Cristina C. Clement, Ph.D.
Page 38: Invited Speaker Cristina C. Clement, Ph.D.
Page 39: Invited Speaker Cristina C. Clement, Ph.D.
Page 40: Invited Speaker Cristina C. Clement, Ph.D.
Page 41: Invited Speaker Cristina C. Clement, Ph.D.
Page 42: Invited Speaker Cristina C. Clement, Ph.D.
Page 43: Invited Speaker Cristina C. Clement, Ph.D.
Page 44: Invited Speaker Cristina C. Clement, Ph.D.
Page 45: Invited Speaker Cristina C. Clement, Ph.D.
Page 46: Invited Speaker Cristina C. Clement, Ph.D.
Page 47: Invited Speaker Cristina C. Clement, Ph.D.
Page 48: Invited Speaker Cristina C. Clement, Ph.D.
Page 49: Invited Speaker Cristina C. Clement, Ph.D.
Page 50: Invited Speaker Cristina C. Clement, Ph.D.

All matches to this query

ScoreMr(calc): Delta Sequence

46.1 1604.8535 0.0046 THRIHWESASLLR

31.0 1605.7594 -0.9012 SSNSSPSRXSLGQLSE

30.2 1604.8133 0.0448 PRFCASLAGGAWLSGL

28.8 1605.7006 -0.8424 ETKEGTEGEGLQEEA

28.8 1605.7006 -0.8424 ETKEGTEGEGLQEEX

28.8 1604.7722 0.0859 GGSLYEAPVSYTFSK

28.5 1605.8627 -1.0045 APVIPAPWEAKAGGSR

27.6 1604.7715 0.0866 EEGTAASLADIMEIR

27.0 1603.8318 1.0264 HVLAHSESINVIAQS

26.9 1604.7464 0.1118 SSSILAMRDEQSNPA

Protein ViewMatch to: gi|226159 Score: 46: complement C3fNominal mass (Mr): 2020; NCBI BLAST search of gi|226159 against nrUnformatted sequence string for pasting into other applicationsTaxonomy: Homo sapiensNo enzyme cleavage specificitySequence Coverage: 76%Matched peptides shown in Bold Red 1 SSKITHRIHW ESASLLR

Peptide ViewMS/MS Fragmentation of THRIHWESASLLRFound in gi|226159, complement C3fMatch to Query 1: 1604.858172 from(535.960000,3+)

Mascot search of MS/MS databasesCristina C. Clement, unpublished results

Page 51: Invited Speaker Cristina C. Clement, Ph.D.

Monoisotopic mass of neutral peptide Mr(calc): 1604.8535Ions Score: 46 Expect: 5.4 Matches (Bold Red): 43/134 fragment ions using 51 most intense peaks

# b b++ b* b*++ b0 b0++ Seq.

y y++ y* y*++ y0 y0++ #

1 102.0550 51.5311     84.0444 42.5258 T             13

2 239.1139 120.0606     221.1033111.0553

H1504.8132

752.9102

1487.7866

744.3969

1486.8026

743.9049

12

3 395.2150 198.1111378.1884

189.5979

377.2044189.1058

R1367.7542

684.3808

1350.7277

675.8675

1349.7437

675.3755

11

4 508.2990 254.6532491.2725

246.1399

490.2885245.6479

I1211.6531

606.3302

1194.6266

597.8169

1193.6426

597.3249

10

5 645.3580 323.1826628.3314

314.6693

627.3474314.1773

H1098.5691

549.7882

1081.5425

541.2749

1080.5585

540.7829

9

6 831.4373 416.2223814.4107

407.7090

813.4267407.2170

W 961.5102481.2587

944.4836472.7454

943.4996472.2534

8

7 960.4799 480.7436943.4533

472.2303

942.4693471.7383

E 775.4308388.2191

758.4043379.7058

757.4203379.2138

7

8 1047.5119 524.25961030.4853

515.7463

1029.5013

515.2543

S 646.3882323.6978

629.3617315.1845

628.3777314.6925

6

9 1118.5490 559.77811101.5225

551.2649

1100.5384

550.7729

A 559.3562280.1817

542.3297271.6685

541.3457271.1765

5

10

1205.5810 603.29421188.5545

594.7809

1187.5705

594.2889

S 488.3191244.6632

471.2926236.1499

470.3085235.6579

4

11

1318.6651 659.83621301.6385

651.3229

1300.6545

650.8309

L 401.2871201.1472

384.2605192.6339

    3

12

1431.7492 716.37821414.7226

707.8649

1413.7386

707.3729

L 288.2030144.6051

271.1765136.0919

    2

13

            R 175.1190 88.0631 158.0924 79.5498     1

Cristina C. Clement, unpublished results

Page 52: Invited Speaker Cristina C. Clement, Ph.D.

LII_3 #8410 RT: 48.42 AV: 1 NL: 1.95E5T: ITMS + p NSI d Z ms [ 546.00-556.00]

546 547 548 549 550 551 552 553 554 555 556m/z

0

10

20

30

40

50

60

70

80

90

100

Re

lative

Ab

un

da

nce

549.80

550.24

550.72

551.24555.26551.82546.70 548.88547.18 553.54548.40 552.64 555.74554.12

550.52-LTQ (2+)

LII_3 #8411 RT: 48.42 AV: 1 NL: 1.93E5T: ITMS + c NSI d Full ms2 [email protected] [ 140.00-1115.00]

200 300 400 500 600 700 800 900 1000 1100m/z

0

10

20

30

40

50

60

70

80

90

100

Re

lative

Ab

un

da

nce

775.48

324.25

961.56

646.52

540.90296.28

435.30

472.80757.55 924.45195.21 279.26 811.45401.51 611.35 698.42 971.52896.33 1054.20

Cristina C. Clement, unpublished results

Page 53: Invited Speaker Cristina C. Clement, Ph.D.

All matches to this query

ScoreMr(calc): Delta Sequence

45.2 1097.5618 0.0237 HWESASLLR

30.0 1097.5651 0.0203 HGKEMDLLR

29.2 1098.4441 -0.8586 GFTFSASDMH

27.6 1097.5869 -0.0015 AFSFSSALIR

26.9 1098.4652 -0.8797 HSTYSSLMSS

26.7 1097.4989 0.0865 HGEEASSAIPT

24.6 1097.6094 -0.0240 HQGKLVFNR

24.1 1098.4764 -0.8910 HGEEGMGQGVV

24.0 1097.5353 0.0502 HGEKEEELK

24.0 1099.4274 -1.8420 QSQKSSMDSC

Protein ViewMatch to: gi|226159 Score: 42: complement C3fNominal mass (Mr): 2020; NCBI BLAST search of gi|226159 against nrUnformatted sequence string for pasting into other applicationsTaxonomy: Homo sapiens

No enzyme cleavage specificitySequence Coverage: 52%

Matched peptides shown in Bold Red

1 SSKITHRIHW ESASLLR

Peptide ViewMS/MS Fragmentation of HWESASLLRFound in gi|226159, complement C3fMatch to Query 1: 1097.585448 from(549.800000,2+)

Cristina C. Clement, unpublished results

Page 54: Invited Speaker Cristina C. Clement, Ph.D.

Monoisotopic mass of neutral peptide Mr(calc): 1097.5618Ions Score: 45 Expect: 6.5 Matches (Bold Red): 26/70 fragment ions using 37 most intense peaks

# b b++ b0 b0++ Seq. y y++ y* y*++ y0 y0++ #

1138.066

269.5367     H             9

2324.145

5162.576

4    W

961.5102

481.2587

944.4836

472.7454

943.4996

472.2534

8

3453.188

1227.097

7435.177

5218.092

4E

775.4308

388.2191

758.4043

379.7058

757.4203

379.2138

7

4540.220

1270.613

7522.209

6261.608

4S

646.3882

323.6978

629.3617

315.1845

628.3777

314.6925

6

5611.257

2306.132

3593.246

7297.127

0A

559.3562

280.1817

542.3297

271.6685

541.3457

271.1765

5

6698.289

3349.648

3680.278

7340.643

0S

488.3191

244.6632

471.2926

236.1499

470.3085

235.6579

4

7811.373

3406.190

3793.362

8397.185

0L

401.2871

201.1472

384.2605

192.6339

    3

8924.457

4462.732

3906.446

8453.727

1L

288.2030

144.6051

271.1765

136.0919

    2

9         R175.119

088.0631

158.0924

79.5498     1

1 SSKITHRIHW ESASLLR - 1st hit for complement c3f

1 SSKITHRIHW ESASLLR - 2nd hit for complement c3f

Cristina C. Clement, unpublished results

Page 55: Invited Speaker Cristina C. Clement, Ph.D.

Monoisotopic:733.32, Charge= +2

LII_3 #12446 RT: 72.23 AV: 1 NL: 7.50E3T: ITMS + p NSI d Z ms [ 729.00-739.00]

729 730 731 732 733 734 735 736 737 738 739m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rela

tive A

bundance

733.32

733.76

734.22

734.72

735.18729.38 731.20 737.22730.04 731.82 735.48 738.30

LII_3 #12447 RT: 72.23 AV: 1 NL: 6.88E2T: ITMS + c NSI d Full ms2 [email protected] [ 190.00-1480.00]

200 400 600 800 1000 1200 1400m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bund

ance

645.41

445.28 574.35

758.45

1077.53821.45

1021.39905.49

1291.42561.34 1135.51388.31

331.27 1192.391360.78

Cristina C. Clement, unpublished results

Probability Based Mowse ScoreIons score is -10*Log(P), where P is the probability that the observed match is a random event.Individual ions scores > 56 indicate peptides with significant homology.Individual ions scores > 73 indicate identity or extensive homology (p<0.05).Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.

Page 56: Invited Speaker Cristina C. Clement, Ph.D.

Peptide View MS/MS Fragmentation of DSGEGDFLAEGGGVRFound in gi|229185, fibrinopeptide AMatch to Query 1: 1464.625448 from(733.320000,2+)Data file C:\Documents and Settings\Laura Santambrogio\Desktop\rksah062708\LTQ\LTQ-ccc\LII_3RAW071608\12446Raw_2+.txt

# b b++ b0 b0++ Seq. y y++ y* y*++ y0 y0++ #1 116.0342 58.5207 98.0237 49.5155 D             15

2 203.0662 102.0368 185.0557 93.0315 S 1350.6284 675.8179 1333.6019 667.3046 1332.6179 666.8126 14

3 260.0877 130.5475 242.0771 121.5422 G 1263.5964 632.3018 1246.5699 623.7886 1245.5858 623.2966 13

4 389.1303 195.0688 371.1197 186.0635 E 1206.5749 603.7911 1189.5484 595.2778 1188.5644 594.7858 12

5 446.1518 223.5795 428.1412 214.5742 G 1077.5323 539.2698 1060.5058 530.7565 1059.5218 530.2645 11

6 561.1787 281.0930 543.1681 272.0877 D 1020.5109 510.7591 1003.4843 502.2458 1002.5003 501.7538 10

7 708.2471 354.6272 690.2366 345.6219 F 905.4839 453.2456 888.4574 444.7323 887.4734 444.2403 9

8 821.3312 411.1692 803.3206 402.1639 L 758.4155 379.7114 741.3890 371.1981 740.4050 370.7061 8

9 892.3683 446.6878 874.3577 437.6825 A 645.3315 323.1694 628.3049 314.6561 627.3209 314.1641 7

10 1021.4109 511.2091 1003.4003 502.2038 E 574.2944 287.6508 557.2678 279.1375 556.2838 278.6455 6

11 1078.4324 539.7198 1060.4218 530.7145 G 445.2518 223.1295 428.2252 214.6162     5

12 1135.4538 568.2305 1117.4433 559.2253 G 388.2303 194.6188 371.2037 186.1055     4

13 1192.4753 596.7413 1174.4647 587.7360 G 331.2088 166.1081 314.1823 157.5948     3

14 1291.5437 646.2755 1273.5331 637.2702 V 274.1874 137.5973 257.1608 129.0840     2

15         R 175.1190 88.0631 158.0924 79.5498     1

Monoisotopic mass of neutral peptide Mr(calc): 1464.6481Ions Score: 123 Expect: 5.3e-07 Matches (Bold Red): 38/130 fragment ions using 34 most intense peaks

Cristina C. Clement, unpublished results

Page 57: Invited Speaker Cristina C. Clement, Ph.D.

Score Mr(calc): Delta Sequence

123.5 1464.6481 -0.0227 DSGEGDFLAEGGGVR

79.8 1463.6641 0.9614 DSGEGDFLAEGGGVR

37.2 1464.6079 0.0176 LDLCQDSFPGNPTG

36.6 1467.6776 -3.0521 EMYRNLAQGRNV

36.6 1467.6776 -3.0521 EMYRNLAQGRNV

35.6 1464.6721 -0.0466 CARGWAFDIWGQG

33.7 1464.7606 -0.1352 SSVGTEMIITKAGR

33.5 1465.6645 -1.0390 RSSGGETETTGQSAV

33.5 1466.6485 -2.0231 RSSGGETETTGQSAV

32.8 1464.6568 -0.0314 CARDQAFDIWGQG

1 MFSMRIVCLV LSVVGTAWTA DSGEGDFLAE GGGVRGPRVV ERHQSACKDS 51 DWPFCSDEDW NYKCPSGCRM KGLIDEVNQD FTNRINKLKN SLFEYQKNNK 101 DSHSLTTNIM EILRGDFSSA NNRDNTYNRV SEDLRSRIEV LKRKVIEKVQ 151 HIQLLQKNVR AQLVDMKRLE VDIDIKIRSC RGSCSRALAR EVDLKDYEDQ 201 QKQLEQVIAK DLLPSRDRQH LPLHSSLGDR ARLHLKTNKT AKKKKKKKKK

Cristina C. Clement, unpublished results

Page 58: Invited Speaker Cristina C. Clement, Ph.D.

Bioinformatics Workshop project

Leading project supervisor: Cristina C. Clement, Ph.D.

Set up the MS/MS Ion search parameters using Mascot algorithmhttp://www.matrixscience.com/

Import txt. files of ms/ms data at the Mascot server ON LINE(3 independent txt files are provided/student)

01/21/2009

Analyze Mascot results

-Analyze Peptide View, understand the scoring functions -Analyze the Protein View, understand the sequence coverage

Re-analyze the txt MS/MS files using independent searching algorithmsat PROSPECT and PROFOUND webservers