MasSPIKE (Mass SPectrum Interpretation and Kernel Extraction) for Biological Samples
Parminder Kaur, Konstantin Aizikov, Bogdan Budnik and Peter B. O’Connor
Department of Electrical and Computer Engineering, Boston University
Cardiovascular Proteomics Center, Boston University School of Medicine
Mass Spectrometry Resource, Department of Biochemistry, Boston University School of Medicine
Department of Bioinformatics, Boston University
MasSPIKE (Mass SPectrum Interpretation and Kernel Extraction) for Biological Samples – p.1
Introduction
Goal - Reducing complex mass spectra into monoisotopic mass lists
Noise Baseline Modelling
Isotopic Distribution (ID) Identification
Charge State Determination
Picking Experimental Isotopic Peaks
Alignment of a Theoretical Isotopic Distribution (TID) with the Experimental Isotopic Distribution (EID)
Generating the Monoisotopic Mass List
Matching Observed Masses against Theoretical Fragment Masses from Given Sequence
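The last two steps can be illustrated with a minimal sketch. It assumes positive-mode ions that carry their charge as protons, and it uses a hypothetical 10 ppm matching tolerance; neither detail is stated on the slides, and the function names are illustrative only.

```python
PROTON_MASS = 1.007276466  # Da, mass of a proton


def neutral_mass(mz, z):
    """Neutral monoisotopic mass from the monoisotopic m/z and charge z.

    Assumes the ion is [M + zH]^z+ (an assumption, not from the slides).
    """
    return z * (mz - PROTON_MASS)


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed, theoretical, tol_ppm=10.0):
    """Pair each observed mass with every theoretical mass within tol_ppm."""
    matches = []
    for obs in observed:
        for theo in theoretical:
            if abs(ppm_error(obs, theo)) <= tol_ppm:
                matches.append((obs, theo))
    return matches
```

A tighter or looser tolerance would normally be chosen from the mass accuracy of the instrument.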
Noise Baseline Modelling
Baseline of a top-down spectrum of bovine carbonic anhydrase (BCA) (blue); noise mean vs m/z (white)
Model based on the mean of the signal across m/z range
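A mean-based baseline of this kind could be sketched as follows. The window count, the plain per-window mean and the linear interpolation between window centres are assumptions made for illustration; the slides only say that the model is based on the mean of the signal across the m/z range.

```python
import numpy as np


def noise_baseline(mz, intensity, n_windows=20):
    """Estimate a noise baseline as the per-window mean of the signal.

    mz must be sorted in increasing order. n_windows is a hypothetical
    tuning parameter, not a value taken from the slides.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    edges = np.linspace(mz[0], mz[-1], n_windows + 1)
    centers, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (mz >= lo) & (mz <= hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            means.append(intensity[mask].mean())
    # Interpolate the window means back onto every m/z point.
    return np.interp(mz, centers, means)
```

A plain mean is pulled upwards by strong peaks, so a practical version would probably exclude or down-weight points that lie far above the local level.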
Isotopic Distribution (ID) Identification
(a) Top-down spectrum of BCA with red and green lines indicating the start and end of IDs (b) Zoomed-in view
Isotopic distribution identification uses (default) S/N=3 as a trigger threshold
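The S/N trigger can be sketched as a search for runs of points at or above the threshold. The max_gap parameter, which merges runs separated by short valleys between isotopic peaks, is a hypothetical addition; the slides do not describe how the start and end of an ID are decided beyond the threshold itself.

```python
import numpy as np


def find_distributions(intensity, noise, snr_threshold=3.0, max_gap=0):
    """Return (start, end) index pairs of regions with S/N >= snr_threshold.

    Regions separated by at most max_gap below-threshold points are merged.
    """
    snr = np.asarray(intensity, dtype=float) / np.asarray(noise, dtype=float)
    above = snr >= snr_threshold
    regions = []
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(above) - 1))
    merged = []
    for lo, hi in regions:
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged
```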
Charge State Determination
Isotopic Distributions obtained from the previous step are passed as input for z determination
Two new methods
Maximum Likelihood (ML) method using Fourier Transform (FT) of EID
Matched Filter Approach
ML method using FT of EID
An EID is composed of complex exponentials with fundamental frequency corresponding to the charge state and its harmonics
Peak locations are used to identify the charge state
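The frequency-domain idea can be illustrated with a simplified sketch: isotopic peaks of charge z are spaced about 1.00235/z apart in m/z, so the FT magnitude of the EID peaks near z/1.00235 cycles per m/z unit. The sketch below only compares FT magnitudes at the candidate fundamentals; it is not the full ML estimator. The 1.00235 Da spacing and a uniformly sampled m/z axis are assumptions.

```python
import numpy as np

ISOTOPE_SPACING = 1.00235  # Da, assumed average spacing between isotopic peaks


def charge_from_fft(intensity, dmz, z_min=1, z_max=10):
    """Pick the charge whose fundamental frequency has the largest FT magnitude.

    intensity: EID sampled on a uniform m/z grid with step dmz.
    """
    x = np.asarray(intensity, dtype=float)
    x = x - x.mean()                       # remove the DC component
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dmz)  # cycles per m/z unit
    best_z, best_mag = None, -1.0
    for z in range(z_min, z_max + 1):
        # Nearest FT bin to the fundamental frequency expected for charge z.
        idx = int(np.argmin(np.abs(freqs - z / ISOTOPE_SPACING)))
        if magnitude[idx] > best_mag:
            best_z, best_mag = z, float(magnitude[idx])
    return best_z
```

Harmonics of the true charge (2z, 3z, ...) also show up in the FT, but for peaks of finite width they are weaker than the fundamental, which is why the largest magnitude is used here.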
Matched Filter (MF) Approach
Parameters for generating TID (peak width, inter-point spacing, MAX Z, MIN Z) are based upon the data
The TID (represented by T(Z) for charge state Z) that gives the maximum value of the cross-correlation coefficient with the EID (E) generally represents the true charge state
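$$ R_{E,T(Z)}(n) = \sum_m E_m \, T(Z)_{m+n} \qquad (1) $$
$$ n_Z = \arg\max_n R_{E,T(Z)}(n) \qquad (2) $$
$$ \rho(Z) = \frac{\sum_m \left(E_m - \bar{E}\right)\left(T(Z)_{m+n_Z} - \bar{T}(Z)\right)}{\sqrt{\sum_m \left(E_m - \bar{E}\right)^2 \, \sum_m \left(T(Z)_{m+n_Z} - \bar{T}(Z)\right)^2}} \qquad (3) $$
$$ Z_{est} = \arg\max_Z \rho(Z) \qquad (4) $$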
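The matched-filter selection can be sketched as follows: build a TID for each candidate charge, slide it across the EID, and keep the charge with the highest correlation coefficient. This is a toy version under stated assumptions. make_tid uses a fixed, hypothetical list of relative abundances and Gaussian peaks, whereas a real TID depends on the mass and would come from an isotope-distribution calculation such as [2]. The 1.00235 Da spacing is also assumed.

```python
import numpy as np

ISOTOPE_SPACING = 1.00235  # Da, assumed average spacing between isotopic peaks


def make_tid(z, abundances, dmz, sigma):
    """Toy TID: Gaussian peaks spaced ISOTOPE_SPACING / z apart."""
    spacing = ISOTOPE_SPACING / z
    span = (len(abundances) - 1) * spacing + 8 * sigma
    x = np.arange(0.0, span, dmz)
    tid = np.zeros_like(x)
    for k, a in enumerate(abundances):
        tid += a * np.exp(-0.5 * ((x - (4 * sigma + k * spacing)) / sigma) ** 2)
    return tid


def best_correlation(eid, tid):
    """Slide tid across eid; return (max correlation coefficient, shift)."""
    eid = np.asarray(eid, dtype=float)
    t = tid - tid.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    best_r, best_shift = -1.0, 0
    for shift in range(len(eid) - len(tid) + 1):
        seg = eid[shift:shift + len(tid)]
        s = seg - seg.mean()
        s_norm = np.sqrt(np.sum(s ** 2))
        if s_norm < 1e-12 or t_norm < 1e-12:
            continue  # skip (nearly) flat segments
        r = float(np.sum(s * t) / (s_norm * t_norm))
        if r > best_r:
            best_r, best_shift = r, shift
    return best_r, best_shift


def matched_filter_charge(eid, abundances, dmz, sigma, z_min=1, z_max=6):
    """Return (estimated z, {z: (r, shift)}) over the candidate charges."""
    scores = {}
    for z in range(z_min, z_max + 1):
        tid = make_tid(z, abundances, dmz, sigma)
        if len(tid) <= len(eid):
            scores[z] = best_correlation(eid, tid)
    z_est = max(scores, key=lambda z: scores[z][0])
    return z_est, scores
```

The per-charge shift returned alongside r is the offset at which that TID lines up best with the EID, which is what a later subtraction step would need.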
Typical MF Match
Figure panels: (a) Raw Spectrum, (b) Z=3, (c) Output List
(a) EID of a fragment of BCA (b) TID with Z=3 (red) and EID (blue); the TID shift corresponds to the maximum value of the cross-correlation coefficient (0.954) between the two (c) Snapshot of the output listing corresponding to the above fragment
Automated Comparison of Charge State Determination Methods
Comparison of the different methods on 775 isotopic distributions of myoglobin taken from 26 spectra, with charge states ranging from 8 to 22 and S/N from 1 to 100
Advantages of MF over ML
Better results: 91% (MF) vs 88% (ML)
Allows EIDs to be pulled out of the observed signal even when the signal contains multiple distributions
Works better in the case of overlapping distributions
Since the ML method uses the FT map, it works better for higher z than for lower z, while MF works equally well in both cases
Picking Isotopic Peaks and ML alignment
(a) Picking isotopic peaks of EID of myoglobin, Z=16 (b) TID of myoglobin; alignment with (c) TID shifted by 5 (d) TID shifted by 6 (e) TID shifted by 7 (f) Probability of alignment as a function of TID indices
Testing ML Alignment with Low Ion Numbers
(a) Alignment of myoglobin IDs using 3150 simulations (100 ions in each simulation) (b) A typical 100 ion distribution of myoglobin
$$ P(E \mid T, k) = \frac{N!}{\prod_{i=1}^{L} E_i!} \prod_{i=1}^{L} T_{i+k}^{E_i} \qquad (5) $$
$$ \text{index} = \arg\max_k P(E \mid T, k) \qquad (6) $$
$$ N = \sum_{i=1}^{L} E_i, \quad \text{where } L = \text{Length of } E \qquad (7) $$
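A likelihood-based alignment of picked isotopic peaks against a TID could be sketched as below. It assumes the EID is available as ion counts per isotopic peak and scores each TID shift with a multinomial log-likelihood, renormalising the TID over the window that the EID covers. The renormalisation and the function names are choices made for this sketch, not details confirmed by the slides.

```python
import math

import numpy as np


def alignment_log_likelihood(counts, tid, shift):
    """Multinomial log-likelihood of the counts for one TID shift."""
    window = tid[shift:shift + len(counts)]
    probs = window / window.sum()  # renormalise over the observed window
    n = counts.sum()
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + float(np.sum(counts * np.log(probs)))


def ml_align(counts, tid):
    """Return (best shift, list of log-likelihoods for every shift).

    tid must be strictly positive so that the logarithm is defined.
    """
    counts = np.asarray(counts, dtype=float)
    tid = np.asarray(tid, dtype=float)
    shifts = range(len(tid) - len(counts) + 1)
    scores = [alignment_log_likelihood(counts, tid, k) for k in shifts]
    return int(np.argmax(scores)), scores
```

The best shift identifies which TID index the first picked peak corresponds to, and hence how many isotopic spacings separate it from the monoisotopic peak.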
Separating Overlapping Distributions
Figure panels: (a) Raw Spectrum, (b) Z=3, r=0.74, (c) Z=4, r=0.64, (d) Residual Signal
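The subtraction that produces a residual signal can be sketched as follows: once a TID has been matched at some shift, scale it to the data and subtract it, then search the residual for further distributions. The least-squares scale factor and the clipping of negative values to zero are assumptions made for this sketch.

```python
import numpy as np


def subtract_distribution(signal, tid, shift):
    """Subtract a scaled TID from the signal at the given shift.

    Returns (residual, scale), where scale is the least-squares amplitude
    of the TID in the covered segment.
    """
    signal = np.asarray(signal, dtype=float)
    tid = np.asarray(tid, dtype=float)
    seg = signal[shift:shift + len(tid)]
    scale = float(seg @ tid) / float(tid @ tid)
    residual = signal.copy()
    # Clip at zero so the residual stays a valid (non-negative) spectrum.
    residual[shift:shift + len(tid)] = np.clip(seg - scale * tid, 0.0, None)
    return residual, scale
```

When two distributions share peaks, the least-squares scale is biased by the overlap, so a more careful version might fit only the peaks that are not shared.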
Low Charge State Overlapping Distributions from Top-Down Spectrum of BCA
Figure panels: (a) Raw Spectrum, (b) Z=4, (c) Residual, (d) Z=1, (e) Z=3, (f) Residual
Analysis of top-down spectrum of Ubch10 - Mixed Z Cases
Figure panels: (a) Input Signal, (b) Z=14, r=0.76, (c) Residual, (d) Z=14, r=0.74, (e) Residual, (f) Z=1, r=0.5, (g) Z=2, r=0.51, (h) Z=14, r=0.57, (i) Residual, (j) Z=1, r=0.54, (k) Z=2, r=0.5, (l) Z=14, r=0.58, (m) Residual, (n) Z=1, r=0.55, (o) Z=14, r=0.55, (p) Final Residual
Applying MasSPIKE to a particularly noisy region of a top-down mass spectrum of a biologically derived protein, Ubch10: (a) Input signal (b) z=14 detected (c) Residual after subtraction of (b) from (a) (d) z=14 detected in region m/z=1056-1057 (e) Residual signal (f), (g) & (h) z=1, 2 and 14 detected simultaneously (z=1 and 2 are probably false positives due to chemical noise) (i) Residual signal after subtraction of signal due to already determined charge states (j), (k) & (l) z=1, 2 and 14 detected simultaneously again, sharing three common peaks (m) Remaining signal (n) & (o) z=1 and 14 being detected (p) Final residual.
Overall, 10 isotopic distributions were recovered in an 8 m/z window
Mass Spectrum of Hemoglobin of a normal person
Hemoglobin Variants Analysis
Spectrum of Hemoglobin variants and comparison between theoretical and experimental masses
Conclusions
Matched Filter method works best for charge state determination and helps in resolving overlapping distributions
Maximum likelihood based alignment improves the accuracy of monoisotopic masses
MasSPIKE has been tested on complex spectra from biologically derived proteins
Once fully implemented in BUDA [5], MasSPIKE will substantially simplify interpretation of mass spectra
Acknowledgments
Prof W. Clem Karl
Dr Amit Juneja
Dr Hua Huang
Dr Judith Jebanathirajah
Jason J. Cournoyer
Dr Cheng Zhao
Raman Mathur
Dr Cheng Lin
Dr Roger Theberge
Vera Ivleva
Dr Mark McComb
Dr Jason Pittman
Prof Catherine E. Costello
Prof Richard Cohen
Dr David Perlman
This work was supported in part by Federal funds from the National Center for Research Resources under grant No. P41-RR10888 and the National Heart, Lung, and Blood Institute under Contract No. HHSN268200248178C.
References
[1] M W Senko; S C Beu; F W McLafferty, “Automated Assignment of Charge States from Resolved Isotopic Peaks for Multiply Charged Ions”, J. Am. Soc. Mass Spectrom.; 1995; 6; 52-56
[2] A L Rockwood, “Ultrahigh-Speed Calculation of Isotope Distributions”, Anal Chem; 1996; 68; 2027-2030
[3] D M Horn; R A Zubarev; F W McLafferty, “Automated Reduction and Interpretation of High Resolution Electrospray Mass Spectra of Large Molecules”, J. Am. Soc. Mass Spectrom.; 2000; 11; 320-332
[4] P Kaur; P B O’Connor, “Use of Statistical Methods for Estimation of Total Number of Charges in a Mass Spectrometry Experiment”, Anal Chem; 2004; 76; 2756-2762
[5] P B O’Connor, “BUDA - Boston University Data Analysis”, www.bumc.bu.edu/ftms