The 2012-2013 ABRF Proteomics Research Group study ... · 0 500 1000 1500 2000 Number of identified...
Transcript of The 2012-2013 ABRF Proteomics Research Group study ... · 0 500 1000 1500 2000 Number of identified...
0
500
1000
1500
2000
Number of identified spectra from top twelve bovine proteins
Conclusions.
Preliminary conclusions indicate that this study has provided a relatively large resource for assessing the amount of variability across
datasets within and between laboratories. This repository of data was collected over a 9 month period from more than 50 laboratories
and contains mass spectrometry data, information on instrument (both mass spectrometer and liquid chromatography system) operation
and performance, workflows, and operator experience. The survey data indicates that most users participating in this study have
significant operating experience, however some laboratories have much less longitudinal variation than others. Continued data analyses
may reveal factors which contribute most significantly to both greater and lesser amounts of variation within a laboratory. This information
could prove useful for the design and recommendation of best practices for proteomics analyses.
The 2012-2013 ABRF Proteomics Research Group study: Assessing longitudinal variability in
routine peptide LC-MS/MS analysis.
Maureen Bunger1; Tracy Andacht2; Keiryn Bennett3; Cory Bystrom4; Matthew Chambers5; Larry Dangott6; Felix Elortza7; John Leszyk8; Henrik Molina9; Robert Moritz10; Brett Phinney11; David Tabb 5; J. Will Thompson12; Xia Wang13; Jason Williams14 1Proteovations, LLC, RTP, NC; 2Centers for Disease Control and Prevention, Atlanta, GA; 3CeMM Research Center for Molecular Medicine, Vienna, Austria; 4Cleveland HeartLab, Inc., Akron, OH; 5Vanderbilt University, Nashville , TN; 6Texas A&M University, College Station, TX; 7Centro de Investigacion Cooperativa en Biociencias,
Bilbao, Spain; 8University of Massachusetts, Shrewsbury, MA; 9The Rockefeller University, New York, NY;10Institute for Systems Biology, Seattle, WA; 11University of California-Davis, Davis, CA; 12Duke University, Durham, NC; 13University of Cincinnati, Cincinnati, OH; 14National Institute of Environmental Health Sciences, Research Triangle Park, NC
Introduction. Many factors contribute variability in LC-MS/MS identification of complex peptide mixtures, and yet few studies have characterized the stability of
performance among many laboratories. The PRG study for 2012-2013 collected LC-MS/MS data sets spanning sixty-four participants across nine
months for a common digested protein mixture, with the goal of recognizing key sources of variability in HPLC and MS performance through QC
metrics. No standard operating protocol was imposed on participants; instead, participants employed methods that were typical for their
laboratories. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition
settings, and routine and non-routine maintenance procedures.
Methods. Participants were provided with nine vials of Michrom 6 Bovine Tryptic Digest Equal Molar Mix. At monthly intervals, participants uploaded raw data
files from data-dependent LC-MS/MS of these peptides. Waters data were translated to mzML using the Protein Lynx Global Server, and AB SCIEX
data were converted with the AB SCIEX MS Data Converter; all other files were translated using ProteoWizard msConvert (see Figure 1). The
mzML files for Waters instruments were incompatible with downstream tools. MyriMatch provided semi-tryptic database search identifications
against the RefSeq bovine proteome. IDPicker 3 filtered identifications and conducted parsimonious protein assembly. QuaMeter was run in
identification-dependent and identification-independent modes for metric computation (http://fenchurch.mc.vanderbilt.edu). The “IDFree” mode
generates 40 metrics for extracted ion chromatograms, retention time, mass spectrometry, and tandem mass spectrometry characterization. The
“NIST” mode generates metrics modeled after those published by Rudnick for the CPTAC consortium in 2009
(http://dx.doi.org/10.1074/mcp.M900223-MCP200). The R Statistical Environment generated robust principal component analysis and data
visualizations to guide evaluation of the data sets.
Results. Sixty-four participants contributed at least one data file, and forty-two uploaded at least eight experiments, yielding a large repository of raw data
files collected longitudinally within and across laboratories. Instruments included multiple architectures from AB SCIEX, Agilent, Bruker, Thermo,
and Waters. No effort was made to standardize LC-MS conditions between laboratories, and consequently LC-MS/MS gradients ranged from 22 to
160 minutes, and MS/MS acquisition ranged from less than 100 to nearly 30,000 tandem mass spectra per experiment. As a result, the number of
identifications produced from data sets also ranged widely (see bottom image). Low-concentration bovine proteins were frequently identified in
addition to the anticipated six proteins, including superoxide dismutase and alpha-S2-casein. PCA based on identification-independent quality
metrics visualized all of these participants in a single plane and demonstrated that for most participants, submitted experiments clustered together.
In some cases, however, the LC-MS/MS experiments were broadly dispersed, revealing significant variation in the data. These dispersions were
evaluated in light of the survey results provided by participants, reporting major tuning and calibration events along with method changes.
Once a day, 5
Once a month, 1
Once a week, 4
Once every few
days, 6
Other, 5
Several times per
day, 15
How Often Do You Perform Quality Control Tests?
Figure 3. Pie chart summarizing participant responses
regarding frequency of QC analyses. 72% of the participants used
a single protein digest for quality control. 85% used data
dependent acquisition to collect QC data.
Figure 6. Dispersion values
describe how tightly clustered the
experiments are for each participant.
The metrics from QuaMeter were
normalized based on robust variance
estimation, and deviation for a given file
was defined as the distance of its
metrics vector from the mean vector for
all files from that participant. A low
deviation suggests that most data
produced by a participant conformed
well with others for that participant. The
IDFree and identification-dependent
metrics from QuaMeter, however,
produced different rankings of
participants. The legend is the same as
in Figure 5.
Figure 2. Data flow employed peer-
reviewed tools plus two commercial
packages. Waters exports to mzML
proved to be incompatible with both
QuaMeter and MyriMatch.
The ‘Median’ Laboratory participating in the study: • Uploaded data 8 times during the 9 month test period.
• Injected 200 femtomoles of the 6-protein digest per analysis.
• Used a 2-5 year old mass spectrometer - typically an Orbitrap type instrument.
• Used a commercial column with 75 µm inner diameter, 15 cm length and 3 µm particle size.
• Separated peptides using a 50 min gradient and a flow rate of 300 nL/min.
• Used a trap/pre-column
• Ran quality controls several times per day.
• Employed an operator with 6-10 years of experience.
Data Reduction Using Principal Components Analysis
Figure 5. Raw data from the participants who
uploaded at least 8 LC-MS/MS experiments were
included in robust principal components analysis
(PCA). This technique reduced the 40 QuaMeter
"IDFree" metrics (some of which correlate strongly
with each other) to a set of uncorrelated components.
The first two components (plotted) account for 50% of
the variance among the files. For some participants,
the data were invariant in the first two components but
considerably more so in the other components. The
QuaMeter IDFree metrics do not take derived
identifications into account, allowing their rapid
computation directly from raw data.
Comp.2
Com
p.1
-10
-5
0
5
-10 -5 0 5
@
@
@
@
@ @
@
@
@
@ @
@
@ @
@ @
A
A
A
A A A
A
A A
B
B
C
C
C C
C
C
C
C
C
D
D D
D
D
D
D D
D
E
E
E
E
E E
E E E
F
F
F
F
F
F F F
F
G
G
G
G
G G
G
G
G
G
H
H
H H
H
H H
H
I I I
I
I
I
I I
I
J
J J
J J J
J J
J
K
K
K K K
K
K
K
K
L L
L L
L
L
L L
L
L
L
L
M
M
M
M
N
N
N
N
N
N N
N
N
N O
O
O
O
P
P
P
P
P
P P
P
P
Q
Q
Q
Q
Q
Q
Q
Q R
R R
R
R R
R
R R
S
S S
S
S
S
S S
T
T
T
T
T
T
T
T
T
U U
U
U
U
V
V V
V
V
V
V V
V
W
W W
W W
W W
W
W
X
X
X
Y Y
Y
Z Z
Z Z
Z
Z
Z
Z
Z a
a
a
a a a
a a a
b
b
b
c
c
c
c
c
c
c c
d
d
d
d
d
d
d
d
d
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e
e e
e
e
f
g g
g g
g g
g
g
h h
h
h
h
h
h
h h
i
i
i
i
i
i
i
127094 150039 156516 202862 244614 249451 285495 305057 324601 340305 351993 353717 360955 364386 374072 378803 500565 503295 514465 529726 531515 542341 554062 623424 624176 628705 668931 670870
685497 686321 696216 698174 712609 719674 725094 757813 767982 773968 774709 776353 781603 782674 784265 784361 817229 826082 837789 870711 873353 874338 904417 914061 924502 948259 962210 971316
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
a
b
c
d
e
f
g
h
i
ID free metrics
participants
media
n d
evia
tion
0.7
0.8
0.9
767982870711696216712609542341773968353717249451202862324601624176374072781603244614360955776353784361725094837789531515529726686321962210670870668931340305784265914061364386698174127094774709874338623424514465628705948259500565378803685497904417503295
H
P
DF
I
L
K NG O
CU A
@ M S
EJ
Q
T
B R
ID dependent matrix
participants
media
n d
evia
tion
0.6
0.7
0.8
0.9
249451324601686321374072914061962210712609668931698174948259670870784361542341244614202862774709340305353717870711531515696216767982685497773968364386529726725094503295784265127094904417514465628705360955378803781603624176874338837789500565776353623424
C
S U
F
@E
T
AN
J
PD
H B I
G M
R
L
Q O
K
A B Instrumentation Utilized in the Study
Figure 4. Summary of LC (left chart) and MS (right chart) instrumentation utilized by
survey participants. Dark and light colors indicate the number of participants that began and
completed the QC study
0
50
100
150
200
Number of distinct peptides identified from top twelve bovine proteins
0
5
10
Number of top twelve bovine proteins identified
Acknowledgements.
The PRG2012 would like to acknowledge Michrom Bioresources for
contributing the 6 protein mix and Bioproximity, LLC for contributing data
storage and transfer capabilities. This work was supported by ABRF.
Mar May Jul Sep Nov Jan
Raw Data Collection for 56 Labs
Data upload Time Stamp
Pa
rtic
ipa
nts
Figure 1. Data collection for each participant was evenly
spread across the study period.