Daniel Cuthbertson, Agilent Technologies, Denver, Colorado
Biology is Transitioning Into the Era of Big Data
• Agilent mass spectrometry (MS) technology can provide information on hundreds, and potentially thousands, of biomolecules in a single analysis
• Multivariate statistics are required to mine the data for useful correlations that can be interpreted in a biological context
• Use your statistical plan to influence your experimental design in the era of Big Data
• Agilent Mass Profiler Professional and Pathway Architect provide an MS-centric multivariate statistics platform that places results in biological context
Prevailing Paradigm for Biological Information Flow
Genomics (Genes) -> Transcriptomics (mRNA) -> Proteomics (Proteins) -> Metabolomics (Metabolites)
April 2013 3
Agilent Software MPP and Pathway Architect:
Changing Data Into A Pathway Visualization
Workflow To Convert Data To A Pathway Visualization
Acquire Data
• Analyze Metabolomics or Proteomics samples
• Use GC/MS or LC/MS to analyze samples
Analyze Data
• Mine data using either MassHunter or Spectrum Mill
• Analyze in Mass Profiler Professional
Identify Compounds
• Use Mass Profiler Professional’s ID Browser
• Search the METLIN or Fiehn libraries to annotate metabolites
Pathway Analysis
• Select data and interpretation for pathway mapping
• Specify the species, select a pathway database and run the analysis
MassHunter Profinder: Achieving High Data
Fidelity is Crucial for Multivariate Statistics
Challenges:
• Incomplete peak separation
• Isomeric and Isobaric Compounds
• Unresolved peaks contributing to:
– False peak detection
– Missing values
– Wasted Time
– Identification Errors
– False Biomarker Discovery
The MassHunter Profinder Solution:
Key improvements
• Batch-centric untargeted and targeted feature extraction
• Designed specifically for metabolomics and differential analysis users
• Recursive analysis in a single program
• Compound-group centric: easy manual review and editing
• It’s FREE!
Batch Molecular Feature Extraction Process
• Related isotopes, adducts and dimers are grouped into a single compound, which reduces noise.
• Profinder uniquely aligns the compounds and builds a consensus library, which enables recursive re-extraction of the batch data.
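The grouping idea can be illustrated with a minimal Python sketch. It is not Profinder's algorithm: it assumes positive-mode data, treats the most intense unassigned feature as an [M+H]+ monoisotopic ion, and collapses co-eluting features that match its 13C isotopes, its [M+Na]+ adduct or its [2M+H]+ dimer. The class and function names and the tolerances (`rt_tol`, `ppm_tol`) are hypothetical.

```python
from dataclasses import dataclass

PROTON = 1.007276       # proton mass (Da)
NA_ADDUCT = 22.989218   # Na+ ion mass, offset for [M+Na]+ (Da)
C13_SPACING = 1.003355  # 13C - 12C mass difference (Da), singly charged ions


@dataclass
class Feature:
    mz: float
    rt: float          # retention time (min)
    intensity: float


def group_features(features, rt_tol=0.1, ppm_tol=10.0, max_isotopes=3):
    """Greedily collapse isotopes, adducts and dimers into compounds (sketch)."""
    # Most intense features seed new compounds first.
    order = sorted(range(len(features)), key=lambda i: -features[i].intensity)
    assigned = set()
    compounds = []
    for i in order:
        if i in assigned:
            continue
        seed = features[i]
        neutral = seed.mz - PROTON  # assumption: the seed is the [M+H]+ ion
        # m/z values expected for isotopes of [M+H]+ and [M+Na]+, plus the dimer.
        expected = [neutral + off + k * C13_SPACING
                    for off in (PROTON, NA_ADDUCT)
                    for k in range(max_isotopes + 1)]
        expected.append(2 * neutral + PROTON)  # [2M+H]+
        members = [i]
        assigned.add(i)
        for j, f in enumerate(features):
            if j in assigned or abs(f.rt - seed.rt) > rt_tol:
                continue  # already used, or does not co-elute
            if any(abs(f.mz - e) / e * 1e6 <= ppm_tol for e in expected):
                members.append(j)
                assigned.add(j)
        compounds.append({"neutral_mass": neutral, "rt": seed.rt,
                          "members": sorted(members)})
    return compounds
```

A real implementation would also test other charge states and treat the seed ion as unknown rather than assuming [M+H]+; the greedy, intensity-ordered design simply keeps the example short.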
The Three Primary Workflows in Profinder
1. Batch Molecular Feature Extraction
– Reduces False Positives
2. Batch Recursive Feature Extraction
– Reduces False Negatives, Allows Editing
3. Batch Targeted Feature Extraction
– Uses database targets, Allows Editing
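The align, build-consensus and re-extract sequence behind the recursive workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Profinder's implementation: compounds are matched across samples with hypothetical mass (ppm) and retention-time tolerances, each consensus entry keeps the first mass and RT seen, and re-extraction simply looks up the strongest raw peak near each consensus target in every sample, including samples where the compound was missed in the first pass.

```python
def _match(mass, rt, target_mass, target_rt, ppm_tol, rt_tol):
    """True if (mass, rt) falls inside the tolerance window of the target."""
    return (abs(mass - target_mass) / target_mass * 1e6 <= ppm_tol
            and abs(rt - target_rt) <= rt_tol)


def build_consensus(detected, ppm_tol=10.0, rt_tol=0.2):
    """detected: dict sample -> list of (neutral mass, rt) found in the first pass."""
    consensus = []
    for sample in sorted(detected):
        for mass, rt in detected[sample]:
            for entry in consensus:
                if _match(mass, rt, entry["mass"], entry["rt"], ppm_tol, rt_tol):
                    entry["found_in"].add(sample)
                    break
            else:  # no existing entry matched: start a new consensus compound
                consensus.append({"mass": mass, "rt": rt, "found_in": {sample}})
    return consensus


def recursive_extract(consensus, raw_peaks, ppm_tol=10.0, rt_tol=0.2):
    """Targeted re-extraction: raw_peaks is dict sample -> list of (mass, rt, intensity).

    Returns one row per consensus compound mapping sample -> intensity (or None).
    """
    table = []
    for entry in consensus:
        row = {}
        for sample, peaks in raw_peaks.items():
            hits = [inten for m, r, inten in peaks
                    if _match(m, r, entry["mass"], entry["rt"], ppm_tol, rt_tol)]
            row[sample] = max(hits) if hits else None
        table.append(row)
    return table
```

Filling a value that the first pass missed is what turns a would-be missing value (a false negative) into a measurement.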
Baker's Yeast is an Ideal Model Organism For Studying Pathways
• Saccharomyces cerevisiae is an extensively used model organism.
• Its biochemistry and pathways have been extensively studied.
• Fully sequenced genome.
• Ideal for “Multi-Omics” studies with the goal of facilitating research on other organisms.
Calcineurin Inhibitors Were Used to Study Pathways Related to Immunosuppression
Goal: Determine additional metabolites, proteins and
pathways affected by the drug treatment
Structures shown: Cyclosporin A and FK-506
Experimental design:
• Strain: Wild Type BJ5459
• Conditions: Wild Type; Calcium Control (CaCl2, 200 mM); Cyclosporin A and FK-506 (drug treatment, 4 µg/mL)
• Metabolomics: N=9 per condition; Proteomics: N=4 per condition
• Ionization: ESI/APCI positive and ESI negative
• Data analysis: Mass Profiler Professional
Agilent 6530 Q-TOF and 1260 Series HPLC Are a Robust Choice for Metabolomics
• High-femtogram-level sensitivity
• Better than 1-ppm MS mass accuracy
• Better than 3-ppm MS/MS mass accuracy
• Mass resolution (resolving power) of 20,000, independent of spectral acquisition rate
• Fast data acquisition (up to 10 MS/MS spectra/sec) compatible with UHPLC
• Broad mass range from m/z 25 to 20,000
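The mass-accuracy and resolving-power figures above translate into concrete numbers with two simple formulas: ppm error = (measured - theoretical) / theoretical × 10^6, and, assuming the full-width-at-half-maximum (FWHM) definition of resolving power R = m / Δm, a peak width of Δm = m / R. A minimal Python sketch (function names are illustrative):

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def ppm_window(mass: float, ppm: float) -> tuple[float, float]:
    """Lower and upper bounds of a +/- ppm tolerance window around mass."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def peak_fwhm(mz: float, resolving_power: float) -> float:
    """Peak width at half height implied by R = m / delta_m (FWHM definition)."""
    return mz / resolving_power
```

For example, at m/z 500 a 1-ppm tolerance corresponds to ±0.0005, and a resolving power of 20,000 corresponds to a peak width of 0.025 at half height.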
Treatment | Positive mode | Negative mode
WT | 1237 | 810
Ca Control | 1311 | 968
FK-506 | 1207 | 766
Cyclosporin A | 1223 | 835
Total | 2144 | 1372
Mass Profiler Professional (MPP):
MS-Centric Data Analysis Designed for Mass Spectrometry Data from Multiple Platforms
Import, store, and visualize
Agilent LC/MS TOF, QTOF, and QQQ
Agilent GC/MS Quad, QQQ, and QTOF
Agilent ICP/MS
Agilent CE/MS
Generic file format import
Data Filtering, Multivariate Statistical Tools and Class Prediction
ID Browser for compound identification
Pathway Architect for biological contextualization
Customizable with R and Python Editors
Correlation-Covariance Plot Can Help Uncover the
Most Important Features in PCA Clusters
Name | Formula | Score | Mass (Da)
Coenzyme A | C21 H36 N7 O16 P3 S | 97.12 | 767.1163
Ethyl 3-hydroxyoctanoate O-[glucosyl-(1->6)-glucoside] | C22 H40 O13 | 89.27 | 512.2465
Pandamarilactone 31 | C19 H25 N O4 | 80.26 | 331.1788
Chenodeoxycholic acid sulfate | C24 H40 O7 S | 66.27 | 472.2532
Glycinoeclepin B | C31 H42 O9 | 64.55 | 558.288
Lucidenic acid D2 | C29 H38 O8 | 61.94 | 514.2616
(±)-2',4',5,7-Tetrahydroxy-3',8-diprenylisoflavanone | C25 H28 O6 | 54.47 | 424.193
Biocytin | C16 H28 N4 O4 S | 44.18 | 372.187
(formula only) | C55 H48 N8 O28 S3 | 91.97 | 1364.175
(formula only) | C34 H2 N2 O19 S | 95.09 | 773.8964
(formula only) | C30 H40 N2 O18 | 87.84 | 716.2264
• Selected the 23 features with the highest p(corr) and p(cov) scores for PC1
• Top 10 shown in the table
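The two axes of a correlation-covariance (S-) plot can be computed directly from the PC1 scores: p(cov) is the covariance of each feature with the PC1 score vector and p(corr) is the corresponding correlation. A minimal NumPy sketch, assuming mean-centered data (MPP may apply a different scaling); the combined ranking in `top_features` is a hypothetical way to pick features that are extreme on both axes, not MPP's selection rule.

```python
import numpy as np


def s_plot_values(X):
    """Return (p_cov, p_corr) of each feature with the PC1 scores.

    X is a samples x features matrix; features are mean-centered here.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    t1 = U[:, 0] * S[0]                    # PC1 score for each sample
    n = X.shape[0]
    p_cov = Xc.T @ t1 / (n - 1)            # covariance of each feature with t1
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pearson correlation; constant features give NaN
        p_corr = p_cov / (Xc.std(axis=0, ddof=1) * t1.std(ddof=1))
    return p_cov, p_corr


def top_features(X, k=23):
    """Indices of k features ranked by |p(cov)| * |p(corr)| (hypothetical criterion)."""
    p_cov, p_corr = s_plot_values(X)
    score = np.abs(p_cov) * np.abs(np.nan_to_num(p_corr))
    return np.argsort(-score)[:k]
```

The sign of a principal component is arbitrary, so features at either extreme of the S-plot are candidates; only the magnitudes are used for ranking.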
Increasing Annotation Confidence:
1. Database matching using accurate mass measurement
2. Database matching with isotope pattern matching
3. Database matching with isotope pattern matching and retention time
4. Accurate mass and MS/MS library matching
5. MS/MS library and retention time matching
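The five-level scale above can be read as a simple decision rule over the evidence that matched. A minimal Python sketch, assuming every level includes an accurate-mass match and that a retention-time match without isotope or MS/MS evidence does not raise the level; the function name and the 0 value for "no annotation" are hypothetical.

```python
def annotation_level(accurate_mass: bool, isotope_pattern: bool = False,
                     retention_time: bool = False, msms_library: bool = False) -> int:
    """Map matched evidence to the 1-5 confidence scale (0 = no annotation)."""
    if not accurate_mass:
        return 0
    if msms_library:
        return 5 if retention_time else 4   # MS/MS library match (+ RT)
    if isotope_pattern:
        return 3 if retention_time else 2   # isotope pattern match (+ RT)
    return 1                                # accurate mass only
```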
Compound Annotation is a Critical Challenge in Metabolomics
• Accurate mass, retention time and MS/MS libraries can be searched via the Agilent Personal Compound Database and Library (PCDL) and ID Browser
• The METLIN library contains over 64,000 metabolites, of which roughly 8,000 have MS/MS spectra
• Molecular Structure Correlator software can assist in MS/MS structural confirmation
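The first rung of the annotation scale, accurate-mass database matching, amounts to computing each candidate's monoisotopic mass from its formula and keeping candidates within a ppm tolerance. A minimal Python sketch, assuming neutral masses and formulas written as in the table above (for example "C21 H36 N7 O16 P3 S"); the candidate list is a small illustrative subset of that table, and real ID Browser scoring also uses isotope patterns, retention time and MS/MS as described above.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula such as 'C21 H36 N7 O16 P3 S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def search_by_mass(measured, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for candidates within tol_ppm, best match first."""
    hits = []
    for name, formula in candidates:
        theo = monoisotopic_mass(formula)
        err = (measured - theo) / theo * 1e6
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda h: abs(h[1]))


CANDIDATES = [("Coenzyme A", "C21 H36 N7 O16 P3 S"),
              ("Biocytin", "C16 H28 N4 O4 S"),
              ("Lucidenic acid D2", "C29 H38 O8")]
```

Computed this way, the theoretical monoisotopic mass of Coenzyme A is about 767.1152 Da.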
Agilent 6550 and Nano-Flow Chip Cube Bring Enhanced Sensitivity for Targeted or Shotgun Proteomics
• High-attogram to low-femtogram sensitivity
• Sub-ppm mass accuracy (MS)
• Scan speeds up to 50 spectra/s while maintaining 40k resolving power
• 5 orders of magnitude dynamic range
• Low injection volumes and nano-flow for enhanced sensitivity in proteomics applications
1200 HPLC-Chip Method: 120 min Gradient
Parameter Settings
HPLC-Chip Polaris C-18A, 3 µm, 75 µm x 150 mm analytical and 500 nL enrichment column;
Gradient delay reduction: On; IFV = 4 µL; to enrichment at 135 min
Autosampler Injection volume = 2 µL; Autosampler temp = 4 ºC; needle wash in flushport
(methanol:water 50:50) for 10 seconds
Mobile phase (both pumps) A = 0.1% formic acid in water; B = 0.1% formic acid in 90:10 acetonitrile:water
Cap pump (loading) 2 µL/min of 3% B
Nanopump Flow rate = 300 nL/min
Gradient: Time (min) %B
0 3
90 25
120 60
125 90
130 90
130.1 3
Stop time 150 min
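Assuming the nanopump ramps linearly between timetable rows and holds the last composition until the 150-minute stop time, the %B at any moment of the run can be interpolated from the gradient table. A minimal Python sketch (the function and constant names are hypothetical):

```python
import bisect

# Pump timetable from the method above: (time in min, %B)
GRADIENT = [(0.0, 3.0), (90.0, 25.0), (120.0, 60.0),
            (125.0, 90.0), (130.0, 90.0), (130.1, 3.0)]
STOP_TIME = 150.0  # min


def percent_b(t, table=GRADIENT, stop_time=STOP_TIME):
    """%B at time t (min), assuming linear ramps between timetable rows."""
    if not 0.0 <= t <= stop_time:
        raise ValueError("time outside the run")
    times = [row[0] for row in table]
    if t >= times[-1]:
        return table[-1][1]  # hold the last composition until the stop time
    i = bisect.bisect_right(times, t)  # times[i-1] <= t < times[i]
    (t0, b0), (t1, b1) = table[i - 1], table[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
```

For example, the column sees 14% B at 45 min and 42.5% B at 105 min; the steep 130 to 130.1 min step returns the system to 3% B for re-equilibration.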
6550 Q-TOF Method
Parameter Settings
Ion mode Positive; 1700 m/z range, 2 GHz mode
Source conditions Drying gas 250ºC at 12 L/min; Vcap = 2000 V
MS m/z 275-1700, 8 spectra/sec
MS/MS m/z 50-1700, 3 spectra/sec
Precursor selection Narrow isolation (~1.3 m/z width); max precursors = 20; threshold: 1000 (abs) and 0.001 (rel); active exclusion: after 1 spectrum, release after 0.25 min; abundance-based ON, target 25k; MS/MS accumulation time limit ON, time limit OFF; purity stringency 100%, cutoff 30%; peptide isotope model; sort precursors by abundance only; z = 2, 3, >3
Collision energy Equation based on m/z: for z = 2, CE = (m/z × 0.031) + 1; for z ≥ 3, CE = (m/z × 0.036) - 4.8
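The collision-energy equations and the precursor-selection rules can be expressed as a short Python sketch. `collision_energy` transcribes the two equations directly. `pick_precursors` is a simplified, hypothetical model of data-dependent precursor picking (absolute threshold, z ≥ 2, top 20 by abundance, exclusion for 0.25 min after one spectrum); it ignores the relative threshold, purity filter and isotope model, and its 20-ppm exclusion-matching tolerance is an assumption.

```python
def collision_energy(mz: float, z: int) -> float:
    """Collision energy (V) from the method's charge-dependent equations."""
    if z == 2:
        return mz * 0.031 + 1.0
    if z >= 3:
        return mz * 0.036 - 4.8
    raise ValueError("the method only selects precursors with z >= 2")


def pick_precursors(peaks, now, exclusion, max_precursors=20,
                    abs_threshold=1000.0, exclusion_min=0.25, tol_ppm=20.0):
    """Simplified data-dependent precursor selection (hypothetical model).

    peaks: list of (mz, z, abundance) from one MS scan at time `now` (min).
    exclusion: dict mz -> release time (min); updated in place.
    """
    def excluded(mz):
        return any(abs(mz - ex_mz) / ex_mz * 1e6 <= tol_ppm and now < release
                   for ex_mz, release in exclusion.items())

    eligible = [p for p in peaks
                if p[1] >= 2 and p[2] >= abs_threshold and not excluded(p[0])]
    eligible.sort(key=lambda p: p[2], reverse=True)  # sort by abundance only
    chosen = eligible[:max_precursors]
    for mz, _, _ in chosen:
        exclusion[mz] = now + exclusion_min  # exclude after 1 spectrum
    return chosen
```

For a precursor at m/z 500, the equations give 16.5 V for z = 2 and 13.2 V for z = 3.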
Spectrum Mill Workflow For MS/MS Data
Figure: raw MS/MS spectra from data files File_1.d through File_n.d undergo extraction to give filtered MS/MS spectra, which are submitted to a database search and sorted into high-, medium- and low-scoring spectra.
Spectrum Mill Workflow For MS/MS: Iterative
Approach to Reducing Complex Datasets
Figure: high-, medium- and low-scoring spectra from the database search are either autovalidated or manually validated. Validated matches are used to create a .res file containing a list of all identified proteins. Spectra that are not validated, along with low-scoring spectra, form the remaining MS/MS spectra, which are submitted to further database searches.
Iterative database searches:
• in homology mode against previous protein hits
• in identity or homology mode against a different database
Spectrum Mill Protein Database Search Conditions
Search engine: Agilent Spectrum Mill
Fixed modification: carbamidomethylation of Cys
Variable modification: oxidation of Met, deamidation (N)
Extraction: no merge, MS Noise TH=200
Database: SwissProt human database (April 2011)
Mass tolerances: 20 ppm precursor, 50 ppm product ion
Other search settings: Agilent Q-TOF defaults
Validation (reporting): auto score adjustment to 1% FDR
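The details of Spectrum Mill's autovalidation are not shown here. The sketch below illustrates only the generic target-decoy idea behind a 1% FDR cut-off, assuming each peptide-spectrum match (PSM) carries a score and a flag for whether it matched a decoy (for example, reversed) sequence: the cut-off is the lowest score at which decoys / targets among all PSMs scoring at or above it stays within the allowed FDR. The function name is hypothetical.

```python
def fdr_score_threshold(psms, max_fdr=0.01):
    """Lowest score cut-off whose decoy/target ratio stays <= max_fdr.

    psms: iterable of (score, is_decoy). Returns None if no cut-off qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate once all PSMs tied at this score have been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```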
Label-free Protein Discovery Results for Yeast
Lysates
• Protein database search in Spectrum Mill
• Protein-protein comparison in Spectrum Mill groups proteins across the entire set
• Color coding = abundance (based on EIC of peptides assigned to the protein)
• Export results to MPP
Figure axes: Protein Group; Injection, grouped by treatment (CY, CA, FK, WT)
Treatments:
CY = Cyclosporin A treated
CA = Calcium control
FK = FK506 treated
WT = wild type
An ANOVA Was Used to Determine Peptides Responding to Drug Treatment
Blue = significant difference in post-hoc comparison
Orange = no significant difference in post-hoc comparison
Significance cut-off = 0.05
142 compounds pass
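The filtering step can be sketched as a one-way ANOVA run per feature across the four treatment groups, keeping features below the 0.05 cut-off. A minimal sketch using SciPy's `scipy.stats.f_oneway`, assuming each group is a replicates x features array with the same feature order; the function name is hypothetical, and MPP may apply multiple-testing correction or other options not shown here.

```python
import numpy as np
from scipy import stats


def anova_filter(groups, alpha=0.05):
    """One-way ANOVA per feature across treatment groups.

    groups: dict of group name -> 2-D array (replicates x features), with
    the same feature order in every group. Returns (passing indices, p-values).
    """
    arrays = [np.asarray(a, dtype=float) for a in groups.values()]
    n_features = arrays[0].shape[1]
    pvals = np.array([stats.f_oneway(*(a[:, j] for a in arrays)).pvalue
                      for j in range(n_features)])
    return np.flatnonzero(pvals < alpha), pvals
```

A post-hoc test (for example Tukey's HSD, available as `scipy.stats.tukey_hsd` in recent SciPy releases) would then show which pairs of groups differ for each passing feature.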
Calcium Vector Introduces Changes to the
Metabolic Profile Versus Wild-Type
204 compounds had a p-value of less than 0.1 when comparing the calcium-treated groups with wild type.
Venn Diagrams Were Used to Determine 93
Metabolites that Uniquely Respond to Drug
Treatment in Positive Polarity
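The exact set logic behind the 93 metabolites is not spelled out. One plausible reading, used here as an assumption, is that a metabolite "uniquely responds to drug treatment" when it is significant in a drug comparison but not already changed by the calcium control. A minimal Python sketch with hypothetical feature IDs:

```python
def venn_partition(drug_hits, calcium_hits):
    """Split significant features into drug-specific and shared sets (sketch).

    drug_hits: dict of drug name -> set of feature IDs significant for that drug.
    calcium_hits: set of feature IDs that already change in the calcium control.
    """
    any_drug = set().union(*drug_hits.values())
    all_drugs = set.intersection(*drug_hits.values())
    return {
        "drug_only": any_drug - calcium_hits,          # respond only to a drug
        "all_drugs_only": all_drugs - calcium_hits,    # shared by every drug
        "shared_with_calcium": any_drug & calcium_hits,
    }
```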
Protein | p (corr) | Raw FC vs Control | Log2 FC vs Control
Mitochondrial import receptor subunit TOM5 | 5.85E-12 | 254400.39 | 17.956741
Ubiquitin-conjugating enzyme E2 13 | 0.04925708 | 95805.26 | 16.547817
Probable E3 ubiquitin-protein ligase HUL5 | 0.08858416 | 54262.176 | 15.727659
Damage response protein 1 | 6.16E-12 | 45124.137 | 15.461612
Protein transport protein sec1 | 0.055262066 | 31862.016 | 14.95955
Phosphatidylinositol transfer protein SFH5 | 1.03E-09 | 11778.65 | 13.523887
C-5 sterol desaturase | 0.043293692 | 4570.4185 | 12.158111
Replication factor A protein 3 | 0.06183812 | 2928.936 | 11.516161
Probable 1-acyl-sn-glycerol-3-phosphate acyltransferase | 0.06327902 | 2507.4373 | 11.291998
ER membrane protein complex subunit 2 | 0.06327902 | 2471.677 | 11.271275
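The log fold-change column is consistent with a base-2 logarithm of the raw fold change (for example, log2(254,400.39) ≈ 17.957). A minimal Python sketch of the calculation, assuming the raw fold change is a simple ratio of treated to control abundance (the function name is illustrative):

```python
import math


def fold_change(treated: float, control: float) -> tuple[float, float]:
    """Raw and log2 fold change of a treated abundance versus its control."""
    raw = treated / control
    return raw, math.log2(raw)
```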
Pathway Architect 12.5:
Canonical Pathway Data Mapping and Visualization
Browse, filter, and search
Analyze one or two types of -omics data
Supports biological pathways from
publicly available databases
•WikiPathways
•BioCyc
Supported formats
•BioPAX 3 – Pathway Commons,
Reactome, NCI Nature Pathway
•GPML – PathVisio custom drawing
Export compound list from pathways
Easy Mining of Complex Pathways for Biological Understanding
Figure: Central Carbon Metabolism pathway
Agilent-BridgeDB: Enhanced Metabolite and
Protein Mapping
Metabolite identifiers (more coverage): KEGG, MetaCyc, PubChem, LMP, HMDB, ChEBI, CAS
Protein identifiers: Swiss-Prot, UniProt, UniProt/TrEMBL
Gene identifiers: Entrez Gene, GenBank, Ensembl, EC Number, RefSeq, UniGene, HUGO, HGNC, EMBL
Resolve the Mapping Problem Between Databases
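The mapping problem is essentially one of grouping identifiers from different databases that refer to the same molecule, then translating within a group. The sketch below uses a union-find structure for that grouping; it is a swapped-in illustration, not how Agilent-BridgeDB stores or resolves its mappings, and the identifiers in the example are hypothetical placeholders.

```python
class XrefMapper:
    """Group database identifiers that refer to the same entity (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, xref):
        self._parent.setdefault(xref, xref)
        while self._parent[xref] != xref:
            self._parent[xref] = self._parent[self._parent[xref]]  # path halving
            xref = self._parent[xref]
        return xref

    def link(self, a, b):
        """Record that identifiers a and b, each a (database, id) tuple, are equivalent."""
        self._parent[self._find(a)] = self._find(b)

    def translate(self, xref, target_db):
        """All known identifiers in target_db that are equivalent to xref."""
        root = self._find(xref)
        return sorted(x for x in list(self._parent)
                      if x[0] == target_db and self._find(x) == root)
```

Usage, with placeholder identifiers: after `m.link(("KEGG", "kegg-1"), ("HMDB", "hmdb-1"))` and `m.link(("HMDB", "hmdb-1"), ("ChEBI", "chebi-1"))`, a KEGG identifier can be translated to ChEBI even though no direct KEGG-ChEBI pair was recorded.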
WikiPathways
WikiPathways is an open, collaborative platform
• Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo C, Pico AR (2011). WikiPathways: building research communities on biological pathways. Nucleic Acids Research. doi:10.1093/nar/gkr1074
• Number of Organisms: 13
• Public pathways: 1780, Private pathways: 13
• Analysis Collection – Reviewed manually at Evelo lab
• GenMAPP– Reviewed by Pico/Conklin at UCSF
• Reactome – Reviewed by Reactome consortium
Agilent Pathway Architect downloads the database from the WikiPathways site
BioCyc Pathways
Produced by SRI International under the direction of Peter Karp
BioCyc content
• Number of species – 2037
•Curated and computationally derived pathways
• Tier 1 – literature assembled and manually reviewed
• Tier 2 – computationally generated with moderate review
• Tier 3 – computationally generated, no review
Pathway Architect downloads the BioCyc database from an Agilent server
A Pathway Source with Large and Deep Species Coverage
Pathway Analysis Directed Experiment
Figure: Experimental Design; Data Measurement (Genes, Proteins, Metabolites); Statistical Analysis; Biological Relevance / Network Expansion
Enabling hypothesis-driven experimental design by incorporating prior
biological knowledge from multiple measurement technologies
Pathway Directed Experiment: Target Protein List
Is Exported To Skyline
Copy protein accession numbers from Pathway Architect
Generate peptide lists and MRM transitions for QQQ MRM using:
• Spectrum Mill results as a library in Skyline
• MRM Atlas for yeast
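Skyline computes precursor and product ions itself; the sketch below only shows the arithmetic behind an MRM transition list: the [M+zH]z+ precursor m/z and singly charged y-ion m/z values from monoisotopic residue masses. It assumes unmodified residues except cysteine, which carries the fixed carbamidomethyl modification listed in the Spectrum Mill search conditions; the function names are illustrative.

```python
# Monoisotopic residue masses (Da); C includes carbamidomethylation (+57.02146)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide: str, z: int) -> float:
    """m/z of the [M+zH]z+ precursor ion."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (mass + z * PROTON) / z


def y_ions(peptide: str) -> list[tuple[str, float]]:
    """Singly charged y-ion m/z values (y1 .. y(n-1)), candidate MRM products."""
    ions = []
    for i in range(1, len(peptide)):
        fragment = peptide[-i:]  # C-terminal i residues
        mz = sum(RESIDUE[aa] for aa in fragment) + WATER + PROTON
        ions.append((f"y{i}", mz))
    return ions
```

For example, the doubly charged precursor of PEPTIDE is at m/z ≈ 400.687 and its y1 ion is at m/z ≈ 148.060.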