Daniel Cuthbertson

Agilent Technologies

Denver, Colorado

Biology is Transitioning Into the Era of Big Data

• Agilent Mass Spectrometry (MS) technology can provide information on hundreds and potentially thousands of biomolecules in a single analysis

• Multivariate statistics are required to mine the data for useful correlations that can be interpreted in a biological context

• Use your statistical plan to influence your experimental design in the era of Big Data

• Agilent Mass Profiler Professional and Pathway Architect provide an MS-centric multivariate statistics platform that adds biological context

Prevailing Paradigm for Biological Information Flow

Genomics (Genes) -> Transcriptomics (mRNA) -> Proteomics (Proteins) -> Metabolomics (Metabolites)

April 2013 3

Agilent Software MPP and Pathway Architect: Changing Data Into A Pathway Visualization

Start here Finish here

Work Flow To Convert Data To A Pathway Visualization

Acquire Data

• Analyze Metabolomics or Proteomics samples

• Use GC/MS or LC/MS to analyze samples

Analyze Data

• Mine data using either MassHunter or Spectrum Mill

• Analyze in Mass Profiler Professional

Identify Compounds

• Use Mass Profiler Professional’s ID Browser

• Search METLIN or Fiehn to Annotate Metabolites

Pathway Analysis

• Select data and interpretation for pathway mapping

• Specify species, select pathway database and go

MassHunter Profinder: Achieving High Data Fidelity is Crucial for Multivariate Statistics

Challenges:

• Incomplete peak separation

• Isomeric and Isobaric Compounds

• Unresolved peaks contributing to:

– False peak detection

– Missing values

– Wasted Time

– Identification Errors

– False Biomarker Discovery

The MassHunter Profinder Solution:

Key improvements

• Batch centric untargeted and targeted feature extraction

• Designed specifically for metabolomics and differential analysis users.

• Recursive analysis in a single program.

• Compound group centric: Easy manual review and editing

• It’s FREE!

Batch Molecular Feature Extraction Process

• Related isotopes, adducts and dimers are grouped into a single compound. -> Reduces Noise

• Profinder aligns the compounds and builds a consensus library, which enables recursive re-extraction of the batch data.
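The grouping of adducts and dimers into a single compound can be illustrated with a small sketch. This is not Profinder's algorithm; it only shows, for three assumed positive-mode ion species, how several observed m/z values can trace back to one neutral mass. The function names, the 10 ppm tolerance and the observed m/z list are hypothetical; the neutral mass is the Coenzyme A value from the annotation table in this deck.

```python
PROTON = 1.007276       # mass of a proton (Da)
SODIUM_ION = 22.989218  # mass of Na+ (Da)

def expected_ions(neutral_mass):
    """Expected m/z for a few singly charged positive ions of one compound."""
    return {
        "[M+H]+": neutral_mass + PROTON,
        "[M+Na]+": neutral_mass + SODIUM_ION,
        "[2M+H]+": 2 * neutral_mass + PROTON,
    }

def group_ions(neutral_mass, observed_mz, tol_ppm=10.0):
    """Map each ion species to the observed m/z that matches within tol_ppm."""
    matches = {}
    for label, mz in expected_ions(neutral_mass).items():
        for obs in observed_mz:
            if abs(obs - mz) / mz * 1e6 <= tol_ppm:
                matches[label] = obs
    return matches

# Hypothetical peak list: three ions of one compound plus one unrelated peak
ions = group_ions(767.1163, [768.1236, 790.1055, 1535.2399, 500.0000])
```

Here three of the four peaks collapse into one compound, which is the sense in which grouping reduces noise in the feature list.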

The Three Primary Workflows in Profinder

1. Batch Molecular Feature Extraction

– Reduces False Positives

2. Batch Recursive Feature Extraction

– Reduces False Negatives, Allows Editing

3. Batch Targeted Feature Extraction

– Uses database targets, Allows Editing

Profinder Interface: Compound Centric Visualization and Editing

Baker's Yeast is an Ideal Model Organism For Studying Pathways

• Saccharomyces cerevisiae is an extensively used model organism.

• Biochemistry and pathways extensively studied.

• Fully sequenced genome.

• Ideal for “Multi-Omics” studies with the goal of facilitating research for other organisms.

Calcineurin Inhibitors Were Used to Study Pathways Related to Immunosuppression

Goal: Determine additional metabolites, proteins and pathways affected by the drug treatment

Drugs: Cyclosporin A, FK-506

Conditions: Wild Type (BJ5459); Calcium Control; Cyclosporin A; FK-506

CaCl2: 200 mM

Drug Treatment: 4 µg/mL

Metabolomics: N=9 each condition

Proteomics: N=4 each condition

Ionization: ESI/APCI Positive; ESI Negative

Analysis: Mass Profiler Professional

Agilent 6530 QTOF and 1260 Series HPLC Are a Robust Choice for Metabolomics

• High femtogram-level sensitivity

• Better than 1-ppm MS mass accuracy

• Better than 3-ppm MS/MS mass accuracy

• Mass resolution (resolving power) of 20,000 -- not dependent on spectral acquisition rate

• Fast data acquisition ( = 10 MS/MS spectra/sec) compatible with UHPLC liquid chromatography

• Broad mass range from m/z 25 to 20,000.
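The mass accuracy and resolving power figures above translate into concrete numbers for a given ion. The following is a minimal sketch, assuming the usual definitions (ppm error relative to the theoretical m/z, and resolving power R = m/Δm at FWHM) and assuming the quoted resolving power applies at the m/z of interest. The function names and the example m/z values are illustrative only.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def peak_width_fwhm(mz, resolving_power):
    """Peak width at half maximum implied by R = m / delta_m."""
    return mz / resolving_power

# Hypothetical measurement 0.7 mDa above a theoretical m/z of 768.1236
err = ppm_error(768.1243, 768.1236)        # roughly 0.9 ppm
width = peak_width_fwhm(768.1236, 20000)   # roughly 0.038 m/z units
```

A measurement like this one would sit just inside a 1 ppm specification, while two ions closer together than about 0.04 m/z at this mass would not be fully resolved at a resolving power of 20,000.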

Metabolomics Data Acquired using Accurate Mass Retention Time (AMRT) Methodology

Consistency in MS Acquisition Methodology is Key For Reproducible Results Between Experiments

FK1.d

Treatment       Positive   Negative
WT              1237       810
Ca Control      1311       968
FK-506          1207       766
Cyclosporin A   1223       835
Total           2144       1372

Mass Profiler Professional (MPP): Mass Spec Centric Data Analysis

Designed for mass spectrometry data from multiple platforms

Import, store, and visualize:

• Agilent LC/MS TOF, QTOF, and QQQ

• Agilent GC/MS Quad, QQQ, and QTOF

• Agilent ICP/MS

• Agilent CE/MS

• Generic file format import

Data Filtering, Multivariate Statistical Tools and Class Prediction

ID Browser for compound identification

Pathway Architect for biological contextualization

Customizable with R and Python Editors

2/13/2014 19

Data Reduction is an Essential Part of Any “Omics” Experiment

All Entities

Responds to Drug Treatment

Correlation-Covariance Plot Can Help Uncover the Most Important Features in PCA Clusters

• Selected the 23 with the Highest P-Cor and P-Cov scores for PC1

• Top 10 shown below

Name                                                    Formula              Score   Mass
Coenzyme A                                              C21 H36 N7 O16 P3 S  97.12   767.1163
Ethyl 3-hydroxyoctanoate O-[glucosyl-(1->6)-glucoside]  C22 H40 O13          89.27   512.2465
Pandamarilactone 31                                     C19 H25 N O4         80.26   331.1788
Chenodeoxycholic acid sulfate                           C24 H40 O7 S         66.27   472.2532
Glycinoeclepin B                                        C31 H42 O9           64.55   558.288
Lucidenic acid D2                                       C29 H38 O8           61.94   514.2616
(±)-2',4',5,7-Tetrahydroxy-3',8-diprenylisoflavanone    C25 H28 O6           54.47   424.193
Biocytin                                                C16 H28 N4 O4 S      44.18   372.187
-                                                       C55 H48 N8 O28 S3    91.97   1364.175
-                                                       C34 H2 N2 O19 S      95.09   773.8964
-                                                       C30 H40 N2 O18       87.84   716.2264
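The idea behind a correlation-covariance plot can be sketched in a few lines. This is an assumption-laden illustration, not MPP's implementation: it assumes P-Cov and P-Cor are the covariance and correlation between each mean-centered feature and the PC1 score vector, and it finds PC1 by simple power iteration. The abundance matrix is invented for the example.

```python
import math

def center(X):
    """Subtract each feature's (column's) mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def pc1_scores(Xc, iterations=200):
    """PC1 scores t = Xc p, with the loading p found by power iteration."""
    p = [1.0] * len(Xc[0])
    for _ in range(iterations):
        t = [sum(v * w for v, w in zip(row, p)) for row in Xc]
        p = [sum(t[i] * Xc[i][j] for i in range(len(Xc))) for j in range(len(p))]
        norm = math.sqrt(sum(w * w for w in p))
        p = [w / norm for w in p]
    return [sum(v * w for v, w in zip(row, p)) for row in Xc]

def p_cov_p_corr(Xc, t):
    """Return (covariance, correlation) with the scores t for every feature."""
    n = len(Xc)
    sd_t = math.sqrt(sum(v * v for v in t) / (n - 1))
    out = []
    for col in zip(*Xc):
        cov = sum(a * b for a, b in zip(t, col)) / (n - 1)
        sd_x = math.sqrt(sum(b * b for b in col) / (n - 1))
        out.append((cov, cov / (sd_t * sd_x) if sd_x > 0 else 0.0))
    return out

# Hypothetical abundances: 4 samples (rows) x 3 features (columns)
X = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.1], [3.0, 30.0, 4.9], [4.0, 40.0, 5.0]]
Xc = center(X)
stats = p_cov_p_corr(Xc, pc1_scores(Xc))
```

Features that drive the separation along PC1 show both a large covariance (magnitude) and a correlation near 1 or -1 (reliability); in this toy matrix the first two features do, while the third, nearly constant feature does not.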

Increasing Annotation Confidence:

1. Database matching using accurate mass measurement

2. Database matching with isotope pattern matching

3. Database matching with isotope pattern matching and retention time

4. Accurate Mass and MS/MS library matching

5. MS/MS library and retention time matching
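The lowest confidence level, database matching on accurate mass alone, can be sketched as follows. This is only an illustration of the principle: the two-entry dictionary reuses masses from the annotation table earlier in this deck and stands in for a real database, and the function name and the 5 ppm tolerance are hypothetical.

```python
# Two masses copied from the annotation table in this deck; a real search
# would query a full compound database instead.
DATABASE = {
    "Coenzyme A": 767.1163,
    "Biocytin": 372.187,
}

def match_by_mass(measured_mass, database, tol_ppm=5.0):
    """Return (name, ppm error) for entries within tol_ppm, best match first."""
    hits = []
    for name, mass in database.items():
        error = (measured_mass - mass) / mass * 1e6
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return sorted(hits, key=lambda h: abs(h[1]))

hits = match_by_mass(767.1170, DATABASE)
no_hits = match_by_mass(400.0, DATABASE)
```

Because many formulas and isomers can share a mass within a few ppm, a hit at this level is only a candidate; the higher levels in the list add isotope pattern, retention time and MS/MS evidence to narrow it down.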

Compound Annotation is a Critical Challenge in Metabolomics

• Accurate Mass, Retention Time and MS/MS libraries can be searched via Agilent Personal Compound Database and Library and ID Browser

• The METLIN library has over 64,000 metabolites, roughly 8,000 of which have MS/MS spectra.

• Molecular Structure Correlator software can assist in MS/MS structural confirmation.


Agilent 6550 and Nano-Flow Chip Cube Bring Enhanced Sensitivity for Targeted or Shotgun Proteomics

• High attogram to low femtogram sensitivity

• Sub-ppm mass accuracy (MS)

• Scan speeds up to 50 spectra/s while maintaining 40k resolving power

• 5 orders of magnitude dynamic range

• Low injection volumes and nano-flow for enhanced sensitivity for proteomics applications

1200 HPLC-Chip Method: 120 min Gradient

Parameter: Settings

HPLC-Chip: Polaris C-18A, 3 µm, 75 µm x 150 mm analytical and 500 nL enrichment column; Gradient delay reduction: On; IFV = 4 µL; to enrichment at 135 min

Autosampler: Injection volume = 2 µL; Autosampler temp = 4 ºC; needle wash in flushport (methanol:water 50:50) for 10 seconds

Mobile phase (both pumps): A = 0.1% formic acid in water; B = 0.1% formic acid in 90:10 acetonitrile:water

Cap pump (loading): 2 µL/min of 3% B

Nanopump: Flow rate = 300 nL/min

Gradient:

Time (min)   %B
0            3
90           25
120          60
125          90
130          90
130.1        3

Stop time: 150 min
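To read off the mobile phase composition at any point in the run, the gradient table can be interpolated. The sketch below assumes linear ramps between the listed time points and that the final composition is held until the stop time; neither is stated on the slide. The function name is hypothetical.

```python
# (time in min, %B) pairs copied from the gradient table above
GRADIENT = [(0.0, 3.0), (90.0, 25.0), (120.0, 60.0),
            (125.0, 90.0), (130.0, 90.0), (130.1, 3.0)]

def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate %B at time t_min (minutes)."""
    if t_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]  # assumed: hold the last composition until stop time

mid_first_ramp = percent_b(45)    # halfway through the 3% -> 25% ramp
mid_second_ramp = percent_b(105)  # halfway through the 25% -> 60% ramp
re_equilibration = percent_b(140) # after the return to 3% B
```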

6550 Q-TOF Method

Parameter: Settings

Ion Mode: positive, 1700 m/z, 2 GHz mode

Source conditions: Drying gas 250 ºC at 12 L/min; Vcap = 2000 V

MS: 275-1700, 8 spectra/sec

MS/MS: 50-1700, 3 spectra/sec

Precursor: Narrow (~1.3 isolation width); Max precursors = 20; Threshold: 1000 (Abs) and 0.001 (Rel); Active Exclusion: After 1 spectrum, release after 0.25 min; Abundance based ON, Target 25k, MS/MS accumulation time limit ON, Time limit OFF, Purity stringency 100%, cutoff 30%; Peptide isotope model; Sort precursors by abundance only, z = 2, 3, >3

Collision energy: Used equation based on m/z: for z = 2, (m/z * 0.031) + 1; for z ≥ 3, (m/z * 0.036) - 4.8
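The charge-dependent collision energy equations above can be written as a small function. This is a direct transcription of the two equations on the slide; the function name is hypothetical, and the slide gives no equation for singly charged precursors, so that case raises an error here.

```python
def collision_energy(mz, charge):
    """Collision energy from precursor m/z and charge, per the slide's equations."""
    if charge == 2:
        return mz * 0.031 + 1
    if charge >= 3:
        return mz * 0.036 - 4.8
    raise ValueError("the method gives equations only for z = 2 and z >= 3")

ce_doubly = collision_energy(500.0, 2)   # 500 * 0.031 + 1
ce_triply = collision_energy(500.0, 3)   # 500 * 0.036 - 4.8
```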

Spectrum Mill B.04.00

29 March 2013

Spectrum Mill Workflow For MS/MS Data

Workflow diagram: File_1.d, File_2.d, File_3.d, ..., File_n.d (raw MS/MS spectra) -> Extraction -> Filtered MS/MS spectra -> Database search -> High scoring, medium scoring and low scoring spectra

Spectrum Mill Workflow For MS/MS: Iterative Approach to Reducing Complex Datasets

Workflow diagram:

Database search -> High scoring, medium scoring and low scoring spectra

Autovalidate or manually validate -> Validated matches -> Create .res file containing list of all identified proteins

Spectra not validated -> Remaining MS/MS spectra -> Database search

Iterative database searches:

• in homology mode against previous protein hits

• in identity or homology mode against a different database

Spectrum Mill Protein Database Search Conditions

Search engine: Agilent Spectrum Mill

Fixed modification: carbamidomethylation of Cys

Variable modification: oxidation of Met, deamidation (N)

Extraction: no merge, MS Noise TH=200

Database: SwissProt human database (April 2011)

Mass tolerances: 20 ppm precursor, 50 ppm product ion

Other search settings: Agilent Q-TOF defaults

Validation (reporting): auto score adjustment to 1% FDR
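The slide does not say how the 1% FDR is estimated. A common approach in proteomics is a target-decoy search, and the sketch below assumes that approach: it walks down the score-sorted matches and keeps the lowest score at which the ratio of decoy to target hits is still within the limit. The function name, the tuple layout and the example scores are all hypothetical, and this is not Spectrum Mill's own auto-validation logic.

```python
def score_threshold_for_fdr(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy). Return the lowest score cutoff whose
    estimated FDR (decoys / targets at or above the cutoff) is <= max_fdr."""
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold

# Hypothetical search result: 200 target matches and 3 decoy matches
psms = [(float(s), False) for s in range(200, 0, -1)]
psms += [(50.5, True), (10.5, True), (5.5, True)]
cutoff = score_threshold_for_fdr(psms)
```

In this invented example the cutoff lands just above the second decoy, since accepting it would push the estimated FDR past 1%.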

Label-free Protein Discovery Results for Yeast Lysates

• Protein database search in Spectrum Mill

• Protein-protein comparison in Spectrum Mill groups proteins across the entire set

• Color coding = abundance (based on EIC of peptides assigned to the protein)

• Export results to MPP

Figure labels: Protein Group; Injection (CY, CA, FK, WT)

Treatments:

CY = Cyclosporin A treated

CA = Calcium control

FK = FK506 treated

WT = wild type

An ANOVA Analysis Was Used to Determine Peptides Responding to Drug Treatment

Blue = Significant Difference in Post-hoc Comparison

Orange = No Significant Difference in Post-hoc Comparison

Significance Cut-Off = 0.05

142 Compounds Pass

Calcium Vector Introduces Changes to the Metabolic Profile Versus Wild-Type

204 Compounds had a P-Value of Less Than 0.1 when comparing Calcium Treated Groups and Wild Type.

Venn Diagrams Were Used to Determine 93 Metabolites that Uniquely Respond to Drug Treatment in Positive Polarity
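The Venn diagram step amounts to set arithmetic on lists of significant compounds. The sketch below assumes the "unique to drug treatment" set is the drug-responsive compounds (cut-off 0.05) minus those that also respond to calcium alone (cut-off 0.1), which is one plausible reading of the slides rather than a documented procedure. Compound names and p-values are invented.

```python
def significant(p_values, cutoff):
    """Names of compounds whose p-value is below the cutoff."""
    return {name for name, p in p_values.items() if p < cutoff}

def unique_to_drug(drug_hits, calcium_hits):
    """Compounds that respond to drug treatment but not to calcium alone."""
    return set(drug_hits) - set(calcium_hits)

# Hypothetical p-values for three compounds in the two comparisons
drug_p = {"cpd_1": 0.001, "cpd_2": 0.03, "cpd_3": 0.2}
calcium_p = {"cpd_1": 0.04, "cpd_2": 0.5, "cpd_3": 0.9}

drug_hits = significant(drug_p, cutoff=0.05)
calcium_hits = significant(calcium_p, cutoff=0.1)
drug_only = unique_to_drug(drug_hits, calcium_hits)
```

Using the looser 0.1 cut-off for the calcium comparison makes the subtraction more conservative: borderline calcium responders are also excluded from the drug-specific list.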

Protein                                                  p (corr)     Raw FC vs Control  Log FC vs Control
Mitochondrial import receptor subunit TOM5               5.85E-12     254400.39          17.956741
Ubiquitin-conjugating enzyme E2 13                       0.04925708   95805.26           16.547817
Probable E3 ubiquitin-protein ligase HUL5                0.08858416   54262.176          15.727659
Damage response protein 1                                6.16E-12     45124.137          15.461612
Protein transport protein sec1                           0.055262066  31862.016          14.95955
Phosphatidylinositol transfer protein SFH5               1.03E-09     11778.65           13.523887
C-5 sterol desaturase                                    0.043293692  4570.4185          12.158111
Replication factor A protein 3                           0.06183812   2928.936           11.516161
Probable 1-acyl-sn-glycerol-3-phosphate acyltransferase  0.06327902   2507.4373          11.291998
ER membrane protein complex subunit 2                    0.06327902   2471.677           11.271275
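The Log FC column in the table above appears to be the base-2 logarithm of the Raw FC column. That is an inference from the numbers, not something the slide states; the short check below reproduces the first two rows under that assumption.

```python
import math

def log2_fold_change(raw_fc):
    """Base-2 log of a raw fold change (assumed relation between the columns)."""
    return math.log2(raw_fc)

tom5 = log2_fold_change(254400.39)   # table lists 17.956741
e2_13 = log2_fold_change(95805.26)   # table lists 16.547817
```

On this scale each unit is a doubling, so a Log FC near 18 corresponds to a raw ratio in the hundreds of thousands. Ratios that large often arise when a protein is at or near the noise floor in the control, so they are worth checking against the raw abundances.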

Hierarchical Clustering Can Help Uncover Relationships Between Compounds or Proteins

Subtree (1)

Pathway Architect 12.5:

Canonical Pathway Data Mapping and Visualization

Browse, filter, and search

Analyze one or two types of -omic data

Supports biological pathways from publicly available databases

• WikiPathways

• BioCyc

Supported formats

• BioPAX 3: Pathway Commons, Reactome, NCI Nature Pathway

• GPML: PathVisio, custom drawing

Export compound list from pathways

Easy Mining of Complex Pathways for Biological Understanding

Central Carbon Metabolism

Agilent-BridgeDB: Enhanced Metabolite and Protein Mapping

Metabolite identifiers (more coverage): KEGG, MetaCyc, PubChem, LMP, HMDB, ChEBI, CAS

Protein identifiers: Swiss-Prot, UniProt, UniProt/TrEMBL

Gene identifiers: Entrez Gene, GenBank, Ensembl, EC Number, RefSeq, UniGene, HUGO, HGNC, EMBL

Resolve the Mapping Problem Between Databases

WikiPathways

WikiPathways is an open, collaborative platform

• Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo C, Pico AR. (2011) WikiPathways: building research communities on biological pathways. NAR doi: 10.1093/nar/gkr1074

• Number of Organisms: 13

• Public pathways: 1780, Private pathways: 13

• Analysis Collection – Reviewed manually at Evelo lab

• GenMAPP – Reviewed by Pico/Conklin at UCSF

• Reactome – Reviewed by Reactome consortium

Agilent Pathway Architect downloads the database from the WikiPathways site

BioCyc Pathways

Produced by SRI under the direction of Peter Karp

BioCyc content

• Number of species – 2037

• Curated and computationally derived pathways

• Tier 1 – literature assembled and manually reviewed

• Tier 2 – computationally generated with moderate review

• Tier 3 – computationally generated no review

Pathway Architect downloads the BioCyc database from an Agilent server

Pathway Source that has a Large and Deep Species Coverage

Pathway Analysis Directed Experiment

Cycle diagram: Experimental Design; Data Measurement; Statistical Analysis; Biological Relevance / Network Expansion

Data types: Genes, Proteins, Metabolites

Enabling hypothesis-driven experimental design by incorporating prior biological knowledge from multiple measurement technologies

Pathway Directed Experiment: Target Protein List Is Exported To Skyline

Copy protein accession numbers from Pathway Architect

Generate peptide lists and MRM transitions for QQQ MRM using:

• Spectrum Mill results as a library in Skyline

• MRM Atlas for yeast

Multi-omics Analysis Using Targeted Proteomics (QQQ) and Metabolomics Data

Thank you!