Eoin Fahy Data Repository and Coordination Center Presentation

Transcript of Eoin Fahy Data Repository and Coordination Center Presentation

Page 1:

Data Repository and Coordination Center (DRCC)

Eoin Fahy, Dawn Cotter, Manish Sud, Kenan Azam, Andrew Caldwell, Shankar Subramaniam

University of California San Diego

NIH Common Fund's Metabolomics Data Repository and Coordinating Center (supported by NIH grant, U01-DK097430)

Page 2:

Metabolomics Workbench website: http://www.metabolomicsworkbench.org

Page 3:

Studies in MW (Apr. 2018)

Mayo, 116
ERCMRC, 135
UC Davis, 107
Florida, 69
Kentucky, 11
Michigan, 237
Non-RCMRC, 219

894 studies total, 681 publicly available

Including 52 studies from foreign institutes

Page 4:

Overview of DRCC Inputs and Outputs

Online data submission

Metadata
Targeted data measurements
Protocols/Methods files
Untargeted data measurement files

Raw data (MS/NMR binary files) + other pertinent files

DRCC Data Repository

FTP

Browsing/searching/statistical analysis via web browser

Targeted data (named metabolites)

Untargeted data (Features, NMR binned increments)

Download targeted measurements
Download untargeted measurements

Download mwTab file (Metadata/data)

Formats: Plain text, JSON

Download Raw Data + (via FTP)

REST service
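The REST service in the diagram above is reached with plain HTTP GET requests. As a minimal sketch, assuming a URL layout of context / input item / input value / output item (this layout, the base URL, and the `summary` output item are assumptions that should be checked against the current Metabolomics Workbench REST documentation, since the slide does not spell them out), a request URL for one study could be assembled like this:

```python
from urllib.parse import quote

# Assumed base URL of the REST service; verify against the current documentation.
BASE_URL = "https://www.metabolomicsworkbench.org/rest"


def build_rest_url(context, input_item, input_value, output_item, output_format=None):
    """Join the path segments of a REST request, percent-escaping each one."""
    parts = [context, input_item, input_value, output_item]
    if output_format:
        parts.append(output_format)
    return BASE_URL + "/" + "/".join(quote(str(p), safe="") for p in parts)


# Study ID taken from the slides; the other segment names are assumptions.
url = build_rest_url("study", "study_id", "ST000075", "summary")
print(url)
```

The helper only builds the URL; fetching it (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.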

Page 5:

Metabolomics Data Repository: Overall Strategy

Experimental data

Online study submission system

Named metabolite data/ smaller datasets

(paste into web forms)

Relational Database(Data/Metadata)

Analysis Toolbox

Untargeted data/larger datasets

(upload as files)

“Big data” file system: mwTab files, Results files

Processed text files (smaller)
Save to file system for data analysis

Transpose
Fix zero/null data
Average technical reps.
Apply filters

Analysis Toolbox

Data analysis

Metadata
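The processing steps named on this slide (transpose, fix zero/null data, average technical replicates, apply filters) can be sketched roughly as follows. This is an illustration under stated assumptions, not the DRCC implementation: the half-minimum fill value, the 50% missing-value filter, and all function names are hypothetical choices made for the example, and the steps are applied in a convenient order rather than the order on the slide.

```python
from statistics import mean


def transpose(rows):
    """Swap rows and columns of a rectangular table."""
    return [list(col) for col in zip(*rows)]


def is_missing(value):
    """Treat None and zero as missing measurements."""
    return value is None or value == 0


def average_replicates(values, rep_ids):
    """Average values sharing a replicate-group id, keeping first-seen order."""
    groups = {}
    for value, rep_id in zip(values, rep_ids):
        groups.setdefault(rep_id, []).append(value)
    return [mean(vs) for vs in groups.values()]


def preprocess(table, rep_ids, max_missing_fraction=0.5):
    """Filter, fill, and average each metabolite row of a {name: values} table."""
    cleaned = {}
    for name, raw in table.items():
        present = [v for v in raw if not is_missing(v)]
        missing_fraction = 1 - len(present) / len(raw)
        if not present or missing_fraction > max_missing_fraction:
            continue  # filter: too many missing values (assumed cutoff)
        fill = min(present) / 2  # assumed fill rule: half the smallest value
        filled = [fill if is_missing(v) else v for v in raw]
        cleaned[name] = average_replicates(filled, rep_ids)
    return cleaned


table = {"glucose": [10.0, 12.0, None, 8.0], "lactate": [0, None, 0, 4.0]}
rep_ids = ["s1", "s1", "s2", "s2"]  # two technical replicates per sample
result = preprocess(table, rep_ids)
print(result)                            # glucose kept, lactate filtered out
print(transpose(list(result.values())))  # samples as rows
```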

Page 6:

NIH Data Repository Portal: http://metabolomicsworkbench.org/repository/index.php

Page 7:

Update on the DRCC data submission workflow

New context-specific metadata entry workflow, which depends on a mandatory “subject type” pulldown menu. Choices such as human subject, animal subject, plants, cultured cells, etc. allow us to customize the downstream metadata options to show only those fields that are relevant to the subject type. This greatly reduces the number of fields in each section and presents a less daunting task to the submitter.

“Crowdsourcing” of the ~900 studies to date was used to generate controlled vocabularies. Items such as MS instrument, chromatography instrument, chromatography column, sample source, temperature, etc. are now presented as pulldown menus with options to add new items.

Auto-population of fields based on user input and profile: name/address/institution, taxonomy ID/species, etc.
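One way to picture the subject-type-driven workflow is a lookup from subject type to the metadata fields shown to the submitter. The sketch below is only an illustration: the field names and the grouping are hypothetical, since the slides do not list the actual fields used by the submission system.

```python
# Hypothetical field names; the real workflow's fields are not listed on the slide.
COMMON_FIELDS = ["subject_type", "taxonomy_id", "species"]

FIELDS_BY_SUBJECT_TYPE = {
    "human": ["age_or_age_range", "gender"],
    "animal": ["strain", "animal_feed"],
    "plant": ["growth_conditions"],
    "cultured cells": ["cell_line", "passage_number"],
}


def fields_for(subject_type):
    """Return only the metadata fields relevant to the chosen subject type."""
    key = subject_type.strip().lower()
    if key not in FIELDS_BY_SUBJECT_TYPE:
        raise ValueError(f"unknown subject type: {subject_type!r}")
    return COMMON_FIELDS + FIELDS_BY_SUBJECT_TYPE[key]


print(fields_for("Human"))
```

Making the subject type mandatory means the lookup always has a key to work with, which is what lets the form hide every field outside the selected group.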

Page 8:

Update on the DRCC data submission workflow (contd.)

Navigation improvements: large Chromatography and MS/NMR metadata input pages allow horizontal/vertical scrolling with column headings fixed at the top.

Consolidation of metadata fields: a small number of confusing and irrelevant metadata items were removed from the workflow.

The updated DRCC submission workflow went public in mid-February.

Page 9:

Customized portals accessing the Metabolomics Workbench: Project source

Example: CHEAR studies

Page 10:

RefMet: A Reference list of Metabolite names

The main objective of RefMet is to provide a standardized reference nomenclature for both discrete metabolite structures and metabolite species identified by spectroscopic techniques in metabolomics experiments. This is an essential prerequisite for the ability to compare and contrast metabolite data across different experiments and studies. The use of identifiers such as PubChem compound IDs and InChIKeys offers only a partial solution because these identifiers will vary depending on parameters such as the salt form and degree of stereochemical detail. In addition, many metabolite species, especially lipids, are not reported by MS methods as discrete structures but rather as isobaric mixtures (such as PC(34:1) and TG(54:2)). To this end, a list of over 170,000 names from a set of ~900 MS and NMR studies on the Metabolomics Workbench has been used to generate a highly curated analytical chemistry-centric list of common names for metabolite structures and isobaric species. Additionally, the vast majority of these names have been linked to a metabolite classification system using a combination of LIPID MAPS and ClassyFire classification methods.
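To illustrate why species-level names such as PC(34:1) can still be handled without a discrete structure, the sketch below pulls the class abbreviation, total carbon count, and total double-bond count out of such a name. It is a simplified, hypothetical parser written for this example, not RefMet's own name-matching logic, and it deliberately ignores names that list individual chains.

```python
import re

# Matches sum-composition names like PC(34:1), TG(54:2), or Cer(d36:1).
# The optional d/t prefix is the sphingoid-base hydroxylation shorthand.
SPECIES_PATTERN = re.compile(
    r"^(?P<cls>[A-Za-z]+)\((?P<prefix>[dt])?(?P<carbons>\d+):(?P<db>\d+)\)$"
)


def parse_sum_composition(name):
    """Return class, total carbons, and double bonds, or None if not matched."""
    match = SPECIES_PATTERN.match(name.strip())
    if match is None:
        return None
    return {
        "lipid_class": match.group("cls"),
        "carbons": int(match.group("carbons")),
        "double_bonds": int(match.group("db")),
    }


for reported in ["PC(34:1)", "TG(54:2)", "Cer(d36:1)", "PC(16:0/18:1)"]:
    print(reported, parse_sum_composition(reported))
```

The last name lists individual acyl chains, so this simple pattern returns `None` for it; a fuller parser would treat it as a more specific annotation of the same species.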

Page 11:

RefMet Metabolite Classification and indexing

Lipids

LIPID MAPS Classification

Indexing

Non-lipids

ClassyFire Classification

Uncurated classes

Curation, Indexing

RefMet database with indexed metabolite classification

Indexing of metabolite classes/subclasses facilitates logical ordering of data
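The point about indexing can be shown with a small sorting sketch: if every class carries a numeric index, related classes end up next to each other whenever data are ordered by that index. The index values below are invented for illustration and are not RefMet's actual indices.

```python
# Hypothetical class index values, for illustration only.
CLASS_INDEX = {
    "Fatty acyls": 10,
    "Glycerolipids": 20,
    "Glycerophospholipids": 30,
    "Sphingolipids": 40,
    "Amino acids": 100,
}


def order_by_class(records):
    """Sort (metabolite, class) pairs by class index, then by name.

    Classes missing from the index are placed last.
    """
    return sorted(
        records,
        key=lambda rec: (CLASS_INDEX.get(rec[1], float("inf")), rec[0]),
    )


records = [
    ("Leucine", "Amino acids"),
    ("TG(54:2)", "Glycerolipids"),
    ("PC(34:1)", "Glycerophospholipids"),
    ("Unknown X", "Unclassified"),
    ("DG(36:2)", "Glycerolipids"),
]
for name, cls in order_by_class(records):
    print(f"{cls:22s} {name}")
```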

Page 12:

All studies on Metabolomics Workbench

~900 experimental studies reporting ~170,000 metabolite species

~140,000 of these metabolite species were mapped to RefMet classification

Page 13:

Customized portals accessing the Metabolomics Workbench: Metabolite classes

Example: Lipidomics studies on MW

~600 studies in MW have reported named lipids (excluding polyketides)

>300 of those have >= 20 named lipids

Page 14:

RefMet Metabolite Classification enables class-specific data analysis

Untargeted data (m/z_ret. time “features”)

General Statistical analysis

(Un)Targeted data (Named metabolites/species)

Map names to RefMet via name/synonym table

RefMet database with indexed metabolite classification

General + class-specific Statistical analysis
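The "map names via name/synonym table" step can be pictured as a normalized dictionary lookup that returns a standard name and class, with unmatched names set aside for general (non-class-specific) analysis. The table entries and the normalization rule below are hypothetical examples, not actual RefMet records.

```python
# Hypothetical synonym table: normalized reported name -> (standard name, class).
SYNONYMS = {
    "d-glucose": ("Glucose", "Monosaccharides"),
    "glucose": ("Glucose", "Monosaccharides"),
    "l-leucine": ("Leucine", "Amino acids"),
    "leucine": ("Leucine", "Amino acids"),
}


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def map_names(reported_names):
    """Split reported names into mapped {name: (standard, class)} and unmapped."""
    mapped, unmapped = {}, []
    for name in reported_names:
        hit = SYNONYMS.get(normalize(name))
        if hit is None:
            unmapped.append(name)
        else:
            mapped[name] = hit
    return mapped, unmapped


mapped, unmapped = map_names(["D-Glucose", "  L-leucine ", "123.0456_4.21"])
print(mapped)
print(unmapped)
```

In this picture an untargeted feature label such as `123.0456_4.21` simply falls through to the unmapped list, matching the slide's split between general and class-specific analysis.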

Page 15:

Opportunity to compare MW statistical tool performance with published results on clinical studies and key disease models

Study ST000608
Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot diabetic and control samples for longitudinal studies (Pacific Northwest Labs)
http://onlinelibrary.wiley.com/doi/10.1002/rcm.7808/abstract

Study ST000899
Alterations in Lipid, Amino Acid, and Energy Metabolism Distinguish Crohn Disease from Ulcerative Colitis and Control Subjects by Serum Metabolomic Profiling (Vanderbilt)
https://doi.org/10.1007/s11306-017-1311-y

Study ST000260
Hexokinases link DJ-1 to the PINK1/parkin pathway (Parkinson’s) (National Institute on Aging)
Cookson, MR et al. Molecular Neurodegeneration 2017 12:70
https://doi.org/10.1186/s13024-017-0212-x

Study ST000075
Systemic alterations in the metabolome of diabetic NOD mice delineate increased oxidative stress accompanied by reduced inflammation and hypertriglyceremia (UC Davis)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451288/

Study ST000915-ST000917
Biomarkers of NAFLD progression: a lipidomics approach to an epidemic (LIPID MAPS)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4340319/

Page 16:

ST000260: Study on DJ-1 KO mice (Parkinson’s model) (National Institute on Aging)

Page 17:

Volcano Plot Analysis interface on MW

REPORTS:
(a) Volcano plot
(b) Bubble plot of metabolite classes
(c) Metabolite class enrichment graphs
(d) Barplot of significant metabolites by class
(e) Table of metabolite classes with mean p-values, fold-change
(f) Table of individual metabolites with p-values, fold-change, metabolite classification

Page 18:

Volcano plot: pairwise comparison of 2 experimental conditions
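A volcano plot places each metabolite at (log2 fold change, -log10 p-value) and highlights the points that pass both cutoffs. The sketch below computes those coordinates from fold changes and p-values that are assumed to have been calculated already; the cutoff defaults and the function name are illustrative assumptions, not the settings used on the Metabolomics Workbench.

```python
import math


def volcano_points(results, p_cutoff=0.05, fc_cutoff=2.0):
    """Convert (name, fold_change, p_value) rows into volcano plot coordinates.

    fold_change is group2/group1; a point is flagged significant when the
    p-value is at or below p_cutoff and the fold change is at least
    fc_cutoff in either direction.
    """
    points = []
    for name, fold_change, p_value in results:
        log2_fc = math.log2(fold_change)
        points.append({
            "name": name,
            "x": log2_fc,                 # log2 fold change
            "y": -math.log10(p_value),    # -log10 p-value
            "significant": p_value <= p_cutoff
            and abs(log2_fc) >= math.log2(fc_cutoff),
        })
    return points


# Made-up values for illustration.
rows = [("A", 4.0, 0.001), ("B", 0.25, 0.01), ("C", 1.1, 0.5), ("D", 4.0, 0.2)]
for point in volcano_points(rows):
    print(point)
```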

Page 19:

ST000075: Comparing Diabetic and control mice

P-value on y axis

Classes ordered by classification index on x axis

Size of colored circles represents # of (significant) metabolites per class with p-value and fold change exceeding selected cutoff values

Size of gray circles represents # of all reported metabolites per class

Color of circles represents fold change value: red: group2/group1 > 1 (upregulated); green: group2/group1 < 1 (downregulated)

Mouse over bubble to view # of metabolites per class
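The quantities encoded in the bubble plot (total metabolites per class, significant metabolites per class, and an average fold change that drives the color) can be derived with a simple per-class aggregation. This is a sketch under assumptions: the cutoffs are arbitrary, and averaging on the log2 scale is a choice made here for illustration, since the slides do not say how the mean fold change is computed.

```python
import math
from collections import defaultdict
from statistics import mean


def summarize_classes(metabolites, p_cutoff=0.05, fc_cutoff=1.5):
    """Aggregate (name, class, fold_change, p_value) rows per metabolite class."""
    by_class = defaultdict(list)
    for _name, cls, fold_change, p_value in metabolites:
        by_class[cls].append((fold_change, p_value))

    summary = {}
    for cls, rows in by_class.items():
        significant = [
            fc for fc, p in rows
            if p <= p_cutoff and (fc >= fc_cutoff or fc <= 1 / fc_cutoff)
        ]
        summary[cls] = {
            "total": len(rows),               # gray circle size
            "significant": len(significant),  # colored circle size
            # > 0 means up in group2, < 0 means down, near 0 means mixed
            "mean_log2_fc": (
                mean([math.log2(fc) for fc in significant])
                if significant else None
            ),
        }
    return summary


# Made-up values for illustration.
data = [
    ("TAG a", "TAG", 2.0, 0.01), ("TAG b", "TAG", 0.5, 0.01), ("TAG c", "TAG", 1.0, 0.9),
    ("DAG a", "DAG", 2.0, 0.01), ("DAG b", "DAG", 4.0, 0.02),
]
for cls, stats in summarize_classes(data).items():
    print(cls, stats)
```

With these made-up numbers the TAG class averages out to a log2 fold change of zero because it mixes up- and down-regulated members, while every significant DAG points the same way.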

Page 20:

ST000608: Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies

Serum data: Diabetes vs. Control

Notice the difference in color for DAG and TAG (similar p-values)

Page 21:

Classes may contain both up- and down-regulated metabolites, i.e. mean fold-change per class may be close to 1 (more yellow)

Page 22:

In contrast to TAG, all significant DAGs are upregulated

Page 23:

ST000899: Crohn’s disease vs controls: Superimposed data for 4 analyses: C18/HILIC, +/- ion mode

LIPIDS NON-LIPIDS

Page 24:

Single analysis: C18, + mode

Crohn’s disease vs controls

Page 25:

ST000899: Crohn’s disease vs controls

Metabolomics Workbench: Volcano plot/class enrichment

Publication: pathway enrichment

Page 26:

ST000899: Crohn’s disease vs controls

Metabolomics Workbench: Volcano plot/Bargraph by class

Publication: Bargraph by pathway

Page 27:

The abundance of 240 biochemicals extracted from the brains of 4-month-old DJ-1 knockout and WT mice was measured. In addition to the polyols 1,5-anhydroglucitol and fructose, there was a significant decrease in dihydroxyacetone phosphate. A significant reduction in the levels of glucose, glucose 6-phosphate, and fructose 1,6-bisphosphate in the DJ-1 knockout brains was observed, but only without multiple testing correction.

ST000260: Study on DJ-1 knockout mice (Parkinson’s model)

Activation of the polyol pathway of glucose metabolism in the DJ-1 knockout mouse brain.

Page 28:

ST000260: Study on DJ-1 KO mice (Parkinson’s model)

Volcano plot analysis with RefMet classifications

Page 29:

ST000260: Study on DJ-1 mice (Parkinson’s model)

Tabular display of Volcano plot analysis with RefMet classifications

Page 30:

Study ST000684: Macrophages from RAGE knockout mice: untreated vs addition of palmitate/oleate (U. Michigan)

RAGE: Receptor for Advanced Glycation End Products

Page 31:

LIPID MAPS NAFLD Study: Steatosis vs NASH in blood

Page 32:

LIPID MAPS NAFLD Study: Steatosis vs NASH in blood (ST000916)

Page 33:

Study ST000916: NAFLD Study: Steatosis vs NASH in blood: Tabular display of significant classes and individual metabolites

Page 34:

Comparison of Volcano plot and ANOVA on the same data (ST000075)

Volcano Plot

Colors represent mean fold change

ANOVA

Colors represent p-value (same as y-axis)
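Where the volcano plot compares two conditions, one-way ANOVA asks whether the means of several groups differ. As a rough sketch of the statistic behind the ANOVA view (written from the textbook definition, not taken from the Workbench code), the F value for one metabolite across groups can be computed as below; turning F into a p-value requires the F distribution and is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up measurements of one metabolite in three experimental groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(groups))
```

A large F means the spread between group means is large relative to the spread within groups, which corresponds to a small p-value on the plot's y axis.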

Page 35:

Summary: Univariate analysis linked to Metabolite Classification

Useful way to visualize changes in metabolite classes due to selected experimental conditions:
• Degree of statistical significance (p-value)
• Degree and direction of fold-change
• Number of significant detected metabolites per class
• Number of total detected metabolites per class
• Indexing and alignment of similar metabolite classes

IMPLEMENTATION:
• Mapping of reported metabolites in MW studies to a standardized nomenclature (RefMet)
• Linking and indexing of RefMet metabolites to a classification system
• Doesn’t require identification of exact structures: e.g. assignments from LCMS experiments such as PC(34:2), Cer(d36:1), Ile/Leu, Hexose etc. can still be classified

Volcano plot analysis and ANOVA analysis for studies with named metabolites on the MW

Page 36:

Metabolomics Tools: Load and analyze your own dataset

Modular, portable suite of statistical tools for metabolomics analysis
R statistics-based approach

• Normalization and scaling
• Bar graphs and Boxplots
• Univariate Analysis
• Multivariate Analysis
• Clustering and Correlation
• Feature Analysis

Ability to select and combine groups of experimental conditions (factors)
Applicable to targeted and untargeted datasets
Workflow enables classification of metabolite names via RefMet
Classified datasets are then amenable to class-specific and pathway-specific analysis
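As one concrete example of the "normalization and scaling" step, autoscaling (mean-centering each metabolite and dividing by its standard deviation) is a common choice in metabolomics. The slides do not say which scaling methods the toolbox offers, and the toolbox itself is R-based, so the Python below is only an illustrative sketch of the general technique.

```python
from statistics import mean, stdev


def autoscale(values):
    """Mean-center the values and divide by their sample standard deviation."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # A constant metabolite carries no variation; return zeros.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


# Made-up intensities of one metabolite across three samples.
print(autoscale([1.0, 2.0, 3.0]))
```

After autoscaling, every metabolite has mean 0 and unit variance, so high-abundance metabolites no longer dominate multivariate methods such as clustering.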

Page 37:

Portable metabolomics analysis toolbox design

Metabolomics Workbench

REST service obtains RefMet classification data

Portable analysis toolbox codebase

(R files, PHP, JavaScript) + configuration file

R statistics application

+ Libraries

User interfaces

Results

Page 38:

Metabolomics Tools:Load and analyze your own dataset

Page 39:

Statistical analysis tools on MW deliver powerful insights into differences among experimental groups

Page 40:

Data sharing with other institutions: MetabolomeXchange (http://metabolomexchange.org)

MetabolomeXchange is an outcome of the European Commission-funded COSMOS project, coordinated by EMBL-EBI, to set and promote community standards among data providers. MetabolomeXchange is now an independent consortium that will continue its work beyond the end of COSMOS.

Currently 681 MW studies

Page 41:

Data sharing with other institutions: EBI: Omics Discovery Index (http://www.ebi.ac.uk/Tools/omicsdi/)

Omics Discovery Index is an integrated and open source platform facilitating the access and dissemination of omics datasets. It provides a unique infrastructure to integrate datasets coming from multiple omics studies.

Page 42:

Data sharing with other institutions: PubChem: Sharing the MW structure database

https://www.ncbi.nlm.nih.gov/pcsubstance/?term=MetabolomicsWorkbench

Page 43:

@MetabolomicsWB Twitter account

Twitter Overview
• Started in June 2017 to engage academics and industry worldwide on social media
• A useful method to disseminate information regarding Metabolomics Workbench updates as well as Consortium events and opportunities
• The account has grown to 300 followers in ~10 months
• “Live-tweeting” the 2017 UAB Metabolomics Symposium and the 2017 Fall NIH Metabolomics Consortium meeting at UC Davis led to worthwhile engagement of academics in the field both in the U.S. and worldwide

Example Twitter engagement metrics from January 2018: top tweets

Page 44:

@MetabolomicsWB Twitter account

Twitter Engagement Approaches

Twitter allows individual tweets to be tagged with a hashtag, allowing for quick retrieval of tweets associated with a given topic. The hashtag selected for the 5th Annual UAB Metabolomics Workshop was #UABMetabolomics2017, allowing Twitter users to quickly find tweets associated with the topic and interact with all tweets produced by @MetabolomicsWB for the speaker and workshop sessions.

Tweet threading for message continuity of each speaker

All tweets tagged #UABMetabolomics2017

Page 45:

@MetabolomicsWB Twitter account

Reach Metrics

Representative speaker session tweet
• Figures from slides showing key concepts
• Multiple slides per tweet
• Summary of content combined with images

Breadth of impression

Online Engagement

Tweeting at the event Non-attendee engagement

Page 46:

@MetabolomicsWB Twitter account

Post-Workshop

Following the conclusion of the UAB Metabolomics workshop, UAB began compiling the speaker and workshop PDFs, editing video presentations, and creating presentation slide decks overlaid with speaker audio for upload to the UAB Metabolomics website. UAB and DRCC coordinated to share the links to each of these presentations to allow for researchers to download material.

Post-event recap
• Sharing of audio and video (when recorded) from key speaker sessions
• Many presentations and workshop classes available for download as PDF; shared via Twitter for online users
• Awareness of Metabolomics Workbench due to workshop tweeting has increased followers

Find presentations at www.uab.edu/proteomics/metabolomics/workshop/workshop_july_2017.php

Page 47:

Journal Partnership

• DRCC is in contact with multiple journals regarding establishing the Metabolomics Workbench as a recommended repository

• Two MDPI journals, Data and Metabolites, now recommend Metabolomics Workbench as a data repository

• The PLOS family of journals recommends the Metabolomics Workbench

• New subpage on the Workbench to highlight this partnership with links to the journals’ repository recommendations

Page 48:

DRCC Summary/ongoing efforts

Data repository currently contains 894 studies from over 150 different institutions

Online data submission process has numerous enhancements.

Updated the RefMet database of recommended metabolites to >11,000 records. Critical for comparative analysis and multi-omics efforts

New online analysis tools leverage metabolite classification to provide study overviews and gain rapid insights into differences among experimental groups

Outreach efforts to provide metabolite class-specific portals to the Metabolomics Workbench for the benefit of specialists in lipidomics, nucleic acids, sugars, alkaloids, etc.

Modular, portable suite of statistical tools for metabolomics analysis of user-uploaded (processed) data

Active collaborations with EBI on MetabolomeXchange and Omics Discovery Index websites. Additional exposure for MW studies

Page 49:
Page 50:

Analysis type

NMR, 12%
ESI-MS, 67%
GC-MS, 21%