Eoin Fahy Data Repository and Coordination Center Presentation
Data Repository and Coordination Center (DRCC)
Eoin Fahy, Dawn Cotter, Manish Sud, Kenan Azam, Andrew Caldwell, Shankar Subramaniam
University of California San Diego
NIH Common Fund's Metabolomics Data Repository and Coordinating Center (supported by NIH grant U01-DK097430)
Metabolomics Workbench website: http://www.metabolomicsworkbench.org
Studies in MW (Apr. 2018), by submitting center:
Mayo, 116
ERCMRC, 135
UC Davis, 107
Florida, 69
Kentucky, 11
Michigan, 237
Non-RCMRC, 219
894 studies total, 681 publicly available, including 52 studies from foreign institutes
Overview of DRCC Inputs and Outputs
Online data submission
Metadata
Targeted data measurements
Protocols/Methods files
Untargeted data measurement files
Raw data (MS/NMR binary files) + other pertinent files
DRCC Data Repository
FTP
Browsing/searching/statistical analysis via web browser
Targeted data (named metabolites)
Untargeted data (features, NMR binned increments)
Download targeted measurements
Download untargeted measurements
Download mwTab file (metadata/data)
Formats: plain text, JSON
Download raw data + (via FTP)
REST service
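As an illustration of how the REST service listed above might be addressed, here is a minimal sketch that only builds a request URL and sends nothing. It assumes a URL pattern of the form `https://www.metabolomicsworkbench.org/rest/<context>/<input item>/<input value>/<output item>`; that pattern and the item names `study_id` and `summary` are assumptions to check against the current REST documentation, and `build_rest_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumed base path of the REST service; verify against the official documentation.
BASE_URL = "https://www.metabolomicsworkbench.org/rest"


def build_rest_url(context, input_item, input_value, output_item):
    """Join the four path components into one REST URL (hypothetical helper)."""
    parts = [context, input_item, input_value, output_item]
    # Percent-encode each component so that names containing spaces,
    # slashes or parentheses stay inside a single path segment.
    return BASE_URL + "/" + "/".join(quote(str(p), safe="") for p in parts)


# Example: a summary request for one of the studies named in this presentation.
study_url = build_rest_url("study", "study_id", "ST000075", "summary")
print(study_url)
```

The returned string can be passed to any HTTP client; keeping URL construction separate from the request makes the pattern easy to check by eye.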
Metabolomics Data Repository: Overall Strategy
Experimental data
Online study submission system
Named metabolite data / smaller datasets
(paste into web forms)
Relational database (data/metadata)
Analysis Toolbox
Untargeted data / larger datasets
(upload as files)
“Big data” file system: mwTab files, results files
Processed text files (smaller)
Save to file system for data analysis
Transpose
Fix zero/null data
Average technical reps.
Apply filters
Analysis Toolbox
Data analysis
Metadata
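The processing steps named on this slide (transpose, fix zero/null data, average technical replicates, apply filters) can be sketched as follows. This is not the DRCC implementation. The input layout (one row per sample), the fill rule (half the smallest measured value), the filter (drop a metabolite when more than half its values are missing, checked before filling) and all names and numbers are assumptions made for illustration.

```python
from statistics import mean


def transpose(table):
    """Turn sample rows into metabolite rows."""
    return [list(column) for column in zip(*table)]


def is_measured(value):
    return value is not None and value > 0


def fix_missing(values):
    """Replace zero/null entries with half the smallest measured value (assumed rule)."""
    measured = [v for v in values if is_measured(v)]
    if not measured:
        return list(values)
    fill = min(measured) / 2
    return [v if is_measured(v) else fill for v in values]


def average_replicates(values, replicate_of):
    """Average values whose samples share the same biological-sample label."""
    buckets = {}
    for value, label in zip(values, replicate_of):
        buckets.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in buckets.items()}


def preprocess(rows_by_sample, metabolites, replicate_of, max_missing_fraction=0.5):
    result = {}
    for name, values in zip(metabolites, transpose(rows_by_sample)):
        missing = sum(1 for v in values if not is_measured(v))
        if missing / len(values) > max_missing_fraction:
            continue  # filter: too many zero/null values to keep this metabolite
        result[name] = average_replicates(fix_missing(values), replicate_of)
    return result


# Made-up data: four runs (two technical replicates of S1 and of S2), three metabolites.
rows = [
    [10.0, 0, 5.0],
    [12.0, None, 7.0],
    [20.0, None, None],
    [22.0, 4.0, 9.0],
]
processed = preprocess(rows, ["Glucose", "Lactate", "Alanine"], ["S1", "S1", "S2", "S2"])
print(processed)
```

In this example Lactate is dropped because three of its four values are missing, while the single missing Alanine value is filled before averaging.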
NIH Data Repository Portal: http://metabolomicsworkbench.org/repository/index.php
Update on the DRCC data submission workflow
New context-specific metadata entry workflow that depends on a mandatory “subject type” pulldown menu. Choices such as human subject, animal subject, plants, cultured cells, etc. allow us to customize the downstream metadata options to show only those fields that are relevant to the subject type. This greatly reduces the number of fields in each section and presents a less daunting task to the submitter.
“Crowdsourcing” of the ~900 studies to date was used to generate controlled vocabularies. Items such as MS instrument, chromatography instrument, chromatography column, sample source, temperature, etc. are now presented as pulldown menus with options to add new items.
Auto-population of fields based on user input and profile: name/address/institution, taxonomy ID/species, etc.
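The idea of a subject-type menu driving which metadata fields appear, combined with auto-population from a user profile, can be sketched in a few lines. The field names, subject-type labels and profile keys below are hypothetical and are not taken from the actual submission forms.

```python
# Hypothetical field lists; the real workflow defines its own fields.
COMMON_FIELDS = ["institution", "species", "taxonomy_id"]
FIELDS_BY_SUBJECT_TYPE = {
    "Human": ["age_or_age_range", "gender"],
    "Animal": ["strain", "diet"],
    "Plant": ["growth_conditions"],
    "Cultured cells": ["cell_line", "passage_number"],
}


def fields_for(subject_type):
    """Return only the fields relevant to the chosen subject type."""
    if subject_type not in FIELDS_BY_SUBJECT_TYPE:
        raise ValueError(f"unknown subject type: {subject_type!r}")
    return COMMON_FIELDS + FIELDS_BY_SUBJECT_TYPE[subject_type]


def prefill(fields, profile):
    """Pre-populate fields from a stored profile; the rest are left blank."""
    return {field: profile.get(field, "") for field in fields}


profile = {"institution": "Example University", "species": "Mus musculus"}
form = prefill(fields_for("Animal"), profile)
print(form)
```

A submitter who selects "Animal" sees five fields here, two of them already filled in, rather than the union of every field for every subject type.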
Update on the DRCC data submission workflow (contd.)
Navigation improvements: large chromatography and MS/NMR metadata input pages allow horizontal/vertical scrolling with column headings fixed at the top.
Consolidation of metadata fields: a small number of confusing and irrelevant metadata items were removed from the workflow.
The updated DRCC submission workflow went public in mid-February.
Customized portals accessing the Metabolomics Workbench: Project source
Example: CHEAR studies
RefMet: A Reference list of Metabolite names
The main objective of RefMet is to provide a standardized reference nomenclature for both discrete metabolite structures and metabolite species identified by spectroscopic techniques in metabolomics experiments. This is an essential prerequisite for the ability to compare and contrast metabolite data across different experiments and studies. The use of identifiers such as PubChem compound IDs and InChIKeys offers only a partial solution because these identifiers vary depending on parameters such as the salt form and degree of stereochemical detail. In addition, many metabolite species, especially lipids, are not reported by MS methods as discrete structures but rather as isobaric mixtures (such as PC(34:1) and TG(54:2)). To this end, a list of over 170,000 names from a set of ~900 MS and NMR studies on the Metabolomics Workbench has been used to generate a highly curated, analytical chemistry-centric list of common names for metabolite structures and isobaric species. Additionally, the vast majority of these names have been linked to a metabolite classification system using a combination of LIPID MAPS and ClassyFire classification methods.
RefMet Metabolite Classification and indexing
Lipids
LIPID MAPS classification
Indexing
Non-lipids
ClassyFire classification
Uncurated classes
Curation, indexing
RefMet database with indexed metabolite classification
Indexing of metabolite classes/subclasses facilitates logical ordering of data
All studies on Metabolomics Workbench: ~900 experimental studies reporting ~170,000 metabolite species
~140,000 of these metabolite species were mapped to RefMet classification
Customized portals accessing the Metabolomics Workbench: Metabolite classes
Example: Lipidomics studies on MW
~600 studies in MW have reported named lipids (excluding polyketides)
>300 of those have >= 20 named lipids
RefMet Metabolite Classification enables class-specific data analysis
Untargeted data (m/z_ret. time “features”)
General statistical analysis
(Un)targeted data (named metabolites/species)
Map names to RefMet via name/synonym table
RefMet database with indexed metabolite classification
General + class-specific Statistical analysis
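The step "map names to RefMet via name/synonym table" amounts to a normalized lookup followed by a class lookup. The sketch below assumes a tiny in-memory table; the synonym rows and class labels are illustrative only and are not copied from the RefMet database.

```python
# Illustrative synonym rows (lower-cased reported name -> standardized name).
SYNONYMS = {
    "glucose": "Glucose",
    "d-glucose": "Glucose",
    "dextrose": "Glucose",
    "pc(34:1)": "PC(34:1)",
    "pc 34:1": "PC(34:1)",
    "leucine": "Leucine",
    "l-leucine": "Leucine",
}
# Illustrative class assignments for the standardized names.
CLASS_OF = {
    "Glucose": "Monosaccharides",
    "PC(34:1)": "Glycerophosphocholines",
    "Leucine": "Amino acids",
}


def normalize(name):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.strip().lower().split())


def map_to_refmet(reported_names):
    """Split reported names into mapped (name, class) pairs and unmapped names."""
    mapped, unmapped = {}, []
    for name in reported_names:
        standard = SYNONYMS.get(normalize(name))
        if standard is None:
            unmapped.append(name)
        else:
            mapped[name] = (standard, CLASS_OF[standard])
    return mapped, unmapped


mapped, unmapped = map_to_refmet(["D-Glucose", " PC 34:1 ", "unknown_123"])
print(mapped)
print(unmapped)
```

Names that do not map are kept in a separate list, so they can still go through general statistical analysis even though class-specific analysis is not available for them.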
Opportunity to compare MW statistical tool performance with published results on clinical studies and key disease models
Study ST000608: Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot diabetic and control samples for longitudinal studies (Pacific Northwest Labs) http://onlinelibrary.wiley.com/doi/10.1002/rcm.7808/abstract
Study ST000899: Alterations in Lipid, Amino Acid, and Energy Metabolism Distinguish Crohn Disease from Ulcerative Colitis and Control Subjects by Serum Metabolomic Profiling (Vanderbilt) https://doi.org/10.1007/s11306-017-1311-y
Study ST000260: Hexokinases link DJ-1 to the PINK1/parkin pathway (Parkinson’s) (National Institute on Aging). Cookson, MR et al. Molecular Neurodegeneration 2017 12:70 https://doi.org/10.1186/s13024-017-0212-x
Study ST000075: Systemic alterations in the metabolome of diabetic NOD mice delineate increased oxidative stress accompanied by reduced inflammation and hypertriglyceremia (UC Davis) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451288/
Study ST000915-ST000917: Biomarkers of NAFLD progression: a lipidomics approach to an epidemic (LIPID MAPS) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4340319/
ST000260: Study on DJ-1 KO mice (Parkinson’s model) (National Institute on Aging)
Volcano Plot Analysis interface on MW
REPORTS:
(a) Volcano plot
(b) Bubble plot of metabolite classes
(c) Metabolite class enrichment graphs
(d) Barplot of significant metabolites by class
(e) Table of metabolite classes with mean p-values, fold-change
(f) Table of individual metabolites with p-values, fold-change, metabolite classification
Volcano plot: pairwise comparison of 2 experimental conditions
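For a pairwise comparison, each metabolite contributes one point to a volcano plot: log2 fold change on one axis and -log10 of a p-value on the other. The sketch below is standard-library only and substitutes an exact permutation test on the difference of means for whatever test the MW tool applies; the intensities, cutoffs and function names are made up for illustration.

```python
from itertools import combinations
from math import log2, log10
from statistics import mean


def permutation_p_value(group1, group2):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    observed = abs(mean(group2) - mean(group1))
    total = sum(pooled)
    hits = splits = 0
    # Enumerate every way of relabelling n1 of the pooled values as group 1.
    for indices in combinations(range(n1 + n2), n1):
        sum1 = sum(pooled[i] for i in indices)
        diff = abs((total - sum1) / n2 - sum1 / n1)
        splits += 1
        if diff >= observed - 1e-12:  # small tolerance for floating-point noise
            hits += 1
    return hits / splits


def volcano_point(group1, group2):
    """Coordinates of one metabolite; assumes strictly positive intensities."""
    fold_change = mean(group2) / mean(group1)  # group2/group1
    p_value = permutation_p_value(group1, group2)
    return {
        "fold_change": fold_change,
        "p_value": p_value,
        "log2_fc": log2(fold_change),    # horizontal axis
        "neg_log10_p": -log10(p_value),  # vertical axis
    }


def is_significant(point, p_cutoff=0.05, fc_cutoff=1.5):
    """Apply both cutoffs; the default values are arbitrary examples."""
    return point["p_value"] < p_cutoff and abs(point["log2_fc"]) >= log2(fc_cutoff)


control = [1.0, 1.2, 0.9, 1.1]  # made-up intensities, condition 1
treated = [2.0, 2.4, 1.8, 2.2]  # made-up intensities, condition 2
point = volcano_point(control, treated)
print(point, is_significant(point))
```

Exhaustive enumeration is only practical for small groups; with larger groups a t-test or a randomly sampled permutation test would be the usual replacement.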
ST000075: Comparing Diabetic and control mice
P-value on y axis
Classes ordered by classification index on x axis
Size of colored circles represents # of (significant) metabolites per class with p-value and fold change exceeding selected cutoff values
Size of gray circles represents # of all reported metabolites per class
Color of circles represents fold change value: red: group2/group1 > 1 (upregulated); green: group2/group1 < 1 (downregulated)
Mouse over bubble to view # of metabolites per class
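The quantities encoded in the bubble plot (gray circle size, colored circle size, color) can be derived from a per-metabolite table with a simple aggregation. In this sketch the class mean is a geometric mean of fold changes over the significant metabolites, which is one possible reading of "mean fold change" and not necessarily what MW computes; the rows and cutoffs are made up.

```python
from math import log2
from statistics import mean


def summarize_classes(rows, p_cutoff=0.05, fc_cutoff=1.5):
    """rows: iterable of (metabolite_class, fold_change, p_value) tuples."""
    by_class = {}
    for metabolite_class, fold_change, p_value in rows:
        by_class.setdefault(metabolite_class, []).append((fold_change, p_value))

    summary = {}
    for metabolite_class, values in by_class.items():
        significant = [
            (fc, p) for fc, p in values
            if p < p_cutoff and abs(log2(fc)) >= log2(fc_cutoff)
        ]
        entry = {
            "n_total": len(values),             # size of the gray circle
            "n_significant": len(significant),  # size of the colored circle
        }
        if significant:
            mean_log2 = mean(log2(fc) for fc, _ in significant)
            entry["mean_fold_change"] = 2 ** mean_log2
            entry["mean_p_value"] = mean(p for _, p in significant)
            # red: up in group2, green: down in group2, yellow: balanced
            entry["color"] = "red" if mean_log2 > 0 else "green" if mean_log2 < 0 else "yellow"
        summary[metabolite_class] = entry
    return summary


# Made-up rows: class, fold change (group2/group1), p-value.
rows = [
    ("DAG", 2.0, 0.01),
    ("DAG", 4.0, 0.02),
    ("DAG", 1.1, 0.60),
    ("TAG", 2.0, 0.01),
    ("TAG", 0.5, 0.03),
]
summary = summarize_classes(rows)
print(summary)
```

With these numbers the TAG class contains one upregulated and one downregulated metabolite, so its mean fold change is 1 even though both members pass the cutoffs.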
ST000608: Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies
Serum data: Diabetes vs control
Notice the difference in color for DAG and TAG (similar p-values)
Classes may contain both up- and down-regulated metabolites, i.e. mean fold-change per class may be close to 1 (more yellow)
In contrast to TAG, all significant DAGs are upregulated
ST000899: Crohn’s disease vs controls: Superimposed data for 4 analyses: C18/HILIC, +/- ion mode
LIPIDS NON-LIPIDS
Single analysis: C18, + mode; Crohn’s disease vs controls
ST000899: Crohn’s disease vs controls
Metabolomics Workbench: Volcano plot/class enrichment. Publication: pathway enrichment
ST000899: Crohn’s disease vs controls
Metabolomics Workbench: Volcano plot/bar graph by class. Publication: bar graph by pathway
The abundance of 240 biochemicals extracted from the brains of 4-month-old DJ-1 knockout and WT mice was measured. In addition to the polyols 1,5-anhydroglucitol and fructose, there was a significant decrease in dihydroxyacetone phosphate. A significant reduction in the levels of glucose, glucose 6-phosphate, and fructose 1,6-bisphosphate in the DJ-1 knockout brains was observed but only without multiple testing correction.
ST000260: Study on DJ-1 knockout mice (Parkinson’s model)
Activation of the polyol pathway of glucose metabolism in the DJ-1 knockout mouse brain.
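The caveat "only without multiple testing correction" matters when hundreds of metabolites are tested at once. As an illustration, here is a sketch of the Benjamini-Hochberg procedure, one common correction; the source does not state which method the study used, and the p-values below are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the raw p-values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # made-up p-values for four metabolites
adjusted = benjamini_hochberg(raw)
n_raw = sum(p < 0.05 for p in raw)
n_adjusted = sum(q < 0.05 for q in adjusted)
print(adjusted, n_raw, n_adjusted)
```

Here three of the four raw p-values fall below 0.05, but only one remains below 0.05 after adjustment, which is the same pattern the quoted abstract describes.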
ST000260: Study on DJ-1 KO mice (Parkinson’s model): Volcano plot analysis with RefMet classifications
ST000260: Study on DJ-1 mice (Parkinson’s model): Tabular display of Volcano plot analysis with RefMet classifications
Study ST000684: Macrophages from RAGE knockout mice: untreated vs addition of palmitate/oleate (U. Michigan)
RAGE: Receptor for Advanced Glycation End Products
LIPID MAPS NAFLD Study: Steatosis vs NASH in blood
LIPID MAPS NAFLD Study: Steatosis vs NASH in blood (ST000916)
Study ST000916: NAFLD Study: Steatosis vs NASH in blood: Tabular display of significant classes and individual metabolites
Comparison of Volcano plot and ANOVA on the same data (ST000075)
Volcano Plot
Colors represent mean fold change
ANOVA
Colors represent p-value (same as y-axis)
Summary: Univariate analysis linked to Metabolite Classification
Useful way to visualize changes in metabolite classes due to selected experimental conditions:
• Degree of statistical significance (p-value)
• Degree and direction of fold-change
• Number of significant detected metabolites per class
• Number of total detected metabolites per class
• Indexing and alignment of similar metabolite classes
IMPLEMENTATION:
• Mapping of reported metabolites in MW studies to a standardized nomenclature (RefMet)
• Linking and indexing of RefMet metabolites to a classification system
• Doesn’t require identification of exact structures: e.g. assignments from LCMS experiments such as PC(34:2), Cer(d36:1), Ile/Leu, Hexose etc. can still be classified
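The point that assignments such as PC(34:2), Cer(d36:1), Ile/Leu and Hexose can be classified without an exact structure can be illustrated with a small parser. This is a sketch, not RefMet's actual logic: the regular expression covers only simple sum-composition names, and the prefix and group tables are short illustrative excerpts.

```python
import re

# Matches sum-composition names such as PC(34:2) or Cer(d36:1):
# class abbreviation, optional d/t prefix, total carbons, total double bonds.
SPECIES_PATTERN = re.compile(r"^([A-Za-z]+)\(([dt]?)(\d+):(\d+)\)$")

# Illustrative lookup tables (not a full classification).
LIPID_CLASS_BY_PREFIX = {
    "PC": "Glycerophosphocholines",
    "TG": "Triradylglycerols",
    "Cer": "Ceramides",
}
NAMED_GROUPS = {
    "Ile/Leu": "Amino acids",
    "Hexose": "Monosaccharides",
}


def classify(name):
    """Return a class for a species-level name, or None if it is not recognized."""
    name = name.strip()
    if name in NAMED_GROUPS:
        return {"class": NAMED_GROUPS[name], "carbons": None, "double_bonds": None}
    match = SPECIES_PATTERN.match(name)
    if match and match.group(1) in LIPID_CLASS_BY_PREFIX:
        return {
            "class": LIPID_CLASS_BY_PREFIX[match.group(1)],
            "carbons": int(match.group(3)),
            "double_bonds": int(match.group(4)),
        }
    return None


for example in ["PC(34:2)", "Cer(d36:1)", "Ile/Leu", "Hexose", "feature_812"]:
    print(example, classify(example))
```

The class comes from the abbreviation alone, so the individual fatty acyl chains and double-bond positions never need to be resolved.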
Volcano plot analysis and ANOVA analysis for studies with named metabolites on the MW
Metabolomics Tools: Load and analyze your own dataset
Modular, portable suite of statistical tools for metabolomics analysis
R statistics-based approach
• Normalization and scaling
• Bar graphs and boxplots
• Univariate analysis
• Multivariate analysis
• Clustering and correlation
• Feature analysis
Ability to select and combine groups of experimental conditions (factors)
Applicable to targeted and untargeted datasets
Workflow enables classification of metabolite names via RefMet
Classified datasets are then amenable to class-specific and pathway-specific analysis
Portable metabolomics analysis toolbox design
Metabolomics Workbench
REST service obtains RefMet classification data
Portable analysis toolbox codebase
(R files, PHP, JavaScript) + configuration file
R statistics application
+ libraries
User interfaces Results
Metabolomics Tools: Load and analyze your own dataset
Statistical analysis tools on MW deliver powerful insights into differences among experimental groups
Data sharing with other institutions: MetabolomeXchange (http://metabolomexchange.org)
MetabolomeXchange is an outcome of the European Commission-funded COSMOS project, coordinated by EMBL-EBI to set and promote community standards among data providers. MetabolomeXchange is now an independent consortium that will continue its work beyond the end of COSMOS.
Currently 681 MW studies
Data sharing with other institutions: EBI Omics Discovery Index (http://www.ebi.ac.uk/Tools/omicsdi/)
Omics Discovery Index is an integrated and open source platform facilitating the access and dissemination of omics datasets. It provides a unique infrastructure to integrate datasets coming from multiple omics studies.
Data sharing with other institutions: PubChem: Sharing the MW structure database
https://www.ncbi.nlm.nih.gov/pcsubstance/?term=MetabolomicsWorkbench
@MetabolomicsWB Twitter account
Twitter Overview
• Started in June 2017 to engage academics and industry worldwide on social media
• A useful method to disseminate information regarding Metabolomics Workbench updates as well as Consortium events and opportunities
• The account has grown to 300 followers in ~10 months
• “Live-tweeting” the 2017 UAB Metabolomics Symposium and the 2017 Fall NIH Metabolomics Consortium meeting at UC Davis led to worthwhile engagement of academics in the field both in the U.S. and worldwide
Example Twitter engagement metrics from January 2018 – top tweets
@MetabolomicsWB Twitter account
Twitter Engagement Approaches
Twitter allows individual tweets to be tagged with a hashtag, allowing for quick retrieval of tweets associated with a given topic. The hashtag selected for the 5th Annual UAB Metabolomics Workshop was #UABMetabolomics2017, which let Twitter users quickly find tweets associated with the topic and interact with all tweets produced by @MetabolomicsWB for the speaker and workshop sessions.
Tweet threading for message continuity of each speaker
All tweets tagged #UABMetabolomics2017
@MetabolomicsWB Twitter account
Reach Metrics
Representative speaker session tweet
• Figures from slides showing key concepts
• Multiple slides per tweet
• Summary of content combined with images
Breadth of impression
Online Engagement
Tweeting at the event Non-attendee engagement
@MetabolomicsWB Twitter account
Post-Workshop
Following the conclusion of the UAB Metabolomics workshop, UAB began compiling the speaker and workshop PDFs, editing video presentations, and creating presentation slide decks overlaid with speaker audio for upload to the UAB Metabolomics website. UAB and DRCC coordinated to share the links to each of these presentations so that researchers could download the material.
Post-event recap
• Sharing of audio and video (when recorded) from key speaker sessions
• Many presentations and workshop classes available for download as PDF; shared via Twitter for online users
• Awareness of Metabolomics Workbench due to workshop tweeting has increased followers
Find presentations at www.uab.edu/proteomics/metabolomics/workshop/workshop_july_2017.php
Journal Partnership
• DRCC is in contact with multiple journals regarding establishing the Metabolomics Workbench as a recommended repository
• Two MDPI journals, Data and Metabolites, now recommend Metabolomics Workbench as a data repository
• The PLOS family of journals recommends the Metabolomics Workbench
• New subpage on the Workbench to highlight this partnership with links to the journals’ repository recommendations
DRCC Summary / ongoing efforts
Data repository currently contains 894 studies from over 150 different institutions
Online data submission process has numerous enhancements.
Updated the RefMet database of recommended metabolites to >11,000 records. Critical for comparative analysis and multi-omics efforts
New online analysis tools leverage metabolite classification to provide study overviews and gain rapid insights into differences among experimental groups
Outreach efforts to provide metabolite class-specific portals to the Metabolomics Workbench for the benefit of specialists in lipidomics, nucleic acids, sugars, alkaloids, etc.
Modular, portable suite of statistical tools for metabolomics analysis of user-uploaded (processed) data
Active collaborations with EBI on MetabolomeXchange and Omics Discovery Index websites. Additional exposure for MW studies
Analysis type:
NMR, 12%
ESI-MS, 67%
GC-MS, 21%