Metabolomic Data Analysis Case Studies

Part of a lecture series for the International Summer Course in Metabolomics 2013 (http://metabolomics.ucdavis.edu/courses-and-seminars/courses). More material and information are available here (http://imdevsoftware.wordpress.com/2013/09/08/sessions-in-metabolomics-2013/).

Transcript of Metabolomic Data Analysis Case Studies

Page 1: Metabolomic Data Analysis Case Studies

Metabolomic Data Analysis Case Studies

Dmitry Grapov, PhD

Case Studies

Page 2: Metabolomic Data Analysis Case Studies

Case Studies

1. Data Exploration and Analysis Planning
• Lung Cancer

2. Multifactorial Design
• Mouse Cerebellum

3. Time Course
• OGTT Metabolomics

Page 3: Metabolomic Data Analysis Case Studies

Analysis Planning

DOD Lung Cancer Plasma (CARET)

Summary
• Analysis of plasma primary metabolites to identify circulating markers related to lung cancer histology type.

Methods
• Exploratory data analysis using principal components analysis (PCA)
• Analysis of covariance (ANCOVA)
• Orthogonal partial least squares discriminant analysis (OPLS-DA)
• Hierarchical cluster analysis (HCA) and multidimensional scaling (MDS)

Page 4: Metabolomic Data Analysis Case Studies

Lung Cancer: Exploratory Analysis

Purpose
• Overview data variance structure

Methods
• Singular value decomposition (SVD) on autoscaled data

PC1 and 2 (14% variance explained) display 2 clusters of points

Cluster structure could not be explained by histology or any other metadata

Cluster structure is best explained by instrumental acquisition date

Black - 110629 to 110701
Red - 110702 to 110705
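The exploratory step above (SVD on autoscaled data, then inspecting PC1 and PC2 for clusters) can be sketched as follows. This is a minimal illustration on synthetic data, not the author's code: the function names `autoscale` and `pca_svd`, the data dimensions, and the simulated two-batch shift are all assumptions made for the example.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_svd(X, n_components=2):
    """PCA via SVD of the autoscaled matrix.
    Returns sample scores, variable loadings and the fraction of
    variance explained by each retained component."""
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic stand-in data (assumption): 40 samples x 30 features, with a
# shift added to the second half of the samples to mimic two acquisition batches.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
batch = np.repeat([0, 1], 20)
X[batch == 1, :10] += 3.0

scores, loadings, explained = pca_svd(X)
print("fraction of variance explained by PC1, PC2:", np.round(explained, 3))
```

Coloring a scatter plot of `scores[:, 0]` against `scores[:, 1]` by each metadata field in turn (histology, acquisition date, and so on) is one way to find which variable explains a cluster structure.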

Page 5: Metabolomic Data Analysis Case Studies

Lung Cancer: Analysis Planning

Purpose
• Identify significant changes in metabolites while adjusting for the noted batch effect and the gender and smoking status covariates.

Methods
• Shifted (natural) logarithm transformed data
• ANCOVA: batch + gender + smoking
• False discovery rate (FDR) correction and estimation

PCA used to overview covariate adjusted data structure

Cluster structure in the adjusted data suggests that there is another unexplained covariate

OPLS-DA was used to evaluate covariate adjustments and hypothesis testing strategies

Modeling histology (control in green)
Modeling control/cancer and histology
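One common way to obtain covariate-adjusted data like that overviewed above is to regress each metabolite on the covariates and keep the residuals. The sketch below assumes that approach; the slides do not state how the adjustment was computed, and the shift constant, function names, and synthetic data are illustrative assumptions.

```python
import numpy as np

def shifted_log(X, shift=1.0):
    """Natural log after adding a constant so zeros stay finite.
    The shift value of 1.0 is an assumption for this example."""
    return np.log(np.asarray(X, dtype=float) + shift)

def dummy_code(labels):
    """One indicator column per level, dropping the first level as reference."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return np.column_stack([(labels == lv).astype(float) for lv in levels[1:]])

def adjust_for_covariates(Y, covariates):
    """Remove the least-squares fit of the covariates from every column of Y,
    then add back each feature's overall mean."""
    n = Y.shape[0]
    D = np.column_stack([np.ones(n)] + [dummy_code(c) for c in covariates])
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return (Y - D @ beta) + Y.mean(axis=0)

# Synthetic stand-in data (assumption): 60 samples, 5 metabolites,
# with a multiplicative effect in the second batch.
rng = np.random.default_rng(1)
n = 60
batch = np.repeat(["b1", "b2"], n // 2)
gender = np.tile(["F", "M"], n // 2)
smoking = rng.choice(["current", "former"], size=n)
X = rng.lognormal(mean=2.0, sigma=0.3, size=(n, 5))
X[batch == "b2"] *= 1.8

Y = shifted_log(X)
Y_adj = adjust_for_covariates(Y, [batch, gender, smoking])

def batch_gap(M):
    """Absolute difference in feature means between the two batches."""
    return np.abs(M[batch == "b1"].mean(axis=0) - M[batch == "b2"].mean(axis=0))

print("batch mean difference before:", np.round(batch_gap(Y), 3))
print("batch mean difference after: ", np.round(batch_gap(Y_adj), 3))
```

Running PCA on `Y_adj` then shows whether any cluster structure remains after the known covariates are removed.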

Page 6: Metabolomic Data Analysis Case Studies

Lung Cancer: ANCOVA

Summary
• The optimal testing strategy was identified as:
– Using covariate-adjusted data (~ batch + gender + smoking) to test for differences between control and cancer (adenocarcinoma, NSCLC and squamous)

OPLS-DA overview of optimized modeling strategy

Identified 24 (8%) significantly changed species (3 after FDR correction)
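The drop from 24 to 3 significant species illustrates how much a multiple-testing correction can change the picture. The slides do not name the FDR procedure used; the sketch below assumes the Benjamini-Hochberg step-up adjustment, and the p-values are illustrative only.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # scale each sorted p-value by m / rank
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Illustrative p-values (not from the study)
p = np.array([0.01, 0.04, 0.03, 0.005])
q = benjamini_hochberg(p)
print("raw p:     ", p)
print("adjusted p:", q)
```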

Page 7: Metabolomic Data Analysis Case Studies

Lung Cancer: Correlation Analysis

Purpose
• Identify relationships between known and unknown metabolic features.

Methods
• Hierarchical cluster analysis (Euclidean distances from Spearman's correlations, linked by Ward's method)

Summary
• Top features could be grouped into 8 major correlated clusters

Top changed unknown metabolites could be linked to named species
• 223566 ∝ tryptophan
• 225405 ∝ 1/beta-alanine
• 274174 ∝ methionine, glucuronic acid
• 228377 ∝ tryptophan
• 362112 ∝ tryptophan
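The clustering recipe named in the methods (Spearman correlations, Euclidean distances between the correlation profiles, Ward linkage) can be sketched with SciPy as below. The helper name `correlation_clusters`, the synthetic data, and the choice of three clusters are assumptions for the example; the study itself reported 8 clusters.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def correlation_clusters(X, n_clusters):
    """Cluster the columns (features) of X.
    Needs more than two features so that spearmanr returns a matrix."""
    rho, _ = spearmanr(X)                       # feature x feature correlations
    dist = pdist(rho, metric="euclidean")       # distances between correlation profiles
    tree = linkage(dist, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return rho, labels

# Synthetic stand-in data (assumption): 3 latent signals, each driving 4 features.
rng = np.random.default_rng(2)
latent = rng.normal(size=(50, 3))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(50, 12))

rho, labels = correlation_clusters(X, n_clusters=3)
print("cluster label per feature:", labels)
```

Unknown features that fall in the same cluster as a named metabolite are candidates for being related to it, which is how the unknown-to-tryptophan links above can be read.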

Page 8: Metabolomic Data Analysis Case Studies

Lung Cancer

Conclusions
• Metabolic data contained batch effects, which could be in part explained by data acquisition date
• Univariate analyses were limited by the effects of outliers
• Multivariate modeling was used to identify 64 features (21%) which best explain differences in plasma metabolites from patients with or without lung cancer
• Hydroxylamine, aspartic acid, and tryptophan displayed patterns of change consistent with differences in patient cancer histology
• Correlation analysis was used to link many significant changes in unknowns to tryptophan

Page 9: Metabolomic Data Analysis Case Studies

Multifactorial Design

Mouse Cerebellum Metabolomics

Summary
• Analysis of mice carrying a mutation in ERCC6, the gene underlying Cockayne syndrome B (CSB), a rare autosomal recessive congenital disorder associated with premature aging. Mutant animals display altered glycolytic and mitochondrial metabolism, which benefits from a high-fat diet.

Study Design
• 2 genotypes (WT, CSB; n=20)
• 4 diets per genotype (SD, Resv, CR, HFD; n=5)

Analysis
• Principal components analysis (PCA)
• Two-way analysis of variance (ANOVA)
• Orthogonal partial least squares discriminant analysis (OPLS-DA)
• Network mapping

Page 10: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: PCA

Method
• Conducted on autoscaled data using SVD.

Findings
• Identified 6 possible outliers, all of which are in the WT genotype

Page 11: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Outliers

Methods
• Use PLS-DA to determine whether the outlier samples persist when maximizing the difference between WT and CSB animals.

Findings
• Noted outliers in WT should be removed or analyzed separately

PCA

PLS-DA
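To make the PCA versus PLS-DA comparison concrete, the sketch below fits a small PLS model to a dummy-coded genotype and returns sample scores that can be inspected for outliers. It is a plain NIPALS PLS1 on autoscaled data, offered as an assumption about how such a check could be set up rather than the software used in the study; the data are synthetic.

```python
import numpy as np

def pls1(X, y, n_components=2):
    """NIPALS PLS1 for a single response.
    Returns the X scores (T) and weight vectors (W)."""
    Xr = np.asarray(X, dtype=float)
    Xr = (Xr - Xr.mean(axis=0)) / Xr.std(axis=0, ddof=1)   # autoscale
    yr = np.asarray(y, dtype=float)
    yr = yr - yr.mean()
    T, W = [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # weights: direction of maximal covariance with y
        t = Xr @ w                       # sample scores
        p = Xr.T @ t / (t @ t)           # X loadings
        q = (yr @ t) / (t @ t)           # y loading
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - q * t                  # deflate y
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)

# Synthetic stand-in data (assumption): 20 WT and 20 CSB samples, 25 features.
rng = np.random.default_rng(3)
genotype = np.repeat([0, 1], 20)         # 0 = WT, 1 = CSB (dummy coding)
X = rng.normal(size=(40, 25))
X[genotype == 1, :5] += 1.5

T, W = pls1(X, genotype)
print("mean LV1 score, WT: ", round(T[genotype == 0, 0].mean(), 3))
print("mean LV1 score, CSB:", round(T[genotype == 1, 0].mean(), 3))
```

Samples that sit far from their own class in the score space of a supervised model, as well as in PCA, are stronger outlier candidates than samples that stand out in PCA alone.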

Page 12: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: ANOVA

Methods
• Shifted log transformed data
• Two-way ANOVA (genotype, diet)

Findings
• Identification of significant changes in metabolites due to genotype, diet (treatment) and the interaction between genotype and diet

genotype effect treatment effect interaction effect
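The three effects above come from partitioning each metabolite's variance into genotype, diet, and interaction terms. A minimal sketch for one feature in a balanced design is shown below; the function name, the synthetic values, and the simulated genotype effect are assumptions, with group sizes chosen to mirror the 2 x 4 design with n=5 per cell.

```python
import numpy as np
from scipy import stats

def two_way_anova(y, a, b):
    """Balanced two-way ANOVA with interaction for a single feature.
    Returns p-values for factor a, factor b and the a:b interaction."""
    y = np.asarray(y, dtype=float)
    a, b = np.asarray(a), np.asarray(b)
    a_lv, b_lv = np.unique(a), np.unique(b)
    # table of cell means (levels of a in rows, levels of b in columns)
    cells = np.array([[y[(a == i) & (b == j)].mean() for j in b_lv] for i in a_lv])
    r = y.size // (a_lv.size * b_lv.size)        # replicates per cell (balanced design assumed)
    grand = y.mean()
    a_mean, b_mean = cells.mean(axis=1), cells.mean(axis=0)
    ss_a = r * b_lv.size * np.sum((a_mean - grand) ** 2)
    ss_b = r * a_lv.size * np.sum((b_mean - grand) ** 2)
    ss_ab = r * np.sum((cells - a_mean[:, None] - b_mean[None, :] + grand) ** 2)
    ss_err = np.sum((y - grand) ** 2) - ss_a - ss_b - ss_ab
    df_a, df_b = a_lv.size - 1, b_lv.size - 1
    df_err = y.size - a_lv.size * b_lv.size
    ms_err = ss_err / df_err

    def p_value(ss, df):
        return float(stats.f.sf((ss / df) / ms_err, df, df_err))

    return {"a": p_value(ss_a, df_a),
            "b": p_value(ss_b, df_b),
            "interaction": p_value(ss_ab, df_a * df_b)}

# Synthetic stand-in for one metabolite (assumption): 2 genotypes x 4 diets x 5 animals.
rng = np.random.default_rng(4)
genotype = np.repeat(["WT", "CSB"], 20)
diet = np.tile(np.repeat(["SD", "Resv", "CR", "HFD"], 5), 2)
y = rng.normal(size=40)
y[genotype == "CSB"] += 2.0                      # simulated genotype effect

pvals = two_way_anova(y, genotype, diet)
print(pvals)
```

Applied to every metabolite, the three p-value lists would each need their own multiple-testing correction before being compared.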

Page 13: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Multivariate Modeling

Methods
• Autoscaled data
• Classification of sample genotype (OSC-PLS-DA/OPLS-DA)

OSC-PLS-DA/OPLS-DA Validation

Page 14: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Multivariate Modeling

Methods
• Autoscaled data
• Classification of sample genotype and diet (OPLS-DA)
• Evaluation of Y construction (separate and combined)

multiple Y single Y
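The slides do not spell out how the two Y constructions were built. One plausible reading, shown here purely as an assumption, is that "multiple Y" uses one response column per factor while "single Y" uses one column encoding the combined genotype/diet class.

```python
import numpy as np

# Design labels mirroring the study layout: 2 genotypes x 4 diets x 5 animals
genotype = np.repeat(["WT", "CSB"], 20)
diet = np.tile(np.repeat(["SD", "Resv", "CR", "HFD"], 5), 2)

def encode(labels):
    """Map class labels to integer codes (codes follow the sorted label order)."""
    levels, codes = np.unique(labels, return_inverse=True)
    return levels, codes

# "Multiple Y" (assumed meaning): one response column per factor
_, genotype_codes = encode(genotype)
_, diet_codes = encode(diet)
Y_multiple = np.column_stack([genotype_codes, diet_codes])

# "Single Y" (assumed meaning): one response column for the combined class
combined = np.char.add(np.char.add(genotype, "_"), diet)
combined_levels, Y_single = encode(combined)

print("multiple Y shape:", Y_multiple.shape)
print("single Y classes:", list(combined_levels))
```

Integer codes impose an arbitrary ordering on diets, so a model fitted to either Y should be compared by validation performance rather than taken at face value.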

Page 15: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Multivariate Modeling

Methods
• Autoscaled data
• Classification of diet (treatment) effects independently in each genotype

WT CSB

Page 16: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Network Analysis

Methods
• Generate biochemical and chemical similarity network
• Map statistical and OPLS-DA model results to network
• Analyze
– Genotype network
– Treatment networks in WT and CSB separately

Page 17: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: Genotype Network

Page 18: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: WT Treatment Network

Page 19: Metabolomic Data Analysis Case Studies

Mouse Cerebellum: CSB Treatment Network

Page 20: Metabolomic Data Analysis Case Studies

Mouse Cerebellum

Conclusions

Major differences between CSB and WT:
• Elevation of 2-hydroxyglutaric acid in CSB
– 2-hydroxyglutaric aciduria is either autosomal recessive or autosomal dominant
• Perturbations in methionine and (potentially) single-carbon metabolism
– Increases in the related species methionine, homoserine and serine, and a decrease in adenosine-5'-phosphate, may point to decreased S-adenosyl methionine (SAM-e) synthesis. Reduction in SAM-e could have detrimental effects on single-carbon metabolism and methylation reactions, which through a systemic reduction in choline would impact phosphatidylcholine synthesis.

• Independent of genotype, treatment effects can be classified on a continuum of metabolic change from CR > HFD > Resv > SD.
– Treatment-related changes in citrulline were modified by genotype (strong genotype/treatment interaction).

• Similar changes due to treatment in both genotypes (e.g. 1,5-anhydroglucitol) may be an outcome of diet composition and not biology.

Page 21: Metabolomic Data Analysis Case Studies

Time Course

Oral Glucose Tolerance Test Metabolomics

Summary
• Analysis of changes in plasma primary metabolites during an oral glucose tolerance test (OGTT) before and after a 14-week diet and exercise intervention.

Study Design
• Overweight women (12-15, obese sedentary, glucose 100-128 mg/dL)
– Pre and post intervention
• Clinical panel: insulin, glucose, lipids
• Primary metabolites at 0, 30, 60, 90, 120 minutes

Analysis
• Principal components analysis (PCA)
• Two-way analysis of variance (ANOVA)
• Orthogonal partial least squares discriminant analysis (OPLS-DA)
• Network mapping

Page 22: Metabolomic Data Analysis Case Studies

OGTT: Data Properties

Excursion

Baseline and Area Under the Curve (AUC)

Page 23: Metabolomic Data Analysis Case Studies

Time Course: Options

Baseline adjusted vs AUC
Raw (top) vs baseline adjusted (bottom)
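The two summaries compared above can be computed directly from a time profile. The sketch below uses the study's sampling times (0 to 120 minutes) with a made-up profile; the function names and the values are assumptions for illustration.

```python
import numpy as np

times = np.array([0, 30, 60, 90, 120], dtype=float)   # minutes, as in the study design

def baseline_adjust(profile):
    """Subtract the t = 0 value from every time point (last axis is time)."""
    profile = np.asarray(profile, dtype=float)
    return profile - profile[..., :1]

def auc_trapezoid(profile, t=times):
    """Area under the curve by the trapezoidal rule along the last axis."""
    profile = np.asarray(profile, dtype=float)
    widths = np.diff(t)
    heights = (profile[..., 1:] + profile[..., :-1]) / 2.0
    return np.sum(heights * widths, axis=-1)

# Made-up glucose-like profile for one subject (illustrative values only)
raw = np.array([100.0, 160.0, 140.0, 120.0, 105.0])
adjusted = baseline_adjust(raw)

print("raw AUC:              ", auc_trapezoid(raw))
print("baseline-adjusted AUC:", auc_trapezoid(adjusted))
```

The baseline-adjusted AUC equals the raw AUC minus the baseline value times the test duration, so it isolates the excursion from differences in fasting level.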

Page 24: Metabolomic Data Analysis Case Studies

OGTT: Data Analysis

• Identification of OGTT effects
– Significant metabolomic excursions (one-sample t-test on AUC)
  • pre, post or both
– Intervention-adjusted PLS model
– OGTT biochemical/chemical similarity network

• Identification of treatment effects
– Univariate statistics
  • Two-way ANOVA (time and intervention)
  • Mixed effects modeling (intervention as the main effect and individual subjects as random effects)
– PLS-DA modeling and feature selection of changes in
  • Baseline (t = 0)
  • AUC
  • Combined baseline and AUC
– Analysis of correlations
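A one-sample t-test on AUC asks whether a metabolite's excursion differs from zero across subjects. The sketch below assumes the AUC values are baseline adjusted, so that zero means "no excursion"; the numbers are hypothetical and stand in for one metabolite measured in 12 subjects.

```python
import numpy as np
from scipy import stats

# Hypothetical baseline-adjusted AUC values for one metabolite (12 subjects)
auc = np.array([3675.0, 2900.0, 4100.0, 3300.0, 2500.0, 3900.0,
                4400.0, 3100.0, 2800.0, 3600.0, 3000.0, 4200.0])

# Test whether the mean excursion differs from zero
result = stats.ttest_1samp(auc, popmean=0.0)

# The same statistic written out: mean / (sd / sqrt(n))
t_manual = auc.mean() / (auc.std(ddof=1) / np.sqrt(auc.size))

print("t statistic:", result.statistic)
print("p-value:    ", result.pvalue)
```

Running the test separately on pre-intervention and post-intervention AUCs gives the "pre, post or both" classification listed above.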

Page 25: Metabolomic Data Analysis Case Studies

OGTT: effects on primary metabolism

PCA

PLS-DA (intervention-adjusted data, modeling time)

Page 26: Metabolomic Data Analysis Case Studies

OGTT: effects network

Page 27: Metabolomic Data Analysis Case Studies

OGTT: Treatment Effects

PLS-DA

Page 28: Metabolomic Data Analysis Case Studies

OGTT: Treatment Effects

Learning from the positions of the sample scores

Page 29: Metabolomic Data Analysis Case Studies

OGTT: Treatment Effects

Feature Selection on Loadings

Variable Loadings

Page 30: Metabolomic Data Analysis Case Studies

OGTT: Linking biology with our experiment

Page 31: Metabolomic Data Analysis Case Studies

OGTT: Analysis of Correlations

Page 32: Metabolomic Data Analysis Case Studies

Conclusion

• Each data analysis is unique
• Which method “should” be used is defined by how the data “looks” and the goal of the analysis
• Different analysis techniques are used to get independent perspectives of the data
• Combination of similar evidence from different techniques is used to define the robust explanation of the experiment