Methylation and Expression data integration

Making sense of Methylation & Expression data in Cord blood and Placenta tissues. Sahir Rai Bhatnagar, Greenwood Group Lab Meeting, March 5, 2015.

Transcript of Methylation and Expression data integration

Page 1: Methylation and Expression data integration

Introduction, Methods, Results, Future Directions

Making sense of Methylation & Expression data in Cord blood and Placenta tissues

Sahir Rai Bhatnagar

March 5, 2015

Greenwood Group Lab Meeting

Page 2: Methylation and Expression data integration

Outline

1. Talk about the data I’m working with
2. Some preliminary results
3. A proposition

Page 3: Methylation and Expression data integration

Motivation, The data, Visual Representations

Motivation

1 in 4 adult Canadians and 1 in 10 children are clinically obese.

6 million Canadians are at higher risk for type 2 diabetes, high blood pressure, and cardiovascular disease.

Overweight- and obesity-related health care costs are ≈ $6 billion, or 4.1% of Canada’s total health care budget.

Events during pregnancy are suspected to play a role in childhood obesity → the mechanisms involved are not known.

Children born to women who had a pregnancy affected by gestational diabetes mellitus are more likely to be overweight or obese.

Evidence suggests that epigenetic factors are an important piece of the puzzle.

Page 4: Methylation and Expression data integration

Research Question(s)

Objectives

1. Identify epigenetic marks observed at birth that help predict childhood obesity
2. Determine if these epigenetic changes are associated with specific maternal factors (gestational diabetes (GD), weight gain during pregnancy)
3. Assess the impact of these epigenetic changes on gene expression levels

Page 7: Methylation and Expression data integration

[Diagram: overview of the data; the links between the blocks are marked "??"]

Expression: p = 46,889
Methylation (Illumina 450k): p = 375,561
Placenta: n = 45
Cord blood: n = 45
Phenotype:
- Gestational Diabetes (binary): n = 45, GD = 29
- 7 continuous fat measures, child age = 5: n = 23, GD = 16

Page 14: Methylation and Expression data integration

Percent Fat and Gestational Age

[Figure: two panels by case group (NGT vs DG). Left panel: percentFAT (y-axis ticks 5 to 20). Right panel: Age_gestationnel, gestational age (y-axis ticks 38 to 41).]

Figure 1: Distribution of covariates

Page 15: Methylation and Expression data integration

Child age and Zscore BMI

[Figure: two panels by case group (NGT vs DG). Left panel: AgeMois, child age in months (y-axis ticks 60 to 90). Right panel: ZScoreBMI (y-axis ticks −1 to 0).]

Figure 2: Distribution of covariates

Page 16: Methylation and Expression data integration

[Figure: values of the four skinfold measures (plis_adipeux: Tricep, Bicep, Sous_Scapulaire, Iliaque) by case group (NGT vs DG); y-axis "value", ticks 5 to 10.]

Figure 3: Distribution of skinfold measures (plis adipeux)

Page 17: Methylation and Expression data integration

Mean methylation values for each probe by tissue

[Figure: density plots of mean methylation (x-axis 0.0 to 1.0, y-axis Density 0 to 3), one panel for cord and one for placenta.]

Figure 4: Density plot of mean methylation values for each probe by tissue

Page 18: Methylation and Expression data integration

Adjusting for Cell type mixtures, Regression forms, q-value

Motivation

[Diagram with three blocks: Methylation in Cord blood & Placenta, Gestational Diabetes, and Cell type mixture; one link is marked "??"]

Page 20: Methylation and Expression data integration

Motivation

We adjust for cell type mixture using surrogate variable analysis (SVA).

Why SVA? See Kevin for details.

Page 23: Methylation and Expression data integration

Methylation (M) and Expression (E) for Cord blood and Placenta

M or E ∼ Gestational Diabetes + Gestational Age + Cell Mixture    (1)

M or E ∼ Body Fat Measures + Gestational Age + Sex and Age of child + Cell Mixture    (2)

Note: the 7 body fat measures were modelled separately.
Note: n = 45 for model (1), n = 23 for model (2).
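One way to read models (1) and (2) is as a separate linear regression for each CpG site or expression probe. The slides do not state the exact model form, so the block below is only a sketch under that assumption. All symbols are notation introduced here, not taken from the slides: y_{ij} is the methylation or expression value of feature j in sample i, GA is gestational age, Fat is one of the 7 body fat measures, and s_{ik} are K surrogate variables from SVA standing in for "Cell Mixture".

```latex
% Assumed per-feature linear models; notation is illustrative only.
\begin{align}
y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{GD}_i + \beta_{2j}\,\mathrm{GA}_i
          + \sum_{k=1}^{K} \gamma_{kj}\, s_{ik} + \varepsilon_{ij} \tag{1} \\
y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{Fat}_i + \beta_{2j}\,\mathrm{GA}_i
          + \beta_{3j}\,\mathrm{Sex}_i + \beta_{4j}\,\mathrm{Age}_i
          + \sum_{k=1}^{K} \gamma_{kj}\, s_{ik} + \varepsilon_{ij} \tag{2}
\end{align}
```

Under this reading, the p-value reported for feature j would come from a test of β_{1j} = 0, repeated once per feature and, for model (2), once per body fat measure.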

Page 24: Methylation and Expression data integration

Reporting Evidence

Evidence is reported in terms of the p-value and the q-value.

The q-value extends the False Discovery Rate (FDR) by giving each feature its own individual measure of significance.

The q-value for a CpG site is the expected proportion of false positives incurred when calling that site significant.

Whereas the p-value is a measure of significance in terms of the false positive rate, the q-value is a measure in terms of the FDR.

Example: if 10 CpG sites with q-values ≤ 5% are called significant in an EWAS, we expect about 5% of them (0.5 of the 10 sites on average) to be false positives.

The q-value methodology estimates the proportion of features that are truly null (from the given p-values), denoted by π0, whereas the Benjamini-Hochberg FDR methodology in effect assumes π0 = 1.

We calculated the q-values using the qvalue package in R.
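The relationship between p-values, π0 and q-values described above can be illustrated with a small sketch. This is not the qvalue package and not the code used for the results: it is a simplified pure-Python illustration in which π0 is estimated at a single fixed λ = 0.5 (an assumption made here for brevity; the qvalue package uses a more elaborate π0 estimate). The function names `estimate_pi0` and `qvalues` are hypothetical.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from p-values above `lam`.

    Simplification: a single fixed lambda is used (assumption of this sketch).
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def qvalues(pvalues, pi0=1.0):
    """Return one q-value per p-value, in the input order.

    With pi0 = 1 this gives the Benjamini-Hochberg adjusted p-values;
    with an estimated pi0 < 1 every value is scaled down by that factor.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # q-values are monotone in the p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative p-values only (not from the study).
example_p = [0.001, 0.8, 0.04, 0.3]
bh_adjusted = qvalues(example_p)                      # assumes pi0 = 1
storey_q = qvalues(example_p, estimate_pi0(example_p))
```

Because the estimated π0 is at most 1, the q-values from the second call are never larger than the Benjamini-Hochberg values from the first, which is why a q-value threshold can call more sites significant than the corresponding BH threshold.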

Page 25: Methylation and Expression data integration

Methylation ∼ Gestational Diabetes, Methylation ∼ Body Fat measures, Gene Expression ∼ Body Fat measures

Cord blood and Placenta

Table 1: The number of differentially methylated CpG sites in cord blood and placenta DNA samples from newborns with or without exposure to gestational diabetes mellitus, for unadjusted, age-adjusted, and age- and cell-mixture-adjusted models at different p- and q-value thresholds.

Threshold             1 × 10^-3    0.01         0.025        0.05         0.10
Tissue      Model     p       q    p       q    p       q    p       q    p       q
Cord blood  Unadj     389     0    3,961   0    10,321  0    21,620  0    44,988  4
Cord blood  Age       253     1    2,648   1    6,904   1    14,457  1    31,250  1
Cord blood  Cell-adj  575     1    4,150   1    9,531   3    18,365  5    36,100  9
Placenta    Unadj     260     0    2,520   0    6,437   0    13,445  0    28,571  0
Placenta    Age       259     0    2,492   0    6,493   0    13,425  0    28,692  0
Placenta    Cell-adj  451     0    3,368   1    7,997   2    15,919  6    32,333  7

Page 28: Methylation and Expression data integration

Cord blood

Table 2: Number of significant CpG sites out of 229,550, restricted to probes with mean methylation values between 10% and 90%.

Threshold           0.01         0.001        1 × 10^-4    1 × 10^-5
Primary Covariate   p      q     p      q     p      q     p      q
BMI                 11881  300   2492   44    644    16    192    7
BMI.latent          3713   93    924    26    305    11    124    7
ZScoreBMI           5558   12    987    2     204    0     64     0
ZScoreBMI.latent    3686   43    789    8     243    1     90     0
bicep               1843   0     295    0     73     0     20     0
bicep.latent        6154   202   1651   56    558    34    208    15
iliaque             2815   6     509    1     117    0     33     0
iliaque.latent      3580   40    791    5     227    4     81     1
percentFAT          1533   40    318    17    117    7     59     4
percentFAT.latent   3498   95    914    45    288    27    122    14
scap                4947   27    842    8     178    2     48     0
scap.latent         3899   72    882    9     254    2     98     2
tricep              3965   32    775    12    187    8     65     2
tricep.latent       5172   115   1302   41    387    14    150    7
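The probe restriction mentioned in the table caption (mean methylation between 10% and 90%) can be sketched as a simple filter. This is an illustration only, not the code used for the analysis: the function name `keep_intermediate_probes`, the dictionary layout, and the choice to treat the bounds as inclusive are all assumptions made here.

```python
from statistics import mean


def keep_intermediate_probes(beta, low=0.10, high=0.90):
    """Keep probes whose mean methylation across samples lies in [low, high].

    `beta` maps a probe ID to its list of methylation values (0 to 1),
    one value per sample. Inclusive bounds are an assumption of this sketch.
    """
    return {
        probe: values
        for probe, values in beta.items()
        if low <= mean(values) <= high
    }


# Made-up values for three hypothetical probes.
example_beta = {
    "cg_a": [0.05, 0.07],   # mostly unmethylated, dropped
    "cg_b": [0.40, 0.60],   # intermediate, kept
    "cg_c": [0.95, 0.99],   # mostly methylated, dropped
}
kept = keep_intermediate_probes(example_beta)
```

A filter of this kind removes probes that sit near 0 or 1 in most samples, which is consistent with the drop from 375,561 probes to roughly 229,500 tested sites reported in the table captions.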

Page 29: Methylation and Expression data integration

Placenta

Table 3: Number of significant CpG sites out of 229,533, restricted to probes with mean methylation values between 10% and 90%.

Threshold           0.01         0.001        1 × 10^-4    1 × 10^-5
Primary Covariate   p      q     p      q     p      q     p      q
BMI                 5052   98    1164   32    365    12    134    7
BMI.latent          7043   339   2006   90    704    33    293    13
ZScoreBMI           4168   33    935    13    254    1     82     0
ZScoreBMI.latent    5515   138   1425   50    451    17    167    12
bicep               2275   7     436    0     80     0     19     0
bicep.latent        3431   40    745    7     200    0     76     0
iliaque             9235   326   2304   77    686    13    262    6
iliaque.latent      5415   98    1419   40    453    13    143    7
percentFAT          2154   91    546    51    219    38    116    29
percentFAT.latent   3185   101   797    37    280    17    124    8
scap                6699   181   1647   48    484    14    183    7
scap.latent         6698   317   1934   104   673    43    289    18
tricep              3221   86    781    30    269    12    122    4
tricep.latent       4136   88    1016   25    310    13    120    7

Page 32: Methylation and Expression data integration

Gene Expression Results

Threshold           0.05         0.01         1 × 10^-3    1 × 10^-4    1 × 10^-5
Primary Covariate   p      q     p      q     p      q     p      q     p      q
BMI                 3579   0     938    0     157    0     27     0     2      0
BMI.latent          3281   4     921    0     153    0     36     0     5      0
ZScoreBMI           3680   0     994    0     151    0     24     0     3      0
ZScoreBMI.latent    3570   2     1043   1     194    0     34     0     5      0
bicep               4262   15    1294   0     220    0     41     0     12     0
bicep.latent        3010   0     777    0     123    0     30     0     5      0
iliaque             2518   0     495    0     39     0     4      0     0      0
iliaque.latent      2665   0     618    0     96     0     18     0     6      0
percentFAT          2892   0     579    0     67     0     10     0     0      0
percentFAT.latent   2862   6     754    1     123    0     21     0     6      0
scap                2886   0     687    0     92     0     21     0     3      0
scap.latent         2780   3     736    0     119    0     21     0     5      0
tricep              3335   6     848    0     124    0     18     0     6      0
tricep.latent       2896   3     728    1     137    0     30     0     6      0

Page 35: Methylation and Expression data integration

Overlap Analysis, PLS Path Modeling

Ad-hoc Overlap Analysis

Table 4: Number of overlapping locations within 5 kb that are significant at p < 10^-3 in the placenta, for both expression and methylation data

Primary Covariate   # of Overlapping locations
BMI                 3
BMI.latent          6
ZScoreBMI           5
ZScoreBMI.latent    10
bicep               4
bicep.latent        1
iliaque             1
iliaque.latent      1
percentFAT          1
percentFAT.latent   3
scap                3
scap.latent         4
tricep              2
tricep.latent       1
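A count like the one in Table 4 could be obtained along the following lines. The slides do not say how "within 5 kb" was measured or what exactly counts as a location, so this sketch makes its own assumptions: each significant CpG site and each significant expression probe is reduced to a single (chromosome, position) pair, distance is the absolute difference in base pairs on the same chromosome, and the count is the number of expression positions with at least one significant CpG inside the window. The function name `count_overlaps` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_overlaps(cpg_sites, gene_sites, window=5000):
    """Count gene positions with at least one CpG within `window` bp.

    Both inputs are iterables of (chromosome, position) pairs; only
    features already called significant should be passed in.
    """
    # Index CpG positions per chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in cpg_sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    n_overlapping = 0
    for chrom, pos in gene_sites:
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        if hi > lo:  # at least one CpG falls in [pos - window, pos + window]
            n_overlapping += 1
    return n_overlapping


# Made-up coordinates for illustration only.
significant_cpgs = [("chr1", 1000), ("chr1", 20000), ("chr2", 500)]
significant_genes = [("chr1", 5500), ("chr1", 30000), ("chr2", 6000)]
n = count_overlaps(significant_cpgs, significant_genes)
```

Sorting once and using binary search keeps the lookup fast even with thousands of significant sites per covariate; a real analysis would more likely use gene start and end coordinates rather than a single position.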

Page 36: Methylation and Expression data integration

[Diagram: PLS path model with three blocks and their indicators: Expression (Gene 1, Gene 2, Gene 3), Methylation (CpG 1, CpG 2, CpG 3), and Fat (BMI, % Fat, Bicep).]