Enzymatic Methyl-seq: Next Generation Methylomes

1
Fiona J. Stewart 1 , V K Chaithanya Ponnaluri 1 , Brittany S. Sexton 1 , Louise Williams 1 , Bradley W. Langhorst 1 , Romualdas Vaisvila 1 , Yvonne Helbert 2 , Lei Zhang 2 , Timothy Harkins 2 , Kevin J. McKernan 2 , Eileen T. Dimalanta 1 , Theodore B. Davis 1 1 New England Biolabs, Inc., Ipswich, MA 01938 2 Medicinal Genomics Corp., Woburn, MA 01801 Enzymatic Methyl-seq: Next Generation Methylomes DNA methylation is important for gene regulation. The ability to accurately identify 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) gives us greater insight into potential gene regulatory mechanisms. Bisulfite sequencing (BS) is traditionally used to detect methylated Cs, however, BS does have its drawbacks. DNA is commonly damaged and degraded by the chemical bisulfite reaction resulting in libraries that demonstrate high GC bias and are enriched for methylated regions. To overcome these limitations, we developed an enzymatic approach, NEBNextÒ Enzymatic Methyl-seq (EM-seqÔ), for methylation detection that minimizes DNA damage, resulting in longer fragments and minimal GC bias. Illumina libraries were prepared using bisulfite and EM-seq methods with 50 ng DNA from Arabidopsis thaliana and Cannabis sativa DNA. Libraries were sequenced using Illumina’s NextSeq 500 (2x75). Reads were aligned using BWAMeth 0.2. and methylation information was extracted from the alignments using MethyDackel. Total 5mC levels were compared between the sequencing data from EM-seq and WGBS libraries and LCMS (Liquid Chromatography Mass Spectrometry). 5mC levels determined by EM-seq are close to those from LCMS, whereas, WGBS results in an overestimation of 5mC. Additionally, EM-seq libraries produce higher quality sequencing metrics such as longer inserts, lower duplication rates, a higher percentage of mapped reads and less GC bias compared to bisulfite converted libraries. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage, increased sequencing length, and decreased GC-bias. Sample Preparation Data Analysis Methyl Kit BWAmeth Methyl Extraction Align (BWAmeth) Deduplicate Picard Reads were aligned to Jamaican Lion reference genome (August 2018 assembly) or the Arabidopsis reference genome (TAIR10) (for Jamaican Lion, four miscellaneous contigs were removed from methylation analysis) Data were analyzed using the tools in above flowchart. Two plant DNAs were used to make EM-seq libraries Cannabis sativa genomic DNA (Jamaican Lion): female clones (leaf, seeded and unseeded flowers) & male sibling (flowers) plants Arabidopsis thaliana Libraries were made using 50 ng genomic DNA, spiked with control DNA (unmethylated lambda & CpG-methylated pUC19) Libraries were sequenced using an Illumina NextSeq 500, 2x76 base paired reads. 5caC is sequenced as C and deaminated C as T. Bisulfite conversion was performed using Zymo Research EZ DNA Methylation-Gold TM kit Cannabis sativa: Higher Quality Sequencing Metrics with EM-seq compared to WGBS Cannabis sativa female leaf EM-seq libraries are superior to WGBS EM-seq libraries outperform WGBS for Cannabis sativa input DNA. (A) Fewer PCR cycles are required for EM-seq. (B) EM-seq libraries have lower duplication than WGBS. (C) EM-seq libraries have larger library insert sizes than WGBS. In addition, the EM-seq protocol has been optimized for both standard and large insert libraries. (D) GC distribution is more even for EM-seq than WGBS. (E) EM-seq has higher coverage depths than WGBS. 50 million 2 x 76 base reads were used for analysis (Figures A-D) and 130 million 2 x 76 base reads were used for Figure E. PCR Yield (nM) EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 0 5 10 15 20 25 6 PCR cycles 8 PCR cycles 0.5 1.0 1.5 2.0 2.5 3.0 3.5 EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 % Duplication 0 EM-seq can be used to investigate plant genomic DNA analysis of the Cannabis sativa methylome identified genes involved in seed and THC production the Arabidopsis methylome was successfully probed Acknowledgements Thank you to Laurie Mazzola and Danielle Fuchs for their technical support. Higher library yields with fewer PCR cycles Lower percent duplication More even base coverage Larger library insert sizes Less GC bias Similar percentage methylation as LC-MS Differential CpG methylation identified between Cannabis flower tissues Differential methylation across Cannabis sativa flower tissues using EM-seq data. Female flower and female seeded flower (clones) as well as the male flower (sibling) were studied. (A) Percent cytosine methylation in the CpG, CHG, and CHH contexts. Female methylation levels for both flower and seeded flower were higher than for the male flower indicating methylation patterns are potentially determined by sex. Control DNA methylation levels were <0.5% for unmethylated lambda and >97.5% for CpG methylated pUC19 (data not shown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG context between (1) female flower and female seeded flower (2) female flower and male flower. The differential methylation calls for female flower and female seeded flower identified 30 hypomethylated CpGs but no hypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000 hypomethylated CpGs and >11,000 hypermethylated CpGs. (C) Comparison of the hypomethylated CpGs with differential methylation between the female flower samples & female and male flowers. B Methylated Unmethylated Genomic Location (504 bp) Methylated Unmethylated Differential methylation analysis comparing the female flower with the seeded flower (clones) and the male flower (sibling). Significant differential methylation calls (q<0.01) for CpG context (1x coverage minimum) between female flower with female seeded flower and/or male flower were identified. (A) Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked to the positive regulation of seed production. The female flower is CpG methylated in this region, suggesting the expression of the genes are turned off, while the female seeded flower is unmethylated at this CpG. (B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and this gene is linked to the positive regulation of THCA production. The female flower is CpG unmethylated in this region, suggesting the expression of the gene is turned on, while the male flower is methylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce log scale less THCA. (A) Comparison of the total percentage of 5mC determined using LC- MS, EM-seq and WGBS for female leaf. The percentage of methylated cytosines using LC-MS is calculated by determining the amount of methylated cytosines and the total cytosines ((5mC/(total C)x100). 5mC percentages for EM-seq and WGBS were determined by combining 5mC in the three cytosine contexts (CpG, CHH, CHG). EM-seq cytosine methylation numbers are closer to LCMS methylation values than WGBS. (B) Cytosine methylation in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. Control cytosine methylation for unmethylated lambda DNA was <0.5% for EM-seq and <2% for WGBS, and for CpG methylated pUC19 was >97.5% for both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates and methods (CHG and CHH context data not shown). 50 million, 2 x 76 base reads were used for methylation analysis. Female & Female Seeded Flower Female & Male Flower C B Methylation profiles identified genes involved in seed & cannabinoid production Genomic Location (992 bp) WGBS Total % 5mC A 40 30 20 10 0 EM-seq LC-MS A 5 0 5 100 75 50 25 0 % 5mC per C context Female Flower Female Seeded Flower Male Flower C Hypo Hyper 0 100 q-value Methylation Difference 100 50 0 50 100 0 5 0 5 0 100 50 0 50 100 0.0100 0.0075 0.0050 0.0025 0.0000 -100 0 100 Methylation Difference -100 0.0100 0.0075 0.0050 0.0025 0.0000 q-value Female & Female Seeded Flower Female & Male Flower % of Reads EM-seq (standard) EM-seq (large insert) WGBS 0 4 8 12 16 20 50 100 200 300 400 500 600 700 Insert Size (bp) Normalized Coverage Percent GC 0 10 20 30 40 50 60 70 80 90 100 EM-seq WGBS 0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 A B C D Coverage Depth % of Bases at Coverage Depth 10 0 4 2 6 8 1 EM-seq WGBS 30 5 10 15 20 25 E EM-seq Female Leaf WGBS Female Leaf B 100 75 50 25 0 % 5mC per C context CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH C 0 0.05 0.1 0.15 0.2 0.25 0.3 0 200 400 600 Count Millions Insert Size (bp) Library Insert Size WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 Histogram of CpG coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 500000 1000000 1500000 2000000 0.3000.30.20 0.50.30.3 1.4 2.1 7.3 21 41.1 20.4 4.2 0.20 0 0 0 00.10.10 0 0 0 00.10 0 0 EMseq1 Histogram of CpG coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 200000 400000 600000 800000 1000000 1200000 1.2 00 2.2 3.6 0 11.7 7.5 7.8 21.4 10.5 10.5 7.8 5.8 3 2.2 1.7 1.1 0.8 0.5 0.30.20.10 0 0 0 0 0 0 0 0 0 WGBS1 Histogram of % CpG methylation % methylation per base Frequency 0 20 40 60 80 100 0 500000 1500000 2500000 3500000 64.5 1.8 0.7 0.6 0.5 0.5 0.5 0.5 0.4 0.7 0.4 0.7 0.8 1.2 1.7 2.3 3.2 4.9 5.9 8.1 EMseq1 Histogram of % CpG methylation % methylation per base Frequency 0 20 40 60 80 100 0 500000 1500000 2500000 3500000 63.6 2 1.2 1 0.6 0.4 0.6 0.5 0.4 1 0.2 0.9 0.6 1.4 1.9 2.1 2.4 4.5 4.1 10.7 WGBS1 0 1 2 3 4 5 6 1 11 21 31 41 CpGs covered Millions Minimum Coverage Depth CpG coverage distribution WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 EM-seq and WGBS libraries were made using 50 ng Arabidopsis thaliana genomic DNA. Libraries were sequenced on an Illumina NextSeq 500 (2 x 75 bases). 125 million paired end reads for each library were aligned to TAIR10 using bwa-meth 0.2.2. CpG, CHH and CHG sites on both strands were counted independently. (A) EM-seq libraries have larger insert sizes. (B) Cytosine methylation in the CpG and CHN contexts for EM-seq and WGBS. (C) GC Bias plot for EM-seq and WGBS libraries. EM-seq libraries show less bias across the GC content. Histogram of CHG coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 500000 1000000 1500000 2000000 2500000 0.3000.20.20 0.40.20.2 1.2 1.8 6.5 19.9 41.6 22 4.7 0.20 0 0 0 00.10.10 0 0 0 00.10 0 0 EMseq1 Histogram of CHG coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 200000 400000 600000 800000 1200000 1.1 00 2.1 3.5 0 11.8 7.7 8.2 22.5 11 10.6 7.5 5.3 2.6 1.9 1.4 0.9 0.6 0.4 0.30.20.10 0 0 0 0 0 0 0 0 0 WGBS1 Histogram of % CHG methylation % methylation per base Frequency 0 20 40 60 80 100 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 80.7 1.8 1 1 1 0.9 1 1.1 0.9 1.4 0.8 1.2 1 1.2 1.2 1 0.9 0.8 0.5 0.7 EMseq1 Histogram of % CHG methylation % methylation per base Frequency 0 20 40 60 80 100 0e+00 1e+06 2e+06 3e+06 4e+06 78.7 2.5 1.5 1.3 1 0.7 1 1 0.8 1.6 0.5 1.2 0.8 1.3 1.3 1.1 0.9 1 0.6 1.5 WGBS1 Histogram of CHH coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0e+00 2.0e+06 4.0e+06 6.0e+06 8.0e+06 1.0e+07 1.2e+07 0.3000.30.20 0.40.20.3 1.3 1.9 6.8 20.3 41.4 21.4 4.6 0.20 0 0 0 00.10.10 0 0 0 00.10 0 0 EMseq1 Histogram of CHH coverage log10 of read coverage per base Frequency 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 1.2 00 2 3.4 0 11.2 7.2 7.7 21.4 10.7 10.8 8.1 6 3.1 2.3 1.7 1.1 0.8 0.5 0.30.20.10 0 0 0 0 0 0 0 0 0 WGBS1 Histogram of % CHH methylation % methylation per base Frequency 0 20 40 60 80 100 0.0e+00 5.0e+06 1.0e+07 1.5e+07 2.0e+07 2.5e+07 88.9 3.5 1.6 1.3 0.9 0.6 0.6 0.5 0.4 0.5 0.2 0.3 0.2 0.2 0.1 0.1 0 0 0 0.1 EMseq1 Histogram of % CHH methylation % methylation per base Frequency 0 20 40 60 80 100 0.0e+00 5.0e+06 1.0e+07 1.5e+07 2.0e+07 2.5e+07 84.8 4.5 2.5 1.8 1.1 0.7 0.8 0.6 0.4 0.7 0.2 0.4 0.2 0.3 0.2 0.1 0.1 0.1 0 0.2 WGBS1 0 1 2 3 4 5 6 7 1 11 21 31 41 CHGs covered Millions Minimum Coverage Depth CHG coverage distribution WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 0 5 10 15 20 25 30 35 1 21 41 CHH covered Millions Minimum Coverage Depth CHH coverage distribution WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 www.NEBNext.com 0% 10% 20% 30% 40% EM-seq WGBS % 5mC in CpG context CpG EM-seq WGBS 0% 2% 4% 6% 8% 10% EM-seq WGBS % 5mC in CHN context CHN EM-seq WGBS EM-seq libraries cover more cytosines to greater minimum depths than WGBS. EM-seq identifies more CpGs, CHHs and CHGs, at higher coverage depth compared to WGBS, resulting in more usable information. Cytosine correlations of EM- seq and WGBS libraries (1x minimum coverage). Both EM- seq and WGBS libraries were highly correlated between replicates and methods). 125 million, 2 x75 base reads were used for methylation analysis. Correlations in the CpG, CHG and CHH contexts are shown Cannabis sativa Arabidopsis thaliana Introduction Methods Methods CpG CHG CHH CpG CHG CHH CpG CHG CHH C A Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 0 1 2 3 4 5 6 7 8 0 20 40 60 80 100 Normalized coverage Percent GC GC-bias plot WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 A C C Conclusion Histograms showing CpG, CHG and CHH coverage of EM-seq (A) and WGBS (C) libraries were plotted using MethylKit. The histogram is shifted right for EM-seq compared to WGBS indicating that more CpGs are detected at higher coverage for EM-seq libraries compared to WGBS. Percent methylation for libraries made using EM-seq (B) and WGBS (D) are shown in the CpG, CHG and CHH contexts. EM-seq and WGBS data both show that most cytosines in the CHG and CHH contexts are not methylated. Methylation is found in the CpG context for both EM-seq and WGBS libraries. A B C D EM-seq libraries compared to WGBS libraries had:

Transcript of Enzymatic Methyl-seq: Next Generation Methylomes

Page 1: Enzymatic Methyl-seq: Next Generation Methylomes

Fiona J. Stewart1, V K Chaithanya Ponnaluri1, Brittany S. Sexton1, Louise Williams1, Bradley W. Langhorst1, Romualdas Vaisvila1, Yvonne Helbert2, Lei Zhang2, Timothy Harkins2, Kevin J. McKernan2, Eileen T. Dimalanta1, Theodore B. Davis1

1 New England Biolabs, Inc., Ipswich, MA 01938 2 Medicinal Genomics Corp., Woburn, MA 01801

Enzymatic Methyl-seq: Next Generation Methylomes

DNA methylation is important for gene regulation. The ability to accurately identify 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) gives us greater insight into potential gene regulatory mechanisms. Bisulfite sequencing (BS) is traditionally used to detect methylated Cs, however, BS does have its drawbacks. DNA is commonly damaged and degraded by the chemical bisulfite reaction resulting in libraries that demonstrate high GC bias and are enriched for methylated regions. To overcome these limitations, we developed an enzymatic approach, NEBNextÒEnzymatic Methyl-seq (EM-seqÔ), for methylation detection that minimizes DNA damage, resulting in longer fragments and minimal GC bias.

Illumina libraries were prepared using bisulfite and EM-seq methods with 50 ng DNA from Arabidopsis thaliana and Cannabis sativa DNA. Libraries were sequenced using Illumina’s NextSeq500 (2x75). Reads were aligned using BWAMeth 0.2. and methylation information was extracted from the alignments using MethyDackel. Total 5mC levels were compared between the sequencing data from EM-seq and WGBS libraries and LCMS (Liquid Chromatography Mass Spectrometry). 5mC levels determined by EM-seq are close to those from LCMS, whereas, WGBS results in an overestimation of 5mC. Additionally, EM-seq libraries produce higher quality sequencing metrics such as longer inserts, lower duplication rates, a higher percentage of mapped reads and less GC bias compared to bisulfite converted libraries. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage, increased sequencing length, and decreased GC-bias.

Sample Preparation

Data Analysis

Methyl KitBWAmeth

Methyl Extraction

Align(BWAmeth) Deduplicate Picard

• Reads were aligned to Jamaican Lion reference genome (August 2018 assembly) or theArabidopsis reference genome (TAIR10) (for Jamaican Lion, four miscellaneous contigswere removed from methylation analysis)

• Data were analyzed using the tools in above flowchart.

• Two plant DNAs were used to make EM-seq libraries• Cannabis sativa genomic DNA (Jamaican Lion): female clones (leaf, seeded and

unseeded flowers) & male sibling (flowers) plants• Arabidopsis thaliana

• Libraries were made using 50 ng genomic DNA, spiked with control DNA (unmethylatedlambda & CpG-methylated pUC19)

• Libraries were sequenced using an Illumina NextSeq 500, 2x76 base paired reads. 5caC issequenced as C and deaminated C as T.

• Bisulfite conversion was performed using Zymo Research EZ DNA Methylation-GoldTM kit

Cannabis sativa: Higher Quality Sequencing Metrics with EM-seq compared to WGBS Cannabis sativa female leaf EM-seq libraries are superior to WGBS

EM-seq libraries outperform WGBS for Cannabis sativa input DNA. (A)Fewer PCR cycles are required for EM-seq. (B) EM-seq libraries havelower duplication than WGBS. (C) EM-seq libraries have larger libraryinsert sizes than WGBS. In addition, the EM-seq protocol has beenoptimized for both standard and large insert libraries. (D) GC distributionis more even for EM-seq than WGBS. (E) EM-seq has higher coveragedepths than WGBS. 50 million 2 x 76 base reads were used for analysis(Figures A-D) and 130 million 2 x 76 base reads were used for Figure E.

0

5

10

15

20

25

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Yiel

d (n

M)

PCR

Yie

ld (n

M)

EM-seq-1 EM-seq-2 WGBS-1 WGBS-20

5

10

15

20

25

6 PCR cycles 8 PCR cycles

0

0.5

1

1.5

2

2.5

3

3.5

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Perc

ent D

uplic

atio

n

0.5

1.0

1.5

2.0

2.5

3.0

3.5

EM-seq-1 EM-seq-2 WGBS-1 WGBS-2

% D

uplic

atio

n

0

EM-seq can be used to investigate plant genomic DNA• analysis of the Cannabis sativa methylome identified genes involved in seed and THC production • the Arabidopsis methylome was successfully probed

Acknowledgements Thank you to Laurie Mazzola and Danielle Fuchs for their technical support.

• Higher library yields with fewer PCR cycles• Lower percent duplication• More even base coverage

• Larger library insert sizes• Less GC bias• Similar percentage methylation as LC-MS

Differential CpG methylation identified between Cannabis flower tissues

Differential methylation across Cannabis sativa flower tissues using EM-seq data. Female flower and female seeded flower (clones) as well as the male flower (sibling)were studied. (A) Percent cytosine methylation in the CpG, CHG, and CHH contexts. Female methylation levels for both flower and seeded flower were higher than for themale flower indicating methylation patterns are potentially determined by sex. Control DNA methylation levels were <0.5% for unmethylated lambda and >97.5% for CpGmethylated pUC19 (data not shown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG context between (1) female flower andfemale seeded flower (2) female flower and male flower. The differential methylation calls for female flower and female seeded flower identified 30 hypomethylatedCpGs but no hypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000 hypomethylated CpGs and >11,000hypermethylated CpGs. (C) Comparison of the hypomethylated CpGs with differential methylation between the female flower samples & female and male flowers.

B

MethylatedUnmethylated

Genomic Location (504 bp)

MethylatedUnmethylated

Differential methylation analysis comparingthe female flower with the seeded flower(clones) and the male flower (sibling).Significant differential methylation calls(q<0.01) for CpG context (1x coverageminimum) between female flower with femaleseeded flower and/or male flower wereidentified. (A) Region 1 BLASTs to the edestingene family of the Cannabis sativa genome.These genes are linked to the positiveregulation of seed production. The femaleflower is CpG methylated in this region,suggesting the expression of the genes areturned off, while the female seeded flower isunmethylated at this CpG. (B) Region 2BLASTs to the THC acid synthase (THCA)gene of the Cannabis sativa genome and thisgene is linked to the positive regulation ofTHCA production. The female flower is CpGunmethylated in this region, suggesting theexpression of the gene is turned on, while themale flower is methylated at this CpG,suggesting the expression of the gene isturned off. Male flowers produce log scaleless THCA.

(A) Comparison of the total percentage of 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The percentage ofmethylated cytosines using LC-MS is calculated by determining theamount of methylated cytosines and the total cytosines ((5mC/(totalC)x100). 5mC percentages for EM-seq and WGBS were determinedby combining 5mC in the three cytosine contexts (CpG, CHH, CHG).EM-seq cytosine methylation numbers are closer to LCMSmethylation values than WGBS. (B) Cytosine methylation in the CpG,CHG, and CHH contexts for EM-seq and WGBS for female leaf.Control cytosine methylation for unmethylated lambda DNA was<0.5% for EM-seq and <2% for WGBS, and for CpG methylatedpUC19 was >97.5% for both EM-seq and WGBS (data not shown).(C) CpG context correlations of EM-seq and WGBS libraries (1xminimum coverage). Both EM-seq and WGBS libraries were highlycorrelated between replicates and methods (CHG and CHH contextdata not shown). 50 million, 2 x 76 base reads were used formethylation analysis.

Female & Female Seeded Flower

Female & Male Flower

C

B

Methylation profiles identified genes involved in seed & cannabinoid production

Genomic Location (992 bp)

WGBS0

10

20

30

40

LC-MS Female Leaf EM-seq Female Leaf WGBS Female Leaf

Tota

l % 5

mC

A 40

30

20

10

0EM-seqLC-MS

A

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH

100

75

50

25

0

% 5

mC

per

C c

onte

xt

Female Flower Female Seeded Flower

Male Flower

C

Hypo Hyper

0 100

q-va

lue

Methylation Difference

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue factor(factor)

hypo

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue

factor(factor)hypo

hyper

0.0100

0.0075

0.0050

0.0025

0.0000

-100 0 100Methylation Difference

-100

0.0100

0.0075

0.0050

0.0025

0.0000

q-va

lue

Female & Female Seeded Flower Female & Male Flower

50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800

Insert Size

JL

0.00%

0.02%

0.04%

0.06%

0.08%

0.10%

0.12%

0.14%

0.16%

0.18%

0.20%

0.22%

Avg. %

of in

sert

s

% o

f Rea

ds

EM-seq (standard)EM-seq (large insert)WGBS

0

4

8

12

16

20

50 100 200 300 400 500 600 700Insert Size (bp)

Nor

mal

ized

Cov

erag

e

Percent GC0 10 20 30 40 50 60 70 80 90 100

EM-seqWGBS

00.20.40.60.81.01.21.41.61.82.02.2

A B

C D

Coverage Depth

% o

f Bas

es a

t Cov

erag

e D

epth

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Coverage Depth

Pro

ject

Medic

inalG

enom

ics_Leaf_

WG

BS

_E

Mseq_C

om

bin

edB

am

s

0.0%

1.0%

2.0%

3.0%

4.0%

5.0%

6.0%

7.0%

8.0%

9.0%

10.0%

Avg. F

rac B

ases

10

0

4

2

6

8

1

EM-seqWGBS

305 10 15 20 25

E

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH

EM-seq Female Leaf WGBS Female Leaf

B 100

75

50

25

0

% 5

mC

per

C c

onte

xt

CpG

CH

GC

HH

CpG

CH

GC

HH

CpG

CH

GC

HH

CpG

CH

GC

HH

C

0

0.05

0.1

0.15

0.2

0.25

0.3

0 200 400 600

Cou

ntM

illion

s

Insert Size (bp)

Library Insert Size WGBS-1WGBS-2EM-seq-1EM-seq-2

Histogram of CpG coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

050

0000

1000

000

1500

000

2000

000

0.30 00.30.200.50.30.31.42.1

7.3

21

41.1

20.4

4.2

0.20 0 0 0 00.10.10 0 0 0 00.10 0 0

EM−seq−1

Histogram of CpG coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

020

0000

4000

0060

0000

8000

0010

0000

012

0000

0

1.20 0

2.2

3.6

0

11.7

7.57.8

21.4

10.510.5

7.8

5.8

32.2

1.71.10.80.50.30.20.10 0 0 0 0 0 0 0 0 0

WGBS−1

Histogram of % CpG methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

050

0000

1500

000

2500

000

3500

000 64.5

1.8 0.7 0.6 0.5 0.5 0.5 0.5 0.4 0.7 0.4 0.7 0.8 1.2 1.7 2.3 3.24.9 5.9

8.1

EM−seq−1

Histogram of % CpG methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

050

0000

1500

000

2500

000

3500

000

63.6

2 1.2 1 0.6 0.4 0.6 0.5 0.4 1 0.2 0.9 0.6 1.4 1.9 2.1 2.44.5 4.1

10.7

WGBS−1

0

1

2

3

4

5

6

1 11 21 31 41

CpG

s co

vere

dM

illion

s

Minimum Coverage Depth

CpG coverage distribution

WGBS-1WGBS-2EM-seq-1EM-seq-2

EM-seq and WGBS libraries were made using 50 ng Arabidopsis thaliana genomic DNA. Libraries were sequenced on an Illumina NextSeq 500 (2 x 75 bases). 125 million paired end reads for each library were aligned to TAIR10 using bwa-meth 0.2.2. CpG, CHH and CHG sites on both strands were counted independently. (A) EM-seq libraries have larger insert sizes. (B) Cytosine methylation in the CpG and CHN contexts for EM-seq and WGBS. (C) GC Bias plot for EM-seq and WGBS libraries. EM-seq libraries show less bias across the GC content.

Histogram of CHG coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

050

0000

1000

000

1500

000

2000

000

2500

000

0.30 00.20.200.40.20.21.21.8

6.5

19.9

41.6

22

4.7

0.20 0 0 0 00.10.10 0 0 0 00.10 0 0

EM−seq−1

Histogram of CHG coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

020

0000

4000

0060

0000

8000

0012

0000

0

1.10 0

2.1

3.5

0

11.8

7.78.2

22.5

1110.6

7.5

5.3

2.61.91.40.90.60.40.30.20.10 0 0 0 0 0 0 0 0 0

WGBS−1

Histogram of % CHG methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

0e+0

01e

+06

2e+0

63e

+06

4e+0

65e

+06

80.7

1.8 1 1 1 0.9 1 1.1 0.9 1.4 0.8 1.2 1 1.2 1.2 1 0.9 0.8 0.5 0.7

EM−seq−1

Histogram of % CHG methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

0e+0

01e

+06

2e+0

63e

+06

4e+0

6

78.7

2.5 1.5 1.3 1 0.7 1 1 0.8 1.6 0.5 1.2 0.8 1.3 1.3 1.1 0.9 1 0.6 1.5

WGBS−1

Histogram of CHH coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0e

+00

2.0e

+06

4.0e

+06

6.0e

+06

8.0e

+06

1.0e

+07

1.2e

+07

0.30 00.30.200.40.20.31.31.9

6.8

20.3

41.4

21.4

4.6

0.20 0 0 0 00.10.10 0 0 0 00.10 0 0

EM−seq−1

Histogram of CHH coverage

log10 of read coverage per base

Freq

uenc

y

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0e+0

01e

+06

2e+0

63e

+06

4e+0

65e

+06

6e+0

6

1.20 0

23.4

0

11.2

7.27.7

21.4

10.710.8

8.1

6

3.12.3

1.71.10.80.50.30.20.10 0 0 0 0 0 0 0 0 0

WGBS−1

Histogram of % CHH methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

0.0e

+00

5.0e

+06

1.0e

+07

1.5e

+07

2.0e

+07

2.5e

+07

88.9

3.5 1.6 1.3 0.9 0.6 0.6 0.5 0.4 0.5 0.2 0.3 0.2 0.2 0.1 0.1 0 0 0 0.1

EM−seq−1

Histogram of % CHH methylation

% methylation per base

Freq

uenc

y

0 20 40 60 80 100

0.0e

+00

5.0e

+06

1.0e

+07

1.5e

+07

2.0e

+07

2.5e

+07 84.8

4.5 2.5 1.8 1.1 0.7 0.8 0.6 0.4 0.7 0.2 0.4 0.2 0.3 0.2 0.1 0.1 0.1 0 0.2

WGBS−1

01234567

1 11 21 31 41

CH

Gs

cove

red

Milli

ons

Minimum Coverage Depth

CHG coverage distribution

WGBS-1WGBS-2EM-seq-1EM-seq-2

05

101520253035

1 21 41

CH

H c

over

edM

illion

s

Minimum Coverage Depth

CHH coverage distribution

WGBS-1WGBS-2EM-seq-1EM-seq-2

www.NEBNext.com

0%

10%

20%

30%

40%

EM-seq WGBS

% 5

mC

in C

pG c

onte

xt

CpG

EM-seq WGBS

0%

2%

4%

6%

8%

10%

EM-seq WGBS

% 5

mC

in C

HN

con

text

CHN

EM-seq WGBS

EM-seq libraries cover more cytosines to greater minimum depths than WGBS. EM-seq identifies more CpGs, CHHs and CHGs, at higher coverage depth compared to WGBS, resulting in more usable information.

Cytosine correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates and methods). 125 million, 2 x75 base reads were used for methylation analysis. Correlations in the CpG, CHG and CHH contexts are shown

Cannabis sativa

Arabidopsis thaliana

Introduction Methods

Methods

CpG

CH

G

CH

H

CpG

CH

G

CH

H

CpG

CH

G

CH

H

C

AFemale Flower-1

Female Flower-2

Female Seeded Flower-1Female Seeded Flower-2

Male Flower-1

Male Flower-2

Female Flower-1

Female Flower-2

Female Seeded Flower-1Female Seeded Flower-2

Male Flower-1

Male Flower-2

012345678

0 20 40 60 80 100

Nor

mal

ized

cov

erag

e

Percent GC

GC-bias plotWGBS-1WGBS-2EM-seq-1EM-seq-2

A

C

C

Conclusion

Histograms showing CpG,CHG and CHH coverage ofEM-seq (A) and WGBS (C)libraries were plotted usingMethylKit. The histogram isshifted right for EM-seqcompared to WGBS indicatingthat more CpGs are detected athigher coverage for EM-seqlibraries compared to WGBS.Percent methylation forlibraries made using EM-seq(B) and WGBS (D) are shownin the CpG, CHG and CHHcontexts. EM-seq and WGBSdata both show that mostcytosines in the CHG and CHHcontexts are not methylated.Methylation is found in the CpGcontext for both EM-seq andWGBS libraries.

A

B

C

D

EM-seq libraries compared to WGBS libraries had: