DNA Methylation Data Analysis

- Academia Sinica LSL NGS Workshop - DNA Methylation Data Analysis Yi-Feng Chang Ph.D. Molecular Medicine Research Center, Chang Gung University [email protected] 03-2118800 #3166 or #3528 2015/11/18 1

Transcript of DNA Methylation Data Analysis

Page 1: DNA Methylation Data Analysis

Academia Sinica LSL NGS Workshop - DNA Methylation Data Analysis

Yi-Feng Chang, Ph.D.
Molecular Medicine Research Center, Chang Gung University

[email protected]

03-2118800 #3166 or #3528

2015/11/18

Page 2: DNA Methylation Data Analysis

Outline

• DNA Methylation: Functions and Diseases
• Methods of Measuring DNA Methylation Status
• DNA Methylation Data Analysis
• A Case Study of DNA Methylation Data Analysis
• DNA Methylation Data Visualization

Page 3: DNA Methylation Data Analysis

http://commonfund.nih.gov/epigenomics/figure.aspx

Page 4: DNA Methylation Data Analysis

DNA Methylation: Functions and Diseases

Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-1068, doi:10.1038/nbt.1685 (2010).

Page 5: DNA Methylation Data Analysis

DNA Epigenetic Modifications in Human Diseases

Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-1068, doi:10.1038/nbt.1685 (2010).

Page 6: DNA Methylation Data Analysis

DNA Methylation Pathway

Moore, L. D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).

Page 7: DNA Methylation Data Analysis

DNA Demethylation Pathway

Moore, L. D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).

• 5mC: 5-methylcytosine
• 5hmC: 5-hydroxymethylcytosine
• 5hmU: 5-hydroxymethyluracil
• 5fC: 5-formylcytosine
• 5caC: 5-carboxycytosine
• Tet: ten-eleven translocation enzymes
• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex
• TDG: thymine DNA glycosylase
• SMUG1: single-strand-selective monofunctional uracil-DNA glycosylase 1

Page 8: DNA Methylation Data Analysis

Methods of Measuring DNA Methylation Status

Page 9: DNA Methylation Data Analysis

Timeline of Technologies for Studying DNA Methylation

• COBRA: Combined Bisulfite Restriction Analysis
• AP-PCR: Methylation-Sensitive Arbitrarily Primed PCR
• AIMS: DNA methylation by amplification of intermethylated sites
• RRBS: Reduced representation bisulfite sequencing
• MS-HRM: Methylation-sensitive high resolution melting
• MeDIP-Seq: Methylated DNA immunoprecipitation sequencing
• MethylC-Seq/BS-Seq: Bisulfite sequencing
• TAB-Seq: Tet-assisted BS-Seq
• MAB-Seq: M.SssI methylase-assisted BS-Seq

[Figure: timeline of DNA methylation technologies; labels on the slide: MS-HRM, MeDIP-Seq, BS-Seq, MethylC-Seq, TAB-Seq, MAB-Seq, 2015]

Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).

Page 10: DNA Methylation Data Analysis

The Steps to Determining the Methylation Status of Cytosine in a Known DNA Sequence by the Bisulfite Conversion Method

Singal, R. & Ginder, G. D. DNA Methylation. Blood 93, 4059-4070 (1999).

Page 11: DNA Methylation Data Analysis

Techniques for Genome-Wide Sequencing of Cytosine Methylation Sites

[Figure: workflows from genomic DNA to deep sequencing]

Lister, R. & Ecker, J. R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Page 12: DNA Methylation Data Analysis

Techniques for Enrichment of Methylated or Target Regions Prior to BS-Seq

[Figure: workflows from genomic DNA to deep sequencing]

Lister, R. & Ecker, J. R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Page 13: DNA Methylation Data Analysis

Approaches for Detecting Active DNA Demethylation at Single-Base Resolution

TAB-Seq: Tet-assisted BS-Seq

Yu, M. et al. Tet-assisted bisulfite sequencing of 5-hydroxymethylcytosine. Nat Protoc 7, 2159-70 (2012).
Yu, M. et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-80 (2012).

MAB-Seq: M.SssI methylase-assisted BS-Seq

Wu, H., Wu, X., Shen, L. & Zhang, Y. Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing. Nat Biotechnol 32, 1231-40 (2014).

Page 14: DNA Methylation Data Analysis

Key Metrics of the Technology Comparison

Beck, S. Taking the measure of the methylome. Nat Biotechnol 28, 1026-8 (2010).

The Human Methylation 450K array contains approximately 480k CpG sites, covering 99% of RefSeq genes (hg19) and 96% of CpG islands (CGIs).

Page 15: DNA Methylation Data Analysis

Genomic Coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium

Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).

MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions.

Page 16: DNA Methylation Data Analysis

Common Base Resolution Methylation Sequencing Platforms

Sun, Z., Cunningham, J., Slager, S. & Kocher, J. P. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis. Epigenomics 7, 813-828, doi:10.2217/epi.15.21 (2015).

Page 17: DNA Methylation Data Analysis

WGBS Coverage Depth vs Replicates

• Using several high-coverage reference data sets to experimentally determine minimal sequencing requirements

Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 12, 230-232, doi:10.1038/nmeth.3152 (2015).

Page 18: DNA Methylation Data Analysis

WGBS Coverage Depth vs Replicates

• For DMR identification
  • Per-sample coverage in the range of 5–15× is recommended, depending on the magnitude of methylation differences between the groups and whether a smoothing or single-CpG-based DMR identification strategy is used
  • To identify long DMRs with large methylation differences, reducing coverage down to 1× or 2× per sample is acceptable
• Biological replicates should be analyzed separately to increase power, as opposed to being pooled together for analysis
  • The authors strongly argue for the use of at least two separate biological replicates for DMR analysis
  • Choosing an appropriate number of biological replicates is a complex issue influenced by the degree of within-group heterogeneity, the magnitude of between-group differences and the presence of confounding factors such as batch effects

Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 12, 230-232, doi:10.1038/nmeth.3152 (2015).

Page 19: DNA Methylation Data Analysis

DNA Methylation Data Analysis

Page 20: DNA Methylation Data Analysis

Effect and Problems of Bisulfite Treatment of DNA

Krueger, F., Kreck, B., Franke, A. & Andrews, S. R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

Mapping bisulfite reads to the 4 possible bisulfite strands (OT/CTOT/OB/CTOB) is equivalent to mapping the bisulfite read and its reverse complement to both the top and bottom strands of the original reference sequence.

OT, original top strand; CTOT, strand complementary to the original top strand; OB, original bottom strand; CTOB, strand complementary to the original bottom strand.
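The four strand types can be reproduced with a small in-silico bisulfite conversion. This is only a sketch: the function names and the toy sequence are made up for illustration, conversion of unmethylated cytosines is assumed to be complete, and methylation is set independently on each strand.

```python
# Toy in-silico bisulfite conversion (illustrative assumptions only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq, methylated=()):
    """Turn every unmethylated C into T; 0-based positions in `methylated` stay C."""
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def four_strands(top, methylated_top=(), methylated_bottom=()):
    """Build OT, CTOT, OB and CTOB for a top-strand sequence (all written 5'->3')."""
    bottom = reverse_complement(top)
    ot = bisulfite_convert(top, methylated_top)
    ob = bisulfite_convert(bottom, methylated_bottom)
    return {
        "OT": ot,
        "CTOT": reverse_complement(ot),
        "OB": ob,
        "CTOB": reverse_complement(ob),
    }


# Hypothetical 8-mer with one methylated C at position 1 of the top strand.
strands = four_strands("ACGTTCGA", methylated_top=[1])
for name, seq in strands.items():
    print(name, seq)
```

After conversion the top and bottom strands are no longer complementary to each other, which is why a read may have to be tested against four different sequences.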

Page 21: DNA Methylation Data Analysis

How to Align BS Reads Against the Reference Genome?

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).

BS-Seq reads (figure labels): TCGA TCGT ACGT ATGATTGT ATGTTCGA ATGA

Page 22: DNA Methylation Data Analysis

Procedure to Perform Three-Letter Alignment

Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011).

Page 23: DNA Methylation Data Analysis

Three-Letter Alignment

[Figure label: multiple hits]

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).
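The idea behind three-letter alignment can be sketched in a few lines of Python. The code below is a toy illustration, not how Bismark or any other aligner is implemented: it uses exact string matching, handles only the C-to-T conversion of one strand, and the reference and read sequences are invented.

```python
def c_to_t(seq):
    """Collapse the alphabet to three letters by replacing every C with T."""
    return seq.replace("C", "T")


def three_letter_hits(read, reference):
    """Return all 0-based offsets where the converted read matches the converted reference."""
    conv_read, conv_ref = c_to_t(read), c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def call_methylation(read, reference, offset):
    """For each reference C under the read: C in the read means methylated, T means unmethylated."""
    calls = {}
    for i, base in enumerate(read):
        if reference[offset + i] == "C":
            calls[offset + i] = (base == "C")
    return calls


reference = "AACGTTCGATTTGA"  # hypothetical reference
read = "TGTTCGA"              # hypothetical bisulfite read

hits = three_letter_hits(read, reference)
print(hits)
print(call_methylation(read, reference, hits[0]))

# A short read can become ambiguous once C and T are indistinguishable.
print(three_letter_hits("TTGA", reference))
```

The last call returns two offsets because the reference words TCGA and TTGA collapse to the same three-letter word, which is the "multiple hits" situation marked on the slide.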

Page 24: DNA Methylation Data Analysis

Wild-Card Alignment

Convert C/T to Y

[Figure label: multiple hits]

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).
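A wild-card match can be sketched as follows: each C in the reference behaves like the wild card Y and accepts either C or T in the read, while every other base must match exactly. This is a naive scan over an invented reference, meant only to show the matching rule, and not the algorithm of any particular aligner.

```python
def wildcard_match(read, reference, offset):
    """True if the read matches at `offset`, treating reference C as Y (C or T)."""
    if offset + len(read) > len(reference):
        return False
    for read_base, ref_base in zip(read, reference[offset:offset + len(read)]):
        if ref_base == "C":
            if read_base not in ("C", "T"):
                return False
        elif read_base != ref_base:
            return False
    return True


def wildcard_hits(read, reference):
    """Return all 0-based offsets where the read matches under the wild-card rule."""
    return [
        offset
        for offset in range(len(reference) - len(read) + 1)
        if wildcard_match(read, reference, offset)
    ]


reference = "AACGTTCGATTTGA"  # hypothetical reference

# A read that kept its C can only come from the TCGA site.
print(wildcard_hits("TCGA", reference))
# A fully converted read still matches both TCGA and TTGA.
print(wildcard_hits("TTGA", reference))
```

Because the read is left unconverted, a retained C carries information that a three-letter alignment discards, so the read TCGA maps uniquely here.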

Page 25: DNA Methylation Data Analysis

Wild-Card Alignments Have Better Accuracy but Poor Running Time

http://smithlabresearch.org/manuals/rmap_manual.pdf

Page 26: DNA Methylation Data Analysis

Workflow for Analyzing BS-Seq Data

Krueger, F., Kreck, B., Franke, A. & Andrews, S. R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).

http://omictools.com/bisulfite-seq/

Page 27: DNA Methylation Data Analysis

A Case Study of DNA Methylation Data Analysis

Page 28: DNA Methylation Data Analysis

Turn off PowerPoint Smart Quotes

Page 29: DNA Methylation Data Analysis

Required Software on Your Laptop

• Mac OS X Terminal
  • Applications → Utilities → Terminal (終端機)
• Linux console
• PuTTY: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
• SCP/SFTP/FTP client
  • WinSCP: http://winscp.net/download/winscp556.zip
• PDF viewer
  • http://get.adobe.com/tw/reader/
• R
  • https://cran.r-project.org/

Page 30: DNA Methylation Data Analysis

Required R Packages

• Bioconductor
  • http://www.bioconductor.org/install/#install-bioconductor-packages
• methylKit
  • https://github.com/al2na/methylKit

> R
# dependencies
> install.packages(c("data.table", "devtools"))
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("GenomicRanges", "IRanges"))
# install the development version from github
> library(devtools)
> install_github("al2na/methylKit", build_vignettes=FALSE)

Page 31: DNA Methylation Data Analysis

Analysis Pipeline

Steps and programs shown in the figure:

• Allele-specific Methylated Regions: amrfinder, allelicmeth
• Differential Methylation Region: dmr
• Large Hypo/Hyper-Methylation Domains: pmd
• Hypo/Hyper-Methylation Regions: hmr, hyperhmr, pmr
• Methylation Calling: methcounts
• Bisulfite Conversion Rate: bsrate
• Remove Duplicate Reads: duplicate-remover
• Mapping: walt
• Quality Trimming: fastq_masker
• Cross-species Comparison of Methylomes: liftOver
• Calculating Methylation Ratio for Regions: bigWigAverageOverBed, roimethstat, bwtool
• Generate Methylation BED file: Bedtools, bedGraphToBigWig
• Sorting mr files (shown twice in the figure)

Software sources:

• FASTX-Toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
• MethPipe: http://smithlabresearch.org/software/methpipe/
• MethPipe manual: http://smithlabresearch.org/downloads/methpipe-manual.pdf
• Bedtools: https://github.com/arq5x/bedtools2
• Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
• bwtool: https://github.com/CRG-Barcelona/bwtool/wiki

Page 32: DNA Methylation Data Analysis

Public BS-Seq Datasets

http://smithlabresearch.org/software/methbase/

Other species in the NCBI GEO Database:

• Glycine max (Soybean)
• Schistocerca gregaria (Locust)
• Rattus norvegicus (Rat)
• Danio rerio (Zebrafish)
• Drosophila melanogaster (Fruit fly)
• Oryza sativa (Rice)
• Macaca mulatta (Rhesus monkey)
• Mus musculus domesticus (Western European house mouse)
• Xenopus (Silurana) tropicalis (Frog)
• Cynoglossus semilaevis (Tongue sole, bony fish)
• Bombyx mori (Silkworm)
• Harpegnathos saltator (Jerdon's jumping ant)
• Camponotus floridanus (Florida carpenter ant)

Page 33: DNA Methylation Data Analysis

Datasets Used in This Case Study

• H1 (male): human embryonic stem cells (107 GB)
• IMR90 (female): fetal lung fibroblasts (154 GB)

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16256

Page 34: DNA Methylation Data Analysis

Convert SRA to FASTQ (Example ONLY)

# sra-toolkit can be downloaded from https://github.com/ncbi/sratoolkit

> fastq-dump --split-3 SRR018975.sra
> ls
SRR018975.fastq

Page 35: DNA Methylation Data Analysis

DEMO Files

> cd /work3/LSLNGSDNAMETH

> ls -alh

total 12G

drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 16 00:29 .

drwxrwxrwt 109 root root 4.0K Nov 15 14:10 ..

-rwxr-xr-x 1 u00gel00 u00ycm02 65K Nov 15 17:22 h1.chrX.hmr

-rwxr-xr-x 1 u00gel00 u00ycm02 4.6G Nov 15 14:51 h1.chrX.mr.dremove

-rwxr-xr-x 1 u00gel00 u00ycm02 9.8K Nov 15 17:22 h1.chrX.pmd

-rwxr-xr-x 1 u00gel00 u00ycm02 34M Nov 15 17:39 h1.chrX_CpG.meth

-rwxr-xr-x 1 u00gel00 u00ycm02 39M Nov 15 23:52 h1.chrX_CpG.meth.for.methylKit

-rwxr-xr-x 1 u00gel00 u00ycm02 161K Nov 15 17:22 h1_gt_imr90.chrX.dmr

-rwxr-xr-x 1 u00gel00 u00ycm02 45M Nov 15 17:22 h1_imr90.chrX.methdiff

-rwxr-xr-x 1 u00gel00 u00ycm02 55K Nov 15 17:22 h1_lt_imr90.chrX.dmr

-rwxr-xr-x 1 u00gel00 u00ycm02 194K Nov 15 17:22 imr90.chrX.hmr

-rwxr-xr-x 1 u00gel00 u00ycm02 7.3G Nov 15 14:52 imr90.chrX.mr.dremove

-rwxr-xr-x 1 u00gel00 u00ycm02 5.6K Nov 15 17:22 imr90.chrX.pmd

-rwxr-xr-x 1 u00gel00 u00ycm02 35M Nov 15 17:39 imr90.chrX_CpG.meth

-rwxr-xr-x 1 u00gel00 u00ycm02 40M Nov 15 23:52 imr90.chrX_CpG.meth.for.methylKit

drwxr-xr-x 6 u00gel00 u00ycm02 4.0K Nov 15 14:28 methpipe-3.3.1

drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 15 14:46 methpipe-data


Page 36: DNA Methylation Data Analysis

Quality Trimming and Splitting FASTQ Files into Smaller Files (Example ONLY)

# For each gzip file in turn (e.g. SRR018975.fastq.gz), submit a job that:
#   1. takes the file name without .gz as the output prefix (SRR018975.fastq)
#   2. uncompresses the gzip file and writes it to stdout
#   3. masks low-quality bases (quality below 30) as Ns
#   4. splits the FASTQ stream into smaller files of 6,000,000 lines each

> for f in *.gz;
do
b=`basename $f .gz`;
echo $f
bsub -q 4G -o $f.stdout -e $f.stderr "\
gzip -dc $f|\
fastq_masker -q 30 -Q33|\
split -dl 6000000 - $b- ";
done

> ls
SRR018975.fastq-00
SRR018975.fastq-01
SRR018975.fastq-02
…
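What the masking step does to a single read can be sketched in Python. This is an illustration of the rule implied by `-q 30 -Q33` (Phred+33 encoding, bases scoring below 30 replaced by N), not the FASTX-Toolkit implementation; the function name and the example read are hypothetical.

```python
def mask_low_quality(sequence, quality, min_quality=30, phred_offset=33):
    """Replace each base whose Phred score is below `min_quality` with N."""
    return "".join(
        base if ord(q) - phred_offset >= min_quality else "N"
        for base, q in zip(sequence, quality)
    )


# Hypothetical read: 'I' encodes Q40, '#' encodes Q2, '?' encodes Q30.
masked = mask_low_quality("ACGT", "II#?")
print(masked)
```

Masking keeps the read length unchanged, so read names and paired-end ordering stay in sync, at the cost of leaving Ns for the mapper to treat as mismatches.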

Page 37: DNA Methylation Data Analysis

Mapping BS-Seq FASTQ Files (Example ONLY)

> export AdapterTrich=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
> export AdapterArich=CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT

> bsub -q 4G -o rmapbs.stdout -e rmapbs.stderr "\
/work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe \
-c /work3/LSLNGSDNAMETH/methpipe-data/data/genome \
-o /work3/USERNAME/Output/test.mr \
-m 3 -L 400 -C $AdapterTrich:$AdapterArich \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_1.fq \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_2.fq"

> /work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe
Usage: rmapbs-pe [OPTIONS] <fastq-reads-file>

Options:
  -o, -output      output file name
  -c, -chrom       chromosomes in FASTA file or dir
  -T, -start       index of first read to map
  -N, -number      number of reads to map
  -s, -suffix      suffix of chrom files (assumes dir provided)
  -m, -mismatch    maximum allowed mismatches
  -M, -max-map     maximum allowed mappings for a read
  -C, -clip        clip the specified adaptor
  -L, -fraglen     max fragment length
      -suffix-len  Suffix length of reads name
  -v, -verbose     print more run info

Help options:
  -?, -help        print this help message
      -about       print about message

Page 38: DNA Methylation Data Analysis

Example Output of imr90 chrX

> head -n 30 /work3/LSLNGSDNAMETH/imr90.chrX.mr.dremove |column

MR Format

• RNAME (chromosome name)
• SPOS (start position, 0-based)
• EPOS (end position, 0-based)
• QNAME (read name)
• MISMATCH (number of mismatches)
• STRAND (forward or reverse strand)
• SEQ
• QUAL
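The eight MR fields map naturally onto a small record type. The sketch below assumes tab-separated fields in the order listed on the slide; the class name, parser name and the example line are invented for illustration and are not real output.

```python
from typing import NamedTuple


class MappedRead(NamedTuple):
    """One line of an MR file, using the field names from the slide."""
    rname: str     # chromosome name
    spos: int      # start position
    epos: int      # end position
    qname: str     # read name
    mismatch: int  # number of mismatches
    strand: str    # "+" or "-"
    seq: str
    qual: str


def parse_mr_line(line):
    """Split a tab-separated MR line and convert the numeric fields."""
    f = line.rstrip("\n").split("\t")
    return MappedRead(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5], f[6], f[7])


# Made-up example line, not taken from the demo files.
example = "chrX\t100\t104\tread1\t0\t+\tACGT\tIIII\n"
record = parse_mr_line(example)
print(record.rname, record.spos, record.epos, record.strand)
```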

Page 39: DNA Methylation Data Analysis

Remove Duplicates (Example ONLY)

> export PATH=$PATH:/pkg/biology/methpipe/methpipe-3.3.1/bin/

> bsub -q 16G -o stdout -e stderr "\

LC_ALL=C sort -S 14G -k 1,1 -k 2,2n -k 3,3n -k 6,6 \

-o /work3/USERNAME/h1.chrX.mr.sorted_start /work3/LSLNGSDNAMETH/h1.chrX.mr;

duplicate-remover -S /work3/USERNAME/h1.chrX_dremove_stat.txt \

-o /work3/USERNAME/h1.chrX.mr.dremove \

/work3/USERNAME/h1.chrX.mr.sorted_start "

> cat stdout

Successfully completed.

Resource usage summary:

CPU time : 343.80 sec.

Max Processes : 3

Max Threads : 4

> cat /work3/USERNAME/h1.chrX_dremove_stat.txt
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
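A quick consistency check on these statistics can be done in Python. The parser below is a hypothetical helper that assumes one `KEY: value` pair per line; the numbers are the ones reported on the slide.

```python
STATS_TEXT = """\
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
"""


def parse_stats(text):
    """Parse 'KEY: value' lines into a dict of integers."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split(":")
        stats[key.strip()] = int(value)
    return stats


stats = parse_stats(STATS_TEXT)

# Reads in minus duplicates removed should equal reads out.
reads_consistent = (
    stats["TOTAL READS IN"] - stats["DUPLICATES REMOVED"] == stats["TOTAL READS OUT"]
)
duplicate_fraction = stats["DUPLICATES REMOVED"] / stats["TOTAL READS IN"]

print(reads_consistent)
print(f"{duplicate_fraction:.2%} of input reads were removed as duplicates")
```

For this chrX subset, roughly 6% of the input reads are discarded as duplicates.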

Page 40: DNA Methylation Data Analysis

Computing Single-Site Methylation Levels (Example ONLY)

# sorting again for methylated CpG analysis
> bsub -q 16G -o stdout -e stderr "\
LC_ALL=C sort -S 14G -k 1,1 -k 3,3n -k 2,2n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_end_first \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove"

# methylation calling
> bsub -q 16G -o stdout -e stderr "\
methcounts -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/h1.chrX.meth \
/work3/USERNAME/h1.chrX.mr.sorted_end_first"

# extract CpG sites
> bsub -q 16G -o stdout -e stderr "\
symmetric-cpgs \
-o /work3/USERNAME/h1.chrX_CpG.meth /work3/USERNAME/h1.chrX.meth"

Example output (the last two columns are methratio and readcount):

chrom  position  strand  context  methratio  readcount
chrX   152       +       CpG      0          0
chrX   232       +       CpG      0          0
chrX   330       +       CpG      0          0
chrX   334       +       CpG      0          0
chrX   336       +       CpG      0          0
chrX   364       +       CpG      0          0
chrX   366       +       CpG      0          0
chrX   374       +       CpG      0          0
chrX   376       +       CpG      0          0
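Per-site output in this six-column layout is easy to summarize. The sketch below parses whitespace-separated lines and computes a coverage-weighted mean methylation level; the helper names are hypothetical, and the second and third example lines are invented so that the result is not trivially zero.

```python
def parse_meth_line(line):
    """Parse 'chrom position strand context methratio readcount'."""
    chrom, pos, strand, context, level, coverage = line.split()
    return chrom, int(pos), strand, context, float(level), int(coverage)


def weighted_mean_methylation(sites):
    """Mean methylation weighted by read count; None if no site is covered."""
    total_reads = sum(coverage for *_, coverage in sites)
    if total_reads == 0:
        return None
    return sum(level * coverage for *_, level, coverage in sites) / total_reads


lines = [
    "chrX 152 + CpG 0 0",      # from the slide: an uncovered site
    "chrX 500 + CpG 0.5 10",   # invented example site
    "chrX 600 + CpG 1.0 30",   # invented example site
]
sites = [parse_meth_line(line) for line in lines]
mean_level = weighted_mean_methylation(sites)
print(mean_level)
```

Weighting by read count keeps uncovered sites, such as the zero-coverage rows at the start of chrX, from dragging the average toward zero.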

Page 41: DNA Methylation Data Analysis

Computation of Methylation Level Statistics (Example ONLY)

> bsub -q 16G -o stdout -e stderr "\
levels -o /work3/USERNAME/Output/h1.chrX.levels \
/work3/USERNAME/h1.chrX.meth"

Page 42: DNA Methylation Data Analysis

Estimating Bisulfite Conversion Rate

> bsub -q 16G -o stdout -e stderr "\
bsrate -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/Output/h1.chrX.bsrate \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove"

> head -n 16 /work3/USERNAME/Output/h1.chrX.bsrate
OVERALL CONVERSION RATE = 0.980192
POS CONVERSION RATE = 0.980204 96942555
NEG CONVERSION RATE = 0.980179 96821402
BASE  PTOT     PCONV    PRATE    NTOT     NCONV    NRATE    BTHTOT   BTHCONV  BTHRATE  ERR    ALL      ERRRATE
1     1798190  1762518  0.98016  1796291  1760655  0.98016  3594481  3523173  0.98016  36327  3630808  0.01001
2     1654252  1617801  0.97797  1649805  1613025  0.97771  3304057  3230826  0.97784  41299  3345356  0.01235
3     1646403  1615036  0.98095  1644710  1613525  0.98104  3291113  3228561  0.98099  48231  3339344  0.01444
4     1699787  1666286  0.98029  1695105  1662078  0.98052  3394892  3328364  0.98040  50697  3445589  0.01471
5     1663363  1631006  0.98055  1658397  1626045  0.98049  3321760  3257051  0.98052  52464  3374224  0.01555
6     1720978  1687130  0.98033  1716036  1682351  0.98037  3437014  3369481  0.98035  45366  3482380  0.01303
7     1677561  1644979  0.98058  1677119  1644343  0.98046  3354680  3289322  0.98052  53873  3408553  0.01581
8     1714426  1681206  0.98062  1714378  1681339  0.98073  3428804  3362545  0.98068  34491  3463295  0.00996
9     1702891  1668424  0.97976  1700092  1665742  0.97980  3402983  3334166  0.97978  34861  3437844  0.01014
10    1681522  1648092  0.98012  1680471  1647068  0.98012  3361993  3295160  0.98012  45776  3407769  0.01343
11    1664207  1631036  0.98007  1664386  1631083  0.97999  3328593  3262119  0.98003  46055  3374648  0.01365
12    1651326  1618334  0.98002  1649370  1616514  0.98008  3300696  3234848  0.98005  44139  3344835  0.01320
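The per-position rates can be recomputed from the counts in the same row. The sketch below uses row 1 of the output above and assumes, from the column names and the arithmetic, that each rate is converted/total and that ERRRATE is ERR/ALL with ALL = BTHTOT + ERR; this reading is an inference, not a statement from the bsrate documentation.

```python
HEADER = "BASE PTOT PCONV PRATE NTOT NCONV NRATE BTHTOT BTHCONV BTHRATE ERR ALL ERRRATE"
ROW_1 = ("1 1798190 1762518 0.98016 1796291 1760655 0.98016 "
         "3594481 3523173 0.98016 36327 3630808 0.01001")

# Pair each column name with its value for row 1.
row = dict(zip(HEADER.split(), (float(x) for x in ROW_1.split())))

# Assumed relationships between the columns (inferred, see lead-in).
pos_rate = row["PCONV"] / row["PTOT"]
neg_rate = row["NCONV"] / row["NTOT"]
both_rate = row["BTHCONV"] / row["BTHTOT"]
err_rate = row["ERR"] / row["ALL"]
all_is_sum = row["BTHTOT"] + row["ERR"] == row["ALL"]

print(f"PRATE   recomputed {pos_rate:.5f}  reported {row['PRATE']}")
print(f"NRATE   recomputed {neg_rate:.5f}  reported {row['NRATE']}")
print(f"BTHRATE recomputed {both_rate:.5f}  reported {row['BTHRATE']}")
print(f"ERRRATE recomputed {err_rate:.5f}  reported {row['ERRRATE']}")
```

A conversion rate of about 98% means that roughly 2% of unmethylated cytosines would be miscalled as methylated, which is worth keeping in mind when interpreting low methylation levels.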

Page 43: DNA Methylation Data Analysis

Hypomethylated (hmr) and Hypermethylated (hypermr) Regions

> bsub -q 16G -o stdout -e stderr "\
hmr -o /work3/USERNAME/h1.chrX.hmr /work3/USERNAME/h1.chrX_CpG.meth"

> bsub -q 16G -o stdout -e stderr "\
pmd -o /work3/USERNAME/h1.chrX.pmd /work3/USERNAME/h1.chrX_CpG.meth"

Example output (the fifth column is the number of CpGs in the region):

chrom  start    end      name   # of CpG  strand
chrX   2727656  2728600  HYPO0  18        +
chrX   2731108  2731952  HYPO1  14        +
chrX   2732390  2733303  HYPO2  23        +
chrX   2740632  2740962  HYPO3  9         +
chrX   2756524  2758153  HYPO4  139       +
chrX   2817685  2817980  HYPO5  8         +
chrX   2855757  2857708  HYPO6  127       +
chrX   2890571  2890884  HYPO7  9         +
chrX   3004371  3004626  HYPO8  9         +
chrX   3238227  3238677  HYPO9  9         +
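Region files in this BED-like layout can be post-processed with a few lines of Python. The sketch below parses the first five rows shown on the slide, computes region lengths and keeps regions with at least 10 CpGs; the helper name and the threshold are arbitrary choices for illustration.

```python
HMR_LINES = """\
chrX 2727656 2728600 HYPO0 18 +
chrX 2731108 2731952 HYPO1 14 +
chrX 2732390 2733303 HYPO2 23 +
chrX 2740632 2740962 HYPO3 9 +
chrX 2756524 2758153 HYPO4 139 +
"""


def parse_hmr_line(line):
    """Parse 'chrom start end name n_cpg strand' into a dict."""
    chrom, start, end, name, n_cpg, strand = line.split()
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "name": name,
        "n_cpg": int(n_cpg),
        "strand": strand,
    }


regions = [parse_hmr_line(line) for line in HMR_LINES.strip().splitlines()]
lengths = {r["name"]: r["end"] - r["start"] for r in regions}
cpg_rich = [r["name"] for r in regions if r["n_cpg"] >= 10]  # arbitrary cutoff

print(lengths)
print(cpg_rich)
```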

Page 44: DNA Methylation Data Analysis

Differential Methylation Analysis

> bsub -q 16G -o stdout -e stderr "\
methdiff -o /work3/USERNAME/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX_CpG.meth /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth"

Example output:

chrX  2709681  +  CpG  0.749276  7   2   12  7
chrX  2709727  +  CpG  0.917633  4   1   9   12
chrX  2709774  +  CpG  0.894737  3   1   6   10
chrX  2709871  +  CpG  0.742424  0   16  0   48
chrX  2709890  +  CpG  0.857575  3   20  3   47
chrX  2709982  +  CpG  0.999354  10  2   7   19
chrX  2710014  +  CpG  0.704043  3   6   3   10
chrX  2710023  +  CpG  0.600782  4   3   4   4
chrX  2710146  +  CpG  0.523077  1   2   8   14
chrX  2710155  +  CpG  0.234026  3   3   17  9

Column labels on the slide for the last five columns: Probability, Sample A Un-meth, Sample A Meth, Sample B Un-meth, Sample B Meth

Page 45: DNA Methylation Data Analysis

Differential Methylated Region (DMR)

> bsub -q 16G -o stdout -e stderr "\
dmr /work3/LSLNGSDNAMETH/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX.hmr /work3/LSLNGSDNAMETH/imr90.chrX.hmr \
h1_lt_imr90.chrX.dmr h1_gt_imr90.chrX.dmr"

In both files, the number after "X:" is the number of CpGs in the region and the next column is the number of significantly differentially methylated CpGs.

==> h1_lt_imr90.chrX.dmr <== (methylation level lower in H1 than in IMR90)
chrX  2727656  2728600  X:18  10  +
chrX  2731108  2731952  X:15  4   +
chrX  2732390  2733303  X:37  8   +
chrX  2740632  2740962  X:9   0   +
chrX  2758131  2758153  X:3   0   +
chrX  2817685  2817980  X:9   0   +
chrX  2855757  2855890  X:1   1   +
chrX  2890571  2890884  X:9   4   +
chrX  3004371  3004626  X:9   0   +
chrX  3238227  3238677  X:24  0   +

==> h1_gt_imr90.chrX.dmr <== (methylation level lower in IMR90 than in H1)
chrX  2825454  2826947  X:37  17  +
chrX  2857708  2857760  X:2   0   +
chrX  3272822  3273033  X:13  3   +
chrX  3275527  3275594  X:1   0   +
chrX  3287038  3289160  X:36  9   +
chrX  3643168  3643374  X:7   0   +
chrX  4016033  4022054  X:47  29  +
chrX  4028369  4042000  X:79  54  +
chrX  4051286  4059878  X:52  39  +
chrX  4079778  4087714  X:45  26  +

# keep DMRs with at least 10 CpGs and at least 5 significantly differentially methylated CpGs
> awk -F "[:\t]" '$5 >= 10 && $6 >= 5 {print $0}' h1_lt_imr90.chrX.dmr > h1_lt_imr90.chrX.dmr.filtered
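The same filter can be written in Python, which makes the field numbering explicit. Splitting on both ":" and tab turns "X:18" into two fields, so the CpG count becomes field 5 and the count of significant CpGs becomes field 6, exactly as in the awk command. The function name is hypothetical, and the example lines are the first four rows of h1_lt_imr90.chrX.dmr, assumed to be tab-separated.

```python
import re


def keep_dmr(line, min_cpg=10, min_significant=5):
    """Mirror of: awk -F "[:\\t]" '$5 >= 10 && $6 >= 5'."""
    fields = re.split(r"[:\t]", line.rstrip("\n"))
    n_cpg = int(fields[4])          # awk $5: number of CpGs in the region
    n_significant = int(fields[5])  # awk $6: significantly different CpGs
    return n_cpg >= min_cpg and n_significant >= min_significant


dmr_lines = [
    "chrX\t2727656\t2728600\tX:18\t10\t+",
    "chrX\t2731108\t2731952\tX:15\t4\t+",
    "chrX\t2732390\t2733303\tX:37\t8\t+",
    "chrX\t2740632\t2740962\tX:9\t0\t+",
]

filtered = [line for line in dmr_lines if keep_dmr(line)]
for line in filtered:
    print(line)
```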

Page 46: DNA Methylation Data Analysis

Other Utilities

• DM analysis of two groups of DNA methylomes
  • Robinson, M. D. et al. Statistical methods for detecting differentially methylated loci and regions. Frontiers in Genetics 5, 324, doi:10.3389/fgene.2014.00324 (2014).
• Allele-specific methylation
  • allelicmeth
  • amrfinder: http://smithlabresearch.org/software/amrfinder/
• Estimate hydroxymethylation (5hmC) and methylation (5mC) levels from BS-seq, oxBS-seq and TAB-seq
  • mlml: http://smithlabresearch.org/software/mlml/

Page 47: DNA Methylation Data Analysis

DNA Methylation Data Visualization

Page 48: DNA Methylation Data Analysis

R Packages: methylKit

The following examples were adapted from the methylKit tutorials.

• Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13, R87, doi:10.1186/gb-2012-13-10-r87 (2012).
• Tutorial: http://methylkit.googlecode.com/files/methylKitTutorial_feb2012.pdf
• Tutorial slides: http://methylkit.googlecode.com/files/methylKitTutorialSlides_2013.pdf

Page 49: DNA Methylation Data Analysis

Convert MethPipe meth Format to methylKit Format

Example of the methylKit input format:

Id             chr    base     strand  coverage  freqC  freqT
Chr21.9764539  chr21  9764539  R       12        25.00  75.00
Chr21.9764513  chr21  9764513  R       12        0.00   100.00
Chr21.9820622  chr21  9820622  F       13        0.00   100.00
Chr21.9837545  chr21  9837545  F       11        0.00   100.00
Chr21.9849022  chr21  9849022  F       124       72.58  27.42
Chr21.9853326  chr21  9853326  F       17        70.59  29.41

> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1, $2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/h1.chrX_CpG.meth > /work3/USERNAME/Output/h1.chrX_CpG.meth.for.methylKit

> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1, $2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth > /work3/USERNAME/Output/imr90.chrX_CpG.meth.for.methylKit
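For readers less familiar with awk, the conversion can be restated in Python. The sketch follows the awk one-liner step by step: skip sites with zero coverage, truncate the methylation ratio to an integer percentage, and always write strand "F". The function name and the example lines are hypothetical.

```python
def meth_to_methylkit(line):
    """Convert one tab-separated .meth line to a methylKit line, or None if uncovered."""
    chrom, pos, _strand, _context, level, coverage = line.rstrip("\n").split("\t")
    if int(coverage) <= 0:                # awk: $6 > 0
        return None
    freq_c = int(float(level) * 100)      # awk: $5 = int($5 * 100), truncating
    return "\t".join(
        [f"{chrom}.{pos}", chrom, pos, "F", coverage, str(freq_c), str(100 - freq_c)]
    )


# Invented example lines in the six-column .meth layout.
covered = meth_to_methylkit("chrX\t500\t+\tCpG\t0.75\t8")
uncovered = meth_to_methylkit("chrX\t152\t+\tCpG\t0\t0")
print(covered)
print(uncovered)
```

Note that, like the awk command, this truncates rather than rounds the percentage, so the output carries whole percentages where the methylKit example above shows two decimal places.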

Page 50: DNA Methylation Data Analysis

Read Methylation Files into methylKit Objects

> library(methylKit)

# load methylation files (change to your datasets)
> file.list=list(system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
                 system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
                 system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
                 system.file("extdata", "control2.myCpG.txt", package = "methylKit"))

# read the files to a methylRawList object: myobj
> myobj=read(file.list, sample.id=list("test1", "test2", "ctrl1", "ctrl2"), assembly="hg18", treatment=c(1,1,0,0))

> head(myobj)

Page 51: DNA Methylation Data Analysis

Get Descriptive Stats on Methylation

> png("test1.png",width=600,height=600)
> getMethylationStats(myobj[[1]],plot=T,both.strands=F)
> dev.off()
null device
          1
> png("control1.png",width=600,height=600)
> getMethylationStats(myobj[[3]],plot=T,both.strands=F)
> dev.off()
null device
          1

Page 52: DNA Methylation Data Analysis

Sample Correlation

# meth is the merged object created with meth=unite(myobj)
> png("correlation.png",width=1000,height=1000)
> getCorrelation(meth, plot = T)
          test1     test2     ctrl1     ctrl2
test1 1.0000000 0.9252530 0.8767865 0.8737509
test2 0.9252530 1.0000000 0.8791864 0.8801669
ctrl1 0.8767865 0.8791864 1.0000000 0.9465369
ctrl2 0.8737509 0.8801669 0.9465369 1.0000000
> dev.off()

Page 53: DNA Methylation Data Analysis

Get Bases Covered by All Samples and Cluster Samples

# merge all samples to one table by using base-pair locations that are covered in all samples
> meth=unite(myobj)

# cluster all samples using correlation distance and plot hierarchical clustering
> png("cluster.png", width=600, height=600)
> hc = clusterSamples(meth, dist="correlation", method="ward", plot=T)
> dev.off()
> png("pca.png", width=600,height=600)
> PCASamples(meth)
> dev.off()

Page 54: DNA Methylation Data Analysis

Calculate Differential Methylation

# calculate differential methylation p-values and q-values

> myDiff=calculateDiffMeth(meth)

# get differentially methylated regions with 25% difference and qvalue < 0.01

> myDiff25p=get.methylDiff(myDiff,difference=25,qvalue=0.01)

# get differentially hypo methylated regions with 25% difference and qvalue<0.01

> myDiff25pHypo =get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")

# get differentially hyper methylated regions with 25% difference and qvalue<0.01

> myDiff25pHyper=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")


Page 55: DNA Methylation Data Analysis

Differential Methylation Events per Chromosome

> png("meth_event.png",width=600,height=600)

> diffMethPerChr(myDiff, plot = T, qvalue.cutoff = 0.01,meth.cutoff = 25)

> dev.off()


Page 56: DNA Methylation Data Analysis

Annotate Differentially Methylated Bases/Regions

# read in transcript locations to be used in annotation
> gene.obj=read.transcript.features(system.file("extdata", "refseq.hg18.bed.txt", package="methylKit"))

# annotate differentially methylated Cs with promoter/exon/intron using annotation data
> annotate.WithGenicParts(myDiff25p, gene.obj)

Page 57: DNA Methylation Data Analysis

Annotating Differential Methylation Events around CpG Islands

> cpg.obj=read.feature.flank(system.file("extdata", "cpgi.hg18.bed.txt", package="methylKit"), feature.flank.name=c("CpGi","shores"))

> diffCpGann=annotate.WithFeature.Flank(myDiff25p, cpg.obj$CpGi, cpg.obj$shores, feature.name="CpGi", flank.name="shores")

Page 58: DNA Methylation Data Analysis

https://www.gitbook.com/book/ycl6/methylation-sequencing-analysis/details

Dr. I-Hsuan Lin, NYMU

Page 59: DNA Methylation Data Analysis

Questions?
