QC & Analysis of Methylation Chip Data
Transcript of QC & Analysis of Methylation Chip Data
QC&AnalysisofMethylationChipData
AllanMcRae&SoniaShah
OutlineforSession3Lecture
• EWASanalysis• Inflationintest-statistics• InterpretingEWASresults• Studydesign• Examples:Smoking,age,BMIandheight,ALS
Epigenome-wideassociationstudies
• IdentifieschangesinmethylationlevelsatsingleCpG sitesthatareassociatedwithhumanphenotype/disease
• Similartoanalysing SNPsinGWAS• AssociationanalysisbetweeneachCpG andphenotypeofinterest(~450,000associationanalyses)
• UnlikeSNPs,DNAmethylationmeasurementsconsideredasquantitativemeasure.• Linearorlogisticregression(forbinarydependentvariables)• Interpretationofeffectdependsonwhethermethylationisyourdependentorindependentvariable
𝐶𝑝𝐺𝑚𝑒𝑡ℎ~𝑠𝑚𝑜𝑘𝑖𝑛𝑔 + 𝑐𝑜𝑣𝑎𝑟𝑖𝑎𝑡𝑒𝑠 + 𝑃𝐶𝑠𝑑𝑖𝑠𝑒𝑎𝑠𝑒~𝐶𝑝𝐺𝑚𝑒𝑡ℎ + 𝑐𝑜𝑣𝑎𝑟𝑖𝑎𝑡𝑒𝑠 + 𝑃𝐶𝑠
Visualising resultsmanhattan QQ
Inflationinlambda
Iterson etalGenomeBiology2017
ControllinginflationinEWAS
• Simulation study showing that the genomic inflation factor depends on the number of true associations
• genomic inflation factor commonly overestimates the true level of test-statistic inflation in EWAS and TWAS
ControllinginflationinEWAS
• http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1131-9 PublishedJan2017
• EWASsandTWASsarepronenotonlytosignificantinflationbutalsobiasoftheteststatistics
• NotproperlyaddressedbyGWAS-basedmethodology(i.e.genomiccontrol)orapproachestocontrolforunmeasuredconfounding(e.g.RUV,sva andcate).
• MethodtoestimatetheempiricalnulldistributionusingBayesianstatistics.
• http://bioconductor.org/packages/bacon/.
InterpretationofEWASmuchmorecomplicatedthanGWAS
Studydesignveryimportant
AdvantageofGWAS
• Genotypeisconstantfrombirth• Genotypecomesbeforephenotype• noissueofreversecausationi.e.phenotypedoesnotcausechangesingenotype.
• Geneticvariantsassumedtoberandomlyassignedwithrespecttothecharacteristicsofindividual,thereforeminimised confoundingbias
• Ascertainmentbias• Populationstratification(whichcanbecorrectedfor)
Fraga etal PNAS2005
Differences in global 5mC DNA content in monozygotic twins
Methylationisdynamic
Methylationisdynamic
http://ib.bioninja.com.au/_Media/methylation-factors_med.jpeg
Methylationduringdevelopment
Inuteroenvironmentandmethylation
Inuteroenvironmentandmethylation
• Dutchfaminestudy• TheDutchfaminestartedinNovember1944- May1945.• Rationswereaslowas400-800caloriesaday;lessthanaquarteroftherecommendedadultcaloricintake.
• BabieswhosemotherswentthroughtheDutchfamine• lowerbirthweights• increasedriskofcardiovasculardiseasesandotheradversehealthoutcomesinadulthood
Methylationistissueandcell-specific
Moststudiesdoneinbloodduetoeaseofsamplecollection
Methylationistissueandcell-specific
• Anytissuesuitableiftheepigeneticvariationispresentsoma-wide(e.g.ifinducedduringdevelopmentalreprogramminginearlyembryogenesis).
• Ifchangesthatoccurlaterinlife, alternativetissuesourcesneedtobeexplored
• Tissueheterogeneity- tissuesarecomposedofmultiplecelltypes(e.g.bloodcontains>50distinctcelltypes).
• Diseasestateitselfcanalsoaltercellcompositioninatissue(e.g.inflamedtissuevsnon-inflamedtissue)
Methylationcanbecausalorconsequential
• Methylationchangescanbedrivenbydiseasee.g.alterationsinwhitebloodcellproportionsinautoimmunedisordersoralteredmetabolicregulationintype2diabetes
BirneyetalPLOSGenetics2016
ConfoundinginEWAS
• Methylationmaybeaffectedbymanyconfoundingfactors:• Environmentalexposurese.g.smoking• Batcheffects• Ascertainmentbias• Populationstratification
• CouldadjustforPCsgeneratedfromGWASdataifavailableonthesameEWASsamples
• MethodssuchasSVAandPCAcanadjustforknown/unknownconfounders
Geneticvariantsalsoaffectonmethylation
• McRaeetal.2013GenomeBiology• InvestigatetheroleofgeneticheritabilityinthesimilarityofDNAmethylationbetweengenerations
• Familybasedsampleof614individualsfrom117familiesconsistingoftwinpairs,theirparentsandsiblings
• AfterremovingallprobesoverlappingSNPs(1000GEUR)averagegeneticheritabilitywas0.187
• Approximately20%ofindividualdifferencesinDNAmethylationinthepopulationarecausedbyDNAsequencevariationthatisnotlocatedwithinCpG sites
• SNPsassociatedwithmethylationlevelsoftopheritableprobes(mQTLs)
Methylationisdynamic
Studydesign
Rakyan etalNatRevGen2011
Studydesign
• Investigatingcausaleffectofenvironmentalexposureondiseaseoutcome
• 2-stepdesign• EWASofenvironmentalexposureinhealthyindividualstoidentifychangesinmethylationasaconsequenceofexposure
• Lookatwhethertheabovemethylationchangesareassociatedwithdiseaseinanindependentsample.
• Combinestudydesignse.g. adiscordantmonozygotic-twinstagefollowedbyalongitudinalcohortstage.
Studydesign
• Clearlydefinehypothesis• Understandingmechanismofdisease– mediatingcelltypewithhighpurity• Identifybiomarkerofexposureorpredictive/prognosis– useofanaccessiblecelltype/biologicalsample
• Canthestudydesignanswerthishypothesis• Understandanycellheterogeneityinyoursample• Effectsizeshouldbeevaluatedinthecontextoffunctionalandbiologicalrelevance.E.g.isamethylationdifferenceof1%largeenoughtohaveanimpactondisease?
• Integratedatawithgeneticandtranscriptomicdataonsameindividualstodeterminecausality
Validation
• Technicalvalidationusingdifferenttechnology- singlelocus–specificmethylationtechniquessuchasbisulfite(pyro)sequencing
• rulingouttechnicalerrorssuchascross-hybridising probesorunrecognisedSNPs
• BiologicalvalidationofEWASfindings- replicatingstudyresultsincomparablebutindependentsample
Criteriaforidentificationof'driver'methylationchanges
Michels etalNat.Methods2013
SummaryofEWASpublications
Michels etalNat.Methods2013
Example1:Smoking
• Zeilinger etal• 450Karray• Discoverysample: discovery (current N = 262, never N = 749)• Replication (current N = 236, never N = 232)• 972CpG siteswithdifferentialmethylationlevelsafterBonferronicorrection(p≤1E-07)
• 187CpG sitesreplicated
Zeilinger etal.PLOSONE2013
SmokingEWASTop hit cg05575921
Effect in current smokers vs never smokers• Discovery:–24.40%, p = 2.54E-182, explained variance = 41.02%;
• Replication:–23.29%, p = 1.81E-64, explained variance = 39.69%),
located within the AHRR gene (chr5)
cg05575921methylationlevels
Predictionofsmokingstatus
Longitudinalanalysisofsmoking
TwodistinctclassesofCpG sitesidentified:
• siteswhosemethylationrevertstolevelstypicalofneversmokerswithindecadesaftersmokingcessation
• sitesremainingdifferentiallymethylated,evenmorethan35yearsaftersmokingcessation.
Example2:Age
• HorvathGenomeBiology 2013• Identifyage-associatedCpGs inatrainingsetusingapenalizedregressionmodel(elasticnet)
• Identified353CpGs• Predictedageinindependentsamplesandmultipletissues
HorvathGenomeBiology 2013
Example2:Age
PredictionofageusingHorvathCpGs inaChinesecohort
Methylationagecalculator
• https://labs.genetics.ucla.edu/horvath/dnamage/
Example3:BMIandheight
• ShahetalAmericanJournalofHumanGenetics2014• DiscoveryofBMI-associatedCpGs in2independentsamples(LBCandLifelines)
• GenerategeneticriskscoresfromBMIGWASSNPsanddetermineifgeneticriskscoreandmethylationriskscoresareindependentlyassociatedwithBMI
• Repeatforheight.
Methods
• StudyA(populationcohort)– EWASonBMI->significantprobelist A• StudyB(oldindividuals70+)– EWASonBMI->significantprobelist B• CalculatemethylationBMIriskscoreinstudyAbasedonprobelist B• CalculatemethylationBMIriskscoreinstudyBbasedonprobelist A• ProportionofvarianceinBMIexplainedbymethylationscoreineachstudy• GenerategeneticscoresforBMIineachstudyusingSNPsidentifiedfromthelargestBMIGWAS(GIANTconsortium)
• Lookatproportionofvarianceexplainedbygeneticriskscore• AremethylationandgeneticriskscoresindependentlyassociatedwithBMIandheight
Example3:BMIandheight
Example4:C9orf72repeatexpansion
• hexanucleotide repeatexpansionGGGGCC
• 1st Intronregionofc9orf72• Mostcommonmutationidentified
thatisassociatedwithfamilialFTDand/orALS (5–20%ofpatientswithsporadicALS)
• Lengthofrepeatincasescanoccurintheorderof100sandvaries
• <30repeatsgenerallynotassociatedwithdisease
Determiningcausality
Mendelianrandomisation
G must be associated with intermediate phenotype PA
G mustnotbeassociatedwithconfounders.
Gshouldonlyberelatedtotheoutcome PB viaPA
Doesgenotypeaffectphenotypeviachangesinmethylation?
• InstrumentalvariableanalysisorMendelianrandomisation analysis• Step1:IsthereaSNP(notintheprobe)thatisstronglyassociatedwithmethylationlevels(mQTL)
• Step2:CpGmeth ~SNP• Step3:BMI~predictedCpGmeth