Integrating ChIP-seq and RNA-seq data Rhonda Bacher April 20, 2017

Page 1

Integrating ChIP-seq and RNA-seq data

Rhonda Bacher

April 20, 2017

Page 2

Integration of –omics data

• Each piece of data represents a snapshot of the biological system.

• Integration moves towards understanding the system as a whole.

Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet. 2015;16:85–97.

Page 3

Integration of –omics data is challenging

• Need to understand characteristics of each data type.

• Incorporate biological information.

• Need data.

Page 4

Data repositories

• Data repositories contain the data that authors deposited for their publications, along with data from consortium efforts.

• Surge in research groups processing data to be ‘analysis-ready’:
  • Easily accessible and searchable
  • Reproducibility and consistency

Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2. Nature Biotechnology, 2017. doi:10.1038/nbt.3838.

Page 5

Integration of –omics data

Qin, Jing, et al. "Applications of integrative OMICs approaches to gene regulation studies." Quantitative Biology (2016): 1-19.

Page 6

Integrating ChIP-seq and RNA-seq data

• Transcription factors (TFs) control regulation of expression.
  • Activation and repression.

[Diagram: a gene on DNA producing mRNA, shown with and without a bound TF; an X marks blocked transcription.]

Page 7

Integrating ChIP-seq and RNA-seq data

• Why study regulatory mechanisms of transcription factors on gene expression?

• Diseases such as cancer and diabetes:
  • Mutations in TFs.
  • Possible to develop interventions.

• Regenerative medicine:
  • Key factors controlling gene expression during cell development and differentiation.

Page 8

Recap RNA-seq

[Diagram: a gene on DNA transcribed to mRNA under Condition 1 and Condition 2.]

           Sample 1   …   Sample n
Gene 1        42      …      6
…             …       …      …
Gene m         3      …      5

(Samples are grouped into Condition 1 and Condition 2.)

(Colin Dewey)

(Christina Kendziorski)

Page 9

Recap RNA-seq

• Use statistical methods to identify genes that are differentially expressed (DE):

• Data are counts.
  • Negative Binomial distribution.

• With few replicates, it is difficult to estimate both mean and variance.
  • Borrow information across genes to get more accurate estimates of per-gene variances.

• Multiple testing, since genes are tested one by one. (Michael Newton)
  • Adjust p-values.
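The slide does not say which p-value adjustment is used. As a minimal sketch, assuming the common Benjamini-Hochberg false discovery rate procedure (an assumption, not named on the slide), the adjustment could look like this; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    m = len(pvalues)
    # Indices of genes sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a DE test.
pvals = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(pvals))
```

Genes whose adjusted value falls below the chosen FDR level would then be called DE.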

Page 10

Recap RNA-seq

Soneson, Charlotte, and Mauro Delorenzi. "A comparison of methods for differential expression analysis of RNA-seq data." BMC Bioinformatics 14.1 (2013): 91.

Page 11

Recap ChIP-seq

[Diagram: a TF bound to DNA near a gene; the bound region appears as a peak.]

(Sunduz Keles)

Page 12

Recap ChIP-seq

[Diagram: fragments come from many cells; only the 5'-most end of each fragment is sequenced; reads/tags are aligned, forming a peak near the gene.]

Page 13

Recap ChIP-seq

• Use statistical methods to investigate DNA-protein interactions:

• Identify binding sites.
  • Compare peaks to background signal.
  • Account for sequence biases.
  • Shape of peak.

Page 14

Recap ChIP-seq

Laajala, Teemu D., et al. "A practical comparison of methods for detecting transcription factor binding sites in ChIP-seq experiments." BMC Genomics 10.1 (2009): 618.

Page 15

Attie lab

• Study the genetics of obesity and diabetes.

• Diabetes: poor response to insulin, in combination with failure to make enough insulin, leads to increased levels of glucose in the body.

• Beta cells make insulin and are located in clusters of cells in the pancreas (islets).

Page 16

Attie lab

• NFAT (nuclear factor of activated T cells) is a protein/transcription factor that regulates gene expression in beta cells.

(Karl Broman)

Keller, Mark P., et al. "The Transcription Factor Nfatc2 Regulates β-Cell Proliferation and Genes Associated with Type 2 Diabetes in Mouse and Human Islets." PLoS Genetics 12.12 (2016): e1006466.

Page 17

Attie lab

• Examine effect of NFAT on beta cells:

  • RNA-seq experiment: over-express the TF and identify differentially expressed genes relative to a control set of expression samples.

  • ChIP-seq experiment: identify sites in the genome where the TF binds.

           Sample 1   …   Sample n
Gene 1        42      …      6
…             …       …      …
Gene m         3      …      5

(Samples are grouped into NFAT over-express and Control.)

Page 18

Analysis: ChIP-seq data

• Identify peaks and TF binding sites (TFBS).
• Annotate peaks to the genome.
• Calculate distance of a peak to the nearest transcription start site (TSS).
• Software that allows a cutoff parameter for annotation:
  • ChIPpeakAnno (doesn’t consider strand information).
  • ChIPseeker
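To make the distance step concrete, here is a minimal sketch of strand-aware nearest-TSS annotation with an optional cutoff. It is not the ChIPpeakAnno or ChIPseeker implementation; the function name, the tuple layout, and the toy coordinates are all hypothetical, and a single chromosome is assumed.

```python
def nearest_tss(summit, genes, cutoff=None):
    """Return (gene_name, signed_distance) for the TSS closest to a peak summit.

    genes: list of (name, tss_position, strand) tuples on one chromosome.
    The distance is negative when the summit is upstream of the TSS
    in the gene's own orientation.
    """
    best = None
    for name, tss, strand in genes:
        offset = summit - tss
        # Flip the sign on the minus strand so negative always means upstream.
        signed = offset if strand == "+" else -offset
        if cutoff is not None and abs(signed) > cutoff:
            continue
        if best is None or abs(signed) < abs(best[1]):
            best = (name, signed)
    return best


# Toy annotation: two genes on opposite strands.
genes = [("geneA", 1000, "+"), ("geneB", 5000, "-")]
print(nearest_tss(1200, genes))              # downstream of geneA
print(nearest_tss(5300, genes))              # upstream of geneB (minus strand)
print(nearest_tss(3000, genes, cutoff=500))  # no TSS within the cutoff
```

Ignoring strand would report the second peak as 300 bp downstream of geneB, which is the kind of difference the strand note on the slide points at.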

Page 19

Analysis

• Overlap:

  • List of DE genes from RNA-seq.

  • List of target genes from ChIP-seq.

[Venn diagram: DE genes and TF target genes; the intersection is the set of genes regulated by the TF.]

Page 20

Accuracy of defining target genes

• TFBS are often near genes, in the promoter region or slightly upstream.

• Will vary across species and TFs.

[Figures: positional distribution of TFBS in Arabidopsis and in human.]

Yu, Chun-Ping, Jinn-Jy Lin, and Wen-Hsiung Li. "Positional distribution of transcription factor binding sites in Arabidopsis thaliana." Scientific Reports 6 (2016).
Koudritsky, Mark, and Eytan Domany. "Positional distribution of human transcription factor binding sites." Nucleic Acids Research 36.21 (2008): 6795-6805.

Page 21

Considerations:

• Peaks are often not found in genes or in promoters.

• Distance to the nearest gene is also unreliable.

Hua, Sujun, et al. "Genomic analysis of estrogen cascade reveals histone variant H2A.Z associated with breast cancer progression." Molecular Systems Biology 4.1 (2008): 188.

Page 22

Improve target gene list

• How to rank the possible targets?
• TIP: probabilistic method to annotate peaks and rank gene targets:

  • For each gene g, define: [formula missing from the transcript]

  • Interested in: [formula missing from the transcript]

  • Transform all scores into z-scores, assess significance for each gene.

Page 23

Improve target gene list

• TIP improves upon closest-gene approaches:

  • Binding sites may fall in gene-rich locations.

  • Binding sites may affect multiple genes.

• Learn more by including expression data:
  • Effect of binding on target genes.

Page 24

BETA: TF target prediction.

• Genes with more nearby binding sites and more differential expression are more likely to be called as real targets.

• Calculate each gene’s regulatory potential s_g from its nearby peaks, where i runs over all peaks within 100 kb of the TSS and d_i is the distance between peak i and the TSS (relative to 100 kb).

• R_gb = rank of the gene’s regulatory potential (1 is largest potential)

• R_ge = rank of the gene’s adjusted DE p-value (1 is ‘strongest’ DE)

• RP_g = (R_gb / n) * (R_ge / n)

• Separately for DE UP and DE DOWN genes.

Wang, Su, et al. "Target analysis by integration of transcriptome and ChIP-seq data with BETA." Nature Protocols 8.12 (2013): 2502-2515.
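The regulatory potential formula did not survive in this transcript. The cited BETA paper gives it as s_g = Σ_i exp(-(0.5 + 4 d_i)), and the sketch below assumes that form together with the rank product defined above. The function names and toy numbers are hypothetical, and n is taken here to be the number of genes being ranked, which is an assumption.

```python
import math


def regulatory_potential(peak_distances_bp, window_bp=100_000):
    """Sum exp(-(0.5 + 4 * d_i)) over peaks within the window around the TSS.

    d_i is the peak-to-TSS distance expressed relative to the window size.
    """
    total = 0.0
    for dist in peak_distances_bp:
        d = abs(dist) / window_bp
        if d <= 1.0:  # peaks beyond the window contribute nothing
            total += math.exp(-(0.5 + 4.0 * d))
    return total


def ranks(values, largest_first):
    """Rank values starting at 1; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest_first)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_products(potentials, adjusted_pvalues):
    """RP_g = (R_gb / n) * (R_ge / n); smaller means a more likely target."""
    n = len(potentials)
    r_gb = ranks(potentials, largest_first=True)         # 1 = largest potential
    r_ge = ranks(adjusted_pvalues, largest_first=False)  # 1 = strongest DE
    return [(b / n) * (e / n) for b, e in zip(r_gb, r_ge)]


# Toy example: peak-to-TSS distances (bp) and adjusted DE p-values for 3 genes.
peaks = [[0, 50_000], [100_000], [200_000]]
potentials = [regulatory_potential(p) for p in peaks]
print(rank_products(potentials, [0.001, 0.2, 0.04]))
```

A gene with several peaks close to its TSS and a small adjusted p-value gets a small rank product and so sits near the top of the target list.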

Page 25

Additional analyses

1. Determine if TF effect is activation or repression.

2. Motif analysis.

3. Other statistical analyses.

Page 26

BETA: Determine if TF effect is activation or repression.

• For all genes labelled as DE UP, DE DOWN, or EE, consider:
  • Value of the DE test statistic for each gene
  and
  • Each gene’s regulatory potential s_g, where i runs over all peaks within 100 kb of the TSS and d_i is the distance between peak i and the TSS (relative to 100 kb).

Wang, Su, et al. "Target analysis by integration of transcriptome and ChIP-seq data with BETA." Nature Protocols 8.12 (2013): 2502-2515.

Page 27

BETA: Determine if TF effect is activation or repression.

• Sort genes by their s_g and assign ranks.

• Use a one-tailed K-S test to determine whether the DE UP or DE DOWN genes are significantly different.

Wang, Su, et al. "Target analysis by integration of transcriptome and ChIP-seq data with BETA." Nature Protocols 8.12 (2013): 2502-2515.
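As a rough sketch of the test described above, the following computes a one-sided two-sample Kolmogorov-Smirnov statistic on regulatory-potential ranks. It assumes the comparison is between one DE group and a background group of non-DE genes, and it uses only the leading term of the asymptotic p-value, so it is illustrative rather than a substitute for a statistics library. All names and ranks are hypothetical.

```python
import math


def ks_one_sided(sample, background):
    """One-sided two-sample K-S test.

    Returns (D_plus, approximate p-value), where
    D_plus = max over x of [ECDF_sample(x) - ECDF_background(x)].
    A large D_plus means the sample is shifted toward smaller values,
    i.e. toward better ranks when rank 1 is the largest potential.
    """
    n, m = len(sample), len(background)
    s_sorted, b_sorted = sorted(sample), sorted(background)
    d_plus = 0.0
    i = j = 0
    for x in sorted(set(sample) | set(background)):
        # Advance each pointer past every observation <= x.
        while i < n and s_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d_plus = max(d_plus, i / n - j / m)
    # Leading-term asymptotic approximation for the one-sided p-value.
    p_approx = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_approx


# Toy ranks: DE UP genes mostly sit near the top of the s_g ranking.
up_ranks = [1, 2, 3, 5]
background_ranks = [4, 6, 7, 8, 9, 10]
print(ks_one_sided(up_ranks, background_ranks))
```

If the DE UP group gives a small p-value and the DE DOWN group does not, that pattern would point toward the TF acting as an activator, and the reverse toward a repressor.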

Page 28

BETA: Motif Analysis

• Uses the MOODS algorithm to find motifs near target genes.

• Calculate the number of motifs near the binding site, in the summit and adjacent sites, and look for enrichment.

• Perform separately for DE UP and DE DOWN genes.
  • Identify differential motifs.

Wang, Su, et al. "Target analysis by integration of transcriptome and ChIP-seq data with BETA." Nature Protocols 8.12 (2013): 2502-2515.

Page 29

Statistical analysis

• Do DE genes significantly overlap with TF target genes?

  • Might restrict to specific gene set of interest.
  • Perform GSEA on overlapping genes.

            TF target   Not TF target
DE
Not DE

(Michael Newton)
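The slide does not name the test for the 2x2 table. A common choice is a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution; the sketch below assumes that choice, and the function name and counts are hypothetical.

```python
from math import comb


def overlap_pvalue(n_genes, n_de, n_target, n_both):
    """P(overlap >= n_both) when n_target genes are drawn at random from n_genes.

    n_genes:  all genes considered
    n_de:     number of DE genes
    n_target: number of TF target genes
    n_both:   genes that are both DE and TF targets
    """
    upper = min(n_de, n_target)
    total = comb(n_genes, n_target)
    # Count the ways to pick k DE genes and (n_target - k) non-DE genes.
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, n_target - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes, 5 DE, 5 targets, all 5 targets are DE.
print(overlap_pvalue(20, 5, 5, 5))
```

Restricting to a gene set of interest, as the slide suggests, simply changes which genes are counted in each of the four inputs.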

Page 30

Statistical analysis

• Positional questions:

  • Are peaks in promoter regions? (Introns, exons, intergenic, etc.)

  • Permutation test: sample a set of regions randomly* many times and count X for each random set. Calculate the empirical p-value.

                      Peak   Not peak (?)
Promoter region         X
Not promoter region
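The permutation test can be sketched as follows. This is a simplified illustration: it places random positions uniformly on a single linear genome, which ignores chromosomes, mappability, and peak widths, and that simplification is an assumption of the sketch. The function names, intervals, and coordinates are hypothetical.

```python
import random


def count_in_promoters(positions, promoters):
    """X = number of positions that fall inside any promoter interval."""
    return sum(
        any(start <= pos <= end for start, end in promoters)
        for pos in positions
    )


def permutation_pvalue(peaks, promoters, genome_length, n_perm=1000, seed=0):
    """Empirical p-value for seeing at least the observed X by chance."""
    rng = random.Random(seed)
    observed = count_in_promoters(peaks, promoters)
    hits = 0
    for _ in range(n_perm):
        # Draw as many random positions as there are peaks.
        random_set = [rng.randrange(genome_length) for _ in peaks]
        if count_in_promoters(random_set, promoters) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy genome of 100 kb with two promoter intervals and five peak summits.
promoters = [(1000, 2000), (50000, 51000)]
peaks = [1500, 1800, 50500, 50900, 70000]
print(permutation_pvalue(peaks, promoters, genome_length=100_000))
```

The smallest p-value this can report is 1 / (n_perm + 1), so the number of random sets limits the resolution of the test.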

Page 31

GREAT: Genomic Regions Enrichment of Annotations Tool

• Enrichment tests for binding sites using typical methods can be biased.
  • Non-coding elements do not necessarily associate with the nearest gene.

• GREAT:
  • Functionally annotates non-coding regions based on nearby genes.
  • Accounts for the total fraction of the genome actually annotated for any given ontology term.
  • Counts how many input genomic regions (peaks) fall into those areas.
  • Binomial test over regions.

McLean, Cory Y., et al. "GREAT improves functional interpretation of cis-regulatory regions." Nature Biotechnology 28.5 (2010): 495-501.
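The binomial test in the last bullet can be illustrated with a short sketch. It only shows the tail probability once the annotated genome fraction and the peak counts are known; it is not the GREAT software, and the function name and numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(n_regions, n_hits, fraction):
    """P(K >= n_hits) for K ~ Binomial(n_regions, fraction).

    n_regions: number of input genomic regions (peaks)
    n_hits:    regions falling in the area annotated with the ontology term
    fraction:  share of the genome annotated with that term
    Intended for modest n_regions; very large inputs need a numeric library.
    """
    return sum(
        comb(n_regions, k) * fraction ** k * (1 - fraction) ** (n_regions - k)
        for k in range(n_hits, n_regions + 1)
    )


# Hypothetical case: 40 peaks, 8 land in a term covering 5% of the genome.
print(binomial_upper_tail(40, 8, 0.05))
```

Because the null probability is the annotated fraction of the genome rather than a fraction of genes, terms whose genes have large regulatory domains are not favoured automatically.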

Page 32

Bigger picture biological questions

• Can TF binding predict gene expression?
  • Huge repositories of ChIP-seq for many TFs.

• For any RNA-seq experiment: let Y_g be the log expression of gene g, and X_{g,j} some measure of each TF j relative to gene g.

• Many extensions to this.
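The model relating Y_g to X_{g,j} did not survive in this transcript. One plausible reading, stated here as an assumption rather than as the slide's formula, is a linear regression of log expression on the TF measures, with hypothetical coefficients \beta_j and error term \varepsilon_g:

```latex
Y_g = \beta_0 + \sum_{j} \beta_j X_{g,j} + \varepsilon_g
```

Under this reading, a nonzero \beta_j would suggest that the binding measure for TF j carries information about expression.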

Page 33

Bigger picture biological questions

• Regulatory gene networks.

-> See Sushmita Roy’s slides!

Page 34

Additional data integration:

• eQTL: how sequence variants affect expression.

• ATAC-seq: only regions of DNA that are open can be actively transcribed.

Page 35

ATAC-seq

http://www.abcam.com/epigenetics/epigenetics-application-spotlight-atac-seq

Page 36

ATAC-seq

Ackermann, Amanda M., et al. "Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes." Molecular Metabolism 5.3 (2016): 233-244.

Page 37

Summary

• Integrating data types is useful for very specific questions (one particular TF) and for broader problems (gene networks).

• Understanding characteristics of each data type is crucial.
  • Biological aspects
  • Excellent statistical methods

Richardson, Sylvia, George C. Tseng, and Wei Sun. "Statistical methods in integrative genomics." Annual Review of Statistics and Its Application 3 (2016): 181-209.