Lihua Julie Zhu - Bioconductor - Home · Lihua Julie Zhu Integrated Analysis Of ChIP-seq/chip using...
April 2008
Lihua Julie Zhu
Integrated Analysis Of ChIP-seq/chip using ChIPpeakAnno and GeneNetworkBuilder
Bioconductor Developer Meeting, Zurich, Switzerland, December 13th-14th, 2012
OUTLINE
• Introduction to the ChIP-seq and ChIP-chip analysis workflow
• ChIPpeakAnno
• GeneNetworkBuilder
• Analysis of DAF-12 ChIP-chip and Expression Dataset
HIGH-THROUGHPUT IDENTIFICATION OF DNA BINDING SITES
• ChIP-seq – ChIP followed by high-throughput sequencing
• ChIP-chip – ChIP followed by genome tiling array analysis
ANALYSIS WORKFLOW
CHIPPEAKANNO
• Batch annotate enriched peaks
– ChIP-seq
– ChIP-chip
– PAS-seq (Poly(A) Site Sequencing)
– Cap Analysis of Gene Expression (CAGE)
– Any experiments resulting in a large number of enriched genomic regions
CHIPPEAKANNO
• Find the nearest gene for each peak in a set of peaks and graph the distribution of peaks around features
• Find all genes within a certain distance of the peaks
• Identify enriched Gene Ontology (GO) terms and pathways associated with genes adjacent to the peaks
• Label peaks with any annotation of interest
– a dataset from the literature
– CpG island
– conserved element
– histone modification marks
• Determine the significance of overlap and draw Venn diagrams to visualize the extent of the overlap
– binding sites among replicates
– binding sites among transcription factors within a complex
– binding sites among different experiments, such as yours and those in the literature
• Retrieve genomic sequences flanking putative binding sites for motif discovery, cloning or PCR amplification
• Find the peaks with bi-directional promoters, with summary statistics
• Summarize motif occurrence in peaks
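As an illustration of the first item in the list above (nearest-gene annotation), here is a minimal Python sketch. It is not ChIPpeakAnno's implementation or API: the `Gene` record, the `nearest_tss` function and the gene names are hypothetical, and the sketch assumes the distance is measured from the peak midpoint to the TSS, with negative values meaning upstream of the gene.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (not a ChIPpeakAnno type)."""
    name: str
    chrom: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


def nearest_tss(peaks, genes):
    """For each (chrom, start, end) peak, return (gene name, signed distance).

    Assumption: distance is peak midpoint minus TSS, sign-flipped on the
    minus strand so that negative always means upstream of the gene.
    Peaks on a chromosome without annotated genes yield (None, None).
    """
    # Group genes by chromosome and sort them by TSS for binary search.
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    positions = {}
    for chrom, chrom_genes in by_chrom.items():
        chrom_genes.sort(key=lambda g: g.tss)
        positions[chrom] = [g.tss for g in chrom_genes]

    results = []
    for chrom, start, end in peaks:
        candidates = by_chrom.get(chrom)
        if not candidates:
            results.append((None, None))
            continue
        mid = (start + end) // 2
        i = bisect_left(positions[chrom], mid)
        # Only the genes immediately left and right of the midpoint can be nearest.
        neighbours = candidates[max(i - 1, 0): i + 1]
        best = min(neighbours, key=lambda g: abs(mid - g.tss))
        distance = mid - best.tss
        if best.strand == "-":
            distance = -distance
        results.append((best.name, distance))
    return results


# Toy annotation; names and coordinates are made up for illustration.
genes = [
    Gene("geneA", "chrI", 1000, "+"),
    Gene("geneB", "chrI", 5000, "-"),
    Gene("geneC", "chrII", 200, "+"),
]
peaks = [("chrI", 900, 950), ("chrI", 4800, 4900), ("chrIII", 1, 10)]
annotated = nearest_tss(peaks, genes)
```

Binning the signed distances returned here is one way to obtain a distribution of peaks around the TSS.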
GENENETWORKBUILDER
DAF-12 EXAMPLE DATASET
• ChIP-chip peaks were downloaded from GEO at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28350 (Hochbaum, Zhang et al. 2011, PLoS Genet 7(7): e1002179)
• Expression microarray results were obtained from Fisher and Lithgow 2006, Aging Cell 5(2): 127-138.
OVERLAP ANALYSIS AND DISTRIBUTION OF PEAKS AROUND TSS
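[Figure: Venn diagram of peak overlap among Replicate1, Replicate2 and Replicate3. Region counts, in extraction order: 156, 932, 12, 323, 4, 148, 1, 1424]
[Figure: histogram of peak positions relative to the TSS. x-axis: Distance To Nearest TSS (−10000 to 10000); y-axis: Frequency]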
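To make the overlap analysis concrete, the sketch below counts peaks shared between two replicates and scores the overlap with a hypergeometric test, which is one common choice for this kind of question. It is an illustration under stated assumptions, not the package's code: the function names are hypothetical, peaks are plain `(chrom, start, end)` tuples, and `total` (the number of potential binding sites in the genome) is a value the analyst has to supply.

```python
from math import comb


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap any peak in peaks_b."""
    return sum(any(overlaps(a, b) for b in peaks_b) for a in peaks_a)


def hypergeom_pvalue(k, n_a, n_b, total):
    """P(X >= k): chance of seeing at least k shared sites at random.

    Assumption: n_a and n_b sites are drawn independently from `total`
    equally likely candidate sites.
    """
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_b, i) * comb(total - n_b, n_a - i) for i in range(k, upper + 1)
    )
    return favourable / comb(total, n_a)


# Toy peak sets; coordinates are made up for illustration.
rep1 = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 100, 200)]
rep2 = [("chrI", 150, 250), ("chrII", 300, 400)]
shared = count_overlapping(rep1, rep2)
p_value = hypergeom_pvalue(shared, len(rep1), len(rep2), total=1000)
```

The p-value depends strongly on `total`, so the assumed number of candidate sites should be reported alongside the result.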
DISTRIBUTION OF DAF-12-BINDING SITES
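[Figure: distribution of binding sites by position relative to the nearest feature. Categories: downstream, includeFeature, inside, overlapEnd, overlapStart, upstream]
[Figure: bar chart of % binding sites by chromosome region. Categories: Exon, Intron, 5' UTR, 3' UTR, Proximal Promoter, Immediate Downstream, Enhancer; y-axis: % binding sites (0 to 70)]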
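The position categories named in the figure can be illustrated with a small classifier. This is a sketch under assumed definitions, not the package's code: `classify_peak` is a hypothetical helper, and the meaning given to each label (for example, `includeFeature` for a peak that spans the whole feature) is an assumption inferred from the label names.

```python
def classify_peak(peak_start, peak_end, feat_start, feat_end, strand="+"):
    """Label a peak's position relative to a feature on the same chromosome.

    Assumed definitions (inferred from the label names):
      upstream / downstream : no overlap, before / after the feature
      includeFeature        : peak spans the entire feature
      inside                : peak lies entirely within the feature
      overlapStart / End    : peak crosses the feature's start / end only
    """
    if strand == "-":
        # Mirror the coordinates so the feature always reads 5' to 3'.
        peak_start, peak_end = -peak_end, -peak_start
        feat_start, feat_end = -feat_end, -feat_start
    if peak_end < feat_start:
        return "upstream"
    if peak_start > feat_end:
        return "downstream"
    if peak_start <= feat_start and peak_end >= feat_end:
        return "includeFeature"
    if peak_start >= feat_start and peak_end <= feat_end:
        return "inside"
    if peak_start < feat_start:
        return "overlapStart"
    return "overlapEnd"


# Toy feature spanning 1000-2000; peak coordinates are made up.
labels = [
    classify_peak(100, 200, 1000, 2000),
    classify_peak(900, 1100, 1000, 2000),
    classify_peak(1200, 1300, 1000, 2000),
    classify_peak(1900, 2100, 1000, 2000),
    classify_peak(900, 2100, 1000, 2000),
    classify_peak(2500, 2600, 1000, 2000),
]
```

Counting the labels over all peaks gives the kind of per-category breakdown shown in the figure.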
DAF-12 REGULATORY NETWORK
[Figure: DAF-12 regulatory network. Node labels, in extraction order:]
C02F5.5, mir-46, C33D9.9, C01B10.4, mir-52, mir-53, C17A2.4, mir-228, F01D5.1, srd-16, mbf-1, mir-74, mir-229, Y37D8A.5, lsy-2, mir-75, C50F4.9, mir-73, hpd-1, F32A5.4, C08E8.3, K10C2.3, Y69E1A.5, mir-76, htas-1, Y59E9AL.3, dod-24, R10H10.3, sss-1, mir-240, K09H11.7, pos-1, mir-43, F55G11.7, T01D3.6, cdh-5, T05E12.6, mir-788, F57G8.7, C33F10.1, F01D5.3, K03H1.12, scl-27, mir-2, mir-242, bre-1, mir-63, nhr-86, hpo-28, Y106G6A.4, F35H8.4, msp-51, clec-66, mir-72, F44A2.5, ZK1248.17, msp-76, mir-239.1, mir-87, B0507.10, T11B7.5, pho-1, clec-52, fli-1, T05C3.6, nspd-10, Y54G11A.14, F21C10.11, Y49G5A.1, F36D1.4, K10D3.6, C27F2.7, nspd-1, ceh-45, meg-2, col-172, zip-2, tbc-7, mir-230, wago-9, col-19, mir-1, clec-4, grd-5, gst-11, dyc-1, nhr-20, T22B7.7, F35E12.5, mir-243, mir-44, mig-5, mir-80, nhr-42, hlh-30, mir-40, C24B9.3, ilys-2, mir-58, Y44A6D.1, Y75B8A.23, gst-27, cyp-14A5, col-20, C53A5.2, mir-34, far-3, msp-38, Y67A6A.1, W10C8.5, nhr-10, atf-6, W01B6.4, ces-2, mir-237, lin-4, nhr-1, daf-12, grd-10, C12D5.4, F09F7.4, mir-84, sre-13, lnp-1, msp-56, mir-261, pho-12, T01B11.2, C36C9.1, mir-42, msp-142, let-7, mir-795, mir-37, ugt-6, F11G11.4, ssq-2, msp-63, C27D6.3, mir-51, clec-76, msp-50, fut-1, mir-39, Y40C5A.1, mir-355, pdi-2, mir-38, daf-3, mir-41, lin-13, mir-241, mir-35, mir-36, scd-1, C32H11.4, K12H4.7
SUMMARY
• Analysis of the DAF-12 example dataset shows that enriched GO terms and interaction pathways are consistent with the known functions of DAF-12.
• Network analysis, using GeneNetworkBuilder with ChIP data and expression data, generated a system-level view of the intertwined connections among the direct and indirect targets of DAF-12, which shows that DAF-12 is a master regulator.
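The idea of combining ChIP data with expression data into a network of direct and indirect targets can be sketched as follows. This is a simplified illustration and not GeneNetworkBuilder's algorithm or API: `build_network`, its arguments and the gene names are hypothetical, and the sketch assumes that a direct target is a gene that is both bound and differentially expressed, while an indirect target is reached through known regulator-to-target edges among differentially expressed genes.

```python
from collections import deque


def build_network(root, regulatory_edges, bound_genes, changed_genes):
    """Return the set of (regulator, target) edges reachable from `root`.

    Assumptions (not taken from the package):
      - bound_genes: genes with a ChIP peak for the root factor
      - changed_genes: genes with altered expression
      - regulatory_edges: dict mapping a regulator to its known targets
    """
    # Direct targets: bound by the root factor and changed in expression.
    edges = {(root, target) for target in bound_genes & changed_genes}
    queue = deque(target for _, target in edges)
    seen = {root} | set(queue)
    # Breadth-first walk: follow known regulatory edges to changed genes only.
    while queue:
        regulator = queue.popleft()
        for target in regulatory_edges.get(regulator, ()):
            if target in changed_genes:
                edges.add((regulator, target))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return edges


# Toy inputs; gene names are placeholders.
network = build_network(
    root="TF",
    regulatory_edges={"a": ["d", "x"], "d": ["e"], "c": ["e"]},
    bound_genes={"a", "b", "c"},
    changed_genes={"a", "b", "d", "e"},
)
```

In this toy example `c` is bound but unchanged, so neither `c` nor its edge to `e` enters the network, while `e` is still reached indirectly through `a` and `d`.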
ACKNOWLEDGEMENT
• Mark Robinson for the invitation and hospitality
• The Bioconductor package reviewers
– Nishant Gopalak
– Marc Carlson
– Paul Shannon
• Coauthors – Jianhong Ou (Developer of GeneNetworkBuilder), Claude Gazin, Nathan Lawson, Hervé Pagès, Simon Lin, David Lapointe, Michael Green
• The users of ChIPpeakAnno
• The Bioconductor core team, especially
• Patrick Aboyoun
• Vincent Carey
• Martin Morgan
• Ivan Gregoretti
• Amy Molesworth
• Khademul Islam
• Hua Li
• Noah Dowell
• Yin Wu
• Zhiping Weng, Sara Evans, Alan Ritacco, Glenn Maston, Ping Wan, Ellen Kittler