Practical Guide to the (mod)ENCODE project
![Page 1: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/1.jpg)
Practical Guide to the (mod)ENCODE project
February 27 2013
![Page 2: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/2.jpg)
Fundamental Goals
• Improve comprehensiveness and accuracy of gene annotation
• Define novel protein coding and noncoding gene products, including variants
• Define noncoding regulatory elements, including both sequence and epigenetic features
• Begin to measure the extent of tissue-specific deployment of functional elements
![Page 3: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/3.jpg)
Rationale for the Consortium
• Synergistic expertise of large groups
• Coordinated sample and data collection procedures
• Systematic data analysis
• Rapid release of the data to the public
• Common data repository
![Page 4: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/4.jpg)
History and Relationship of ENCODE Projects
U.S. National Human Genome Research Institute
• pilot human ENCODE (1% of genome), 2003-2007
• modENCODE (100% of genome; C. elegans and Drosophila), 2007-2012
• human ENCODE scale-up (100% of genome), 2007-20??
modENCODE groups:
• Henikoff (histone replacement)
• Waterston/Celniker (transcribed elements)
• Piano/Lai (3’ UTR elements)
• Snyder/White (TF binding sites)
• Lieb/Karpen (chromatin function)
![Page 5: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/5.jpg)
Model organism advantages…
• Compact, well-annotated “simpler” genome
• Functional elements can be identified in vivo
• Experimental advantages for both generating and interpreting genomic data
…and disadvantages
• Not human
• Most studies performed in whole animals
![Page 6: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/6.jpg)
modENCODE
Publications of the “half-way point” in Science, Dec 2010: 237 C. elegans datasets and >700 Drosophila datasets
Verified data available at http://www.modencode.org
![Page 7: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/7.jpg)
Defining the transcriptome
Extract total RNA, mRNA, and small RNAs from samples taken at distinct developmental stages and conditions:
• early embryo
• late embryo
• L1
• L2
• L3
• L4
• adult hermaphrodite
• L4 male
• dauer
![Page 8: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/8.jpg)
C. elegans transcriptome features and alternative splicing
M B Gerstein et al. Science 2010;330:1775-1787
stage-specific isoforms
fractional differences in isoform composition for 12,875 genes in pair-wise comparison across seven developmental stages
stage-specific pseudogene expression
increase in splice junction confirmation
![Page 9: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/9.jpg)
Drosophila coding and noncoding genes and structures
Roy et al. Science 2010;330:1787-1797
combine RNA-seq data with conserved structures
novel miRNA found in protein coding exon
male-specific expression
![Page 10: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/10.jpg)
Tagging (worm) vs endogenous (fly) TF-ChIP
Tagging (worm):
• Create GFP-tagged transcription factor fosmids by recombineering
• Generate transgenic lines by microparticle bombardment
• Characterize expression and culture large-scale preps
Endogenous (fly):
• Generate antibodies to proteins of interest
• Characterize sensitivity and specificity
• Culture large-scale preps
Then:
• Perform ChIP-seq, define binding sites, and analyze data
![Page 11: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/11.jpg)
C. elegans Highly Occupied Target (HOT) Regions
M B Gerstein et al. Science 2010;330:1775-1787
22 TFs -> 304 HOT regions with 15+ TFs
tend to be at the promoters of broadly expressed genes
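The idea behind a HOT region, many distinct TFs occupying the same stretch of genome, can be illustrated with a simple counting sketch. This is a minimal illustration and not the procedure used in Gerstein et al.: the `(tf, chrom, summit)` input format, the fixed 400 bp bin size, and the function name `hot_regions` are all assumptions made for this example, and the data are toy values.

```python
from collections import defaultdict

def hot_regions(peaks, min_tfs=15, bin_size=400):
    """Return fixed-size bins occupied by at least `min_tfs` distinct TFs.

    peaks: iterable of (tf_name, chrom, summit_position) tuples.
    This input format and the binning scheme are hypothetical.
    """
    tfs_per_bin = defaultdict(set)
    for tf, chrom, summit in peaks:
        # Assign each peak summit to a bin and remember which TF it came from.
        tfs_per_bin[(chrom, summit // bin_size)].add(tf)

    hot = []
    for (chrom, idx), tfs in tfs_per_bin.items():
        if len(tfs) >= min_tfs:
            hot.append((chrom, idx * bin_size, (idx + 1) * bin_size, len(tfs)))
    return sorted(hot)

# Toy data: 16 factors pile up on chrI, only 2 on chrII.
toy_peaks = [(f"TF{i}", "chrI", 1000 + i) for i in range(16)]
toy_peaks += [("TF0", "chrII", 5000), ("TF1", "chrII", 5010)]

for chrom, start, end, n_tfs in hot_regions(toy_peaks):
    print(chrom, start, end, n_tfs)
```

Counting distinct factor names (a set per bin) rather than raw peaks keeps a single factor with several nearby peaks from inflating the occupancy.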
![Page 12: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/12.jpg)
Discovery and characterization of chromatin states and their functional enrichments in Drosophila
Roy et al. Science 2010;330:1787-1797
30 discrete -> 9 continuous chromatin states
![Page 13: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/13.jpg)
Statistical models predicting TF-binding and gene expression from chromatin features in C. elegans
M B Gerstein et al. Science 2010;330:1775-1787
color represents accuracy of statistical model in which a chromatin feature(s) acts as a predictor for TF binding/HOT regions
an example
Spearman correlation coefficient of each chromatin feature with expression levels
Chromatin based predictions for expression of both coding genes (top) and miRNAs (bottom)
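For readers unfamiliar with the Spearman coefficient mentioned above: it is the Pearson correlation computed on ranks, so it captures any monotonic relationship between a chromatin signal and expression. The sketch below is a plain-Python illustration on made-up numbers; the variable names and values are assumptions for the example, not data from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-gene values: one chromatin signal and expression level.
chromatin_signal = [0.1, 0.4, 0.9, 2.5, 3.0]
expression = [1, 5, 20, 400, 9000]
print(spearman(chromatin_signal, expression))
```

Because only ranks matter, the strongly non-linear expression values in the toy data do not weaken the correlation, which is one reason a rank-based measure suits expression data.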
![Page 14: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/14.jpg)
Predictive models of regulator, region, and gene activity in Drosophila
Roy et al. Science 2010;330:1787-1797
DREM: Dynamic Regulatory Events Miner
predicting target gene expression from regulator expression
predicting cell type specific regulators of chromatin activity
![Page 15: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/15.jpg)
Human (and mouse) ENCODE
PLoS Biol 9:e1001046, 2011
![Page 16: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/16.jpg)
ENCODE methods and organization
PLoS Biol 9:e1001046, 2011
![Page 17: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/17.jpg)
Selected cell lines
PLoS Biol 9:e1001046, 2011
![Page 18: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/18.jpg)
Standardized data collection and processing
• cell growth conditions
• antibody characterization
• requirements for controls
• requirements for replicates
• assessment of reproducibility
• data submission formats
![Page 19: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/19.jpg)
Caveats
• assays on unsynchronized cell populations
• several of the cell lines are karyotypically unstable
• some Tier 3 lines could be of heterogeneous composition
• mappability in the human genome is variable and repetitive sequences (~15% of the genome) are currently not included
• variable confidence regarding assigned function for the different types of elements
• data types lacking focal enrichment (spread over broad regions) could have variation across the enriched domain
![Page 20: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/20.jpg)
Programs utilized for data analysis
PLoS Biol 9:e1001046, 2011
![Page 21: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/21.jpg)
Location of data sources
PLoS Biol 9:e1001046, 2011
![Page 22: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/22.jpg)
Exploring the ENCODE analysis
http://www.nature.com/encode/#/threads
![Page 23: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/23.jpg)
Companion Papers
In the same issue of Nature (6 September 2012):
• Landscape of transcription in human cells. Djebali, S., Davis, C.A. et al.
• The accessible chromatin landscape of the human genome. Thurman, R.E., Rynes, E., Humbert, R. et al.
• An expansive human regulatory lexicon encoded in transcription factor footprints. Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P. et al.
• Architecture of the human regulatory network derived from ENCODE data. Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.K. et al.
• The long-range interaction landscape of gene promoters. Sanyal, A., Lajoie, B.R. et al.
In Genome Biology (6 September 2012):
• Analysis of variation at transcription factor binding sites in Drosophila and humans. Spivakov, M. et al.
• Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Frietze, S. et al.
• Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription related factors. Yip, K.Y. et al.
• Functional analysis of transcription factor binding sites in human promoters. Whitfield, T.W. et al.
• Modeling gene expression using chromatin features in various cellular contexts. Dong, X. et al.
• The GENCODE pseudogene resource. Pei, B. et al.
![Page 24: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/24.jpg)
Companion Papers
In Genome Research (6 September 2012):
• Annotation of functional variation in personal genomes using RegulomeDB. Boyle, A.P. et al.
• ChIP-seq guidelines and practices used by the ENCODE and modENCODE consortia. Landt, S.G. et al.
• Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Tilgner, H. et al.
• Discovery of hundreds of mirtrons in mouse and human small RNA data. Ladewig, E. et al.
• GENCODE: The reference human genome annotation for the ENCODE project. Harrow, J. et al.
• Linking disease associations with regulatory information in the human genome. Schaub, M.A. et al.
• Long noncoding RNAs are rarely translated in two human cell lines. Bánfai, B. et al.
• Sequence and chromatin determinants of cell-type–specific transcription factor binding. Arvey, A. et al.
• Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Wang, J. et al.
• Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome. Howald, C. et al.
• Personal and population genomics of human regulatory variation. Vernot, B. et al.
• Predicting cell-type–specific gene expression from regions of open chromatin. Natarajan, A. et al.
• RNA editing in the human ENCODE RNA-seq data. Park, E. et al.
![Page 25: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/25.jpg)
GENCODE
• GENCODE is a manual/automated curation of genes
• annotation is verified by RT-PCR and RACE experiments
• v7: 20,687 protein-coding genes with, on average, 6.3 alternatively spliced transcripts (3.9 different protein-coding transcripts) per locus
Harrow et al., 2012
Frankish et al., Genome Research 2012
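A per-locus transcript average like the one quoted for v7 can in principle be recomputed from an annotation file. The sketch below assumes a GTF-style file in which `transcript` rows carry `gene_id "..."` and `transcript_id "..."` attributes; the toy rows, the helper names, and the omission of any protein-coding filter are simplifications for illustration, so it would not reproduce the published figures as written.

```python
import re
from collections import defaultdict

def transcripts_per_gene(gtf_lines):
    """Map gene_id -> set of transcript_ids, using 'transcript' feature rows."""
    by_gene = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        tx = re.search(r'transcript_id "([^"]+)"', fields[8])
        if gene and tx:
            by_gene[gene.group(1)].add(tx.group(1))
    return by_gene

def mean_transcripts(by_gene):
    """Average number of distinct transcripts per gene."""
    return sum(len(txs) for txs in by_gene.values()) / len(by_gene)

def toy_row(feature, attributes):
    """Build one tab-separated GTF row for the toy example (hypothetical data)."""
    return "\t".join(["chr1", "TOY", feature, "1", "900", ".", "+", ".", attributes])

toy_gtf = [
    "##description: toy annotation, not real GENCODE data",
    toy_row("gene", 'gene_id "G1";'),
    toy_row("transcript", 'gene_id "G1"; transcript_id "G1.1";'),
    toy_row("transcript", 'gene_id "G1"; transcript_id "G1.2";'),
    toy_row("transcript", 'gene_id "G1"; transcript_id "G1.3";'),
    toy_row("gene", 'gene_id "G2";'),
    toy_row("transcript", 'gene_id "G2"; transcript_id "G2.1";'),
]

counts = transcripts_per_gene(toy_gtf)
print(mean_transcripts(counts))
```

To use it on a real file, the same function can be fed an open file handle instead of the toy list, since it only iterates over lines.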
![Page 26: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/26.jpg)
TF mapping by ChIP-seq
across 72 cell lines
data is organized in “Factorbook” www.factorbook.org
Encode Project Consortium, Nature 489: 57-74, 2012
![Page 27: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/27.jpg)
Chromatin accessibility mapping
• 2.89 million unique, non-overlapping DNase I hypersensitive sites (DHSs) by DNase-seq in 125 cell types
• 4.8 million sites across 25 cell types that displayed reduced nucleosomal crosslinking by FAIRE, many of which coincide with DHSs
• DNA methylation by RRBS [average of 1.2 million CpGs in each of 82 cell lines and tissues (8.6% of non-repetitive genomic CpGs), including CpGs in intergenic regions, proximal promoters and intragenic regions (gene bodies)]
Encode Project Consortium, Nature 489: 57-74, 2012
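A count of "unique, non-overlapping" sites across many cell types implies that overlapping calls from different cell types are collapsed into single intervals. One generic way to do that is an interval merge, sketched below. The slide does not describe the consortium's actual procedure, so the merge rule, the half-open coordinate convention, and the toy cell-type names here are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) half-open intervals.

    Intervals that merely touch (start equal to the previous end) are kept separate.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the previous interval on the same chromosome: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

# Hypothetical DHS calls from two cell types.
cell_type_a = [("chr1", 100, 250), ("chr1", 400, 500)]
cell_type_b = [("chr1", 200, 300), ("chr2", 100, 200)]

unique_sites = merge_intervals(cell_type_a + cell_type_b)
print(len(unique_sites), unique_sites)
```

Sorting first means each interval only needs to be compared with the most recently merged one, so the merge itself is a single pass.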
![Page 28: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/28.jpg)
Histone modification mapping
12 histone modifications and variants in 46 cell types, including a complete matrix of eight modifications across tier 1 and tier 2.
![Page 29: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/29.jpg)
Modelling transcription levels from histone modification and transcription-factor-binding patterns
histone modifications
TFs
Encode Project Consortium, Nature 489: 57-74, 2012
![Page 30: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/30.jpg)
Patterns and asymmetry of chromatin modification at transcription-factor-binding sites
histone modifications show asymmetric patterns across TFBS
Encode Project Consortium, Nature 489: 57-74, 2012
![Page 31: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/31.jpg)
Co-association between transcription factors
Encode Project Consortium, Nature 489: 57-74, 2012
![Page 32: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/32.jpg)
Integration of ENCODE data by genome-wide segmentation
Encode Project Consortium, Nature 489: 57-74, 2012
Label: Description
• CTCF: CTCF-enriched element
• E: Predicted enhancer
• PF: Predicted promoter flanking region
• R: Predicted repressed or low-activity region
• TSS: Predicted promoter region including TSS
• T: Predicted transcribed region
• WE: Predicted weak enhancer or open chromatin cis-regulatory element
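When working with a segmentation track, it can help to keep the seven labels above in a lookup table and tally how many bases each one covers. The sketch below assumes a BED-like text format with the label in the fourth column; that layout, the function name, and the toy coordinates are assumptions for illustration, so check the actual file format before using it on real downloads.

```python
SEGMENT_LABELS = {
    "CTCF": "CTCF-enriched element",
    "E": "Predicted enhancer",
    "PF": "Predicted promoter flanking region",
    "R": "Predicted repressed or low-activity region",
    "TSS": "Predicted promoter region including TSS",
    "T": "Predicted transcribed region",
    "WE": "Predicted weak enhancer or open chromatin cis-regulatory element",
}

def bases_per_label(bed_lines):
    """Sum segment lengths per label from 'chrom start end label' lines."""
    totals = {label: 0 for label in SEGMENT_LABELS}
    for line in bed_lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, start, end, label = line.split()[:4]
        if label not in totals:
            raise ValueError(f"unknown segment label: {label}")
        totals[label] += int(end) - int(start)
    return totals

# Hypothetical segments, not real ENCODE coordinates.
toy_segments = [
    "chr1 0 200 TSS",
    "chr1 200 1200 T",
    "chr1 1200 1500 E",
    "chr1 1500 4000 R",
    "chr2 0 300 E",
]

for label, total in bases_per_label(toy_segments).items():
    print(f"{label:5s} {total:6d}  {SEGMENT_LABELS[label]}")
```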
![Page 33: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/33.jpg)
High-resolution segmentation of ENCODE data by self-organizing maps (SOM)
Encode Project Consortium, Nature 489: 57-74, 2012
![Page 34: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/34.jpg)
Allele-specific ENCODE elements
Encode Project Consortium, Nature 489: 57-74, 2012
single genes
ChromHMM segments
![Page 35: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/35.jpg)
Examining ENCODE elements on a per individual basis in the normal and cancer genome
![Page 36: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/36.jpg)
Comparison of genome-wide-association-study-identified loci with ENCODE data
![Page 37: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/37.jpg)
UCSC browser
![Page 38: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/38.jpg)
Browser interface
PLoS Biol 9:e1001046, 2011
http://encodeproject.org
-> Genome Browser link
both hg18 and hg19 genome versions are available and worth viewing – hg18 has the “Integrated Regulation Track” on by default, while hg19 has newer and more datasets
![Page 39: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/39.jpg)
UCSC browser visualization of ENCODE data
novel independent transcript in the first intron of TP53
session includes proteogenomics data in conjunction with ENCODE gene, transcriptome and regulatory data sets
![Page 40: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/40.jpg)
Roadmap Epigenomics Project
next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts
in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease
rapid release of raw sequence data, profiles of epigenomics features and higher-level integrated maps to the scientific community
development, standardization and dissemination of protocols, reagents and analytical tools to enable the research community to utilize, integrate and expand upon this body of data
![Page 41: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/41.jpg)
Epigenomics Data
www.roadmapepigenomics.org/data
![Page 42: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/42.jpg)
Epigenomics Data
www.roadmapepigenomics.org/data
![Page 43: Practical Guide to the (mod)ENCODE project](https://reader036.fdocuments.net/reader036/viewer/2022062310/56816694550346895dda75ed/html5/thumbnails/43.jpg)
Databases, data visualization, and access
modENCODE:
• http://www.modencode.org
• http://www.intermine.modencode.org
• http://www.modencode.org/publications/worm_2010pubs/
• http://www.wormbase.org
• http://www.flybase.org
ENCODE:
• http://www.encodeproject.org
• http://www.genome.ucsc.edu/ENCODE/
• http://www.genome.ucsc.edu/ENCODE/downloads.html
• http://www.factorbook.org
Epigenomics RoadMap:
• http://nihroadmap.nih.gov/epigenomics
• http://ncbi.nlm.nih.gov/epigenomics
• http://www.epigenomebrowser.org
• http://genomebrowser.wustl.edu/
• http://epigenomegateway.wustl.edu/