239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5...

75
1 Supplementary Information for Ho et al., modENCODE and ENCODE resources for analysis of metazoan chromatin organization Supplementary Content Supplementary Methods --------------------------------------------------------- pp 5-22 Supplementary Figures 1-47 --------------------------------------------------------- pp 23-69 Supplementary Fig. 1 A summary of the full dataset Supplementary Fig. 2 Genomic coverage of various histone modifications in the three species Supplementary Fig. 3 H3K4me1/3 enrichment patterns in regulatory elements defined by DNase I hypersensitive sites (DHS) or CBP-1 binding sites Supplementary Fig. 4 Distribution of H3K27ac enrichment levels at putative enhancers Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes Supplementary Fig. 6 Correlation of enrichment of 82 histone marks or chromosomal proteins at enhancers with STARR-seq defined enhancer strength in fly S2 cells Supplementary Fig. 7 Nucleosome turnover at enhancers Supplementary Fig. 8 Nucleosome occupancy at enhancers Supplementary Fig. 9 Salt extracted fractions of chromatin at enhancers Supplementary Fig. 10 Chromatin environment described by histone modification and binding of chromosomal proteins at enhancers Supplementary Fig. 11 Analysis of p300-based enhancers from human Supplementary Fig. 12 Promoter architecture Supplementary Fig. 13 Relationship between sense-antisense bidirectional transcription

Transcript of 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5...

Page 1: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

1��

Supplementary Information

for

Ho et al., modENCODE and ENCODE resources for analysis of metazoan chromatin organization

Supplementary Content

Supplementary Methods --------------------------------------------------------- pp 5-22

Supplementary Figures 1-47 --------------------------------------------------------- pp 23-69

Supplementary Fig. 1 A summary of the full dataset

Supplementary Fig. 2 Genomic coverage of various histone modifications in the three species

Supplementary Fig. 3 H3K4me1/3 enrichment patterns in regulatory elements defined by DNase I hypersensitive sites (DHS) or CBP-1 binding sites

Supplementary Fig. 4 Distribution of H3K27ac enrichment levels at putative enhancers

Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes

Supplementary Fig. 6 Correlation of enrichment of 82 histone marks or chromosomal proteins at enhancers with STARR-seq defined enhancer strength in fly S2 cells

Supplementary Fig. 7 Nucleosome turnover at enhancers

Supplementary Fig. 8 Nucleosome occupancy at enhancers

Supplementary Fig. 9 Salt extracted fractions of chromatin at enhancers

Supplementary Fig. 10 Chromatin environment described by histone modification and binding of chromosomal proteins at enhancers

Supplementary Fig. 11 Analysis of p300-based enhancers from human

Supplementary Fig. 12 Promoter architecture

Supplementary Fig. 13 Relationship between sense-antisense bidirectional transcription

Page 2: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

2��

and H3K4me3 at TSS

Supplementary Fig. 14 Profiles of the well positioned nucleosome at Transcription Start Sites (TSSs) of protein coding genes

Supplementary Fig. 15 Nucleosome occupancy profile at TSS based on two MNase-seq datasets for each species

Supplementary Fig. 16 Association between repressive chromatin and lamina-associated domains (LADs)

Supplementary Fig. 17 Chromatin context in lamina-associated domains

Supplementary Fig. 18 Chromatin context in short and long lamina-associated domains (LADs) in three organisms

Supplementary Fig. 19 LAD domains are late replicating

Supplementary Fig. 20 DNA shape conservation in nucleosome sequences

Supplementary Fig. 21 DNA shape in nucleosome sequences

Supplementary Fig. 22 Chromatin context of broadly expressed and specifically expressed genes

Supplementary Fig. 23 Structure and expression of broadly and specifically expressed genes

Supplementary Fig. 24 Example genome browser screenshot showing broadly and specifically expressed genes

Supplementary Fig. 25 Genome-wide correlations between histone modifications show intra- and inter- species similarities and differences

Supplementary Fig. 26 A Pearson correlation matrix of histone marks in each cell type or developmental stage

Supplementary Fig. 27 Genome-wide correlation of ChIP-seq datasets for human, fly and worm

Supplementary Fig. 28 Comparison of chromatin state maps generated by hiHMM and Segway

Supplementary Fig. 29 Comparison of chromatin state maps generated by hiHMM and ChromHMM

Supplementary Fig. 30 Comparison of hiHMM-based chromatin state model with species-specific models

Page 3: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

3��

Supplementary Fig. 31 Distribution of genomic features in each hiHMM-based chromatin state

Supplementary Fig. 32 Enrichment of chromosomal proteins in individual chromatin states generated by hiHMM

Supplementary Fig. 33 Enrichment of transcription factor binding sites in individual chromatin states generated by hiHMM

Supplementary Fig. 34 Coverage by hiHMM-based states in mappable regions of individual chromosomes

Supplementary Fig. 35 Heterochromatin domains defined based on H3K9me3-enrichment in worm, fly and human

Supplementary Fig. 36 Borders between pericentric heterochromatin and euchromatin in Fly L3 from this study compared to those based on H3K9me2 ChIP-chip data

Supplementary Fig. 37 Distribution of H3K9me3 in different cells in human chr2 as an example

Supplementary Fig. 38 Evidence that overlapping H3K9me3 and H3K27me3 ChIP signals in worm are not due to antibody cross-reactivity

Supplementary Fig. 39 Gene body plots of several histone modifications for euchromatic and heterochromatic genes

Supplementary Fig. 40 The chromatin state map around three examples of expressed genes in or near heterochromatic regions in human GM12878 cells, fly L3, and worm L3

Supplementary Fig. 41 Relationship between enrichment of H3K27me3 and H3K9me3 in three species

Supplementary Fig. 42 Organization of silent domains

Supplementary Fig. 43 Chromatin context of topological domain boundaries

Supplementary Fig. 44 Classification of topological domains based on chromatin states

Supplementary Fig. 45 Similarity between fly histone modification domains/boundaries and Hi-C

Supplementary Fig. 46 A chromosomal view of the chromatin-based topological domains in worm early embryos at chromosome IV

Supplementary Fig. 47 Chromatin context of chromatin-based topological domain

Page 4: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

4��

boundaries

Supplementary Tables 1-3 -------------------------------------------------------- pp 70-72

Supplementary Table 1 Abbreviation of key cell types and developmental stages described in this study

Supplementary Table 2 List of protein names used in this study

Supplementary Table 3 Overlap of DHS-based and p300-peak-based enhancers in human cell lines

Supplementary References -------------------------------------------------------- pp 73-75

Page 5: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

5��

Supplementary Methods

Preprocessing of ChIP-seq data

Raw sequences were aligned to their respective genomes (hg19 for human; dm3 for fly; and

ce10 for worm) using bowtie22 or BWA23following standard preprocessing and quality

assessment procedures of ENCODE and modENCODE4. Validation results of the antibodies

used in all ChIP experiments are available at the Antibody Validation Database24

(http://compbio.med.harvard.edu/antibodies/). Most of the ChIP-seq datasets were generated

by 36 bp (in human and worm), 42 bp (in worm), or 50 bp (in fly and worm) single-end

sequencing using the Illumina HiSeq platform, with an average of ~20 million reads per

sample replicate (at least two replicates for each sample). Quality of the ChIP-seq data was

examined as follows. For all three organisms, cross-correlation analysis was performed, as

described in the published modENCODE and ENCODE guidelines4. This analysis examines

ChIP efficiency and signal-to-noise ratio, as well as verifying the size distribution of ChIP

fragments. The results of this cross-correlation analysis for the more than 3000

modENCODE and ENCODE ChIP-seq data sets are described elsewhere25. In addition, to

ensure consistency between replicates in the fly data, we further required at least 80%

overlap of the top 40% of peaks in the two replicates (overlap is determined by number of bp

for broad peaks, or by number of peaks for sharp peaks; peaks as determined by SPP26 etc).

Library complexity was checked for human. For worm, genome-wide correlation of fold

enrichment values was computed for replicates and a minimum threshold of 0.4 was required.

In all organisms, those replicate sets that do not meet these criteria were examined by manual

inspection of browser profiles to ascertain the reasons for low quality and, whenever

possible, experiments were repeated until sufficient quality and consistent were obtained. To

enable the cross-species comparisons described in this paper, we have reprocessed all data

using MACS27. (Due to the slight differences in the peak-calling and input normalization

steps, there may be slight discrepancies between the fly profiles analyzed here (available at

http://encode-x.med.harvard.edu/data_sets/chromatin/) and those available at the data

coordination center: http://intermine.modencode.org/). For every pair of aligned ChIP and

matching input-DNA data, we used MACS28 version 2 to generate fold enrichment signal

tracks for every position in a genome:

Page 6: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

6��

macs2 callpeak -t ChIP.bam -c Input.bam -B --nomodel --shiftsize 73 --SPMR -g hs -n ChIP

macs2 bdgcmp -t ChIP_treat_pileup.bdg -c ChIP_control_lambda.bdg -o ChIP_FE.bedgraph -m FE

Depending on analysis, we applied either log transformation or z-score transformation.

Preprocessing of ChIP-chip data

For the fly data, genomic DNA Tiling Arrays v2.0 (Affymetrix) were used to hybridize ChIP

and input DNA. We obtained the log-intensity ratio values (M-values) for all perfect match

(PM) probes: M = log2(ChIP intensity) - log2(input intensity), and performed a whole-

genome baseline shift so that the mean of M in each microarray is equal to 0. The smoothed

log intensity ratios were calculated using LOWESS with a smoothing span corresponding to

500 bp, combining normalized data from two replicate experiments. For the worm data, a

custom Nimblegen two-channel whole genome microarray platform was used to hybridize

both ChIP and input DNA. MA2C29 was used to preprocess the data to obtain a normalized

and median centered log2 ratio for each probe. All data are publicly accessible through

modMine (http://www.modencode.org/).

Preprocessing of GRO-seq data

Raw sequences of the fly S2 and human IMR90 datasets were downloaded from NCBI Gene

Expression Omnibus (GEO) using accession numbers GSE2588730 and GSE1351831

respectively. The sequences were then aligned to the respective genome assembly (dm3 for

fly and hg19 for human) using bowtie22. After checking for consistency based on correlation

analysis and browser inspection, we merged the reads of the biological replicates before

proceeding with downstream analyses. Treating the reads mapping to the positive and

negative strands separately, we calculated minimally-smoothed signals (by a Gaussian kernel

smoother with bandwidth of 10 bp in fly and 50 bp in human) along the genome in 10 bp

(fly) or 50 bp (human) non-overlapping bins.

Preprocessing of DNase-seq data

Aligned DNase-seq data were downloaded from modMine (http://www.modencode.org/) and

the ENCODE UCSC download page (http://encodeproject.org/ENCODE/). Additional

Page 7: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

7��

Drosophila embryo DNase-seq data were downloaded from32. After confirming consistency,

reads from biological replicates were merged. We calculated minimally-smoothed signals (by

a Gaussian kernel smoother with bandwidth of 10 bp in fly and 50 bp in human) along the

genome in 10 bp (fly) or 50 bp (human) non-overlapping bins.

Preprocessing of MNase-seq data

The MNase-seq data were analyzed as described previously33. In brief, tags were mapped to

the corresponding reference genome assemblies. The positions at which the number of

mapped tags had a Z-score > 7 were considered anomalous due to potential amplification

bias. The tags mapped to such positions were discarded. To compute profiles of nucleosomal

frequency around TSS, the centers of the fragments were used in the case of paired-end data.

In the case of single-end data, tag positions were shifted by the half of the estimated fragment

size (estimated using cross-correlation analysis34 toward the fragment 3’-ends and tags

mapping to positive and negative DNA strands were combined). Loess smoothing in the 11-

bp window, which does not affect positions of the major minima and maxima on the plots,

was applied to reduce the high-frequency noise in the profiles.

GC-content and PhastCons conservation score

We downloaded the 5bp GC% data from the UCSC genome browser annotation download

page (http://hgdownload.cse.ucsc.edu/downloads.html) for human (hg19), fly (dm3), and

worm (ce10). Centering at every 5 bp bin, we calculated the running median of the GC% of

the surrounding 100 bp (i.e., 105 bp in total).

PhastCons conservation score was obtained from the UCSC genome browser annotation

download page. Specifically, we used the following score for each species.

Target species phasetCons scores generated by multiple alignments with

URL

C. elegans (ce10) 6 Caenorhabditis nematode genomes

http://hgdownload.cse.ucsc.edu/goldenPath/ce10/phastCons7way/

D. melanogaster (dm3) 15 Drosophila and related fly genomes

http://hgdownload.cse.ucsc.edu/goldenPath/dm3/phastCons15way/

H. Sapiens (hg19) 45 vertebrate genomes http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vertebrate/

Page 8: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

8��

Both GC and phastCons scores were then binned into 10 bp (fly and worm) or 50 bp (human)

non-overlapping bins.

Genomic sequence mappability tracks

We generated empirical genomic sequence mappability tracks using input-DNA sequencing

data. After merging input reads up to 100M, reads were extended to 149 bp which

corresponds to the shift of 74 bp in signal tracks. The union set of empirically mapped

regions was obtained. They are available at the ENCODE-X Browser (http://encode-

x.med.harvard.edu/data_sets/chromatin/).

Coordinates of unassembled genomic sequences

We downloaded the “Gap” table from UCSC genome browser download page

(http://hgdownload.cse.ucsc.edu/downloads.html). The human genome contains 234 Mb of

unassembled regions whereas fly contains 6.3 Mb of unassembled genome. There are no

known unassembled (i.e., gap) regions in worm.

Gene annotation

We used human GENCODE version 10 (http://www.gencodegenes.org/releases/10.html) for

human gene annotation35. For worm and fly, we used custom RNA-seq-based gene and

transcript annotations generated by the modENCODE consortium (see Gerstein et al., The

Comparative ENCODE RNA Resource Reveals Conserved Principles of Transcription).

Worm TSS definition based on capRNA-seq (capTSS)

We obtained worm TSS definition based on capRNA-seq from Chen et al.36. Briefly, short 5'-

capped RNA from total nuclear RNA of mixed stage embryos were sequenced (i.e., capRNA-

seq) by Illumina GAIIA (SE36) with two biological replicates. Reads from capRNA-seq

were mapped to WS220 reference genome using BWA23. Transcription initiation regions

(TICs) were identified by clustering of capRNA-seq reads. In this analysis we used TICs that

overlap with wormbase TSSs within -199:100bp. We refer these capRNA-seq defined TSSs

as capTSS in this study.

Page 9: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

9��

Gene expression data

Gene expression level estimates of various cell-lines, embryos or tissues were obtained from

the modENCODE and ENCODE projects (see Gerstein et al., The Comparative ENCODE

RNA Resource Reveals Conserved Principles of Transcription). The expression of each gene

is quantified in terms of RPKM (reads per million reads per kilobase). The distribution of

gene expression in each cell line was assessed and a cut-off of RPKM=1 was determined to

be generally a good threshold to separate active vs. inactive genes. This definition of active

and inactive genes was used in the construction of meta-gene profiles.

Genomic coverage of histone modifications

To identify the significantly enriched regions, we used SPP R package (ver.1.10)26. The 5'end

coordinate of every sequence read was shifted by half of the estimated average size of the

fragments, as estimated by the cross-correlation profile analysis. The significance of

enrichment was computed using a Poisson model with a 1 kb window. A position was

considered significantly enriched if the number of IP read counts was significantly higher (Z-

score > 3 for fly and worm, 2.5 for human) than the number of input read counts, after

adjusting for the library sizes of IP and input, using SPP function

get.broad.enrichment.cluster.

Genome coverage in each genome is then calculated as the total number of base pair covered

by the enriched regions or one or more histone marks. It should be noted that genomic

coverage reported in Supplementary Fig. 2 refers to percentage of histone mark coverage

with respective to mappable region. A large portion (~20%) of human genome is not

mappable based on our empirical criteria. These unmappable regions largely consist of

unassembled regions, due to difficulties such as mapping of repeats. Furthermore, some

unmappable regions may be a result of the relatively smaller sequencing depth compared to

fly and worm samples. Therefore it is expected that empirically determined mappability is

smaller in human compared to fly and worm.

Identification and analysis of enhancers

Page 10: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

10��

We used a supervised machine learning approach to identify putative enhancers among

DNaseI hypersensitive sites (DHSs) and p300 or CBP-1 binding sites, hereafter referred

collectively as “regulatory sites”. The basic idea is to train a supervised classifier to identify

H3K4me1/3 enrichment patterns that distinguish TSS distal regulatory sites (i.e., candidate

enhancers) from proximal regulatory sites (i.e., candidate promoters). TSS-distal sites that

carry these patterns are classified as putative enhancers.

Human DHS and p300 binding site coordinates were downloaded from the ENCODE UCSC

download page (http://genome.ucsc.edu/ENCODE/downloads.html). When available, only

peaks identified in both replicates were retained. DHSs and p300 peaks that were wider than

1 kb were removed. DHS positions in fly cell lines were defined as the 'high-magnitude'

positions in DNase I hypersensitivity identified by Kharchenko et al10. We applied the same

method to identify similar positions in DNase-seq data in fly embryonic stage 14 (ES14)32,

which roughly corresponds to LE stage. Worm MXEMB CBP-1 peaks were determined by

SPP with default parameters. CBP-1 peaks that were identified within broad enrichment

regions wider than 1 kb were removed. For fly and human cell lines, DHS and p300 data

from matching cell types were used. For fly late embryos (14-16 h), the DHS data from

embryonic stage 14 (10:20–11:20 h) were used. For worm EE and L3, CBP-1 data from

mixed-embryos were used.

To define the TSS-proximal and TSS-distal sites, inclusive TSS lists were obtained by

merging ensemble v66 TSSs with GENCODE version 10 for human, and modENCODE

transcript annotations for fly and worm, including all alternate sites. Different machine

learning algorithms were trained to classify genomic positions as a TSS-distal regulatory site,

TSS-proximal regulatory site or neither, based on a pool of TSS-distal (>1 kb) and TSS–

proximal (<250 bp) regulatory sites and a random set of positions from other places in the

genome. The random set included twice as many positions as the TSS-distal site set for each

cell type. Five features from each of the two marks, H3K4me1 and H3K4me3, were used for

the classification: maximum fold-enrichment within +/-500 bp, and four average fold

enrichment values in 250 bp bins within +/-500 bp. The pool of positions was split into two

equal test and training sets. The performance of different classifier algorithms was compared

using the area under Receiver Operator Characteristics (ROC) curves. For human and fly

Page 11: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

11��

samples, the best performance was obtained using the Model-based boosting (mboost)

algorithm37, whereas for the worm data sets, the Support Vector Machine (SVM) algorithm

showed superior performance. TSS-distal sites that in turn get classified as “TSS-distal”

make up our enhancer set. In worm, the learned model was used to classify sites within 500-

1000 bp from the closest TSS, and those classified as TSS-distal were included in the final

enhancer set to increase the number of identified sites. Our sets of putative enhancers

(hereafter referred to as ‘enhancers’) include roughly 2000 sites in fly cell lines and fly

embryos, 400 sites in worm embryos, and 50,000 sites in human cell lines.

It should be noted that while enhancers identified at DHSs (in human and fly) or CBP-1

binding sites (in worm) may represent different classes of enhancers, for the purpose of

studying the major characteristics of enhancers, both definitions are a reasonable proxy for

identifying enhancer-like regions. We repeated all human enhancer analysis with p300 sites

(worm CBP-1 is an ortholog of p300 in human). Half of the p300-based enhancers overlap

with DHS-based enhancers (Supplementary Table 3). In addition, all the observed patterns

were consistent with the enhancers identified using DHSs (Supplementary Fig. 3), including

the association of enhancer H3K27ac levels with gene expression (Supplementary Figs. 4-6),

patterns of nucleosome turnover (Supplementary Figs. 7-9) and histone modifications and

chromosomal proteins (Supplementary Fig. 10). For validation of results based on DHS-

based enhancers were validated by analyzing p300-based enhancers (Supplementary Fig. 11).

For Supplementary Fig. 3-6, the enrichment level of a histone mark around a site (DHS or

CBP-1 enhancer) is calculated based on the maximum ChIP fold enrichment within +/- 500

bp region of the site. These values are also used to stratify enhancers based on the H3K27ac

enrichment level. For Supplementary Fig. 10, we extracted histone modification signal +/- 2

kb around each enhancer site in 50 bp bins. ChIP fold enrichment is then averaged across all

the enhancer sites in that category (high or low H3K27ac). These average signals across the

entire sample (e.g., human GM12878) are then subjected to Z-score transformation (mean =

0, standard deviation = 1). All z-scores above 4 or below -4 are set to 4 and -4 respectively.

In terms of analysis of average expression of genes that are proximal to a set of enhancers

(Supplementary Fig. 5), we identify genes that are located within 5, 10, 25, 40, 50, 75, 125,

Page 12: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

12��

150, 175 and 200 kb away from the center of an enhancer in both directions, and take an

average of the expression levels of all of the genes within this region.

Analysis of HiC-defined topological domains

We used the genomic coordinates of the topological domains defined in the original

publication on fly late embryos19, and human embryonic stem cell lines18. The human

coordinates were originally in hg18. We used UCSC's liftOver tool

(http://genome.ucsc.edu/cgi-bin/hgLiftOver) to convert the coordinates to hg19.

Analysis of chromatin states near topological domain boundaries

For each chromatin state, the number of domain boundaries where the given state is at a

given distance to the boundary is counted. The random expected value of counts is calculated

as the number of all domain boundaries times the normalized genomic coverage of the

chromatin state. The ratio of observed to expected counts is presented as a function of the

distance to domain boundaries.

Analysis of chromatin states within topological domains

In supplementary Fig. 44, the interior of topological domains is defined by removing 4 kb

and 40 kb from the edges of each topological domain for fly and human Hi-C defined

domains respectively. To access the chromatin state composition of each topological domain,

the coverage of the domain interior by each chromatin state is calculated in bps and

normalized to the domain size, yielding a measure between 0 and 1. Then the matrix of

values corresponding to chromatin states in one dimension and topological domains in the

second dimension is used to cluster the chromatin states hierarchically. Pearson correlation

coefficients (1-r) between domain coverage values of different chromatin states are taken as

the distance metric for the clustering. The clustering tree is cut as to obtain a small number of

meaningful groups of highly juxtaposed chromatin states. The coverage of each chromatin

state group is calculated by summing the coverage of states in the group. Each topological

domain is assigned to the chromatin state group with maximum coverage in the domain

interior.

Page 13: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

13��

Definition of lamina associated domains (LADs)

Genomic coordinates of LADs were directly obtained from their original publications, for

worm38, fly39 and human40. We converted the genomic coordinates of LADs to ce10 (for

worm), dm3 (for fly) and hg19 (for human) using UCSC's liftOver tool with default

parameters (http://genome.ucsc.edu/cgi-bin/hgLiftOver). For Supplementary Fig. 17b, the

raw fly DamID ChIP values were used after converting the probe coordinates to dm3.

LAD chromatin context analysis

In Supplementary Fig. 16, scaled LAD plot, long and short LADs were defined by top 20%

and bottom 20% of LAD sizes, respectively. For a fair comparison between human and worm

LADs in the figure, a subset of human LADs (chromosomes 1 to 4, N = 391) was used, while

for worm LADs from all chromosomes (N= 360) were used. 10 kb (human) or 2.5 kb (worm)

upstream and downstream of LAD start sites and LAD ending sites are not scaled. Inside of

LADs is scaled to 60 kb (human) or 15 kb (worm). Overlapping regions with adjacent LADs

are removed.

To correlate H3K9me3, H3K27me3 and EZH2/EZ with LADs, the average profiles were

obtained at the boundaries of LADs with a window size of 120 kb for human, 40 kb for fly

and 10 kb for worm. The results at the right side of domain boundaries were flipped for

Supplementary Fig. 17a.

LAD Replication Timing analysis

The repli-seq BAM alignment files for the IMR90 and BJ human cell lines were downloaded

from the UCSC ENCODE website. Early and late RPKM signal was determined for non-

overlapping 50 kb bins across the human genome, discarding bins with low mappability (i.e.,

bins containing less than 50% uniquely mappable positions). To better match the fly repli-seq

data, the RPKM signal from the two early fractions (G1b and S1) and two late fractions (S4

and G2) were each averaged together. The fly Kc cell line replication-seq data was obtained

from GEO. Reads were pooled together from two biological replicates (S1: GSM1015342

and GSM1015346; S4: GSM1015345 and GSM1015349), and aligned to the Drosophila

melanogaster dm3 genome using Bowtie22. Early and late RPKM values were then calculated

Page 14: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

14��

for each non-overlapping 10 kb bin, discarding low mappability bins as described above. To

make RPKM values comparable between both species, the fly RPKM values were

normalized to the human genome size. All replication timing bins within a LAD domain were

included in the analysis. An equivalent number of random bins were then selected, preserving

the observed LAD domain chromosomal distribution.

CellType Phase Link

IMR90 G1b http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqImr90G1bAlnRep1.bam

IMR90 S1 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqImr90S1AlnRep1.bam

IMR90 S4 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqImr90S4AlnRep1.bam

IMR90 G2 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqImr90G2AlnRep1.bam

BJ G1b http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBjG1bAlnRep2.bam

BJ S1 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBjS1AlnRep2.bam

BJ S4 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBjS4AlnRep2.bam

BJ G2 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBjG2AlnRep2.bam

Analysis of DNA structure and nucleosome positioning

The ORChID2 algorithm was used to predict DNA shape and generate consensus profiles for

paired-end MNase-seq fragments of size 146-148 bp as previously described41. Only 146-148

bp sequences were used in this analysis to minimize possible effect of over- and under-

digestion in the MNase treatment. The ORChID2 algorithm provides a more general

approach than often-used investigation of mono- or dinucleotide occurrences along

nucleosomal DNA since it can capture even degenerate sequence signatures if they have

pronounced structural features.

For individual sequence analyses, we used the consensus profile generated above and

trimmed three bases from each end to eliminate edge effects of the prediction algorithm, and

then scanned this consensus against each sequence of length 146-148 bp. We retained the

Page 15: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

15��

maximum correlation value between the consensus and individual sequence, and compared

this to shuffled versions of each sequence (Supplementary Figs. 20-21). To estimate the

sequence effect on nucleosome positioning we calculated the area between the solid lines and

normalized by the area between the dashed lines (Supplementary Fig. 21a; upper panel) and

reported this result in Supplementary Fig. 20b.

Construction of meta-gene profiles

We defined transcription start site (TSS) and transcription end site (TES) as the 5' most and 3'

most position of a gene, respectively, based on the modENCODE/ENCODE transcription

group’s gene annotation (see Gerstein et al., The Comparative ENCODE RNA Resource

Reveals Conserved Principles of Transcription). To exclude short genes from this analysis,

we only included genes with a minimum length of 1 kb (worm and fly) or 10 kb (human). To

further alleviate confounding signals from nearby genes, we also excluded genes which have

any neighboring genes within 1 kb upstream of its TSS or 1 kb downstream of its TES. The

ChIP enrichment in the 1 kb region upstream of TSS or downstream of TES, as well as 500

bp downstream of TSS or upstream of TES, were not scaled. The ChIP-enrichment within the

remaining gene body was scaled to 2 kb. The average ChIP fold enrichment signals were then

plotted as a heat map or a line plot.

Analysis of broadly and specifically expressed genes

For each species, we obtained RNA-seq based gene expression estimates (in RPKM) of

multiple cell lines or developmental stages from the modENCODE/ENCODE transcription

groups (see Gerstein et al., The Comparative ENCODE RNA Resource Reveals Conserved

Principles of Transcription). Gene expression variability score of each gene was defined to be

the ratio of standard deviation and mean of expression across multiple samples. For each

species, we divide the genes into four quartiles based on this gene expression variability

score. Genes within the lowest quartile of variability score with RPKM value greater than 1 is

defined as "broadly expressed". Similarly, RPKM>1 genes within the highest quartile of

variability score is defined as "specifically expressed". We further restricted our analysis to

protein-coding genes that are between 1 and 10 kb (in worm and fly) or between 1 and 40 kb

(in human) in length.

Page 16: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

16��

For ChIP-chip analysis of BG3, S2 and Kc cells, ChIP signal enrichment for each gene was

calculated by averaging the smoothed log intensity ratios from probes that fall in the gene

body. For all other cell types, ChIP-seq read coordinates were adjusted by shifting 73 bp

along the read and the total number of ChIP and input fragments that fall in the gene body

were counted. Genes with low sequencing depth (as determined by having less than 4 input

tags in the gene body) were discarded from the analysis. ChIP signal enrichment is obtained

by dividing (library normalized) ChIP read counts to Input read count. The same procedure

was applied to calculate enrichment near TSS of genes, by averaging signals from probes

within 500 bp of TSSs for BG3 cells and using read counts within 500 bp of TSSs for ChIP-

seq data.

Genome-wide correlation between histone modifications

In Supplementary Figs. 25-26, eight histone modifications commonly profiled in human (H1-

hESC,GM12878 and K562), fly (LE, L3 and AH), and worm (EE and L3), were used for

pairwise genome-wide correlation at 5 kb bin resolution. Unmappable regions and regions

that have fold enrichment values less than 1 for all 8 marks (low signal regions) were

excluded from the analysis. To obtain a representative correlation value for each species, an

average Pearson correlation coefficient for each pair of marks was computed over the

different cell types and developmental stages of each species. The overall correlation (upper

triangle of Supplementary Fig. 25) was computed by averaging the three single-species

correlation coefficients. Intra-species variance was computed as the average within-species

variance of correlation coefficients. Inter-species variance was computed as the variance of

the within-species average correlation coefficients. For the large correlation heatmaps in

Supplementary Fig. 27, 10 kb (worm and fly) or 30 kb (human) bins were used with no

filtering of low-signal regions.

Chromatin segmentation using hiHMM

We performed joint chromatin state segmentation of multiple species using a hierarchically

linked infinite hidden Markov model (hiHMM). In a traditional HMM that relies on a fixed

number of hidden states, it is not straightforward to determine the optimal number of hidden

states. In contrast, a non-parametric Bayesian approach of an infinite HMM (iHMM) can

Page 17: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

17��

handle an unbounded number of hidden states in a systematic way so that the number of

states can be learned from the training data rather than be pre-specified by the user42. For

joint analysis of multi-species data, the hiHMM model employs multiple, hierarchically

linked, iHMMs over the same set of hidden states across multiple species - one iHMM per

species. More specifically, within a hiHMM, each iHMM has its own species-specific

parameters for both transition matrix ���� and emission probabilities for c={human, fly,

worm}. Emission process was modeled as a multivariate Gaussian with a diagonal

covariance matrix such that where represents m-

dimensional vector for observed data from m chromatin marks of species c at genomic

location t, and represents the corresponding hidden state at t. The parameters

correspond to the mean signal values from state k in species c, and is the species-

specific covariance matrix. To take into account the different self-transition probabilities in

different species, we also incorporate an explicit parameter that controls the self-

transition probability. In the resulting transition model, we have

. Each row of the transition matrix

across all the species follows the same prior distribution of the so-called Dirichlet process

that allows the state space to be shared across species. Using this scheme, data from multiple

species are weakly coupled only by a prior. Therefore hiHMM can capture the shared

characteristics of multiple species data while still allowing unique features for each

species. This hierarchically linked HMM has been first applied to the problem of local

genetic ancestry from haplotype data42 in which the same modeling scheme for the transition

process but a different emission process has been adopted to deal with the SNP haplotype

data.

This hierarchical approach is substantially different from the plain HMM that treats multi-

species data as different samples from a homogeneous population. For example, different

species data have different gene length and genome composition, so one transition event

along a chromosome of one species does not equally correspond to one transition in another

species. So if a model has just one set of transition probabilities for all species, it cannot

)(c�

),(~| )()()()( cck

ct

ct Nksy �� � )(c

ty

)(cts )(c

k�)(c�

)(0cp

)()(0

)(0

)(1

)( )1()()|( cjk

ccct

ct pjkpjsksp �� ������ �

)( c�

Page 18: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

18��

reflect such difference in self-transition or between-state transition probabilities. Our model

hiHMM can naturally handle this by assuming species-specific transition matrices. Note that

since the state space is shared across all the populations, it is easy to interpret the recovered

chromatin states.

Since hiHMM is a non-parametric Bayesian approach, we need Markov chain Monte-Carlo

(MCMC) sampling steps to train a model. Instead of Gibbs sampling, we adopted a dynamic

programming scheme called Beam sampling43, which significantly improves the mixing and

convergence rate. Although it still requires longer computation time than parametric methods

like a finite-state HMM, this training can be done once offline and then we can approximate

the decoding step of the remaining sequences by Viterbi algorithm using the trained HMM

parameters.

ChIP-seq data were further normalized before being analyzed by hiHMM. ChIP-seq

normalized signals were averaged in 200 bp bins in all three species. MACS2 processed

ChIP-seq fold change values were log2 transformed with a pseudocount of 0.5, i.e.,

y=log2(x+0.5), followed by mean-centering and scaling to have standard deviation of 1. The

transformed fold enrichment data better resemble a Gaussian distribution based on QQ-plot

analysis.

To train the hiHMM, the following representative chromosomes were used:

� Worm (L3): chrII, chrIII, chrX

� Fly (LE and L3): chr2L, chr2LHet, chrX, chrXHet

� Human (H1-hESC and GM12878): chr1, chrX

It should be noted that H4K20me1 profile in worm EE is only available as ChIP-chip data.

This is why worm EE was not used in the training phase. In the inference phase, we used the

quantile-normalized signal values of the H4K20me1 EE ChIP-chip data.

One emission and one transition probability matrix was learned from each species. We also

obtained the maximum a priori (MAP) estimate of the number of states, K. We then used

Viterbi decoding algorithm to generate a chromatin state segmentation of the whole genome

Page 19: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

19��

of worm (EE and L3), fly (LE and L3) and human (H1-hESC and GM12878). To avoid any

bias introduced by unmappable regions, we removed the empirically determined unmappable

regions before performing Viterbi decoding. These unmappable regions are assigned a

separate “unmappable state” after the decoding.

The chromatin state definition can be accessed via the ENCODE-X Browser (http://encode-

x.med.harvard.edu/data_sets/chromatin/).

Chromatin segmentation using Segway

We compared the hiHMM segmentation with a segmentation produced by Segway44, an

existing segmentation method. Segway uses a dynamic Bayesian network model, which

includes explicit representations of missing data and segment lengths.

Segway models the emission of signal observations at a position using multivariate

Gaussians. Each label k has a corresponding Gaussian characterized by a mean vector

and a diagonal covariance matrix . At locations where particular tracks have missing data,

Segway excludes those tracks from its emission model. For each label, Segway also includes

a parameter that models the probability of a change in label. If there is a change in label, a

separate matrix of transition parameters models the probability of switching to every other

label. Given these emission and transition parameters, Segway can calculate the likelihood of

observed signal data. To facilitate modeling data from multiple experiments with a single set

of parameters, we performed a separate quantile normalization on each signal track prior to

Segway analysis. We took the initial unnormalized values from MACS2’s log-likelihood-

ratio estimates. We compared the value at each position to the values of the whole track,

determining the fraction of the whole track with a smaller value. We then transformed this

fraction, using it as the argument to the inverse cumulative distribution function of an

exponential distribution with mean parameter . We divided the genome into 100 bp

non-overlapping bins, and took the mean of the transformed values within each bin. We then

used these normalized and averaged values as observations for Segway in place of the initial

MACS2 estimates.

k�

�1

Page 20: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

20��

We trained Segway using the Expectation-Maximization algorithm and data from all three

species: a randomly-sampled 10% of the human genome (with data from H1-hESC and

GM12878) and the entire fly (LE and L3) and worm (EE and L3) genomes. Using these data

sets jointly, we trained 10 models from 10 random initializations. In every initialization, we

set each mean parameter for label i and track k by sampling from a uniform distribution

defined in , where is the empirical standard deviation of track k. We placed a

Dirichlet prior on the self-transition model to make the expected segment length 100 kb. We

always initialized transition probability parameters with an equal probability of switching

from one label to any other label. While these parameters changed during training, we

increased the likelihood of a flatter transition matrix by including a Dirichlet prior of 10

pseudocounts for each ordered pair of labels. To increase the relative importance of the

length components of the model, we exponentiated transition probabilities to the power of 3.

After training converged, we selected the model with the highest likelihood. We then used

the Viterbi algorithm to assign state labels to the genome in each cell type of each organism.

Chromatin segmentation using ChromHMM

We also compared hiHMM with another existing segmentation method called

ChromHMM45. ChromHMM uses a hidden Markov model with multivariate binary

emissions to capture and summarize the combinatorial interactions between different

chromatin marks. ChromHMM was jointly trained in virtual concatenation mode using 8

binary histone modification ChIP-seq tracks (H3K4me3, H3K27ac, H3K4me1, H3K79me2,

H4K20me1, H3K36me3, H3K27me3 and H3K9me3) from two developmental stages in

worm (EE, L3), two developmental stages in fly (LE, L3) and two human cell-lines

(GM12878 and H1-hESC). The individual histone modification ChIP-seq tracks were

binarized in 200 bp non-overlapping, genome-wide, tiled windows by comparing the ChIP

read counts (after shifting reads on both strands in the 5’ to 3’ direction by 100 bp) to read

counts from a corresponding input-DNA control dataset based on a Poisson background

model. A p-value threshold of 1e-3 was used to assign a presence/absence call to each

window (0 indicating no significant enrichment and 1 indicating significant enrichment).

Bins containing < 25% mappable bases were considered unreliable and marked as ‘missing

�ik

]2.0,2.0[ �

Page 21: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

21��

data’ before training. In order to avoid a human-specific bias in training due to the

significantly larger size of the human genome relative to the worm and fly genomes, the

tracks for both the worm and fly stages were repeated 10 times each, effectively up-

weighting the worm and fly genomes in order to approximately match the amount of training

data from the human samples. ChromHMM was trained in virtual concatenation mode using

expectation maximization to produce a 19 state model which was found to be an optimal

trade-off between model complexity and interpretability. The 19 state model was used to

compute a posterior probability distribution over the state of each 200 bp window using a

forward-backward algorithm. Each bin was assigned the state with the maximum posterior

probability.

The states were labeled by analyzing the state-specific enrichment of various genomic

features (such as locations of genes, transcription start sites, transcription end sites, repeat

regions etc.) and functional datasets (such as transcription factor ChIP-seq peaks and gene

expression). For any set of genomic coordinates representing a genomic feature and a given

state, the fold enrichment of overlap was calculated as the ratio of the joint probability of a

region belonging to the state and the feature to the product of independent marginal

probability of observing the state in the genome and that of observing the feature. Similar to

the observations of hiHMM states, there are 6 main groups of states: promoter, enhancer,

transcription, polycomb repressed, heterochromatin, and low signal.

Heterochromatin region identification

To identify broad H3K9me3+ heterochromatin domains, we first identified broad H3K9me3

enrichment region using SPP26, based on methods get.broad.enrichment.cluster with a 10 kb

window for fly and worm and 100 kb for human . Then regions that are less than 10 kb of

length were removed. The remaining regions were identified as the heterochromatin regions.

The boundaries between pericentric heterochromatin and euchromatin on each fly

chromosome are consistent with those from lower resolution studies using H3K9me213

(Supplementary Fig. 36).

Genome-wide correlation analysis for heterochromatin-related marks

Page 22: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

22��

For heterochromatin related marks in Fig. 3c, the pairwise genome-wide correlations were

calculated with 5 kb bins using five marks in common in the similar way as described above.

Unmappable regions or regions that have fold enrichment values < 0.75 for all five marks

were excluded from the analysis.

Chromatin-based topological domains based on Principal Component Analysis

We respectively partitioned the fly and worm genomes into 10 kb and 5 kb bins, and assign

average ChIP fold enrichment of multiple histone modifications to each bin (See below for

the list of histone modifications used). Aiming to reduce the redundancy induced by the

strong correlation among multiple histone modifications, we projected histone modification

data onto the principal components (PC) space. The first few PCs, which cumulatively

accounted for at least 90% variance, were selected to generate a "reduced" chromatin

modification profile of that bin. Typically 4-5 PCs were selected in the fly and worm

analysis. Using this reduced chromatin modification profile, we could then calculate the

Euclidean distance between every pair of bin in the genome. In order to identify the

boundaries and domains, we calculated a boundary score for each bin:�

in which, dk+i,k is the Euclidean distance between the k+i th bin and the k th bin. If a bin has larger

distances between neighbors, in principle, it would have a higher boundary score and be recognized as

a histone modification domain boundary. The boundary score cutoffs are set to be 7 for fly and worm.

If the boundary scores of multiple continuous bins are higher than the cutoff, we picked the highest

one as the boundary bin. The histone marks used are H3K27ac, H3K27me3, H3K36me1, H3K36me3,

H3K4me1, H3K4me3, H3K79me1, H3K79me2, H3K9me2 and H3K9me3 for fly LE and L3, and

H3K27ac, H3K27me3, H3K36me1, H3K36me3, H3K4me1, H3K4me3, H3K79me1, H3K79me2,

H3K79me3, H3K9me2 and H3K9me3 for worm EE and L3.

10)(_

5

0,5 ,� �

��� ��

i

ii kikdkscoreboundary

Page 23: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supp

lem

enta

ry F

ig. 1

. A su

mm

ary

of th

e fu

ll da

tase

t. Th

is ta

ble

show

s all

chro

mat

in-a

ssoc

iate

d da

ta se

ts g

ener

ated

in th

e m

odEN

CO

DE/

ENC

OD

E pr

ojec

t (th

e m

ain

Fig.

1 sh

ows o

nly

thos

e pr

ofile

d in

at l

east

two

orga

nism

s) a

s wel

l as o

ther

pub

lishe

d da

ta se

ts u

sed

in th

is p

aper

. Cel

l typ

es a

nd d

evel

opm

enta

l sta

ges t

hat s

hare

the

sam

e pa

ttern

acr

oss t

he ro

ws a

re m

erge

d fo

r a m

ore

com

pact

dis

play

; lik

ewis

e,

fact

ors t

hat s

hare

the

sam

e pa

ttern

acr

oss t

he c

olum

ns w

ere

mer

ged

(com

ma

sepa

rate

d). O

rthol

ogs w

ith d

iffer

ent m

ark

or fa

ctor

nam

es a

mon

g th

e th

ree

spec

ies a

re sh

own

with

the

nam

es se

para

ted

by sl

ash

(/). D

ata

that

wer

e no

t gen

erat

ed b

y th

e m

odEN

CO

DE/

ENC

OD

E co

nsor

tium

are

m

arke

d by

ast

eris

ks (*

). C

oval

ent A

ttach

men

t of T

agge

d H

isto

nes t

o C

aptu

re a

nd Id

entif

y Tu

rnov

er (C

ATC

H-I

T) d

ata46

repr

esen

t gen

ome-

wid

e m

easu

rem

ent o

f nuc

leos

ome

turn

over

per

form

ed u

sing

met

abol

ic la

belin

g fo

llow

ed b

y ca

ptur

e of

new

ly sy

nthe

size

d hi

ston

es.

*

**

**

**

*H1

H2A.Z/H2A�������H2AK5acH2AK7ac H2BubH2BK5ac H3 H3.3 H3S10phH3K4me1H3K4me2H3K4me3 H3K9ac

H3K9acS10phH3K9me1H3K9me2H3K9me3H3K14acH3K18acH3K23acH3K27ac

H3K27me1H3K27me2H3K27me3

H3K36acH3K36me1H3K36me2H3K36me3H3K79me1H3K79me2H3K79me3

H4H4acTetra H4R3me2 H4K5ac H4K8acH4K12acH4K16ac

H4K20me1H4K20me3 ��� ���

(h)

���r

mlin

eles

sA

D n

o em

bryo

s

���

�rm

line

Larv

����

����

�����

��La

rv��

����

����

�����

Late

em

bryo

(LTE

MB

)M

ixed

em

bryo

(MX

EM

B)

Ear

ly e

mbr

yo (E

E)

Adu

lt he

ad (A

H)

Ova

ryTh

ird in

star

larv

ae (L

3)L3

Sex

ed M

ale

L3 S

exed

Fem

ale

Late

em

bry�

����

����

����

�E

arly

em

bry�

����

�����

��

ES

14,E

S10

,ES

5B

G3

Clo

ne 8Kc

S2

���(f)(e)

(d)

(c)

(b)

(a)

A54

9�

���!

"���

��

��#$

RO

0174

6�

��

���%

��

&�'

�%�

+�

-%:

;�

P :

��

%<&

�:��

<

&�:

�<�

$=

U2O

SFi

brob

last

IMR

90O

steo

blas

tH

UV

EC

%:�

�&

:�

�+�

>

Hep

G2

���

��<

�G

M12

878

K56

2�

����

<

�-�

��<

hi

ston

e*

**

*

*

**

***

� ��

ASH1,CP190 <���

BEAF,GAF,P

C,PSC,ZW5;�;��

CBX3,CTCFL,HDAC8 CEBPB CHD1 CHD2

�����?���������@ CHRO CTCF

�J���%�J��@ �J��-

���%�?:���EZH2/E(Z)

HDAC1/RPD3/HD �� HDAC2 HP1A��;�����

HP1C,HP2 HP4 N?��� KDM1A KDM2 KDM4A KDM5A

�����%�?Q�� �?:��Y;�;��=� ��<�� MOF �=[�� MYS3

:]=+�^��:]=+�� P300 PIWI POF

=�<��-%TOP2 RBBP5=: �Pol II

=:+��=?:[ =�� SETDB1 SFMBTSMARCA4 SMC3 SU(HW)

SU(V =���-SU(V =���' WDS �+��

(1) (2) (3) (4) (5) (6) (7) (8) (9)

nonh

isto

ne

�����

��

�%�

+���

%�?:

��Y%

�?:

��-%

�?:

�Y�%

�?:

�Y�%

�?:

�Y�%

�?:

�'��

��?<

_?%�

`�

���

[��

%�=

[��

Y%

����

;Q

�%

;Q

@%:

`

=%:

<�

�%

F,Po

lIIS

5P,R

ES

T(4

) HD

AC6,

KD

M5B

,PH

F8,S

AP

30(5

) CH

D7,

KD

M5C

,SIR

T6,S

UZ1

2��

���

J��

^%?�

;��

%�A]

��%�

?<��

�%

g:

�@Y%

<�

��

%<+

�%<

��%<

��%<

_�

�%T

+�� , T

A[��

�Y%�

;

��(7

) A

+�%;

=�

�%

[�^

��^%

�;=

%���

%�<

���%

O%=

�?:

O%<

��

�%Q

:

�@��:

���%

<�

��

%<�

��

%J�'

[�^

=

�@�'

���

��

%�

�-

%�

��%

`

���

%`

���

%[

&�%

�?�

��-%

�?�

��%�

?��Y

%�?�

�@%�

���

%����&

����

%�=

���

�%�

<�

�Y%

(a) D

nd41

,HM

EC

,HS

MM

,HS

MM

tube

%:�

� %:

��+

(b) A

G04

449,

AG04

450,

AG09

309,

AG09

319

,AG

1080

3,A

oAF%

;�

�$C

,GM

1286

4,G

M12

865, ,

GM

1287

5,H

Ac%

��p,

,H

CP

Epi

C,H

EE

piC

,HFF

%�++

��"c

%���

�^%�

�+

, HPA

F,H

PF,

HR

PE

piC

,HV

MF%

:�

�+�

��o,

RP

TEC

,_�

=?�

=j�

�%_

?��@

(c) G

M10

847,

GM

1551

0,G

M18

505,

GM

1852

6��%[

��@

'Y�%

[�

�'^'

'%[

��'

�'�%

;

��

%+<

&��

%����=

�kq%<

&�:

�<�

%]@-

�>��

��^

#$R

`^�

--@%

�^#$

RO

0179

4,H

CF,

Jurk

at �%�

:

�P,S

KM

C(e

) BJ%

�!

���%

[�

^�''

^%�

=�

%<

(f) G

liob{

�%[

���

@'�%

[�

��@'

�%

���+

qj��

��+qj

��bl

,GM

1280

1,G

M12

872,

GM

1287

3 ,G

M12

874,

GM

1923

8,G

M19

239,

GM

1924

0(h

) Lar

v���

����

����

����

%��r

v���

����

����

����

�?

��

�|

�?

�!�q

}no

n-C

hIP

*exte

rnal

sou

rce

*

*****

**

***

**

**

CA���?� DHS GR`��| HiC Lamina

Mononucleosomes :?�[:�!{�������

Salt fractionated chromatin

othe

rsHuman Fly Worm

=

��Y

�%=

�@%<

��%�

��%�

?���

%�?�

��

HB

ME

C,H

CFa

a,H

CM

Page 24: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 2. Genomic coverage of various histone modifications in the three species. Red lines indicate the ten marks analyzed in common in the three species and their cumulative coverage. The color bars underneath each plot indicate whether data is available for a given histone modification in that sample (K562 in human, L3 in fly and L3 in worm). In all three organisms, a large fraction of the assembled and mappable genome is occupied by at least one of the profiled histone modifications. For example, after excluding genomic regions that are unas-sembled or unmappable, the ten histone modifications profiled in at least one cell type or develop-mental stage of all three organisms display enrichments covering 56% of the human genome, 74% of the fly genome, and 92% of the worm genome (see Methods). The higher genomic coverage by histone modifications in worms and flies compared to humans is likely related to both the smaller genome size (which allows better sequencing coverage) and the higher proportion of protein-coding regions in the genomes of these organisms.

Cov

erag

e of

map

pabl

e ge

nom

e (%

)

4020

6080100

0

4020

6080100

0

4020

6080100

0

Cumulativecoverage

Individualcoverage

HumanFlyWorm

Cov

erag

e of

map

pabl

e ge

nom

e (%

)

Cov

erag

e of

map

pabl

e ge

nom

e (%

)

H3K

4me3

H3K

27ac

H3K

9ac

H3K

9me3

H3K

4me2

H3K

79m

e2H

3K36

me3

H3K

4me1

H3K

27m

e3H

2A.Z

H3K

9me1

H4K

20m

e1

H3K

4me3

H3K

27ac

H3K

9ac

H3K

9me3

H3K

4me2

H3K

79m

e2H

3K36

me3

H3K

4me1

H3K

27m

e3H

2AV

H3K

9me1

H4K

20m

e1H

3K79

me3

H3K

9me2

H3K

36m

e1H

3K27

me1

H3K

79m

e1H

3K23

acH

4K16

acH

3K9a

cS10

phH

4K8a

cH

2Bub

H3K

27m

e2

H3K

4me3

H3K

27ac

H3K

9ac

H3K

9me3

H3K

4me2

H3K

79m

e2H

3K36

me3

H3K

4me1

H3K

27m

e3H

TZ1

H3K

79m

e3H

3K9m

e2H

3K36

me1

H3K

27m

e1H

3K79

me1

H3K

23ac

H3K

18ac

H3K

36m

e2H

3K36

ac

Worm

Fly

Human

Page 25: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

H3K

4me3

UW

H3K

4me3

Bro

adH

3K4m

e3 B

road

H3K

4me3

Bro

ad�<<�}�oximal DHS

�� 0 � 4

���

46

8������[���@-@

H3K4me1

�� 0 � 4��0

�4

6

������[���@-@

H3K4me1

��

�� ��

�� ��

1 � 3 4

��0

�4

68 �����������<

H3K4me1

�� 0 1 � 3 4

���

46

8 human IMR90

H3K4me1

�<<�>q���{���<

�� 0 � 4

������

H3K4me1

�� 0

00

� 4

����Y

H3K4me1

�� 1 � 3 4

���^��

H3K4me1

0 1 � 3 4

������

H3K4me1

H3K

4me3

H3K

4me3

H3K

4me3

H3K

4me3

�<<�}�o�q��{���<�;

�� 0 1 �

��0

�4

�{"���

H3K4me1

���^ ^�^ ��^ ��^

01

�3 �{"�<�

H3K4me1

���^ ^�^ ��^ ��^

��0

�4

6 wor����

H3K4me1

�^�Y ^�Y ��Y��00

0

1�

34

worm L3

H3K4me1

�<<�>q���{���<�;

�� 0 1 �

�����@

H3K4me1

���^ ^�^ ��^ ��^

���

H3K4me1

���^ ^�^ ��^ ��^

��Y��

H3K4me1

�^�Y ^�Y ��Y

�����

H3K4me1

Supplementary Fig. 3. H3K4me1/3 enrichment patterns in regulatory elements defined by DNase I hypersensitive sites (DHS) or CBP-1 binding sites. ChIP signal enrichment (log2

scale) of H3K4me3 vs. H3K4me1 at TSS-proximal (<250 bp) and TSS-distal (>1 kb) DHSs (blue: human, orange: fly) or CBP-1 binding sites (green: worm). The labels “UW” and “Broad” denote, two ENCODE data generation centers: University of Washington and Broad Institute, respectively. The median H3K4me3 enrichment values are marked by horizontal dashed lines. The numbers (e.g., 1/3.5) on the dashed lines denote the linear fold enrichment of H3K4me3 at the median TSS-distal site relative to the median TSS-proximal site (e.g., for UW GM12878 data, the median TSS-proximal DHS has 32.2 fold higher enrichment than the median TSS-distal DHS). To characterize the chromatin features that can distinguish enhancers from promoters, we

Page 26: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

�� 0 � 4 6

0.0

0.3

human K562

��&�-�!

dens

ity DHSp300

�� 0 � � 3

0.0

0.4

fly LE

��&�-�!

dens

ity

�� 0 � 4

0.00

^��Y

worm EE

��&�-�!

dens

ity

�� 0 � 4 6 8

0.00

^��^

human GM12878

��&�-�!

dens

ity

0 � � 3

0.0

0.4

fly S2

��&�-�!

dens

ity

0 � � 3

0.0

0.4

worm L3

��&�-�!

dens

ity

Supplementary Fig. 4. Distribution of H3K27ac enrichment levels at putative enhancers.One key observation is that H3K27ac density displays a wide range of enrichment levels at enhancers in all three species. This result in human cells is consistent whether using enhancers identified as DHSs (solid line) or as p300 binding sites (dashed line). X-axis is log2 ChIP fold enrichment of H3K27ac at +/-500 bp of enhancer sites.

compared the enrichment patterns of H3K4me1 and H3K4me3 at TSS-proximal and TSS-distal DHSs in human and fly. Since DHS data were not available in worm, we examined the binding sites of CBP-1, the worm ortholog of human p300/CBP47. We observe that DHSs (or CBP-1 sites) generally fall into two clusters for all cell types: those proximal to TSSs constitute a cluster with stronger H3K4me3 signal (left column), while those distal to TSSs constitute a cluster showing stronger H3K4me1 signals (right column). Although the enrichment levels of H3K4me1/3 at these sites vary considerably between cell types, platforms (array vs. sequencing), and even different laboratories for the same cell type, these two marks clearly distinguish TSS-distal sites (enhancers) from TSS-proximal sites (promoters). Here, we define putative enhancer sites to be DHSs (or CBP-1 sites) with the H3K4me1/3 pattern that is characteristic of TSS-distal sites, as determined by a supervised machine learning approach (see Methods).

Page 27: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

1.0

1.2

1.4

1.1

1.3

1.0

1.2

1.4

1.6

1.8

1.0

1.2

1.4

1.6

1.0

1.2

1.4

1.1

1.3

1.1

1.3

1.5

0.9

1.2

1.4

1.6

1.1

1.3

1.5

1.7

1.2

1.4

1.6

1.8

2.0

2.2

1.6

1.8

2.0

2.2

1.4

1.6

1.8

2.0

2.0

2.5

3.0

2.2

2.4

2.6

2.8

3.2

3.0

Supplementary Fig. 5. Relationship of enhancer H3K27ac levels with expression of nearby genes. Average expression of genes that are close (vary between 5 to 200 kb) to enhancers with high (top 40%; red line) or low (bottom 40%; blue line) levels of H3K27ac in various human, fly and worm samples. As a control, we analyzed TSS-distal DHSs (in human and fly) or CBP-1 sites (in worm) that are not classified as enhancers (dashed black). RPKM: reads per kilobase per million. Error bar: standard error of the mean. The proximity of genes to enhancers with higher H3K27ac levels is positively correlated with expression, in a distance-dependent manner. This observation is consistent across multiple cell-types and tissues in all three species.

Page 28: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

0.0 0.5 1.0 1.5 2.0 2.5 3.0

1

2

3

4

Supplementary Fig. 6. Correlation of enrichment of 82 histone marks or chromosomal proteins at enhancers with STARR-seq defined enhancer strength in fly S2 cells. Histonemarks or chromosomal proteins whose enrichment is anti-correlated (top bar plot) or positively correlated (bottom bar plot) with STARR-seq enrichment level48, which is a proxy for enhancer strength based on the ability of ~600 bp DNA fragments to stimulate transcription from an associated promoter. All histone lysine acetylation marks, including H3K27ac, show a moderate but significant positive correlation with enhancer activity (p<0.01).

Page 29: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

−3000 −1000 0 1000 3000

−2−1

01

23

Human GM

relative position, bp

Z−sc

ore

−3000 −1000 0 1000 3000

−3−2

−10

12

3

Fly S2

relative position, bp

Z−sc

ore

−3000 −1000 0 1000 3000

−2−1

01

2Worm L3

relative position, bp

Z−sc

ore

Supplementary Fig. 8. Nucleosome occupancy at enhancers. Average nucleosome density profiles were computed for DHS and CBP-1-identified enhancers in human GM12878 cells, fly S2 cells, and worm L3. In each case nucleosome occupancy was inferred from MNase-seq data obtained for the corresponding or similar cell types50-52. Green dashed lines indicate centers of the enhancer regions. In general, nucleosome occupancy is lower in the broad region around enhancers (roughly ±2 kb) but with a local (±400 bp) increase at the centers of the enhancers (defined by DHS and CBP-1 peaks). This pattern is similar to that reported for non-promoter regulatory sequences in the human genome53. In human, this increase is characterized by two well-positioned nucleosomes flanking the nucleosome-depleted region at the enhancer center, and this feature may be indicative of the presence of relatively unstable nucleosomes (this may be occluded by lower resolution in fly and worm).

High H3K27ac

Low H3K27ac

Enr

ichm

ent,

log2

sca

le

�� �� 0 � 3

0.2

0.6

��^

H3.3, Human

�� �� 0 � 3

0.4

0.8

���

H3.3, Fly

Distance to enhancer,centered at DHS, kb

Supplementary Fig. 7. Nucleosome turnover at enhancers. ChIP signal enrichment (log2 scale) of H3.3 around enhancers in human Hela-S3 cells (ChIP-seq) and fly S2 cells (ChIP-chip), which is known to be present in regions with higher nucleosome turnover46. We found that the local increase in nucleosome occupancy (see Supplemental Fig. 8) indeed overlaps with the peak of H3.3 enrichment, and that the levels of H3.3 and H3K27ac enrichment are correlated. These findings, together with the specific patterns of nucleosome occupancy49, indicate that increased nucleosome turnover is one of the major characteristics of chromatin at active enhancers.

Page 30: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

��^^^ ��^^^ ^ �^^^ �^^^ �^^^

�^��

^�^

^��

^��

^��

bulk.mM80 in fly.S2

Position relative to center, bp

��v

alue

��^^^ ��^^^ ^ �^^^ �^^^ �^^^

�^�^

��^

�^�

^�^^

^�^�

bulk.mM150.600 in fly.S2

Position relative to center, bp

��v

alue

Supplementary Fig. 9. Salt extracted fractions of chromatin at enhancers. The average profiles are shown for the 80 mM (left) and 150-600 mM (right) salt fractions in fly S2 cells54.The 80 mM fraction is enriched with easily mobilized nucleosomes and preferentially represents accessible, “open” chromatin. The 150-600 mM fraction derives from a 600 mM extraction following a 150 mM extraction and therefore is depleted of such nucleosomes, representing more compacted, “closed” chromatin. We note that the peak in the 80 mM fraction at enhancers indi-cates that these loci are enriched in relatively unstable nucleosomes, which is in agreement with our observation of increased nucleosome turnover at these sites (see Supplementary Fig. 7).

Page 31: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 10. Chromatin environment described by histone modification and binding of chromosomal proteins at enhancers. Z-score of average ChIP fold enrichment of some key histone modifications and chromosomal proteins around +/-2 kb of the center of enhancers with high H3K27ac or low H3K27ac. Most active histone marks in addition to H3K4me1 show stronger enrichment at enhancers with high H3K27ac, including H3K4me2 and many H3 lysine acetylation marks. H3K27me3 is generally not enriched at enhancers except in embryonic stem cells such as human H1-hESC, where there is also enrichment of binding by the Polycomb protein EZH2. Enhancers with high H3K27ac have a higher prevalence of PolII bind-ing in all three species, consistent with the elevated level of H3K4me3 at these sites compared to that in enhancers with low H3K27ac. H2A.Z is enriched in human enhancers, but the H2Av ortholog is not enriched in any fly samples. These configurations are likely to be correlated to the generation of short transcripts from these sites, as reported recently55.

Page 32: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

1.0

1.2

1.4

1.6

1.0

1.2

1.4

1.6

1.0

1.1

1.2

1.3

1.4

1.0

1.1

1.2

1.3

1.41.5

1.6

0.0

0.2

0.4

0.6

0.8

1.0

1.2

H3.3 in human.Hela

Position relative to center, bp

-3 -1-2 0 21 3

top 40%bottom 40%

Enr

ichm

ent,

log2

-sca

le

Supplementary Fig. 11. Analysis of p300-based enhancers from human. As an additional validation, we repeated all key analyses in human cell lines using the population of p300-based enhancers; the general trends remain the same as found for the DHS-based sites in corresponding human cell lines. a, Average expression of genes that are close to enhancers with high (top 40%; red line) or low (bottom 40%; blue line) levels of H3K27ac in human cell lines. As a control, we analyzed TSS-distal p300 binding sites that are not classified as enhancers (dashed black). RPKM: reads per kilobase per million. Error bar: standard error of the mean. b, ChIP signal enrichment (log2 scale) of H3.3 around p300-based enhancers in human Hela-S3 cells. c, Z-score of average ChIP fold enrichment of some key histone modifications and chromosomal proteins around +/-2 kb of the center of high H3K27ac or low H3K27ac enhancers in human cell lines. The observed patterns at human enhancers hold even if the putative enhancers were centered at p300 sites instead of DHSs.

Page 33: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 12. Promoter architecture. Comparative analysis of promoter architecture as shown by average profiles of H3K4me3 (human GM12878, fly L3 and worm L3), DNase hypersensitivity sites (DHS), GC content and nascent transcrip (GRO-seq, in human IMR90 and fly S2 cells) over all TSSs which do not overlap with any neighboring genes within 1 kb upstream. Human promoters exhibit a bimodal enrichment for H3K4me3 and other active marks, immediately upstream and immediately downstream of the TSSs. In worm, we observed weak H3K4me3 enrichment upstream of TSS (as defined using recently published capRNA-seq data36).In contrast, fly promoters clearly exhibit a unimodal distribution of active marks, downstream of the TSSs. Since genes that have a neighboring gene within 1 kb of a transcription start or end site were removed from this analysis, any bimodal histone modification pattern cannot be attributed to nearby genes. This difference is also not explained by chromatin accessibility determined by DNase I hypersensitivity (DHS), or by fluctuations in GC content around the TSSs, although the GC profiles are highly variable across species.

Page 34: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

�10

12

3

Distance to TSS (kb)

Distance to TSS (kb)

020

4060

80

Total RNA�seq

Distance to TSS (kb)

sense ELantisense, EL

sense EEantisense EE

Distance to TSS (kb)

Sca

led

sign

al(Z

�sco

re)

Sca

led

sign

al

�1 �0.5 0 0.5 1

a b

c

Aver

age

Tota

l RN

A-se

q sig

nal

Unidirectional

transcriptionB

idirectional transcription

�10

12

3

(Z�s

core

)

�2 �1 0 1 2

Supplementary Fig. 13. Relationship between sense-antisense bidirectional transcription and H3K4me3 at TSS. a, The majority of human expressed genes have sense-antisense bidirec-tional transcription at the TSS. Even in the small number of TSSs with unidirectional transcrip-tion, there is still a clear signal of bimodal H3K4me3 enrichment. The GC content pattern is the same as in expressed genes with unidirectional and bidirectional transcription. b, An average plot summarizing the results in panel A. c, Independently generated total RNA-seq data generated by modENCODE using fly early (2-4 hours) and late (14-16 hours) embryos support the observation made in fly S2 GRO-seq data that there is no evidence of strong antisense transcription at fly promoters.

Page 35: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 14. Profiles of the well positioned nucleosome at Transcription Start Sites (TSSs) of protein coding genes. Nucleosome frequency profiles (as represented as Z-scores) around TSSs for human CD4+ T cells, fly EE and worm adults. The profiles were computed for highly expressed (top 20% in all three species) and lowly expressed genes (bottom 20% for fly and human, and bottom 40% for worm; see Methods). The main features of the ‘classic’ nucleosome occupancy profile56, comprising a nucleosome-depleted region at the TSS flanked by well-positioned nucleosomes (‘-1’, ‘+1’, etc.) are observed in expressed genes for all three organisms. The similarity between the profiles, especially in the context of different nucleotide compositions of the TSS-proximal regions across the species, underscores the importance and conservation of specific nucleosome placement for gene regulation.

Z-sc

ore

Distance to TSS (bp)-1,500 1,500-500 5000

Human

Fly

Worm

NDR

NDR

NDR

-1

+1

-1+1

-1

+1

-101234

-2

0

2

4

-10123

-2

Z-sc

ore

Z-sc

ore

HighExpression

Expression

Expression

Low

HighLow

HighLow

Page 36: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

a b c

−1500 −500 0 500 1500

−2−1

01

23

45

human CD4+ T−cells (Schones'08)human CD4+ T−cells (Valouev'11)

Distance to TSS (bp)

Z-sc

ore

−1500 −500 0 500 1500

−3−2

−10

12

3

Distance to TSS (bp)

Z-sc

ore

−1500 −500 0 500 1500

−2−1

01

2

worm WOworm ME

Distance to TSS (bp)

Z-sc

ore

Supplementary Fig. 15. Nucleosome occupancy profile at TSS based on two MNase-seq datasets for each species. Comparison of the nucleosome occupancy profiles at TSS obtained in different studies. Two TSS-proximal profiles are plotted for each species: a, obtained for human CD4+ T-cells57,58, b, obtained for fly embryos (this study) and S2 cells59, and, c, obtained for worm embryos60 and whole adult organisms51. All data were uniformly processed as described in Methods. The nucleosomal profiles at TSS, obtained under different biochemical conditions (e.g., degree of chromatin digestion or salt concentration used to extract mono-nucleosomes), may vary substantially even for the same cell type, due to interplay between nucleosome stability and observed occupancy61,62.

Supplementary Fig. 16. Association between repressive chromatin and lamina-associated domains (LADs). Heatmap of the enrichment of H3K9me3 and H3K27me3 in scaled LADs (upper panels: long LADs as defined as the 20% longest LADs; lower panel: short LADs as defined as the 20% shortest LADs). Each row represents H3K27me3 or H3K9me3 enrichment in each LAD. (H3K9me3 and H3K27me3 from IMR90, LADs from Tig3 for human; H3K9me3 and H3K27me3 from EE, LADs from MXEMB for worm). Our examination reveals a simple relationship that depends on LAD size. In human fibroblasts, long LADs (> 1 Mb) tend to be found in H3K9me3-enriched heterochromatic regions, with sharp enrichment of H3K27me3 at the LAD boundaries; in contrast, short LADs (< 1 Mb) are enriched for H3K27me3 across the domain with a low occupancy of H3K9me3. Although LADs are generally smaller in worm, we observe a similar though weaker trend, with longer LADs more frequently enriched for H3K9me3.

Page 37: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

22,600 kb 22,800 kb 23,000 kb 23,200 kb 23,400 kb 23,600 kbFly Chr3L

Heterochromatin

b

Gene

Dam-LAM/Dam

H3K9me3

LAD relative coordinate (kb)

Log2

ChI

P/in

put

−0.8

−0.4

0.0

0.2

��^ ��^ 0 30 60

HumanEZH2

K27me3

LAD relative coordinate (kb)

−0.2

−0.1

0.0

0.1

0.2

−0.040.00

0.040.08

��^ ��^ 0 �^ 20

FlyE(Z)PC

K27me3

LAD relative coordinate (kb)

−0.7

−0.5

−0.3

�Y ���Y 0 ��Y Y

Worm

K27me3

(H3K

27m

e3) (E

Z/Pc)

LAD LAD LAD

a

Supplementary Fig. 17. Chromatin context in lamina-associated domains. a, Average profiles of H3K27me3 (NHLF in human, Kc in fly, and EE in worm) and EZH2/E(Z) (NHLF in human and Kc in fly) at LAD boundaries. LADs are enriched for H3K27me3 and are often flanked by E(Z) in fly or EZH2 (the ortholog) in human, both H3K27 methyltransferases and members of Polycomb Repressive Complex 2. b, Genome browser shot of the profiles fly Kc Lam DamID in chromosome 2L. The levels of Lam (DamID) are negative in heterochromatin (gray block enriched with H3K9me3 in Kc). Y-axis: log2 enrichment of Lam (DamID) normalized by controls (first row); log2 ChIP/input (second row) in the range of -3 and 3.

Page 38: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Genes

IMR90NHLF

OsteoblIMR90NHLF

OsteoblTig3 LAD

000 kb 115,000 kb 116,000 kb 117,000 kb 118,000 kb 119,000 kb 120,000 kb

Long LAD Short LAD

aK9

me3

K27m

e3

b

Human Chr2

H3K9me3 LAD size

4M(bp)

Average H3K9me3

−0.5 0.5 (z-score)

LAD

siz

e

100kbscaled LAD

0

0

Hum

an

-1.5

1.5

z-sc

ore(

log2

ChI

P/in

put)

0 4M (bp) −0.5 1.0 (z-score)

10kbscaled LAD

0

LAD

siz

e

Wor

m

1.2M

genome-wide average

c

human worm

01

23

z−sc

ore(

log2

ChI

P/in

put)

human worm

−0.5

0.51.0

1.52.0

human worm

−1.0

−0.5

0.00.5

1.0

human worm

0.00.5

1.01.5

2.0

Shor

t LAD

sLo

ng L

ADs

genome-wideaverage

z−sc

ore(

log2

ChI

P/in

put)

H3K9me3H3K27me3

Supplementary Fig. 18. Chromatin context in short and long lamina-associated domains (LADs) in three organisms. In human, long LADs tend to be localized in H3K9me3-enriched heterochromatic regions, with sharp enrichment of H3K27me3 at the LAD boundaries; in contrast, short LADs are enriched for H3K27me3 across the domain body with a low occupancy of H3K9me3. a, Example of typical patterns of H3K9me3 and H3K27me3 profiles of fibroblast or fibroblast-like cell lines in human long LADs (dark blue) and short LADs (light blue) from fibroblast Tig3. Y-axis: fold enrichment of ChIP/input in the range of 0 and 2. The enrichment of H3K27me3 is observed at the boundaries of long LADs (red dashed). b, Relationship between the level of H3K9me3 enrichment in LADs and the size of LADs in human and worm. Longer LADs are more frequently enriched for H3K9me3. Left: heatmap of the enrichment in scaled LADs (upper: human; lower: worm). Each row represents H3K9me3 enrichment in each LAD, sorted by the size of LAD. Middle: LAD domain size. Right: average H3K9me3 values in each LAD. Genome-wide average values are indicated by the green dashed lines. In human, H3K9me3 is often associated with LADs of > ~1.2 Mb. (Human: IMR90 for H3K9me3 and Tig3 for LADs; worm: early embryos for H3K9me3 and mixed embryos for LADs). c, The average enrichments in long LADs (top 20% in LAD size) and short LADs (bottom 20% in LAD size). In short LADs, in all three species, the levels of H3K27me3 enrichment are higher than the genome-wide average, whereas the levels of H3K9me3 enrichment are low. In long LADs, the levels of H3K9me3 enrichment are higher than the genome-wide average. ) No long LADs in the H3K9me3 hetero-chromatic regions were reported in fly data generated from Kc167 cells using DamID6; however, this may reflect the specific cellular origin (plasmatocyte) of Kc167 cells63, as well as the fact that these analyses do not include the simple tandem repeats that constitute the majority of fly heterochromatin (Human: IMR90 for H3K9me3/H3K27me3 and Tig3 for LADs; fly data from Kc; worm: early embryos for H3K9me3/H3K27me3 and mixed embryos for LADs).

Page 39: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

IMR90

BJ

LADCall

0 50 100 150 200 243Chr 2 Position (MB)

Kc

LADCall

0 5 10 15 20Chr 2R Position (MB)

0.0

0.5

1.0

1.5

Nor

mal

ized

RP

KM

IMR90 BJ Kc

LAD Random LAD Random LAD Random

EarlyLate

Human FlyLate RPKM

0 1

a

b

Supplementary Fig 19. LAD domains are late replicating. a, Distribution of late replicating domains and LADs across human chromosome 2 and fly chromosome 2R. Late replicating domains (red) are shown in human and fly cell lines by plotting the relative RPKM of BrdU-enriched fractions from late S-phase binned across 50 kb (human) and 10 kb (fly) windows. LADs for human40 and fly39 are indicated in black. b, LADs are enriched for late replicating sequences and depleted of early replicating sequences. Boxplots depicting the genome-wide distribution of early (green) and late (red) replicating sequences in LAD and random domains for human and fly cell lines. Thus one consistent feature between fly and human is the association of LADs with late replication, which suggests that LADs generally reside in (and may promote) a repressive chromatin environment that impacts both transcription and DNA replication.

Page 40: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 20. DNA shape conservation in nucleosome sequences. a, Consensus ORChID2 profiles as a measure of DNA shape (y-axis) in 146-148 bp nucleosome-associated DNA sequences identified by paired-end MNase-seq in human, fly and worm. A larger value of DNA shape (y-axis) corresponds to a wider minor groove and weaker negative charge. ORChID2 provides a quantitative measure of DNA backbone solvent accessibility, minor groove width, and minor groove electrostatic potential. DNA shape analysis can reveal structural features shared by different sequences that are not apparent in the typical approach of evaluating mono- or di- nucleotide frequencies along nucleosomal DNA, since it can capture structural features in regions with degenerate sequence signatures. Consensus shape profiles, obtained by averaging individual nucleosome-bound sequences aligned by the inferred dyad position, are highly similar across species. b, Normalized correlation (similarity) of ORChID2 profile of individual nucleosome-associated sequence with the consensus profile (see Methods and Supplementary Fig. 21). The result indicates that the proportion of sequences that are positively correlated with the consensus profile is higher than would be expected by random in all three species, and this proportion is higher in worm than in fly and human.

Human Fly Worm0.32

0.30

0.28

0.26

0.24

0.22

0 50 100 150Position in consensus profile

DN

A sh

ape

a b

0.00

0.01

0.02

0.03

Human Fly Worm

Nor

mal

ized

sim

ilarit

y w

ith c

onse

nsus

Page 41: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

GC = 0.35 - 0.40

GC = 0.40 - 0.45

GC = 0.45 - 0.50

GC = 0.50 - 0.55

0.200.220.240.260.28

0.260.280.300.32

0.300.310.320.330.340.35

0.330.340.350.360.370.380.39

0 50 100 150Position in consensus profile

DN

A s

hape

flyhuman worm

a

b

fly (EE)

human (lymphoblastoid) - GSM907783

worm (mixed embryo) - GSM514735

worm (whole organism) - GSM777719

worm (whole organism) - GSM807109

0.00.20.40.60.81.0

0.00.20.40.60.81.0

0.00.20.40.60.81.0

0.00.20.40.60.81.0

0.00.20.40.60.81.0

�^�� 0.0 0.2 0.4Correlation with consensus profile

Cum

ulat

ive

fract

ion

genome

random

0 1

1

0

Correlation with consensus

Cum

ulat

ive

fract

ion

of fr

agm

ents

genome

random

No influence of sequence on positioning

Positioning is completelyexplained by sequence

with individual MNase-seq regions length 146-148 bp

Supplementary Fig. 21. DNA shape in nucleosome sequences. a, Consensus ORChID2 profiles in 146-148 bp nucleosome-associated DNA sequences stratified by average GC content. The subset of GC content sequences used here (GC: 35 % to 55 %) represents 74.4%, 65.6%, and 64.6% of the human, fly, and worm reads, respectively. Note that the worm dataset used here (GSM807109) is a representative of three independent worm MNase-seq datasets. b, An outline of the analysis procedure used to evaluate individual sequence DNA shape similarity to the consensus (upper panel) and continuous distributions of similarity scores (lower panel).

Page 42: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 22. Chromatin context of broadly expressed and specifically expressed genes. ChIP signal enrichment (log2 scale) of different marks is plotted against gene expression (log2 scale) for protein coding genes with low expression variability (black points), and high expression variability (colored points), across cell types. ChIP signal enrichment is calculated over the whole gene body for H3K36me3, H3K79me2, H4K20me1 and H3K27me3, within 500 bp of the TSS for H3K4me1 and H3K4me3, and over the gene body excluding the first 500 bp at the 5' end for PolII. Different columns show different cell types as labeled. The expressed gene cut-off of RPKM=1 is denoted with vertical dashed lines. In fly LE and worm L3, most ChIP enrichment and depletion signals appear to be significantly lower in specifically expressed genes. This observation is understood to be due to the different sensitivities of RNA-seq and ChIP-seq protocols when examining samples with heterogeneous cell types. Genes expressed in only a sub-population of the cells can be identified as expressed in RNA-seq assays, but the chromatin signal from the sub-population of cells with these genes actively expressed is washed out by the signal from the remaining cells, where these genes are silent. In human and fly cell lines and worm early embryos, the majority of the marks show similar enrichment and depletion patterns

Page 43: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

for broadly and specifically expressed genes. Two particular marks show consistent differences in these cell types: H3K4me1 levels are observed to be on average higher in specifically expressed genes relative to broadly expressed genes in both species, consistent with the role of H3K4me1 in marking cell-type specific regulatory regions. On the other hand, H3K36me3 levels are observed to be on average lower in specifically expressed genes relative to broadly expressed genes. This is consistent with previously reported results in fly Kc cells64 and worm early embryos7. We verified that the difference in H3K36me3 levels is not due to differences in gene structure such as gene length, first intron length or exon coverage (Supplementary Fig. 23; See Supplementary Fig. 24 for an example.) However, the differences are much larger in whole animals than in cell lines, suggesting that the observation may be a consequence of sampling mixed cell types, where a large number of transcripts could come from genes enriched for H3K36me3 in only a small fraction of the cells in the population. Consistent with this hypoth-esis, chromatin signals associated with active gene expression are lower over specifically expressed genes compared to broadly expressed genes in these samples. It is possible that different modes of transcriptional regulation are being utilized, e.g., it is hypothesized that in worm EE, H3K36me3 marking of germline- and broadly expressed genes is carried out by the HMT MES-4, providing epigenetic memory of germline transcription, whereas specifically expressed genes are marked co-transcriptionally by the HMT MET-17. Profiling of chromatin patterns and gene expression in additional individual cell types is needed to test whether cellular heterogeneity fully accounts for our observations.

Page 44: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

2468

expression

low

exp

ress

ion

hig

h ex

pres

sion

sho

rt ge

nes

long

gen

es

sho

rt fir

st in

tron

long

firs

t int

ron

low

exo

n co

vera

ge

hig

h ex

on c

over

age

human GM12878

010000200003000040000

length

05000

100001500020000

first intron len.

0.00.20.40.60.8

exon cover.

��0123

H3K36me3

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

����

012345

(n=

676)

(n=

101)

(n=1

437)

(n=

134)

(n=1

086)

(n=

120)

(n=1

027)

(n=

115)

(n=1

133)

(n=

84)

(n=

980)

(n=

151)

(n=1

017)

(n=

117)

(n=1

096)

(n=

118)

PolII

2468

low

exp

ress

ion

hig

h ex

pres

sion

sho

rt ge

nes

long

gen

es

sho

rt fir

st in

tron

long

firs

t int

ron

low

exo

n co

vera

ge

hig

h ex

on c

over

age

fly BG3

2000400060008000

10000

01000200030004000500060007000

0.00.20.40.60.8

�^�Y0.00.51.01.52.0

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

0.00.51.01.52.0

(n=

624)

(n=

105)

(n=1

324)

(n=

47)

(n=

984)

(n=

95)

(n=

964)

(n=

57)

(n=1

120)

(n=

79)

(n=

828)

(n=

73)

(n=

815)

(n=

84)

(n=1

133)

(n=

68)

2468

1012 lo

w e

xpre

ssio

n

hig

h ex

pres

sion

sho

rt ge

nes

long

gen

es

sho

rt fir

st in

tron

long

firs

t int

ron

low

exo

n co

vera

ge

hig

h ex

on c

over

age

worm EE

2000400060008000

10000

0500

10001500

0.20.40.60.8

����

012

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific

broa

dsp

ecifi

cbr

oad

spec

ific��

��02

(n=

712)

(n=

461)

(n=1

940)

(n=

115)

(n=1

342)

(n=

425)

(n=1

310)

(n=

151)

(n=1

404)

(n=

297)

(n=1

248)

(n=

279)

(n=1

231)

(n=

208)

(n=1

421)

(n=

368)

Supplementary Fig. 23. Structure and expression of broadly and specifically expressed genes. Boxplots show gene expression [log2(RPKM+1)], gene length, first intron length, exon coverage and H3K36me3 and PolII enrichment (log2) for broadly and specifically expressed genes. The genes are categorized according to expression, gene length, first intron length and exon coverage in the four vertical panels respectively.

Page 45: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

CG34304 CG6465 Skeletor CG31373 tomboy20 CG14691 fau Rbp1

CG14681 Takr86C Cap-H2 fau Rbp1

CG14681 Cap-H2 fau Rbp1

hth

CG14683 Cap-H2 fauGene

S2 RNA-seqKc RNA-seq

BG3 RNA-seqS2 H3K4me3Kc H3K4me3

BG3 H3K4me3S2 H3K36me3Kc H3K36me3

BG3 H3K36me3S2 H3K4me1Kc H3K4me1

BG3 H3K4me1

6,400 kb 6,500 kb 6,600 kb303 kbchr3R

Supplementary Fig. 24. Example genome browser screenshot showing broadly and specifically expressed genes. The fly hth gene is specifically expressed in BG3 cells. Cell-type-specific enrichment of H3K4me3 at the TSS and of H3K4me1 over the gene body of hthis observed, whereas H3K36me3 is depleted over the gene independent of its expression level.

Page 46: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

H3K79me2

H3K4me1

H3K4me3

H3K4me2

H3K27ac

H3K36me3

H3K9me3

H3K27me3

H3K79

me2

H3K4m

e1

H3K4m

e3

H3K4m

e2

H3K27

ac

H3K36

me3

H3K9m

e3

H3K27

me3

intra-species variance

inter-species varianceh

fw

1.0

0.5

0.0

-0.5

-1.0

0.14

0.12

0.10

0.08

0.06

0.04

0.02

0

r var(r)

Supplementary Fig. 25. Genome-wide correlations between histone modifications show intra- and inter- species similarities and differences. Upper left half: pairwise correlations between marks in each genome, averaged across all three species. Lower right half: pairwise correlation, averaged over cell types and developmental stages, within each species (pie chart), inter-species variance (grey-scale background) and intra-species variance (grey-scale small rectangles) of correlation coefficients. h,f,w indicate human, fly and worm, respectively. Modifi-cations enriched within or near actively transcribed genes are consistently correlated with each other in all three organisms. In contrast, we found major differences in the relationships between two repressive marks associated with silenced genes: H3K27me3, which is associated with Polycomb (Pc)-mediated silencing, and H3K9me3, associated with heterochromatin. This is indicated by the high variance of correlation between these two marks among the different organisms (black cell). In worm, these two marks are strongly correlated at both developmental stages analyzed, whereas their correlation is low in human (r = -0.24 ~ -0.06) and fly (r = -0.03 ~ -0.1).

Page 47: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Supplementary Fig. 26. A Pearson correlation matrix of histone marks in each cell type or devel-opmental stage. Each entry in the matrix is the pairwise Pearson correlation between marks across the genome, computed using 5 kb bins across the mappable regions excluding regions with no signal at all(ChIP fold enrichment over input <1 for all 8 marks). This numerical matrix shows the same difference in H3K9me3 and H3K27me3 as reported in Supplemental Fig. 25. Note that embryonic cell/sample types (H1-hESC in human, LE in fly, and EE in worm) display a higher correlation between H3K4me1 and repressive mark H3K27me3, compared to cells or samples that are more differentiated (GM12878 and K562 in human; L3 and AH in fly; L3 in worm).

Page 48: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

�� �^�Y ^ ^�Y �Value

Color Key

a

&Y������&Y�����A�&Y���< &Y�����A������<�=;;Y&Y�����A������<�&��Y &Y����+�]_�&Y����+[���@-@��+�]_������<��+[���@-@���&�^���&Y�����&�^���&Y�����&'��������<���&�^��������<���&�-�!�����<���������<���&����&Y���=;;Y&Y���&��Y;�����<���&'�!&Y���=: �ol II&Y�����&����&Y�����&'�!&Y�����&����&Y�����&�-�!&Y�����&�����]_�&Y����+@[���@-@���&'�![���@-@���&����[���@-@���&����[���@-@���&�����]_������<���&���������<���&�����-���<���&�����]_�&Y�����&����[���@-@���&����[���@-@��� ��&Y����� �������<���&������-���<���&������]_�&Y�����&������]_�&Y�����&�����[���@-@���&�����[���@-@���&������]_�[���@-@���&�-�![���@-@���&-'���&Y�����&-'��������<���&-'���&Y�����&�-����]_�&Y�����&�-���&Y�������&Y����^^[���@-@�����[���@-@���&�-����]_�[���@-@���&�-��������<��� ���-���<���&�-����]_������<���&�-��������<�����&Y�����&'��������<���&'���[���@-@���&'���

�����

&Y�

��

��

�&

Y���

��

�&

Y���

<

&

Y���

��

��

����

<

�=;

;

Y&

Y���

��

��

����

<

�&�

�Y

&Y�

��

�+�

]_

�&

Y���

+[

���

@-@�

+�]

_�

���

��<

�+

[�

��@-

@��

�&�^

���

&Y�

���

�&�^

���

&Y�

���

�&'�

���

����

<

���&

�^�

���

����

<

���&

�-�!

���

��<

��

��

����

<

���&

����

&Y�

��=

;;

Y

&Y�

��&

��

Y;�

����

<

���&

'�!

&Y�

��=

:

��{

�??&

Y���

��&

����

&Y�

���

�&'�

!&

Y���

��&

����

&Y�

���

�&�-

�!&

Y���

��&

����

�]_

�&

Y���

+@[

���

@-@�

��&

'�!

[�

��@-

@��

�&��

��[

���

@-@�

��&

����

[�

��@-

@��

�&��

���]

_�

���

��<

��

�&��

���

����

<

���&

����

�-�

��<

��

�&��

���]

_�

&Y�

���

�&��

��[

���

@-@�

��&

����

[�

��@-

@��

� ��

&Y�

���

� ��

���

��<

��

�&��

���

�-�

��<

��

�&��

���

�]_

�&

Y���

��&

���

���]

_�

&Y�

���

�&��

���

[�

��@-

@��

�&��

���

[�

��@-

@��

�&��

���

�]_

�[

���

@-@�

��&

�-�!

[�

��@-

@��

�&-'

���

&Y�

���

�&-'

���

���

��<

��

�&-'

���

&Y�

���

�&�-

���

�]_

�&

Y���

��&

�-�

��&

Y���

���

�&

Y���

�^

^[

���

@-@�

���

�[

���

@-@�

��&

�-�

���]

_�

[�

��@-

@��

�&�-

���

���

��<

��

� ��

�-�

��<

��

�&�-

���

�]_

��

����

<

���&

�-�

���

����

<

����

�&

Y���

��&

'���

���

��<

��

�&'�

��[

���

@-@�

��&

'���

Supplementary Fig. 27. a. (See below for legend)

Page 49: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

LE H3K9me3LE H3K9me2AH H3K9me2L3 H3K9me3L3 H3K9me2AH H3K9me3AH H3K36me2LE H3K36me2AH H3K36me3LE H3K36me3L3 H3K36me3AH H4K16acL3 H4K16acLE H2AVAH H2AVL3 H2AVLE H3K4me3L3 H3K4me3AH H3K4me3AH H3K4me2LE H3K4me2L3 H3K4me2LE H3K27me3L3 H3K27me3L3 H3K27me2LE H3K27me2L3 H1LE H1AH H1L3 H3K27acLE H3K27acLE H3K4me1AH H3K9acLE H3K9acL3 H3K9acL3 H3K9acS10PLE H3K79me1L3 H3K79me1AH H3K4me1L3 H3K4me1LE H3K79me2L3 H3K79me3L3 H3K79me2LE H4K20me1AH H2BubAH H4K20me1L3 H4K20me1L3 H2BubLE H2BubAH H3K27me1L3 H3LE H4LE H4K16acLE RNA Pol IILE H3K9acS10PAH H3K36me1AH H3K18acAH H4K8acLE H3K36me1L3 H3K36me1L3 H3K9me1L3 H4K8acLE H3K18acLE H3K9me1L3 H3K27me1AH H3LE H3AH H3K9me1AH H3K23acLE H3K23acL3 H3K23ac

FlyLE

H3K

9me3

LE H

3K9m

e2A

H H

3K9m

e2L3

H3K

9me3

L3 H

3K9m

e2A

H H

3K9m

e3A

H H

3K36

me2

LE H

3K36

me2

AH

H3K

36m

e3LE

H3K

36m

e3L3

H3K

36m

e3A

H H

4K16

acL3

H4K

16ac

LE H

2AV

AH

H2A

VL3

H2A

VLE

H3K

4me3

L3 H

3K4m

e3A

H H

3K4m

e3A

H H

3K4m

e2LE

H3K

4me2

L3 H

3K4m

e2LE

H3K

27m

e3L3

H3K

27m

e3L3

H3K

27m

e2LE

H3K

27m

e2L3

H1

LE H

1A

H H

1L3

H3K

27ac

LE H

3K27

acLE

H3K

4me1

AH

H3K

9ac

LE H

3K9a

cL3

H3K

9ac

L3 H

3K9a

cS10

PLE

H3K

79m

e1L3

H3K

79m

e1A

H H

3K4m

e1L3

H3K

4me1

LE H

3K79

me2

L3 H

3K79

me3

L3 H

3K79

me2

LE H

4K20

me1

AH

H2B

ubA

H H

4K20

me1

L3 H

4K20

me1

L3 H

2Bub

LE H

2Bub

AH

H3K

27m

e1L3

H3

LE H

4LE

H4K

16ac

LE R

NA

Pol

IILE

H3K

9acS

10P

AH

H3K

36m

e1A

H H

3K18

acA

H H

4K8a

cLE

H3K

36m

e1L3

H3K

36m

e1L3

H3K

9me1

L3 H

4K8a

cLE

H3K

18ac

LE H

3K9m

e1L3

H3K

27m

e1A

H H

3LE

H3

AH

H3K

9me1

AH

H3K

23ac

LE H

3K23

acL3

H3K

23ac

b

�� �^�Y 0 ^�Y 1Value

Color Key

Supplementary Fig. 27. b. (See below for legend)

Page 50: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

EE HCP-3L3 H3K27me3EE H3K27me3EE H3K9me3L3 H3K9me3L3 H3L3 H4K20me1L3 RPC-1L3 DPY-27EE H4K20me1L3 H3K79me1L3 H3K9me2EE H3K9me2L3 H3K9me1EE H3K9me1L3 HPL-2L3 LIN-52L3 LIN-35L3 LIN-54L3 LIN-9L3 DPL-1L3 LIN-37L3 EFL-1L3 H3K36me3EE H3K36me3L3 H3K79me2L3 H3K79me3EE RNA Pol IIEE H3K79me2EE H3K79me3L3 H3K36me1EE H3K36me1L3 H3K4me1EE H3K4me2EE H3K4me1L3 H4K8acEE H3K79me1EE MRG-1EE H4K8acL3 H3K27acL3 EPC-1EE H3K4me3EE H3K27acL3 H3K9acS10L3 H3K4me3

Worm

EE

HC

P-3

L3 H

3K27

me3

EE

H3K

27m

e3E

E H

3K9m

e3L3

H3K

9me3

L3 H

3L3

H4K

20m

e1L3

RP

C-1

L3 D

PY

-27

EE

H4K

20m

e1L3

H3K

79m

e1L3

H3K

9me2

EE

H3K

9me2

L3 H

3K9m

e1E

E H

3K9m

e1L3

HP

L-2

L3 L

IN-5

2L3

LIN

-35

L3 L

IN-5

4L3

LIN

-9L3

DP

L-1

L3 L

IN-3

7L3

EFL

-1L3

H3K

36m

e3

EE

RN

A P

ol II

EE

H3K

36m

e3L3

H3K

79m

e2L3

H3K

79m

e3

EE

H3K

79m

e2E

E H

3K79

me3

L3 H

3K36

me1

EE

H3K

36m

e1L3

H3K

4me1

EE

H3K

4me2

EE

H3K

4me1

L3 H

4K8a

cE

E H

3K79

me1

EE

MR

G-1

EE

H4K

8ac

L3 H

3K27

acL3

EP

C-1

EE

H3K

4me3

EE

H3K

27ac

L3 H

3K9a

cS10

L3 H

3K4m

e3

-1 -0.5 0.5 10correlation

c

Supplementary Fig. 27. Genome-wide correlation of ChIP-seq datasets for human, fly and worm. a, The genome-wide correlations between chromatin marks and factors for human. Each entry in the heatmap shows the Pearson correlation coefficient between a pair of marks/factors, computed using 30-kb bins across the whole genome. The dendrogram shows the hierarchical clustering result based on Pearson correlation coefficients. Datasets marked with 'UW' were generated at the University of Washington; the rest were generated at the Broad Institute. b,c,The same correlation matrices for fly and worm, respectively. The resolution for calculating correlation is 10 kb.

Page 51: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

040

80

[HFW]_Heterochromatin2[hFW]_Heterochromatin1

[hFW]_Polycomb[HFW]_ReprPromoter

[HFW]_Bivalent[HFW]_Elongation2[HFW]_Elongation1[HFW]_IntronWeak

[HfW]_Intron[HFw]_Elongation

[HFW]_ElonRegulatory[hFW]_Regulatory1

[HFW]_Regulatory2[HFW]_Regulatory4

[hFW]_Promoter[HFW]_Regulator5

[HFW]_Low3[HFW]_Low2[HFW]_Low1

[hfW]_Heterok9k27[HFW]_Heterochromatin2[hFW]_Heterochromatin1

[hFW]_Polycomb[HFW]_ReprPromoter

[HFW]_Bivalent[HFW]_Elongation2[HFW]_Elongation1[HFW]_IntronWeak

[HfW]_Intron[HFw]_Elongation

[HFW]_ElonRegulatory[hFW]_Regulatory1

[HFW]_Regulatory2[HFW]_Regulatory4

[hFW]_Promoter[HFW]_Regulator5

16 Low signal 3

15 Low signal 2

14 Low signal 1

13 Heterochrom

atin 212 H

eterochromatin 1

11 PC

repressed 210 P

C repressed 1

9 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 G

ene, H4K

20me1

5 Transcription 5', 24 Transcription 5', 13 E

nhancer 22 E

nhancer 11 P

romoter

16 Low signal 3

15 Low signal 2

14 Low signal 1

13 Heterochrom

atin 212 H

eterochromatin 1

11 PC

repressed 210 P

C repressed 1

9 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 G

ene, H4K

20me1

5 Transcription 5', 24 Transcription 5', 13 E

nhancer 22 E

nhancer 11 P

romoter

16 Low signal 3

15 Low signal 2

14 Low signal 1

13 Heterochrom

atin 212 H

eterochromatin 1

11 PC

repressed 210 P

C repressed 1

9 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 G

ene, H4K

20me1

5 Transcription 5', 24 Transcription 5', 13 E

nhancer 22 E

nhancer 11 P

romoter

H3K4

me3

H3K4

me1

H3K2

7ac

H3K7

9me2

H4K2

0me1

H3K3

6me3

H3K2

7me3

H3K9

me3

[HFW]_Low3[HFW]_Low2[HFW]_Low1[hfW]_Heterok9k27[HFW]_Heterochromatin2[hFW]_Heterochromatin1[hFW]_Polycomb[HFW]_ReprPromoter[HFW]_Bivalent[HFW]_Elongation2[HFW]_Elongation1[HFW]_IntronWeak[HfW]_Intron[HFw]_Elongation[HFW]_ElonRegulatory[hFW]_Regulatory1[HFW]_Regulatory2[HFW]_Regulatory4[hFW]_Promoter[HFW]_Regulator5

0 0.6 1

a

b

Hum

an H1

Hum

an GM

12878

Fly LEFly L3

Worm

EE

Worm

L3hiHMM

Seg

way

[HFW]_Low3[HFW]_Low2[HFW]_Low1

[hfW]_Heterok9k27

Supplementary Fig. 28. Comparison of chromatin state maps generated by hiHMM and Segway. a,Histone modification enrichment in each of the 20 chro-matin states (i.e., emission matrix) identified by Segway. Color represents relative enrichment of a histone mark (scaled between 0 and 1). Letters in brackets preceding each state name indicate coverage within each species (H: human, F: fly, W: worm; upper case: high coverage, lower case: low coverage). In general, Segway identified similar types of states as hiHMM (Fig. 2b). b, This figure shows the percentage of each hiHMM region that is occupied by each Segway state. The blue-red color bar shows percentage of overlap. There is a strong overlap between certain hiHMM-states and certain Segway states, indicating the identification of analogous states. For example, the Promoter state in hiHMM (state 1) strongly overlaps with Segway state "[hFW]_Promoter". Thus the two algorithms give largely concordant results.

Page 52: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

0 40 80

0 1A

ctiv

epr

omot

e rA

ctiv

epr

omot

erfla

nkin

gA

ctiv

een

hanc

e rW

eak

enha

nce r

Enh

ance

rwith

only

H3K

27a c

Enh

ance

rin

trans

crib

e dre

gion

1E

nhan

ceri

ntra

nscr

ibe d

regi

on2

Tran

scrib

ed1

Tran

s cri b

ed2

Tran

scrib

edw

ithen

richm

ent i

ni n

trons

1T r

ansc

ribed

with

enric

hmen

t in

intr o

ns2

3en

dof

t rans

crib

edge

n es

13

end

oftra

nscr

ibed

gene

s2

Biv

alen

tpoi

sed

p rom

ote r

Poly

com

bre

pres

sed

wit h

H3K

4me1

Poly

com

bre

pres

sed

Het

eroc

hrom

a tin

Het

eroc

hrom

atin

with

H3K

27m

e3Lo

wsi

gnal

Act

ive

p rom

ote r

Act

ive

prom

oter

flank

ing

Act

ive

enha

nce r

Wea

ken

hanc

e rE

nhan

cerw

ithon

lyH

3K27

a cE

nhan

ceri

ntra

nscr

ibe d

regi

on1

Enh

ance

rin

trans

crib

e dre

gion

2T r

ansc

ribed

1Tr

ans c

ri bed

2Tr

ansc

ribed

with

enric

hmen

tin

i ntro

ns1

Tran

scrib

edw

ithen

richm

ent i

nin

tr ons

23

end

oftra

nsc r

ibed

gen e

s1

3en

dof

trans

crib

edge

nes

2B

ival

entp

oise

dp r

omot

e rPo

lyco

mb

repr

esse

dw

it hH

3K4m

e1Po

lyco

mb

repr

esse

dH

ete r

ochr

oma t

inH

eter

ochr

omat

inw

ithH

3K27

me3

Low

sign

al

16 Low signal 315 Low signal 214 Low signal 113 Heterochromatin 212 Heterochromatin 111 PC repressed 210 PC repressed 19 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 Gene, H4K20me15 Transcription 5', 24 Transcription 5', 13 Enhancer 22 Enhancer 11 Promoter

16 Low signal 315 Low signal 214 Low signal 113 Heterochromatin 212 Heterochromatin 111 PC repressed 210 PC repressed 19 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 Gene, H4K20me15 Transcription 5', 24 Transcription 5', 13 Enhancer 22 Enhancer 11 Promoter

16 Low signal 315 Low signal 214 Low signal 113 Heterochromatin 212 Heterochromatin 111 PC repressed 210 PC repressed 19 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 Gene, H4K20me15 Transcription 5', 24 Transcription 5', 13 Enhancer 22 Enhancer 11 Promoter

H3K

4me3

H3K

4me1

H3K

27ac

H3K

79m

e2H

4K20

me1

H3K

36m

e3H

3K27

me3

H3K

9me3

Active promoterActive promoter flankingActive enhancerWeak enhancerEnhancer with only H3K27acEnhancer in transcribed region 1Enhancer in transcribed region 2Transcribed 1Transcribed 2Transcribed with enrichment in introns 1Transcribed with enrichment in introns 23’end of transcribed genes 13’end of transcribed genes 2Bivalent poised promoterPolycomb repressed with H3K4me1Polycomb repressedHeterochromatinHeterochromatin with H3K27me3Low signal

Human H1-hESC Human GM12878

Fly LE Fly L3

worm EE worm L3

chromHMM states

chromHMM states

a b

Supplementary Fig. 29. Comparison of chromatin state maps generated by hiHMM and ChromHMM. a, Histone modification enrichment in each of the 19 chromatin states (i.e., emis-sion matrix) identified by ChromHMM. Color represents emission probability given the enrich-ment of a histone mark. In general, ChromHMM identified similar types of states as did hiHMM (Fig. 2b). b, This figure shows the percentage of each hiHMM region that is occupied in each ChromHMM state. The blue-red color bar shows percentage of overlap. There is a strong overlap between certain hiHMM-states and certain ChromHMM states, indicating analogous states. For example, the Promoter state in hiHMM (state 1) strongly overlap with ChromHMM states "Active promoter" and more weakly with "Active promoter flanking". Thus the two algorithms give largely concordant results.

Page 53: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

16 Low signal 315 Low signal 214 Low signal 113 Heterochromatin 212 Heterochromatin 111 PC repressed 210 PC repressed 19 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 Gene, H4K20me15 Transcription 5', 24 Transcription 5', 13 Enhancer 22 Enhancer 11 Promoter

16 Low signal 315 Low signal 214 Low signal 113 Heterochromatin 212 Heterochromatin 111 PC repressed 210 PC repressed 19 Transcription 3', 38 Transcription 3', 27 Transcription 3', 16 Gene, H4K20me15 Transcription 5', 24 Transcription 5', 13 Enhancer 22 Enhancer 11 Promoter

1_P

rom

oter

2 _G

ene

3 _E

nhan

cer

4_In

tron

5 _H

4K16

ac6 _

Rep

ress

ed7_

Het

18 _

Het

29 _

low

1 _P

rom

oter

2 _G

ene

3 _E

nhan

cer

4_In

tron

5 _H

4K16

ac6 _

Rep

ress

ed7_

Het

18 _

Het

29 _

low

1_A

ctiv

e_P

rom

oter

2_W

eak_

Pro

mot

er3 _

Pois

ed_P

rom

oter

4 _S

trong

_Enh

ance

r5_

Stro

ng_E

nhan

cer

6_W

eak_

Enh

ance

r7 _

Wea

k_E

nhan

cer

8_In

sula

tor

9_Tx

n_Tr

ansi

tion

10_T

xn_E

long

atio

n11

_Wea

k_Tx

n12

_Rep

ress

ed13

_Het

eroc

hrom

/lo14

_Rep

etiti

ve/C

NV

15_R

epet

itive

/CN

V

BG3 S2

BG3 S2

H1-hESC

GM12878

HumanFly

Kharchenko et al 9 state model (fly) Ernst et al (human)

hiH

MM

chr

omat

in s

tate

map

0 30 0 60

0 60

0 30

0 300 30

Supplementary Fig. 30. Comparison of hiHMM-based chromatin state model with species-specific models. The fly hiHMM segmentations for LE and L3 were compared to the 9-state model established for fly S2 and BG3 cell lines by Kharchenko10 et al. The human hiHMM segmentations for GM12878 and H1-hESC were compared to their respective chroma-tin maps produced by ChromHMM45. The color bar shows the percentage of a given hiHMM states that is occupied by each species-specific model state. Our chromatin state maps are in agreement with existing species-specific chromatin maps for human45 and fly10, even though these state maps were generated using a different number of histone marks and different cell types.

Page 54: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Human GM12878Human H1-hESC

17 Unmappable

1 Promoter2 Enhancer 13 Enhancer 24 Transcription 5' 15 Transcription 5' 26 Gene, H4K20me17 Transcription 3' 18 Transcription 3' 29 Transcription 3' 3

10 PC repressed 111 PC repressed 212 Heterochromatin 113 Heterochromatin 214 Low signal 115 Low signal 216 Low signal 3

17 Unmappable

1 Promoter2 Enhancer 13 Enhancer 24 Transcription 5' 15 Transcription 5' 26 Gene, H4K20me17 Transcription 3' 18 Transcription 3' 29 Transcription 3' 3

10 PC repressed 111 PC repressed 212 Heterochromatin 113 Heterochromatin 214 Low signal 115 Low signal 216 Low signal 3

Fly L3 Worm L3

Supplementary Fig. 31. Distribution of genomic features in each hiHMM-based chroma-tin state. Each entry in the heatmap represents the relative enrichment of that state in a given genomic feature. The scale was normalized between 0 to 1 for each column. In general, similar combinations of histone marks are enriched in each state across the three species (Fig. 2b), and each state is enriched for similar genomic features (above).

Page 55: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

CEBPBCHD1CHD2CTCFEZH2HDAC2KDM5AP300

POLR2ARBBP5

humanCEBPBCHD2CTCFEZH2P300

POLR2ASMC3

GM

BEAFCTCFGAFHP1AHP1BHP1CHP2HP4KDM1AKDM2PCPOFPSCRING

RNA Pol IIRPD3SU(HW)ZW5

fly

HP1AHP1BHP1CHP2KDM1AKDM4APOF

RNA Pol IIRPD3SFMBTSMC3

L3

KDM2MRG�1

RNA Pol II

worm

AMA�1ASH�2CBP�1DPY�26DPY�28DPY�30HCP�4IMB�1MAU�2MIS�12MIX�1

PQN�85RPC�1SDC�1SFC1SMC�4SMC�6SWD3TAF�1TAG�315TBP�1

MxE

DPL�1DPY�27EFL�1EPC�1HDA�1HPL�2LET�418LIN�35LIN�37LIN�52LIN�53LIN�54LIN�61LIN�9NURF�1RPC�1ZFP�1

L3

�1 0 1 2

H1LE

EE1

Prom

oter2

Enhancer1

3E

nhancer24

Transcription5'1

5Transcription

5'26

Gene,H

4K20m

e17

Transcription3'1

8Transcription

3'29

Transcription3'3

10P

Crepressed

111

PC

repressed2

12H

eterochromatin

113

Heterochrom

atin2

14Low

signal115

Lowsignal2

16Low

signal3

z score ofchIP fold enrichment

3 4

Supplementary Fig. 32. Enrichment of chromosomal proteins in individual chro-matin states generated by hiHMM. ChIP enrichment Z-scores computed based on genome-wide mean and standard deviation, for individual non-histone proteins in human H1-hESC and GM12878, fly late embryo (LE) and third instar larvae (L3) and worm early embryo (EE), mixed embryo (MxE) and larvae stage 3 (L3).

Page 56: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Hum

an H1-hE

SC

Hum

an GM

12878Fly L3W

orm L3

1P

romoter

2E

nhancer13

Enhancer2

4Transcription

5'15

Transcription5'2

6G

ene,H4K

20me1

7Transcription

3'18

Transcription3'2

9Transcription

3'310

PC

repressed1

11P

Crepressed

212

Heterochrom

atin1

13H

eterochromatin

214

Lowsignal1

15Low

signal216

Lowsignal3

17U

nmappable

1P

romoter

2E

nhancer13

Enhancer2

4Transcription

5'15

Transcription5'2

6G

ene,H4K

20me1

7Transcription

3'18

Transcription3'2

9Transcription

3'310

PC

repressed1

11P

Crepressed

212

Heterochrom

atin1

13H

eterochromatin

214

Lowsignal1

15Low

signal216

Lowsignal3

17U

nmappable

Supplementary Fig. 33. Enrichment of transcrip-tion factor binding sites in individual chromatin states generated by hiHMM. Each entry in the heatmap represents the relative enrichment of that state in the entire set of TF binding sites. ChIP enrich-ment scaled from 0 to 1, for individual transcription factors in human H1-hESC and GM12878, fly third instar larvae (L3) and worm larvae stage 3 (L3). Most transcription factors are associated either with the promoter states or with Pc repression 1, but some are broadly distributed. Note that many transcription factors are associated with state 10, PC Repressed 1, and relatively few with state 11, PC Repressed 2. This result supports that there are two distinct types of Polycomb-associated repressed regions: strong H3K27me3 accompanied by marks for active genes or enhancers (state 10) and strong H3K27me3 without active marks (state 11) (see Fig. 2b).

Page 57: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

a

b c

Supplementary Fig. 34. Coverage by hiHMM-based states in mappable regions of individual chromosomes in human (a), fly (b), and worm (c). Fly annotated heterochromatic arms (chr2LHet, chr2RHet, chr3LHet, chr3RHet and chrXHet) are disproportionally enriched for heterochromatin states and "Low signal 3" state, which is consistent with our understanding of the marks enriched in these regions. Similarly, chromosome four is enriched in heterochromatin. Furthermore, in worm, a higher proportion of chrX is covered by the H4K20me1-enriched state 6 in L3 compared to EE, which is consistent with the role of H4K20me1 in worm chrX dosage compensation at L3.

2 Enhancer 11 Promoter

3 Enhancer 24 Transcription 5’ 15 Transcription 5’ 26 Gene, H4K20me17 Transcription 3’ 18 Transcription 3’ 29 Transcription 3’ 3

10 PC repressed 111 PC repressed 212 Heterochromatin 113 Heterochromatin 214 Low signal 115 Low signal 216 Low signal 3

Page 58: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Worm L3chrI

chrII

chrIII

15080 kb

15280 kb

13790 kb

17500 kb

20930 kb

17720 kb

chrV

chrX

chrIV

chrXHet

chrX

chr4chr3RHet

chr3R

chr3LHetchr3L

chr2RHet

chr2R

chr2LHet

chr2LFly L3

210 kb

22430 kb1360 kb

2520 kb27910 kb2560 kb

24550 kb

3290 kb

21150 kb370 kb23020 kb

Human H1- hESCchr6

chr7

chr8

chr9

171.2 Mb

159.2 Mb

146.4 Mb

141.3 Mb

H3K9me3 ChIP/input fold enrichment

low high Heterochromatin call

Supplementary Fig. 35. Heterochromatin domains defined based on H3K9me3-enrichment in worm, fly and human. H3K9me3 profiles from worm L3 (upper), fly L3 (middle) and human H1-hESC (bottom) in heatmaps and the identified heterochromatic regions (enriched for H3K9me3; green) are shown. For human, examples are shown for selected chromosomes. Note centromeric regions of human chromosomes are poorly assembled (regions marked with grey above H3K9me3 enrichment heatmap). Significantly enriched regions are determined using a Poison model for ChIP and input tag distributions with a window size of 10 kb (fly and worm) or 100 kb (human), using SPP26 (see Methods). The majority of the H3K9me3-enriched domains in fly L3 and human H1-hESC are concentrated in the pericentric regions, while in worm L3 they are distributed in subdomains throughout the chromosome arms.

Page 59: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

[0 - 4.5]H3K9me3

Riddle et alHet call

Genes

0 kb 400 kb 800 kb 1,200 kb 1,600 kb 2,000 kb 2,400 k2.5 Mb

H3K9me3

Genes

20,600 kb 21,000 kb 21,400 kb 21,800 kb 22,200 kb 22,600 kb 2

H3K9me3

Genes

b 22,200 kb 22,600 kb 23,000 kb 23,400 kb 23,800 kb 24,200 kb

chr2R

chr2L

chr3L

2.5 Mb

2.5 Mb

[0 - 4.5]

[0 - 4.5]

Riddle et alHet call

Riddle et alHet call

This study Riddle et al.

Supplementary Fig. 36. Borders between pericentric heterochromatin and euchromatin in Fly L3 from this study compared to those based on H3K9me2 ChIP-chip data. The screen-shots are from near the pericentric heterochromatic regions in fly chr2R (upper), chr2L (middle) and chr3L (bottom). H3K9me3 ChIP-seq profiles (ChIP/input fold enrichment) are shown in the top rows. Heterochromatin (Het) calls, the regions identified as significantly enriched for H3K9me3 with a 10 kb window, are shown in the middle (see Methods). The end or start sites of continuous H3K9me3 enrichment regions are marked with red triangles. Blue triangles indicate identified borders between pericentric heterochromatin and euchromatin from Riddle et al.13

based on H3K9me2 ChIP-chip profiles. The boundaries between pericentric heterochromatin and euchromatin on each fly chromosome are consistent with those from the lower resolution studies using H3K9me2.

Page 60: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

H1-hESCHSMMtube

HSMMNH-A

NHDF-AdNHLF

OsteoblIMR90

LAD call

H3K9me3 ChIP/input fold enrichment > 1

0 243 Mb

embryonic

non-fibroblastfibroblast

H3K9me3 and LADs in human chr2

U

Supplementary Fig. 37. Distribution of H3K9me3 in different cells in human chr2 as an example. U indicates an unassembled region in a centromere. In human differentiated cell types, the regions enriched with H3K9me3 (red blocks) often form large domains (up to ~ 11 Mb in size) distant from the centromere and are cell type-specific. The LADs identified in fibroblasts (Tig3) correlate with the regions enriched for H3K9me3 in other fibroblast or fibroblast-like cell types.

Page 61: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

1 Mb

H3K36me3 2 --1 _

Chr. II

Up 2 --1 _

Kim 2 --1 _

Ab 2 --1 _

Up 2 --1 _

Kim

AM 2 --1 _

2 --1 _

arm center

DNA DNAH3K9me3 H3K27me3

N2

met-2set-25

mes-2

0.92 0.74

0.87

0.82 0.85 0.86

0.73 0.70 0.73

0.49 0.42 0.47

0.86 0.90

0.97

H3K9me3 H3K27me3Up Kim AM Up Kim

H3K9

me3

Up

Ab

Kim

H3K2

7me3 AM

Up

Ab

Kim

H3K

9me3

H3K

27m

e3a

b c

Supplementary Fig. 38. Evidence that overlapping H3K9me3 and H3K27me3 ChIP signals in worm are not due to antibody cross-reactivity. a,b, ChIP-chip experiments were performed from early embryo extracts with three different H3K9me3 antibodies (from Abcam, Upstate, and H. Kimura) and three different H3K27me3 antibodies (from Active Motif, Upstate, and H. Kimura). The H3K9me3 antibodies show similar enrichment profiles (a) and high genome-wide correlation coeffi-cients (b). The same is true for H3K27me3 antibodies. Between the H3K9me3 and H3K27me3 ChIP results, there is a significant overlap, especially on chromosome arms, resulting in relatively high genome-wide correlation coefficients. The Abcam and Upstate H3K9me3 antibodies showed low-level cross-reactivity with H3K27me3 on dot blots24, and the Abcam H3K9me3 ChIP signal over-lapped with H3K27me3 on chromosome centers. The Kimura monoclonal antibodies against H3K9me3 and H3K27me3 showed the least overlap and smallest genome-wide correlation. In ELISA assays using histone H3 peptides containing different modifications, each Kimura H3K9me3 or H3K27me3 antibody recognized the modified tail against which it was raised and did not cross-react with the other modified tail65,66, providing support for their specificity. c, Specificity of the Kimura antibodies was further analyzed by immunostaining germlines from wild type, met-2 set-25 mutants (which lack H3K9 HMT activity15), and mes-2 mutants (which lack H3K27 HMT activity67). Staining with anti-HK9me3 was robust in wild type and in mes-2, but undetectable in met-2 set-25. Staining with anti-HK27me3 was robust in wild type and in met-2 set-25, but undetectable in mes-2. Finally, we note that the laboratories that analyzed H3K9me3 and H3K27me3 in other systems used Abcam H3K9me3 (for human and fly) and Upstate H3K27me3 (for human), and in these cases observed non-overlapping distributions. Chandra et al. also reported non-overlapping distributions of H3K9me3 and H3K27me3 in human fibroblast cells using the Kimura antibodies66. The overlapping distributions that we observe in worms using any of those antibodies suggest that H3K9me3 and H3K27me3 occupy overlapping regions in worms. Those overlapping regions may exist in individual cells or in different cell sub-populations in embryo and L3 preparations.

Page 62: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

Human Fly Worm

H3K9me3H3K27me3H3K4me3H3K27ac

H3K79me2H3K36me3

H3K9me1H4K20me1

Euchromatin Heterochromatin Euchromatin Heterochromatin Euchromatin HeterochromatinExp Silent Exp Silent Exp Silent Exp Silent Exp Silent Exp Silent

othe

rac

tive

repr

essi

ve

TSS TES

1 kb 1 kbscaledgene body

500 bp 500 bp-2 0 2

z-score (ChIP/input fold enrichment)

Supplementary Fig. 39. Gene body plots of several histone modifications for euchromatic and heterochromatic genes. Heterochromatic genes are defined as genes in H3K9me3-enrichment regions detected with 10 kb (fly and worm) or 100 kb (human) window (see Methods). Expressed or silent genes are defined using RNA-seq data (see Methods; human K562, fly L3 and worm L3). For human and fly, H3K9me3 is depleted at the TSS of expressed genes in the heterochromatic regions. In worm, H3K9me3 is predominantly confined to gene bodies, with overall lower levels in promoter regions.

GeneKIAA1551

hiHMM

H3K4me3H3K4me1H3K27ac

H3K79me2

H3K27me3H3K9me3RNA-seq

32,110 kb 32,130 kb

H3K36me3H4K20me1

35 kb

RpL5 CG17490 CG17493

22,430 kb 22,440 kb20 kb

R155.1 R155.2

1,955 kb 1,970 kb20 kb

F53A3.7 R155.3

Human Fly Worm2 Enhancer 11 Promoter

3 Enhancer 24 Transcription 5’ 15 Transcription 5’ 26 Gene, H4K20me17 Transcription 3’ 18 Transcription 3’ 29 Transcription 3’ 3

10 PC repressed 111 PC repressed 212 Heterochromatin 113 Heterochromatin 214 Low signal 115 Low signal 216 Low signal 3

Supplementary Fig. 40. The chromatin state map around three examples of expressed genes in or near heterochromatic regions in human GM12878 cells, fly L3, and worm L3. Expressed genes are enriched for H3K4me3 (state 1, red) at their promoters and exhibit a transcription state across the body of the gene. While H3K9me3 is a hallmark of heterochromatin in all three species (resulting in a heterochromatin state across the body of the gene in this domain; states 12-13, grey), H3K27me3 is also enriched in worm heterochromatin.

Page 63: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

�� �� �� �� 0 � �

��������0��

�����

�� �� 0 �

��������

0��

�����

�� �� �� 0 �

����0��� ��� �

�� �� �� 0 �

��

��

0

� �����

Human Worm centerWorm armFly

a(10 kb)

�� �� 0 �

��

��

��

0

�����

�� �� �� �� 0 �

��

��

��

0

�����

�������� 0 � �

��

0

� ��� �

�������� 0 �

������0��� �����

b(1 kb)

K9m

e3 (l

og2

ChI

P/in

put)

K27me3(log2 ChIP/input)

K9m

e3 (l

og2

ChI

P/in

put)

K27me3(log2 ChIP/input)

Human Worm centerWorm armFly

K27me3 K27me3K27me3

K27me3 K27me3 K27me3

Supplementary Fig. 41. Relationship between enrichment of H3K27me3 and H3K9me3 in three species. a, The distribution of H3K27me3 and H3K9me3 enrichment for human K562 (left most), fly L3 (second left), chromosome arms (second right) and centers (right most) of worm EE. After binning log2 of ChIP over input fold enrichment profiles with a 10 kb bin, the density was calculated as a frequency of bins that fall in the area in the scatter plot (darker grey at a higher frequency). r indicates Pearson correlation coefficients between binned H3K27me3 fold enrichment (log2) and H3K9me3 fold enrichment (log2). In worm chromosome arms the correlation between H3K27me3 and H3K9me3 is much higher than in human and fly. In addi-tion, in worm H3K27me3 is highly correlated with H3K9me3 in arms, but not in centers. b, Thesame figure as above using a bin size of 1 kb. The high correlation in worm is driven by the overlapping distributions of H3K27me3 and H3K9me3 on the arms of worm chromosomes. The vast majority of H3K9me3 resides in the arms (Fig. 3a), and H3K27me3 is enriched in subdo-mains across the entire chromosome, including the arms.

Page 64: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

������H3K9me3

H3K27me3H3K36me3

�rm Center �rm with pairing center

������H3K9me3

H3K27me3H3K36me3

telomere telomerepericentric

heterochromatin

Centromerepericentric

heterochromatin

������H3K9me3

H3K27me3H3K36me3

telomere telomerepericentric

heterochromatin

Centromerepericentric

heterochromatin

Human

Fly

Worm

~10M

~1M

~1M

Supplementary Fig. 42. Organization of silent domains. Schematic diagrams of the distribu-tions of silent domains along the chromosomes in three species. Upper: human H1-hESC, Middle: fly S2, Lower: worm EE. In human and fly, the majority of the H3K9me3-enriched domains are located in the pericentric regions (as well as telomeres), while the H3K27me3-enriched domains are distributed along the chromosome arms. H3K27me3-enriched domains are negatively correlated with H3K36me3-enriched domains, although in human, there is some overlap of H3K27me3 and H3K36me3 in bivalent domains. CENP-A resides at the centromere. In contrast, in worm the majority of H3K9me3-enriched domains are located in the arms, while H3K27me3-enriched domains are distributed throughout the arms and centers of the chromo-somes and are anti-correlated with H3K36me3-enriched domains. The inset below worm shows the typical patterns of H3K36me3, H3K27me3, H3K9me3 and CENP-A in the arms lacking the pairing center. This inset highlights that virtually all H3K9me3-enriched domains reside within H3K27me3-enriched domains. In arms and centers, domains that are permissive for CENP-A incorporation generally reside within H3K27me3-enriched domains.

Page 65: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

0 50 100 150 200

0.0

0.5

1.0

1.5

2.0

2.5

3.0

distance from boundary [kb]

obse

rved

/ ex

pect

ed

PromoterEnhancer 1Enhancer 2

0 50 100 150 200

0.0

0.5

1.0

1.5

2.0

2.5

3.0

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 50 100 150 200

0.0

0.5

1.0

1.5

2.0

2.5

3.0

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Transcription 3, 1Transcription 3, 2Transcription 3, 3

0 50 100 150 2000.

00.

51.

01.

52.

02.

53.

0distance from boundary [kb]

obse

rved

/ ex

pect

ed

Pc repressed 1Pc repressed 2

0 50 100 150 200

0.0

0.5

1.0

1.5

2.0

2.5

3.0

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Heterochromatin 1Heterochromatin 2

0 50 100 150 200

0.0

0.5

1.0

1.5

2.0

2.5

3.0

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Low signal 1Low signal 2Low signal 3

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

ed

PromoterEnhancer 1Enhancer 2

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

edTranscription 3, 1Transcription 3, 2Transcription 3, 3

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Pc repressed 1Pc repressed 2

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Heterochromatin 1Heterochromatin 2

0 5 10 15 20

01

23

45

67

distance from boundary [kb]

obse

rved

/ ex

pect

ed

Low signal 1Low signal 2Low signal 3

Human Fly

Supplementary Fig. 43. Chromatin context of topological domain boundaries. Observed occurrences of chromatin states near Hi-C defined topological domain boundaries normalized to random expectation. The two species generally show similar enrichment of active states near domain boundary and depletion of low signal and heterochromatin states. In human H1-hESC, the Pc repressed 1 state, which largely marks bivalent regions, is also observed to be enriched near domain boundaries.

Page 66: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

PromoterTranscription 3, 1Transcription 3, 2Transcription 3, 3Transcription 5, 1Transcription 5, 2Enhancer 1Enhancer 2Gene, H4K20me1Pc repressed 1Pc repressed 2Low signal 1Heterochromatin 1Heterochromatin 2Low signal 3Low signal 2

�����������<������^�����}�{��q!�{�>���q��

Active TranscriptionActive Enhancer

Pc RepressedHeterochromatin

Lo��<q���{

015

00

����-��n = 123

n = 502n = 178

n = 1522

0.0 0.4 0.8

coverage in domain

dom

ain

size

[kb]

02

4

3050 / 8387242 / 1025

���-�������355 / 3205

������������expressed/all genes:

expr

essi

on

PromoterTranscription 5, 1Transcription 3, 1Transcription 3, 3Transcription 3, 2Enhancer 1Enhancer 2Transcription 5, 2Gene, H4K20me1Pc repressed 1Pc repressed 2Heterochromatin 1Heterochromatin 2Low signal 3Low signal 1Low signal 2

+{"�����������'���}�{��q!�{�>���q��

Active TranscriptionActive Enhancer

Activ��?������richPc Repressed

HeterochromatinLo��<q���{

020

0

n = 377����'�

n = 214n = 107

n = 34n = 345

0.0 0.4 0.8

coverage in domain

dom

ain

size

[kb]

04

8

2783 / 3702�YY�����^

�YY�����@@�'@���'Y@

83 / 305��@����Y'Y�expressed/all genes:

expr

essi

on

Supplementary Fig. 44. Classification of topological domains based on chromatin states. Coverage of chromatin states (rows) in individual topological domains (columns) is shown as a heatmap for fly late embryos and human H1-hES cells. Chromatin states are clustered according to their co-occurrence correlations in topological domains to identify the labeled meta-chromatin states. (Active Transcription, Active Enhancer, Active Intron-rich, Pc Repressed, Heterochromatin, and Low Signal domains). The topological domains are classified according to the dominant meta state in the domain. The clustering of chromatin states is observed to be generally similar in the two species. One notable exception is that the H4K20me1-enriched state 6 is found in Polycomb repressed domains in human, whereas the same state is enriched in introns of long active genes in fly. These long genes are observed to define a relatively distinct group of topological domains. The distributions of domain sizes and expression levels of genes for the different topological domain classes are also presented as boxplots. Active domains are observed to be smaller in size in both species: in fly LE, 377 (32%) domains are identified as active covering 15% of the fly genome and containing 43% of all active genes (RPKM>1). In human H1-hESC, 736 (24%) domains are identified as active covering 16% of the human genome and contain 47% of all active genes (RPKM>1).

Page 67: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

0 200 400 600 800

0.00

00.

002

0.00

40.

006

0.00

8Domain size distribution

Domain size (kb)

dens

ity

20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

HM boundaries

Random boundaries

Wilcoxon test p-value< 1e-06

Searching distance from boundaries (kb)

Rat

io o

f Hi-C

bou

ndar

ies

cove

red

by H

M b

ound

arie

s

20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

Serching distance from boundaries (kb)

Rat

io o

f HM

bou

ndar

ies

cov

ered

by

HiC

bou

ndar

ies

Wilcoxon test p-value< 1e-06

Overlap between HM and Hi-C boundariesa b

Hi-C domainHM domain

Hi-C boundariesRandom boundaries

Supplementary Fig. 45. Similarity between fly histone modification domains/boundaries and Hi-C. a, We used the fly histone-modification-defined (HM) boundaries to divide the fly genome into HM domains. We defined fly HM domain as the genomic region in between middle points of two nearby HM boundaries. Fly HM domains (red line) have the same size distribution as Hi-C domains (orange line), with a peak at about 50 kb and a long right tail. b,In order to show the significant overlap between boundaries defined from HM and Hi-C, we generated random boundaries through random shuffling while keeping the same domain size distribution for each chromosome. We generated random boundaries for 100 times. We then searched for Hi-C boundaries around HM and around random boundaries (left), as well as for HM boundaries around Hi-C and around random boundaries (right). Blue line is plotted as the average of 100 random boundary sets. Significant overlap between HM and Hi-C boundaries compared to random background is supported by a Wilcoxon test with p-value less than 10-6.

Page 68: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

16,200 kb 16,700 kb 17,200 kb

Chromatin

boundary score

boundary

domain

H3K4me3

H3K27me3

H3K36me3

chrIV

1,490 kb

Chromatin distance0 15

Supplementary Fig. 46. A chromosomal view of the chromatin-based topological domains in worm early embryos at chromosome IV. Similar to Fig 3e, we generated chromatin-based topological domains in worm early embryos. Local histone modification similarity (Euclidian distance) is presented as a heatmap. Red indicates higher similarity and blue indicates lower similarity. Chromatin-defined boundary scores, boundaries and domains locations are compared to histone marks in the same chromo-somal regions. There is no available Hi-C data in worm that could be used for direct comparison. None-theless, the enrichment of H3K4me3 and H3K36me3 at the predicted boundaries is reminiscent of what is observed in Hi-C topological boundaries in human and fly (Supplementary Figs. 43, 44). This analy-sis supports the idea that domain prediction based on genome-wide similarities of histone modifications may be used to discover large-scale topological domains without the need for HiC data.

Page 69: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

0 5 10 15 200

12

34

56

7ob

serv

ed /

expe

cted Promoter

Enhancer 1Enhancer 2

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 3, 1Transcription 3, 2Transcription 3, 3

0 5 10 15 20

01

23

45

67 Pc repressed 1

Pc repressed 2

0 5 10 15 200

12

34

56

7 Heterochromatin 1Heterochromatin 2

0 5 10 15 20

01

23

45

67 Low signal 1

Low signal 2Low signal 3

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed PromoterEnhancer 1Enhancer 2

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 3, 1Transcription 3, 2Transcription 3, 3

0 5 10 15 20

01

23

45

67 Pc repressed 1

Pc repressed 2

0 5 10 15 20

01

23

45

67 Heterochromatin 1

Heterochromatin 2

0 5 10 15 20

01

23

45

67 Low signal 1

Low signal 2Low signal 3

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed PromoterEnhancer 1Enhancer 2

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 3, 1Transcription 3, 2Transcription 3, 3

0 5 10 15 20

01

23

45

67 Pc repressed 1

Pc repressed 2

0 5 10 15 20

01

23

45

67 Heterochromatin 1

Heterochromatin 2

0 5 10 15 20

01

23

45

67 Low signal 1

Low signal 2Low signal 3

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed PromoterEnhancer 1Enhancer 2

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 5, 1Transcription 5, 2Gene, H4K20me1

0 5 10 15 20

01

23

45

67

obse

rved

/ ex

pect

ed Transcription 3, 1Transcription 3, 2Transcription 3, 3

0 5 10 15 20

01

23

45

67 Pc repressed 1

Pc repressed 2

0 5 10 15 20

01

23

45

67 Heterochromatin 1

Heterochromatin 2

0 5 10 15 20

01

23

45

67 Low signal 1

Low signal 2Low signal 3

Fly Worm

Em

bryo

sLa

rvae

distance from boundary (kb)

Supplementary Fig. 47. Chromatin context of chromatin-based topological domain boundaries. Observed occurrences of chromatin states near histone marks defined domain boundaries normalized to random expectation. Fly late embryos, worm early embryos, fly and worm L3 larvae show enrichment of promoter and active transcription states near domain boundaries and depletion of low signal and heterochromatin states, similar to the observation at Hi-C boundaries (see Supplementary Fig. 43).

Page 70: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

70��

Supplementary Table 1. Abbreviation of key cell types and developmental stages described in this study.

Species Abbreviation Description

C. elegans

EE Early embryos MXEMB Mixed embryos LTEMB Late embryos L3 Stage 3 larvae L4 Stage 4 larvae

AD no embryos Feminized adults that produce oocytes but no sperm, and therefore do not contain embryos (fem-2(b245ts) strain)

AD germline Purified germline nuclei from wildtype hermaphrodites (ojIs9strain carrying zyg-12::gfp transgene)

AD-germlineless AD without germline (glp-4(bn2ts) strain)

D. melanogaster

EE Early embryos (2-4hr) LE Late embryos (14-16hr) L3 Third instar larvae AH Adult heads ES5,ES10,ES14 Embryonic stages 5, 10, and 14, respectively S2 S2-DRSC cell line: derived from late embryonic stage Kc Kc157 cell line: dorsal closure stage BG3 ML-DmBG3-c2 cell line: central nervous system, derived from L3 Clone 8 CME W1 Cl.8+ cell line: dorsal mesothoracic disc

H. sapiens

H1-hESC Embryonic stem cells GM12878 B-lymphocytes K562 Myelogenous leukemia cell line A549 Epithelial cell line derived from a lung carcinoma tissue HeLa-S3 Cervical carcinoma cell line HepG2 Hepatocellular carcinoma HSMM Skeletal muscle myoblasts HSMMtube Skeletal muscle myotubes differentiated from the HSMM cell line HUVEC Human umbilical vein endothelial cells IMR90 Fetal lung fibroblasts NH-A Astrocytes NHDF-Ad Adult dermal fibroblasts NHEK Epidermal keratinocytes NHLF Lung fibroblasts Osteobl Osteoblasts (NHOst)

More information on the cell types and stages can be found at the project websites:

Details of D. melanogaster cell lines: https://dgrc.cgb.indiana.edu/project/index.html

Details of H. sapiens: http://encodeproject.org/ENCODE/cellTypes.html

Page 71: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

71��

Supplementary Table 2. List of protein names used in this study.

Name used Official name Name

used Official name Name

used Official name

human fly worm human fly worm human fly worm CHD1 CHD1 �� CBX2 CBX2 � �� AMA-1 AMA-1CHD2 CHD2 �� CBX3 CBX3 �� �� ASH-2 ASH-2CHD3/MI-2/LET-418 CHD3 MI-2 LET-418 CBX8 CBX8 � �� CEC-3 CEC-3 CBP/CBP-1 CREBBP CBP CBP-1 CEBPB CEBPB �� �� CEC-7 CEC-7CTCF CTCF CTCF �� CHD7 CHD7 � �� COH-1 COH-1 EZH2/E(Z) EZH2 E(Z) �� CTCFL CTCFL �� �� COH-3 COH-3HDAC1/RPD3/HAD-1 HDAC1 RPD3 HDA-1 REST REST � �� DPL-1 DPL-1 HP1A SU(VAR)205 �� SAP30 SAP30 �� �� EFL-1 EFL-1HP1B/HPL-2 HP1B HPL-2 SIRT6 SIRT6 � �� EPC-1 EPC-1 HP1C HP1C �� NCOR NCOR1 �� �� HCP-3 HCP-3HP2 HP2 �� NSD2 WHSC1 � �� HCP-4 HCP-4 HP4 HP4 �� P300 EP300 �� �� HIM-17 HIM-17 KDM1A KDM1A SU(VAR)3-3 �� PCAF KAT2B � �� HIM-3 HIM-3 KDM2 KDM2 T26A5.5 PHF8 PHF8 �� �� HIM-5 HIM-5 KDM4A KDM4A KDM4A �� RBBP5 RBBP5 � �� HIM-8 HIM-8 KDM5A KDM5A �� �� ACF1 �� ACF1 �� HTP-3 HTP-3KDM5B KDM5B � �� ASH1 ASH1 �� HTZ-1 HTZ-1 KDM5C KDM5C �� �� BEAF BEAF-32 �� IMB-1 IMB-1 RNF2/RING RNF2 SCE �� CG10630 BLANKS �� KLE-2 KLE-2 HDAC11 HDACX �� BRE1 BRE1 �� LEM-2 LEM-2HDAC2 HDAC2 �� CHRO CHRO �� LIN-35 LIN-35 HDAC3 HDAC3 �� CP190 CP190 �� LIN-37 LIN-37 HDAC4a HDAC4 �� GAF GAF �� LIN-52 LIN-52 HDAC6 HDAC6 HDAC6 �� ISWI ISWI �� LIN-53 LIN-53 HDAC8 HDAC8 �� JIL-1 JIL-1 �� LIN-54 LIN-54 MOD(MDG4) MOD(MDG4) �� LBR LBR �� LIN-61 LIN-61 NURF301/NURF-1 E(BX) NURF-1 MBD-R2 MBD-R2 �� LIN-9 LIN-9 PR-SET7 PR-SET7 �� MLE MLE �� MAU-2 MAU-2SMC3 SMC3 CAP �� MOF MOF �� MES-4 MES-4 SMARCA4 SMARCA4 �� �� MRG15 MRG15 �� MRE-11 MRE-11SU(HW) SU(HW) �� MSL-1 MSL-1 �� MIS-12 MIS-12 SU(VAR)3-7 SU(VAR)3-7 �� PC PC �� MIX-1 MIX-1 SU(VAR)3-9 SU(VAR)3-9 �� PCL PCL �� MRG-1 MRG-1 SUZ12 SUZ12 �� �� PHO PHO �� MSH-5 MSH-5SETDB1 SETDB1 � �� PIWI PIWI �� REC-8 REC-8 DPY-26 DPY-26 POF POF �� RPC-1 RPC-1DPY-27 DPY-27 PSC PSC �� SCC-1 SCC-1 DPY-28 DPY-28 RHINO RHINO �� SDC-1 SDC-1DPY-30 DPY-30 SFMBT SFMBT �� SDC-2 SDC-2 LIN-15B LIN-15B SPT16 DRE4 �� SDC-3 SDC-3TAG-315 TAG-315 TOP2 TOP2 �� SMC-4 SMC-4 MYS3 LSY-12 WDS WDS �� SMC-6 SMC-6NPP-13 NPP-13 XNP XNP �� ZIM-1 ZIM-1 PQN-85 PQN-85 ZW5 ZW5 �� ZIM-3 ZIM-3 RAD-51 RAD-51 TAF-1 TAF-1 ZFP-1 ZFP-1

�� �� �� TBP-1 TBP-1 ZHP-3 ZHP-3

Some of the names used in this study (highlighted in red) are different from their official names.

Page 72: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

72��

Supplementary Table 3. Overlap of DHS-based and p300-peak-based enhancers in human cell lines. The analysis generally identifies more DHS-based enhancers than p300-based enhancers. Between 60% and 90% of the p300-enhancers are within 500 bp of a DHS-based enhancer, suggesting that p300 binding sites are generally in DHSs.

DHS enhancer

w/ p300 enhancerwithin 100bp

w/ p300 enhancerwithin 500bp

p300enhancer

w/ DHS enhancerwithin 100bp

w/ DHS enhancerwithin 500bp

GM12878 40531 14094 (35%) 19190 (47%) 29108 14067 (48%) 18391 (63%)

H1-hESC 73496 1973 (3%) 3404 (5%) 3986 2007 (50%) 3139 (79%)

K562 69865 27521 (39%) 34714 (50%) 43659 27489 (63%) 33449 (77%)

HeLa-S3 63189 16784 (27%) 21515 (34%) 22861 16762 (73%) 20217 (88%)

Page 73: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

73��

Supplementary references�

22� Langmead,�B.,�Trapnell,�C.,�Pop,�M.�&�Salzberg,�S.�Ultrafast�and�memory�efficient�alignment�of�short�DNA�sequences�to�the�human�genome.�Genome�Biology�10,�doi:10.1186/gb�2009�10�3�r25�(2009).�

23� Li,�H.�&�Durbin,�R.�Fast�and�accurate�short�read�alignment�with�Burrows�Wheeler�transform.�Bioinformatics�25,�1754�1760�(2009).�

24� Egelhofer,�T.�A.�et�al.�An�assessment�of�histone�modification�antibody�quality.�Nature�Structural�&�Molecular�Biology�18,�91�93,�doi:10.1038/nsmb.1972�(2011).�

25� Kundaje,�A.�et�al.�Adaptive�calibrated�measures�for�rapid�automated�quality�control�of�massive�collections�of�ChIP�seq�experiments.�Submitted�(2013).�

26� Kharchenko,�P.�V.,�Tolstorukov,�M.�Y.�&�Park,�P.�J.�Design�and�analysis�of�ChIP�seq�experiments�for�DNA�binding�proteins.�Nat�Biotech�26,�1351�1359,�doi:10.1038/nbt.1508�(2008).�

27� Zhang,�Y.�et�al.�Model�based�Analysis�of�ChIP�Seq�(MACS).�Genome�Biology�9,�doi:10.1186/gb�2008�9�9�r137�(2008).�

28� Adams,�M.�D.�et�al.�The�genome�sequence�of�Drosophila�melanogaster.�Science�(New�York,�N.Y.)�287,�2185�2195�(2000).�

29� Song,�J.�et�al.�Model�based�analysis�of�two�color�arrays�(MA2C).�Genome�Biology�8,�doi:10.1186/gb�2007�8�8�r178�(2007).�

30� Larschan,�E.�et�al.�X�chromosome�dosage�compensation�via�enhanced�transcriptional�elongation�in�Drosophila.�Nature�471,�115�118,�doi:10.1038/nature09757�(2011).�

31� Core,�L.�J.,�Waterfall,�J.�J.�&�Lis,�J.�T.�Nascent�RNA�sequencing�reveals�widespread�pausing�and�divergent�initiation�at�human�promoters.�Science�(New�York,�N.Y.)�322,�1845�1848,�doi:10.1126/science.1162228�(2008).�

32� Thomas,�S.�et�al.�Dynamic�reprogramming�of�chromatin�accessibility�during�Drosophila�embryo�development.�Genome�Biology�12,�doi:10.1186/gb�2011�12�5�r43�(2011).�

33� Tolstorukov,�M.�Y.�et�al.�Histone�variant�H2A.Bbd�is�associated�with�active�transcription�and�mRNA�processing�in�human�cells.�Molecular�cell�47,�596�607,�doi:10.1016/j.molcel.2012.06.011�(2012).�

34� Tolstorukov,�M.�Y.,�Kharchenko,�P.�V.,�Goldman,�J.�A.,�Kingston,�R.�E.�&�Park,�P.�J.�Comparative�analysis�of�H2A.Z�nucleosome�organization�in�the�human�and�yeast�genomes.�Genome�Research�19,�967�977,�doi:10.1101/gr.084830.108�(2009).�

35� Harrow,�J.�et�al.�GENCODE:�The�reference�human�genome�annotation�for�The�ENCODE�Project.�Genome�Research�22,�1760�1774,�doi:10.1101/gr.135350.111�(2012).�

36� Chen,�R.�A.�et�al.�The�landscape�of�RNA�polymerase�II�transcription�initiation�in�C.�elegans�reveals�promoter�and�enhancer�architectures.�Genome�Res�23,�1339�1347,�doi:10.1101/gr.153668.112�(2013).�

37� Hothorn,�T.,�Buehlmann,�P.,�Kneib,�T.,�Schmid,�M.�&�Hofner,�B.�mboost:�Model�Based�Boosting,�R�package�version�2.�Journal�of�Machine�Learning�Research�11,�2109�2113�(2010).�

38� Ikegami,�K.,�Egelhofer,�T.�A.,�Strome,�S.�&�Lieb,�J.�D.�Caenorhabditis�elegans�chromosome�arms�are�anchored�to�the�nuclear�membrane�via�discontinuous�association�with�LEM�2.�Genome�Biology�11,�doi:10.1186/gb�2010�11�12�r120�(2010).�

39� van�Bemmel,�J.�G.�et�al.�The�insulator�protein�SU(HW)�fine�tunes�nuclear�lamina�interactions�of�the�Drosophila�genome.�PloS�one�5,�doi:10.1371/journal.pone.0015013�(2010).�

Page 74: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

74��

40� Guelen,�L.�et�al.�Domain�organization�of�human�chromosomes�revealed�by�mapping�of�nuclear�lamina�interactions.�Nature�453,�948�951,�doi:10.1038/nature06947�(2008).�

41� Bishop,�E.�P.�et�al.�A�map�of�minor�groove�shape�and�electrostatic�potential�from�hydroxyl�radical�cleavage�patterns�of�DNA.�ACS�chemical�biology�6,�1314�1320,�doi:10.1021/cb200155t�(2011).�

42� Sohn,�K.�A.,�Ghahramani,�Z.�&�Xing,�E.�P.�Robust�Estimation�of�Local�Genetic�Ancestry�in�Admixed�Populations�using�a�Non�parametric�Bayesian�Approach.�Genetics�191,�1295�1308�(2012).�

43� Van�Gael,�J.,�Saatci,�Y.,�Teh,�Y.�W.�&�Ghahramani,�Z.�in�Proc.�Int.�Conf.�on�Machine�Learning.��1088�1095.�

44� Hoffman,�M.�M.�et�al.�Unsupervised�pattern�discovery�in�human�chromatin�structure�through�genomic�segmentation.�Nature�methods�9,�473�476,�doi:10.1038/nmeth.1937�(2012).�

45� Ernst,�J.�&�Kellis,�M.�ChromHMM:�automating�chromatin�state�discovery�and�characterization.�Nature�Methods�9,�215�216,�doi:10.1038/nmeth.1906�(2012).�

46� Deal,�R.�B.,�Henikoff,�J.�G.�&�Henikoff,�S.�Genome�wide�kinetics�of�nucleosome�turnover�determined�by�metabolic�labeling�of�histones.�Science�328,�1161�1164,�doi:10.1126/science.1186777�(2010).�

47� Visel,�A.�et�al.�ChIP�seq�accurately�predicts�tissue�specific�activity�of�enhancers.�Nature�457,�854�858,�doi:10.1038/nature07730�(2009).�

48� Arnold,�C.�D.�et�al.�Genome�Wide�Quantitative�Enhancer�Activity�Maps�Identified�by�STARR�seq.�Science�(New�York,�N.Y.),�doi:10.1126/science.1232542�(2013).�

49� He,�H.�H.�et�al.�Nucleosome�dynamics�define�transcriptional�enhancers.�Nat�Genet�42,�343�347,�doi:10.1038/ng.545�(2010).�

50� Gaffney,�D.�J.�et�al.�Controls�of�nucleosome�positioning�in�the�human�genome.�PLoS�genetics�8,�doi:10.1371/journal.pgen.1003036�(2012).�

51� Steiner,�F.�A.,�Talbert,�P.�B.,�Kasinathan,�S.,�Deal,�R.�B.�&�Henikoff,�S.�Cell�type�specific�nuclei�purification�from�whole�animals�for�genome�wide�expression�and�chromatin�profiling.�Genome�research�22,�766�777,�doi:10.1101/gr.131748.111�(2012).�

52� Teves,�S.�S.�&�Henikoff,�S.�Heat�shock�reduces�stalled�RNA�polymerase�II�and�nucleosome�turnover�genome�wide.�Genes�&�development�25,�2387�2397,�doi:10.1101/gad.177675.111�(2011).�

53� Tillo,�D.�et�al.�High�nucleosome�occupancy�is�encoded�at�human�regulatory�sequences.�PloS�one�5,�doi:10.1371/journal.pone.0009129�(2010).�

54� Henikoff,�S.,�Henikoff,�J.�G.,�Sakai,�A.,�Loeb,�G.�B.�&�Ahmad,�K.�Genome�wide�profiling�of�salt�fractions�maps�physical�properties�of�chromatin.�Genome�research�19,�460�469,�doi:10.1101/gr.087619.108�(2009).�

55� Core,�L.�J.�et�al.�Defining�the�Status�of�RNA�Polymerase�at�Promoters.�Cell�Reports�2,�1025�1035,�doi:10.1016/j.celrep.2012.08.034�(2012).�

56� Yuan,�G.�C.�et�al.�Genome�scale�identification�of�nucleosome�positions�in�S.�cerevisiae.�Science�(New�York,�N.Y.)�309,�626�630,�doi:10.1126/science.1112178�(2005).�

57� Valouev,�A.�et�al.�Determinants�of�nucleosome�organization�in�primary�human�cells.�Nature�474,�516�520,�doi:10.1038/nature10002�(2011).�

58� Schones,�D.�et�al.�Dynamic�Regulation�of�Nucleosome�Positioning�in�the�Human�Genome.�Cell�132,�887�898,�doi:10.1016/j.cell.2008.02.022�(2008).�

59� Gilchrist,�D.�A.�et�al.�Pausing�of�RNA�Polymerase�II�Disrupts�DNA�Specified�Nucleosome�Organization�to�Enable�Precise�Gene�Regulation.�Cell�143,�540�551,�doi:10.1016/j.cell.2010.10.004�(2010).�

Page 75: 239268 0 supp 2028807 mwgp8w - Harvard University · 2014. 3. 31. · Supplementary Fig. 5 Relationship of enhancer H3K27ac levels with expression of nearby genes ... macs2 bdgcmp

75��

60� Ercan,�S.,�Lubling,�Y.,�Segal,�E.�&�Lieb,�J.�D.�High�nucleosome�occupancy�is�encoded�at�X�linked�gene�promoters�in�C.�elegans.�Genome�research�21,�237�244,�doi:10.1101/gr.115931.110�(2011).�

61� Jin,�C.�et�al.�H3.3/H2A.Z�double�variant�containing�nucleosomes�mark�'nucleosome�free�regions'�of�active�promoters�and�other�regulatory�regions.�Nature�genetics�41,�941�945,�doi:10.1038/ng.409�(2009).�

62� Henikoff,�J.�G.,�Belsky,�J.�A.,�Krassovsky,�K.,�MacAlpine,�D.�M.�&�Henikoff,�S.�Epigenome�characterization�at�single�base�pair�resolution.�Proceedings�of�the�National�Academy�of�Sciences�of�the�United�States�of�America�108,�18318�18323,�doi:10.1073/pnas.1110731108�(2011).�

63� Cherbas,�L.�et�al.�The�transcriptional�diversity�of�25�Drosophila�cell�lines.�Genome�research�21,�301�314,�doi:10.1101/gr.112961.110�(2011).�

64� Filion,�G.�J.�et�al.�Systematic�Protein�Location�Mapping�Reveals�Five�Principal�Chromatin�Types�in�Drosophila�Cells.�Cell�143,�212�224,�doi:10.1016/j.cell.2010.09.009�(2010).�

65� Hayashi�Takanaka,�Y.�et�al.�Tracking�epigenetic�histone�modifications�in�single�cells�using�Fab�based�live�endogenous�modification�labeling.�Nucleic�acids�research�39,�6475�6488,�doi:10.1093/nar/gkr343�(2011).�

66� Chandra,�T.�et�al.�Independence�of�repressive�histone�marks�and�chromatin�compaction�during�senescent�heterochromatic�layer�formation.�Molecular�cell�47,�203�214,�doi:10.1016/j.molcel.2012.06.010�(2012).�

67� Bender,�L.�B.,�Cao,�R.,�Zhang,�Y.�&�Strome,�S.�The�MES�2/MES�3/MES�6�complex�and�regulation�of�histone�H3�methylation�in�C.�elegans.�Current�biology:�CB�14,�1639�1643,�doi:10.1016/j.cub.2004.08.062�(2004).�