Chromatin structure is distinct between coding and non-coding single nucleotide polymorphisms
Hongde Liu1*; Jingchen Zhai1; Kun Luo2; Lingjie Liu1
1State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
2Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, The First Affiliated Hospital of Xinjiang Medical University, Urumqi 830054, China
Supplementary Materials
Table S1
Linear classifier parameters (Col. 2) and discriminating capacity (Col. 3) of each histone modification (HM) and other chromatin mark. Const is the constant term of the classifier.

Chromatin mark   Parameter in the classifier   -log10(P-value) of a single HM
H3K36ac          -2.539                        0.03
CTCF             2.240                         1.0
H2AK5ac          0.364                         1.0
H2BK12ac         0.100                         1.0
H3K79me2         -1.237                        1.255
H4R3me2          0.0722                        1.2
H2AZ             3.246                         1.5
H3K36me1         0.866                         3
H4K5ac           -1.278                        4
H4K20me3         -1.204                        4
H3K18ac          0.273                         9
H2AK9ac          -5.144                        9
H4K8ac           -1.139                        9
H3R2me2          0.321                         11
H2BK20ac         2.370                         11
H4K12ac          -1.0626                       12
H3K9ac           -0.0319                       12
H3K27ac          0.331                         13
H3K4ac           0.130                         14
H3K79me3         -0.0648                       14
H3K4me1          0.00146                       14
H2BK5ac          -1.599                        16
H3K4me2          -0.228                        16
H2BK120ac        0.254                         17
H3K14ac          -5.423                        19
H3K23ac          1.514                         20
H3K27me2         -0.711                        25
H3K27me1         -0.714                        30
H3K9me1          -0.0142                       42
H4K91ac          -2.121                        44
H3K4me3          -0.768                        44
H3K9me3          0.433                         53
PolII            -0.356                        54
H3K27me3         -4.089                        65
H3R2me1          -0.819                        75
H3K79me1         -0.164                        82
H2BK5me1         0.395                         124
H4K16ac          -1.398                        129
H3K9me2          -0.108                        132
H4K20me1         -0.130                        159
H3K36me3         0.052                         189
Const            1.0
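The caption does not spell out how the parameters are combined. A minimal sketch, assuming the classifier computes a linear score s = Const + sum of w_i * x_i over the selected chromatin marks, where w_i is the Col. 2 parameter and x_i is the mark's signal at a SNP site, and that features are ranked by their Col. 3 values before the top k are kept (as described for Figure S5). The names TABLE_S1, top_k_features, p_value and linear_score are hypothetical, only eight rows of Table S1 are copied, and the decision threshold on s is not given here, so it is left out.

```python
# Hypothetical sketch: rank features by single-feature significance
# (-log10 P, Col. 3 of Table S1) and combine the top-k with the classifier
# parameters (Col. 2) as a linear score s = Const + sum_i w_i * x_i.
# Only eight rows of Table S1 are copied here, for illustration.

TABLE_S1 = {
    # feature: (parameter w_i, -log10 P)
    "H3K36me3": (0.052, 189.0),
    "H4K20me1": (-0.130, 159.0),
    "H3K9me2": (-0.108, 132.0),
    "H4K16ac": (-1.398, 129.0),
    "H2BK5me1": (0.395, 124.0),
    "H3K79me1": (-0.164, 82.0),
    "H3R2me1": (-0.819, 75.0),
    "H3K27me3": (-4.089, 65.0),
}
CONST = 1.0  # "Const" row of Table S1


def top_k_features(table, k):
    """Return the k feature names with the largest -log10 P, most significant first."""
    return sorted(table, key=lambda name: table[name][1], reverse=True)[:k]


def p_value(neg_log10_p):
    """Convert a -log10(P) value back to a P-value."""
    return 10.0 ** (-neg_log10_p)


def linear_score(marks, features, table=TABLE_S1, const=CONST):
    """Return const + sum(w_i * x_i) over `features`.

    `marks` maps feature name -> chromatin-mark signal x_i at one SNP site
    (hypothetical input; the normalisation of x_i is not specified here).
    """
    return const + sum(table[f][0] * marks[f] for f in features)


top4 = top_k_features(TABLE_S1, 4)
print(top4)  # ['H3K36me3', 'H4K20me1', 'H3K9me2', 'H4K16ac']
print(round(linear_score({f: 1.0 for f in top4}, top4), 3))  # -0.584
```

With x_i = 1 for the four most significant marks, the score is 1.0 + 0.052 - 0.130 - 0.108 - 1.398 = -0.584; the top-8 model would simply pass top_k_features(TABLE_S1, 8) instead.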
Figure S1
[Panel A: scheme of a gene (TSS, 5'UTR, exons, introns, 3'UTR, TTS) marking the genomic regions of the nine SNP categories, with 2-kbp (near-gene-5) and 500-bp (near-gene-3) flanking regions.]
[Panel B: number of SNPs in each of the nine categories (Intron, Near-gene-5, Near-gene-3, UTR5, UTR3, Frameshift, Coding-synon*, Missense, Nonsense); the intron category contains 4707162 SNPs.]
* 14508 coding-synonymous SNPs were randomly selected as neutral SNPs.
[Panel C: SNP frequencies around TSSs.]
Panels D and E: number and percentage of risk SNPs in different genomic loci
Distribution     Number    Percentage
Exon             1898      13.1%
Intron           6721      46.3%
Inter-gene       2237      15.4%
UTR5 & UTR3      3652      25.2%
Total            14508     100%
A: Scheme indicating the genomic regions of the nine categories of SNPs; B: number of SNPs in each of the nine categories; C: SNP frequencies around transcription start sites (TSSs); D: number of risk-associated SNPs (risk SNPs) in different genomic regions; risk SNP data were retrieved from http://www.genome.gov/gwastudies/ (Hindorff et al., 2009); E: distribution of risk SNPs in different genomic regions.
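The percentages in panel E follow directly from the counts in panel D. This short check (the names RISK_SNP_COUNTS and region_percentages are illustrative) recomputes each region's share of the 14508 risk SNPs.

```python
# Counts of risk SNPs per genomic region, copied from Figure S1 panel D.
RISK_SNP_COUNTS = {
    "Exon": 1898,
    "Intron": 6721,
    "Inter-gene": 2237,
    "UTR5 & UTR3": 3652,
}


def region_percentages(counts):
    """Return each region's share of the total count, in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


print(sum(RISK_SNP_COUNTS.values()))  # 14508
print(region_percentages(RISK_SNP_COUNTS))
# {'Exon': 13.1, 'Intron': 46.3, 'Inter-gene': 15.4, 'UTR5 & UTR3': 25.2}
```

For example, 6721 / 14508 = 0.4633, which gives the 46.3% intronic share shown in panel E.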
Figure S2
[Panels: Exon, UTR5, Near-gene-3, UTR3, Near-gene-5, Intron]
Profiles of nucleosome occupancy around random genomic loci. Random loci were selected in exons, introns, UTR5s, UTR3s and the 5' and 3' flanking regions of genes, and a nucleosome occupancy profile was calculated around each set of random loci.
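The caption does not state how the profiles are computed. A minimal sketch, assuming a profile is the mean per-base signal (for example, nucleosome occupancy) at each offset from a set of loci, and that random loci are drawn uniformly from the bases of one region class. The helpers random_loci and aggregate_profile, the half-open interval convention and the skipping of windows that run off the chromosome are illustrative choices; strand orientation relative to the gene is ignored.

```python
import random


def random_loci(intervals, n, seed=0):
    """Draw n random 0-based positions from half-open (start, end) intervals.

    Intervals are picked with probability proportional to their length, so for
    non-overlapping intervals every base is equally likely to be drawn.
    """
    rng = random.Random(seed)
    lengths = [end - start for start, end in intervals]
    picked = rng.choices(intervals, weights=lengths, k=n)
    return [rng.randrange(start, end) for start, end in picked]


def aggregate_profile(signal, loci, flank):
    """Mean of `signal` at each offset -flank..+flank around the given loci.

    signal: per-base values along one chromosome (e.g. nucleosome occupancy)
    loci:   0-based positions on that chromosome
    Loci whose window runs past either end of `signal` are skipped.
    Returns a list of 2 * flank + 1 values, offset -flank first.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos in loci:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(signal):
            continue
        for i, value in enumerate(signal[start:end]):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no locus has a complete window")
    return [total / used for total in totals]


print(aggregate_profile([0, 1, 2, 3, 4, 5, 6], [2, 4], 1))  # [2.0, 3.0, 4.0]
```

Comparing the profile around random loci in a region class with the profile around SNPs in the same class separates region-level effects from SNP-specific ones.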
Figure S3
[Panels: A, UTR5 (GM12878 cells); B, UTR3 (GM12878 cells); C, Coding-synon (GM12878 cells); D, Intron (GM12878 cells); E, Neutral SNPs (CD4+ T cells); F, Risk SNPs (CD4+ T cells)]
Profiles of histone modifications near SNP sites. A-D: histone methylation near 5'-untranslated region (UTR5) SNP sites (A), 3'-untranslated region (UTR3) SNP sites (B), coding-synonymous SNP sites (C) and intron SNP sites (D) in lymphoblastoid cells (GM12878 cells); E and F: binding of histone acetylases and deacetylases at neutral SNP sites (E) and risk SNP sites (F) in CD4+ T cells.
Figure S4
Correlation coefficients between CD4+ T cells and lymphoblastoid cells (GM12878 cells) for the profiles of HMs, H2AZ and CTCF. Each profile covers a 3000-bp genomic region around the SNPs. Correlations were calculated for four types of SNPs: UTR5-SNPs, UTR3-SNPs, coding-synonymous SNPs and intron-SNPs.
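A minimal sketch of comparing one mark's profile between the two cell types, assuming the correlation coefficient is Pearson's r computed over the per-offset values of the two 3000-bp profiles (the caption does not name the coefficient). The helper pearson is hypothetical; Python 3.10+ also provides statistics.correlation, which computes the same quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Toy profiles: identical shape at different scales correlate perfectly.
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

Because r is invariant to scaling and offset, it compares the shape of a profile between CD4+ T cells and GM12878 cells rather than its absolute signal level.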
Figure S5
Histone modifications can be used to distinguish risk SNPs from neutral SNPs. A: histone modifications that differ between risk SNPs and neutral SNPs; P-values for the differences were calculated with a two-sample t-test. B: receiver operating characteristic (ROC) curves of the linear classifier models that identify risk SNPs; the classifier parameters are listed in Table S1. Features refer to chromatin marks and were ranked by their P-values (Table S1); the 4, 8 and 16 most significant features, as well as all features, were used to construct the models. The area under the ROC curve (AUC) is indicated for each model.
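The two statistics named in the caption can be sketched as follows. This assumes Student's equal-variance t-test (the caption says only "two-sample t-test"; Welch's unequal-variance form is a common alternative) and computes the AUC as the probability that a randomly chosen risk SNP scores higher than a randomly chosen neutral SNP, with ties counted as one half, which equals the area under the empirical ROC curve. The names two_sample_t and roc_auc are hypothetical; turning t into a P-value needs the t distribution (for example scipy.stats.ttest_ind), which is not used here.

```python
import math


def two_sample_t(a, b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (ma - mb) / se, df


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve by pairwise comparison (ties count 1/2).

    O(len(pos) * len(neg)): fine for a sketch, slow for very large SNP sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy classifier scores: risk SNPs (positives) vs neutral SNPs (negatives).
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of risk and neutral SNPs.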
Figure S6
A: GC-content profiles around the nine categories of SNPs; B: average GC content in a 2-kbp region around the nine categories of SNPs; C: nucleosome occupancy profiles for base transitions (A/G and C/T) and base transversions (G/T, A/C, C/G and A/T) at both risk SNPs and neutral SNPs.
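A minimal sketch of the two per-SNP quantities in Figure S6, assuming the 2-kbp region is the window of 1000 bp on either side of the SNP (clipped at the sequence ends) and that GC content is the fraction of G/C among unambiguous A/C/G/T bases. The helper names gc_content, gc_around and substitution_class are hypothetical. The substitution classes follow the caption: A/G and C/T are transitions; G/T, A/C, C/G and A/T are transversions.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = {"A", "C", "G", "T"}


def gc_content(seq):
    """Fraction of G or C among the A/C/G/T bases of `seq` (N and others ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_around(chrom_seq, pos, half_width=1000):
    """GC content of [pos - half_width, pos + half_width), clipped to the sequence."""
    start = max(0, pos - half_width)
    end = min(len(chrom_seq), pos + half_width)
    return gc_content(chrom_seq[start:end])


def substitution_class(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return "transition" if frozenset((ref, alt)) in TRANSITIONS else "transversion"


print(substitution_class("A", "G"))  # transition
print(substitution_class("G", "T"))  # transversion
```

Grouping SNPs with substitution_class before calling a profile function is one way to obtain the separate transition and transversion curves of panel C.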