

BIBM 2010 Tutorial: Epigenomics and Cancer

PART 4: Analysis of Epigenomics Data

Dec 18, 2010

Sun Kim at Indiana University


4.1 Measuring DNA Methylation Levels


Three Ways to Measure DNA Methylation Levels

• Measuring DNA methylation levels from microarray data is very difficult because CpG site density is uneven.

• MIRA-sequencing without bisulfite treatment can be cost-effective. In this case, peak-finding techniques can be used to identify highly methylated regions.

• Single-base-resolution sequencing of bisulfite-treated DNA is the most accurate and has the highest resolution, but it is expensive.


Huang et al. Technol Cancer Res Treat. 2010


MIRA-sequencing without bisulfite treatment

• ChIP-seq-style enrichment of methylated DNA using MBD2.

• Because the data are ChIP-seq-like, peak-finding algorithms must be run to determine highly methylated regions. It is not known how effective current peak-finding algorithms (developed for histone marks and transcription factors) are for MIRA-seq.

• Because the DNA is not bisulfite treated, reads are easier to map to the reference human genome. In addition, limited sequence variation information can be obtained in the peak regions.

• However, DNA methylation cannot be measured at single-base resolution, and the measurement is also limited by the binding specificity of MBD.
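To make the peak-finding step concrete, here is a minimal sketch of window-based enrichment calling. It assumes read start positions on a single chromosome and a uniform Poisson background; the window size, p-value cutoff, and function names are hypothetical, and this is a generic illustration rather than any published peak caller. Real MIRA-seq analysis also has to account for CpG density and copy-number effects (see Robinson et al., Genome Res. 2010, in the references below).

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for small lam (a per-window background); exp(-lam) underflows for lam > ~700.
    """
    term = math.exp(-lam)          # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)      # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def call_enriched_windows(read_starts, chrom_length, window=200, pvalue=1e-5):
    """Return (start, end, count) for windows whose read count is significantly above
    a uniform Poisson background. Positions are 0-based and < chrom_length."""
    n_windows = math.ceil(chrom_length / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows   # expected reads per window if reads were uniform
    return [(i * window, min((i + 1) * window, chrom_length), c)
            for i, c in enumerate(counts)
            if c > lam and poisson_sf(c, lam) < pvalue]
```

Adjacent enriched windows would normally be merged into regions; the uniform background is the main simplification, since MBD capture efficiency depends strongly on local CpG density.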


MetMap for MethylSeq

PLoS Comput Biol. 2010 Aug 19;6(8)

MethylSeq is based on sequencing the ends of fragments produced by methylation-sensitive restriction enzymes.
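MetMap itself is a graphical model; the sketch below only illustrates the raw MethylSeq signal it starts from. It assumes the enzyme is HpaII, which recognizes CCGG, cuts between the two Cs (C^CGG), and does not cut when the internal CpG is methylated, so unmethylated copies of a site yield fragment ends at the cut position while methylated copies do not. The enzyme choice, exact-position matching of read ends, and function names are assumptions for illustration.

```python
from collections import Counter

def hpaii_cut_positions(seq):
    """0-based positions where HpaII would cut (C^CGG) if the site is unmethylated.

    CCGG is palindromic, so each site serves both strands.
    """
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]

def ends_per_site(cut_positions, read_end_positions):
    """Count mapped fragment ends that fall exactly on each candidate cut position.

    More ends at a site suggests more cutting, i.e. more unmethylated copies.
    """
    ends = Counter(read_end_positions)
    return {pos: ends[pos] for pos in cut_positions}
```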


MetMap Graphical Model

PLoS Comput Biol. 2010 Aug 19;6(8)


Reference for DNA Methylation Profiling

Methods in DNA methylation profiling.

Zuo T, Tycko B, Liu TM, Lin HJ, Huang TH.

Epigenomics. 2009 Dec 1;1(2):331-345.

Profiling DNA methylomes from microarray to genome-scale sequencing.

Huang YW, Huang TH, Wang LS.

Technol Cancer Res Treat. 2010 Apr;9(2):139-47.

MetMap enables genome-scale Methyltyping for determining methylation states in populations.

Singer M, Boffelli D, Dhahbi J, Schönhuth A, Schroth GP, Martin DI, Pachter L.

PLoS Comput Biol. 2010 Aug 19;6(8)

DNA methylation profiling using the methylated-CpG island recovery assay (MIRA).

Rauch TA, Pfeifer GP.

Methods. 2010 Mar 19.

Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation.

Robinson MD, et al. Genome Res. 2010 Nov 2.


4.2 Histone Code Data Analysis


Histone Code

• Many methylation and acetylation modifications are known at different residues of histones.

• A combination of methylation and acetylation marks can indicate whether chromatin is active or inactive, so such a combination of marks can be viewed as a “code.”
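As a toy illustration of reading marks as a “code,” the function below maps the set of marks present at a region to a coarse label, using a few well-established associations: H3K4me3 at active promoters, H3K4me3 together with H3K27me3 at bivalent promoters, H3K4me1 with H3K27ac at active enhancers, H3K36me3 over transcribed gene bodies, H3K27me3 for Polycomb repression, and H3K9me3 for heterochromatin. The rule order and labels are a deliberate simplification chosen for this sketch; the models that follow learn such combinations from data instead.

```python
def decode_marks(marks):
    """Map the set of histone marks present at a region to a coarse, illustrative label.

    Rules are checked in order, so earlier (more specific) combinations take precedence.
    """
    m = set(marks)
    rules = [
        ({"H3K4me3", "H3K27me3"}, "bivalent (poised) promoter"),
        ({"H3K4me3"}, "active promoter"),
        ({"H3K4me1", "H3K27ac"}, "active enhancer"),
        ({"H3K4me1"}, "primed enhancer"),
        ({"H3K36me3"}, "transcribed region"),
        ({"H3K27me3"}, "Polycomb-repressed"),
        ({"H3K9me3"}, "heterochromatin"),
    ]
    for required, label in rules:
        if required <= m:          # all required marks are present
            return label
    return "unassigned"
```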


Histone modifications: Histone Code

Nature Reviews Genetics 8, 286-298 (April 2007)


Human Histone Proteins

http://en.wikipedia.org/wiki/Histone


Histone Modification and Gene Activation

http://en.wikipedia.org/wiki/Histone


Modeling Histone Marks

• Some studies show DNA sequence-based instructive mechanisms for histone modifications.

• Since a combination of histone marks indicates whether a region is active or inactive, machine-learning-based modeling techniques can be used to determine which combinations of histone marks indicate an active or inactive status.


Prediction of regulatory elements in mammalian genomes using chromatin signatures

Won et al. BMC Bioinformatics 2008, 9:547


Hidden Markov models (HMMs) on the histone modification data

• A supervised learning method to predict promoters and enhancers based on their unique chromatin modification signatures.

• Hidden Markov models (HMMs) are trained on the histone modification data for known promoters and enhancers.

• The trained HMMs are then used to identify promoter- or enhancer-like sequences in the human genome. A simulated annealing (SA) procedure was used to search for the most informative combination of histone marks and the optimal window size.

BMC Bioinformatics 2008, 9:547
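The simulated-annealing search can be sketched as follows. The sketch assumes the search state is a subset of marks plus a window size drawn from a fixed list, that a neighbor flips one mark or moves one step in the window list, and that moves are accepted by the Metropolis rule under geometric cooling. In Won et al. the objective would be a measure of HMM prediction performance; `toy_score` here is a hypothetical stand-in with a known optimum so the search can be checked.

```python
import math
import random

def simulated_annealing(score, n_marks, windows, n_iter=2000, t0=1.0, alpha=0.995, seed=0):
    """Maximize score(marks, window) over mark subsets and window sizes; return the best seen."""
    rng = random.Random(seed)
    marks = frozenset(i for i in range(n_marks) if rng.random() < 0.5)
    w_idx = rng.randrange(len(windows))
    cur = score(marks, windows[w_idx])
    best = (cur, marks, windows[w_idx])
    t = t0
    for _ in range(n_iter):
        if rng.random() < 0.5:                       # neighbor: toggle one histone mark
            new_marks, new_idx = marks ^ {rng.randrange(n_marks)}, w_idx
        else:                                        # neighbor: shift window size by one step
            new_marks = marks
            new_idx = min(max(w_idx + rng.choice((-1, 1)), 0), len(windows) - 1)
        new = score(new_marks, windows[new_idx])
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(delta / t)
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            marks, w_idx, cur = new_marks, new_idx, new
            if cur > best[0]:
                best = (cur, marks, windows[w_idx])
        t *= alpha                                   # geometric cooling
    return best

# Hypothetical objective: best when marks {0, 2} are used with a 1000 bp window.
TARGET = frozenset({0, 2})

def toy_score(marks, window):
    return -len(marks ^ TARGET) - abs(window - 1000) / 500

best = simulated_annealing(toy_score, 6, [500, 1000, 1500, 2000, 2500, 3000])
```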


ANN for Chromatin Modeling

• A computational framework for identifying functional DNA elements using chromatin signatures.

• The framework consists of a data transformation step and a feature extraction step, followed by a classification step using a time-delay neural network.

• CSI-ANN (chromatin signature identification by artificial neural network).

• Step 1: Histone modification data in each 100 bp bin of a 2 kb window are transformed into mean and energy values.

• Step 2: The 6 histone marks × (mean, energy) features are reduced to a single Fisher discriminant analysis (FDA) feature.

Bioinformatics. 2010 Jul 1;26(13):1579-86
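The two feature steps can be sketched with NumPy. This assumes the input for one 2 kb window is a (6, 2000) array of per-position signal, one row per mark, and takes “energy” to be the sum of squared values in a bin (an assumed definition). The FDA step is written as a two-class Fisher linear discriminant with a small ridge term added for numerical stability; function names and the ridge value are hypothetical.

```python
import numpy as np

def mean_energy(signal, bin_size=100):
    """Split one mark's window signal into bins; return per-bin mean and energy (sum of squares)."""
    x = np.asarray(signal, dtype=float)
    if x.size % bin_size:
        raise ValueError("signal length must be a multiple of bin_size")
    bins = x.reshape(-1, bin_size)                 # 2000 positions -> 20 bins x 100 values
    return bins.mean(axis=1), (bins ** 2).sum(axis=1)

def window_features(marks, bin_size=100):
    """marks: (6, 2000) array -> (20, 12) matrix of (mean, energy) per mark per bin."""
    cols = []
    for row in np.asarray(marks, dtype=float):
        m, e = mean_energy(row, bin_size)
        cols.extend([m, e])
    return np.column_stack(cols)

def fisher_direction(pos, neg, ridge=1e-6):
    """Two-class Fisher discriminant: w proportional to Sw^-1 (mu_pos - mu_neg), unit length."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    sw = np.cov(pos, rowvar=False) * (len(pos) - 1) + np.cov(neg, rowvar=False) * (len(neg) - 1)
    sw += ridge * np.eye(sw.shape[0])              # within-class scatter, regularized
    w = np.linalg.solve(sw, pos.mean(axis=0) - neg.mean(axis=0))
    return w / np.linalg.norm(w)
```

Projecting `window_features(marks) @ w` gives one FDA value per 100 bp bin, i.e. the 20-point series that a time-delay network would consume.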


CSI-ANN Framework

• Each time-copy layer (1 to 10) receives a portion (10 points, i.e., a 1000 bp segment) of the entire 2 kb window (20 points in total).

Bioinformatics. 2010 Jul 1;26(13):1579-86


A Multivariate Hidden Markov Model for Chromatin Modeling

• Uses a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks.

• Identifies 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states.

• Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles.

Nat Biotechnol. 2010 Aug;28(8):817-25


HMM with 51 states and 41 chromatin marks

Nat Biotechnol. 2010 Aug;28(8):817-25


Probabilistic Model for Histone Markers

Given data v and model parameters a, b, p, the likelihood is computed as below. $C_t$ is a 200 bp interval and $v_{C_t}$ denotes the combination of m (= 41) marks in $C_t$.
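In the standard form for a multivariate HMM whose marks are treated as independent binary (present/absent) observations given the state, with $a$ the initial-state probabilities, $b$ the transition probabilities, and $p_{ki}$ the probability that mark $i$ is present in state $k$, the likelihood for one chromosome of $T$ consecutive intervals $C_1, \dots, C_T$ can be written as below. This is a reconstruction under those assumptions, following the formulation of Ernst and Kellis; across chromosomes the per-chromosome likelihoods multiply, and $v_{C_t i} \in \{0, 1\}$.

```latex
P(v \mid a, b, p) = \sum_{s_1, \dots, s_T} a_{s_1} \prod_{t=1}^{T-1} b_{s_t s_{t+1}} \prod_{t=1}^{T} \prod_{i=1}^{m} p_{s_t i}^{\,v_{C_t i}} \left(1 - p_{s_t i}\right)^{1 - v_{C_t i}}
```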

Given the likelihood formula, the number of states was chosen by combining the log-likelihood with the Bayesian Information Criterion (BIC) model-complexity penalty.

The model with the best BIC score had 79 states; states were then removed iteratively, and a model with 51 states was finally selected.

Nat Biotechnol. 2010 Aug;28(8):817-25
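A minimal sketch of evaluating this kind of likelihood with the forward algorithm in log space, plus a BIC-style score for comparing models with different numbers of states K. It assumes already-trained parameters (training, e.g. by Baum-Welch/EM, is not shown) with all emission probabilities strictly between 0 and 1. The parameter count, (K−1) initial + K(K−1) transition + K·m emission parameters, and the form log L − (k/2) log n are the textbook BIC and are assumptions here; the exact scoring used by Ernst and Kellis may differ.

```python
import numpy as np

def log_emission(v, p):
    """v: (T, m) binary marks per 200 bp bin; p: (K, m) mark probabilities per state.
    Returns (T, K) log P(v_t | state k), marks independent given the state."""
    v = np.asarray(v, dtype=float)
    return v @ np.log(p).T + (1 - v) @ np.log(1 - p).T

def log_likelihood(v, a, b, p):
    """Forward algorithm in log space. a: (K,) initial, b: (K, K) transitions, p: (K, m)."""
    le = log_emission(v, p)
    log_b = np.log(b)
    alpha = np.log(a) + le[0]
    for t in range(1, le.shape[0]):
        # alpha_j(t) = logsum_i [alpha_i(t-1) + log b_ij] + log P(v_t | j)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_b, axis=0) + le[t]
    return np.logaddexp.reduce(alpha)

def bic_score(loglik, K, m, n_bins):
    """log L - (k/2) log n with k free parameters (higher is better)."""
    k = (K - 1) + K * (K - 1) + K * m
    return loglik - 0.5 * k * np.log(n_bins)
```

For several chromosomes, the per-chromosome log-likelihoods are summed before computing the score.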


Chromatin States and functional and positional enrichments.

Nat Biotechnol. 2010 Aug;28(8):817-25


Reference for Histone Signature Modeling

http://en.wikipedia.org/wiki/Histone

Capturing the dynamic epigenome. Deal RB, Henikoff S. Genome Biol. 2010 Oct 8;11(10):218.

Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome Nature Genetics 39, 311 - 318 (2007)

Prediction of regulatory elements in mammalian genomes using chromatin signatures BMC Bioinformatics 2008, 9:547

Sequence signatures of nucleosome positioning in Caenorhabditis elegans. Chen K, Wang L, Yang M, Liu J, Xin C, Hu S, Yu J. Genomics Proteomics Bioinformatics. 2010 Jun;8(2):92-102.

A novel DNA sequence periodicity decodes nucleosome positioning. Chen K, Meng Q, Ma L, Liu Q, Tang P, Chiu C, Hu S, Yu J. Nucleic Acids Res. 2008 Nov;36(19):6228-36.

Discover regulatory DNA elements using chromatin signatures and artificial neural network. Firpi HA, Ucar D, Tan K. Bioinformatics. 2010 Jul 1;26(13):1579-86.

Chromatin signature of embryonic pluripotency is established during genome activation. Vastenhouw NL, Zhang Y, Woods IG, Imam F, Regev A, Liu XS, Rinn J, Schier AF. Nature. 2010 Apr 8;464(7290):922-6.

A translational signature for nucleosome positioning in vivo. Caserta M, Agricola E, Churcher M, Hiriart E, Verdone L, Di Mauro E, Travers A. Nucleic Acids Res. 2009 Sep;37(16):5309-21.

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES. Nature. 2009 Mar 12;458(7235):223-7.

Discovery and characterization of chromatin states for systematic annotation of the human genome. Ernst J, Kellis M. Nat Biotechnol. 2010 Aug;28(8):817-25.


4.3 MicroRNA Target Predictions


List of Widely Used MicroRNA Target Prediction Algorithms

http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software#ncRNA_gene_prediction_software


Reference for MicroRNA Target Prediction Algorithms in Table (previous slide)

• (SURVEY PAPER) Computational methods to identify miRNA targets. Hammell M. Semin Cell Dev Biol. 2010 Sep;21(7):738-44.

• (Diana-microT) Maragkakis M, Alexiou P, Papadopoulos GL, Reczko M, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Simossis VA, Sethupathy P, Vergoulis T, Koziris N, Sellis T, Tsanakas P, Hatzigeorgiou AG (2009). "Accurate microRNA target prediction correlates with protein repression levels." BMC Bioinformatics 10: 295.

• (MicroTar) Thadani R, Tammi MT (2006). "MicroTar: predicting microRNA targets from RNA duplexes." BMC Bioinformatics 7 Suppl 5: S20.

• (miTarget) Kim SK, Nam JW, Rhee JK, Lee WJ, Zhang BT (2006). "miTarget: microRNA target gene prediction using a support vector machine." BMC Bioinformatics 7: 411.

• (PicTar) Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N (2005). "Combinatorial microRNA target predictions." Nat Genet 37 (5): 495–500.

• (PITA) Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007). "The role of site accessibility in microRNA target recognition." Nat Genet 39 (10): 1278–84.

• (RNA22) Miranda KC, Huynh T, Tay Y, Ang YS, Tam WL, Thomson AM, Lim B, Rigoutsos I (2006). "A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes." Cell 126 (6): 1203–17.

• (RNAhybrid) Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R (2004). "Fast and effective prediction of microRNA/target duplexes." RNA 10 (10): 1507–17.

• (RNAhybrid) Krüger J, Rehmsmeier M (2006). "RNAhybrid: microRNA target prediction easy, fast and flexible." Nucleic Acids Res. 34 (Web Server issue): W451–4.

• (Sylamer) van Dongen S, Abreu-Goodger C, Enright AJ (2008). "Detecting microRNA binding and siRNA off-target effects from expression data." Nat Methods 5 (12): 1023–5.

• (TAREF) Heikham R, Shankar R (2010). "Flanking region sequence information to refine microRNA target predictions." Journal of Biosciences 35 (1): 105–18.

• (TargetScan) Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003). "Prediction of mammalian microRNA targets." Cell 115 (7): 787–98.

• (TargetScan) Lewis BP, Burge CB, Bartel DP (2005). "Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets." Cell 120 (1): 15–20.

• (TargetScan) Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP (2007). "MicroRNA targeting specificity in mammals: determinants beyond seed pairing." Mol Cell 27 (1): 91–105.


Improving MicroRNA Target Predictions

• Sequence- and accessibility-based microRNA target predictions have limited predictive power.

• Most importantly, microRNA-mRNA relationships are not static: they can change across conditions, whereas sequence-based predictions are fixed.
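For reference, sequence-based predictors such as TargetScan (cited above) start from seed pairing: the miRNA seed, nucleotides 2 to 8, pairs with a complementary site in the target 3′ UTR. The sketch below scans a UTR for canonical 8mer, 7mer-m8 and 7mer-A1 sites. It is a simplified illustration with hypothetical function names and no conservation, context, or accessibility scoring, which is exactly why such static predictions are limited.

```python
_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}

def revcomp(rna):
    """Reverse complement of an RNA string."""
    return "".join(_COMPLEMENT[b] for b in reversed(rna))

def find_seed_sites(mirna, utr):
    """Return (position, site_type) for canonical seed sites of mirna in a 3' UTR.

    8mer: pairs with nt 2-8 plus an A opposite nt 1; 7mer-m8: nt 2-8;
    7mer-A1: nt 2-7 plus an A opposite nt 1. Position is the 0-based site start.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    m8 = revcomp(mirna[1:8])     # site complementary to miRNA nt 2-8
    core = m8[1:]                # 6mer complementary to nt 2-7
    sites = []
    for j in range(len(utr) - 5):
        if utr[j:j + 6] != core:
            continue
        has_m8 = j >= 1 and utr[j - 1] == m8[0]           # pairs with nt 8, just upstream
        has_a1 = j + 6 < len(utr) and utr[j + 6] == "A"   # A opposite miRNA nt 1
        if has_m8 and has_a1:
            sites.append((j - 1, "8mer"))
        elif has_m8:
            sites.append((j - 1, "7mer-m8"))
        elif has_a1:
            sites.append((j, "7mer-A1"))
    return sites
```

For example, with let-7a (`UGAGGUAGUAGGUUGUAUAGUU`, seed GAGGUAG) the 7mer-m8 site is `CUACCUC`.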


Improving MicroRNA Target Predictions Using Data from More Accurate Experimental Techniques.

• HITS-CLIP:

• Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Chi SW, Zang JB, Mele A, Darnell RB. Nature. 2009 Jul 23;460(7254):479-86.


Improving MicroRNA Target Predictions Using Expression Data

• Using the negative correlation between the expression levels of microRNAs and their targets, we can potentially improve prediction and also incorporate the dynamic nature of the microRNA-mRNA relationship.

• Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy. Liu B, Li J, Tsykin A, Liu L, Gaur AB, Goodall GJ. BMC Bioinformatics. 2009 Dec 10;10:408.

• Identification of microRNA activity by Targets' Reverse EXpression. Volinia S, Visone R, Galasso M, Rossi E, Croce CM. Bioinformatics. 2010 Jan 1;26(1):91-7.
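The simplest use of this idea is to keep only predicted miRNA-target pairs whose expression profiles are negatively correlated across matched samples. The sketch below does this with Pearson correlation; the correlation measure, the cutoff of −0.5, and the dictionary layout are hypothetical choices for illustration, not the procedure of either paper listed above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def filter_targets(mirna_expr, mrna_expr, predicted, r_cutoff=-0.5):
    """Keep predicted (miRNA, gene) pairs whose expression is negatively correlated.

    mirna_expr / mrna_expr: {name: [values over the same samples]}.
    """
    kept = []
    for mir, gene in predicted:
        r = pearson(mirna_expr[mir], mrna_expr[gene])
        if r <= r_cutoff:
            kept.append((mir, gene, r))
    return kept
```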


Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy

• In order to capture all possible interactions, we split expression profiles of miRNAs and mRNAs according to sample categories, and then build Bayesian networks on separate data sets. The interactions considered are: (i) miRNAs down-regulate mRNAs; (ii) miRNAs up-regulate mRNAs; and (iii) miRNAs down-regulate mRNAs in one category, but up-regulate them in another category.

• Interaction networks identified on individual data sets are then integrated by a BN averaging procedure.

• To avoid statistically insignificant results due to small data sets, we employ bootstrapping to achieve reliable inference and integration. We call this strategy splitting and averaging of Bayesian networks (SA-BNs).


An example of Bayesian Network structure learning

(a) Observations of the expression of miRNA A and mRNA B have been discretized to binary values, where 0 denotes under-expressed and 1 denotes over-expressed.

(b) Two possible structures are hypothesized for their relationship.

(c) The structure that receives the highest score given the data is used to represent the interaction of miRNA A and mRNA B.
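Panels (a) to (c), combined with the bootstrapping and averaging steps described for SA-BNs, can be sketched for a single miRNA-mRNA pair. The sketch assumes binary (0/1) data, takes the two candidate structures to be “no edge” and “A → B”, scores them by BIC, and averages bootstrap edge confidences across sample categories with a plain mean. These are assumptions for illustration; Liu et al. use their own scoring and averaging over networks with many miRNAs and mRNAs.

```python
import math
import random
from collections import Counter

def _loglik(values):
    """Maximum-likelihood log-likelihood of one discrete variable."""
    n = len(values)
    return sum(k * math.log(k / n) for k in Counter(values).values())

def loglik_edge(a, b):
    """Structure A -> B: log P(A) + log P(B | A) at ML estimates."""
    ca = Counter(a)
    ll = _loglik(a)
    for (x, _), k in Counter(zip(a, b)).items():
        ll += k * math.log(k / ca[x])
    return ll

def edge_selected(a, b):
    """BIC comparison of 'no edge' (2 free params) vs 'A -> B' (3 free params), binary data."""
    n = len(a)
    no_edge = _loglik(a) + _loglik(b) - 0.5 * 2 * math.log(n)
    edge = loglik_edge(a, b) - 0.5 * 3 * math.log(n)
    return edge > no_edge

def regulation_sign(a, b):
    """'down' if B is over-expressed less often when A is over-expressed, 'up' if more often."""
    on = [y for x, y in zip(a, b) if x == 1]
    off = [y for x, y in zip(a, b) if x == 0]
    if not on or not off:
        return "undetermined"
    p_on, p_off = sum(on) / len(on), sum(off) / len(off)
    return "down" if p_on < p_off else "up" if p_on > p_off else "none"

def bootstrap_confidence(a, b, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which the A -> B edge is selected."""
    rng = random.Random(seed)
    n, hits = len(a), 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        hits += edge_selected([a[i] for i in idx], [b[i] for i in idx])
    return hits / n_boot

def split_and_average(categories, n_boot=200):
    """categories: {name: (miRNA_values, mRNA_values)}. Learn per category, then average."""
    per_cat = {name: (bootstrap_confidence(a, b, n_boot), regulation_sign(a, b))
               for name, (a, b) in categories.items()}
    avg = sum(conf for conf, _ in per_cat.values()) / len(per_cat)
    return per_cat, avg
```

Keeping the per-category sign alongside the averaged confidence is what lets case (iii), down-regulation in one category and up-regulation in another, show up.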


Computing SA-BN


BMC Bioinformatics. 2009 Dec 10;10:408


Identification of microRNA activity by Targets' Reverse EXpression

• First, a statistic was computed for each coding mRNA in the dataset/experiment. For example, in a two-class experiment (e.g., treated versus non-treated cells, or disease versus control) we used the t-value for each measured mRNA (routinely between 5000 and 10 000 mRNAs). Alternatively, for a quantitative analysis (e.g., time course or dose-response) we used Spearman's rank correlation coefficient.

• Then we applied the Kolmogorov-Smirnov (KS) test to the respective values of target and non-target genes for each miRNA in each prediction algorithm.

Bioinformatics. 2010 Jan 1;26(1):91-7
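The two steps can be sketched in pure Python for the two-class case. The Welch form of the t statistic and the use of the KS D statistic alone are assumptions for this sketch; in practice `scipy.stats.ttest_ind` and `scipy.stats.ks_2samp` return both statistics and p-values. The Spearman variant for quantitative designs is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_value(group1, group2):
    """Welch two-sample t statistic for one mRNA (needs >= 2 values per group, nonzero variance)."""
    v1, v2 = variance(group1) / len(group1), variance(group2) / len(group2)
    return (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: max distance between the empirical CDFs."""
    x, y = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(x) and j < len(y):
        v = min(x[i], y[j])
        while i < len(x) and x[i] <= v:
            i += 1
        while j < len(y) and y[j] <= v:
            j += 1
        d = max(d, abs(i / len(x) - j / len(y)))
    return d

def mirna_activity(t_values, targets):
    """t_values: {gene: t}; targets: predicted targets of one miRNA.
    Returns (D, sign); sign < 0 means targets are shifted toward lower t (repressed)."""
    tv = [t for g, t in t_values.items() if g in targets]
    nt = [t for g, t in t_values.items() if g not in targets]
    return ks_statistic(tv, nt), math.copysign(1.0, mean(tv) - mean(nt))
```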


4.4 Modeling DNA Methylation Susceptibility


Why Model DNA Methylation Susceptibility?

• DNA methylation is not random; there appears to be an instructive mechanism for it.

• What is this instructive mechanism? Little is known about it.

• There is growing evidence that DNA sequence itself plays an instructive role.

• Other factors, such as microRNAs or nucleosome positioning, may be equally important, but this tutorial focuses on DNA sequence only.


Three Resolutions of DNA Methylation Susceptibility Modeling

• Promoter regions

• CpG islands

• Site-specific: modeling at the level of individual CpG sites requires high-resolution data from sequencing.


Character Composition Analysis of Methylation-Susceptible vs. Resistant CpG Sites from 25 Genes in Leukemia

Handa and Jeltsch's analysis (J Mol Biol, 2005) reported flanking sequences of up to ±4 bp surrounding the central CG site that were characteristic of high (5'-CTTGCGCAAG-3') and low (5'-TGTTCGGTGG-3') methylation.

Kim et al, PSB 2008
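The flanking-sequence comparison can be sketched as follows: extract the ±4 bp context around each CG (the window length used by Handa and Jeltsch) and compare per-position base frequencies between methylation-susceptible and methylation-resistant sites. The function names are hypothetical, and in practice the two site sets would come from the methylation data for the 25 genes.

```python
from collections import Counter

def cpg_flanks(seq, k=4):
    """Every window of k bases + central CG + k bases (length 2k + 2) in seq."""
    seq = seq.upper()
    return [seq[i - k:i + 2 + k] for i in range(k, len(seq) - k - 1) if seq[i:i + 2] == "CG"]

def position_frequencies(windows):
    """Per-position A/C/G/T frequencies over equal-length windows."""
    n = len(windows)
    freqs = []
    for p in range(len(windows[0])):
        c = Counter(w[p] for w in windows)
        freqs.append({b: c[b] / n for b in "ACGT"})
    return freqs

def frequency_difference(susceptible, resistant):
    """Per-position frequency difference (susceptible minus resistant)."""
    fs, fr = position_frequencies(susceptible), position_frequencies(resistant)
    return [{b: fs[p][b] - fr[p][b] for b in "ACGT"} for p in range(len(fs))]
```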


Predictive model for methylation susceptibility

• To classify CpG sites into two classes (methylation-susceptible vs. resistant), the authors used four classification methods in WEKA: SMO, Multilayer Perceptron (MP), Naive Bayes (NB), and Instance-Based Classifier (IBk).

Kim et al, PSB 2008
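As an illustration of one of the four classifiers, here is a minimal position-specific Naive Bayes over flanking-sequence windows with Laplace smoothing. It is written from scratch as a stand-in, not WEKA's NaiveBayes, and the feature encoding (one categorical A/C/G/T feature per window position) is an assumption. The toy training data below simply reuse the two Handa and Jeltsch flanks.

```python
import math
from collections import Counter

class FlankNaiveBayes:
    """Naive Bayes with one categorical (A/C/G/T) feature per window position."""

    def fit(self, windows, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_cond = {}, {}
        length = len(windows[0])
        for c in self.classes:
            ws = [w for w, y in zip(windows, labels) if y == c]
            self.log_prior[c] = math.log(len(ws) / len(windows))
            denom = len(ws) + 4 * alpha                  # Laplace smoothing over 4 bases
            self.log_cond[c] = []
            for p in range(length):
                counts = Counter(w[p] for w in ws)
                self.log_cond[c].append({b: math.log((counts[b] + alpha) / denom) for b in "ACGT"})
        return self

    def predict(self, window):
        """Class with the highest log posterior (bases must be A/C/G/T)."""
        def score(c):
            return self.log_prior[c] + sum(self.log_cond[c][p][b] for p, b in enumerate(window))
        return max(self.classes, key=score)

X = ["CTTGCGCAAG"] * 3 + ["TGTTCGGTGG"] * 3
y = ["susceptible"] * 3 + ["resistant"] * 3
clf = FlankNaiveBayes().fit(X, y)
```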


CpG Island Methylation in Human Lymphocytes Is Highly Correlated with DNA Sequence, Repeats, and Predicted DNA Structure (Bock et al, PLoS Genetics 2006)


CpG island mapping by epigenome prediction (Bock et al, PLoS CB 2007)

• An epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility.

• By training support vector machines on epigenetic data for CpG islands on human chromosomes 21 and 22, they identified informative DNA attributes that correlate with open versus compact chromatin structures.

• These DNA attributes were then used to predict the epigenetic states of all CpG islands genome-wide.

• They derived improved maps of predicted “bona fide” CpG islands.


Computational prediction of methylation status in human genomic sequences (Das et al., PNAS 2006)

• It computes the methylation propensity for an 800-bp region centered on a CpG dinucleotide based on specific sequence features within the region.

• The support vector machine classifier (called hdfinder) has a reported prediction accuracy of 86%, as validated with CpG regions whose methylation status has been experimentally determined.

• Using hdfinder, the authors predicted genome-wide methylation patterns for all 22 human autosomes.
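The window mechanics can be sketched as below: extract an 800 bp window centered on a CpG and compute two generic sequence features, GC content and the CpG observed/expected ratio (CpG count × length / (C count × G count)). These two features are illustrative choices for this sketch and not necessarily among hdfinder's own features; the window is simply truncated at sequence ends, and the function name is hypothetical.

```python
def cpg_window_features(seq, center, width=800):
    """Features of a width-bp window centered on position `center` (truncated at sequence ends)."""
    half = width // 2
    w = seq[max(0, center - half):min(len(seq), center + half)].upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = sum(1 for i in range(n - 1) if w[i:i + 2] == "CG")
    return {
        "length": n,
        "gc_content": (c + g) / n if n else 0.0,
        # observed/expected CpG ratio = CpG count * length / (C count * G count)
        "cpg_obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }
```

A feature vector like this, computed for windows with known methylation status, is the kind of input an SVM classifier would be trained on.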


Features for Computational Prediction of Methylation Status in Human Genomic Sequences (Das et al., PNAS 2006)


CpG island mapping by epigenome prediction (Bock et al, PLoS CB 2007)


Reference for DNA Methylation Status Modeling

• Predicting methylation status of CpG islands in the human brain. Fang F, Fan S, Zhang X, Zhang MQ. Bioinformatics. 2006 Sep 15;22(18):2204-9.

• Computational prediction of methylation status in human genomic sequences. Das R, Dimitrova N, Xuan Z, Rollins RA, Haghighi F, Edwards JR, Ju J, Bestor TH, Zhang MQ. Proc Natl Acad Sci U S A. 2006 Jul 11;103(28):10713-6.

• CpG island mapping by epigenome prediction. Bock C, Walter J, Paulsen M, Lengauer T. PLoS Comput Biol. 2007 Jun;3(6):e110.

• CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. Bock C, Paulsen M, Tierling S, Mikeska T, Lengauer T, Walter J. PLoS Genet. 2006;2:e26.

• Predicting DNA methylation susceptibility using CpG flanking sequences. Kim S, Li M, Paik H, Nephew K, Shi H, Kramer R, Xu D, Huang TH. Pac Symp Biocomput. 2008.


4.5 Integrated Analysis of microRNA and Gene Expression Data


microRNAs regulate target genes at the mRNA and translation levels

Figure: expression of microRNAs and of mRNAs of coding genes across experimental conditions.


Some Papers On The Integrated Analysis

• MicroRNA and mRNA integrated analysis (MMIA): a web tool for examining biological functions of microRNA expression.

Nam S, Li M, Choi K, Balch C, Kim S, Nephew KP.

Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W356-62. Epub 2009 May 6.

• Computational analysis of microRNA profiles and their target genes suggests significant involvement in breast cancer antiestrogen resistance.

Xin F, Li M, Balch C, Thomson M, Fan M, Liu Y, Hammond SM, Kim S, Nephew KP.

Bioinformatics. 2009 Feb 15;25(4):430-4.

• Discovery of microRNA-mRNA modules via population-based probabilistic learning.

Joung JG, Hwang KB, Nam JW, Kim SJ, Zhang BT.

Bioinformatics. 2007 May 1;23(9):1141-7.

• Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy.

Liu B, Li J, Tsykin A, Liu L, Gaur AB, Goodall GJ.

BMC Bioinformatics. 2009 Dec 10;10:408.


(Xin et al.) Step 1: Differentially Expressed microRNAs in MCF7 vs. Fulvestrant-Resistant MCF7 Cell Lines.


(Xin et al.) Step 2: Negative Correlation in Expression Levels


(Xin et al.) Step 2: Stronger Negative Correlations for Multiple Targets or microRNAs


Nam et al, NAR 2009


Liu et al, BMC Bioinformatics 2009


Discovering microRNA-mRNA Modules using Genetic Programming

Joung et al, Bioinformatics 2007


Flowchart for Steps in Discovering microRNA-mRNA Modules using Genetic Programming

Joung et al, Bioinformatics 2007


Fitness Function for microRNA-mRNA Modules

Joung et al, Bioinformatics 2007


4.6 Putting It All Together: Towards Data-Driven Systems Biology


Elements for Epigenetic Study

DNA methylation

Histone modifications

MicroRNAs

Coding genes

CpG islands

Transcription factors

mRNA

Long ncRNAs


Prediction of Gene Regulatory Network

• This study reconstructed a regulatory module network using gene expression, microRNA expression and a clinical parameter, all measured in lymphoblastoid cell lines derived from patients with aggressive or non-aggressive forms of prostate cancer.

• The first step is a two-way clustering of genes and conditions, using a Gibbs sampling procedure.

• In the second stage, the algorithm infers a prioritized list of regulators for each cluster of co-expressed genes.

Bioinformatics. 2010 Sep 15;26(18):i638-44


Prediction of Gene Regulatory Network (Stage 1: 2-way clustering)

• A two-way clustering of genes and conditions, using a Gibbs sampling procedure.

Bioinformatics 25(4): 490-496 (2009)


Prediction of Gene Regulatory Network (Stage 2: Regulator Prediction)

Bioinformatics 25(4): 490-496 (2009)


Gene Regulatory Tree Example

Bioinformatics 25(4): 490-496 (2009)


Regulatory Networks in Prostate Cancer

Bioinformatics. 2010 Sep 15;26(18):i638-44


Prediction of Epigenetically Regulated Genes in Breast Cancer Cell Lines

BMC Bioinformatics. 2010 Jun 4;11:305


Two Steps for Prediction of Epigenetically Regulated Genes in Breast Cancer Cell Lines

• Dimensionality reduction by clustering

– Clustering on the basis of proximity:

• Methylation sites lying within 2 kb of one another are aggregated to form distinct regions of more than 2 kb in length.

– Clustering on the basis of similarity:

• K-spectral clustering on the leading principal components, which together account for 99% of the variance in the data.

• Association between methylation and gene expression

– $E = a\,e^{-bM} + c$, where the expression level $E$ is modeled as an exponential function of the methylation level $M$, and the parameters $a$, $b$, and $c$ are fitted to the data.
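The proximity step and the methylation-expression model above can be sketched in a few lines of Python. Assumptions: CpG positions are integers on a single chromosome, the 2 kb window is applied between consecutive sites, and both the region-length criterion and the similarity-based (PCA plus spectral) clustering are left out. The exponential model is fitted by scanning a grid of decay rates `b` and solving for `a` and `c` by ordinary least squares at each `b`; this grid approach is a choice of this sketch, not necessarily how the cited study fitted the model. Function names are hypothetical.

```python
import math


def merge_sites(positions, max_gap=2000):
    """Group CpG positions into regions: consecutive sites <= max_gap apart share a region.

    Returns a list of (start, end) tuples."""
    regions = []
    for p in sorted(positions):
        if regions and p - regions[-1][1] <= max_gap:
            regions[-1][1] = p  # extend the current region
        else:
            regions.append([p, p])  # start a new region
    return [tuple(r) for r in regions]


def fit_exp_decay(M, E, b_grid):
    """Fit E = a*exp(-b*M) + c by grid search over b and least squares for a, c.

    Returns (a, b, c) with the smallest sum of squared errors."""
    n = len(M)
    best = None
    for b in b_grid:
        x = [math.exp(-b * m) for m in M]
        mx, me = sum(x) / n, sum(E) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue  # x is constant (e.g. b = 0), so a and c are not identifiable
        a = sum((xi - mx) * (ei - me) for xi, ei in zip(x, E)) / sxx
        c = me - a * mx
        err = sum((a * xi + c - ei) ** 2 for xi, ei in zip(x, E))
        if best is None or err < best[0]:
            best = (err, a, b, c)
    return best[1:]
```

A positive fitted `b` with positive `a` corresponds to expression that drops as methylation rises, which is the pattern the model is meant to capture.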


Prediction of Epigenetically Regulated Genes in Breast Cancer Cell Lines (Subnetwork enrichment)

BMC Bioinformatics. 2010 Jun 4;11:305

• Subnetwork enrichment analysis of the 58 predicted genes in Pathway Studio identified 35 common regulators of the type "Pathway", each connected to 6 or more of the predicted genes.
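To show the kind of test behind such an enrichment, the sketch below keeps regulators whose target set overlaps the predicted genes in at least `min_overlap` genes (6, matching the threshold above) and scores the overlap with a one-sided hypergeometric test. This is a generic over-representation test, not Pathway Studio's own statistic; the gene universe size, regulator names, and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def enriched_regulators(predicted, targets_by_regulator, universe_size, min_overlap=6):
    """Return (regulator, overlap, p_value) tuples, most significant first."""
    pred = set(predicted)
    results = []
    for reg, targets in targets_by_regulator.items():
        overlap = len(pred & set(targets))
        if overlap >= min_overlap:
            p = hypergeom_sf(overlap, universe_size, len(set(targets)), len(pred))
            results.append((reg, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

With many regulators tested at once, the resulting p-values would normally also be corrected for multiple testing.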


Putting MicroRNA in the Systems Context: TF and MicroRNA Example

Figure: Mixed miRNA/TF feed-forward loops and the CircuitsDB construction pipeline (BMC Bioinformatics 2010)
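A mixed feed-forward loop combines transcriptional (TF) and post-transcriptional (miRNA) edges: a master regulator controls an intermediate regulator, and both control a common target. The sketch below enumerates the two mixed patterns from plain edge sets, one with a TF as master (TF regulates a miRNA, both regulate a gene) and one with a miRNA as master (miRNA represses a TF, both regulate a gene). It assumes the edge sets are already given, for example from target predictions, and does not reproduce the CircuitsDB pipeline; all names are hypothetical.

```python
def mixed_ffls(tf_targets, mirna_targets):
    """Enumerate mixed TF/miRNA feed-forward loops.

    tf_targets:    dict TF -> set of genes or miRNAs it regulates transcriptionally
    mirna_targets: dict miRNA -> set of genes (including TFs) it represses
    Returns (tf_master, mirna_master) lists of (master, intermediate, target) triples."""
    tf_master, mirna_master = [], []
    for tf, tf_out in tf_targets.items():
        for mir, mir_out in mirna_targets.items():
            # Targets shared by both regulators, excluding the regulators themselves.
            shared = (tf_out & mir_out) - {tf, mir}
            if mir in tf_out:  # TF -> miRNA, and both -> target
                tf_master += [(tf, mir, g) for g in sorted(shared)]
            if tf in mir_out:  # miRNA -| TF, and both -> target
                mirna_master += [(mir, tf, g) for g in sorted(shared)]
    return tf_master, mirna_master
```

Enumerating all TF/miRNA pairs is quadratic, which is acceptable for a sketch; a genome-scale version would index targets by gene instead.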


References for Integrated Analysis

• Prediction of a gene regulatory network linked to prostate cancer from gene expression, microRNA and clinical data. Bonnet E, Michoel T, Van de Peer Y. Bioinformatics. 2010 Sep 15;26(18):i638-44.

• (Method for the above) Module networks revisited: computational assessment and prioritization of model predictions. Joshi A, De Smet R, Marchal K, Van de Peer Y, Michoel T. Bioinformatics. 2009;25(4):490-496.

• Prediction of epigenetically regulated genes in breast cancer cell lines. Loss LA, Sadanandam A, Durinck S, Nautiyal S, Flaucher D, Carlton VE, Moorhead M, Lu Y, Gray JW, Faham M, Spellman P, Parvin B. BMC Bioinformatics. 2010 Jun 4;11:305.

• CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse. Friard O, Re A, Taverna D, De Bortoli M, Corá D. BMC Bioinformatics. 2010 Aug 23;11:435.

• Combinatorial regulation of transcription factors and microRNAs. Su N, Wang Y, Qian M, Deng M. BMC Syst Biol. 2010 Nov 8;4:150.


More discussion on microRNA targets and interactions


3' UTR and MicroRNA Targets

• Many microRNA target prediction algorithms rely on target sites in the 3' UTR, and the 3' UTR is indeed important, as these studies show:

– Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Mayr C, Bartel DP. Cell. 2009 Aug 21;138(4):673-84.

– A variant in a microRNA complementary site in the 3' UTR of the KIT oncogene increases risk of acral melanoma. Godshalk SE, et al. Oncogene. 2010 Nov 29.

• However, we may need to consider full-length transcripts rather than 3' UTRs alone.
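To make the 3' UTR versus full-transcript distinction concrete, the sketch below scans a transcript for perfect 7-nt matches to a miRNA seed (nucleotides 2 to 8), optionally restricted to a coordinate window such as the 3' UTR. Real target prediction tools use additional site types and context features; the sequences, coordinates, and function names here are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Target-site sequence (5'->3' on the mRNA) pairing with miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering nucleotides 2..8
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(mirna, transcript, start=0, end=None):
    """0-based start positions of perfect seed matches lying entirely in [start, end)."""
    site = seed_site(mirna)
    end = len(transcript) if end is None else end
    hits = []
    i = transcript.find(site, start, end)
    while i != -1:
        hits.append(i)
        i = transcript.find(site, i + 1, end)
    return hits
```

For a transcript built as a coding region followed by a 3' UTR, calling `find_sites(mirna, transcript, start=len(cds))` reports only the UTR sites, while `find_sites(mirna, transcript)` also reports sites in the coding region.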


IsomiRs: Sequence Variations in MicroRNAs

Sequence variations at the 3' end (B) show larger variation in expression levels than variations at the 5' end (A), as measured by three different qPCR methods.

Complexity of the microRNA repertoire revealed by next-generation sequencing. RNA. 2010 Nov;16(11):2170-80.
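As an illustration of how 5' and 3' isomiRs can be told apart, the sketch below locates a sequencing read on the miRNA precursor and compares its ends with the canonical mature coordinates. It assumes exact matching against a single precursor (taking the first match) and reports unmatched reads, such as reads carrying non-templated 3' additions, as `None`; the sequences, coordinates, and function name are hypothetical.

```python
def classify_read(read, precursor, canon_start, canon_end):
    """Classify a read against the canonical mature miRNA [canon_start, canon_end).

    Returns (labels, five_prime_shift, three_prime_shift), or None if the read
    does not occur exactly in the precursor. Negative shifts mean the read end
    lies upstream of the canonical end."""
    start = precursor.find(read)
    if start == -1:
        return None  # e.g. non-templated additions or sequencing errors
    end = start + len(read)
    five, three = start - canon_start, end - canon_end
    labels = []
    if five:
        labels.append("5'")
    if three:
        labels.append("3'")
    return (labels or ["canonical"], five, three)
```

A 5' shift also moves the seed region, which is one reason 5' and 3' variants are often analyzed separately.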


TF, MicroRNA, and Targets

• A network connecting Runx2, SATB2, and the miR-23a~27a~24-2 cluster regulates the osteoblast differentiation program. Proc Natl Acad Sci U S A. 2010 Nov 16;107(46):19879-84. Epub 2010 Oct 27.

• Here we report that Runx2, a transcription factor essential for osteoblastogenesis, negatively regulates expression of the miR cluster 23a∼27a∼24-2. Overexpression, reporter, and chromatin immunoprecipitation assays established the presence of a functional Runx binding element that represses expression of these miRs.

• … The biological significance of Runx2 repression of this miR cluster is that each miR directly targets the 3' UTR of SATB2, which is known to synergize with Runx2 to facilitate bone formation. The findings suggest Runx2-negative regulation of multiple miRs by a feed-forward mechanism to cause derepression of SATB2 to promote differentiation. We find also that miR-23a represses Runx2 in the terminally differentiated osteocyte, representing a feedback mechanism to attenuate osteoblast maturation.

• … Taken together, we have established a regulatory network with a central role for the miR cluster 23a∼27a∼24-2 in both progression and maintenance of the osteocyte phenotype.
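The "derepression" logic in this network follows from edge signs: Runx2 represses each miR, and each miR represses SATB2, so along every such path two negative edges multiply to a net positive effect of Runx2 on SATB2. The sketch below encodes the edges described in the quoted text as a signed graph and computes the net sign of each simple path. It is an illustration of path-sign reasoning only; the graph encoding and function name are assumptions, and the cooperation between Runx2 and SATB2 is not modeled as an edge.

```python
def path_signs(edges, source, target):
    """Enumerate simple directed paths source -> target with the net sign of each.

    edges: dict node -> list of (child, sign), sign +1 for activation, -1 for repression.
    A path's net sign is the product of its edge signs."""
    results = []

    def dfs(node, path, sign):
        if node == target:
            results.append((path, sign))
            return
        for child, s in edges.get(node, []):
            if child not in path:  # simple paths only, so the feedback edge is not revisited
                dfs(child, path + [child], sign * s)

    dfs(source, [source], 1)
    return results


# Edges as described in the quoted text (signs: -1 = represses).
edges = {
    "Runx2": [("miR-23a", -1), ("miR-27a", -1), ("miR-24-2", -1)],
    "miR-23a": [("SATB2", -1), ("Runx2", -1)],  # includes the miR-23a -| Runx2 feedback
    "miR-27a": [("SATB2", -1)],
    "miR-24-2": [("SATB2", -1)],
}
```

Calling `path_signs(edges, "Runx2", "SATB2")` yields one path through each of the three miRs, each with net sign +1, consistent with the described derepression of SATB2.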