Mapping Transcription Mechanisms from Multimodal Genomic Data
Harvard Medical School
Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni
Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology
Harvard Medical School, March 10, 2010
Information Flow in Multimodal Genomic Data
• Genetic variants
  – 100k–1000k SNPs
  – 250k copy number variations (CNVs)
  – 250k methylation measurements
• Transcripts
  – 50k mRNA expression levels
  – 50k microRNA expression levels
  – 1.5M exon expression / splicing measurements
Expression Quantitative Trait Loci (eQTLs)
• The connection from variant to expression is an information channel
  – A DNA locus that modulates the expression level of a gene is an eQTL.
• Cis- (trans-) eQTLs are genetic variants located close to (far away from) the genes they modulate.
• Identifying cis-eQTLs is easier
  – Focusing on cis-eQTLs reduces the search space
  – What about trans-eQTLs?
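The cis/trans distinction is usually drawn by genomic distance. A minimal sketch, assuming a 1 Mb window around the gene body (a common convention, not a value taken from these slides) and using hypothetical coordinates:

```python
CIS_WINDOW_BP = 1_000_000  # assumed cis window; conventions vary between studies


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Label a SNP-gene association as 'cis' or 'trans' by genomic distance."""
    if snp_chrom != gene_chrom:
        return "trans"  # different chromosome: always trans
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"    # within the window around the gene body
    return "trans"      # same chromosome, but far from the gene


# Hypothetical coordinates, for illustration only
near = classify_eqtl("21", 1_200_000, "21", 1_500_000, 1_600_000)   # 300 kb upstream
far = classify_eqtl("21", 30_000_000, "21", 1_500_000, 1_600_000)   # same chromosome, far
other = classify_eqtl("22", 1_550_000, "21", 1_500_000, 1_600_000)  # other chromosome
print(near, far, other)
```

Restricting the search to "cis" pairs is what shrinks the search space; trans pairs require scanning every SNP against every gene.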
Clinical Study on Pediatric Leukemia
• Cancer involves both genetic modification (variants) and cellular malfunction (gene expression).
• Identifying eQTLs helps us understand molecular mechanisms in cancer and provides biological insight.
• Clinical study of acute lymphoblastic leukemia (ALL)
  – The most common malignancy in children, accounting for nearly one third of all pediatric cancers.
  – A few cases are associated with inherited genetic syndromes (e.g., Down syndrome, Bloom syndrome, Fanconi anemia), but the cause of most cases remains unknown.
• Data
  – 29 patients.
  – 100,000 SNPs genotyped (Affymetrix Human Mapping 100K).
  – 50,000 gene expression levels profiled (Affymetrix HG-U133 Plus 2.0).
Challenges in Finding eQTLs
• Compare the distribution of each variant with the levels of each expression measurement
  – Computational
    • Testing all pairs of variants vs. expressions is costly
    • Expression levels are usually discretized (Pensa et al., BioKDD, 2004)
  – Multiple-testing considerations
• Understanding
  – Too many associations to test via laboratory science
  – Computational methods of biological discovery
  – Want to summarize the main informational (biological) pathways
• Answer: use transcriptional information
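The scale of the all-pairs problem follows directly from the study's array sizes. A back-of-the-envelope sketch, assuming a Bonferroni correction at a family-wise α of 0.05 (our choice for illustration) and a hypothetical 100 SNPs tested per gene in a cis-only scan:

```python
n_snps = 100_000  # Affymetrix Human Mapping 100K
n_expr = 50_000   # HG-U133 Plus 2.0 expression measurements
alpha = 0.05      # assumed family-wise error rate

all_pairs = n_snps * n_expr
print(f"all pairs: {all_pairs:,} tests, Bonferroni cutoff {alpha / all_pairs:.1e}")

snps_near_each_gene = 100  # hypothetical cis-window size, illustration only
cis_pairs = snps_near_each_gene * n_expr
print(f"cis only:  {cis_pairs:,} tests, Bonferroni cutoff {alpha / cis_pairs:.1e}")
```

With only 29 patients, a per-test cutoff on the order of 10⁻¹¹ is practically unreachable, which is one motivation for summarizing associations through information flow rather than testing every pair.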
Transcriptional Information Channel
[Figure: transcription channel from SNP X to expression Y]
• Information theory measures the uncertainty of a variable X by its entropy, H(X).
• SNPs are modeled as binomial variables.
• Expressions are modeled as log-normal variables.
• Mutual information (MI) quantifies the information flow from X to Y.
• A higher MI is achieved by a larger σ² and a smaller σ_k², i.e., when expression level Y is more likely to be modulated by SNP X.
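The role of σ² and σ_k² can be sketched under an assumption of our own: log expression within genotype group k (k = 0, 1, 2) is normal with variance σ_k², and σ² is the overall variance of log expression. MI is unchanged by the invertible map Y ↦ log Y, and a normal distribution has the largest differential entropy for a given variance, so:

```latex
\begin{aligned}
I(X;Y) &= I(X;\log Y) = H(\log Y) - H(\log Y \mid X),\\
H(\log Y \mid X) &= \sum_{k} p(X=k)\,\tfrac{1}{2}\log\!\left(2\pi e\,\sigma_k^2\right),\\
H(\log Y) &\le \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^2\right),\\
I(X;Y) &\le \tfrac{1}{2}\sum_{k} p(X=k)\,\log\frac{\sigma^2}{\sigma_k^2}.
\end{aligned}
```

The bound is tight when the marginal distribution of log Y is close to normal. This is a hedged reconstruction of the dependence described above, not necessarily the exact expression used in the talk.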
• Transcript Y is modulated by SNP X: p(Y | X) ≠ p(Y), so I(X; Y) > 0.
• Transcript Y is independent of SNP X: p(Y | X) = p(Y), so I(X; Y) = 0.
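The two cases can be contrasted numerically. This minimal sketch uses the plug-in estimate ½ Σ_k p_k log(σ²/σ_k²) on log expression, which rests on a Gaussian approximation we assume here; it is not necessarily the authors' estimator. Genotypes are simulated as minor-allele counts from a binomial with two trials, and expressions as log-normal; the allele frequency 0.4, effect size 0.8, and noise level 0.3 are arbitrary.

```python
import numpy as np


def gaussian_mi(genotypes, expression):
    """Approximate I(X; Y) as 0.5 * sum_k p_k * log(var_all / var_k) on log expression.

    Assumes every genotype group has at least two samples with nonzero spread.
    """
    g = np.asarray(genotypes)
    z = np.log(np.asarray(expression, dtype=float))  # MI is invariant to log
    var_all = z.var()                                # population variance (ddof=0)
    mi = 0.0
    for k in np.unique(g):
        zk = z[g == k]
        mi += 0.5 * (len(zk) / len(z)) * np.log(var_all / zk.var())
    return mi


rng = np.random.default_rng(0)
n = 300
x = rng.binomial(2, 0.4, size=n)             # genotype = minor-allele count
noise = rng.normal(0.0, 0.3, size=n)
y_modulated = np.exp(1.0 + 0.8 * x + noise)  # log-mean shifts with genotype
y_independent = np.exp(1.0 + noise)          # same noise, no genotype effect

mi_mod = gaussian_mi(x, y_modulated)
mi_ind = gaussian_mi(x, y_independent)
print(f"modulated: {mi_mod:.3f}  independent: {mi_ind:.3f}")
```

With population (ddof = 0) variances, the law of total variance and the concavity of the logarithm make this estimate non-negative; for the independent transcript it is close to, but not exactly, zero because of sampling noise.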
Transcriptional Information Map
[Figure: map of links between SNPs X1 to X9 and transcripts Y1 to Y9]
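A Transcriptional Information Map can be sketched as the set of SNP-transcript pairs whose MI exceeds a cutoff. The MI values and the threshold of 0.1 below are made up for illustration; the slides do not state how the cutoff was chosen.

```python
import numpy as np


def build_tim(mi, snp_ids, gene_ids, threshold):
    """Return (snp, gene, mi) triples for every pair whose MI exceeds the threshold.

    mi is a (len(snp_ids), len(gene_ids)) array of mutual information values.
    """
    rows, cols = np.nonzero(mi > threshold)  # row-major order of the matrix
    return [(snp_ids[i], gene_ids[j], float(mi[i, j])) for i, j in zip(rows, cols)]


# Toy MI matrix: rows are SNPs, columns are transcripts (hypothetical values)
mi = np.array([[0.62, 0.01, 0.03],
               [0.02, 0.45, 0.00],
               [0.01, 0.02, 0.04]])
tim_edges = build_tim(mi, ["X1", "X2", "X3"], ["Y1", "Y2", "Y3"], threshold=0.1)
print(tim_edges)
```

In practice the threshold could be set from a permutation null (shuffling samples to break SNP-expression links), but that choice is an assumption, not something the slides specify.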
ALL Transcriptional Information Map of Chr21
Cluster Genes and SNPs into Networks
[Figure: SNPs X1 to X9 and transcripts Y1 to Y9 grouped into clusters]
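One simple way to group linked SNPs and transcripts, assumed here rather than taken from the slides, is to take the connected components of the TIM graph, found below with a union-find structure. The edge list is hypothetical.

```python
from collections import defaultdict


def cluster_tim(edges):
    """Group SNPs and transcripts into connected components of the TIM graph."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for snp, gene in edges:
        parent[find(snp)] = find(gene)     # union the two components

    clusters = defaultdict(set)
    for v in parent:
        clusters[find(v)].add(v)
    # Largest clusters first; members sorted for readability
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


edges = [("X1", "Y1"), ("X1", "Y2"), ("X3", "Y2"), ("X4", "Y9"),
         ("X8", "Y9"), ("X4", "Y1"), ("X5", "Y5"), ("X7", "Y7")]
clusters = cluster_tim(edges)
print(clusters)
```

Each resulting cluster is small enough for the per-cluster network inference described next.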
[Figure: one cluster containing SNPs X1, X3, X4, X8 and transcripts Y1, Y2, Y9]
• We can further infer the optimal modulation patterns using Bayesian networks.
Bayesian Networks
• Bayesian networks are directed acyclic graphs:
  – Nodes correspond to random variables.
  – Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
  – p(X | A, B): X depends on (A, B).
  – p(Z | X, Y): given X and Y, Z is independent of (A, B).
[Figure: example network over nodes A, B, X, Y, Z]
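The conditional-independence statement above can be checked numerically. This sketch assumes the example network has arcs A → X ← B and X → Z ← Y (read from the bullets, since the figure is not available) and uses made-up probability tables for binary nodes, so the joint factorizes as p(A) p(B) p(Y) p(X | A, B) p(Z | X, Y).

```python
from itertools import product

# Hypothetical probability tables for binary nodes (values 0/1)
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.7, 1: 0.3}
p_y = {0: 0.5, 1: 0.5}
p_x1_given_ab = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}   # P(X=1 | A, B)
p_z1_given_xy = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.95}  # P(Z=1 | X, Y)


def bern(p1, v):
    """Probability of value v for a binary variable with P(1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def joint(a, b, x, y, z):
    """Joint probability via the network factorization."""
    return (p_a[a] * p_b[b] * p_y[y]
            * bern(p_x1_given_ab[(a, b)], x) * bern(p_z1_given_xy[(x, y)], z))


def cond_z1(x, y, a=None, b=None):
    """P(Z=1 | X=x, Y=y [, A=a, B=b]) by brute-force marginalization."""
    num = den = 0.0
    for aa, bb in product((0, 1), repeat=2):
        if (a is not None and aa != a) or (b is not None and bb != b):
            continue
        for z in (0, 1):
            p = joint(aa, bb, x, y, z)
            den += p
            if z == 1:
                num += p
    return num / den


total = sum(joint(*v) for v in product((0, 1), repeat=5))
print(total, cond_z1(1, 0), cond_z1(1, 0, a=0, b=1))
```

Conditioning additionally on A and B leaves P(Z = 1 | X, Y) unchanged, which is exactly the independence the arcs encode.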
Infer Bayesian Networks in Individual Clusters
[Figure: Bayesian network for one cluster, including transcripts Y1, Y2, Y9]
• Step 1: Use the TIM as the initial network.
• Step 2: Bayesian network learning infers SNP-SNP connections.
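Step 2 can be sketched as a greedy structure search that keeps the TIM arcs (SNP → transcript) fixed and adds SNP → SNP arcs while a score improves. The BIC score, the discretized data, and hill climbing are our assumptions; the slides do not say which score or search the authors used. In the toy data, X2 is an exact copy of X1 (perfect linkage disequilibrium) and X3 is balanced against both, so only an arc between X1 and X2 should be added.

```python
import math
from collections import Counter
from itertools import permutations


def local_bic(data, node, parents):
    """BIC score of one discrete node given a list of parent nodes."""
    n = len(data[node])
    r = len(set(data[node]))                            # states of the node
    q = math.prod(len(set(data[p])) for p in parents)   # parent configurations
    joint, parent_counts = Counter(), Counter()
    for i in range(n):
        config = tuple(data[p][i] for p in parents)
        joint[(config, data[node][i])] += 1
        parent_counts[config] += 1
    loglik = sum(c * math.log(c / parent_counts[config])
                 for (config, _), c in joint.items())
    return loglik - 0.5 * math.log(n) * (r - 1) * q


def creates_cycle(parents, src, dst):
    """Adding src -> dst closes a cycle iff dst is already an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def learn_snp_edges(data, snps, tim_edges):
    """Start from the TIM arcs, then repeatedly add the best-scoring SNP -> SNP arc."""
    parents = {v: set() for v in data}
    for snp, gene in tim_edges:
        parents[gene].add(snp)
    added = []
    while True:
        best, best_gain = None, 0.0
        for a, b in permutations(snps, 2):
            if a in parents[b] or creates_cycle(parents, a, b):
                continue
            old = sorted(parents[b])
            gain = local_bic(data, b, old + [a]) - local_bic(data, b, old)
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:
            return added, parents
        parents[best[1]].add(best[0])
        added.append(best)


n = 270                                 # multiple of 9, so X3 is exactly balanced
x1 = [i % 3 for i in range(n)]
x2 = list(x1)                           # X2 in perfect LD with X1
x3 = [(i // 3) % 3 for i in range(n)]   # every (X1, X3) pair equally frequent
y1 = [1 if g > 0 else 0 for g in x1]    # discretized transcript driven by X1
data = {"X1": x1, "X2": x2, "X3": x3, "Y1": y1}
added, parents = learn_snp_edges(data, ["X1", "X2", "X3"], [("X1", "Y1")])
print(added)
```

Because only SNP → SNP arcs are added and the TIM arcs point from SNPs to transcripts, a cycle can only form among SNPs; the check walks parent sets upward from the candidate source.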
A Bayesian Network Inferred from Chr21 TIM
Information Theoretic Network Analysis
• Find hubs, motifs, guilds, etc.
  – Abstract edges
  – Global patterns -> local patterns
  – Reveal emergent properties
  – Information-theoretic approach using data compression
• Alterovitz G, Ramoni MF, “Discovering biological guilds through topological abstraction,” AMIA Annu Symp Proc, pp. 1-5, 2006.
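The compression-based method cited above is not reproduced here. As a much simpler stand-in for the "find hubs" step, hubs can be flagged by node degree; the edge list and the min_degree cutoff below are hypothetical.

```python
from collections import Counter


def find_hubs(edges, min_degree):
    """Return nodes with at least min_degree incident edges, highest degree first."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hubs = [node for node, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda node: (-degree[node], node))


edges = [("X1", "Y1"), ("X3", "Y1"), ("X4", "Y1"), ("X8", "Y1"),
         ("X1", "Y2"), ("X4", "Y9")]
print(find_hubs(edges, min_degree=3))
print(find_hubs(edges, min_degree=2))
```

A transcript that turns up as a hub, such as a gene linked to many SNPs, is the kind of candidate the following slides examine for trans-eQTLs.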
Identified Fundamental Components
Reference: Alterovitz and Ramoni, AMIA Annu Symp Proc, pp. 1-5, 2006.
Identification of Cis- and Trans-eQTLs
• RIPK4, 21q22.3
  – Related to Down syndrome
  – RIPK4 has 5 (trans) SNPs in q11.2 (shown in blue in the figure) affecting its expression.
[Figure: RIPK4 and its linked SNPs]
Identification of Cis- and Trans-eQTLs
• CYYR1, 21q21.1
  – Recently discovered.
  – Encodes a cysteine- and tyrosine-rich protein.
  – A recent study found a correlation with neuroendocrine tumors.
  – The TIM shows CYYR1 modulated by SNPs across the q arm of chromosome 21.
  – DSCAM is related to Down syndrome.
  – Does a DSCAM-CYYR1 interaction lead to ALL?
[Figure: CYYR1 and DSCAM in the TIM]
Complete TIM Algorithm
[Flowchart of the complete TIM algorithm]
• Input: genetic variants and transcripts.
• Step 1: Compute transcriptional information.
• Step 2: Group linked SNPs and transcripts into clusters (Cluster 1, ..., Cluster N).
• Step 3: Infer a network in each individual cluster.
• Step 4: Network topology analysis and summary.
Transcriptional Information Maps
• Make large multimodal genetic datasets amenable to transcriptional analysis
• Identify
  – modulation patterns between genetic variants and transcripts
  – cis- and trans-eQTLs
• Analysis of pediatric ALL helps generate biological hypotheses regarding a connection to Down syndrome
Questions?
Thanks to
Prof. Marco F. Ramoni, Dr. Hsun-Hsien Chang, Dr. Gil Alterovitz, Children’s Hospital Informatics Program, Brigham and Women’s Hospital