Data Visualization in Molecular Biology Alexander Lex July 29, 2013.
![Page 1: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/1.jpg)
Data Visualization in Molecular Biology
Alexander Lex
July 29, 2013
![Page 2: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/2.jpg)
2
research requires understanding data
but there is so much of it…
![Page 3: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/3.jpg)
3
What is important? Where are the connections?
![Page 4: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/4.jpg)
4
methylation levels
mRNA expression
copy number status
mutation status
microRNA expression
clinical parameters
pathways
protein expression
sequencing data
publications
![Page 5: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/5.jpg)
5
Data Visualization…
… makes the data accessible
… combines strengths of humans and computers
… enables insight
… communicates
![Page 6: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/6.jpg)
6
THREE TOPICS
Teaser Picture
![Page 7: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/7.jpg)
7
Cancer Subtype Analysis
![Page 8: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/8.jpg)
8
Pathways & Experimental Data
![Page 9: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/9.jpg)
9
Managing Pathways & Cross-Pathway Analysis
![Page 10: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/10.jpg)
10
Who am I?
PostDoc @ Harvard, Hanspeter Pfister’s Group
PhD from TU Graz, Austria
Co-leader of Caleydo Project
![Page 11: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/11.jpg)
11
What is Caleydo?
Software for analyzing molecular biology data
Software for doing research in visualization
developed in an academic setting
platform for trying out radically new visualization ideas
Quest for compromise between academic prototyping and ready-to-use software
![Page 12: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/12.jpg)
12
What is Caleydo?
Open source platform for developing visualization and data analysis techniques
easily extensible due to plug-in architecture
you can create your own views
you can plug in your own algorithms
![Page 13: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/13.jpg)
13
The Team
Marc Streit Johannes Kepler University Linz, AT
Christian Partl Graz University of Technology, AT
Samuel Gratzl Johannes Kepler University Linz, AT
Nils Gehlenborg Harvard Medical School, Boston, USA
Dieter Schmalstieg Graz University of Technology, AT
Hanspeter Pfister Harvard University, Cambridge, USA
![Page 14: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/14.jpg)
14
CANCER SUBTYPE VISUALIZATION
Caleydo StratomeX
![Page 15: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/15.jpg)
15
Cancer Subtypes
Cancer types are not homogeneous
They are divided into subtypes
different histology
different molecular alterations
Subtypes have serious implications
different treatment for subtypes
prognosis varies between subtypes
![Page 16: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/16.jpg)
16
Cancer Subtype Analysis
Done using many different types of data, for large numbers of patients.
![Page 17: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/17.jpg)
18
Goal: Support tumor subtype characterization through integrative visual analysis of cancer genomics data sets.
![Page 18: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/18.jpg)
19
Stratification
Patients
Tabular Data
Candidate Subtypes
Genes, Proteins, etc.
![Page 19: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/19.jpg)
20
Stratification of a Single Dataset
Cluster A1
Cluster A2
Cluster A3
Subtypes are identified by stratifying datasets, e.g.,
based on an expression pattern
a mutation status
a copy number alteration
a combination of these
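The two simplest cases above can be sketched in a few lines of Python. This is an illustrative sketch only, not Caleydo's implementation: the function names, the patient IDs, and the choice of a tiny k-means with fixed starting centroids are all assumptions made for the example.

```python
from collections import defaultdict


def stratify_by_category(status_by_patient):
    """Group patient IDs by a categorical value such as mutation status."""
    groups = defaultdict(list)
    for patient, status in status_by_patient.items():
        groups[status].append(patient)
    return {status: sorted(patients) for status, patients in groups.items()}


def stratify_by_expression(profiles, centroids, iterations=10):
    """Tiny k-means (assumed here for illustration): assign each patient to
    the nearest centroid, then move each centroid to the mean of its members."""

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(c) for c in centroids]
    assignment = {}
    for _ in range(iterations):
        assignment = {
            patient: min(range(len(centroids)),
                         key=lambda k: dist2(values, centroids[k]))
            for patient, values in profiles.items()
        }
        for k in range(len(centroids)):
            members = [profiles[p] for p, a in assignment.items() if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]

    clusters = defaultdict(list)
    for patient, k in assignment.items():
        clusters[k].append(patient)
    return {k: sorted(patients) for k, patients in clusters.items()}


# Hypothetical toy data: four patients, two genes.
expression = {"P1": [5.0, 5.1], "P2": [4.8, 5.3], "P3": [0.2, 0.1], "P4": [0.0, 0.4]}
mutation = {"P1": "mutated", "P2": "wild type", "P3": "mutated", "P4": "wild type"}

print(stratify_by_expression(expression, centroids=[[5, 5], [0, 0]]))
print(stratify_by_category(mutation))
```

Each result is one stratification: a mapping from a cluster label to the patients in that candidate subtype.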
![Page 20: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/20.jpg)
21
Patients
Genes
Candidate Subtype / Heat Map
Header / Summary of whole Stratification
![Page 21: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/21.jpg)
22
Stratification of Multiple Datasets
Cluster A1
Cluster A2
Cluster A3
B1
B2
Tabular, e.g., mRNA
Categorical, e.g., mutation status
![Page 22: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/22.jpg)
23
Stratification of Multiple Datasets
Cluster A1
Cluster A2
Cluster A3
B1
B2
Tabular, e.g., mRNA
Categorical, e.g., mutation status
![Page 23: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/23.jpg)
24
Clustering of mRNA Data
Stratification on Copy Number Status
Many shared Patients
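How many patients two clusters share can be computed with plain set intersections. The sketch below assumes each stratification is a mapping from cluster name to patient IDs; the function name and the toy data are hypothetical.

```python
def shared_patients(stratification_a, stratification_b):
    """Count the patients shared by every pair of clusters taken from
    two stratifications of the same patient cohort."""
    overlap = {}
    for name_a, patients_a in stratification_a.items():
        for name_b, patients_b in stratification_b.items():
            overlap[(name_a, name_b)] = len(set(patients_a) & set(patients_b))
    return overlap


# Hypothetical example: an mRNA clustering and a copy number stratification.
mrna = {"A1": ["P1", "P2", "P3"], "A2": ["P4", "P5"]}
copy_number = {"B1": ["P1", "P2", "P4"], "B2": ["P3", "P5"]}

for pair, count in shared_patients(mrna, copy_number).items():
    print(pair, count)
```

A large count for a pair of clusters is what "many shared patients" refers to.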
![Page 24: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/24.jpg)
25
Stratification of Multiple Datasets
Cluster A1
Cluster A2
Cluster A3
B1
B2
Tabular, e.g., mRNA
Categorical, e.g., mutation status
![Page 25: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/25.jpg)
26
Stratification of Multiple Datasets
Cluster A1
Cluster A2
Cluster A3
B1
B2
Dependent Data, e.g., clinical data
Dep. C1
Dep. C2
Tabular, e.g., mRNA
Categorical, e.g., mutation status
![Page 26: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/26.jpg)
27
Survival data in Kaplan-Meier plots
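For readers unfamiliar with the plot, the curve drawn for each patient group is the standard Kaplan-Meier product-limit estimate. The following is a minimal sketch of that estimator; the function name and the toy data are assumptions, and it says nothing about how StratomeX computes or draws its plots.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] for every time at which at
    least one event (e.g., death) was observed.

    durations: follow-up time per patient
    observed:  True if the event occurred, False if the patient was censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical group of five patients; two are censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for time, probability in curve:
    print(time, round(probability, 3))
```

Computing one curve per cluster of a stratification gives the set of curves that can be compared within a single plot.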
![Page 27: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/27.jpg)
28
Detail View
![Page 28: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/28.jpg)
29
Dependent Pathway
![Page 29: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/29.jpg)
30
![Page 30: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/30.jpg)
31
Stratification based on clinical variable (gender)
![Page 31: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/31.jpg)
32
How to Choose Stratifications?
~ 15 clusterings per matrix
~ 15,000 stratifications for copy number & mutations
~ 500 pathways
~ 20 clinical variables
Calculating scores for matches
Ranking the results
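One way to picture "calculating scores for matches" and "ranking the results" is sketched below. The slides do not name a scoring function, so the Jaccard index used here is an assumption, as are the function names and the toy data.

```python
def jaccard(a, b):
    """Jaccard index of two patient sets: shared patients / all patients."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_stratifications(query_patients, candidates):
    """Score each candidate stratification by its best-matching cluster and
    return (name, score) pairs, best match first."""
    scored = []
    for name, clusters in candidates.items():
        best = max((jaccard(query_patients, c) for c in clusters.values()),
                   default=0.0)
        scored.append((name, best))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query: one cluster of an mRNA clustering.
query = {"P1", "P2", "P3"}
candidates = {
    "mutation, gene Y": {"mutated": ["P1", "P5"],
                         "wild type": ["P2", "P3", "P4", "P6"]},
    "copy number, gene X": {"amplified": ["P1", "P2", "P3", "P4"],
                            "normal": ["P5", "P6"]},
}
for name, score in rank_stratifications(query, candidates):
    print(name, score)
```

With thousands of candidates, only the top of such a ranked list needs to be shown to the analyst.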
![Page 32: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/32.jpg)
33
Query column
Result column
Ranked Stratifications
Considered Datasets
![Page 33: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/33.jpg)
34
Algorithms for finding…
… matching stratification
… matching subtype
… mutual exclusivity
… relevant pathway
… stratification with significant effect on survival
… high/low structural variation
![Page 35: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/35.jpg)
36
PATHWAYS & EXPERIMENTAL DATA
Caleydo enRoute
Teaser Picture
![Page 36: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/36.jpg)
37
Experimental Data and Pathways
Pathways cannot account for variation found in real-world data
Branches can be (in)activated due to
mutation,
changed gene expression,
modulation due to drug treatment,
etc.
![Page 37: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/37.jpg)
38
Why use Visualization?
Efficient communication of information
A  -3.4
B   2.8
C   3.1
D  -3
E   0.5
F   0.3
[Figure: graph with nodes A to F.]
![Page 38: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/38.jpg)
39
Experimental Data and Pathways
[Lindroos2002]
[KEGG]
![Page 39: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/39.jpg)
40
Five Requirements
Ideal visualization technique addresses all
Talking about 3 today
![Page 40: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/40.jpg)
41
R I: Data Scale
Large number of experiments
Large datasets have more than 500 experiments
Multiple groups/conditions
![Page 41: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/41.jpg)
42
R II: Data Heterogeneity
Different types of data, e.g.,
mRNA expression: numerical
mutation status: categorical
copy number variation: ordered categorical
metabolite concentration: numerical
Require different visualization techniques
![Page 42: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/42.jpg)
43
R V: Supporting Multiple Tasks
Two central tasks:
Explore topology of pathway
Explore the attributes of the nodes (experimental data)
Need to support both!
[Figure: pathway graph with nodes A to F.]
![Page 43: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/43.jpg)
44
Concept
Pathway View
[Figure: pathway graph with nodes A to F, shown twice.]
enRoute View
[Figure: nodes B, C, F, A, D, E with data columns for Group 1 / Dataset 1, Group 2 / Dataset 1, and Group 1 / Dataset 2.]
![Page 44: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/44.jpg)
45
Pathway View
On-Node Mapping
Path highlighting with Bubble Sets [Collins2009]
IGF-1
low high
![Page 45: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/45.jpg)
46
enRoute View
Path Representation
[Figure: path nodes B, C, F, A, D, E next to data columns for Group 1 / Dataset 1, Group 2 / Dataset 1, and Group 1 / Dataset 2.]
![Page 46: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/46.jpg)
47
enRoute View
Experimental Data Representation
[Figure: path nodes B, C, F, A, D, E next to data columns for Group 1 / Dataset 1, Group 2 / Dataset 1, and Group 1 / Dataset 2.]
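The layout idea, one row per node on the selected path and one column per group and dataset, can be sketched as a small data structure. This is a simplified illustration: the function name, the column labels, and the choice to summarize each cell by its mean are assumptions, not a description of how enRoute stores or draws its data.

```python
from statistics import mean


def enroute_rows(path, datasets):
    """Build one row per node on the path. Each row holds, per
    (group, dataset) column, the mean of that node's experiment values,
    or None when the column has no data for the node."""
    rows = []
    for node in path:
        row = {"node": node}
        for column, values_by_node in datasets.items():
            values = values_by_node.get(node, [])
            row[column] = mean(values) if values else None
        rows.append(row)
    return rows


# Hypothetical path and experiment values.
path = ["B", "C", "F"]
datasets = {
    "Group 1 / Dataset 1": {"B": [1.0, 3.0], "C": [2.0], "F": [0.5, 0.5]},
    "Group 2 / Dataset 1": {"B": [0.5, 1.5], "F": [4.0]},
}
for row in enroute_rows(path, datasets):
    print(row)
```

Keeping the rows in path order preserves the topology of the selected path while the columns carry the experimental data.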
![Page 47: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/47.jpg)
48
Experimental Data Representation
Gene Expression Data (Numerical)
Copy Number Data (Ordered Categorical)
Mutation Data
![Page 48: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/48.jpg)
49
enRoute View – Putting All Together
![Page 49: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/49.jpg)
50
CCLE Cell lines & Cancer Drugs
Analysis by Anne Mai Wassermann
![Page 50: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/50.jpg)
51
MANAGING PATHWAYS & CROSS-PATHWAY ANALYSIS
Collaboration with AM Wassermann, M Borowsky, M Glick @NIBR
Teaser Picture
![Page 51: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/51.jpg)
52
Pathways
Partitioning in pathways is artificial
Purpose: reduce complexity
“Relevant” subset of nodes and edges
Makes it hard to
understand cross-talk
identify role of nodes in other pathways
![Page 52: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/52.jpg)
53
[Klukas & Schreiber 2006]
![Page 53: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/53.jpg)
54
Solution: Contextual Subsets
![Page 54: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/54.jpg)
55
Solution: Contextual Subsets
![Page 55: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/55.jpg)
56
![Page 56: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/56.jpg)
57
Levels of Detail
Experimental data highlights
Thumbnail showing paths
![Page 57: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/57.jpg)
58
How to Select Pathways?
Search Pathway
Find pathways that contain focus node
Find pathway that is similar to another one
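The last two selection strategies can be sketched by treating each pathway as a set of nodes. The slides do not say how similarity is measured, so the Jaccard index below is an assumption; the function names and pathway contents are hypothetical.

```python
def pathways_containing(node, pathways):
    """Names of all pathways whose node set contains the focus node."""
    return sorted(name for name, nodes in pathways.items() if node in nodes)


def rank_by_similarity(reference, pathways):
    """Rank all other pathways by the Jaccard index of their node set with
    the reference pathway's node set, most similar first."""
    ref_nodes = set(pathways[reference])
    scores = []
    for name, nodes in pathways.items():
        if name == reference:
            continue
        nodes = set(nodes)
        union = ref_nodes | nodes
        score = len(ref_nodes & nodes) / len(union) if union else 0.0
        scores.append((name, score))
    # Sort by descending score; break ties alphabetically by name.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical pathways over the nodes A to F.
pathways = {
    "Pathway 1": {"A", "B", "C", "D"},
    "Pathway 2": {"C", "D", "E"},
    "Pathway 3": {"E", "F"},
}
print(pathways_containing("C", pathways))
print(rank_by_similarity("Pathway 1", pathways))
```

Both functions return lists that can feed a ranked pathway list directly.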
![Page 58: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/58.jpg)
59
Ranked pathway list
Datasets
Selected Path
![Page 59: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/59.jpg)
60
Integrating enRoute
![Page 61: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/61.jpg)
62
More Information
http://caleydo.org
Software, Help, Project Information, Publications, Videos
![Page 62: Data Visualization in Molecular Biology](https://reader035.fdocuments.net/reader035/viewer/2022062400/5681692f550346895de076a8/html5/thumbnails/62.jpg)
63
Data Visualization In Molecular Biology
Alexander Lex, Harvard
[email protected]
alexander-lex.com
?
Credits:
Marc Streit, Nils Gehlenborg, Hanspeter Pfister, Anne Mai Wassermann, Mark Borowsky, Christian Partl, Denis Kalkofen, Samuel Gratzl, Dieter Schmalstieg