Visual Analytics in Omics: why, what, how?
Prof Jan Aerts
STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium
Data Visualization Lab
[email protected]@datavislab.org
creativecommons.org/licenses/by-nc/3.0/
• What problem are we trying to solve?
• What is Visual Analytics and how can it help?
• How do we actually do this?
• Some examples
• Challenges
A. So what’s the problem?
hypothesis-driven -> data-driven
Scientific Research Paradigms (Jim Gray, Microsoft)
I have a hypothesis -> need to generate data to (dis)prove it.
I have data -> need to find hypotheses that I can test.
1st: thousands of years ago, empirical
2nd: hundreds of years ago, theoretical
3rd: last few decades, computational
4th: today, data exploration
What does this mean?
• immense re-use of existing datasets
• much of initial analysis is exploratory in nature => what’s my hypothesis?
• biologically interesting signals may be too poorly understood to be analyzed in automated fashion
• visualization is very effective in facilitating human reasoning about complex data
• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
Opening the black box
[Figure: analysis pipeline; labels: input, filter 1, filter 2, output A, filter 3, output B, output]
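One way to read "opening the black box" is to keep every intermediate result of a filter chain available for inspection, instead of returning only the final output. The sketch below is a minimal illustration under that assumption; the record fields (`quality`, `depth`, `in_repeat`), the thresholds, and the function name `run_pipeline` are hypothetical and do not come from the slides.

```python
def run_pipeline(records, filters):
    """Apply filters in order and keep every intermediate result.

    Returns a list of (stage_name, records_remaining) pairs, starting
    with the unfiltered input, so each stage can be inspected or plotted.
    """
    current = list(records)
    stages = [("input", current)]
    for name, keep in filters:
        current = [r for r in current if keep(r)]
        stages.append((name, current))
    return stages


# Hypothetical variant records and filter thresholds, for illustration only.
variants = [
    {"id": "v1", "quality": 35, "depth": 40, "in_repeat": False},
    {"id": "v2", "quality": 12, "depth": 55, "in_repeat": False},
    {"id": "v3", "quality": 28, "depth": 6, "in_repeat": False},
    {"id": "v4", "quality": 31, "depth": 22, "in_repeat": True},
    {"id": "v5", "quality": 40, "depth": 18, "in_repeat": False},
]

filters = [
    ("filter 1: quality >= 20", lambda r: r["quality"] >= 20),
    ("filter 2: depth >= 10", lambda r: r["depth"] >= 10),
    ("filter 3: outside repeats", lambda r: not r["in_repeat"]),
]

stages = run_pipeline(variants, filters)
for name, remaining in stages:
    print(f"{name}: {len(remaining)} records remain")
```

Because each stage is retained, a visualization can show what every filter removed, which lets the domain expert question individual thresholds rather than trust the end result blindly.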
What’s my hypothesis?
Martin Krzywinski
B. What is Visual Analytics and how can it help?
What is visualization?
T. Munzner
cognition <=> perception
cognitive task => perceptive task
Why do we visualize data?
• record information
  • blueprints, photographs, seismographs, ...
• analyze data to support reasoning
  • develop & assess hypotheses
  • discover errors in data
  • expand memory
  • find patterns (see Snow’s cholera map)
• communicate information
  • share & persuade
  • collaborate & revise
pictorial superiority effect
[Figure labels: “information”, “informa”, “i”; 65%, 1%; 72hr]
Stevens’ psychophysical law
= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
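Stevens' law is usually written as a power function, perceived = k * intensity ** a, where the exponent a depends on the type of stimulus. The sketch below evaluates that formula for a doubling of the stimulus. The exponent values are commonly cited approximations and should be read as assumptions, not as numbers taken from these slides.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: perceived = k * intensity ** exponent."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return k * intensity ** exponent


# Approximate, commonly cited exponents (assumed values for illustration).
EXPONENTS = {
    "length": 1.0,           # perceived roughly linearly
    "area": 0.7,             # underestimated (compressed)
    "brightness": 0.5,       # strongly compressed
    "electric shock": 3.5,   # overestimated (expanded)
}

for stimulus, exponent in EXPONENTS.items():
    # How much larger does a doubled stimulus appear?
    ratio = perceived_magnitude(2.0, exponent) / perceived_magnitude(1.0, exponent)
    print(f"{stimulus}: doubling the stimulus is perceived as x{ratio:.2f}")
```

For visualization, the practical reading is that channels with an exponent near 1 (such as length) convey quantities more faithfully than channels with compressed responses (such as area or brightness).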
Accuracy of quantitative perceptual tasks
Mackinlay
what/where (qualitative)
how much (quantitative)
“power of the plane”
Pre-attentive vision
= ability of low-level human visual system to rapidly identify certain basic visual properties
• some features “pop out”
• used for:
• target detection
• boundary detection
• counting/estimation
• ...
• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure
Limitations of pre-attentive vision
1. Combining pre-attentive features does not always work => the viewer must resort to “serial search” (true for most channel pairs and all channel triplets), e.g. “is there a red square in this picture?”
2. Speed depends on the channel used (choose one that works well for categorical data; see the “accuracy” slides)
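The first limitation can be illustrated with a toy model: a target "pops out" only if its value on at least one single channel is unique in the scene. If the target is defined by a conjunction of channels (red AND square), no single channel isolates it and the viewer falls back to serial search. The function and scene layout below are hypothetical simplifications, not a model of the human visual system.

```python
from collections import Counter


def popout_channels(scene, target):
    """Return the channels on which the target's value is unique in the scene.

    An empty result means no single channel isolates the target, so
    finding it would require a conjunction (serial) search.
    The target itself must be one of the items in the scene.
    """
    unique = []
    for channel, value in target.items():
        counts = Counter(item[channel] for item in scene)
        if counts[value] == 1:
            unique.append(channel)
    return unique


def item(color, shape):
    return {"color": color, "shape": shape}


target = item("red", "square")

# Target differs from all distractors in colour alone: it pops out.
feature_scene = [target] + [item("blue", "square") for _ in range(8)]

# Distractors share either the colour or the shape with the target.
conjunction_scene = (
    [target]
    + [item("red", "circle") for _ in range(4)]
    + [item("blue", "square") for _ in range(4)]
)

print("feature scene:", popout_channels(feature_scene, target))
print("conjunction scene:", popout_channels(conjunction_scene, target))
```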
Gestalt laws - interplay between parts and the whole
• simplicity
• proximity
• similarity
• connectedness
• good continuation
• common fate
• familiarity
• symmetry
Context affects perceptual tasks
C. How do we actually do this?
Talking to domain experts
Data visualization framework
Card sorting
Tools of the trade
Vega - https://github.com/trifacta/vega/wiki
• html + json
To use Vega
• Create the JSON file
• Create the index.html
• Run “python -m SimpleHTTPServer” (Python 2) or “python3 -m http.server” (Python 3) in the directory that holds both files
• Go to http://127.0.0.1:8000/index.html
• Get help at https://github.com/trifacta/vega/wiki
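The "create the JSON file" step can be scripted. The sketch below builds a small bar chart specification as a Python dictionary and writes it to disk. The field layout (data, scales, axes, marks, and `data.x`-style field names) follows the early Vega bar chart tutorial as best recalled, so treat it as an assumption and check it against the wiki for the Vega version in use. The function names, file name, and data values are hypothetical.

```python
import json
from pathlib import Path


def build_bar_spec(values, width=400, height=200):
    """Build a bar chart spec as a plain dict (layout assumed, see lead-in)."""
    return {
        "width": width,
        "height": height,
        "data": [{"name": "table", "values": values}],
        "scales": [
            {"name": "x", "type": "ordinal", "range": "width",
             "domain": {"data": "table", "field": "data.x"}},
            {"name": "y", "range": "height", "nice": True,
             "domain": {"data": "table", "field": "data.y"}},
        ],
        "axes": [
            {"type": "x", "scale": "x"},
            {"type": "y", "scale": "y"},
        ],
        "marks": [{
            "type": "rect",
            "from": {"data": "table"},
            "properties": {
                "enter": {
                    "x": {"scale": "x", "field": "data.x"},
                    "width": {"scale": "x", "band": True, "offset": -1},
                    "y": {"scale": "y", "field": "data.y"},
                    "y2": {"scale": "y", "value": 0},
                },
                "update": {"fill": {"value": "steelblue"}},
            },
        }],
    }


def write_spec(spec, path):
    """Serialize the spec as indented JSON and return the output path."""
    out = Path(path)
    out.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return out


# Hypothetical example data: one bar per category.
spec = build_bar_spec([
    {"x": "A", "y": 28},
    {"x": "B", "y": 55},
    {"x": "C", "y": 43},
])
```

Calling `write_spec(spec, "barchart.json")` next to index.html produces the file that the page then loads when served from the local web server.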
D. Examples
HiTSee
Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)
Aracari
Ryo Sakai
Bartlett C et al. BMC Bioinformatics (2012)
Meander
Pavlopoulos et al. Nucl Acids Res (2013)
Georgios Pavlopoulos
ParCoord
Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)
Thomas Boogaerts
Endeavour gene prioritization
Data filtering (visual parameter setting)
TrioVis
Ryo Sakai
Sakai R et al. Bioinformatics (2013)
User-guided analysis
Spark
Nielsen et al. Genome Research (2012)
clustering
chromatin modification
DNA methylation
RNA-Seq
data samples
regions of interest
Bret Victor - Ladder of abstraction
E. Challenges
Many challenges remain
• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation
• infrastructure & architecture
• fast imprecise answers with progressive refinement
• incremental re-computation
• steering computation towards data regions of interest
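"Fast imprecise answers with progressive refinement" and "incremental re-computation" can be sketched together: process the data in random order, update a running estimate one value at a time, and report the estimate with an error bar after every batch. The example below uses Welford's online update for the mean and variance; the function name, batch size, and data are hypothetical, and the standard error ignores the finite-population correction.

```python
import math
import random


def progressive_mean(values, batch_size, seed=0):
    """Yield (n_seen, mean_estimate, standard_error) after each batch.

    Values are visited in random order, so early estimates are rough
    but unbiased, and they tighten as more data is processed.
    """
    order = list(values)
    random.Random(seed).shuffle(order)

    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for n, x in enumerate(order, start=1):
        # Welford's incremental update: no pass over earlier data needed.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        if n % batch_size == 0 or n == len(order):
            if n > 1:
                stderr = math.sqrt(m2 / (n - 1) / n)
            else:
                stderr = float("inf")
            yield n, mean, stderr


# Hypothetical data: coverage depth for 1,000 regions.
depths = list(range(1000))
estimates = list(progressive_mean(depths, batch_size=100))
for n, mean, stderr in estimates:
    print(f"after {n} values: mean ~ {mean:.1f} +/- {stderr:.1f}")
```

A visual front end could redraw after every yielded estimate, so the analyst sees a first answer quickly and can stop or steer the computation before the full dataset has been processed.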
Thank you
• Georgios Pavlopoulos
• Ryo Sakai
• Thomas Boogaerts
• Data Visualization Lab (datavislab.org)
• Erik Duval
• Andrew Vande Moere