Taxonomy-Based Glyph Design
Transcript of Taxonomy-Based Glyph Design
Taxonomy-Based Glyph Design, with a Case Study on Visualizing Workflows of Biological Experiments
Eamonn Maguire, Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies and Min Chen
University of Oxford, UK
The Road Map…
Aid quicker exploration and comparison of experimental workflows for biologists performing experiments and curators who validate them.
Getting there
e.g. Chernoff faces; e.g. glyphs based on rectangles using color and orientation.
Image sources: http://kspark.kaist.ac.kr/Human%20Engineering.files/Chernoff/Chernoff%20Faces.htm; Healey, C. et al., Perceptually-Based Brush Strokes for Nonphotorealistic Visualization, 2004.
Glyph: A glyph is a small visual object composed of a number of visual channels which can be used independently as well as constructively to depict attributes of a data record.
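The definition can be made concrete with a minimal Python sketch. It assumes a glyph is simply a record of visual-channel values; the record attributes (`kind`, `is_input`, `count`) and the lookup tables are hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: each data attribute drives its own visual channel.
KIND_TO_COLOR = {"process": "blue", "material": "green"}
SIZE_LEVELS = ["small", "medium", "large"]


@dataclass(frozen=True)
class Glyph:
    """A small visual object composed of independent visual channels."""
    color: str
    shape: str
    size: str


def make_glyph(record: dict) -> Glyph:
    """Depict three attributes of one data record, one channel each."""
    color = KIND_TO_COLOR[record["kind"]]                 # categorical -> color
    shape = "circle" if record["is_input"] else "square"  # boolean -> shape
    # Ordered count -> size, clamped to the largest available level.
    size = SIZE_LEVELS[min(record["count"], len(SIZE_LEVELS) - 1)]
    return Glyph(color=color, shape=shape, size=size)


print(make_glyph({"kind": "process", "is_input": False, "count": 5}))
# Glyph(color='blue', shape='square', size='large')
```

Each channel is set independently here; reading all three together is the "constructive" use of the channels mentioned in the definition.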
But we have a problem…
We have 21,000 studies with > 500,000 individual experiments giving > 60 processes (actions on materials) and >4000 inputs/outputs to those processes.
Hand-crafting a separate glyph for each of 1000s of concepts is simply not scalable.
We need a systematic process for glyph creation based on the properties of these concepts.
Some of the ~4000 concepts: labeling; nucleic acid extraction; hybridization; feature extraction; bioassay data transformation; growth; cultured cells; saccharomyces cerevisiae scr101 pool; image acquisition; behavioral stimulus; purify; pcr amplification; normalization; lowess group normalization; extraction; scanning; feature extraction and analysis; immunoprecipitation; compound based treatment; transformation protocol; linear amplification; fresh frozen tissue; saccharomyces cerevisiae bqs252. exponential growth in ypd; pool of 10 different cell lines; pool of male dna derived from blood; prap1. exponential growth in ypd; total genomic dna from fy1679 yeast strain; saccharomyces cerevisiae y01089 (euroscarf) tpk2 mutant. exponential growth in ypd; saccharomyces cerevisiae scr101 prap1 sil. exponential growth in ypd; sjy6 yeast strain, exponential growth in ypd; saccharomyces cerevisiae bqs252. 2h after the change to ypgal medium; sjy6 yeast strain, 7 h after adding doxycycline to exponential growing cells in ypd in order to deplete spt16p; saccharomyces cerevisiae bqs252. exponential growth 14.5 h after the change to ypgal medium; sjy6 yeast strain, 5 h after adding doxycycline to exponential growing cells in ypd in order to deplete spt16p; cfmnpv viral genomic dna; cf203 insect cells infected with cfmnpv; saccharomyces cerevisiae y01261 (euroscarf) tpk1 mutant. exponential growth in ypd; saccharomyces cerevisiae bqs252. exponential growth in ypd; saccharomyces cerevisiae scr101 prap1sil. exponential growth in ypd; saccharomyces cerevisiae arg1 (rpb1 myc). exponential growth in ypd; saccharomyces cerevisiae sjy25 (spt16 myc). exponential growth in ypd; in vitro genomic dna labeled with 33p in a yeast strain exponentially growing in ypd; in vivo labeled total rna of saccharomyces cerevisiae y01261 (euroscarf) tpk1 mutant. exponential growth in ypd; aspergillus niger; stenocarpella maydis; aspergillus clavatus; fusarium verticillioides; fusarium equiseti; bipolaris sorokiniana; fusarium acuminatum; aspergillus versicolor; penicillium corylophilum; penicillium islandicum; claviceps purpurea; penicillium expansum; fusarium oxysporum; fusarium semitectum; fusarium compactum; fusarium subglutinans; fusarium avenaceum; fusarium sporotrichioides; fusarium anthophilum; eurotium amstelodami; drechslera; aspergillus multicolor; aspergillus carbonarius; eurotium chevalieri; penicillium italicum; fusarium sambucinum; fusarium crookwellense; fusarium graminearum; aspergillus parasiticus; fusarium rugulosum; fusarium globosum; fusarium decemcellulare; penicillium fellutanum; penicillium rugulosum; alternaria alternata; curvularia lunata; penicillium viridicatum; fusarium solani; penicillium funiculosum; pithomyces chartarum; ovine (dairy sheep); human; sheep; caprine (goat); bovine (cow); acyrthosiphon pisum (clone pll01); mouse; medicago sativa cv. europe; chicken; caenorhabditis elegans l1 larvae (strain n2); chenopodium quinoa; datura stramonium; nicotiana tabacum; medicago sativa cv. location; cockroach; balb/c mouse; medicago truncatula; locust; undetermined; microtus fortis; rabbit; glycine max; chenopodium amaranticolor; taraxacum officinale; panax notoginseng; phaseolus vulgaris; zinnia elegans; reference; cucumis sativus; tetragonia expansa; homo sapiens blood; homo sapiens liver; genomic dna; cdna; rna; reverse transcribed total rna; microrna; all fraction rna; short fraction rna; nucleosomal; homo sapiens; mus musculus; rattus norvegicus; saccharomyces cerevisiae; arabidopsis thaliana; drosophila melanogaster; caenorhabditis elegans; dissection; loess group normalization; loess scaled group normalization; median log normalization; reverse transcription; harvest; transformation protocol series; split; inoculate; rna extraction; stress induction; irradiate; incubate; quantification; fractionate; biological fluid collection; replicate analysis; infect; sampling; genetic modification; lowess normalization; starvation; transfect; differential expression; cell purification; bioassay data normalization; pool preparation; cell isolation; biopsy; normalization check; treatment check; normalization by scaling; cell subpopulation isolation; transfection; total bacteria rna extraction; knock down; mrna extraction; extract; dye swap merge; cell isolation, sort
Solution outline
1. Create a taxonomy: a structured, hierarchical arrangement of concepts, in which each leaf node represents one or more concepts.
2. Map the taxonomy to visual channels, so that a glyph can be created for each item based on its position in the taxonomy. Higher levels in the taxonomy command better visual channels.
3. Order the visual channels: color > shape > size > orientation > texture.
4. Provide design guidelines.
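A minimal sketch of the level-to-channel mapping, assuming each leaf is identified by its path from the root (one branch index per level) and that the channel ranking above is applied level by level. `glyph_for_path` is a hypothetical helper, not the authors' code.

```python
# Channel ranking from the slide: color > shape > size > orientation > texture.
CHANNEL_ORDER = ["color", "shape", "size", "orientation", "texture"]


def glyph_for_path(path: list[int]) -> dict[str, int]:
    """Encode a leaf's root-to-leaf path as visual-channel values.

    Level 0 (top of the taxonomy) gets the best channel, level 1 the next
    best, and so on; the branch index picks a value within that channel
    (e.g. the third color in a palette).
    """
    if len(path) > len(CHANNEL_ORDER):
        raise ValueError("taxonomy is deeper than the available visual channels")
    return dict(zip(CHANNEL_ORDER, path))


# Leaf reached via branch 2 at the root, then branch 0, then branch 1.
print(glyph_for_path([2, 0, 1]))  # {'color': 2, 'shape': 0, 'size': 1}
```

The `ValueError` branch shows why tree height matters: every extra level of the taxonomy consumes another, weaker channel.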
Creating the taxonomy
Creating the taxonomy…input format
Each scheme contains a number of sub-classifications (4 in scheme S1). For each sub-classification, a concept is assigned 1 if it can be classified under it, and 0 otherwise.
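The input table itself is not reproduced in this transcript. A minimal sketch of the format, assuming concepts are rows and one scheme's sub-classifications are columns; the concept names, class names and 0/1 values are hypothetical.

```python
# Hypothetical scheme S1 with four sub-classifications, C1..C4.
S1_CLASSES = ["C1", "C2", "C3", "C4"]

# One row per concept: 1 if the concept can be classified under that
# sub-classification, otherwise 0.
S1_MATRIX = {
    "concept_a": [1, 0, 0, 0],
    "concept_b": [0, 1, 0, 0],
    "concept_c": [0, 1, 0, 0],
    "concept_d": [0, 0, 0, 1],
    "concept_e": [0, 0, 0, 0],  # S1 cannot classify this concept
}


def classes_of(concept: str) -> list[str]:
    """Return the S1 sub-classifications a concept is assigned to."""
    return [name for name, flag in zip(S1_CLASSES, S1_MATRIX[concept]) if flag]


def members_of(cls: str) -> set[str]:
    """Return the concepts assigned to one S1 sub-classification (a column)."""
    col = S1_CLASSES.index(cls)
    return {concept for concept, row in S1_MATRIX.items() if row[col]}


print(classes_of("concept_b"))   # ['C2']
print(sorted(members_of("C2")))  # ['concept_b', 'concept_c']
```

In this toy table C3 classifies nothing, which is the kind of unused subclass the scoring step later ignores.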
Creating the taxonomy…general workings
The algorithm runs recursively: at each step it selects the best scheme S and then attempts to sub-classify the concepts in each of its classifications C.
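A minimal sketch of the recursion, assuming each scheme is given as a mapping from classification name to the set of concepts it covers (the column view of the 0/1 table), and that a scheme is used at most once along any branch. The `score` argument stands in for the scheme-selection metrics; putting concepts the chosen scheme cannot classify under an `unclassified` branch is an assumption of this sketch, not something stated in the talk.

```python
from typing import Callable

# A scheme maps each classification name to the set of concepts it covers.
Scheme = dict[str, set[str]]


def build_taxonomy(concepts: set[str], schemes: dict[str, Scheme],
                   score: Callable[[Scheme, set[str]], float]) -> dict:
    """Pick the best-scoring scheme S, then sub-classify each classification C."""
    # Only schemes that classify at least one current concept are candidates.
    candidates = {name: s for name, s in schemes.items()
                  if any(members & concepts for members in s.values())}
    if len(concepts) <= 1 or not candidates:
        return {"leaves": sorted(concepts)}

    best = max(candidates, key=lambda name: score(candidates[name], concepts))
    remaining = {name: s for name, s in schemes.items() if name != best}

    children = {}
    for cls, members in schemes[best].items():
        subset = members & concepts
        if subset:  # skip classifications that no current concept uses
            children[cls] = build_taxonomy(subset, remaining, score)

    uncovered = concepts - set().union(*schemes[best].values())
    if uncovered:
        children["unclassified"] = build_taxonomy(uncovered, remaining, score)
    return {"scheme": best, "children": children}


def covered_count(scheme: Scheme, concepts: set[str]) -> float:
    """Toy score: how many of the current concepts the scheme classifies."""
    return len(set().union(*scheme.values()) & concepts)


schemes = {
    "S1": {"C1": {"a", "b"}, "C2": {"c"}},
    "S2": {"C3": {"a"}, "C4": {"b"}},
}
print(build_taxonomy({"a", "b", "c"}, schemes, covered_count))
```

Here S1 classifies all three concepts and is chosen at the root; S2 then splits C1 one level further down.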
But how do we select the best scheme?
Metric 1: Coverage
The more concepts a scheme can classify, the better; 100% coverage yields a value of 1.
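The talk gives only the endpoint (100% coverage = 1). A minimal sketch, assuming coverage is the fraction of the current concepts that fall into at least one of the scheme's classifications; the scheme and concepts are hypothetical.

```python
def coverage(scheme: dict[str, set[str]], concepts: set[str]) -> float:
    """Fraction of the concepts the scheme can classify; 1.0 means 100% coverage."""
    if not concepts:
        return 0.0
    covered = set().union(*scheme.values()) & concepts
    return len(covered) / len(concepts)


# Hypothetical scheme covering three of four concepts.
print(coverage({"C1": {"a", "b"}, "C2": {"c"}}, {"a", "b", "c", "d"}))  # 0.75
```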
Metric 2: Potential Use
The more often the classified concepts occur in the data, the closer the value is to 1.
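A minimal sketch of one possible normalization, assuming occurrence counts per concept are available (e.g. how many experiments use each one) and that the score is the share of all occurrences landing on concepts the scheme covers. Both the counts and the formula are hypothetical.

```python
def potential_use(scheme: dict[str, set[str]], occurrences: dict[str, int]) -> float:
    """Share of all concept occurrences that land on concepts the scheme covers.

    `occurrences` is assumed to count how often each concept appears in the
    data (for example, across experiments); this normalization is hypothetical.
    """
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    covered = set().union(*scheme.values())
    return sum(n for concept, n in occurrences.items() if concept in covered) / total


occurrences = {"a": 50, "b": 30, "c": 15, "d": 5}  # hypothetical counts
print(potential_use({"C1": {"a"}, "C2": {"d"}}, occurrences))  # 0.55
print(potential_use({"C1": {"c", "d"}}, occurrences))          # 0.2
```

Both example schemes cover two concepts, but the first scores higher because its concepts are used far more often.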
Metric 3: Subtree Balance
A low standard deviation in the number of concepts in each classification yields a value closer to 1. A balanced tree is desirable because it keeps the tree from growing too tall (greater height means more visual channels are needed).
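A minimal sketch, assuming the balance score is 1 / (1 + σ), where σ is the population standard deviation of the classification sizes. The talk states only that a lower spread should give a value closer to 1, so this particular mapping is an assumption.

```python
from statistics import pstdev


def subtree_balance(scheme: dict[str, set[str]], concepts: set[str]) -> float:
    """Return 1.0 when every used classification holds the same number of
    concepts, falling towards 0 as the spread (standard deviation) grows.

    The 1 / (1 + std) form is an assumed normalization.
    """
    sizes = [len(members & concepts) for members in scheme.values()]
    sizes = [n for n in sizes if n > 0]  # ignore classifications nobody uses
    if not sizes:
        return 0.0
    return 1.0 / (1.0 + pstdev(sizes))


concepts = {"a", "b", "c", "d"}
print(subtree_balance({"C1": {"a", "b"}, "C2": {"c", "d"}}, concepts))  # 1.0
print(subtree_balance({"C1": {"a", "b", "c"}, "C2": {"d"}}, concepts))  # 0.5
```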
Metric 4: Number of Subclasses
A low number of subclasses yields a value closer to 1. Schemes with many subclasses are penalized, since more subclasses mean more levels to map onto the selected visual channels. Only subclasses that are actually used are counted.
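A minimal sketch, assuming the penalty is 1 / k, where k is the number of subclasses that classify at least one current concept. The 1 / k form is an assumption; the "only used subclasses" rule comes from the slide.

```python
def subclass_score(scheme: dict[str, set[str]], concepts: set[str]) -> float:
    """Penalize schemes with many subclasses: 1 / (number of used subclasses).

    Only subclasses that classify at least one current concept are counted;
    the 1/k form is an assumed normalization.
    """
    used = [name for name, members in scheme.items() if members & concepts]
    return 1.0 / len(used) if used else 0.0


concepts = {"a", "b", "c"}
two_way = {"C1": {"a", "b"}, "C2": {"c"}, "C3": {"x"}}  # C3 unused here
four_way = {"C1": {"a"}, "C2": {"b"}, "C3": {"c"}, "C4": {"a", "c"}}
print(subclass_score(two_way, concepts))   # 0.5
print(subclass_score(four_way, concepts))  # 0.25
```

The talk does not say how the four metrics are combined into a single ranking. An unweighted mean or a product would be one possible choice, but that is an assumption.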
Application to our case study
We have 8 schemes shown here, focusing mainly on processes.
1. Concepts were extracted.
2. Categories were created by a domain expert.
3. The taxonomy algorithm was applied.
4. The resulting taxonomy is shown on the right.
Next we attempt to order visual channels and create design guidelines.
Guidelines for design
Ordering Visual Channels
Bertin’s Visual Channels
Associative (facilitates grouping all elements of a variable despite differing values):
texture, color, orientation and shape.
Selective (facilitates selecting one category of data while ignoring the others):
texture, color, orientation, shape, planar, size and brightness.
Ordered (facilitates visual ranking of data):
texture, color and size.
Quantitative (permits extraction of ratios without the need to inspect a legend):
planar and size.
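Storing the four properties as a small lookup lets candidate channels be filtered by the properties a taxonomy level needs. A minimal sketch: the sets are copied from the slide, and `channels_with` is a hypothetical helper.

```python
# Channel properties as listed on the slide (after Bertin).
BERTIN = {
    "associative": {"texture", "color", "orientation", "shape"},
    "selective": {"texture", "color", "orientation", "shape",
                  "planar", "size", "brightness"},
    "ordered": {"texture", "color", "size"},
    "quantitative": {"planar", "size"},
}


def channels_with(*properties: str) -> set[str]:
    """Return the channels that have every requested property."""
    result = set.union(*BERTIN.values())  # start from all known channels
    for prop in properties:
        result &= BERTIN[prop]
    return result


# Channels that both group and rank values:
print(sorted(channels_with("associative", "ordered")))  # ['color', 'texture']
print(sorted(channels_with("quantitative")))            # ['planar', 'size']
```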
Pop-out effect (Williams 67, Duncan 89, Luck 94, Bertin 83, Green 98, Wolfe 89, Treisman 77, Palmer 77, Parkhurst 02)
Visual Hierarchy
In particular, we look at: 1. top-down (global) processing; 2. salient feature detection of edges, points and colors.
These are the most relevant to overview-level processing of a scene.
[Palmer 77, Navon 77, Shor 71, Love 99, Kinchla 79]
Metaphor is important!
Material Combination
Material Amplification
Material Separation
Material Collection
From Taxonomy to Visual Channels
Visual Mapping
Select design options based on the guidelines and on the level of the classification in the taxonomy, then map the scheme to the selected visual channels and structure.
[Figure: example visual mapping, with classes labeled C1, C3, C2 and In Vitro, In Vivo, In Silico]
Crush test.
We should be able to distinguish high-level classes in the taxonomy even at low resolution: schemes high up in the taxonomy should remain distinguishable at the overview level.
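The talk describes the crush test as a visual check. One way to automate a rough version, assumed here rather than taken from the authors, is to downsample each glyph by block-averaging its pixels and require every pair to stay above a color-distance threshold. The images, `factor` and `threshold` are all hypothetical.

```python
import math

Pixel = tuple[float, float, float]  # RGB, each component in [0, 1]
Image = list[list[Pixel]]


def crush(img: Image, factor: int) -> Image:
    """Downsample by averaging each factor x factor block of pixels."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(tuple(sum(p[i] for p in block) / len(block) for i in range(3)))
        out.append(row)
    return out


def distance(a: Image, b: Image) -> float:
    """Mean per-pixel Euclidean RGB distance between two equal-size images."""
    dists = [math.dist(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(dists) / len(dists)


def passes_crush_test(glyphs: dict[str, Image], factor: int, threshold: float) -> bool:
    """True if every pair of glyphs stays at least `threshold` apart after crushing."""
    crushed = [crush(img, factor) for img in glyphs.values()]
    return all(distance(crushed[i], crushed[j]) >= threshold
               for i in range(len(crushed)) for j in range(i + 1, len(crushed)))


RED, BLUE, WHITE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
solid_red = [[RED] * 4 for _ in range(4)]
solid_blue = [[BLUE] * 4 for _ in range(4)]
# Same color and area, different shape: red left half vs red top half.
left_red = [[RED if x < 2 else WHITE for x in range(4)] for _ in range(4)]
top_red = [[RED if y < 2 else WHITE for _ in range(4)] for y in range(4)]

print(passes_crush_test({"red": solid_red, "blue": solid_blue}, 4, 0.3))  # True
print(passes_crush_test({"left": left_red, "top": top_red}, 1, 0.3))      # True
print(passes_crush_test({"left": left_red, "top": top_red}, 4, 0.3))      # False
```

The example is consistent with giving the top taxonomy level the color channel. Two glyphs that differ only in shape are distinguishable at full resolution but collapse to the same averaged color once crushed, whereas a color difference survives.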
Implementation & Dissemination
Towards interoperable bioscience data. Sansone et al., 2012, Nature Genetics.
Contributions
1. Systematic Approach for Glyph Design
• Ordering of concepts
• Ordering of visual channels according to the psychological literature
• Mapping between them
2. Application
• Biological metadata
• Biological workflows
Questions?
Funders
Thanks to the organizers and everyone here for listening!