A Overview of Information...
Transcript of A Overview of Information...
A Overview of Information Visualization
Michael McGuffin, Ph.D.
Associate Professor Department of Software and IT Engineering
École de technologie supérieure (ETS) Montreal, Canada
http://profs.etsmtl.ca/mmcguffin/
SECTION1:Visualiza2onofmul2dimensionalmul2variate(MDMV)data,alsoknownastabulardataorrela2onaldata
Someelementarymath• Giventwosets,adomain(example:R)andacodomain
(example:R),wecanformthecartesianproduit(R×R=R2)thatisthesetofallpossiblepairs(x,y)– {wild,tame}×{dog,cat}={(wild,dog),(wild,cat),(tame,dog),(tame,cat)}– A×B={(a,b)|aϵAandbϵB}– A×B×C×D={(a,b,c,d)|aϵAandbϵBandcϵCanddϵD}
• Arela0onisasubsetofacartesianproduct– Example:theequa2onx=y2correspondstoasubsetofR2;
theinequa2onx<ycorrespondstoanothersubsetofR2
• Arela2onisafunc0onifeachmemberxofthedomainhasatmostonecorrespondingmemberyinthecodomain– x=y2isnotafunc2onbecause(4,2)and(4,-2)arebothmembersofthe
rela2ondefinedbytheequa2on
• Onewaytorepresentarela2on(orafunc2on)istosimplyenumeratethepairsoftherela2oninatable
The function y = x^0.5: x y --- --- 0 0 1 1 4 2 9 3 ... The relation in a table in a relational database: Client_name Product_bought Price Date ... ------------- ----------------- ------- ------------ ----- Robert G. Trombone 500.00 2008 March 7 . Robert G. Partitions vol. 1 45.00 2008 March 7 . Lucie M. Flute 180.00 2007 Nov 11 . Cynthia S. Partitions vol. 2 40.00 2008 June 16 Jules T. Piano 6000.00 2008 Jan 10 Jules T. Partitions vol. 1 45.00 2008 Jan 13 ... A video or movie file: x y time red green blue --- --- ------- ------- ------ ------ 0 0 0 255 0 0 0 1 0 200 10 6 ... 0 0 0.1 255 50 100 0 1 0.1 255 200 190 ...
Examples of mathematical relations (i.e., multidimensional multivariate data). A relation is a subset of a cartesian product of two or more sets (example: a subset of R×R). In the examples here, each row is a tuple (member of the relation), and each column corresponds to a set within the cartesian product.
A video: x y time red green blue --- --- ------- ------- ------ ------ 0 0 0 255 0 0 0 1 0 200 10 6 ... 0 0 0.1 255 50 100 0 1 0.1 255 200 190 ...
Domains Independent variables
Dimensions Dimensions
Co-domains Dependant variables Variables (hence the term “MDMV”) Measures (database terminology)
Tuple, Multidimensional point, Vector, Row
Columns, Variables
Terminology
I’ll use the terms in bold
Output graphical representation: at most 3 spatial dimensions (often just 2), at most 1 temporal dimension (animation)
Tovisualizedata,weneedtochooseamapping
Input data: an arbitrary number of dimensions and measures
1dimension+1measure:barchart(orlinechart)
2measures:sca\erplot
2dimensions+1measure:heatmap
Red
Blue
Green
Avideo
[Gareth Daniel and Min Chen, 2003]
Fluidvisualiza2onWhich dimensions and measures would be involved in such data?
Chernofffaces(1973)(anexampleofa"glyph")
Advantage: better than text for getting an overall impression of the data, and for identifying interesting elements.
Disadvantage: the mapping from variables to faces has an effect on the salience of each variable.
Disadvantage (?): redundancy of symmetry of faces
http://kspark.kaist.ac.kr/Hum
an%20E
ngineering.files/Chernoff/life_in_LA
.jpg ht
tp://
map
mak
er.ru
tger
s.ed
u/35
5/C
hern
off_
face
.gif
Otherexamplesofglyphs
M. Ward (2002), “A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization”, Information Visualization.
Otherexamplesofglyphs
Wittenbrink, Pang, Lodha (1996) “Glyphs for Visualizing Uncertainty in Vector Fields”, IEEE TVCG.
Candles2ckcharts
• InventedbyHommaMunehisa(1724-1803),whoamassedafortunebytradingrice
• Usedinthetechnicalanalysisoffinancialmarkets• Glyphsthatshowsevolu2onover2me
http://goodasgoeldi.com/wordpress/2009/06/26/reading-a-candlestick-graph/
http://goodasgoeldi.com/wordpress/2009/06/26/reading-a-candlestick-graph/
http://goodasgoeldi.com/wordpress/2009/06/26/reading-a-candlestick-graph/
1 White candlestick 2 Black candlestick 3 Long lower shadow 4 Long upper shadow 5 Hammer 6 Inverted hammer 7 Spinning top white 8 Spinning top black 9 Doji 10 Long legged doji 11 Dragonfly doji 12 Gravestone doji 13 Marubozu white 14 Marubozu black
http://en.wikipedia.org/wiki/Candlestick_chart
Interac2vePresenta2onfromtheUN(UnitedNa2onsDevelopmentProgramme,HumanDevelopmentReport)
See also Hans Rosling’s presentations on http://www.ted.com
Notice that the “points” here are glyphs, each with a radius and a color.
Tableau:sohwareforvisualizingdatabases(Mackinlayetal.2007,tableausohware.com)
x
y
b
a
x
yx
y
x
y
Rows: b, y
Columns: a, x
This is called dimensional stacking, and has been used in previous work before Tableau, but Tableau is a really nice implementation of it, and a successful commercial product.
Tableau
• Formoreinforma2on:h\p://www.tableausohware.com/products/tourh\p://www.tableausohware.com/products/desktop/demo
Kindsofvariables• Quan2ta2ve(orcon2nuousormetric)
– Examples:x,y,2me,temperature,money
• Ordinal– Wecanputthevaluesinorder,butwecannotsay
thatonevalueisN2meslargerthansomeothervalue– Example:levelofeduca2on(highschooldiploma,bachelor’sdegree,
master’sdegree,…)
• Categorical(ornominal)– Thereisnonaturalordering(exceptmaybealphabe2cal,
whichisarbitraryandlanguage-dependent)– Example:foodgroups(meat,dairy,fruitsandvegetables,cereals)– Example:bachelor’sinmechanicalengineering,bachelor’sincivil
engineering,etc.– Example:Honda,Toyota,GM,Chrysler,etc.– Example:countriesoftheworld
Output graphical representation: at most 3 spatial dimensions (often just 2), at most 1 temporal dimension (animation) … and also many graphical variables
Visualiza2onisamapping
Input data: each variable can be {independent, dependent} (i.e., a dimension or measure) and {quantitative, ordinal, categorical}
Hierarchyofgraphicalvariables(Mackinlay,1986)
Comparison of positions (common origin) in a barchart:
Comparison of lengths (different origins) in a stacked barchart:
Comparison of positions (common origin) in a grouped barchart:
Note: stacked barcharts are better than grouped barcharts for “part-to-whole” comparisons, even though this involves a length comparison, because the totals are not even shown in a grouped barchart.
Comparison of angles (common origin, and different origins):
Comparison of "densities" (color value) :
Comparison of color hue :
Comparison of areas :
scale
scale
Hierarchyofgraphicalvariables(Mackinlay,1986)
Astudytoconfirmthehierarchy(JeffreyHeeretMichaelBostock,"CrowdsourcingGraphicalPercep2on:Using
MechanicalTurktoAssessVisualiza2onDesign",CHI2010)
Positions
Lengths
Angles
Circular Areas
Rectangular areas
(aligned, or in a treemap)
Tableau
Quan2ta2vevariableasafunc2onofacategoricalvariable
Barchart
Quan2ta2vevariableasafunc2onofaquan2ta2vevariable
Linegraph
Quan2ta2vevariableasafunc2onof(ordinal)2me
Twodependentquan2ta2vevariables
Sca\erplot
Categoricalvariableasafunc2onofaquan2ta2vevariable
Gan\chart
Categoricalindependentvariable+quan2ta2veindependentvariable
Twoindependentcategoricalvariables
Crosstabula2on(“crosstab”)
Tableau applies heuristics like these to choose graphics according to the type of data.
Barchartversuslinegraph
Which makes it easier to see changes in slope ?
Tufte (1983)
Length vs area
IEEE Canadian Review, 2009, No. 60, page 31
ExamplesfromacoursebyMarilynOstergrenatUWashington
(h\p://courses.washington.edu/info424/Week3Prac2ce_ExcelGraphs.html)
http://www.research.ibm.com/people/l/lloydt/color/color.HTM Rogowitz and Treinish, “Why Should Engineers and Scientists Be Worried About Color?”
Borland and Taylor, “Rainbow Color Map (Still) Considered Harmful”, IEEE CG&A, 27(2):14-17, 2007
Classexercise:Designoneormoregraphicstovisualizeadatasetwiththefollowingvariables:
• Carmodel:{Accord,AMCPacer,Audi5000,BMW320i,Champ,ChevNova,…}(19modelsintotal,onemodelforeachtuple;i.e.19tuples)
• Carprice:[$0,$13500]• Carmileage:[0,40]• Repairrecord:{Great,Good,OK,Bad,Terrible}• Carweight:[0,5500]
Most important variables
• Carmodel:{Accord,AMCPacer,Audi5000,BMW320i,Champ,ChevNova,…}(19modelsintotal,onemodelforeachtuple;i.e.19tuples)
• Carprice:[$0,$13500]• Carmileage:[0,40]• Repairrecord:{Great,Good,OK,Bad,Terrible}
• Carweight:[0,5500]
Most important variables
mdmv data Marks of students taking 4 courses: • Physics: 90% • Math: 95% • French literature: 65% • History: 70%
Each student is like a tuple: • (90%, 95%, 65%, 70%) • Etc.
Parallel Coordinates (Alfred Inselberg)
100%
0%
Physics Math French Literature History
(90%, 95%, 65%, 70%)
Parallel Coordinates (Alfred Inselberg)
100%
0%
Physics Math French Literature History
(90%, 95%, 65%, 70%)
(30%, 20%, 90%, 90%)
Scatterplot Matrix (SPLOM)
Physics
Math
French Literature
History
(90%, 95%, 65%, 70%)
French Literature Math
Scatterplot Matrix (SPLOM)
Physics
Math
French Literature
History
(90%, 95%, 65%, 70%)
(30%, 20%, 90%, 90%)
French Literature Math
Sca\erplotMatrixorSPLOM
Nik
las
Elm
qvis
t, P
ierr
e D
ragi
cevi
c, J
ean-
Dan
iel F
eket
e (2
008)
. “R
ollin
g th
e D
ice:
Mul
tidim
ensi
onal
Vis
ual E
xplo
ratio
n us
ing
Sca
tterp
lot M
atrix
Nav
igat
ion”
. P
roce
edin
gs o
f Inf
oVis
200
8.
Within each scatterplot, we could be interested in seeing outliers, correlations, etc.
Notice: the upper triangular half is the same as the lower triangular half, and the diagonal is not very interesting.
Sca\erplotMatrixorSPLOM
Wilkinson, Anand, Grossman, “Graph-Theoretic Scagnostics”, 2005
Note: the diagonal is used to show the names of the variables
Matrixofcorrela2oncoefficientsWhenwehavemanymeasures,wecansummarizeeachsca\erplotbycompu2ngitscorrela2oncoefficientanddisplayingonlythat,insteadof
displayingalltheindividualdatapoints.Thebelowinterfacealsoallowstheusertoselectonesca\erplotandseeazoomed-inviewfordetails.
Jinwook Seo and Ben Shneiderman, “A Rank-by-Feature Framework for …”, Proceedings of InfoVis 2004. Implemented in HCE ( http://www.cs.umd.edu/hcil/hce/ )
Sca\erDice(Elmqvistetal.2008)h\ps://www.youtube.com/watch?v=2bYIRcO-gwg
ExamplefromMatlab:“carbig.mat”
http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/mvplotdemo.html
Summaryoftechniquesforvisualizingmdmvdata
• 1dimension+1measure:barchart,linechart
• 0dimensions+2measures:sca\erplot
• 2dimensions+1measure:parallelbarcharts,heatmap
• Upto≈6dimensions+≈4measures:dimensionalstacking
• Upto≈2dimensions+≈20measures:glyphs,parallelcoordinates,sca\erplotmatrix
• Howcanwevisualizemanydimensions?
?
“NutsandBolts”dataset(72rows):
Region Month Product Sales Equipment_costs Labor_costs
0 0 0 2.76 0.92 4.30 0 1 4.92 1.64 4.30 1 0 4.2 1 4.30 1 1 8.4 2 4.30 2 0 5.28 9.6 4.30 2 1 14.52 26.4 4.30 3 0 5.016 0.88 4.30 3 1 8.436 1.48 4.3… … … … … …2 10 0 8.82 2.1 4.92 10 1 9.156 2.18 4.92 11 0 5.655 1.45 5.22 11 1 9.4965 2.435 5.2
Dimensions Measures
“NutsandBolts”dataset
“NutsandBolts”dataset
Not very useful
The SPLOM works well with mesures, but not with dimensions
“NutsandBolts”dataset
“NutsandBolts”dataset
Not very useful
Parallel coordinates work well with mesures, but not with dimensions
“NutsandBolts”dataset
Each of the above examples only show 4 of the 6 variables. Showing all 6 variables (3 dimensions and 3 mesures) would require much space.
Example view that would be possible with Tableau (dimensional stacking):
“NutsandBolts”datasetExample view that would be possible with Tableau (dimensional stacking):
The above example only shows 4 of the 6 variables. One of these is “Month”, which has 12 levels, greatly increasing space requirements.
Glyphs
dimension
dimension
measure
dimension
measure
measure
Glyphs can show many measures at once, but cannot easily show more than 2 dimensions at once.
GeneralizedPLOtMatrix(GPLOM)ofthe“NutsandBolts”dataset
Scales better than Tableau’s dimensional stacking to a large number of dimensions (and can also show many measures)
GeneralizedPLOtMatrix(GPLOM)[Im,McGuffin,Leung,IEEEInfoVis2013]
Summaryoftechniquesforvisualizingmdmvdata
• 1dimension+1measure:barchart,linechart
• 0dimensions+2measures:sca\erplot
• 2dimensions+1measure:parallelbarcharts,heatmap
• Upto≈6dimensions+≈4measures:dimensionalstacking
• Upto≈2dimensions+≈20measures:glyphs,parallelcoordinates,sca\erplotmatrix
• Upto≈12dimensions+≈12measures:generalizedplotmatrix