Information visualization, data mining and machine learning Fabrice Rossi Télécom ParisTech April 2011


Page 1: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Information visualization, data mining and machine learning

Fabrice Rossi

Télécom ParisTech

April 2011

Goals of this lecture

1. To give an introduction to Information Visualization (Infovis)
   - enhancement methods for classical displays
   - specialized displays
   - why you should leverage infovis in your everyday work

2. To outline links between Infovis and Machine Learning
   - why do they exist?
   - current solutions
   - open research problems

3. To give examples of successful joint research:
   - Machine learning methods designed for visualization
   - Visualization of machine learning algorithm results

Outline

Information Visualization
- Definition
- Infovis applications
- Limitations of Infovis and VDM
- An introductory example: the histogram
- Another example: categorical data
- Interactivity

Machine learning and visualization
- One dimensional data
- Two dimensional data

Information Visualization

The use of computer-supported interactive, visual representation of abstract data to amplify cognition

Card, Mackinlay & Shneiderman

Human preattentive processing capabilities
- non conscious processing (no thinking involved)
- low level visual system
- extremely fast: 200 ms
- scalable (no browsing ⇒ sublinear scaling)
- feature type must match data type (e.g., hue is suitable for categories, less for real values)


Preattentive features
Hue

Preattentive features
Shape

Information Visualization

The use of computer-supported interactive, visual representation of abstract data to amplify cognition

Card, Mackinlay & Shneiderman

Tool metaphor (hammer, microscope, etc.)
- extending user possibilities:
  - more scalable processing (speed and/or volume)
  - details enhancement
  - multi-source fusion
  - etc.
- under user control


File explorer


Information Visualization

The use of computer-supported interactive, visual representation of abstract data to amplify cognition

Card, Mackinlay & Shneiderman

Overview first, zoom and filter, and then details-on-demand

Information Seeking Mantra, Shneiderman

Interactivity
- enables user control:
  - exploration (panning)
  - zooming
  - 3D world
- reduces clutter on the screen

Information Visualization

The use of computer-supported interactive, visual representation of abstract data to amplify cognition

Card, Mackinlay & Shneiderman

Abstract data
- digital data with no real world “visual” counterpart, e.g.:
  - sound
  - high dimensional vectors
- no “natural” visual representation of the data, e.g.:
  - requests received by a web server
  - file systems
  - source code

Infovis ≠ scientific visualization (Scivis)

Data visualization

Closely related to information visualization
- data ≃ a data table (N objects described by P variables)
- no real consensus on the boundary between infovis and datavis:
  - datavis ⊂ infovis: data ≃ preprocessed information (turned into numerical quantities)
  - datavis ∩ infovis = ∅: information (in infovis) is explicitly non numerical (is it?)
  - infovis ⊂ datavis: because data include scientific data...

Also related to statistical graphics


What is infovis used for?

Some specific goals
- easier access (learning curve):
  - GUI in general
  - e.g., File system browsing
- productivity (doing the same things but faster):
  - IDE (on the fly documentation, multi-view, graphical programming, etc.)
  - on the fly search (Google suggest)
- organization:
  - tree paradigm (sorting)
  - metadata (image, music, etc.)
  - overview

Visual data mining (VDM)

A.k.a. Visual Analytics and Visual Data Analysis:

Interactive visual exploration of massive data sets
- cluster analysis
- outlier detection
- dependency assessment
- pattern detection (repetition, sub-structure, etc.)
- etc.

Interactive visualization of the results of data mining algorithms
- parameter tuning
- quality assessment
- mining on the results (e.g., meta-clustering)
- etc.

But...

Limited external impact
Many discoveries of infovis are not used (or even known) outside the infovis community

External “state-of-the-art”
- mostly static graphics
- with broken defaults (color-wise, shape-wise, etc.)

Serious state-of-the-art
- interactive graphics
- high quality static graphics
- sound defaults

For instance, in default R...

[Figure: scatter plot matrix of Fisher/Anderson’s Iris data (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]

Rather than...

[Figure: scatter plot matrix of Fisher/Anderson’s Iris data (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]


Limitations of Infovis and VDM

- Visual illusions
- Distortion, occlusion, etc.
- Visual semiotics
- Scalability
  - number of objects
  - number of descriptors
  - human scalability
  - computer scalability
- Art or Science?


Café wall illusion

http://en.wikipedia.org/wiki/File:Caf%C3%A9_wall.svg


Grey levels

http://web.mit.edu/persci/people/adelson/index.html


Area and size
Ebbinghaus illusion

http://en.wikipedia.org/wiki/File:Mond-vergleich.svg



Color blindness

http://en.wikipedia.org/wiki/File:Ishihara_9.png

The scalability issue

- Vision is limited to 2 or 3 dimensions
- Position can be combined with other features:
  - color (intensity and hue)
  - shape (e.g., star icon)
  - texture
  - etc.
- But fast pre-attentive processing is limited to roughly 5 combined features (for some combinations only!)
- Correlating distant things is difficult
- Computer screens have a “low” resolution (HD is 2 million pixels)
- Complex HD interactive display requires a dedicated graphics board and associated software (OpenGL and Direct3D, shader languages)

Broken preattentive features
Hue + shape

How to scale?

Complementary solutions
- interactivity (zooming, distorting, details on demand, etc.)
- data transformation:
  - interaction between objects rather than objects themselves
  - similarity between objects
- data simplification:
  - reduction of the number of objects (summary, clustering, etc.)
  - reduction of the number of characteristics (selection, projection, etc.)
  - compact layout: one glyph per object or one pixel per measurement
- data ordering:
  - positioning related things closely on the screen
  - one to three dimensional ordering

Obviously linked to Machine Learning (clustering, projection, etc.)


A classical example

Displaying a one dimensional numerical variable
- dataset: $N$ observations $(X_i)_{1 \le i \le N}$ described by $P$ numerical variables
- objective: display $(X_{ij})$ for a fixed $j$ and all $i$

Classical solution
A histogram:
- a rough density estimator
- enormous scalability

[Figure: histogram of Sepal Length (horizontal axis) against Density (vertical axis)]


Histograms in detail

Visualization algorithm
1. choose a number of bins and their widths:
   - numerous possibilities: $\sqrt{N}$, $\log N$, etc.
   - no perfect one
2. for each bin $]a,b]$ compute the fraction of $i$ such that $X_{ij} \in ]a,b]$
3. display a bar per bin, with an area proportional to the fraction computed at step 2

Remarks
- this is statistical learning: non parametric estimation of $p(X_{\cdot j})$
- the original data is not displayed
- the drawing itself remains rather unspecified (at this point) but has crucial consequences: color, aspect ratio, labels, position, etc.
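The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: the function name `histogram_bars`, the equal-width bins, the $\sqrt{N}$ rule for the number of bins, and the sample values are all assumptions made for the example.

```python
import math

def histogram_bars(values, n_bins=None):
    """Return one (a, b, fraction, height) tuple per bin ]a, b]."""
    n = len(values)
    # Step 1: choose the bins (here sqrt(N) equal-width bins, one option among many).
    if n_bins is None:
        n_bins = max(1, round(math.sqrt(n)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all values identical: fall back to a single unit-width scale

    # Step 2: count the observations falling in each bin ]a, b].
    counts = [0] * n_bins
    for x in values:
        k = math.ceil((x - lo) / width) - 1
        k = min(max(k, 0), n_bins - 1)  # the minimum goes to the first bin
        counts[k] += 1

    # Step 3: one bar per bin, whose area (width * height) equals the fraction.
    bars = []
    for k, c in enumerate(counts):
        fraction = c / n
        bars.append((lo + k * width, lo + (k + 1) * width, fraction, fraction / width))
    return bars

# Made-up sample, only there to exercise the function.
sample = [4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.9]
bars = histogram_bars(sample)
total_area = sum((b - a) * h for a, b, f, h in bars)
```

Because each bar's area is the fraction of observations in its bin, `total_area` comes out as 1 up to rounding, which is what makes the drawing readable as a density.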


Chi & Riedl’s Operator Model
A formal view of infovis [CR98]

Production of the View from the Data:

Data → Analytical Abstraction → Visualization Abstraction → View

- Data → Analytical Abstraction: Data Transformation
- Analytical Abstraction → Visualization Abstraction: Analytical Transformation
- Visualization Abstraction → View: Visual Mapping Transformation

Arrows represent transformation operators.

Chi & Riedl’s Operator Model

Applied to histograms

- data: $(X_{ij})_{1 \le i \le N}$ for a fixed $j$
- analytical abstraction: $K$ bins $(a_k, b_k, f_k)_{1 \le k \le K}$, where $f_k$ is the fraction of objects falling in bin $k$
- visual abstraction: $K$ adjacent bars $(w_k, h_k)_{1 \le k \le K}$, where the width $w_k$ is proportional to $b_k - a_k$ and the height $h_k$ is proportional to $f_k / w_k$
- view: the histogram itself

Remarks
- the visual mapping step gathers most of the wizard’s tricks
- machine learning may operate at numerous stages
- the model includes feedback and interactivity
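To see why the height is taken proportional to the fraction divided by the width, the area of bar $k$ can be worked out directly. The constants $c$ and $c'$ below are notation introduced here for the illustration (they do not appear in the slides), and the last equality assumes every observation falls in exactly one bin.

```latex
\[
w_k = c\,(b_k - a_k), \qquad
h_k = c'\,\frac{f_k}{w_k}
\quad\Longrightarrow\quad
w_k\,h_k = c'\,f_k, \qquad
\sum_{k=1}^{K} w_k\,h_k = c' \sum_{k=1}^{K} f_k = c'.
\]
```

Each bar's area is therefore proportional to $f_k$ whatever the bin widths are, and the total area is a constant that does not depend on the data.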

Machine learning operators

Some examples
- data transformation (main field): density estimation, clustering, etc.
- analytical transformation (second field): dimensionality reduction techniques
- visual mapping (seldom used): optimal ordering and optimal coloring

Internal operators
- data stage: de-noising, clustering as a pre-processing
- analytical stage: some form of dimensionality reduction, feature selection, model building
- visualization stage: visual oriented transformation (e.g., second clustering)

Visual Mapping Transformation

View “Language”
Graphical primitives:
- coordinates (two or three)
- symbols (dots, ticks, glyphs)
- lines and areas (larger scale symbols)
- text
- color:
  - Hue and Saturation
  - Lightness

Mapping
- expression of the visual abstraction in terms of the graphical language
- arrangement, color choice, axes, etc.

View design

The difficult part
- probably the most difficult part of infovis
- many heated debates...
- between art and science
- tricky evaluation:
  - experimental psychology
  - long and complex
  - task oriented

Graph visualization

Histogram mapping

The mapping
- bar: a solid rectangle, no border
- layout: horizontal data axis, vertical density axis
- key: data description, value description, horizontal grid

[Figure: histogram of Sepal Length (horizontal axis) against Density (vertical axis)]

Rationale?
- do we need color?
- don’t we need bar borders?
- what are we trying to show?


Let’s summon statistics...

Definition
A histogram is a (rough) density estimator. Given $X_1, \ldots, X_N$ distributed according to $P(X)$, it approximates $f(x)$ such that $P(X \in B) = \int_B f(x)\,dx$.

“Natural” visualization
Then let’s draw $f(x)$!
- natural axes: horizontal for the values, vertical for the likelihood
- colored and borderless bars emphasize the area (as opposed to the height): proper probabilistic interpretation

Use
- shape analysis: symmetric? unimodal? Gaussian like?
- quality control: outliers and other unexpected behaviors


Density or counting?

[Figure: density histogram of Petal Width (horizontal axis: Petal Width, vertical axis: Density)]

this is not a histogram

[Figure: count plot of Petal Width (horizontal axis: Petal Width, vertical axis: Count)]

Counting as the main outcome
The right hand side “histogram” displays the number of objects that fall into each bin. Colorless bordered bars are better suited to inference in this case.

[Figure: count plot of Petal Width (horizontal axis: Petal Width, vertical axis: Count)]


Unnatural visualization

So far, so good

[Figure: two panels for a variable X ranging from 0.0 to 1.0: a density histogram (vertical axis: Density) and a count plot (vertical axis: Count)]

Oops... (this is a uniform distribution)

[Figure: two panels for a variable X ranging from 0.0 to 1.0: a density histogram (vertical axis: Density) and a count plot (vertical axis: Count)]
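The transcript does not say how the "Oops" panels were produced. One way a count plot can misrepresent uniform data, assumed here purely for illustration, is unequal bin widths: the counts grow with the width of the bin while the densities stay flat. The bin edges and the evenly spaced sample below are made up for this sketch.

```python
# Evenly spaced points on ]0, 1], standing in for a uniform sample (made-up data).
n = 1000
sample = [(i + 1) / n for i in range(n)]

# Hypothetical unequal bins: ]0, 0.1], ]0.1, 0.3], ]0.3, 0.6], ]0.6, 1.0]
edges = [0.0, 0.1, 0.3, 0.6, 1.0]

counts = []
densities = []
for a, b in zip(edges[:-1], edges[1:]):
    c = sum(1 for x in sample if a < x <= b)
    counts.append(c)                   # what a count plot would draw as bar height
    densities.append(c / n / (b - a))  # what a density histogram would draw

print(counts)     # increasing counts, although the data are uniform
print(densities)  # all close to 1.0
```

Under this assumption the count bars suggest an increasing trend that is only an artifact of the binning, whereas the density bars remain level.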


The infovis process
Wrapping up

The process
1. extract information from the raw data
2. build an abstract visualization of this information
3. map it to a view using a sound graphical language

Major points
- a view should be adapted to some specific goal
- less is more: remove interferences
- self-contained views: axes, keys, labels are part of the mapping process
- faithful views: do what you claim to do; beware of “natural representations”


Another classical example

Displaying a one dimensional categorical variable
- dataset: $N$ observations $(X_i)_{1 \le i \le N}$ described by one categorical variable with $Q$ possible values in $V = \{V_1, \ldots, V_Q\}$
- zero structure on $V$ (in particular, no order)
- objective: display $(X_i)_{1 \le i \le N}$

Classical solution (but a very bad one)
A pie chart
- a counting like approach
- enormous scalability with respect to $N$ (none with respect to $Q$)

[Figure: pie chart with slices setosa, versicolor, virginica]


Chi & Riedl’s Operator Model

Applied to pie charts

- data: $(X_i)_{1 \le i \le N}$
- analytical abstraction: $N_q$, the number of $i$ such that $X_i = V_q$
- visual abstraction: a sliced pie whose $Q$ slices have an area proportional to $N_q$
- view: the colored pie, with labeled slices ordered according to some rule

Representation principle
- the quantity of interest is encoded via an area (equivalently an angle)
- inference is based on area (angle) comparisons
- in histograms, inference is based on shape analysis (skewness, spread, etc.)
- in counting histograms, inference is based on bar lengths
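The first two stages of this pipeline are easy to make concrete. The sketch below computes the counts $N_q$ and the slice angles they imply; the function name `pie_slices` and the category labels are hypothetical, chosen only for the example.

```python
from collections import Counter

def pie_slices(labels):
    """Map each category V_q to (N_q, slice angle in degrees)."""
    counts = Counter(labels)        # analytical abstraction: the counts N_q
    total = sum(counts.values())
    # Visual abstraction: slice angle (hence area) proportional to N_q.
    return {v: (n_q, 360.0 * n_q / total) for v, n_q in counts.items()}

# Made-up categorical sample with Q = 3 values.
labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
slices = pie_slices(labels)
```

The ordering and coloring of the slices are left out on purpose: they belong to the visual mapping step, which the counts alone do not determine.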

Let’s play...

Please write down your estimate of the ratio of the areas of those disks.

Let’s play...

Please write down your estimate of the ratio of the lengths of those bars.

Bar chart

Another visual abstraction
Using the same counting data, replace the $Q$ pie slices by $Q$ bars with length/height proportional to $N_q$

And the views are

[Figure: pie chart (slices setosa, versicolor, virginica) next to a bar chart of the same counts (bars setosa, versicolor, virginica; vertical axis from 0 to 50)]

Did you notice the unequal proportions in the first occurrence of this pie chart? Can you now?


Stevens’ Power Law

Definition (Wikipedia)
Stevens’ power law is a proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength, with the general form $\psi(I) = k I^{\alpha}$

Some examples in visualization
- length: $\alpha \in [0.9; 1.1]$
- area: $\alpha \in [0.6; 0.9]$
- volume: $\alpha \in [0.5; 0.8]$

We tend to underestimate large areas and to overestimate small ones.
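A quick numerical reading of the power law: when two stimuli are compared, the constant $k$ cancels, so the perceived ratio is the actual ratio raised to the power $\alpha$. The helper name `perceived_ratio` and the choice of a fourfold ratio are assumptions for this sketch; $\alpha = 0.7$ is simply one value inside the range quoted above for areas.

```python
def perceived_ratio(actual_ratio, alpha):
    """Perceived ratio of two stimuli under psi(I) = k * I**alpha (k cancels out)."""
    return actual_ratio ** alpha

# A disk whose area is really 4 times larger, with alpha = 0.7 (area range: 0.6 to 0.9).
area_ratio = perceived_ratio(4.0, 0.7)    # about 2.64: the difference is underestimated

# A bar that is really 4 times longer, with alpha = 1.0 (length range: 0.9 to 1.1).
length_ratio = perceived_ratio(4.0, 1.0)  # 4.0: the ratio is perceived faithfully
```

This is the quantitative side of the disks-versus-bars exercise: with an exponent below 1, area comparisons compress large ratios, while length comparisons stay close to the truth.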

Cheating with the pie...

Steve Jobs’ keynote at Macworld 2008, source: http://www.engadget.com/2008/01/15/live-from-macworld-2008-steve-jobs-keynote/


Back to the facts

With Jobs’ ordering

[Figure: pie chart and bar chart of the same data, in the order Other, Nokia, Motorola, Palm, Apple, RIM (bar chart axis ticks at 0, 10, 20, 30)]

Back to the facts

With a more natural ordering

[Figure: pie chart and bar chart of the same data, in the order Nokia, Motorola, Palm, Apple, Other, RIM (bar chart axis ticks at 0, 10, 20, 30)]

Page 66: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

The lie factor

Definition
The lie factor is the ratio between the (relative) size of a quantity on a graphic and its (relative) size in the data.

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.

Tufte, 1991

Jobs’ keynote lie factor

- real effect: $19.5\%/21.2\% \approx 0.92$
- graphic effect: roughly 1.5
- lie factor: roughly 1.6
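The arithmetic behind these figures can be checked with a minimal sketch. The function name `lie_factor` is only an illustrative choice, and the graphic effect of 1.5 is the rough visual estimate quoted on the slide, not a measured value.

```python
def lie_factor(graphic_effect: float, data_effect: float) -> float:
    """Ratio between the effect shown on the graphic and the effect in the data."""
    return graphic_effect / data_effect


# The two shares quoted on the slide give the effect in the data.
data_effect = 19.5 / 21.2   # about 0.92
graphic_effect = 1.5        # rough visual estimate quoted on the slide

factor = lie_factor(graphic_effect, data_effect)
print(round(data_effect, 2), round(factor, 1))
```

A lie factor of 1 would mean that the graphic is faithful to the data.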


Variations over the view

Remarks
- Pie charts and bar charts differ mostly in the visual abstraction: area/angle versus length/height
- Cleveland and McGill have shown that lengths encode numerical values much more efficiently than areas and therefore that you should

  Save the pies for dessert

  Stephen Few
- what about the view?
  - common challenges for both charts: ordering, labelling, coloring
  - but do we need to materialize the bars?


Cleveland’s dot plot

Mostly a visual mapping variation
- use the same visual abstraction (bars of length proportional to counts)
- with a much simpler view

[Figure: Cleveland dot plot of the categories Nokia, Motorola, Palm, Apple, Other, RIM; horizontal axis from 5 to 40]

- much more scalable than the bar chart with respect to the number of modalities


Ranking visual features
Cleveland & McGill

Quality of numerical encoding
In decreasing order:

1. position on a common scale
2. position along identical but non-aligned scales
3. length
4. angle and slope
5. area
6. volume
7. color properties (brightness, etc.)


Visual elements

Summary
- the visual abstraction induces the main aspects of the view:
  - lengths will be mapped to positions or actual lengths
  - areas will be mapped to graphical primitives (disks, squares, etc.) with the requested areas
- visual features are not born equal:
  - positions and lengths are rather easily compared
  - areas and angles are much more misleading
  - colors (hue) are useful for differentiation, not for coding quantities
- visual mapping can ruin the visual abstraction:
  - 3D is very dangerous: it does not preserve areas, angles, etc.
  - distraction, cluttering, etc. must be avoided


Outline

Information Visualization
- Definition
- Infovis applications
- Limitations of Infovis and VDM
- An introductory example: the histogram
- Another example: categorical data
- Interactivity

Machine learning and visualization
- One dimensional data
- Two dimensional data


Interactivity

Core feature of information visualization
Infovis research emphasizes the practical relevance of interactivity

- exploration: panning, zooming, rotating (3D)
- details on demand, distortion: reduces clutter
- conditional analysis: helps testing “what if?” scenarios
- user choice: parameter tuning, guided exploration

Nice bonus: some form of interactivity is mostly orthogonal to the rest of the infovis pipeline

Current status
- rather limited diffusion except for basic tricks, such as 3D
- still computationally demanding; for some interactions, the hardware is there (even in smartphones!) but programming it is still hard
- major ideas (e.g., brushing and linking) are still not mainstream


A simple example

The tips dataset
- tips received during two and a half months by a food server in the early 90s
- 7 variables including the tip received for the meal

Count histogram of the tips: bin number effect

[Figure: count histogram of tip (0 to 10) against Frequency, at 1 dollar resolution]

[Figure: count histogram of tip (0 to 10) against Frequency, at 10 cents resolution]


Parameter selection

General paradigm
- some parametric visualization method:
  - analytical abstraction (e.g. number of bins)
  - visual abstraction (e.g. histogram vs counting histogram)
  - view (e.g. aspect ratio)
- simple interactive way of modifying the parameter value (slider, keyboard, etc.)
- real time update of the view

Strong requirements
- deterministic results (minimum surprise principle)
- real time performance
- animated transitions (recommended)
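As an illustration of this paradigm, here is a minimal sketch of a counting histogram whose bin width is the user-controlled parameter. The `tips` values are illustrative only, not the actual dataset, and `count_histogram` is a hypothetical helper name. The computation is a pure function of its inputs, which is one simple way to meet the determinism requirement.

```python
import math
from collections import Counter


def count_histogram(values, bin_width):
    """Return {bin_start: count}; each bin is [start, start + bin_width)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(i * bin_width, 10): n for i, n in sorted(counts.items())}


# Illustrative tip amounts in dollars (hypothetical values).
tips = [1.01, 1.66, 3.50, 3.31, 3.61, 2.00, 2.00, 3.00, 5.00, 4.71]

coarse = count_histogram(tips, 1.0)   # 1 dollar resolution
fine = count_histogram(tips, 0.1)     # 10 cents resolution
print(coarse)
print(len(fine), "non-empty bins at 10 cents resolution")
```

In an interactive tool, a slider would call `count_histogram` again with the new width and redraw the view.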


A naive example

K-means clustering
- a simple two dimensional dataset (for illustration)
- minimalist view: scatter plot with prototypes
- interaction: number of clusters

[Figure: scatter plot of stearic against palmitoleic]

A naive example (of failure)

[Figures: the same scatter plot of stearic against palmitoleic with the k-means prototypes, shown successively for k=2, k=3, k=4, k=5, k=4 (oops) and k=3]
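One way to avoid this kind of surprise is to make the clustering reproducible for a given number of clusters. The sketch below is a plain Lloyd's algorithm with a seeded initialization; it is an assumption about how determinism could be obtained, not the method used to produce the slides, and the data are synthetic.

```python
import random


def kmeans(points, k, seed=0, iterations=50):
    """Plain Lloyd's algorithm on 2D points; a fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous prototype.
        new_centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Synthetic 2D data (hypothetical, three Gaussian blobs).
data_rng = random.Random(42)
points = [
    (data_rng.gauss(cx, 5.0), data_rng.gauss(cy, 5.0))
    for cx, cy in [(50, 200), (150, 300), (250, 180)]
    for _ in range(30)
]

first = kmeans(points, 4, seed=1)
again = kmeans(points, 4, seed=1)   # coming back to k=4 gives the same prototypes
print(first == again)
```

With an unseeded random initialization, coming back to the same k may give different prototypes.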


State of the art

Available
- generic view interaction (including 3D): zooming, panning, rotating, etc.
- specific view interaction: ordering, alpha blending, labelling, etc.
- visual abstraction interaction: switching from one representation to another
- data abstraction interaction: basic control of simple things (e.g., histogram bandwidth)

Todo...
- deeper interaction: complex data abstraction control (e.g., number of clusters, of variables, etc.)
- quality feedback
- hypotheses formulation and testing


Brushing and Linking

A.k.a. Conditioning
- simple but crucial technique
- represent whatever variables X and Y (numerical, categorical, high dimensional, etc.)
- show the effect on X of selecting part of the values of Y
- technically: compare e.g. $P(X \mid Y \in \mathcal{Y})$ and $P(X)$

Definition
Brushing: selecting a subset of the data items with an input device

Definition
Linking: showing the effect of a brush on all representations of the data

Simple example

Back to the tips dataset
- tips received during two and a half months by a food server in the early 90s
- effect of the gender of the tipper?

[Figures: count histogram of the tips (tip from 0 to 10 against Frequency) linked with a bar chart of gender (F, M; counts from 0 to 150), shown in successive frames at 10 cents resolution and then at 1 dollar resolution]


Brushing and Linking

Implementation is not that obvious...

- complex effect: goes all the way back to the data
- process:
  1. provide a selection technique
  2. map selection on the screen to selection on the original data
  3. apply the data transformation chain to this subset on all the views
  4. display the subset together with the full set on all the views
- numerous implied hypotheses:
  - deterministic methods
  - context aware methods (the full data context must be imposed on the subset)
  - meaningful superimposed displays
- highly goal dependent (again...)
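Steps 2 to 4 of this process can be sketched on a toy table. The records and the helper names (`distribution`, `brush_and_link`) are hypothetical; the brush is modelled as a predicate on the original records, so that the linked view compares the distribution of X on all records with its distribution on the brushed subset.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution as {value: probability}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def brush_and_link(records, brushed, x_key):
    """Return the distribution of x_key on all records and on the brushed subset."""
    full = distribution(r[x_key] for r in records)
    subset = distribution(r[x_key] for r in records if brushed(r))
    return full, subset


# Hypothetical records: tip rounded to the dollar and gender of the tipper.
records = [
    {"sex": "F", "tip": 2},
    {"sex": "F", "tip": 3},
    {"sex": "M", "tip": 3},
    {"sex": "M", "tip": 5},
]

# Brushing the "F" bar of the gender view selects the matching original records.
full, subset = brush_and_link(records, lambda r: r["sex"] == "F", "tip")
print(full)
print(subset)
```

Both distributions are computed with the same transformation chain, which is what allows them to be superimposed on the same view.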


Main points

- infovis methods consist of a succession of transformation steps from the raw data to a view
- machine learning and statistics can play some role in most of the transformations
- visual mapping is crucial to readability and inference:
  - explicit encoding principles
  - visual features encoding quality ordering
- brushing and linking enable conditional analysis
- interactivity needs real-time, animation and determinism


Outline

Information Visualization
- Definition
- Infovis applications
- Limitations of Infovis and VDM
- An introductory example: the histogram
- Another example: categorical data
- Interactivity

Machine learning and visualization
- One dimensional data
- Two dimensional data


Histograms

Can we do better?
- “numerical” summaries: median, mean, standard deviation, etc.
- smooth density estimates

Associated visualizations

[Figure: kernel density estimator of Petal Width (0.0 to 2.5) against Density, next to a boxplot of Petal Width]


Analysis

Chi and Riedl’s model
- data: $(X_{ij})_{1 \le i \le N}$ for a fixed $j$
- analytical abstraction: a kernel density estimate of $p(x)$ or some basic statistics
- visual abstraction: a function $x \mapsto p(x)$ or a fancy box with whiskers (legs in French)
- view: see previous slide

Variations
Can we do better?

- less distraction (simpler design)
- readability
- more context
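The analytical abstraction above can be sketched with a Gaussian kernel density estimator. The kernel choice, the bandwidth and the sample values are assumptions made for illustration; the slides do not specify them.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return the function x -> (1 / (N h)) * sum_i K((x - x_i) / h), K being the Gaussian kernel."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


# Hypothetical one dimensional sample.
sample = [0.2, 0.2, 0.4, 1.3, 1.5, 1.8, 2.3, 2.5]
p = gaussian_kde(sample, bandwidth=0.3)

# Riemann sum over a range that is wide enough to cover the tails.
step = 0.01
total = sum(p(-3.0 + i * step) for i in range(900)) * step
print(round(total, 3))
print(p(2.8) > 0.0)   # positive density beyond the largest observation
```

The estimate integrates to one and is strictly positive outside the range of the observations.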


Data-Ink ratio

Less is more (sort of)
Edward Tufte’s version:

- ink is used to print data and non data
- one should maximize the ratio between the data ink and the total ink

A large share of ink on a graphic should present data-information, the ink changing as the data change. Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented.

Tufte, 1983

Examples

Original designs

[Figure: kernel density estimator of Petal Width against Density, and boxplot of Petal Width, in their original designs]

Higher data-ink ratios

[Figure: the same kernel density estimator and boxplot redrawn with higher data-ink ratios]

Readability

Aspect ratio
Drawing a function is simple, isn’t it?

- the aspect ratio of the graph in fact plays a crucial role
- slopes are difficult to judge
- Cleveland, McGill and McGill have shown that slopes far away from 45° are difficult to estimate

To maximize readability, choose the aspect ratio according to this finding

Example

[Figures: kernel density estimator of Petal Width against Density drawn with three aspect ratios: flat, peaky and optimal]
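A common way to apply this finding is to choose the height/width ratio so that the median absolute slope of the drawn segments is 1, that is 45° on screen. The sketch below assumes a curve sampled at strictly increasing abscissas; `bank_to_45` is a hypothetical helper name.

```python
import statistics


def bank_to_45(xs, ys):
    """Height/width ratio for which the median absolute segment slope equals 1 on screen."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Slopes measured in the unit square (both axes rescaled to [0, 1]); flat segments are skipped.
    slopes = [
        abs((y1 - y0) / (x1 - x0)) / (y_range / x_range)
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        if y1 != y0
    ]
    return 1.0 / statistics.median(slopes)


line_ratio = bank_to_45([0, 1, 2, 3, 4], [0, 2, 4, 6, 8])   # straight line: square display
peak_ratio = bank_to_45([0, 1, 2], [0, 10, 0])              # sharp peak: flatter display
print(line_ratio, peak_ratio)
```

On a display of height h and width w, a slope s in the unit square is drawn with slope s·h/w, which is why the ratio is the inverse of the median rescaled slope.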

Context

Misleading aspects of density plots
- smoothing effect: generally assigns weight to empty intervals
- smoothing effect again: generally extends the actual range of the observations

⇒ Context is needed

Adding a rug to the plot

[Figure: kernel density estimator of Petal Width (0 to 3) against Density, with a rug of the observations added]


Complex process

Chi and Riedl’s model
- data: $(X_{ij})_{1 \le i \le N}$ for a fixed $j$
- analytical abstraction: a kernel density estimate of $p(x)$ and the data themselves
- visual abstraction: a function $x \mapsto p(x)$
- view: a function representation with optimal aspect ratio (computed via the median of the slopes of $p$), together with a rug obtained from $(X_{ij})_{1 \le i \le N}$ (maybe with some added jitter)

Added value over histograms
- smooth estimator
- no border effects (]a,b] or [a,b[...)
- less non data ink
- easier comparison between distributions

Conditioning
Comparing distributions

Conditional distribution
- estimation of $p(x \mid y)$ for a discrete $y$
- draw $p(x \mid y)\,p(y)$ to ease superposition with the global distribution

[Figure: kernel density estimators of Petal width (0 to 3) against Density for setosa, versicolor and virginica]

[Figure: counting histograms of Petal width (0.0 to 2.5) against Counts for setosa, versicolor and virginica]
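The reason for drawing the weighted conditional densities can be checked numerically: with a common bandwidth, they add up exactly to the global kernel estimate. The sketch below uses a Gaussian kernel and hypothetical samples for three groups labelled "a", "b" and "c"; both choices are assumptions.

```python
import math


def kde(sample, h):
    """Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)


def weighted_conditionals(groups, h):
    """Return {label: x -> p(x | y = label) * p(y = label)} for a discrete y."""
    n = sum(len(sample) for sample in groups.values())
    curves = {}
    for label, sample in groups.items():
        cond, prior = kde(sample, h), len(sample) / n
        curves[label] = lambda x, cond=cond, prior=prior: cond(x) * prior
    return curves


# Hypothetical samples of x for each value of y.
groups = {"a": [0.2, 0.2, 0.3, 0.4], "b": [1.0, 1.3, 1.5], "c": [1.8, 2.0, 2.3, 2.5, 2.1]}
h = 0.25
curves = weighted_conditionals(groups, h)
global_density = kde([x for sample in groups.values() for x in sample], h)

x = 1.2
stacked = sum(curve(x) for curve in curves.values())
print(abs(stacked - global_density(x)) < 1e-12)
```

Each weighted curve therefore stays below the global curve, which keeps the superposition readable.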

Scaling the comparison

Density estimator limitations
- only a small number of color codes can be distinguished quickly (less than 10)
- overlapping also occurs quickly

Boxplots are more scalable, up to ordering issues

An example

[Figures: boxplots of daily active power for each hour of the day (0 to 23), shown in two frames with different vertical ranges]
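The statistics behind one boxplot per hour can be sketched as follows. The whisker rule (most extreme observations within 1.5 interquartile ranges of the box) is the usual Tukey convention and is an assumption here, as are the readings.

```python
import statistics
from collections import defaultdict


def boxplot_stats(values):
    """Quartiles and whisker ends of a sample (whiskers limited to 1.5 IQR from the box)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inside = [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return {"q1": q1, "median": median, "q3": q3, "low": min(inside), "high": max(inside)}


# Hypothetical (hour, active power) readings.
readings = [(0, 0.4), (0, 0.6), (0, 0.5), (19, 2.1), (19, 3.4), (19, 2.8)]

by_hour = defaultdict(list)
for hour, power in readings:
    by_hour[hour].append(power)

summary = {hour: boxplot_stats(values) for hour, values in sorted(by_hour.items())}
example = boxplot_stats([1, 2, 3, 4, 100])   # 100 lies beyond the upper whisker
print(summary[0]["median"], example["high"])
```

Only five numbers are kept per hour, which is why 24 boxplots remain readable where 24 superimposed densities would not.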


Interaction

Brushing and linking
- brushing induces conditioning (directly or indirectly)
- the conditional data should be superposed on or displayed close to the original data
- boxplots are easy to superpose and/or to display side by side
- density plots also: additionally, multiple brushing is possible (contrary to histograms)

Other actions
- interactive bandwidth selection
- boxplots positioning (could be automated)


Outline

Information Visualization
- Definition
- Infovis applications
- Limitations of Infovis and VDM
- An introductory example: the histogram
- Another example: categorical data
- Interactivity

Machine learning and visualization
- One dimensional data
- Two dimensional data


Moving to “higher” dimensions

Mixed case
- one numerical variable, one categorical variable
- done by conditioning!
- density plots for less than 10 modalities, boxplots for more

Numerical case
- classical solution: scatter plot (to be discussed)
- less classical solution: discretize one variable
- shingles: overlapping intervals that span the discretized variable
- back to the mixed case
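A minimal sketch of shingles, assuming equal-width intervals and a fixed overlap fraction. This is a simplification: some implementations choose the intervals so that they contain roughly equal numbers of observations. The function names are hypothetical.

```python
def shingles(lo, hi, number, overlap=0.5):
    """Cover [lo, hi] with `number` equal-width intervals; neighbours share `overlap` of their width."""
    width = (hi - lo) / (number - (number - 1) * overlap)
    step = width * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + width) for i in range(number)]


def membership(value, intervals):
    """Indices of the shingles containing the value (possibly more than one)."""
    return [i for i, (a, b) in enumerate(intervals) if a <= value <= b]


intervals = shingles(0, 4, 3, overlap=0.5)
print(intervals)
print(membership(1.5, intervals))
```

Since the intervals overlap, an observation can contribute to several panels of the resulting trellis plot.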


A trellis plot

Numerical case with shingles

[Figure: trellis plot of the density of Sepal.Length (4 to 8), one panel per shingle of Petal.Length (six panels labelled Petal.Length.S)]


Categorical variables

Stacked bar charts
- conditional approach
- use spinograms: display $P(U \mid V)$ as a bar of length one with sub-bars of length proportional to the conditional probability of the modalities of $U$
- stack the spinograms

Mosaic plot
- recursive spinograms
- compute $Q$ bars of width proportional to the probabilities of $V$
- split each bar in sub-bars with heights proportional to the conditional probabilities of $U$ given $V$
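The geometry of a two-variable mosaic plot follows directly from this description. The sketch below works on a hypothetical contingency table stored as nested dictionaries; `mosaic_layout` is an illustrative name.

```python
def mosaic_layout(table):
    """table[v][u] is a count. Returns {v: (bar width P(V=v), {u: sub-bar height P(U=u | V=v)})}."""
    total = sum(sum(row.values()) for row in table.values())
    layout = {}
    for v, row in table.items():
        n_v = sum(row.values())
        layout[v] = (n_v / total, {u: n / n_v for u, n in row.items()})
    return layout


# Hypothetical counts for two modalities of V ("a", "b") and two of U ("x", "y").
table = {"a": {"x": 1, "y": 3}, "b": {"x": 2, "y": 2}}
layout = mosaic_layout(table)

width_a, heights_a = layout["a"]
print(width_a, heights_a)
print(width_a * heights_a["y"])   # area of the cell, equal to the joint probability 3/8
```

The area of each cell is the product of its width and height, that is the joint probability of the two modalities.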


Categorical variables
Stacked bar charts

[Figure: stacked bar chart of proportions (0.0 to 1.0). Bars: All Postdoctorates; Social and Behavioral Sciences; Physics and Astronomy; Medical Sciences; Engineering; Earth, Atmospheric, and Ocean Sciences; Chemistry; Biological Sciences. Legend: Expected or Additional Training; Work with Specific Person; Training Outside PhD Field; Other Employment Not Available; Other]

A complex example

The Titanic

[Figures: bar charts of Freq by class (1st, 2nd, 3rd, Crew) and Survived (No, Yes), in four panels (Male Child, Female Child, Male Adult, Female Adult); first with a common axis from 0 to 800, then with one axis per panel]

Mosaic plot

[Figure: stacked bars of eye color proportions (Brown, Blue, Hazel, Green; 0.0 to 1.0) for each hair color (Black, Brown, Red, Blond)]

[Figure: mosaic plot of Hair (Black, Brown, Red, Blond) against Eye (Brown, Blue, Hazel, Green)]

[Figure: the same mosaic plot shaded by standardized residuals (legend classes: <-4, -4:-2, -2:0, 0:2, 2:4, >4)]


Mosaic plot in N dim

[Figure: mosaic plot of Hair (Black, Brown, Red, Blond) against Eye (Brown, Blue, Hazel, Green) with an additional Male/Female split, shaded by standardized residuals (legend classes: <-4, -4:-2, -2:0, 0:2, 2:4, >4)]


Mosaic plots

Built-in quality assessment
- mosaic plots show both the data and some statistical indicators
- this limits the risk of false interpretation
- should be used whenever possible
- remains an open challenge for most visualization methods

Interactivity
- show information attached to each cell (size, value of the residual)
- change the splitting order
- add or remove variables
- brush and link
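One plausible statistical indicator for shading the cells is the Pearson residual of each cell under independence of the two variables. This is an assumption: the slides only label the legend "standardized residuals" and do not give the formula. The tables below are hypothetical.

```python
import math


def pearson_residuals(table):
    """table[v][u] is a count. Returns {(v, u): (observed - expected) / sqrt(expected)}."""
    row_totals = {v: sum(row.values()) for v, row in table.items()}
    col_totals = {}
    for row in table.values():
        for u, n in row.items():
            col_totals[u] = col_totals.get(u, 0) + n
    total = sum(row_totals.values())
    residuals = {}
    for v, row in table.items():
        for u, n in row.items():
            expected = row_totals[v] * col_totals[u] / total   # count expected under independence
            residuals[(v, u)] = (n - expected) / math.sqrt(expected)
    return residuals


independent = pearson_residuals({"a": {"x": 10, "y": 10}, "b": {"x": 10, "y": 10}})
dependent = pearson_residuals({"a": {"x": 20, "y": 0}, "b": {"x": 0, "y": 20}})
print(independent[("a", "x")], round(dependent[("a", "x")], 2))
```

Cells whose residual is large in absolute value are those that deviate most from independence, which is what the shading classes of the legend encode.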


Scatter plot

The standard tool for 2 numerical variables
- one point per object
- two dominant numerical characteristics per object
- a few additional characteristics:
  - a nominal variable (hue or shape coded)
  - a numerical variable (lightness or shape coded)
  - a label
- does not scale:
  - low dimension only
  - very sensitive to overlapping

Simple example

[Figures: scatter plot of stearic against palmitoleic (0 to 300), shown successively plain, with alpha blending, with alpha blending + class (classes 1, 2, 3), and with alpha blending + class + bubbles]
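Alpha blending helps with overlapping because the resulting opacity grows with the number of superimposed marks. A minimal sketch, assuming identical marks of the same color drawn with standard "over" compositing:

```python
def stacked_opacity(alpha, n):
    """Opacity obtained by drawing n overlapping marks, each with opacity alpha."""
    return 1.0 - (1.0 - alpha) ** n


for n in (1, 2, 5, 20):
    print(n, round(stacked_opacity(0.1, n), 3))
```

Dense areas thus appear darker than isolated points, up to saturation when many marks overlap.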


How to improve scalability?

Two scalability problems
1. Dimension (a.k.a. variable or characteristic or feature):
   - Reduce the number of dimensions:
     - user choice
     - automated: selection and extraction
   - Display more than 2/3 dimensions at once: visual layout with brushing and linking
2. Object:
   - Reduce the number of objects: clustering and quantization
   - Reduce the size of an object
   - Constrain the display to forbid (or reduce) overlapping


Automated scalability

Reducing the number of variables
- a major research topic in machine learning
- feature selection: keeping a subset of the original variables
- feature extraction: producing new variables in small quantity
- supervised: variables should “explain” a target variable
- or unsupervised: variables optimize some quality criterion
- some methods:
  - Principal Component Analysis
  - Linear Discriminant Analysis
  - SNE and variants
  - Variable Clustering
  - etc.


Difficulties

What should be optimized?
- goal dependent:
  - outlier detection ⇒ maximize distances between outliers and central objects
  - visual clustering ⇒ respect the neighborhood relationships
  - rule finding ⇒ keep original variables
  - etc.
- machine learning algorithms optimize abstract quantities (e.g., reconstruction error)
- the link between ML optimality and visual usefulness is unclear (at best!)

Page 131: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

(Un)Supervised

[Figure: scatter plot titled "Principal Component Analysis (0.69%)"; axes PC1 and PC2; legend 1, 2, 3]

Page 132: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

(Un)Supervised

[Figure: scatter plot titled "Linear Discriminant Analysis"; axes LD1 and LD2; legend 1, 2, 3]

Page 133: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighborhood structure preservation

To enable visual cluster analysis on a projected dataset, the neighborhood structure of the data must be preserved by the projection.

- Quality measures (Venna & Kaski, 2001 → 2007):
  - Trustworthiness: neighbors on the screen are real neighbors
  - Continuity: real neighbors are neighbors on the screen
  - Precision ≃ Trustworthiness
  - Recall ≃ Continuity
- Optimization methods:
  - Stochastic Neighbor Embedding (Hinton & Roweis, 2002)
  - Neighbor Retrieval Visualizer (Venna & Kaski, 2006)

Page 134: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighbor Preservation

Original space to projection space

6 neighbors

Page 135: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighbor Preservation

Original space to projection space

Correct projection

Page 136: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighbor Preservation

Original space to projection space

Trustworthiness violation

Page 137: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighbor Preservation

Original space to projection space

Continuity violation

Page 138: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Trustworthiness and Precision

Can you trust neighbors in the projection space?
- $O_k(x_i)$: k-nn of $x_i$ in the original space
- $P_k(x_i)$: k-nn of $x_i$ in the projection space
- $U_k(x_i) = P_k(x_i) \setminus O_k(x_i)$
- Precision
  - maximal precision: $P_k(x_i) \subset O_k(x_i)$
  - mean on $i$ of $1 - \frac{\#U_k(x_i)}{\#P_k(x_i)}$
- Trustworthiness
  - rank preservation: $r_O(x_j, x_i)$ is the rank of $x_j$ as a neighbor of $x_i$ in the original space
  - $M_1(k) = 1 - \frac{2}{Nk(2N - 3k - 1)} \sum_{i=1}^{N} \sum_{x_j \in U_k(x_i)} \left( r_O(x_j, x_i) - k \right)$
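The two measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances, breaks distance ties by point index, and uses the normalization of $M_1(k)$ exactly as written above (which, to my understanding, is intended for $k < N/2$). The function names `neighbor_ranks` and `precision_and_trustworthiness` are hypothetical.

```python
import math

def neighbor_ranks(points):
    """ranks[i][j] = rank of point j as a neighbor of point i (1 = nearest).

    Assumption: Euclidean distance, ties broken by index.
    """
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (math.dist(points[i], points[j]), j))
        row = [0] * n  # ranks[i][i] stays 0 and is never used
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        ranks.append(row)
    return ranks

def precision_and_trustworthiness(original, projected, k):
    """Return (mean precision, M1(k)) for a projection of `original`."""
    n = len(original)
    r_orig = neighbor_ranks(original)
    r_proj = neighbor_ranks(projected)
    precision_sum = 0.0
    penalty = 0
    for i in range(n):
        o_k = {j for j in range(n) if j != i and r_orig[i][j] <= k}
        p_k = {j for j in range(n) if j != i and r_proj[i][j] <= k}
        u_k = p_k - o_k  # false neighbors: on screen but not in the data
        precision_sum += 1 - len(u_k) / len(p_k)
        penalty += sum(r_orig[i][j] - k for j in u_k)
    m1 = 1 - 2 / (n * k * (2 * n - 3 * k - 1)) * penalty
    return precision_sum / n, m1

# Toy usage: six points on a line, and a projection that shuffles them.
line = [(float(v),) for v in range(6)]
shuffled = [(0.0,), (5.0,), (1.0,), (4.0,), (2.0,), (3.0,)]
print(precision_and_trustworthiness(line, line, 2))
print(precision_and_trustworthiness(line, shuffled, 2))
```

A projection that keeps every k-neighborhood intact scores 1 on both measures; any false neighbor lowers precision by a fixed amount, while trustworthiness penalizes it in proportion to how far away it really was.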

Page 139: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Trustworthiness and Precision

[Figure: neighborhood example; three points are labelled 7, 11 and 9]

- Contribution to precision: 0.5
- Contribution to trustworthiness: 9
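The two numbers can be recovered from the definitions of precision and trustworthiness, under an interpretation that the slide does not state explicitly: $k = 6$ (the neighborhood size shown on the earlier "6 neighbors" slide), and the labels 7, 11 and 9 are the original-space ranks $r_O(x_j, x_i)$ of three false neighbors in $U_k(x_i)$.

```latex
\[
1 - \frac{\#U_k(x_i)}{\#P_k(x_i)} = 1 - \frac{3}{6} = 0.5,
\qquad
\sum_{x_j \in U_k(x_i)} \bigl( r_O(x_j, x_i) - k \bigr)
  = (7 - 6) + (11 - 6) + (9 - 6) = 9 .
\]
```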

Page 140: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Continuity and Recall

Do you miss neighbors in the projection space?
- $O_k(x_i)$: k-nn of $x_i$ in the original space
- $P_k(x_i)$: k-nn of $x_i$ in the projection space
- $V_k(x_i) = O_k(x_i) \setminus P_k(x_i)$
- Recall
  - maximal recall: $O_k(x_i) \subset P_k(x_i)$
  - mean on $i$ of $1 - \frac{\#V_k(x_i)}{\#O_k(x_i)}$
- Continuity
  - rank preservation: $r_P(x_j, x_i)$ is the rank of $x_j$ as a neighbor of $x_i$ in the projection space
  - $M_2(k) = 1 - \frac{2}{Nk(2N - 3k - 1)} \sum_{i=1}^{N} \sum_{x_j \in V_k(x_i)} \left( r_P(x_j, x_i) - k \right)$

Page 141: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Continuity and Recall

[Figure: neighborhood example; three points are labelled 7, 9 and 10]

- Contribution to recall: 0.5
- Contribution to continuity: 8
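As a check against the definitions of recall and continuity, assume (the slide does not say so explicitly) that $k = 6$ and that the labels 7, 9 and 10 are the projection-space ranks $r_P(x_j, x_i)$ of three missed neighbors in $V_k(x_i)$. The two contributions then read:

```latex
\[
1 - \frac{\#V_k(x_i)}{\#O_k(x_i)} = 1 - \frac{3}{6} = 0.5,
\qquad
\sum_{x_j \in V_k(x_i)} \bigl( r_P(x_j, x_i) - k \bigr)
  = (7 - 6) + (9 - 6) + (10 - 6) = 8 .
\]
```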

Page 142: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Stochastic Neighbor Embedding (SNE)

Approximating the “neighborhood distribution” (Hinton & Roweis, 2002)
- Original space: $p_{ij} = \frac{\exp(-d_{ij}^2)}{\sum_{k \neq i} \exp(-d_{ik}^2)}$ ($d_{ij}$ is the dissimilarity between $x_i$ and $x_j$)
- $x_i$ is projected to $y_i$
- Projection space: $q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$
- Minimize the Kullback-Leibler divergence between $p_{i\cdot}$ and $q_{i\cdot}$, i.e., $C = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$
- Corresponds to a smoothed version of the recall (Venna & Kaski, 2006)
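To make the SNE quantities concrete, here is a small Python sketch that evaluates the neighborhood distributions and the cost $C$ for a given pair of configurations. It only evaluates the cost and does not optimize the $y_i$. Assumptions not taken from the slide: Euclidean distance is used as the dissimilarity $d_{ij}$, no per-point bandwidth is applied, each row is normalized over all other points, and the names `neighbor_probabilities` and `sne_cost` are hypothetical.

```python
import math

def neighbor_probabilities(points):
    """Row i holds exp(-d_ij^2) / sum_{k != i} exp(-d_ik^2), with 0 on the diagonal.

    Assumption: d_ij is the Euclidean distance between points i and j.
    """
    n = len(points)
    probs = []
    for i in range(n):
        weights = [0.0 if j == i
                   else math.exp(-math.dist(points[i], points[j]) ** 2)
                   for j in range(n)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs

def sne_cost(p, q):
    """C = sum_i sum_{j != i} p_ij * log(p_ij / q_ij).

    Terms with p_ij == 0 contribute nothing; q_ij is assumed to be non-zero
    wherever p_ij is, which holds for small, well-scaled toy data.
    """
    cost = 0.0
    for i, (p_row, q_row) in enumerate(zip(p, q)):
        for j, (p_ij, q_ij) in enumerate(zip(p_row, q_row)):
            if i != j and p_ij > 0.0:
                cost += p_ij * math.log(p_ij / q_ij)
    return cost

# Toy usage: four 3D points and a candidate 2D layout.
x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 3.0, 1.0)]
y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
p = neighbor_probabilities(x)
q = neighbor_probabilities(y)
print(sne_cost(p, q))
```

An actual SNE run would move the $y_i$ by gradient descent on this cost; the sketch stops at the evaluation step.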

Page 143: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Neighbor Retrieval Visualizer

Optimizing smoothed versions of precision and recall (Venna & Kaski, 2006)
- inspired by stochastic neighbor embedding (SNE)
- minimization of $\lambda \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} + (1 - \lambda) \sum_i \sum_j q_{ij} \log \frac{q_{ij}}{p_{ij}}$
- $\lambda = 1$ (SNE) ⇒ recall
- $\lambda = 0$ ⇒ precision
- $\lambda$ is user chosen (can be chosen to optimize a quality measure)
- slow ($O(N^3)$ per iteration for $N$ objects, as SNE)
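The role of $\lambda$ can be illustrated by evaluating the NeRV cost on fixed neighborhood matrices. The sketch below is only an evaluation of the formula above, not the optimizer of Venna & Kaski; the matrices `p` and `q` are hand-written toy values and the function names are hypothetical.

```python
import math

def kl_sum(a, b):
    """sum_i sum_{j != i} a_ij * log(a_ij / b_ij), skipping terms with a_ij == 0."""
    total = 0.0
    for i, (a_row, b_row) in enumerate(zip(a, b)):
        for j, (a_ij, b_ij) in enumerate(zip(a_row, b_row)):
            if i != j and a_ij > 0.0:
                total += a_ij * math.log(a_ij / b_ij)
    return total

def nerv_cost(p, q, lam):
    """lam * KL(p || q) + (1 - lam) * KL(q || p), summed over all points."""
    return lam * kl_sum(p, q) + (1.0 - lam) * kl_sum(q, p)

# Toy row-stochastic matrices (made up for illustration only).
p = [[0.0, 0.7, 0.3], [0.6, 0.0, 0.4], [0.5, 0.5, 0.0]]
q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.2, 0.8, 0.0]]
for lam in (0.0, 0.5, 1.0):
    print(lam, nerv_cost(p, q, lam))
```

With `lam = 1.0` only the SNE term (the recall side) remains, and with `lam = 0.0` only the reversed divergence (the precision side) remains, which is the trade-off the user controls.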

Page 144: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Demonstration

[Figure: scatter plot titled "Principal Component Analysis (0.69%)"; axes PC1 and PC2; legend 1, 2, 3]

Page 145: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Demonstration

[Figure: scatter plot titled "Linear Discriminant Analysis"; axes LD1 and LD2; legend 1, 2, 3]

Page 146: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Demonstration

[Figure: scatter plot titled "SNE"; axes V1 and V2; legend 1, 2, 3]

Page 147: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Demonstration

[Figure: scatter plot titled "NeRV, λ = 0.5"; axes V1 and V2; legend 1, 2, 3]

Page 148: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Demonstration

[Figure: scatter plot titled "NeRV, λ = 0"; axes V1 and V2; legend 1, 2, 3]

Page 149: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Dimensionality reduction

Status
- very active field
- numerous methods, improving results
- regular breakthroughs: nonlinearity, emphasis on small distances, importance of ranks, etc.

Limitations
- slow methods (nowhere near real time on realistic datasets)
- highly nonlinear methods:
  - no axis
  - nothing is uniform
- built-in quality assessment is a work in progress
- interactivity?

Page 150: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Picking dimensions manually

Interactive solution
- show hints of possibly interesting pairs
- let the user choose

[Figure: correlation matrix of the variables palmitic, palmitoleic, stearic, oleic, linoleic, linolenic, arachidic and eicosenoic]
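One way to produce such hints is to compute pairwise correlations and list the variable pairs by decreasing absolute value, leaving the final choice to the user. The sketch below uses Pearson correlation on made-up toy columns (`u`, `v`, `w`), not the data shown on the slide. Treating a high absolute correlation as "interesting" is an assumption; other criteria may suit other tasks. The names `pearson` and `rank_pairs` are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rank_pairs(columns):
    """Return (name_a, name_b, correlation) sorted by decreasing |correlation|."""
    pairs = [(a, b, pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda item: abs(item[2]), reverse=True)

# Toy data, invented for illustration.
toy = {
    "u": [1.0, 2.0, 3.0, 4.0, 5.0],
    "v": [2.1, 3.9, 6.2, 8.0, 9.8],  # nearly linear in u
    "w": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to u
}
for name_a, name_b, r in rank_pairs(toy):
    print(name_a, name_b, round(r, 3))
```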

Page 151: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Reducing the overlapping problem

An open research topic
- rendering:
  - transparency (alpha layer)
  - stereo vision
  - overlap counting
- interactivity:
  - sub-sampling
  - zooming and panning
  - excentric labeling
  - magic lenses
  - distortion
- machine learning and related methods:
  - clustering
  - force field approach

Page 152: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Clustering

Reduction of the number of objects
- very general solution (not limited to scatter plots)
- clustering ⇒ cluster prototype ⇒ prototype visualization
- interactivity:
  - number of clusters:
    - deterministic clustering
    - minimal surprise principle (e.g., hierarchical)
  - cluster extension (objects or summary)
- important warning:
  - clustering in 2D ≠ clustering the original data
  - guaranteed non-overlapping: 2D!
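The chain "clustering ⇒ cluster prototype ⇒ prototype visualization" can be sketched with a plain k-means that returns one prototype and one count per cluster, which is what a display would then draw (for example as symbols sized by count). The slides do not say which clustering algorithm was used, so k-means is swapped in here as an assumption. The farthest-point initialization is also an assumption, chosen only to keep the result deterministic. All function names are hypothetical.

```python
import math

def assign(points, prototypes):
    """Group the points by their nearest prototype."""
    clusters = [[] for _ in prototypes]
    for p in points:
        nearest = min(range(len(prototypes)),
                      key=lambda c: math.dist(p, prototypes[c]))
        clusters[nearest].append(p)
    return clusters

def kmeans_prototypes(points, k, iterations=50):
    """Lloyd iterations; returns (prototypes, cluster sizes)."""
    # Deterministic farthest-point initialization (an assumption).
    prototypes = [tuple(points[0])]
    while len(prototypes) < k:
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in prototypes))
        prototypes.append(tuple(farthest))
    for _ in range(iterations):
        clusters = assign(points, prototypes)
        # Mean of each cluster; an empty cluster keeps its old prototype.
        updated = [tuple(sum(col) / len(members) for col in zip(*members))
                   if members else proto
                   for proto, members in zip(prototypes, clusters)]
        if updated == prototypes:
            break
        prototypes = updated
    sizes = [len(members) for members in assign(points, prototypes)]
    return prototypes, sizes

# Toy usage: two well separated groups reduced to two prototypes.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
protos, sizes = kmeans_prototypes(data, 2)
print(protos, sizes)
```

Running the same function on the original variables and on the 2D coordinates generally gives different prototypes, which is the point of the warning above.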

Page 153: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Where to cluster?

[Figure: scatter plot titled "Original space (k=100)"; axes palmitoleic and stearic]

Page 154: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Where to cluster?

[Figure: scatter plot titled "2D space (k=200)"; axes palmitoleic and stearic]

Page 155: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Where to cluster?

[Figure: scatter plot titled "Original space (k=100) PCA display"; axes PC1 and PC2]

Page 156: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

2D density

[Figure: 2D density plot; axes palmitoleic and stearic; color legend "Counts" from 1 to 9]
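A density display of this kind replaces individual points by counts per cell. The binning scheme used for the figure cannot be recovered from the transcript, so the sketch below assumes a simple rectangular grid; the function name `bin_counts`, the cell widths, and the toy points are all hypothetical.

```python
from collections import Counter

def bin_counts(points, x_width, y_width):
    """Count points per rectangular cell; keys are (column, row) cell indices."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // x_width), int(y // y_width))] += 1
    return counts

# Toy usage with made-up coordinates and 50 x 50 cells.
sample = [(10, 160), (12, 165), (55, 160), (58, 210), (59, 215)]
cells = bin_counts(sample, 50, 50)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Each cell would then be drawn once, with a color taken from the count, so overlapping points no longer hide each other.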

Page 157: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

2D density

[Figure: 2D density plot; axes palmitoleic and stearic; color legend "Counts" from 1 to 9]

Page 158: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Scatter plot

The workhorse of numerical visualization
- very efficient representation for 2 numerical variables (position on a common scale, the best feature according to Cleveland and McGill)
- hue can be used to display an additional categorical variable
- luminance, shape or symbol size can encode another variable (less efficiently)
- overlapping reduces the encoding efficiency back to 2 variables

Scalability issues
- are mostly unsolved...
- should be task oriented!

Page 159: Information visualization, data mining and machine learningapiacoa.org/publications/teaching/visualization/visualization-cil-2011.pdfInformation visualization, data mining and machine

Bibliography I

[CR98] Ed H. Chi and John T. Riedl. An operator interaction framework for visualization systems. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis’98), pages 63–70, Research Triangle Park, North Carolina, USA, October 1998.