How Humans See Data

Post on 14-Apr-2017

John Rauser, @jrauser

November 2016

visualization is communication

how to make better visualizations

help humans solve analytical problems quickly and accurately with visualization

Part I: Why visualize data at all?

x      y        x      y
1.972  1.236    0.111  0.542
1.112  1.994    0.902  0.005
0.000  1.009    0.598  0.085
0.665  1.942    1.613  1.790
0.235  0.356    1.298  1.955
0.247  1.658    0.651  1.937
1.275  1.961    1.949  1.316
0.702  0.045    0.099  0.567
1.760  0.350    0.862  0.010
1.691  0.277    0.027  0.768
1.628  1.778    0.706  1.956
1.957  1.290    1.042  1.999

pre-attentive processing

A graph is an encoding of the data.

n   x      y        n   x      y
1   1.972  1.236    13  0.111  0.542
2   1.112  1.994    14  0.902  0.005
3   0.000  1.009    15  0.598  0.085
4   0.665  1.942    16  1.613  1.790
5   0.235  0.356    17  1.298  1.955
6   0.247  1.658    18  0.651  1.937
7   1.275  1.961    19  1.949  1.316
8   0.702  0.045    20  0.099  0.567
9   1.760  0.350    21  0.862  0.010
10  1.691  0.277    22  0.027  0.768
11  1.628  1.778    23  0.706  1.956
12  1.957  1.290    24  1.042  1.999

Good visualizations optimize for the human visual system.

How does the human visual system work?

How does the human visual system decode a graph?

Cleveland’s three visual operations of pattern perception:

1. Detection
2. Assembly
3. Estimation

Part II: estimation

Three levels of estimation

a. discrimination: X = Y, X != Y
b. ranking: X > Y, X < Y
c. ratioing: X / Y = ?

At the heart of quantitative reasoning is a single question: Compared to what?

- Tufte, Envisioning Information

the most important thing

The most important measurement should exploit the highest ranked encoding possible.

• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
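
One way to read this rule is as a matching problem: sort the measurements by importance, sort the encodings you can use by their rank in the list above, and pair them off. The sketch below does that in Python. The measurement names and the set of available encodings are hypothetical, chosen only for illustration.

```python
# Encodings from the list above, most accurately decoded first.
ENCODINGS_BY_RANK = [
    "position along a common scale",
    "position on identical but nonaligned scales",
    "length",
    "angle or slope",
    "area",
    "volume, density, or color saturation",
    "color hue",
]

def assign_encodings(measurements_by_importance, available_encodings):
    """Pair the most important measurement with the highest ranked encoding."""
    ranked = sorted(available_encodings, key=ENCODINGS_BY_RANK.index)
    return dict(zip(measurements_by_importance, ranked))

# Hypothetical measurements, listed from most to least important.
plan = assign_encodings(
    ["fuel economy", "engine size", "manufacturer"],
    ["color hue", "area", "position along a common scale"],
)
for measurement, encoding in plan.items():
    print(f"{measurement}: {encoding}")
```

Here the most important measurement gets position on a common scale, and the least important one is left with color hue.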

“The first rule of color: do not talk about color!”

- Tamara Munzner

luminance

saturation

hue

Observation: Alphabetical is almost never the correct ordering of a categorical variable.
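
A minimal sketch of the alternative, in Python: order the categories by the value being plotted, so that rank can be read straight off the position. The category names and numbers are made up for illustration and are not taken from the talk's plots.

```python
# Hypothetical data: average fuel economy (mpg) per manufacturer.
mpg_by_make = {
    "Audi": 26.0,
    "Chevrolet": 22.0,
    "Dodge": 18.0,
    "Honda": 33.0,
    "Toyota": 25.0,
}

def order_by_value(values, descending=True):
    """Return category names sorted by their value rather than by name."""
    return sorted(values, key=values.get, reverse=descending)

alphabetical = sorted(mpg_by_make)
by_value = order_by_value(mpg_by_make)

# Crude text plot: one row per category, largest value first.
for name in by_value:
    marker = " " * round(mpg_by_make[name]) + "o"
    print(f"{name:<10}|{marker}")
```

In the value-ordered version the markers form a monotone diagonal, so ranking two categories takes no lookup at all.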

Observation: Stacked anything is nearly always a mistake.

Stacking makes the reader decode lengths, not position on a common scale.
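
The point can be checked with a little arithmetic. The Python sketch below computes where each segment of a stacked chart starts. Only the bottom series starts at the same height everywhere; every other series floats on a baseline that moves with the data beneath it. The series names and values are hypothetical.

```python
# Hypothetical values for three series over four periods.
series = {
    "A": [10, 12, 11, 13],
    "B": [5, 9, 6, 8],
    "C": [7, 7, 9, 6],
}

def stacked_baselines(series):
    """For each series, the height at which each of its segments starts."""
    n = len(next(iter(series.values())))
    running = [0] * n
    baselines = {}
    for name, values in series.items():
        baselines[name] = list(running)
        running = [r + v for r, v in zip(running, values)]
    return baselines

baselines = stacked_baselines(series)
for name, starts in baselines.items():
    kind = "common baseline" if len(set(starts)) == 1 else "floating baseline"
    print(name, starts, kind)
```

Series B and C can only be compared across periods by judging segment lengths, which sits lower in the encoding ranking than position.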

Observation: Pie charts are ALWAYS a mistake.

Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps.

http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/

Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.

-Edward Tufte, The Visual Display of Quantitative Information

Who do you think did a better job in tonight’s debate?

                    Clinton  Trump
Among Democrats     99%      1%
Among Republicans   53%      47%

[Figure labels: country names in alphabetical order, Afghanistan through Zimbabwe, five per row]

All good pie charts are jokes.

Observation: Comparison is trivial on a common scale.

the dashboard metaphor is fundamentally flawed

Observation: Scatterplots show relationships directly.

Observation: Growth charts usually aren’t.

If growth (slope) is important, plot it directly.
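
A minimal sketch of plotting growth directly, in Python: instead of asking the reader to judge the slope of a cumulative curve, compute the change per period and show that as the position. The monthly totals are hypothetical.

```python
# Hypothetical cumulative totals, one per month.
cumulative_users = [100, 180, 250, 310, 360, 400]

def period_growth(totals):
    """Change between consecutive observations, i.e. the slope of the curve."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]

growth = period_growth(cumulative_users)
print(growth)
```

The cumulative series rises every month, which looks healthy at a glance. The growth series makes it plain that each month adds less than the one before.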

Cleveland’s three visual operations of pattern perception:

1. Detection
2. Assembly
3. Estimation

Part III: assembly

Gestalt Psychology

reification

emergence

Prägnanz

Law of Closure

Law of Continuity

Observation: Good plots leverage the law of continuity to assist with assembly.

Law of Similarity

Law of Proximity

Observation: dodged bar charts are a bad idea

Cleveland’s three visual operations of pattern perception:

1. Detection
2. Assembly
3. Estimation

Part IV: detection

Excel’s defaults are pretty bad

[Figure: chart with Excel default styling; x-axis ticks 1 to 6, y-axis ticks up to 200,000 in steps of 20,000]

Observation: Detection isn’t as trivial as it seems.

“Above all else, show the data.”

-Tufte

Part V: other useful results

Weber’s law: The “Just Noticeable Difference” is proportional to the size of the initial stimulus.
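
A worked example using the two pairs shown on the slides, 10 vs. 20 and 100 vs. 110. Both pairs differ by the same 10 units, but Weber's law says what matters is the difference relative to the starting size. The function name below is an illustrative choice, not something from the talk.

```python
def weber_fraction(reference, comparison):
    """Difference between two stimuli relative to the size of the reference."""
    return abs(comparison - reference) / reference

small_pair = weber_fraction(10, 20)    # 10 units on top of 10
large_pair = weber_fraction(100, 110)  # 10 units on top of 100

print(f"10 vs 20:   {small_pair:.0%} relative difference")
print(f"100 vs 110: {large_pair:.0%} relative difference")
```

The first pair differs by 100% of the reference and is easy to tell apart; the second differs by only 10% and is much harder.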

[Figure labels: 10 vs. 20; 100 vs. 110; 12 units]

Observation: Weber’s Law is why gridlines are useful

“Erase non-data ink.”

-Tufte

“Erase non-data ink, within reason.”

-Tufte

“Erase non-data ink that interferes with detection or doesn’t assist assembly and estimation.”

-Rauser

You are best at detecting variation in slope near 45 degrees.

banking to 45

Observation: Banking to 45 best shows variation in slope
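
A sketch of one common way to bank to 45 degrees, in Python. It assumes the median-absolute-slope heuristic: pick the height-to-width ratio that makes the median absolute slope of the line segments, as drawn, equal to 1. The talk does not say which banking method it uses, so treat this as an illustration only. It also assumes the x values are strictly increasing and the series is not flat.

```python
from statistics import median

def bank_to_45(xs, ys):
    """Height/width ratio that puts the median absolute segment slope at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    points = list(zip(xs, ys))
    # Slope of each segment after rescaling both axes to the unit interval.
    slopes = [
        abs(((y1 - y0) / y_range) / ((x1 - x0) / x_range))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    return 1.0 / median(slopes)

# Hypothetical zigzag series.
xs = [0, 1, 2, 3, 4]
ys = [0, 4, 0, 4, 0]
aspect = bank_to_45(xs, ys)
print(f"height / width = {aspect}")
```

For this zigzag the result is 0.25, so the plot should be four times as wide as it is tall. Each segment then covers equal horizontal and vertical distance on the page.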

Q: Should I include 0 on my scale?

A: It depends.

Relying on the pre-attentive perception of size or intensity? Yes, otherwise you will mislead.

Using position? It’s up to you.
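
The "otherwise you will mislead" part is easy to quantify. With bars, the reader compares drawn lengths, so the apparent ratio between two values depends on where the axis starts. The values and the baseline in this Python sketch are hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

true_ratio = apparent_ratio(90, 100)               # bars drawn from zero
truncated = apparent_ratio(90, 100, baseline=80)   # axis starts at 80

print(f"from zero: second bar looks {true_ratio:.2f}x the first")
print(f"from 80:   second bar looks {truncated:.2f}x the first")
```

Starting the axis at 80 makes an 11% difference look like a doubling. A dot plot of the same two values encodes position rather than length, so it makes no such claim about ratios.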

“Above all else, show the data.”

-Tufte

“Above all else, show the variation in the data.”

-Rauser (via Tufte)

R/ggplot2 code for every plot in this presentation is available at http://goo.gl/xH5PLV

The rendered document is at http://rpubs.com/jrauser/hhsd_notes

This presentation is at http://goo.gl/VKxxya

I will tweet these links as @jrauser

coda

visualization is communication

art is communication

visualization is art

why does it make you feel that way?

visualization has as much to learn from art as from science

end