Post on 10-Jun-2020

How to display data badly

Karl W Broman
Biostatistics & Medical Informatics
University of Wisconsin – Madison

kbroman.org

github.com/kbroman

@kwbroman

Using Microsoft Excel to obscure your data and annoy your readers

Inspiration

This lecture was inspired by

H Wainer (1984) How to display data badly. American Statistician 38(2): 137–147

Dr. Wainer was the first to elucidate the principles of the bad display of data.

The now widespread use of Microsoft Excel has resulted in remarkable advances in the field.

General principles

The aim of good data graphics:

Display data accurately and clearly.

Some rules for displaying data badly:

• Display as little information as possible.

• Obscure what you do show (with chart junk).

• Use pseudo-3d and color gratuitously.

• Make a pie chart (preferably in color and 3d).

• Use a poorly chosen scale.

• Ignore sig figs.

Examples 1–7 [figure slides; images not preserved in this transcript]

Displaying data well

• Be accurate and clear.

• Let the data speak.
  – Show as much information as possible, taking care not to obscure the message.

• Science not sales.
  – Avoid unnecessary frills (esp. gratuitous 3d).

• In tables, every digit should be meaningful. Don’t drop ending 0’s.
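The point about ending 0's can be illustrated with a small sketch. The values below are hypothetical, and the choice of two decimal places is an assumption standing in for whatever precision the measurements actually support; `format_column` is an illustrative helper, not something from the lecture.

```python
def format_column(values, decimals):
    """Format every value with the same number of decimal places,
    so that ending zeros are kept and the column lines up."""
    return [f"{v:.{decimals}f}" for v in values]

# Hypothetical measurements, assumed to be meaningful to two decimals.
measurements = [1.5, 2.25, 3.0]

# Default string conversion drops ending zeros, so entries in the same
# column show different numbers of digits.
naive = [str(v) for v in measurements]

# Fixed-precision formatting keeps the ending zeros.
fixed = format_column(measurements, decimals=2)

for before, after in zip(naive, fixed):
    print(f"{before:>6}  ->  {after:>6}")
```

The number of decimals should follow from the precision of the data, not from the software's default.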

More on data visualization

Ease comparisons (things to be compared should be adjacent)

[Figure: two panels of Phenotype (axis 10 to 18) by sex and genotype. First panel: Female and Male side by side within each genotype (AA, AB, BB). Second panel: genotypes AA, AB, BB side by side within Female and within Male.]

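Ease comparisons (add a bit of color)

[Figure: two panels of Phenotype (axis 10 to 18) by sex and genotype, with color added. First panel: Female and Male side by side within each genotype (AA, AB, BB). Second panel: genotypes AA, AB, BB side by side within Female and within Male.]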

Which comparison is easiest?

[Figure: several panels displaying two values, A and B, in different arrangements; two panels have axes from 0 to 150 and three have axes from 0 to 400.]

Don’t distort the quantities (value ∝ radius)

Wheat (17 Gbp)

Arabidopsis (0.145 Gbp)

Human (3.2 Gbp)

Don’t distort the quantities (value ∝ area)

Wheat (17 Gbp)

Arabidopsis (0.145 Gbp)

Human (3.2 Gbp)
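A short sketch of the arithmetic behind the two versions, using the genome sizes from the slide. It assumes the symbols are circles; the function names are illustrative and not from the lecture.

```python
import math

# Genome sizes in Gbp, as given on the slide.
genome_gbp = {"Arabidopsis": 0.145, "Human": 3.2, "Wheat": 17.0}

def radius_for_area(value):
    """Radius of a circle whose AREA equals `value` (area = pi * r**2)."""
    return math.sqrt(value / math.pi)

def area_ratio_if_radius_scaled(value_a, value_b):
    """Ratio of circle areas when the RADIUS is made proportional to the
    value: areas then scale with the square of the value."""
    return (value_a / value_b) ** 2

wheat = genome_gbp["Wheat"]
arabidopsis = genome_gbp["Arabidopsis"]

true_ratio = wheat / arabidopsis                                  # about 117
distorted_ratio = area_ratio_if_radius_scaled(wheat, arabidopsis)  # about 13,700

print(f"True ratio of genome sizes:      {true_ratio:,.0f}")
print(f"Area ratio when value ∝ radius:  {distorted_ratio:,.0f}")
print(f"Radius for value ∝ area (Wheat): {radius_for_area(wheat):.2f}")
```

With value ∝ radius, the wheat circle covers roughly 13,700 times the area of the Arabidopsis circle although its genome is only about 117 times larger; taking the square root before choosing the radius removes that exaggeration.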

Don’t use areas at all (value ∝ length)

[Figure: Genome size (Gbp), axis 0 to 15, for Arabidopsis, Human, and Wheat, with each value shown as a length.]

Encoding data

Quantities

• Position

• Length

• Angle

• Area

• Luminance (light/dark)

• Chroma (amount of color)

Categories

• Shape

• Hue (which color)

• Texture

• Width

Ease comparisons (align things vertically)

[Figure: panels of Height (in), axis 55 to 75, for Women and Men, shown in different arrangements.]

Ease comparisons (use common axes)

[Figure: panels of Height (in) for Women and Men. First pair: Women on an axis from 55 to 75, Men on an axis from 60 to 75. Second pair: both on a common axis from 55 to 75.]

Use labels not legends

[Figure: two scatterplots of Petal width (cm), axis 0.0 to 2.5, against Petal length (cm), axis 1 to 7, for three species. First panel: species identified in a legend (setosa, versicolor, virginica). Second panel: the species names placed as labels on the plot.]

Don’t sort alphabetically

[Figure: two dot charts of Health care spending (% GDP), axis 0 to 15, by country.

First panel, countries in alphabetical order: Argentina, Australia, Austria, Belgium, Brazil, Canada, China, France, Germany, India, Indonesia, Italy, Japan, Korea, Rep., Mexico, Netherlands, Norway, Poland, Russian Federation, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States.

Second panel, countries ordered by spending: Indonesia, India, China, Mexico, Russian Federation, Turkey, Poland, Korea, Rep., Argentina, Brazil, Australia, Norway, Japan, United Kingdom, Sweden, Spain, Italy, Belgium, Austria, Switzerland, Germany, Canada, France, Netherlands, United States.]
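A minimal sketch of ordering categories by the plotted value instead of by name. The category names and numbers below are hypothetical placeholders for illustration only; they are not the health care spending data from the slide.

```python
# Hypothetical categories and values, for illustration only.
values = {"delta": 4.0, "alpha": 9.5, "charlie": 2.1, "bravo": 7.3}

# Alphabetical order: easy to look up a name, but the values jump around.
alphabetical = sorted(values)

# Ordered by value: the ranking and the gaps between neighbors are visible.
by_value = sorted(values, key=values.get)

for name in by_value:
    print(f"{name:<8} {values[name]:>5.1f}")
```

The same ordering can then be passed to whatever plotting routine draws the chart, so that the axis categories follow the data.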

Must you include 0?

[Figure: two panels of Detection rate (%) by Method (A, B, C). First panel: axis from 0 to 120, with the values labeled 96.5%, 98.1%, and 99.2%. Second panel: axis from 95 to 100.]
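The effect of the axis baseline can be put in numbers with the three detection rates from the slide. This sketch assumes the values are drawn as lengths measured from the bottom of the axis; `drawn_length_ratio` is an illustrative helper, not part of the lecture.

```python
# Detection rates (%) for the three methods, as shown on the slide.
rates = {"A": 96.5, "B": 98.1, "C": 99.2}

def drawn_length_ratio(high, low, baseline):
    """Ratio of the drawn lengths of two values when lengths are
    measured from `baseline` instead of from the values themselves."""
    return (high - baseline) / (low - baseline)

# Axis starting at 0: the lengths differ by under 3%, so the three
# methods look almost identical.
ratio_from_zero = drawn_length_ratio(rates["C"], rates["A"], baseline=0.0)

# Axis starting at 95: method C is drawn about 2.8 times as long as A.
ratio_from_95 = drawn_length_ratio(rates["C"], rates["A"], baseline=95.0)

print(f"C vs A with baseline 0:  {ratio_from_zero:.3f}")
print(f"C vs A with baseline 95: {ratio_from_95:.3f}")
```

Neither ratio is wrong by itself; the choice depends on whether the reader should compare the rates as wholes or see the differences among them, and on whether the marks encode length or only position.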

Summary

• Put the things to be compared next to each other

• Use color to set things apart, but consider color blind folks

• Use position rather than angle or area to represent quantities

• Align things vertically to ease comparisons

• Use common axis limits to ease comparisons

• Use labels rather than legends

• Sort on meaningful variables (not alphabetically)

• Must 0 be included in the axis limits?

• Consider taking logs and/or differences
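As a small illustration of the last point, the genome sizes quoted earlier in the lecture span more than two orders of magnitude, so the smallest value nearly vanishes on a linear axis. The sketch below only does the arithmetic; the variable names are illustrative.

```python
import math

# Genome sizes in Gbp, as quoted earlier in the lecture.
genome_gbp = {"Arabidopsis": 0.145, "Human": 3.2, "Wheat": 17.0}

# On a linear axis from 0 to the largest value, each genome takes up
# this fraction of the axis length.
largest = max(genome_gbp.values())
linear_fraction = {name: size / largest for name, size in genome_gbp.items()}

# After taking log10, equal distances correspond to equal ratios.
log10_size = {name: math.log10(size) for name, size in genome_gbp.items()}

for name in genome_gbp:
    print(f"{name:<12} linear fraction {linear_fraction[name]:.3f}   "
          f"log10 {log10_size[name]:+.2f}")
```

Arabidopsis occupies under 1% of the linear axis, while on the log scale it sits about two units below wheat, which reads directly as a ratio of roughly 100.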

Further reading

• ER Tufte (1983) The visual display of quantitative information. Graphics Press.

• ER Tufte (1990) Envisioning information. Graphics Press.

• ER Tufte (1997) Visual explanations. Graphics Press.

• WS Cleveland (1993) Visualizing data. Hobart Press.

• WS Cleveland (1994) The elements of graphing data. CRC Press.

• A Gelman, C Pasarica, R Dodhia (2002) Let’s practice what we preach: Turning tables into graphs. The American Statistician 56:121–130

• NB Robbins (2004) Creating more effective graphs. Wiley

• Nature Methods columns: http://bang.clearscience.info/?p=546
