1 basics

Visualisation in R

Hadley Wickham, Assistant Professor / Dobelman Family Junior Chair, Department of Statistics / Rice University

Sunday, 25 July 2010

Page 2: 1 basics

HELLO, my name is Hadley

Page 3: 1 basics

http://had.co.nz/vanderbilt-vis

Page 4: 1 basics

1. Preview of today

2. About ggplot2

3. More resources

4. Diving in

Page 5: 1 basics

Fuel economy

Basic graphics

Page 6: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against displ (x-axis, 2 to 7).]

Page 7: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against displ (x-axis, 2 to 7), with points coloured by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv.]

Page 8: 1 basics

Diamond prices

Displaying large data

Page 9: 1 basics

[Figure: histogram of depth (x-axis, 56 to 70) with count on the y-axis (0 to 4000).]

Page 10: 1 basics

[Figure: scatterplot of price (y-axis, 5000 to 15000) against table (x-axis, 50 to 90).]

Page 11: 1 basics

[Figure: price (y-axis, 5000 to 15000) against carat (x-axis, 1 to 5), with a count legend running from 1000 to 7000.]

Page 12: 1 basics

US baby names

Data manipulation and transformation

Page 13: 1 basics

[Figure: avg_length (y-axis, 5.4 to 6.4) against year (x-axis, 1880 to 2000), by sex: boy, girl.]

Page 14: 1 basics

[Figure: prop (y-axis, 0.70 to 0.95) against year (x-axis, 1880 to 2000), by sex: boy, girl.]

Page 15: 1 basics

Polishing your plots

Page 16: 1 basics

1. Scales: used to override default perceptual mappings, and tune parameters of axes and legends.

2. Themes: control presentation of non-data elements.

3. Saving your work: to include in reports, presentations, etc.
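A minimal sketch of the three polishing steps, assuming the mpg data that ships with ggplot2. The axis and legend labels, the choice of theme_bw(), and the temporary PDF file are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# Base plot: highway mileage against engine displacement, coloured by class.
p <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()

# 1. Scales: override the default axis and legend titles.
# 2. Themes: change the look of the non-data elements.
polished <- p +
  scale_x_continuous(name = "Engine displacement (litres)") +
  scale_y_continuous(name = "Highway miles per gallon") +
  scale_colour_discrete(name = "Vehicle class") +
  theme_bw()

# 3. Saving: write the plot to a file (a temporary PDF here).
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, polished, width = 6, height = 4)
```

The width and height are in inches by default, so the same call can be reused with sizes that match a report or slide layout.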

Page 17: 1 basics

Page 18: 1 basics

ggplot2

Page 19: 1 basics

About ggplot2

Graphical grammar (domain specific language), based on “The Grammar of Graphics” by Leland Wilkinson.

Specify what you want, not how to create it. Many fiddly details taken care of.

“Instead of spending time making your graph look pretty, you can focus on creating a graph that best reveals the messages in your data.”

Page 20: 1 basics

Useful resources

http://had.co.nz/ggplot2

http://had.co.nz/ggplot2/book

http://groups.google.com/group/ggplot2

http://learnr.wordpress.com

http://ggplot2.wik.is

Page 21: 1 basics

Learning a new language is hard!

Page 22: 1 basics

Scatterplot basics

install.packages("ggplot2")
library(ggplot2)

?mpg
head(mpg)
str(mpg)
summary(mpg)

qplot(displ, hwy, data = mpg)

Page 23: 1 basics

qplot(displ, hwy, data = mpg)

In ggplot2, we always explicitly specify the data
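The same plot can also be written with ggplot(), which makes the role of the data argument more visible. This is a sketch of an equivalent call, not something shown on the slides. Recent ggplot2 releases mark qplot() as deprecated, so this form may be the safer one to learn.

```r
library(ggplot2)

# The data frame is passed explicitly; aes() then refers to its columns
# (displ and hwy) by name.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```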

Page 24: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against displ (x-axis, 2 to 7).]

qplot(displ, hwy, data = mpg)

Page 25: 1 basics

Additional variables

Additional variables can be displayed with aesthetics (like shape, colour, size) or faceting (small multiples displaying different subsets).

Page 26: 1 basics

[Figure: scatterplot of hwy against displ, with points coloured by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv.]

qplot(displ, hwy, colour = class, data = mpg)

Page 27: 1 basics

[Figure: scatterplot of hwy against displ, with points coloured by class and a class legend on the right.]

qplot(displ, hwy, colour = class, data = mpg)

Legend chosen and displayed automatically.

Page 28: 1 basics

Your turn

Experiment with colour, size, and shape aesthetics.

What’s the difference between discrete and continuous variables?

What happens when you combine multiple aesthetics?

Page 29: 1 basics

          Discrete                    Continuous
Colour    Rainbow of colours          Gradient from red to blue
Size      Discrete size steps         Linear mapping between radius and value
Shape     Different shape for each    Doesn’t work
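A short sketch for trying the discrete and continuous cases side by side. The choice of drv as the discrete variable and cty as the continuous one is an assumption made for illustration. Current ggplot2 defaults may differ from the colours and size mapping listed on the slide.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy))

# Discrete variable: drv takes three values (4, f, r).
colour_discrete <- base + geom_point(aes(colour = drv))
size_discrete   <- base + geom_point(aes(size = drv))  # may warn when drawn
shape_discrete  <- base + geom_point(aes(shape = drv))

# Continuous variable: cty (city miles per gallon).
colour_continuous <- base + geom_point(aes(colour = cty))
size_continuous   <- base + geom_point(aes(size = cty))

# Shape has no continuous scale, so building this plot is expected to fail.
shape_continuous <- tryCatch(
  {
    ggplot_build(base + geom_point(aes(shape = cty)))
    "built"
  },
  error = function(e) "error"
)
shape_continuous
```

Printing any of the plot objects (for example print(colour_continuous)) draws it with its automatically chosen legend.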

Page 30: 1 basics

Faceting

Small multiples displaying different subsets of the data.

Useful for exploring conditional relationships. Useful for large data.

Page 31: 1 basics

Your turn

qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl)

qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .)

qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)

qplot(displ, hwy, data = mpg) + facet_wrap(~ class)

Page 32: 1 basics

Summary

facet_grid(): 2d grid, rows ~ cols, . for no split

facet_wrap(): 1d ribbon wrapped into 2d

Scales argument controls whether position scales are fixed or free.
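To illustrate the scales argument, here is a hedged sketch using the mpg data. The particular faceting variables are chosen for illustration only.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Default: every panel shares the same x and y ranges (scales = "fixed").
fixed_panels <- p + facet_wrap(~ class)

# Free scales: each panel gets its own x and y ranges.
free_panels <- p + facet_wrap(~ class, scales = "free")

# In a grid, "free_y" frees only the y scale.
free_y_grid <- p + facet_grid(drv ~ cyl, scales = "free_y")
```

Fixed scales make panels directly comparable. Free scales trade that comparability for more detail within each panel.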

Page 33: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against cty (x-axis, 10 to 35).]

qplot(cty, hwy, data = mpg)

What’s the problem with this plot?
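One likely answer is overplotting: cty and hwy are stored as whole numbers, so several cars can land on exactly the same spot. A quick check, offered as a sketch rather than as part of the original exercise:

```r
library(ggplot2)

# Number of cars in the data set.
n_points <- nrow(mpg)

# Number of distinct (cty, hwy) positions actually visible on the plot.
n_distinct <- nrow(unique(mpg[, c("cty", "hwy")]))

# Cars hidden behind another point at the same position.
n_hidden <- n_points - n_distinct
n_hidden
```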

Page 34: 1 basics

[Figure: jittered scatterplot of hwy (y-axis, 15 to 40) against cty (x-axis, 10 to 35).]

qplot(cty, hwy, data = mpg, geom = "jitter")

Page 35: 1 basics

[Figure: jittered scatterplot of hwy (y-axis, 15 to 40) against cty (x-axis, 10 to 35).]

qplot(cty, hwy, data = mpg, geom = "jitter")

geom controls “type” of plot
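In the ggplot() form, the geom is chosen by the layer function that is added to the plot. The sketch below assumes the mpg data and compares a point layer with a jitter layer on the same mapping.

```r
library(ggplot2)

base <- ggplot(mpg, aes(cty, hwy))

# Same data and mapping, two different geoms.
points   <- base + geom_point()   # draws each car at its exact position
jittered <- base + geom_jitter()  # adds a little random noise to each position

# Compare the x positions that each layer will actually draw.
exact_x    <- layer_data(points)$x
jittered_x <- layer_data(jittered)$x
```

Because the jitter is random, jittered_x changes from run to run unless a seed is set with set.seed().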

Page 36: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv).]

qplot(class, hwy, data = mpg)

Page 37: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv).]

qplot(class, hwy, data = mpg)

How could we improve this plot?

Brainstorm for 1 minute.

Page 38: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against reorder(class, hwy), with classes in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact.]

Page 39: 1 basics

[Figure: scatterplot of hwy (y-axis, 15 to 40) against reorder(class, hwy), with classes in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact.]

qplot(reorder(class, hwy), hwy, data = mpg)

Incredibly useful technique!
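To see what reorder() does outside of a plot, the sketch below inspects the factor it returns. By default reorder() sorts the levels by the mean of the second argument within each group.

```r
library(ggplot2)

# Without reordering, classes appear in alphabetical order.
alphabetical <- sort(unique(mpg$class))

# reorder() returns a factor whose levels are sorted by mean hwy per class.
by_mean <- reorder(mpg$class, mpg$hwy)
levels(by_mean)

# The per-class means, listed in the new level order, are increasing.
class_means <- tapply(mpg$hwy, by_mean, mean)
class_means
```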

Page 40: 1 basics

[Figure: jittered scatterplot of hwy (y-axis, 15 to 40) against reorder(class, hwy), with classes in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact.]

qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")

Page 41: 1 basics

[Figure: boxplots of hwy (y-axis, 15 to 40) for each level of reorder(class, hwy), in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact.]

qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")

Page 42: 1 basics

[Figure: jittered points and boxplots of hwy (y-axis, 15 to 40) for each level of reorder(class, hwy), in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact.]

qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))

Page 43: 1 basics

Your turn

Read the help for reorder. Redraw the previous plots with class ordered by median hwy.

How would you put the jittered points on top of the boxplots?
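One possible answer, sketched with ggplot() rather than qplot(): pass FUN = median to reorder(), and add the boxplot layer before the jitter layer, since layers are drawn in the order they are added. With qplot() the corresponding idea would be to list the geoms as c("boxplot", "jitter").

```r
library(ggplot2)

# Order the classes by median hwy instead of the default mean.
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
class_medians <- tapply(mpg$hwy, class_by_median, median)

p <- ggplot(mpg, aes(reorder(class, hwy, FUN = median), hwy)) +
  geom_boxplot() +  # first layer: drawn underneath
  geom_jitter()     # second layer: drawn on top of the boxplots

print(p)
```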

Page 44: 1 basics

Aside: coding strategy

At the end of each interactive session, you want a summary of everything you did. Two options:

1. Save everything you did with savehistory() then remove the unimportant bits.

2. Build up the important bits as you go. (this is how I work)

Page 45: 1 basics

Page 46: 1 basics

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
