20 Polishing

37
Hadley Wickham Stat405 Polishing your plots for presentation Wednesday, 4 November 2009

description

 

Transcript of 20 Polishing

Page 1: 20 Polishing

Hadley Wickham

Stat405Polishing your plots for presentation

Wednesday, 4 November 2009

Page 2: 20 Polishing

1. Thanksgiving

2. Finish off exploratory graphics

3. Communication graphics

4. Scales

5. Themes

Wednesday, 4 November 2009

Page 3: 20 Polishing

No class Wed 25 November.

No assignment due over Thanksgiving.

Thanksgiving

Wednesday, 4 November 2009

Page 4: 20 Polishing

25

30

35

40

45

50

●●

●●

●●

●●

●● ●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●●

●●

●●

●● ●

●●

●●

●●

●●

−120 −110 −100 −90 −80 −70

% cancelled● 0.0● 0.2

● 0.4

● 0.6

● 0.8

● 1.0

Wednesday, 4 November 2009

Page 5: 20 Polishing

Perform identify the data and layers in the flight delays data, then write the ggplot2 code to create it.

library(ggplot2)

library(maps)

usa <- map_data("state")

feb13 <- read.csv("delays-feb-13-2007.csv")

Your turn

Wednesday, 4 November 2009

Page 6: 20 Polishing

usa <- map_data("state")

ggplot(feb13, aes(long, lat)) + geom_point(aes(size = 1), colour = "white") + geom_polygon(aes(group = group), data = usa, colour = "grey70", fill = NA) + geom_point(aes(size = ncancelw / ntot), colour = alpha("black", 1/2))

# Polishing: up nextlast_plot() + scale_area("% cancelled", to = c(1, 8), breaks = seq(0, 1, by = 0.2), limits = c(0, 1)) scale_x_continuous("", limits = c(-125, -67)), scale_y_continuous("", limits = c(24, 50))

Wednesday, 4 November 2009

Page 7: 20 Polishing

troops <- read.csv("minard-troops.csv")cities <- read.csv("minard-cities.csv")

ggplot(cities, aes(long, lat)) + geom_path(aes(size = survivors, colour = direction, group = interaction(group, direction)), data = troops) + geom_text(aes(label = city), hjust = 0, vjust = 1, size = 4)

# Polish appearancelast_plot() + scale_x_continuous("", limits = c(24, 39)) + scale_y_continuous("") + scale_colour_manual(values = c("grey50","red")) + scale_size(to = c(1, 10))

Wednesday, 4 November 2009

Page 8: 20 Polishing

When you need to communicate your findings, you need to spend a lot of time polishing your graphics to eliminate distractions and focus on the story.

Now it’s time to pay attention to the small stuff: labels, colour choices, tick marks...

Communication graphics

Wednesday, 4 November 2009

Page 9: 20 Polishing

ContextWednesday, 4 November 2009

Page 10: 20 Polishing

ConsumptionWednesday, 4 November 2009

Page 11: 20 Polishing

Tools

Scales. Used to override default perceptual mappings, and tune parameters of axes and legends.

Themes: control presentation of non-data elements.

Wednesday, 4 November 2009

Page 12: 20 Polishing

ScalesControl how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot.

Important parameters: name, breaks & labels, limits.

Naming scheme: scale_aesthetic_name. All default scales have name continuous or discrete.

Wednesday, 4 November 2009

Page 13: 20 Polishing

# Default scalesscale_x_continuous()scale_y_discrete()scale_colour_discrete()

# Custom scalesscale_colour_hue() scale_x_log10()scale_fill_brewer()

# Scales with parametersscale_x_continuous("X Label", limits = c(1, 10))scale_colour_gradient(low = "blue", high = "red")

Wednesday, 4 November 2009

Page 14: 20 Polishing

p <- qplot(cyl, displ, data = mpg)

# First argument (name) controls axis labelp + scale_y_continuous("Displacement (l)")p + scale_x_continuous("Cylinders")

# Breaks and labels control tick marksp + scale_x_continuous(breaks = c(4, 6, 8))p + scale_x_continuous(breaks = c(4, 6, 8), labels = c("small", "medium", "big"))

# Limits control range of datap + scale_y_continuous(limits = c(1, 8))# same as:p + ylim(1, 8)

Wednesday, 4 November 2009

Page 15: 20 Polishing

qplot(carat, price, data = diamonds, geom = "hex")

Manipulate the fill colour legend to:

• Change the title to “Count"

• Display breaks at 1000, 3500 & 7000

• Add commas to the keys (e.g. 1,000)

• Set the limit for the scale from 0 to 8000.

Your turn

Wednesday, 4 November 2009

Page 16: 20 Polishing

p <- qplot(carat, price, data = diamonds, geom = "hex")

# First argument (name) controls legend titlep + scale_fill_continuous("Count")

# Breaks and labels control legend keysp + scale_fill_continuous(breaks = c(1000, 3500, 7000))p + scale_fill_continuous(breaks = c(0, 4000, 8000))

# Why don't 0 and 8000 have colours?p + scale_fill_continuous(breaks = c(0, 4000, 8000), limits = c(0, 8000))

# Can use labels to make more human readablebreaks <- c(0, 2000, 4000, 6000, 8000)labels <- format(breaks, big.mark = ",") p + scale_fill_continuous(breaks = breaks, labels = labels, limits = c(0, 8000))

Wednesday, 4 November 2009

Page 17: 20 Polishing

p <- qplot(color, carat, data = diamonds)

# Basically the same for discrete variablesp + scale_x_discrete("Color")

# Except limits is now a character vectorp + scale_x_discrete(limits = c("D", "E", "F"))

# Should work for boxplots tooqplot(color, carat, data = diamonds, geom = "boxplot") + scale_x_discrete(limits = c("D", "E", "F"))# But currently a bug :(

Wednesday, 4 November 2009

Page 18: 20 Polishing

Can also override the default choice of scales. You are most likely to want to do this with colour, as it is the most important aesthetic after position.

Need to know a little theory to be able to use it effectively: colour spaces & colour blindness.

Alternate scales

Wednesday, 4 November 2009

Page 19: 20 Polishing

Colour spaces

Most familiar is rgb: defines colour as mixture of red, green and blue. Matches the physics of eye, but the brain does a lot of post-processing, so it’s hard to directly perceive these components.

A more useful colour space is hcl: hue, chroma and luminance

Wednesday, 4 November 2009

Page 20: 20 Polishing

chromalum

inance

hue

Wednesday, 4 November 2009

Page 21: 20 Polishing

Demo of real shape in 3d

chromalum

inance

hue

Wednesday, 4 November 2009

Page 22: 20 Polishing

Default colour scales

Discrete: evenly spaced hues of equal chroma and luminance. No colour appears more important than any other. Does not imply order.

Continuous: evenly spaced hues between two colours.

Wednesday, 4 November 2009

Page 23: 20 Polishing

Alternatives

Discrete: brewer

Continuous: gradient2, gradientn

Wednesday, 4 November 2009

Page 24: 20 Polishing

Color brewer

Cynthia Brewer applied basics principles and then rigorously tested to produce selection of good palettes, particularly tailored for maps: http://colorbrewer2.org/

Can use cut_interval() or cut_number() to convert continuous to discrete.

Wednesday, 4 November 2009

Page 25: 20 Polishing

# Fancy looking trigonometric functionvals <- seq(-4 * pi, 4 * pi, len = 50)df <- expand.grid(x = vals, y = vals)df$r <- with(df, sqrt(x ^ 2 + y ^ 2))df$z <- with(df, cos(r ^ 2) * exp(- r / 6))df$z_cut <- cut_interval(df$z, 9)

(p1 <- qplot(x, y, data = df, fill = z, geom = "tile")) (p2 <- qplot(x, y, data = df, fill = z_cut, geom = "tile"))

Wednesday, 4 November 2009

Page 26: 20 Polishing

p1 + scale_fill_gradient(low = "white", high = "black")

# Highlight deviationsp1 + scale_fill_gradient2()p1 + scale_fill_gradient2(breaks = seq(-1, 1, by = 0.25), limits = c(-1, 1))p1 + scale_fill_gradient2(mid = "white", low = "black", high = "black")

p2 + scale_fill_brewer(pal = "Blues")

Wednesday, 4 November 2009

Page 27: 20 Polishing

Colour blindness

7-10% of men are red-green colour “blind”. (Many other rarer types of colour blindness)

Solutions: avoid red-green contrasts; use redundant mappings; test. I like color oracle: http://colororacle.cartography.ch

Wednesday, 4 November 2009

Page 28: 20 Polishing

Read through the examples for scale_colour_brewer, scale_colour_gradient2 and scale_colour_gradientn.

Experiment!

Your turn

Wednesday, 4 November 2009

Page 29: 20 Polishing

Other resources

A. Zeileis, K. Hornik, and P. Murrell. Escaping RGBland: Selecting colors for statistical graphics. Computational Statistics & Data Analysis, 2008.

http://statmath.wu-wien.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2008.pdf.

Wednesday, 4 November 2009

Page 30: 20 Polishing

Themes

Wednesday, 4 November 2009

Page 31: 20 Polishing

Visual appearance

So far have only discussed how to get the data displayed the way you want, focussing on the essence of the plot.

Themes give you a huge amount of control over the appearance of the plot, the choice of background colours, fonts and so on.

Wednesday, 4 November 2009

Page 32: 20 Polishing

# Two built in themes. The default:qplot(carat, price, data = diamonds)

# And a theme with a white background:qplot(carat, price, data = diamonds) + theme_bw()

# Use theme_set if you want it to apply to every# future plot.theme_set(theme_bw())

# This is the best way of seeing all the default# optionstheme_bw()theme_grey()

Wednesday, 4 November 2009

Page 33: 20 Polishing

You can also make your own theme, or modify and existing.

Themes are made up of elements which can be one of: theme_line, theme_segment, theme_text, theme_rect, theme_blank

Gives you a lot of control over plot appearance.

Elements

Wednesday, 4 November 2009

Page 34: 20 Polishing

ElementsAxis: axis.line, axis.text.x, axis.text.y, axis.ticks, axis.title.x, axis.title.y

Legend: legend.background, legend.key, legend.text, legend.title

Panel: panel.background, panel.border, panel.grid.major, panel.grid.minor

Strip: strip.background, strip.text.x, strip.text.y

Wednesday, 4 November 2009

Page 35: 20 Polishing

p <- qplot(displ, hwy, data = mpg) + opts(title = "Bigger engines are less efficient")

# To modify a plotp p + opts(plot.title = theme_text(size = 12, face = "bold"))p + opts(plot.title = theme_text(colour = "red"))p + opts(plot.title = theme_text(angle = 45))p + opts(plot.title = theme_text(hjust = 1))

Wednesday, 4 November 2009

Page 36: 20 Polishing

Your turnFix the overlapping y labels on this plot:

qplot(reorder(model, hwy), hwy, data = mpg)

Rotate the labels on these strips so they are easier to read.

qplot(hwy, reorder(model, hwy), data = mpg) + facet_grid(manufacturer ~ ., scales = "free", space = "free")

Wednesday, 4 November 2009

Page 37: 20 Polishing

Feedbackhttp://hadley.wufoo.com/forms/

stat405-feedback/

Wednesday, 4 November 2009