Data and donuts: Data Visualization using R

Data Visualization using R. C. Tobin Magle, PhD. 03-08-2017, 10:00-11:00 a.m. Morgan Library Computer Classroom 175. Based on http://www.datacarpentry.org/R-ecology-lesson/

Transcript of Data and donuts: Data Visualization using R

Page 1: Data and donuts: Data Visualization using R

Data Visualization using R

C. Tobin Magle, PhD
03-08-2017, 10:00-11:00 a.m.
Morgan Library Computer Classroom 175
Based on http://www.datacarpentry.org/R-ecology-lesson/

Page 2: Data and donuts: Data Visualization using R

Outline

• Data wrangling using dplyr
  • select
  • filter

• Visualization using ggplot2
  • Basics (Data, Aesthetics, Geoms)
  • Modifications (transparency, color, faceting)
  • Themes (modifying default, using premade, saving your own)

Page 3: Data and donuts: Data Visualization using R

Set up a working directory

• Start RStudio
• File > New project > New directory > Empty project
• Enter a name for this new folder and choose a convenient location for it (this becomes the working directory)
• Click on “Create project”
• Create a data folder in your working directory
• Create a new R script (File > New File > R script) and save it in your working directory

Page 4: Data and donuts: Data Visualization using R

Data frames

• Store data tables
  • Rows = observations
  • Cols = variables

• Example: R ecology dataset
  • Observation: an animal in the wild
  • Variables: record_id, month, day, year, plot_id, species_id, sex, hindfoot_length, weight, genus, species, taxa, plot_type

Page 5: Data and donuts: Data Visualization using R

Load data into R

• Can download using download.file

  download.file("https://ndownloader.figshare.com/files/2292169",
                "data/portal_data_joined.csv")

• Read data using the read.csv function

  surveys <- read.csv('data/portal_data_joined.csv')
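After reading a file, it is worth checking that the data frame looks as expected. The following is a minimal sketch that runs offline: it writes a tiny CSV to a temporary file instead of downloading the real one. The file contents and the name surveys_demo are hypothetical stand-ins, not part of the original slides.

```r
# Write a tiny CSV to a temporary file so the example runs without a download.
# The three columns are a made-up excerpt, not the real portal data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id,weight",
             "1,NL,",
             "2,DM,40",
             "3,DM,48"),
           csv_path)

# Read it the same way the slide reads portal_data_joined.csv.
surveys_demo <- read.csv(csv_path)

str(surveys_demo)    # column names and types
head(surveys_demo)   # first rows
nrow(surveys_demo)   # number of observations
```

Note that the empty weight field in the first data row is read as NA, which is the kind of missing value the later slides deal with.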

Page 6: Data and donuts: Data Visualization using R

Subsetting (Using dplyr)

• Load the package first

  library(dplyr)

• select: pick specific columns

  select(surveys, plot_id, species_id, weight)

• filter: pick specific rows

  filter(surveys, year == 1995)

• Combine with pipes

  surveys_sml <- surveys %>%
    filter(weight < 5) %>%
    select(species_id, sex, weight)
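To see what the pipe buys you, here is a small sketch comparing the nested form with the piped form. The toy data frame is a hypothetical stand-in for surveys, so the example runs without the downloaded file.

```r
library(dplyr)

# Hypothetical toy data standing in for `surveys`.
toy <- data.frame(
  species_id = c("PF", "DM", "PF", "RM"),
  sex        = c("F", "M", "M", "F"),
  weight     = c(4, 42, NA, 10),
  year       = c(1995, 1995, 1996, 1995)
)

# Nested calls: read from the inside out.
nested <- select(filter(toy, weight < 5), species_id, sex, weight)

# Piped calls: read from top to bottom, one step per line.
piped <- toy %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

# filter() keeps only rows where the condition is TRUE,
# so the row with a missing weight is dropped as well.
piped
```

Both forms should give the same result; the piped version is usually easier to read once there are more than two steps.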

Page 7: Data and donuts: Data Visualization using R

Ex: Remove all incomplete data records

• Missing data (NA) when graphing
  • Points aren’t plotted
  • Warning messages

• Solution: remove

  surveys_complete <- surveys %>%
    filter(species_id != "",         # remove missing species_id
           !is.na(weight),           # remove missing weight
           !is.na(hindfoot_length),  # remove missing hindfoot_length
           sex != "")                # remove missing sex
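Before dropping rows, it can help to count how many records are affected. The sketch below applies the same filter to a hypothetical toy data frame (not the real surveys data) and reports how many rows were removed.

```r
library(dplyr)

# Hypothetical toy data: empty strings and NA mark missing values.
toy <- data.frame(
  species_id      = c("DM", "", "PF", "RM"),
  sex             = c("M", "F", "", "F"),
  weight          = c(42, 30, 8, NA),
  hindfoot_length = c(36, 20, 15, 17)
)

# Same conditions as on the slide: a row is kept only if all four hold.
toy_complete <- toy %>%
  filter(species_id != "",
         sex != "",
         !is.na(weight),
         !is.na(hindfoot_length))

colSums(is.na(toy))             # NA count per column
nrow(toy) - nrow(toy_complete)  # number of rows removed
```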

Page 8: Data and donuts: Data Visualization using R

Graphics with ggplot2

• data: which data frame do you want to graph?

• aesthetics: which variables go on the axes, and which variables define the presentation (e.g. color)

• geoms: graphical representations (points, lines, bars)

• Layers are combined with the + operator

Page 9: Data and donuts: Data Visualization using R

Basic

library(ggplot2)

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point()

Page 10: Data and donuts: Data Visualization using R

Add transparency

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1)

Page 11: Data and donuts: Data Visualization using R

Add color

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, color = "blue")

Page 12: Data and donuts: Data Visualization using R

Add color by species

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, aes(color = species_id))
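The difference between the last two slides is setting versus mapping: a constant such as "blue" goes outside aes(), while a variable such as species_id goes inside aes(). A minimal sketch on hypothetical toy data (the data frame and the names p_set and p_map are illustrative only):

```r
library(ggplot2)

# Hypothetical toy data standing in for `surveys_complete`.
toy <- data.frame(
  weight          = c(10, 20, 40, 45),
  hindfoot_length = c(15, 18, 35, 36),
  species_id      = c("PF", "PF", "DM", "DM")
)

# Setting: a constant outside aes(); every point is blue, no legend.
p_set <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(color = "blue")

# Mapping: a variable inside aes(); one color per species, with a legend.
p_map <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = species_id))

# layer_data() shows the colour actually assigned to each point.
layer_data(p_set)$colour
layer_data(p_map)$colour
```

A common mistake is writing aes(color = "blue"), which treats "blue" as a data value rather than a color.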

Page 13: Data and donuts: Data Visualization using R

Example: Box plot

ggplot(data = surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot()

Page 14: Data and donuts: Data Visualization using R

Example: Time series data

Reshape data:

yearly_counts <- surveys_complete %>%
  group_by(year, species_id) %>%
  tally()

Plot:

ggplot(data = yearly_counts, aes(x = year, y = n)) +
  geom_line()
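The reshaping step counts rows per year and species and stores the result in a column named n. The sketch below shows this on hypothetical toy data, alongside dplyr's count(), which should give the same counts in one call.

```r
library(dplyr)

# Hypothetical toy data: one row per observed animal.
toy <- data.frame(
  year       = c(1995, 1995, 1995, 1996, 1996),
  species_id = c("DM", "DM", "PF", "DM", "PF")
)

# As on the slide: group, then tally. The result has a count column `n`.
by_tally <- toy %>%
  group_by(year, species_id) %>%
  tally()

# Shorthand: count() groups and tallies in a single step.
by_count <- toy %>%
  count(year, species_id)

by_tally
by_count
```

The result of group_by() plus tally() is still grouped by year; add ungroup() if later steps should treat it as a plain table.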

Page 15: Data and donuts: Data Visualization using R

Separate by species

ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id)) +
  geom_line()

Page 16: Data and donuts: Data Visualization using R

Color by species

ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id, colour = species_id)) +
  geom_line()

Page 17: Data and donuts: Data Visualization using R

Applying a premade theme

ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex, group = sex)) +
  geom_line() +
  theme_bw()
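The data frame yearly_sex_counts is used here but not defined anywhere in these slides. One plausible way to build it, assuming it holds one count per year and sex, is sketched below on hypothetical toy data. The grouping columns are an assumption, not the presenter's definition.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per observed animal.
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996, 1996),
  sex  = c("F", "M", "M", "F", "F", "M")
)

# Assumed definition: number of observations per year and sex.
yearly_sex_counts <- toy %>%
  count(year, sex)

# With one row per year and sex, each sex gets a single line.
p <- ggplot(data = yearly_sex_counts,
            aes(x = year, y = n, color = sex, group = sex)) +
  geom_line() +
  theme_bw()
```

On the real data this would be surveys_complete %>% count(year, sex). If the counts were also split by species, each sex would have several values per year and the lines would zigzag unless the plot were faceted.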

Page 18: Data and donuts: Data Visualization using R

Making it pretty (themes)

Remove the background grid

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())

Page 19: Data and donuts: Data Visualization using R

Customize axis labels

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw()

Page 20: Data and donuts: Data Visualization using R

Customize font size

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw() +
  theme(text = element_text(size = 16, family = "Arial"))

Page 21: Data and donuts: Data Visualization using R

Rotate axis labels

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                   hjust = .5, vjust = .5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16, family = "Arial"))

Page 22: Data and donuts: Data Visualization using R

Create your own theme

arial_grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                                     hjust = .5, vjust = .5),
                          axis.text.y = element_text(colour = "grey20", size = 12),
                          text = element_text(size = 16, family = "Arial"))

ggplot(surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  arial_grey_theme
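When a saved theme is combined with a premade one, the order matters: a complete theme such as theme_bw() replaces whatever theme settings came before it. The sketch below illustrates this with a hypothetical one-setting theme (my_theme) and toy data; neither is from the original slides.

```r
library(ggplot2)

# Hypothetical saved theme with a single setting: rotated x-axis labels.
my_theme <- theme(axis.text.x = element_text(angle = 90, hjust = .5, vjust = .5))

# Hypothetical toy data standing in for `surveys_complete`.
toy <- data.frame(
  species_id      = c("DM", "DM", "PF", "PF"),
  hindfoot_length = c(35, 36, 15, 16)
)

base_plot <- ggplot(toy, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot()

# Premade theme first, saved theme second: the rotation is kept.
p_kept <- base_plot + theme_bw() + my_theme

# Saved theme first, premade theme second: theme_bw() resets the rotation.
p_lost <- base_plot + my_theme + theme_bw()

# calc_element() resolves the final settings for an element.
calc_element("axis.text.x", p_kept$theme)$angle
calc_element("axis.text.x", p_lost$theme)$angle
```

In practice this means adding your own theme object last, after theme_bw() or any other premade theme.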

Page 23: Data and donuts: Data Visualization using R

Save your plot

my_plot <- ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Observed species in time',
       x = 'Year of observation',
       y = 'Number of species') +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                   hjust = .5, vjust = .5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16, family = "Arial"))

ggsave("name_of_file.png", my_plot, width = 15, height = 10)
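ggsave() interprets width and height in inches unless told otherwise, so width = 15, height = 10 produces a large image. The sketch below spells out the units and resolution and writes to a temporary folder so it can be run safely. The toy data, file name and chosen size are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for `yearly_counts`.
toy <- data.frame(year = c(1995, 1996, 1997), n = c(3, 5, 4))

p <- ggplot(toy, aes(x = year, y = n)) +
  geom_line()

# Save into the session's temporary folder instead of the working directory.
out_file <- file.path(tempdir(), "toy_plot.png")

# State units and resolution explicitly: 15 x 10 cm at 300 dots per inch.
ggsave(out_file, p, width = 15, height = 10, units = "cm", dpi = 300)

file.exists(out_file)
```

Passing the plot object explicitly, as the slide does, avoids accidentally saving whichever plot happened to be displayed last.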

Page 24: Data and donuts: Data Visualization using R

Need help?

• Email: [email protected]

• Data Management Services website: http://lib.colostate.edu/services/data-management

• Data Carpentry: http://www.datacarpentry.org/
  • R Ecology Lesson: http://www.datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html

• Cheat Sheets:
  • Data wrangling: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
  • ggplot2: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf