Data and donuts: Data Visualization using R

Post on 12-Apr-2017



Data Visualization using R

C. Tobin Magle, PhD
03-08-2017, 10:00-11:00 a.m.
Morgan Library, Computer Classroom 175
Based on http://www.datacarpentry.org/R-ecology-lesson/

Outline

• Data wrangling using dplyr
  • select
  • filter
• Visualization using ggplot2
  • Basics (Data, Aesthetics, Geoms)
  • Modifications (transparency, color, faceting)
  • Themes (modifying default, using premade, saving your own)

Set up a working directory

• Start RStudio
• File > New project > New directory > Empty project
• Enter a name for this new folder and choose a convenient location for it (working directory)
• Click on “Create project”
• Create a data folder in your working directory
• Create a new R script (File > New File > R script) and save it in your working directory

Data frames

• Store data tables
  • Rows = observations
  • Cols = variables
• Example: R ecology dataset
  • Observation: an animal in the wild
  • Variables: record_id, month, day, year, plot_id, species_id, sex, hindfoot_length, weight, genus, species, taxa, plot_type
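The rows-and-columns idea can be tried on a tiny table before loading the real data. This is a minimal sketch; the `toy` data frame and its values are made up for illustration and are not real survey records.

```r
# A tiny, made-up data frame with three observations and three variables.
toy <- data.frame(
  record_id = 1:3,
  species_id = c("DM", "PF", "DM"),
  weight = c(40, 8, NA),
  stringsAsFactors = FALSE
)

nrow(toy)   # number of observations (rows)
ncol(toy)   # number of variables (columns)
names(toy)  # variable names
str(toy)    # type and first values of each column
```

The same four calls work on the full surveys table once it is loaded.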

Load data into R

• Can download using download.file
    download.file("https://ndownloader.figshare.com/files/2292169",
                  "data/portal_data_joined.csv")
• Read data using the read.csv function
    surveys <- read.csv('data/portal_data_joined.csv')
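A minimal offline sketch of the same read step. It assumes a small hypothetical file, `toy_surveys.csv`, written to a temporary folder in place of the downloaded data, so nothing here depends on the network.

```r
# Create a data folder if it does not exist yet (here inside a temporary directory).
data_dir <- file.path(tempdir(), "data")
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Hypothetical stand-in for the downloaded file; the last row has no weight.
toy_path <- file.path(data_dir, "toy_surveys.csv")
writeLines(c("record_id,species_id,weight",
             "1,DM,40",
             "2,PF,8",
             "3,DM,"),
           toy_path)

# Read the file and take a first look at it.
toy <- read.csv(toy_path)
head(toy)  # first rows
str(toy)   # column types; the empty weight field is read as NA
```

Running head() and str() right after read.csv is a quick way to confirm the file loaded as expected.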

Subsetting (Using dplyr)

• Load the package first: library(dplyr)
• select: pick specific columns
    select(surveys, plot_id, species_id, weight)
• filter: pick specific rows
    filter(surveys, year == 1995)
• Combine with pipes
    surveys_sml <- surveys %>%
      filter(weight < 5) %>%
      select(species_id, sex, weight)
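To see what the pipe does, the same two steps can be written as nested calls and compared. A minimal sketch on a made-up `toy` table; it assumes dplyr is installed.

```r
library(dplyr)

# Made-up records for illustration only.
toy <- data.frame(
  species_id = c("DM", "PF", "BA", "PF"),
  sex = c("M", "F", "F", "M"),
  weight = c(40, 4, 3, 8),
  year = c(1995, 1995, 1996, 1997)
)

# Nested calls: read from the inside out (filter first, then select).
nested <- select(filter(toy, weight < 5), species_id, sex, weight)

# The same steps as a pipe: read top to bottom.
piped <- toy %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

nested
piped
```

Both versions keep the rows with weight below 5 and the three chosen columns; the pipe only changes how the steps are written.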

Ex: Remove all incomplete data records

• Missing data (NA) when graphing
  • Points aren’t plotted
  • Warning messages
• Solution: remove

surveys_complete <- surveys %>%
  filter(species_id != "",         # remove missing species_id
         !is.na(weight),           # remove missing weight
         !is.na(hindfoot_length),  # remove missing hindfoot_length
         sex != "")                # remove missing sex
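Before removing records, it can help to count how many values are missing in each column. A minimal sketch in base R on a made-up `toy` table, assuming missing numbers are stored as NA and missing text as empty strings, as in the filter above.

```r
# Made-up records: each of rows 2 to 4 has at least one missing value.
toy <- data.frame(
  species_id = c("DM", "", "PF", "DM"),
  weight = c(40, 12, NA, 35),
  hindfoot_length = c(36, 20, 15, NA),
  sex = c("M", "F", "", "F"),
  stringsAsFactors = FALSE
)

# Count NA values per column.
colSums(is.na(toy))

# Count empty strings per column.
colSums(toy == "", na.rm = TRUE)

# Keep only rows that pass all four checks (base R version of the filter).
complete <- toy[toy$species_id != "" &
                  !is.na(toy$weight) &
                  !is.na(toy$hindfoot_length) &
                  toy$sex != "", ]
nrow(complete)
```

Comparing nrow() before and after shows how many records the cleaning step drops.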

Graphics with ggplot2

• data: what data frame do you want to graph?

• aesthetics: which variables to plot, and which variables define presentation (e.g., color)

• geoms: graphical representations (points, lines, bars)

• Layers are combined with the + operator (load the package first: library(ggplot2))
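The three pieces can be assembled one step at a time. A minimal sketch on a made-up `toy` table, assuming ggplot2 is installed; the object names are illustrative.

```r
library(ggplot2)

# Made-up measurements for illustration only.
toy <- data.frame(
  weight = c(10, 20, 30, 40),
  hindfoot_length = c(15, 22, 30, 35)
)

# Data and aesthetics only: this draws empty axes, no points yet.
base_plot <- ggplot(data = toy, aes(x = weight, y = hindfoot_length))

# Adding a geom with + draws the data.
scatter <- base_plot + geom_point()

# When a plot spans several lines, the + goes at the END of each line.
scatter_with_line <- base_plot +
  geom_point() +
  geom_line()
```

Saving the base plot in a variable makes it easy to try different geoms on the same data and aesthetics.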

Basic

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point()

Add transparency

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1)

Add color

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, color = "blue")

Add color by species

ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, aes(color = species_id))
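The last two slides differ in where color is given: outside aes() it is a fixed setting, inside aes() it is mapped to a column. A minimal sketch on a made-up `toy` table, assuming ggplot2 is installed.

```r
library(ggplot2)

# Made-up measurements for two species.
toy <- data.frame(
  weight = c(10, 20, 30, 40),
  hindfoot_length = c(15, 22, 30, 35),
  species_id = c("DM", "DM", "PF", "PF")
)

# Setting: one fixed color for every point; no legend is drawn.
fixed <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(color = "blue")

# Mapping: color depends on the species_id column; ggplot2 adds a legend.
mapped <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = species_id))
```

Printing `fixed` and `mapped` side by side shows the difference.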

Example: Box plot

ggplot(data = surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot()

Example: Time series data

Reshape data:
  yearly_counts <- surveys_complete %>%
    group_by(year, species_id) %>%
    tally()

Plot:
  ggplot(data = yearly_counts, aes(x = year, y = n)) +
    geom_line()
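To see what the reshape step produces, here is a minimal sketch on a made-up `toy` table, assuming dplyr is installed. It also shows dplyr's count(), which gives the same counts in one call.

```r
library(dplyr)

# Made-up records: one row per observed animal.
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996),
  species_id = c("DM", "DM", "PF", "DM", "PF")
)

# group_by + tally: one row per year/species pair, with the count in column n.
by_tally <- toy %>%
  group_by(year, species_id) %>%
  tally()

# count() does the grouping and tallying in one step.
by_count <- toy %>%
  count(year, species_id)

by_tally
by_count
```

The column n is what the plots on these slides map to the y axis.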

Separate by species

ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id)) +
  geom_line()

Color by species

ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id, colour = species_id)) +
  geom_line()

Applying a premade theme

ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex, group = sex)) +
  geom_line() +
  theme_bw()
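The plot above uses yearly_sex_counts, which is not built anywhere in this transcript. A minimal sketch, assuming it is made the same way as yearly_counts with sex added as a grouping variable (an assumption, not confirmed by the slides). The `toy` data are made up. The sketch also adds facet_wrap, the faceting option named in the outline, to draw one panel per species.

```r
library(dplyr)
library(ggplot2)

# Made-up records for illustration only.
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996, 1996, 1996),
  species_id = c("DM", "DM", "PF", "DM", "DM", "PF", "PF"),
  sex = c("F", "M", "F", "F", "M", "F", "F")
)

# Assumed construction: count records per year, species, and sex.
yearly_sex_counts <- toy %>%
  group_by(year, species_id, sex) %>%
  tally()

# One panel per species, one line per sex, with the premade black-and-white theme.
faceted <- ggplot(data = yearly_sex_counts,
                  aes(x = year, y = n, color = sex, group = sex)) +
  geom_line() +
  facet_wrap(~ species_id) +
  theme_bw()
```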

Making it pretty (themes)

Remove background

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())

Customize axis labels

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw()

Customize font size

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw() +
  theme(text = element_text(size = 16, family = "Arial"))

Rotate axis labels

ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Species count over time',
       x = 'Year of observation',
       y = 'Count') +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                   hjust = .5, vjust = .5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16, family = "Arial"))

Create your own theme

arial_grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                                     hjust = .5, vjust = .5),
                          axis.text.y = element_text(colour = "grey20", size = 12),
                          text = element_text(size = 16, family = "Arial"))

ggplot(surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  arial_grey_theme
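A saved theme object can be combined with a premade theme, but the order matters: a complete theme such as theme_bw() replaces earlier theme settings, so custom tweaks go after it. A minimal sketch on a made-up `toy` table, assuming ggplot2 is installed; `rotated_labels` is an illustrative name, and the font family is left out because "Arial" may not be available on every system.

```r
library(ggplot2)

# Made-up measurements for two species.
toy <- data.frame(
  species_id = c("DM", "DM", "DM", "PF", "PF", "PF"),
  hindfoot_length = c(34, 36, 37, 15, 16, 14)
)

# A small reusable theme object: rotate the x axis labels.
rotated_labels <- theme(
  axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5)
)

# Premade theme first, custom tweaks second, so the tweaks are kept.
boxes <- ggplot(toy, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  theme_bw() +
  rotated_labels
```

Writing `rotated_labels + theme_bw()` instead would reset the rotation, because theme_bw() overrides every element set before it.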

Save your plot

my_plot <- ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
  geom_line() +
  labs(title = 'Observed species in time',
       x = 'Year of observation',
       y = 'Number of species') +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                   hjust = .5, vjust = .5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16, family = "Arial"))

ggsave("name_of_file.png", my_plot, width=15, height=10)
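By default, ggsave reads width and height in inches, so 15 by 10 is a large image. A minimal sketch that states the units and resolution explicitly; the `toy` data, the file name, and the temporary folder are illustrative choices, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Made-up yearly counts for illustration only.
toy <- data.frame(year = 1995:1998, n = c(3, 5, 4, 6))

toy_plot <- ggplot(toy, aes(x = year, y = n)) +
  geom_line()

# Save to a temporary folder, giving the size in centimetres at 300 dpi.
out_file <- file.path(tempdir(), "toy_plot.png")
ggsave(out_file, toy_plot, width = 15, height = 10, units = "cm", dpi = 300)

file.exists(out_file)
```

The file type is taken from the extension, so changing ".png" to ".pdf" saves a PDF instead.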

Need help?

• Email: tobin.magle@colostate.edu
• Data Management Services website: http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
  • R Ecology Lesson: http://www.datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html
• Cheat Sheets:
  • Data wrangling: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
  • ggplot2: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf