Data and donuts: Data Visualization using R
Data Visualization using R
C. Tobin Magle, PhD
03-08-2017, 10:00-11:00 a.m.
Morgan Library, Computer Classroom 175
Based on http://www.datacarpentry.org/R-ecology-lesson/
Outline
• Data wrangling using dplyr
  • select
  • filter
• Visualization using ggplot2
  • Basics (data, aesthetics, geoms)
  • Modifications (transparency, color, faceting)
  • Themes (modifying the default, using premade, saving your own)
Set up a working directory
• Start RStudio
• File > New project > New directory > Empty project
• Enter a name for this new folder and choose a convenient location for it (this becomes your working directory)
• Click on “Create project”
• Create a data folder in your working directory
• Create a new R script (File > New File > R script) and save it in your working directory
Data frames
• Store data tables
  • Rows = observations
  • Cols = variables
• Example: R ecology dataset
  • Observation: an animal in the wild
  • Variables: record_id, month, day, year, plot_id, species_id, sex, hindfoot_length, weight, genus, species, taxa, plot_type
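The rows-and-columns idea can be tried on a small table before loading the real data. This is a minimal sketch: surveys_toy and its values are invented for illustration and only borrow a few column names from the ecology dataset.

```r
# Hypothetical miniature of the surveys table (values invented for illustration)
surveys_toy <- data.frame(
  record_id = 1:4,
  year = c(1995, 1995, 1996, 1996),
  species_id = c("DM", "PF", "DM", "NL"),
  weight = c(40, 7, 44, NA),
  stringsAsFactors = FALSE
)

nrow(surveys_toy)   # number of observations (rows)
ncol(surveys_toy)   # number of variables (columns)
names(surveys_toy)  # variable names
str(surveys_toy)    # type and first values of each column
```

The same four functions work unchanged on the full surveys data frame once it is loaded.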
Load data into R
• Can download using download.file
    download.file("https://ndownloader.figshare.com/files/2292169",
                  "data/portal_data_joined.csv")
• Read data using the read.csv function
    surveys <- read.csv('data/portal_data_joined.csv')
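To see what read.csv hands back without needing a network connection, one option is a round trip through a temporary file. This is only a sketch: csv_path, toy, and toy_read are hypothetical names, and the two-row table is invented.

```r
# Write a tiny hypothetical table to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(species_id = c("DM", "PF"), weight = c(40, 7))
write.csv(toy, csv_path, row.names = FALSE)

# read.csv parses the header row into column names and returns a data frame
toy_read <- read.csv(csv_path)
class(toy_read)
head(toy_read)
```

After loading the real file, head(surveys) and str(surveys) give the same kind of quick check.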
Subsetting (using dplyr)
• Load the package first
    library(dplyr)
• select: pick specific columns
    select(surveys, plot_id, species_id, weight)
• filter: pick specific rows
    filter(surveys, year == 1995)
• Combine with pipes
    surveys_sml <- surveys %>%
      filter(weight < 5) %>%
      select(species_id, sex, weight)
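The pipe is a different way of writing nested function calls. The sketch below compares the two forms on an invented table (surveys_toy and its values are hypothetical) and checks that they give the same result.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
surveys_toy <- data.frame(
  plot_id = c(1, 2, 2, 3),
  species_id = c("DM", "PF", "PF", "NL"),
  sex = c("M", "F", "M", "F"),
  weight = c(40, 4, 7, 180),
  stringsAsFactors = FALSE
)

# Nested calls: read from the inside out
nested <- select(filter(surveys_toy, weight < 5), species_id, sex, weight)

# Piped calls: read from top to bottom
piped <- surveys_toy %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

identical(nested, piped)
```

The piped form tends to be easier to extend, since each new step is one more line at the end.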
Ex: Remove all incomplete data records
• Missing data (NA) cause problems when graphing
  • Points aren’t plotted
  • Warning messages are printed
• Solution: remove the incomplete records
    surveys_complete <- surveys %>%
      filter(species_id != "",         # remove missing species_id
             !is.na(weight),           # remove missing weight
             !is.na(hindfoot_length),  # remove missing hindfoot_length
             sex != "")                # remove missing sex
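The filter uses is.na() rather than a comparison because any comparison with NA returns NA, not TRUE or FALSE. A small sketch on invented data (surveys_toy is hypothetical) shows the check and the effect of the filter.

```r
library(dplyr)

# Hypothetical toy data with one empty species_id, one empty sex, one NA weight
surveys_toy <- data.frame(
  species_id = c("DM", "", "PF", "NL"),
  sex = c("M", "F", "", "F"),
  weight = c(40, 35, 7, NA),
  hindfoot_length = c(36, 34, 15, 32),
  stringsAsFactors = FALSE
)

NA == NA                         # NA, so `weight != NA` cannot find missing values
colSums(is.na(surveys_toy))      # number of NA values in each column

complete_toy <- surveys_toy %>%
  filter(species_id != "", sex != "",
         !is.na(weight), !is.na(hindfoot_length))

nrow(surveys_toy) - nrow(complete_toy)  # how many rows were dropped
```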
Graphics with ggplot2
• data: which data frame do you want to graph?
• aesthetics: which variables go on the axes, and which variables define presentation (color, size, shape)?
• geoms: graphical representations (points, lines, bars)
• Layers are combined with the + operator
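The three parts can be assembled one step at a time. A minimal sketch, assuming ggplot2 is installed; toy, base_plot, and scatter are hypothetical names and the values are invented.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(weight = c(40, 7, 44, 180),
                  hindfoot_length = c(36, 15, 37, 32))

# data + aesthetics only: axes are set up, but nothing is drawn yet
base_plot <- ggplot(data = toy, aes(x = weight, y = hindfoot_length))

# Adding a geom with + returns a new plot object that draws points
scatter <- base_plot + geom_point()

# layer_data() returns the values ggplot2 computed for the first layer
head(layer_data(scatter))
```

Typing scatter at the console (or calling print(scatter)) draws the plot. Note that the + must come at the end of a line, not at the start of the next one.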
Basic
    ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
      geom_point()
Add transparency
    ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
      geom_point(alpha = 0.1)
Add color
    ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
      geom_point(alpha = 0.1, color = "blue")
Add color by species
    ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
      geom_point(alpha = 0.1, aes(color = species_id))
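The last two plots differ in where color is given. Outside aes() it is a fixed value for every point. Inside aes() it is mapped from a variable. A sketch on invented data (toy, fixed_colour, and mapped_colour are hypothetical names):

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  weight = c(40, 7, 44, 180),
  hindfoot_length = c(36, 15, 37, 32),
  species_id = c("DM", "PF", "DM", "NL")
)

# Set outside aes(): every point is blue and no legend is drawn
fixed_colour <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(color = "blue")

# Mapped inside aes(): one colour per species_id, with a legend
mapped_colour <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = species_id))

unique(layer_data(fixed_colour)$colour)
unique(layer_data(mapped_colour)$colour)
```

A common slip is writing aes(color = "blue"), which treats "blue" as a data value and maps it to the default palette instead of drawing blue points.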
Example: Box plot
    ggplot(data = surveys_complete, aes(x = species_id, y = hindfoot_length)) +
      geom_boxplot()
Example: Time series data
Reshape data: count the records for each species in each year
    yearly_counts <- surveys_complete %>%
      group_by(year, species_id) %>%
      tally()
Plot
    ggplot(data = yearly_counts, aes(x = year, y = n)) +
      geom_line()
Separate by species
    ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id)) +
      geom_line()
Color by species
    ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id, colour = species_id)) +
      geom_line()
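The reshaping step produces one row per year and species, with the count in a column named n. Without a group, geom_line joins all of those rows into a single zigzag line. A sketch on invented data (toy, yearly_counts_toy, and yearly_counts_alt are hypothetical names):

```r
library(dplyr)

# Hypothetical toy records: one row per captured animal
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996),
  species_id = c("DM", "DM", "PF", "DM", "PF"),
  stringsAsFactors = FALSE
)

# One row per (year, species_id) pair; tally() stores the count in column n
yearly_counts_toy <- toy %>%
  group_by(year, species_id) %>%
  tally()

# count() is dplyr's shorthand for group_by() followed by tally()
yearly_counts_alt <- toy %>%
  count(year, species_id)

yearly_counts_toy
```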
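Applying a premade theme
    # yearly_sex_counts is not defined on the earlier slides; it is assumed
    # to hold yearly counts that include a sex column
    ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex, group = sex)) +
      geom_line() +
      theme_bw()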
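The slides do not show how yearly_sex_counts is built. One plausible construction, offered as an assumption rather than the presenter's code, groups by year, species, and sex. It then uses faceting (listed in the outline) to give each species its own panel. The toy data is invented.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy records (values invented for illustration)
toy <- data.frame(
  year = c(1995, 1995, 1996, 1996, 1995, 1996),
  species_id = c("DM", "DM", "DM", "DM", "PF", "PF"),
  sex = c("F", "M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Assumed construction: counts per year, species, and sex
yearly_sex_counts <- toy %>%
  group_by(year, species_id, sex) %>%
  tally()

# facet_wrap() draws one panel per species; theme_bw() is a premade theme
faceted <- ggplot(yearly_sex_counts,
                  aes(x = year, y = n, color = sex, group = sex)) +
  geom_line() +
  facet_wrap(~ species_id) +
  theme_bw()
```

With counts broken down by species as well as sex, faceting keeps each line to a single species.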
Making it pretty (themes)
Remove background
    ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
      geom_line() +
      theme_bw() +
      theme(panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank(),
            panel.grid.major.y = element_blank(),
            panel.grid.minor.y = element_blank())
Customize axis labels
    ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
      geom_line() +
      labs(title = 'Species count over time',
           x = 'Year of observation',
           y = 'Count') +
      theme_bw()
Customize font size
    ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
      geom_line() +
      labs(title = 'Species count over time',
           x = 'Year of observation',
           y = 'Count') +
      theme_bw() +
      theme(text = element_text(size = 16, family = "Arial"))
Rotate axis labels
    ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
      geom_line() +
      labs(title = 'Species count over time',
           x = 'Year of observation',
           y = 'Count') +
      theme_bw() +
      theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                       hjust = .5, vjust = .5),
            axis.text.y = element_text(colour = "grey20", size = 12),
            text = element_text(size = 16, family = "Arial"))
Create your own theme
    arial_grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                                         hjust = .5, vjust = .5),
                              axis.text.y = element_text(colour = "grey20", size = 12),
                              text = element_text(size = 16, family = "Arial"))

    ggplot(surveys_complete, aes(x = species_id, y = hindfoot_length)) +
      geom_boxplot() +
      arial_grey_theme
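A saved theme can also start from a premade one, so the background and the text settings travel together. In this sketch my_theme, toy, and boxes are hypothetical names and the data is invented. The font family is left out because available fonts vary by system.

```r
library(ggplot2)

# Hypothetical reusable theme: start from theme_bw(), then override pieces
my_theme <- theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                   hjust = 0.5, vjust = 0.5),
        panel.grid = element_blank())

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(species_id = c("DM", "DM", "PF", "PF"),
                  hindfoot_length = c(36, 37, 15, 16))

# The stored theme is added to a plot like any other layer
boxes <- ggplot(toy, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  my_theme
```

Order matters here: a complete theme such as theme_bw() replaces earlier theme settings, so custom overrides go after it.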
Save your plot
    my_plot <- ggplot(data = yearly_counts, aes(x = year, y = n, color = species_id)) +
      geom_line() +
      labs(title = 'Observed species in time',
           x = 'Year of observation',
           y = 'Number of species') +
      theme_bw() +
      theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                                       hjust = .5, vjust = .5),
            axis.text.y = element_text(colour = "grey20", size = 12),
            text = element_text(size = 16, family = "Arial"))

    ggsave("name_of_file.png", my_plot, width = 15, height = 10)
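ggsave picks the file type from the extension and reads width and height in inches unless a units argument says otherwise. A minimal sketch that saves to a temporary folder and confirms the file was written; toy, toy_plot, and out_file are hypothetical names.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(year = c(1995, 1996, 1997), n = c(10, 14, 9))
toy_plot <- ggplot(toy, aes(x = year, y = n)) +
  geom_line()

# Hypothetical output path inside R's temporary directory
out_file <- file.path(tempdir(), "toy_plot.png")

# 6 x 4 inches at 150 dpi gives a 900 x 600 pixel image
ggsave(out_file, toy_plot, width = 6, height = 4, dpi = 150)

file.exists(out_file)
```

Passing the plot object explicitly, as the slide does, avoids relying on whichever plot was displayed last.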
Need help?
• Email: [email protected]
• Data Management Services website: http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
  • R Ecology Lesson: http://www.datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html
• Cheat Sheets:
  • Data wrangling: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
  • ggplot2: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf