Pure, predictable, pipeable: creating fluent interfaces with...
Transcript of Pure, predictable, pipeable: creating fluent interfaces with...
![Page 1: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/1.jpg)
Hadley Wickham @hadleywickham Chief Scientist, RStudio
Pure, predictable, pipeable: creating fluent interfaces with R
January 2015
![Page 2: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/2.jpg)
Hadley Wickham @hadleywickham Chief Scientist, RStudio
MOAH PIPES!
January 2015
![Page 3: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/3.jpg)
%>%magrittr::
![Page 4: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/4.jpg)
x %>% f(y) # f(x, y)
x %>% f(z, .) # f(z, x)
x %>% f(y) %>% g(z) # g(f(x, y), z)
# Turns function composition (hard to read) # into sequence (easy to read)
![Page 5: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/5.jpg)
foo_foo <- little_bunny()
foo_foo %>% hop_through(forest) %>% scoop_up(field_mouse) %>% bop_on(head)
# vs bop_on( scoop_up( hop_through(foo_foo, forest), field_mouse ), head )
![Page 6: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/6.jpg)
# From http://zevross.com/blog/2015/01/13/a-new-data-processing-workflow-for-r-dplyr-magrittr-tidyr-ggplot2/
library(dplyr) library(tidyr)
word_count <- shakespeare %>% group_by(word) %>% summarize(count=n(), total = sum(word_count)) %>% arrange(desc(total))
top8 <- shakespeare %>% semi_join(head(word_count, 8)) %>% select(-corpus_date) %>% spread(word, word_count, fill = 0)
![Page 7: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/7.jpg)
# Pipes for web scraping (hadley) library(rvest) lego_movie <- html("http://www.imdb.com/title/tt1490017/")
rating <- lego_movie %>% html_nodes("strong span") %>% html_text() %>% as.numeric()
cast <- lego_movie %>% html_nodes("#titleCast .itemprop span") %>% html_text()
poster <- lego_movie %>% html_nodes("#img_primary img") %>% html_attr("src")
![Page 8: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/8.jpg)
# Functional programming pipes (hadley + lionelgit) library(lowliner)
mtcars %>% split(.$cyl) %>% map(~ lm(mpg ~ wt, data = .)) %>% map(summary) %>% map_v("r.squared")
![Page 9: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/9.jpg)
# Control a digitalocean machine (sckott + hadley) library(analogsea)
droplet_create("my-droplet") %>% droplet_power_off() %>% droplet_snapshot() %>% droplet_power_on() %>%
![Page 10: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/10.jpg)
# Create, modify & delete gists (sckott) library(gistr)
gists("minepublic") %>% .[[1]] %>% add_files("~/alm_othersources.md") %>% update()
# http://recology.info/2015/01/gistr-github-gists/
![Page 11: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/11.jpg)
# Ensure objects are of correct type (smbache) library(ensurer) the_matrix <- get_matrix() %>% ensure_that(is.numeric(.), NCOL(.) == NROW(.))
![Page 12: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/12.jpg)
Goal: Solve complex problems by combining simple pieces.
![Page 13: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/13.jpg)
http://brickartist.com/gallery/pc-magazine-computer/. CC-BY-NC
![Page 14: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/14.jpg)
Principles
• Pure: each function is easy to understand in isolation.
• Predictable: once you've understood one, you've understood them all.
• Pipeable: combine simple pieces with a standard tool (%>%).
![Page 15: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/15.jpg)
PureGoal: each function can be easily understood in isolation
![Page 16: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/16.jpg)
A function is pure if:(a) Its output only depends on
its inputs(b) It makes no changes to the
state of the world
1 minute: what common R functions are impure?
![Page 17: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/17.jpg)
# Lots of important functions are impure:
# Outputs don't depend only on inputs runif(10) read.csv() Sys.time()
# Make changes to the world library() write.csv() plot() options() source() reference classes environments
![Page 18: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/18.jpg)
Why?
• Easier to reason about because you can understand them in isolation
• Trivial to parallelise
• Trivial to memoise (cache)
![Page 19: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/19.jpg)
How?• There are a lot of useful things you can’t
do with purity
• But you usually can isolate impurity to a handful of functions
• Doing so leads to code that’s easier to understand and easier to repurpose
• Case study: plot.lm() vs. fortify.lm()
![Page 20: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/20.jpg)
fortify.lm <- function(model, data = model$model, ...) { infl <- influence(model, do.coef = FALSE) data$.hat <- infl$hat data$.sigma <- infl$sigma data$.cooksd <- cooks.distance(model, infl)
data$.fitted <- predict(model) data$.resid <- resid(model) data$.stdresid <- rstandard(model, infl)
data }
See also https://github.com/dgrtwo/broom
![Page 21: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/21.jpg)
PredictableGoal: once you’ve mastered one member of a class you’ve mastered them all
![Page 22: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/22.jpg)
#rstats #wat
c(1, 2, 3) c("a", "b", "c") c(factor("a"), factor("b"))
diag(4:1) diag(4:2) diag(4:3) diag(4:4)
nchar("NA") nchar(NA)
![Page 23: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/23.jpg)
# But more problematic grepl(pattern, x) gsub(pattern, replacement, x) gregexpr(pattern, replacement, x) strsplit(x, pattern)
# CSVs read.csv(file) write.csv(x, file)
# Tabular data read.table(file) write.table(x, file)
# RDS files readRDS(file) saveRDS(x, file)
![Page 24: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/24.jpg)
dcast() melt()
spread() gather()
dplyr vs ggplot2 vs ggvis
![Page 25: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/25.jpg)
Why?• Because once you’ve understood one
member of a family you understand them all
• You don’t need to memorise special cases
• Easier to teach general principles which can be applied in multiple places
![Page 26: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/26.jpg)
How?• Punctuation
(snake_case or camelCase: pick one!)
• Function names• Verb vs. noun
• Tense, plural vs. singular
• UK english
• Think about autocomplete
• Argument names & order
• Object types
![Page 27: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/27.jpg)
# It's not possible to consistent in every direction
# Would be nice if first argument was file read.csv(file) write.csv(x, file)
# Would be nice if first argument was a data frame mutate(x) filter(x) write.csv(x, file)
# You can't reconcile conflicting notions # of consistency. But important to be aware and # consciously make tradeoff.
![Page 28: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/28.jpg)
PipeableGoal: combine simple pieces with a standard tool
![Page 29: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/29.jpg)
Why?
• Adds predictability across packages and across authors
• Learn once and apply in many situations
• Package doesn’t need to know about %>% to make use of it
![Page 30: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/30.jpg)
How?• The object being transformed should
passed in as the first argument and returned from each object
• Must identify what the key object is.
• Can transform from one type of object to another, but must be clearly signposted
![Page 31: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/31.jpg)
Examples• dplyr: data frames
• ggvis: visualisations
• gistr: GitHub gists
• lowliner: vectors
• tidyr: messy data → tidy data
• rvest: html page, lists of nodes, single nodes
• analogsea: droplets, actions
![Page 32: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/32.jpg)
magrittrExtensions and limitations
![Page 33: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/33.jpg)
Linear sequence is easiest to understand
Sometimes other objects are merged in
Can bring in sub-pipes, in moderation
YES YES YES
![Page 34: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/34.jpg)
There is one primary path One output No cycles
NO NO NO
![Page 35: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/35.jpg)
One exception: replace the initial element with %<>%
mtcars %<>% mutate( displ = displ * 0.0164 )
# Shorthand for mtcars <- mtcars %>% mutate( displ = displ * 0.0164 )
![Page 36: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/36.jpg)
Frequently repeating the same pipeline with different
input data?
.
Make a function!
to_metric <- . %>% mutate( displ = displ * 0.0164 ) mtcars %>% to_metric() mtcars %<>% to_metric()
# Shorthand for to_metric <- function(x) { x %>% mutate( displ = displ * 0.0164 ) }
![Page 37: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/37.jpg)
Conclusion
![Page 38: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/38.jpg)
To make your own fluent interfaces
• Make simple functions that are easily understood in isolation
• Make sure they all work the same way. Think about verbs & nouns.
• Combine them together with %>%
![Page 39: Pure, predictable, pipeable: creating fluent interfaces with Rfiles.meetup.com/1225993/Hadley_Wickham_pipe-dsls.pdf · write.csv(x, file) # Would be nice if first argument was a](https://reader033.fdocuments.net/reader033/viewer/2022050223/5f68a753d6ee5d758930fb7b/html5/thumbnails/39.jpg)
Questions?https://github.com/smbache/magrittr
More about magrittr