sparklyr - Jeff Allen
THE R LANGUAGE
• 5th most popular programming language in the world [1]
• Data analysis, statistical modeling, visualization
• Great interface to your data
• Historically limited to in-memory data
[1] http://spectrum.ieee.org/computing/software/the-2016-top-programming-languages
WHAT IS SPARK?
• Open-source Apache computing engine
• Bigger-than-memory data, low-latency distributed computing
• Can integrate with the Hadoop ecosystem
• Built-in machine learning
SPARKLYR
• New open-source R package from RStudio
• Complete dplyr back-end for Spark
• Integrated with the RStudio IDE
• Extensible foundation for Spark + R
KICK THE TIRES
library(sparklyr)
spark_install()
sc <- spark_connect(master = "local")
my_tbl <- copy_to(sc, iris)
[Diagram: Spark architecture with a Driver Program (Spark Context), a Cluster Manager, and a Worker running an Executor with Tasks and a Cache]
USE DPLYR TO WRITE SPARK SQL
library(dplyr)

# use standard verbs to filter rows and select columns
select(filter(my_tbl, Petal_Width < 0.3), Petal_Length, Petal_Width)

# use magrittr pipes for a cleaner syntax
my_tbl %>%
  filter(Petal_Width < 0.3) %>%
  select(Petal_Length, Petal_Width)
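One way to see the dplyr-to-SQL translation for yourself is to ask dplyr to print the query it generated before running it. The following is a minimal sketch, assuming a local Spark installation (for example one created by spark_install()) and the same iris data used on the earlier slide; the names small_petals and local_df are illustrative and do not come from the talk.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, e.g. via spark_install().
sc <- spark_connect(master = "local")
my_tbl <- copy_to(sc, iris, overwrite = TRUE)

# Build the query lazily; nothing is computed in Spark yet.
small_petals <- my_tbl %>%
  filter(Petal_Width < 0.3) %>%
  select(Petal_Length, Petal_Width)

# Print the SQL that dplyr generated for Spark.
show_query(small_petals)

# Run the query in Spark and bring the result back as a local data frame.
local_df <- collect(small_petals)

spark_disconnect(sc)
```

Keeping the pipeline lazy until collect() means the filtering happens inside Spark, and only the reduced result is transferred to the R session.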
RUN IN PRODUCTION
[Diagram: Spark Cluster with a Master Node (Driver Program, Spark Context), a Cluster Manager, and two Worker Nodes, each running an Executor with Tasks and a Cache]
Use RStudio Server on the Spark cluster master node
> sc <- spark_connect(master = "spark://spark.company.org:7077")
> my_tbl <- tbl(sc, "tblname")
SPARKLYR FUNCTIONALITY
• Full dplyr back-end for Spark DataFrames
• R wrappers for all MLlib functions
• Easily leverage from R Markdown, Shiny, etc.
• IDE integration
• Windows, too!
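The MLlib wrappers listed above can be tried on the same iris table. This is a rough sketch rather than anything shown in the talk: it assumes a local Spark installation and a sparklyr release that accepts a formula in ml_linear_regression() and provides ml_predict(); the names iris_tbl, fit, and preds are illustrative.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, e.g. via spark_install().
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

# Fit a linear regression in Spark through the MLlib wrapper.
# Assumption: this sparklyr version supports the formula interface.
fit <- ml_linear_regression(iris_tbl, Petal_Length ~ Petal_Width)

# Score the same table in Spark, then collect a few rows locally.
preds <- ml_predict(fit, iris_tbl) %>%
  select(Petal_Length, prediction) %>%
  head(5) %>%
  collect()

spark_disconnect(sc)
```

The model is fitted and scored inside Spark; only the five collected rows travel back to R, which is the same pattern the dplyr back-end uses.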
SPARKLYR & H2O
• rsparkling: R interface to Sparkling Water Spark package
• A sparklyr extension
• http://spark.rstudio.com/h2o.html
RELATIONSHIP TO SPARKR
• Working together to establish a common extension API
• Some differences in approach:
  • CRAN distribution
  • dplyr compatibility
APPLICATIONS
http://spark.rstudio.com/examples.html
QUESTIONS?
http://spark.rstudio.com
@trestleJeff
https://github.com/trestletech/user2016-sparklyr