Post on 22-Dec-2015
SAS to R Migration
Richard Pugh
Commercial Director
rpugh@mango-solutions.com
rich@mango-solutions.com
Agenda
• What is SAS?
• Why Migrate from SAS to R?
• Case Study: Major Financial Company
• How to Migrate from SAS to R?
• Questions
> fortune("SUV")
When talking about user friendliness of computer software I like the analogy of cars vs. busses: [...]
Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.
R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS. -- Greg Snow R-help (May 2006)
Why Migrate to R?
Why NOT migrate?
Case Study: Major Financial Firm
Background
• SAS User Base
• Mature SAS Processes
• Leverage Oracle Investment
Can Oracle R Enterprise replace SAS?
Initial PoC
• Key SAS process: Credit Risk
• 1,625 lines of SAS Code
  • 79 “data steps”
  • 66 “procedure” calls
  • 29 macros
• Passionate SAS User Community
Initial PoC
Theme / Questions
Capabilities
• Does R/ORE provide all the SAS capabilities required?
• What gaps, if any, exist between R and SAS?
• Where, and why, do results differ between R and SAS?
Workflow
• How does the “style” of coding differ between R and SAS?
• How easy, or not, was it to implement the existing SAS workflow?
Skills
• How much learning is required in order to become proficient in R?
• How much learning is required to take on and manage the R implementation of the modelling macros?
Extensions
• What areas of “value add” arise from using R for these tasks?
• What areas of “value add” outside of the current scope are enabled by using R?
Oracle R Enterprise
• In-database implementation of R
• Very appealing: take R to the database
• Features of ORE
  • ROracle Implementation
  • Transparency Layer
  • Publish functions to database (access via R, SQL)
• Learn more at www.oracle.com/goto/R
How to Migrate from SAS to R?
What to Migrate?
• Doesn’t happen overnight!
• Choose a key first step
  • Functional Area
  • Capability (e.g. graphics, time series)
Step 1: Analyse SAS Code
SAS Code Analysis
%macro Macro1;
  data a; set b; run;
%mend;
…
1,500 lines of code
…
%macro Macro2;
  data c; set a; run;
%mend;
• Can be complex
• Scoping rules in particular can be a challenge
Use R to Analyse SAS Code
SAS Dependencies with functionMap
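The idea of using R to analyse SAS code can be illustrated with a small base R sketch. It does not use functionMap, whose interface is not shown in these slides. The helper names `extract_all`, `dataset_usage` and `macro_dependencies` are hypothetical, and the regular expressions assume simple one-level dataset names such as `a` and `b` from the example above; real SAS code (library references, macro variables, nested macros) would need a proper parser.

```r
# The two macros from the "SAS Code Analysis" slide, one statement group per line.
sas_code <- c(
  "%macro Macro1;",
  "  data a; set b; run;",
  "%mend;",
  "%macro Macro2;",
  "  data c; set a; run;",
  "%mend;"
)

# Return the first capture group of `pattern` for every match found in `text`.
extract_all <- function(pattern, text) {
  m <- regmatches(text, gregexpr(pattern, text, perl = TRUE, ignore.case = TRUE))[[1]]
  sub(pattern, "\\1", m, perl = TRUE, ignore.case = TRUE)
}

# Record which datasets each macro writes (data X;) and reads (set Y;).
dataset_usage <- function(lines) {
  usage <- data.frame(macro = character(0), dataset = character(0),
                      role = character(0), stringsAsFactors = FALSE)
  current <- NA_character_
  for (ln in lines) {
    started <- extract_all("%macro\\s+(\\w+)", ln)
    if (length(started) > 0) { current <- started[1]; next }
    if (grepl("%mend", ln, ignore.case = TRUE)) { current <- NA_character_; next }
    if (is.na(current)) next
    writes <- extract_all("\\bdata\\s+(\\w+)\\s*;", ln)
    reads  <- extract_all("\\bset\\s+(\\w+)\\s*;", ln)
    if (length(writes) + length(reads) > 0) {
      usage <- rbind(usage, data.frame(
        macro = current,
        dataset = c(writes, reads),
        role = rep(c("writes", "reads"), c(length(writes), length(reads))),
        stringsAsFactors = FALSE
      ))
    }
  }
  usage
}

# A macro depends on another macro if it reads a dataset the other one writes.
macro_dependencies <- function(usage) {
  w <- usage[usage$role == "writes", c("macro", "dataset")]
  r <- usage[usage$role == "reads", c("macro", "dataset")]
  deps <- merge(r, w, by = "dataset", suffixes = c("_reader", "_writer"))
  deps <- deps[deps$macro_reader != deps$macro_writer, ]
  data.frame(macro = deps$macro_reader, depends_on = deps$macro_writer,
             via = deps$dataset, stringsAsFactors = FALSE)
}

deps <- macro_dependencies(dataset_usage(sas_code))
print(deps)
```

On the slide example this reports that Macro2 depends on Macro1 through dataset `a`, which is the kind of hidden link that is hard to spot when the two macros sit 1,500 lines apart.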
Step 2: Tame the SAS Code
Tame your SAS Code
• Version Control
• Unit Tests
• Continuous Integration
Step 3: Translate the Code
Translate the Code
• Translate the Unit Tests first
• Then, translate macros one at a time
• Proc translations can be partially automated, but care must be taken
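As an illustration of translating the unit tests first, here is a minimal base R sketch of a test for the `sampler()` function that appears later in these slides. The test function `test_sampler`, the input data and the tolerance are all hypothetical; the checks are properties one would also expect of the SAS macro's output (same rows, same order, a 0/1 holdout flag, roughly 20% held out), not tests taken from the original project.

```r
# sampler() as translated on the slides, repeated so this sketch runs on its own.
sampler <- function(ds, DEVPERC = .8, hCol = "HOLDOUT") {
  N <- nrow(ds)
  holdTest <- runif(N) > DEVPERC
  ds[[hCol]] <- as.numeric(holdTest)
  outDf <- aggregate(list(Freq = ds[[hCol]]), ds[hCol], length)
  print(transform(outDf, Percent = round(100 * Freq / N, 2)))
  invisible(ds)
}

# Hypothetical unit test: returns TRUE if every check passes, otherwise stops with an error.
test_sampler <- function() {
  set.seed(54321)                                   # make the random draw repeatable
  input <- data.frame(id = 1:1000, weight = 1)      # made-up input data
  invisible(capture.output(out <- sampler(input)))  # silence the printed frequency table
  stopifnot(
    nrow(out) == nrow(input),            # no rows gained or lost
    identical(out$id, input$id),         # original row order preserved
    all(out$HOLDOUT %in% c(0, 1)),       # flag is binary
    abs(mean(out$HOLDOUT) - 0.2) < 0.06  # roughly 1 - DEVPERC held out
  )
  TRUE
}

test_result <- test_sampler()
```

In practice a testing package would normally replace the bare `stopifnot()` calls; base R is used here only to keep the sketch free of dependencies.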
%macro sampler(DS=);
  data random;
    set datalib.&DS.;
    xxx=ranuni(54321);
    origorder + 1;
  run;
  proc sort data=random; by xxx; run;
  data datalib.&DS.;
    set random nobs=numg;
    if _n_ le &DEVPERC.*numg then Holdout=0;
    else Holdout=1;
  run;
  proc sort data=datalib.&DS.; by origorder; run;
  proc freq data=datalib.&DS.;
    tables Holdout /missing;
    weight weight;
  run;
%mend sampler;

sampler <- function(ds, DEVPERC = .8, hCol = "HOLDOUT") {
  N <- nrow(ds)
  holdTest <- runif(N) > DEVPERC
  ds[[hCol]] <- as.numeric(holdTest)
  outDf <- aggregate(list(Freq = ds[[hCol]]), ds[hCol], length)
  print(transform(outDf, Percent = round(100 * Freq / N, 2)))
  invisible(ds)
}
17 SAS Lines > 8 R Lines
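The shorter R version is not a line-for-line equivalent: the SAS macro shuffles the rows and flags an exact share of them, and its frequency table is weighted, whereas `runif(N) > DEVPERC` gives only an approximate split and an unweighted count. A sketch of a closer translation follows. The name `sampler_exact` and the `wCol` and `seed` arguments are hypothetical, the input is assumed to contain a weight column, and R's random number stream will not reproduce the rows SAS selects with `ranuni(54321)`.

```r
# Hypothetical closer translation of the SAS sampler macro.
sampler_exact <- function(ds, DEVPERC = 0.8, hCol = "HOLDOUT",
                          wCol = "weight", seed = 54321) {
  set.seed(seed)                        # fixed seed; note this resets the global RNG state
  N <- nrow(ds)
  shuffled <- order(runif(N))           # random permutation, like sorting by xxx
  nDev <- floor(DEVPERC * N)            # rows with _n_ le DEVPERC * numg
  hold <- rep(1, N)
  hold[shuffled[seq_len(nDev)]] <- 0    # development rows get Holdout = 0
  ds[[hCol]] <- hold                    # rows stay in their original order

  # Weighted frequency table, in the spirit of PROC FREQ with a WEIGHT statement.
  freq <- tapply(ds[[wCol]], ds[[hCol]], sum)
  outDf <- data.frame(flag = as.numeric(names(freq)), Freq = as.vector(freq))
  names(outDf)[1] <- hCol
  outDf$Percent <- round(100 * outDf$Freq / sum(outDf$Freq), 2)
  print(outDf)
  invisible(ds)
}

# Usage on made-up data: exactly 8 of 10 rows are flagged as development rows.
d <- data.frame(id = 1:10, weight = rep(2, 10))
res <- sampler_exact(d)
```

Whether the exact split matters depends on how the holdout flag is used downstream; this is the kind of difference the translated unit tests are meant to surface.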
Step 4: Use Oracle R Enterprise
Oracle R Enterprise
• Remove code to import/export from database
• Replace with links to the database
• Look for other opportunities (e.g. using in-database GLM vs standard)
Oracle R Enterprise
library(ORE)                      # Load the library
ore.connect(…)                    # Make the connection
…
ore.create(newData, table = "X")  # Create new db table
X[1:5, ]                          # Simple Command
…
# Define function to run
theFun <- function(x, F, ...) step(ore.glm(F, data = x, family = "binomial"), direction = "both")
# Run the model
stepOut <- ore.tableApply(X, theFun, F = as.formula("DV ~ ."))
…
ore.disconnect()
Review
[Diagram: Analysis Code and Unit Tests (shown twice), Oracle R Enterprise, SQL Interface]
Findings
• A formal migration process allows for a clear and accurate transition
• SAS code conversion to R at a rate of ~200 lines per day
• Code base reduces by ~55%
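As a rough back-of-envelope check, the figures quoted in these slides can be combined as follows. This assumes, without confirmation from the source, that the conversion rate and the reduction apply uniformly to the 1,625-line PoC; the variable names are illustrative only.

```r
sas_lines    <- 1625   # size of the PoC code base (from the Initial PoC slide)
rate_per_day <- 200    # approximate conversion rate (from Findings)
reduction    <- 0.55   # approximate code base reduction (from Findings)

days_needed      <- sas_lines / rate_per_day     # about 8 working days
r_lines_expected <- sas_lines * (1 - reduction)  # about 730 lines of R

# The sampler example on its own: 17 SAS lines became 8 R lines.
sampler_reduction <- 1 - 8 / 17                  # about 53%

print(c(days = days_needed, r_lines = r_lines_expected,
        sampler_reduction_pct = round(100 * sampler_reduction, 1)))
```

The sampler example's reduction of roughly 53% is in line with the ~55% quoted for the code base as a whole.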
Challenges
• More relaxed formal scoping of SAS
• Differences in statistical algorithms
• The danger of migrating poor code flows
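One generic illustration of how statistical algorithms can differ: even a quantile has several accepted definitions, so two correct implementations can return different numbers for the same data. The sketch below uses only R's own `quantile()` and its `type` argument; which definition a given SAS procedure applies depends on the procedure and its options and should be checked rather than assumed.

```r
x <- c(1, 2, 3, 4)

# R's default definition (type = 7) interpolates between order statistics.
q_default <- quantile(x, probs = 0.25, names = FALSE)

# Type 2 uses the empirical distribution function, averaging at discontinuities.
q_type2 <- quantile(x, probs = 0.25, type = 2, names = FALSE)

print(c(type7 = q_default, type2 = q_type2))   # 1.75 versus 1.50
```

When migrated results disagree in the second or third significant digit, a mismatch of this kind is worth ruling out before suspecting a translation bug.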
Code Migration isn’t just technical …
SAS Migration is more about people …
Why are these business users so defensive? It’s just a computer language!!
Taking away SAS means taking away my ability to do analysis!!
Convincing People to move to R
• Concede some ground …
• Show quick wins
• Teach the basic data structures early
SAS to R Brain Dump …
Questions?