Agenda
• What is SAS?
• Why Migrate from SAS to R?
• Case Study: Major Financial Company
• How to Migrate from SAS to R?
• Questions
> library(fortunes)
> fortune("SUV")
When talking about user friendliness of computer software I like the analogy of cars vs. busses: [...]
Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.
R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.
-- Greg Snow, R-help (May 2006)
Why Migrate to R?
Why NOT migrate?
Case Study: Major Financial Firm
Background
• SAS User Base
• Mature SAS Processes
• Leverage Oracle Investment
Can Oracle R Enterprise replace SAS?
Initial PoC
• Key SAS process: Credit Risk
• 1,625 lines of SAS code
  • 79 “data steps”
  • 66 “procedure” calls
  • 29 macros
• Passionate SAS User Community
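Inventory figures like these can be gathered with a few lines of R. The helper below, `countSasBlocks`, is a hypothetical sketch (it is not from the case study): it splits SAS source on semicolons and counts statements that open a DATA step, a PROC or a macro definition. Semicolons inside comments, quoted strings or macro quoting are not handled, so treat the counts as approximate.

```r
# Hypothetical helper: rough counts of DATA steps, PROC calls and macro
# definitions in SAS source. Splitting on ';' ignores comments and quoting.
countSasBlocks <- function(lines) {
  code  <- paste(lines, collapse = "\n")
  stmts <- trimws(strsplit(code, ";", fixed = TRUE)[[1]])
  c(dataSteps = sum(grepl("^data\\s", stmts, ignore.case = TRUE)),
    procCalls = sum(grepl("^proc\\s", stmts, ignore.case = TRUE)),
    macros    = sum(grepl("^%macro\\s", stmts, ignore.case = TRUE)))
}

# Usage (hypothetical file name):
# countSasBlocks(readLines("credit_risk.sas"))
```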
Themes and Questions
Capabilities
• Does R/ORE provide all the SAS capabilities required?
• What gaps, if any, exist between R and SAS?
• Where, and why, do results differ between R and SAS?
Workflow
• How does the “style” of coding differ between R and SAS?
• How easy, or not, was it to implement the existing SAS workflow?
Skills
• How much learning is required in order to become proficient in R?
• How much learning is required to take on and manage the R implementation of the modelling macros?
Extensions
• What areas of “value add” arise from using R for these tasks?
• What areas of “value add” outside of the current scope are enabled by using R?
Oracle R Enterprise
• In-database implementation of R
• Very appealing: take R to the database
• Features of ORE
  • ROracle implementation
  • Transparency layer
  • Publish functions to the database (access via R, SQL)
• Learn more at www.oracle.com/goto/R
How to Migrate from SAS to R?
What to Migrate?
• Doesn’t happen overnight!
• Choose a key first step
  • Functional area
  • Capability (e.g. graphics, time series)
SAS Code Analysis
%macro Macro1;
data a; set b; run;
%mend;
…
1,500 lines of code
…
%macro Macro2;
data c; set a; run;
%mend;
• Can be complex
• Scoping rules in particular can be a challenge
Use R to Analyse SAS Code
SAS Dependencies with functionMap
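The functionMap output itself is not reproduced here. As an illustration of the idea, the sketch below is a hypothetical base-R helper (not functionMap's API) that records which data sets each SAS macro reads (SET) and writes (DATA). It then derives macro-to-macro dependencies: a macro depends on another if it reads a data set the other writes. It handles only simple `data x; set y;` patterns and splits on semicolons, so comments and quoted semicolons will confuse it.

```r
# Hypothetical helper: data-set edges (from -> to) per SAS macro.
sasDeps <- function(lines) {
  stmts <- trimws(strsplit(paste(lines, collapse = "\n"), ";", fixed = TRUE)[[1]])
  stmts <- stmts[nzchar(stmts)]
  macro <- NA_character_   # macro currently being defined
  out   <- NA_character_   # data set written by the current DATA step
  edges <- data.frame(macro = character(0), from = character(0),
                      to = character(0), stringsAsFactors = FALSE)
  for (s in stmts) {
    if (grepl("^%macro\\s", s, ignore.case = TRUE)) {
      macro <- sub("^%macro\\s+(\\w+).*$", "\\1", s, ignore.case = TRUE)
    } else if (grepl("^%mend", s, ignore.case = TRUE)) {
      macro <- NA_character_
    } else if (grepl("^data\\s", s, ignore.case = TRUE)) {
      out <- sub("^data\\s+(\\S+).*$", "\\1", s, ignore.case = TRUE)
    } else if (grepl("^run$", s, ignore.case = TRUE)) {
      out <- NA_character_
    } else if (grepl("^set\\s", s, ignore.case = TRUE) && !is.na(out)) {
      src <- sub("^set\\s+(\\S+).*$", "\\1", s, ignore.case = TRUE)
      edges <- rbind(edges, data.frame(macro = macro, from = src, to = out,
                                       stringsAsFactors = FALSE))
    }
  }
  edges
}

# A macro (reader) depends on another (writer) if it reads what the other writes
macroDeps <- function(edges) {
  m <- merge(edges[, c("macro", "from")], edges[, c("macro", "to")],
             by.x = "from", by.y = "to", suffixes = c(".reader", ".writer"))
  keep <- with(m, !is.na(macro.reader) & !is.na(macro.writer) &
                  macro.reader != macro.writer)
  unique(m[keep, c("macro.writer", "macro.reader")])
}
```

Applied to the two-macro example from the code-analysis slide, Macro2 reads `a`, which Macro1 writes, so Macro1 has to be translated (and run) first.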
Step 2: Tame the SAS Code
Step 3: Translate the Code
• Translate the unit tests first
• Then translate the macros one at a time
• PROC translations can be partially automated, but care must be taken
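To make "partially automated, but care must be taken" concrete, here is a hypothetical, deliberately narrow translator (`translateProcSort` is illustrative, not a tool from the case study). It handles only the form `proc sort data=<name>; by <vars>; run;` and refuses everything else (OUT=, NODUPKEY, DESCENDING, macro variables), so those cases are translated by hand. It also accounts for one behavioural difference: SAS sorts missing values first, while R's `order()` puts `NA` last unless `na.last = FALSE`.

```r
# Hypothetical narrow translator for one PROC SORT form; refuses anything else.
translateProcSort <- function(sasCode) {
  pattern <- paste0("^\\s*proc\\s+sort\\s+data\\s*=\\s*([A-Za-z_][A-Za-z0-9_.]*)\\s*;",
                    "\\s*by\\s+([^;]+);\\s*run\\s*;\\s*$")
  m <- regmatches(sasCode, regexec(pattern, sasCode, ignore.case = TRUE))[[1]]
  if (length(m) == 0) stop("Unsupported PROC SORT form: translate by hand")
  ds <- m[2]
  byVars <- strsplit(trimws(m[3]), "\\s+")[[1]]
  if (any(tolower(byVars) == "descending"))
    stop("DESCENDING not handled: translate by hand")
  # SAS sorts missing values first; R's order() needs na.last = FALSE for that
  sprintf("%s <- %s[order(%s, na.last = FALSE), ]",
          ds, ds, paste0(ds, "$", byVars, collapse = ", "))
}

translateProcSort("proc sort data=random ; by xxx; run;")
```

The refusals are the point of the design: a silent mistranslation is far costlier than an explicit "translate by hand" item on the to-do list.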
%macro sampler(DS=);
data random; set datalib.&DS.;
  xxx=ranuni(54321);
  origorder + 1;
run;
proc sort data=random; by xxx; run;
data datalib.&DS.;
  set random nobs=numg;
  if _n_ le &DEVPERC.*numg then Holdout=0;
  else Holdout=1;
run;
proc sort data=datalib.&DS.; by origorder; run;
proc freq data=datalib.&DS.;
  tables Holdout /missing;
  weight weight;
run;
%mend sampler;
sampler <- function(ds, DEVPERC = .8, hCol = "HOLDOUT") {
  N <- nrow(ds)
  holdTest <- runif(N) > DEVPERC                                 # TRUE for ~(1 - DEVPERC) of rows
  ds[[hCol]] <- as.numeric(holdTest)                             # 1 = holdout, 0 = development
  outDf <- aggregate(list(Freq = ds[[hCol]]), ds[hCol], length)  # row count per group
  print(transform(outDf, Percent = round(100 * Freq / N, 2)))
  invisible(ds)
}
17 SAS Lines > 8 R Lines
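The short R version does not reproduce the macro exactly. It draws an approximate split (`runif(N) > DEVPERC`) where SAS assigns exactly the first `DEVPERC*numg` rows after a random sort, it sets no seed, and it counts rows with `length` where PROC FREQ's WEIGHT statement sums a weight variable. A closer translation might look like the sketch below. `samplerExact`, `wCol` and `seed` are hypothetical names, and the weight column is assumed to be called `weight` as in the macro. Because R's generator is not RANUNI, the same seed still gives different draws from SAS.

```r
# Hypothetical closer translation of the SAS sampler macro.
samplerExact <- function(ds, DEVPERC = 0.8, hCol = "HOLDOUT",
                         wCol = "weight", seed = 54321) {
  set.seed(seed)                 # reproducible in R, but not the same draws as RANUNI
  N <- nrow(ds)
  pos <- sample.int(N)           # random position of each row (the sort by xxx)
  # Rows in positions 1..DEVPERC*N -> 0 (development), the rest -> 1 (holdout)
  ds[[hCol]] <- as.numeric(pos > DEVPERC * N)
  # Weighted frequencies, as PROC FREQ reports with a WEIGHT statement
  freq <- tapply(ds[[wCol]], ds[[hCol]], sum)
  print(data.frame(Holdout = names(freq), Freq = as.vector(freq),
                   Percent = round(100 * as.vector(freq) / sum(freq), 2)))
  invisible(ds)                  # rows were never reordered, so no re-sort is needed
}
```

Keeping the random key separate from the row order removes the need for the second PROC SORT by `origorder`.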
Step 4: Use Oracle R Enterprise
• Remove code that imports/exports data from the database
• Replace it with links to the database
• Look for other opportunities (e.g. using the in-database GLM instead of the standard one)
library(ORE)                        # Load the library
ore.connect(…)                      # Make the connection
…
ore.create(newData, table = "X")    # Create a new database table
X[1:5, ]                            # Simple command
…
# Define the function to run
theFun <- function(x, F, ...) step(ore.glm(F, data = x,
    family = "binomial"), direction = "both")
# Run the model (DV against all other columns)
stepOut <- ore.tableApply(X, theFun, F = as.formula("DV ~ ."))
…
ore.disconnect()
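Before running the model in the database, the same stepwise logic can be checked locally with the standard `stats::glm`. The sketch below is a stand-in for `theFun`, not ORE code. It swaps `ore.glm` for `glm`, runs on an ordinary data.frame, and uses simulated data in which only `x1` drives the binary `DV` (all names are hypothetical). The `environment(F)` line is an assumption-driven safeguard: with `direction = "both"`, `step()` refits candidate models by evaluating `data = x` in the formula's environment, so the formula is pointed at the function's frame where `x` lives.

```r
# Local stand-in for theFun: stepwise logistic regression with stats::glm
theFunLocal <- function(x, F, ...) {
  environment(F) <- environment()   # let step()/add1() find x when refitting
  step(glm(F, data = x, family = "binomial"), direction = "both", trace = 0)
}

# Simulated data: only x1 is related to the outcome
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$DV <- rbinom(n, 1, plogis(-0.5 + 1.2 * sim$x1))

stepOut <- theFunLocal(sim, DV ~ .)
coef(stepOut)
```

Results from the local and in-database fits can then be compared as a unit test of the ORE step.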
Review
[Diagram of the migration layers: Analysis Code, Unit Tests, Oracle R Enterprise, SQL Interface]
Findings
• A formal migration process allows for a clear and accurate transition
• SAS code conversion to R at a rate of ~200 lines per day
• Code base reduces by ~55%
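As a purely illustrative back-of-the-envelope check, applying these two rates to the 1,625-line credit-risk process from the PoC gives the following. The sampler slide (17 SAS lines to 8 R lines, a reduction of about 53%) is in line with the ~55% figure.

```latex
\[
\text{effort} \approx \frac{1625\ \text{lines}}{200\ \text{lines/day}} \approx 8.1\ \text{days},
\qquad
\text{R code size} \approx 1625 \times (1 - 0.55) \approx 731\ \text{lines}.
\]
```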
Challenges
• The more relaxed scoping rules of SAS
• Differences in statistical algorithms
• The danger of migrating poor code flows
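The first two challenges can be shown in a few lines of R. On scoping: the sampler macro reads `&DEVPERC` without declaring it, and in SAS a `%LET` inside a macro updates an existing global macro variable of the same name. In R, `<-` inside a function creates a local variable and leaves the global untouched (`<<-` would be needed to change it). On algorithms: sample quantiles depend on the chosen definition. SAS procedures offer several definitions (e.g. `PCTLDEF=` in PROC UNIVARIATE), so which R `type` matches a given SAS result should be checked, not assumed. The function names below are hypothetical.

```r
## 1. Scoping: implicit globals versus explicit arguments
DEVPERC <- 0.8
cutoffGlobal <- function(n) floor(DEVPERC * n)          # silently reads the global
cutoff <- function(n, devPerc = 0.8) floor(devPerc * n) # dependency made explicit

setPerc <- function() {
  DEVPERC <- 0.5    # creates a local variable; the global is untouched
  invisible(DEVPERC)
}
setPerc()
print(DEVPERC)      # still 0.8

## 2. Algorithms: the same data, two quantile definitions
x <- c(1, 2, 3, 4)
q7 <- quantile(x, 0.25)             # R default, type = 7 (linear interpolation)
q2 <- quantile(x, 0.25, type = 2)   # inverse empirical CDF with averaging
print(c(q7 = unname(q7), q2 = unname(q2)))
```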
Code Migration isn’t just technical …
SAS Migration is more about people …
Why are these business users so defensive? It’s just a computer language!!
Taking away SAS means taking away my ability to do analysis!!
Convincing People to move to R
• Concede some ground …
• Show quick wins
• Teach the basic data structures early
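A first lesson on data structures can lean on concepts SAS users already know. The mapping in the comments below is a loose analogy, not an exact equivalence, and the loan data is made up for illustration.

```r
# Hypothetical loan data, for illustration only
balance <- c(1200, 540, 3100)                 # vector: the values of one variable
grade <- factor(c("B", "A", "C"),             # factor: categorical variable with a
                levels = c("A", "B", "C"))    #   fixed set of levels (loosely, a format)
loans <- data.frame(id = 1:3, balance = balance, grade = grade)  # ~ a SAS data set

# DATA step style operations
loans$logBal <- log(loans$balance)            # new variable (assignment statement)
big <- loans[loans$balance > 1000, ]          # subsetting IF / WHERE
str(loans)

# list: holds objects of different kinds together, e.g. data plus summaries
results <- list(data = loans, n = nrow(loans), byGrade = table(loans$grade))
```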
SAS to R Brain Dump …
Questions?