Massive Predictive Modeling - Oracle Cloud
![Page 1: Massive Predictive Modeling - Oracle Cloud · Title: How to Use the PowerPoint Template Author: marcos arancibia Keywords: Oracle corporate Tagline Created Date: 7/5/2014 1:10:30](https://reader033.fdocuments.net/reader033/viewer/2022050107/5f4552bc1adab36b44647536/html5/thumbnails/1.jpg)
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Massive Predictive Modeling using Oracle R Technologies
Mark Hornick, Director, Oracle Advanced Analytics
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. Massive Predictive Modeling
2. Use cases
3. Enabling technologies
Quick Survey: How many models have you built in your lifetime?
• > 10
• > 100
• > 1,000
• > 10,000
• > 100,000
• > 1,000,000
[Figure: # Models versus Data Size (rows). Tick labels: 1, 100s, millions, billions. Region labels: “Specialized”, “Generalized”, and Massive Predictive Modeling.]
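[Figure: # Models versus Data Size (rows), with an added axis # Models per Entity. Tick labels: 1, 100s, millions, billions (first two axes); 1, 1000s (# Models per Entity). Region labels: “Broad coverage”, “Targeted”.]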
Massive Predictive Modeling - Goals
• Build one or more models per entity, e.g., customer
• Understand and/or predict entity behavior
• Aggregate results across entities, e.g., to assess future demand
[Figure: a grid of per-customer models whose results are summed over customers, $\sum_{cust=1}^{n}$, to give demand over time.]
Massive Predictive Modeling - Challenges
• Effectively dealing with “Big Data” – Hardware, software, network, storage
• Algorithms that scale and perform with Big Data
• Building “many” models in parallel
• Production deployment
• Storing and managing models
• Backup, recovery, and security
Use Cases
Predicting Customer Electricity Usage
Motivation: Energy Theft
Detecting patterns of meter tampering
• SA country loses US$4 billion per year due to energy theft
• Storage of information about which meters have been tampered with
• Analysis and decision making
• Forecast future behavior
Motivation: Different customers, different demands
• Each customer has different demand and consumption patterns
• Storage of information about the consumption of each customer in different periods of the day
• Creation of a demand and consumption curve for each customer
• Analysis: in which period will the company have to deliver more energy?
• Price electricity in a given period
• Customer decides when to use energy to reduce cost
• Company redirects the energy to where it is most needed at the moment, saving on generation
Sensor Data Analysis
• Model each customer’s usage to understand behavior and predict individual usage and overall aggregate demand
• Consider 200K customers, each with a utility “smart meter”
• 1 reading / meter / hour
• 200K x 8760 hours / year → 1.752B readings
• 3 years' worth of data → 5.256B readings
• 26,280 readings per customer
• 10 seconds to build each model → 555.6 hours (23.2 days) … with 128 DOP → 4.3 hours
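The sizing figures above can be reproduced with a few lines of R. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the parallel estimate assumes an idealized linear speedup with the degree of parallelism (DOP), which a real system may not reach.

```r
# Back-of-the-envelope sizing for the smart meter scenario.
# All inputs are the figures quoted on the slide; names are illustrative.
customers      <- 200e3   # 200K customers, one smart meter each
hours_per_year <- 8760    # 1 reading / meter / hour
years          <- 3
secs_per_model <- 10      # assumed build time per model
dop            <- 128     # degree of parallelism

readings_per_year     <- customers * hours_per_year
readings_total        <- readings_per_year * years
readings_per_customer <- hours_per_year * years

# Serial build time, and an idealized parallel estimate (perfect linear speedup).
serial_hours   <- customers * secs_per_model / 3600
parallel_hours <- serial_hours / dop

cat(sprintf("readings per year:     %.3fB\n", readings_per_year / 1e9))
cat(sprintf("readings over 3 years: %.3fB\n", readings_total / 1e9))
cat(sprintf("readings per customer: %d\n", as.integer(readings_per_customer)))
cat(sprintf("serial build time:     %.1f hours\n", serial_hours))
cat(sprintf("build time at DOP %d: %.1f hours\n", dop, parallel_hours))
```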
Database-centric architecture: Smart meter scenario
[Figure: within Oracle Database, the data is partitioned by customer (c1, c2, …, ci, …, cn). The "build model" R script f(dat,args,…) { } runs once per partition, producing Model c1, Model c2, …, Model ci, …, Model cn. Also shown: R Datastore and R Script Repository.]
Database-centric architecture: Smart meter scenario
[Figure: within Oracle Database, the data is partitioned by customer (c1, c2, …, ci, …, cn). The "score data" R script f(dat,args,…) { } runs once per partition with that customer's model, producing scores c1, scores c2, …, scores ci, …, scores cn. Also shown: R Datastore and R Script Repository.]
How many lines of code do you think it should take to implement this?
Build models and store in database, partition on CUST_ID
ore.groupApply(CUST_USAGE_DATA,
  CUST_USAGE_DATA$CUST_ID,
  function(dat, ds.name) {
    cust_id <- dat$CUST_ID[1]
    mod <- lm(Consumption ~ . -CUST_ID, dat)
    mod$effects <- mod$residuals <- mod$fitted.values <- NULL
    name <- paste("mod", cust_id, sep="")
    assign(name, mod)
    ds.name1 <- paste(ds.name, ".", cust_id, sep="")
    ore.save(list=paste("mod", cust_id, sep=""), name=ds.name1, overwrite=TRUE)
    TRUE
  },
  ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE
)
14 lines
Score customers in database, partition on CUST_ID
ore.groupApply(CUST_USAGE_DATA_NEW,
  CUST_USAGE_DATA_NEW$CUST_ID,
  function(dat, ds.name) {
    cust_id <- dat$CUST_ID[1]
    ds.name1 <- paste(ds.name, ".", cust_id, sep="")
    ore.load(ds.name1)
    name <- paste("mod", cust_id, sep="")
    mod <- get(name)
    prd <- predict(mod, newdata=dat)
    prd[as.integer(rownames(prd))] <- prd
    res <- cbind(CUST_ID=cust_id, PRED = prd)
    data.frame(res)
  },
  ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE,
  FUN.VALUE=data.frame(CUST_ID=numeric(0), PRED=numeric(0))
)
16 lines
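For readers without an Oracle Database at hand, the same build-then-score pattern can be tried locally in base R. The sketch below is a stand-in, not Oracle R Enterprise: `split()` plus `lapply()` replace `ore.groupApply`, a plain named list replaces the R datastore, and the data set (`usage`, with hypothetical `Hour` and `Temperature` predictors) is synthetic. The helper names `build_model` and `score_customer` are illustrative only.

```r
# Local, single-process stand-in for the per-customer build/score pattern.
# Synthetic data; the Hour and Temperature columns are assumptions for illustration.
set.seed(42)
n_cust <- 5
n_obs  <- 48
usage <- data.frame(
  CUST_ID     = rep(seq_len(n_cust), each = n_obs),
  Hour        = rep(0:23, times = n_cust * n_obs / 24),
  Temperature = rnorm(n_cust * n_obs, mean = 20, sd = 5)
)
usage$Consumption <- 1 + 0.05 * usage$Hour + 0.1 * usage$Temperature +
  rnorm(nrow(usage), sd = 0.2)

# Build: one linear model per customer, trimmed of bulky components as on the slide.
build_model <- function(dat) {
  mod <- lm(Consumption ~ . - CUST_ID, data = dat)
  mod$effects <- mod$residuals <- mod$fitted.values <- NULL
  mod
}
models <- lapply(split(usage, usage$CUST_ID), build_model)  # named list keyed by CUST_ID

# Score: look up the customer's model and predict on that customer's rows.
score_customer <- function(dat, models) {
  cust_id <- dat$CUST_ID[1]
  mod <- models[[as.character(cust_id)]]
  data.frame(CUST_ID = cust_id, PRED = as.numeric(predict(mod, newdata = dat)))
}
scores <- do.call(rbind,
                  lapply(split(usage, usage$CUST_ID), score_customer, models = models))

# Aggregate across customers, e.g. total predicted consumption per customer.
demand <- aggregate(PRED ~ CUST_ID, data = scores, FUN = sum)
print(demand)
```

The design point carried over from the slides is that each group is processed independently, which is what makes the work easy to run in parallel.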
Execution Examples (with DOP=24)
• 1000 Models
– Data: 26,280,000 rows
– Total build time: 65.2 seconds
– Total scoring time: 25.7 seconds (all data)
• 10,000 Models
– Data: 262,800,000 rows
– Total build time: 516 seconds
– Total scoring time: 217 seconds (all data)
• 50,000 Models
– Data: 1,314,000,000 rows
– Total build time: 55.85 minutes
– Total scoring time: 18 minutes (all data)
[Figure: “1 Model/Customer”. Build Time and Score Time, execution (sec) on an axis marked 1, 10, 100, 1000, 10000, versus # rows (millions): 26.3, 262.8, 1314.]
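To compare the three runs, the reported totals can be normalized to a per-model cost. The sketch below only rearranges the numbers listed above; the data frame and column names are illustrative.

```r
# Per-model cost derived from the reported totals (DOP=24).
runs <- data.frame(
  models    = c(1000, 10000, 50000),
  rows      = c(26280000, 262800000, 1314000000),
  build_sec = c(65.2, 516, 55.85 * 60),   # 55.85 minutes converted to seconds
  score_sec = c(25.7, 217, 18 * 60)       # 18 minutes converted to seconds
)

runs$rows_per_model     <- runs$rows / runs$models
runs$build_ms_per_model <- 1000 * runs$build_sec / runs$models
runs$score_ms_per_model <- 1000 * runs$score_sec / runs$models

print(runs[, c("models", "rows_per_model", "build_ms_per_model", "score_ms_per_model")])
```

Every run uses 26,280 rows per model, and the elapsed build time per model stays between roughly 50 and 70 ms across a 50-fold increase in the number of models, which is consistent with close to linear scaling.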
Simulation
Compute distribution of generated random normal values
simulation <- function(index, n) {
  set.seed(index)
  x <- rnorm(n)
  res <- data.frame(t(matrix(summary(x))))
  names(res) <- c("min","q1","median","mean","q3","max")
  res$id <- index
  res
}
(res <- simulation(1,1000))
Simulation with sample size 1000 over 10 trials
res <- ore.indexApply(10, simulation, n=1000, FUN.VALUE=res[1,], parallel=TRUE)
stats <- ore.pull(res)
library(reshape2)
melt.stats <- melt(stats, id.vars="id")
boxplot(value~variable, data=melt.stats, main="Distribution of Stats - sample 1000, 10 trials")
Simulation with sample sizes 10^(1:6) and 100 trials
num.trials <- 100
for(n in 10^(1:6)){
  t1 <- system.time(stats <- ore.pull(ore.indexApply(num.trials, simulation, n=n,
                    FUN.VALUE=res[1,], parallel=TRUE)))[3]
  cat("n=",n,", time=",t1,"\n")
  melt.stats <- melt(stats, id.vars="id")
  boxplot(value~variable, data=melt.stats,
          main=paste("Distribution of Stats - sample",n,",", num.trials, "trials"))
  gc()
}
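The simulation can also be tried without a database connection. The sketch below is a local approximation, not the ORE call: `lapply()` over the trial index stands in for `ore.indexApply`, everything runs serially in one R process, and the sample sizes stop at 10^4 to keep it quick. The `spread` vector is an illustrative addition that summarizes how much the per-trial mean varies at each sample size.

```r
# Same simulation function as on the slide.
simulation <- function(index, n) {
  set.seed(index)
  x <- rnorm(n)
  res <- data.frame(t(matrix(summary(x))))
  names(res) <- c("min", "q1", "median", "mean", "q3", "max")
  res$id <- index
  res
}

num.trials <- 100
sizes <- 10^(1:4)

# lapply() over the trial index is a serial, local stand-in for ore.indexApply().
spread <- sapply(sizes, function(n) {
  stats <- do.call(rbind, lapply(seq_len(num.trials), simulation, n = n))
  sd(stats$mean)   # variability of the sample mean across trials
})
names(spread) <- sizes
print(spread)

# Keep the results for the last sample size for inspection.
stats <- do.call(rbind, lapply(seq_len(num.trials), simulation, n = sizes[length(sizes)]))
head(stats)
```

As the sample size grows, the spread of the trial means should shrink, which is the effect the boxplots in the original loop make visible.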
![Page 24: Massive Predictive Modeling - Oracle Cloud · Title: How to Use the PowerPoint Template Author: marcos arancibia Keywords: Oracle corporate Tagline Created Date: 7/5/2014 1:10:30](https://reader033.fdocuments.net/reader033/viewer/2022050107/5f4552bc1adab36b44647536/html5/thumbnails/24.jpg)
Plot Results: sample sizes 10^(1:6) and 100 trials
![Page 25: Massive Predictive Modeling - Oracle Cloud · Title: How to Use the PowerPoint Template Author: marcos arancibia Keywords: Oracle corporate Tagline Created Date: 7/5/2014 1:10:30](https://reader033.fdocuments.net/reader033/viewer/2022050107/5f4552bc1adab36b44647536/html5/thumbnails/25.jpg)
Scalable Performance varying number of trials 200..5000
(10^x)
Enabling Technologies
Oracle R Enterprise
• Oracle Advanced Analytics Option to Oracle Database
• Eliminate memory constraint of client R engine
• Minimize or eliminate data movement latency
• Execute R scripts through database server machine for scalability and performance
• Achieve scalability and performance by leveraging Oracle Database as HPC environment
• Enable integration and management of R scripts through SQL
• Operationalize entire R scripts in production applications – eliminate porting R code
• Avoid reinventing code to integrate R results into existing applications
[Figure: Client R Engine (ORE packages, Transparency Layer); Database Server Machine (Oracle Database, user tables, in-db stats); SQL interfaces: SQL*Plus, SQLDeveloper, …]
Oracle’s R Technologies
• Oracle R Distribution
• ROracle
• Oracle R Enterprise
• Oracle R Advanced Analytics for Hadoop
Software available to R Community for free
Come to our booth to learn more…
Resources
• Oracle R Distribution • ROracle • Oracle R Enterprise • Oracle R Advanced Analytics for Hadoop
• Book: Using R to Unlock the Value of Big Data
• Blog: https://blogs.oracle.com/R/
• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
http://oracle.com/goto/R
FastR
• New implementation of R in Java
– Uses the new Truffle interpreter framework and Graal optimizing compiler in conjunction with the HotSpot™ JVM for high performance, scalability and portability
– Dynamically compiles, adaptively optimizes and deoptimizes at run time
– Joint effort: Oracle Labs (Germany, USA, Austria), JKU Linz (Austria), Purdue University (USA), TU Dortmund (Germany)
• Open-source project (research prototype!)
– GPLv2
– https://bitbucket.org/allr/fastr
• More info at the poster session