Microsoft R Server for Data Science
Uploaded by: data-science-thailand
Category: Data & Analytics
Data Science Team
Data Engineering
Data Science
Application Development
Business Acumen
Data Management
Data Dividend
Typical advanced analytics lifecycle
Preparation: Ingest, Transform, Explore
Modeling: Model
Operationalization: Deploy, Score, Visualize, Measure
Data scientists should be creating and testing models
Data scientists are rare and expensive
But the reality is different …
Data scientist focus time
80%
5%
15%
Decisions
Operationalize
Preparation
Model
• Embrace Open Source
• Evolutionary Path to Cloud
• Democratize Data Science
• Skill Re-Use
• Transparent Scaling
• Facilitate Collaboration
• Decouple Data Science from Platforms
• Leverage Hybrid Cloud Architecture
• Accelerate Experimentation
• Streamline Deployment
Broaden The Talent Pool
Increase Productivity
Modernize Infrastructure
Maximize Innovation
Drive Down TCO
People
+
Data Sources
Apps
Sensors and devices
From Data To Action On Premises
INTELLIGENCE DATA ACTION
Automated Systems
Microsoft R Server & SQL R Services
Apps
Cortana Intelligence
Challenges posed by open source R
• Lack of Commercial Support
• Inadequate Modeling Performance
• Complex Deployment Processes
• Limited Data Scale
R from Microsoft brings
Peace of mind
Efficiency
Speed and scalability
Flexibility and agility
High-performance, Scalable R
Linux, Windows, Hadoop & Teradata
R Server Technology
Open Community: Revolution R Open is now R Open
Commercial: Revolution R Enterprise is now R Server
Escapes R’s traditional memory limits
Scales predictive modeling using parallelization
Distributes computation across cores & nodes
Minimizes data movement using in-database, in-MapReduce and in-Apache Spark execution
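One way to picture how memory limits are avoided is to stream a file in fixed-size blocks and keep only running totals. The sketch below uses base R only and is not how ScaleR is implemented; the function name chunked_mean, the demo file, and the ArrDelay values are made up for illustration.

```r
# Write a small demo CSV so the example is self-contained (values are synthetic).
csv_path <- tempfile(fileext = ".csv")
set.seed(42)
demo <- data.frame(ArrDelay = round(rnorm(1000, mean = 10, sd = 30)))
write.csv(demo, csv_path, row.names = FALSE)

# Hypothetical helper: mean of one column, reading chunk_rows lines at a time
# so that only one block is ever held in memory.
chunked_mean <- function(path, column, chunk_rows = 100L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # First line is the header; strip the quotes that write.csv adds.
  header <- gsub('"', "", strsplit(readLines(con, n = 1L), ",")[[1]])
  total <- 0
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break          # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])   # update running totals only
    n <- n + nrow(chunk)
  }
  total / n
}

streamed <- chunked_mean(csv_path, "ArrDelay")
in_memory <- mean(demo$ArrDelay)
```

Both values agree, but the streamed version never needs more than chunk_rows rows in memory at once.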
• Remote Execution
• Transparent Parallelization
• Shared Resource Management
Data Nodes
Corporate Applications
Desktops & Servers
direct web services
Microsoft R Server
Hadoop
Distributed R - How Does Remote Compute Context Work?
Algorithm Master
Predictive Algorithm
Big Data
Load Block At A Time
Analyze Blocks In Parallel
Distribute Work, Compile Results
“Pack and Ship” Requests to Remote Environments
Results
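The block-at-a-time flow on this slide can be sketched in base R. This is an illustration only, not ScaleR's implementation: it fits a linear model by computing the cross-products X'X and X'y for each block and compiling them in a master step. The names block_crossprods, combine_crossprods, and fit_from_blocks are hypothetical, the data is synthetic, and lapply stands in for whatever parallel backend would distribute the blocks.

```r
# Per-block work: each block returns only small summary matrices.
block_crossprods <- function(block) {
  X <- cbind(Intercept = 1, x = block$x)      # design matrix for this block
  list(xtx = crossprod(X), xty = crossprod(X, block$y))
}

# Master step: add up the partial results from two blocks.
combine_crossprods <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty)
}

fit_from_blocks <- function(blocks) {
  partials <- lapply(blocks, block_crossprods)  # could be a parallel apply
  total <- Reduce(combine_crossprods, partials)
  drop(solve(total$xtx, total$xty))             # solve the normal equations
}

# Synthetic data split into four blocks of 250 rows.
set.seed(1)
demo <- data.frame(x = rnorm(1000))
demo$y <- 2 + 3 * demo$x + rnorm(1000, sd = 0.1)
blocks <- split(demo, rep(1:4, each = 250))

coef_blocks <- fit_from_blocks(blocks)
coef_full <- coef(lm(y ~ x, data = demo))       # reference fit on all rows
```

The coefficients compiled from the blocks match the fit on the full data, which is why the work can be split without changing the result.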
Microsoft R Server functions
• A compute context defines where processing takes place.
• E.g. a remote context such as Hadoop MapReduce
• Microsoft R Server functions are prefixed with rx
• The currently set compute context determines the processing location
Copyright Microsoft Corporation. All rights reserved.
Microsoft R Server “Client”: Console, R IDE or command-line
Microsoft R Server “Server”: REMOTE CONTEXT
### RevoScaleR provides the rx* functions and Rx* objects used below ###
library(RevoScaleR)

### LOCAL PARALLEL PROCESSING – LINUX OR WINDOWS ###
### SETUP LOCAL ENVIRONMENT VARIABLES ###
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf",
                            fileSystem = localFS)

### IN-HADOOP PROCESSING ###
### SETUP HADOOP ENVIRONMENT VARIABLES ###
myHadoopCC <- RxHadoopMR()
### HADOOP COMPUTE CONTEXT ###
rxSetComputeContext(myHadoopCC)
### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem()
hdfsFS
### (AirlineDataSet would be defined against hdfsFS here; not shown on the slide)

### ANALYTICAL PROCESSING (UNCHANGED FOR EITHER COMPUTE CONTEXT) ###
### Statistical Summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear Model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
ScaleR models can be deployed from a server or edge node to run in Hadoop without any functional R model re-coding for MapReduce.
Compute context R script – sets where the model will run
Functional model R script – does not need to change to run in Hadoop
DeployR
• Web services software development kit for integrating analytics via APIs:
• Java
• JavaScript
• .NET
Integrates R into application infrastructures
Capabilities:
• Enterprise authentication & security
• Horizontal scaling
• Invokes R Scripts from web services calls
• RESTful interface for easy integration
• Works with:
• Web & mobile apps
• Leading BI & Visualization tools
• Business rules and streaming engines
DeployR DevelopR
On-demand sales forecasting
Real-time social media analysis
Leveraging the power of Office365
Microsoft R Server provides a unique opportunity to deliver advanced analytics capabilities to customers who have already invested in storing their data on non-Microsoft platforms such as Hadoop, Teradata, and Linux.
Hadoop
- Cloudera CDH, Hortonworks HDP, and HDInsight
Write Once – Deploy Anywhere
R Server portfolio
Cloud
RDBMS
Desktops & Servers
Hadoop & Spark
EDW
R Server Technology
Included in SQL Server 2016
Reuse and optimize existing R code
Eliminate data movement
In-database deployment
Memory and disk scalability
No R memory limits
Write once, deploy anywhere
Enterprise speed and scale
Near-DB analytics
Parallel threading and processing
Reuse SQL skills for data engineering
Cost effectiveness
Scalability and choice
Simplicity and agility
• The industry’s broadest R-based platform
• Enterprise scale atop Spark, Hadoop, RDBMSs & EDWs
• Freedom from memory limits
• Choice of Windows and Linux IDEs
• Stable deployment
• Write-once-deploy-anywhere portability
• Investment protection
• Hybrid cloud evolution
Introduces the following topics:
1. Creating an R Server on Spark HDInsight cluster
2. Installing RStudio for the cluster
3. Running R using RStudio on the web
Reference: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
Get Essentials Microsoft Developer Resources and R Server Developer Edition: aka.ms/ch9.th
Microsoft R Server on-premises: www.microsoft.com/R-Server
Microsoft R Server on Azure (Cloud): https://azure.microsoft.com/en-us/marketplace/partners/microsoft-r-products/microsoft-r-server/
What is R?
• A statistics programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 7000+ free algorithms in CRAN
• Scalable to big data
• New and recent grads use it
Language
Platform
Community
Ecosystem
• Rich application & platform integration
Convergence with Flexibility
Scalable Algorithms
R: Write Once Deploy Anywhere
Templates & Samples
Microsoft R Server Family
R & Python to AML Interop.
Cortana Intelligence
DistributedR
ScaleR
ConnectR
DevelopR
Code Portability Across Platforms
In the Cloud: Azure HDI / Spark
Workstations & Servers: Linux, Windows
Clustered Systems: Linux Clusters (LSF For Now), Microsoft HPC
EDW: Teradata
Hadoop: Hortonworks, Cloudera, MapR & HDInsight
R+CRAN
Microsoft R
DistributedR
DeployR DevelopR
ScaleR
ConnectR
Delivers High Performance Parallel Distributed Analytics Across Individual and Clustered Systems
• Cloudera
• Hortonworks
• MapR
• Apache Spark
• IBM Platform LSF
• Microsoft HPC Clusters
• Teradata Database
• Red Hat
• SuSE Servers
• Windows
DistributedR
RevoDeployR Web Services
Client libraries (JavaScript, Java, .NET)
Desktop Applications (e.g. Excel)
Business Intelligence
PowerBI
Interactive Web or Mobile Applications
HTTP/HTTPS – JSON/XML
Session Management
Authentication
Data/Script Management
Administration
R scripts
End User
Application Developer
Admin
Data Scientist
Grid Node
R