Advanced analytics with R and SQL


Transcript of Advanced analytics with R and SQL

Page 1: Advanced analytics with R and SQL

Advanced Analytics with R and SQL

Stéphane Fréchette

Data Platform Solution Architect

Twitter: @sfrechette

Page 2: Advanced analytics with R and SQL
Page 3: Advanced analytics with R and SQL

1999: Computers work on users' behalf, filtering junk email

2004: Microsoft search engine built with machine learning

2005: SQL Server enables data mining using SSAS

2008: Bing Maps ships with ML traffic-prediction service

2010: Microsoft Kinect can watch users' gestures

2012: Successful, real-time, speech-to-speech translation

2014: Microsoft launches Azure Machine Learning

2015: Microsoft launches R Server for scalable, enterprise-grade analytics

2016: SQL Server 2016 supports advanced analytics in-database using R

I believe over the next decade computing will become even more ubiquitous and intelligence will become ambient. This will be made possible by an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data, and intelligence from machine learning.

Machine learning is pervasive throughout Microsoft products.

Page 4: Advanced analytics with R and SQL

Value

Data, Action, Decisions

Advanced Analytics: Predictive & Prescriptive Analytics

Business Intelligence: Descriptive & Diagnostic Analytics

Page 5: Advanced analytics with R and SQL

Large computers and related products/services

Page 6: Advanced analytics with R and SQL

Advanced Analytics Process

Prepare, Model, Operationalize

Page 7: Advanced analytics with R and SQL

Intro to R: The Language of Advanced Analytics

Page 8: Advanced analytics with R and SQL

R Usage Growth: Rexer Data Miner Survey, 2007-2015

Language Popularity: IEEE Spectrum Top Programming Languages, 2015

76% of analytic professionals report using R

36% select R as their primary tool

Page 9: Advanced analytics with R and SQL

History of R

• R is an open source (GNU) version of the S language developed by John Chambers et al. at Bell Labs in the 80's

• R was initially written in the early 1990s by Robert Gentleman and Ross Ihaka, then with the Statistics Department of the University of Auckland

• R is administered and controlled by the R Foundation

• Microsoft is a founding member and Platinum Sponsor of the R Consortium

R Reference Card from CRAN

Page 10: Advanced analytics with R and SQL

Open Source “lingua franca”

Analytics, Computing, Modeling

CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/

More packages on GitHub and in the Bioconductor project

Page 11: Advanced analytics with R and SQL

Works With Open Source R

Enterprise Scale & Performance

– Scales from workstations to large clusters

– Scales to large data sizes

– Growing portfolio of Parallelized algorithms

Secure, Scalable R Deployment/Operationalization

Write Once Deploy Anywhere for multiple platforms

– RDBMS: SQL Server & Teradata

– Windows, Linux: Red Hat & SUSE

– Hadoop: Hortonworks, Cloudera, MapR

– Cloud: Azure VMs, Azure HDInsight

R Tools for Visual Studio IDE

DeployR, RTVS

R Open, Microsoft R Server

Page 12: Advanced analytics with R and SQL

Microsoft R Server

• Microsoft R Server for Red Hat Linux

• Microsoft R Server for SUSE Linux

• Microsoft R Server for Teradata DB

• Microsoft R Server for Hadoop on Red Hat

Page 13: Advanced analytics with R and SQL

R Open, Microsoft R Server

DeployR, RTVS

ConnectR

• High-speed & direct connectors

Available for:

• High-performance XDF

• SAS, SPSS, delimited & fixed format text data files

• Hadoop HDFS (text & XDF)

• Teradata Database & Aster

• EDWs and ADWs

• ODBC

ScaleR

• Ready-to-use high-performance big data big analytics

• Fully-parallelized analytics

• Data prep & data distillation

• Descriptive statistics & statistical tests

• Range of predictive functions

• User tools for distributing customized R algorithms across nodes

• Wide data sets supported: thousands of variables

DistributedR

• Distributed computing framework

• Delivers cross-platform portability

R+CRAN

• Open source R interpreter

• R 3.1.2

• Freely-available huge range of R algorithms

• Algorithms callable by RevoR

• Embeddable in R scripts

• 100% compatible with existing R scripts, functions and packages

Microsoft R Open

• Based on open source R

• High-performance math library to speed up linear algebra functions

• Checkpoint package to easily share R code and replicate results using specific R package versions

DeployR

• RESTful APIs for easy integration from Java, JavaScript, .NET

• Enterprise authentication & security

• Horizontal scaling

R Tools for Visual Studio

• State of the art, R Tools for Visual Studio IDE

Page 15: Advanced analytics with R and SQL

SQL + R: In-Database Advanced Analytics

Page 16: Advanced analytics with R and SQL

Ingest: Relevant data available in real time

Query: All relevant data available in real time

Analytics: All relevant data available for analytics in real time

These are the 3 key ingredients to build an Intelligent Application

Prepare, Model, Operationalize

Page 17: Advanced analytics with R and SQL

HTAP with SQL Server 2016

In-memory built-in

In-memory ColumnStore

In-memory OLTP

Real-time business problem detection

Mission critical OLTP

Up to 30x faster transactions with in-memory OLTP

Up to 100x faster analytical queries

Queries from minutes to seconds

Page 18: Advanced analytics with R and SQL

Demo: SQL + R

Page 19: Advanced analytics with R and SQL

Working from my R IDE on my workstation, I can execute an R script that runs in-database, and get the results back.

Microsoft R Open

Microsoft R Server

R IDE

Data Scientist Workstation

SQL Server 2016

1. Script

2. Execution

3. Results

sqlCompute <- RxInSqlServer()
rxSetComputeContext(sqlCompute)
linModObj <- rxLinMod()
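
The three calls on the slide are shown without arguments. A minimal sketch of how they might be filled in follows, assuming Microsoft R Server (RevoScaleR) is installed and a SQL Server 2016 instance with R Services is reachable. The helper fit_model(), the table dbo.DemoPoints, its columns x and y, and the made-up local data are all hypothetical and not from the talk. So that the sketch runs on any workstation, it falls back to a local lm() fit unless in_database = TRUE is requested.

```r
# Local stand-in data so the sketch runs without a database (made-up values).
local_data <- data.frame(x = 1:10, y = 2 * (1:10) + 1)

# fit_model() is a hypothetical helper, not part of RevoScaleR.
fit_model <- function(in_database = FALSE, conn_str = NULL) {
  if (in_database) {
    # Needs Microsoft R Server (RevoScaleR) and a reachable SQL Server 2016
    # instance with R Services enabled.
    sql_compute <- RevoScaleR::RxInSqlServer(connectionString = conn_str)
    RevoScaleR::rxSetComputeContext(sql_compute)
    # dbo.DemoPoints is a hypothetical table with numeric columns x and y.
    sql_data <- RevoScaleR::RxSqlServerData(
      table = "dbo.DemoPoints",
      connectionString = conn_str
    )
    RevoScaleR::rxLinMod(y ~ x, data = sql_data)
  } else {
    # Local fallback with base R, using the same formula.
    lm(y ~ x, data = local_data)
  }
}

model <- fit_model()
print(coef(model))
```

Calling fit_model(in_database = TRUE, conn_str = "...") with a real connection string would switch the compute context so that rxLinMod runs next to the data, which is the flow the slide describes.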

Microsoft R Open

Microsoft R Server

Advanced Analytics Extensions

Page 20: Advanced analytics with R and SQL

I can call a T-SQL system stored procedure from my application and have it trigger R script execution in-database. Results are then returned to my application (predictions, plots, etc.).

Application

1. Call System Stored Procedure

3. Results: scores, plots

The stored procedure contains R code and executes in-database.

exec sp_execute_external_script
  @language = N'R'
, @script = N'
-- R code --
'
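
As an illustration of what the "R code" placeholder might contain, here is a small sketch of a script body. By default, sp_execute_external_script hands the result of the input query to R as a data frame named InputDataSet and returns whatever the script assigns to OutputDataSet. The columns speed and dist, the made-up values, and the linear model are assumptions for illustration only. The first statement builds a stand-in InputDataSet so the sketch can be run in plain R outside SQL Server.

```r
# Stand-in for the data frame SQL Server would supply from the input query
# (made-up values; inside the stored procedure this line is not needed).
InputDataSet <- data.frame(
  speed = c(4, 7, 8, 9, 10),
  dist  = c(2, 4, 16, 10, 18)
)

# --- body that would be passed as @script ---
# Fit a simple linear model on the incoming rows.
model <- lm(dist ~ speed, data = InputDataSet)

# Return one prediction per input row to the caller.
OutputDataSet <- data.frame(
  speed = InputDataSet$speed,
  predicted_dist = as.numeric(predict(model, InputDataSet))
)
# --- end of script body ---

print(OutputDataSet)
```

In the stored procedure call, the calling application would receive OutputDataSet as an ordinary result set, which matches the "Results: scores, plots" step on the slide.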

2. SQL Server 2016

Microsoft R Open

Microsoft R Server

Advanced Analytics Extensions

Page 21: Advanced analytics with R and SQL

Recap

Page 22: Advanced analytics with R and SQL

Operationalize R scripts and models

SQL Server 2016 extensibility mechanism allows secure execution of R scripts on the SQL Server.

Use familiar T-SQL stored procedures to invoke R scripts from your application.

Embed the returned predictions and plots in your application.

Enterprise performance and scale

Use SQL Server's in-memory querying and Columnstore indexes.

Leverage RevoScaleR support for large datasets and parallel algorithms with SQL Server 2016 Enterprise Edition.

Bring compute to data with In-Database analytics

Page 23: Advanced analytics with R and SQL
Page 24: Advanced analytics with R and SQL

Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata

SQL Server 2016 R Services: Big-data analytics integrated with SQL Server database

Visual Studio: R Tools for Visual Studio, an integrated development environment for R

R Sample Programs: GitHub repository of data and samples to learn capabilities of Open Source R and Microsoft R Server

SQL Server 2016: Learn about the full suite of capabilities in the latest version of SQL Server

Page 25: Advanced analytics with R and SQL

Thank you