Microsoft Data Science Technologies 201608

Microsoft Technologies for Data Science
Mark Tabladillo, Ph.D.
Solution Architect (Data Scientist), Microsoft
August 2016: SQL Saturday Columbus GA


Networking

Interactive

Terms Definition

Data Science

Machine Learning

Data Mining

Applied Statistics

the automated or semi-automated process of discovering patterns in data

Applied scientific method
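As a small illustration of the pattern-discovery definition above, the following sketch counts which pairs of items co-occur often enough to be called a pattern. The transactions, the `frequent_pairs` helper, and the support threshold are all hypothetical and are not taken from the slides; this is one simple instance of pattern discovery, not a description of any Microsoft product's algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (assumption: not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, so it is counted once.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# The only pair appearing in three or more baskets is bread and milk.
patterns = frequent_pairs(transactions, min_support=3)
```

Lowering `min_support` surfaces weaker patterns; choosing that threshold is where the "semi-automated" part of the definition usually comes in.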

http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html

http://products.office.com/en-us/excel

http://www.microsoft.com/en-us/server-cloud/products/sql-server/

http://pytools.codeplex.com/

http://azure.microsoft.com/en-us/services/hdinsight/

http://www.revolutionanalytics.com/

Technology Choices

SQL SERVER ANALYSIS SERVICES: Enterprise, Business Intelligence

EXCEL ADD-IN FOR SSAS: Office 365, Office 2013 or Higher x64

SEMANTIC SEARCH: Enterprise, Business Intelligence, Standard, Web, Express with Advanced Services

MICROSOFT AZURE ML: Free (Size Limited), Paid (Web Service): Experiment + Query

F#: Open Source

SQL SERVER R SERVICES: SQL Server 2016 or higher

http://download.microsoft.com/download/F/C/2/FC21C981-4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf

[Diagram labels: SS, SQL, AS, NoSQL]

Data mining add-in for business analysts

• Ease of use

• Rich data mining

• Scalable

[Diagram: semantic search components]

• Inputs: Varchar, NVarchar, Office, PDF, Documents
• iFilters
• Full-Text Keyword Index “FTI”
• Semantic Key Phrase Index (Tag Index) “TI”
• Semantic Document Similarity Index “DSI”
• Semantic Database
• Output: Rowset with Scores
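To make the roles of the tag index and the document similarity index more concrete, here is a toy sketch in Python. It swaps in plain TF-IDF weighting and cosine similarity, which is an assumption for illustration and not the statistical method SQL Server semantic search actually uses. The document names, texts, and helper functions are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical documents (assumption: illustrative text only).
docs = {
    "a.txt": "sql server data mining with sql server analysis services",
    "b.txt": "data mining add-in for excel and sql server",
    "c.txt": "functional programming with immutable values",
}


def tfidf_vectors(documents):
    """Weight each term by its count in a document and its rarity across documents."""
    counts = {name: Counter(text.split()) for name, text in documents.items()}
    n_docs = len(documents)
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def key_phrases(vector, top_n=3):
    """Rough analogue of a tag index: the highest-weighted terms of one document."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:top_n]


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_documents(vectors, name):
    """Rough analogue of a similarity index: other documents ranked by score."""
    scores = [
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda s: -s[1])


vectors = tfidf_vectors(docs)
tags_for_a = key_phrases(vectors["a.txt"], top_n=2)
ranked_for_a = similar_documents(vectors, "a.txt")
```

The ranked list plays the part of the "rowset with scores" output: each row pairs a document with a numeric similarity score.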

Simplified Chinese

British English

Portuguese

Chinese (Hong Kong SAR, PRC)

Spanish

Chinese (Singapore)

Chinese (Macau SAR)

Time in Seconds vs. Number of Documents

(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)

http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf

Features compared across three products: Microsoft R Open (R distribution, free), Microsoft R Client (free), Microsoft R Server (commercial)

Big Data
• Microsoft R Open: In-memory bound. Can only process datasets that fit into the available memory.
• Microsoft R Client: In-memory bound. Can process datasets that fit into the available memory. Operates on large volumes when connected to R Server.
• Microsoft R Server: Disk scalability. Operates on bigger volumes & factors.

Speed of Analysis
• Microsoft R Open: Multi-threaded when MKL is installed for non-ScaleR functions.
• Microsoft R Client: Multi-threaded with MKL for non-ScaleR functions. Up to 2 threads for ScaleR functions with a local compute context.
• Microsoft R Server: Full parallel threading & processing.

Enterprise Readiness
• Microsoft R Open: Community support
• Microsoft R Client: Community support
• Microsoft R Server: Commercial support

Analytic Breadth & Depth
• Microsoft R Open: 8000+ open source packages
• Microsoft R Client: Leverage & optimize open source R packages plus 'Big Data'-ready ScaleR packages
• Microsoft R Server: Leverage & optimize open source R packages plus 'Big Data'-ready + Multithreaded ready ScaleR packages

Commercial Viability
• Microsoft R Open: Risk of deployment to open source
• Microsoft R Client: Free for everyone
• Microsoft R Server: Commercial licenses

DeployR Enterprise
• Microsoft R Open: Not available
• Microsoft R Client: Not available
• Microsoft R Server: Included
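The difference between "in-memory bound" and "disk scalability" in the comparison above can be sketched as chunk-at-a-time processing: only a running summary is held in memory while the data streams past. The Python below is a minimal illustration under that assumption. It is not ScaleR code, and the `chunked_mean` helper, the column name, and the sample data are hypothetical.

```python
import csv
import io


def chunked_mean(rows, column, chunk_size=1000):
    """Stream rows in fixed-size chunks, keeping only a running sum and count."""
    total, count = 0.0, 0
    chunk = []
    for row in rows:
        chunk.append(float(row[column]))
        if len(chunk) == chunk_size:
            # Fold the full chunk into the summary, then release it.
            total += sum(chunk)
            count += len(chunk)
            chunk = []
    # Fold in the final, possibly partial, chunk.
    total += sum(chunk)
    count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Hypothetical in-memory CSV standing in for a file too large to load at once.
data = io.StringIO("x\n1\n2\n3\n4\n5\n")
mean_x = chunked_mean(csv.DictReader(data), "x", chunk_size=2)
```

Memory use depends on `chunk_size` rather than on the number of rows, which is the property the "disk scalability" row describes.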

Microsoft R Server Editions (columns: Description, Install ScaleR, Get Started)

R Server for Hadoop
• Description: Scale your analysis transparently by distributing work across nodes without complex programming
• Install ScaleR: Doc
• Get Started: Doc

R Server for Teradata DB
• Description: Run advanced analytics in-database for seamless data analysis
• Install ScaleR: Doc
• Get Started: Doc

R Server for Linux
• Description: Bring predictive and prescriptive analytics power to your Linux environments
• Install ScaleR: Doc
• Get Started: Doc
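The idea of distributing work across nodes, as in the R Server for Hadoop description above, can be sketched as split, apply, combine: each worker summarizes its own partition, and the small partial results are merged at the end. The Python below is only an analogy, with threads standing in for cluster nodes. The function names and the partitions are hypothetical, and nothing here reflects R Server's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition):
    """Work done independently on one partition: a (sum, count) pair."""
    return sum(partition), len(partition)


def distributed_mean(partitions, max_workers=4):
    """Summarize each partition in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_stats, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical partitions, as if the data were spread over three nodes.
partitions = [[1, 2, 3], [4, 5], [6]]
result = distributed_mean(partitions)
```

Only the `(sum, count)` pairs travel back to the caller, so the combine step stays cheap however large each partition is.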

http://datacamp.com

Languages grouped by Mutable vs. Immutable

Classic Open Source
• Mutable: Java
• Immutable: Scala

.NET (Now Open Source)
• Mutable: C#, C++, VB.NET
• Immutable: F#
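The mutable versus immutable distinction in the grouping above can be shown in a few lines. The sketch uses Python for consistency with the other examples in this transcript, not because the slides mention it. A frozen dataclass plays the role of an immutable value, loosely similar to how F# values are immutable by default. The class names are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass
class MutablePoint:
    x: int
    y: int


@dataclass(frozen=True)
class ImmutablePoint:
    x: int
    y: int


# Mutable style: the object is changed in place.
p = MutablePoint(1, 2)
p.x = 10

# Immutable style: assignment is rejected, so a changed copy is built instead.
q = ImmutablePoint(1, 2)
try:
    q.x = 10
    mutated = True
except FrozenInstanceError:
    mutated = False

q2 = replace(q, x=10)  # q itself is left untouched
```

Because `q` can never change after construction, it is safe to share between threads or cache, which is one reason immutable-by-default languages are popular for data pipelines.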

https://www.microsoft.com/en-us/cloud-platform/what-is-cortana-intelligence-suite

Cortana Intelligence Suite (columns: Capabilities, Products)

Preconfigured solutions
• Capabilities: Business scenarios
• Products: Forecasting, churn, etc.

Intelligence
• Capabilities: Integration with Cortana, Bot services, Cognitive services
• Products: Cortana, Bot Framework, Cognitive Services

Dashboards and visualizations
• Capabilities: Dashboards and visualizations
• Products: Power BI

Machine learning and advanced analytics
• Capabilities: Machine learning, Hadoop, Distributed analytics, Complex event processing
• Products: Machine Learning, HDInsight (Data Lake service), Data Lake analytics, Stream Analytics

Big data stores
• Capabilities: Big Data repository, Elastic data warehouse
• Products: Data Lake store, Blobs, SQL Data Warehouse

Information management
• Capabilities: Data orchestration, Data catalog, Event ingestion
• Products: Data Factory, Data catalog, Event Hubs

https://github.com/jakevdp/sklearn_pycon2015

http://www.bing.com/explore/predicts

https://techcrunch.com/2016/07/07/microsoft-now-helps-businesses-use-the-data-that-powers-bing-predicts/

https://academy.microsoft.com/en-US/professional-degree/data-science/

https://borntolearn.mslearn.net/b/weblog/posts/announcing-the-microsoft-professional-degree-mpd-program

http://www.kdnuggets.com/2015/09/free-data-science-books.html

https://channel9.msdn.com/Blogs/Windows-Azure

https://mva.microsoft.com/

http://blogs.technet.com/b/machinelearning/

http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning

http://sqlserverdatamining.com

http://marktab.net

http://curah.microsoft.com/342704/azure-machine-learning-videos-february-2015

http://datascience.sqlpass.org/

https://www.youtube.com/channel/UCqB3xWdwjA9soFV6EOu7qfg