Azure Machine Learning: Deep Learning with Python, R, Spark, and CNTK

Azure Machine Learning: Other Topics

Microsoft Taiwan

Technical Evangelist

吳宏彬

8/25/2016

What Is the R Language?

Open Source “lingua franca”: Analytics, Computing, Modeling

Global Community: Millions of users

Ecosystem: 7000+ Algorithms, Test Data & Evaluations

Scalability: Can be Scaled to Big Data, Big Analytics

Polls of data miners and analytics professionals on their software choices since 2007

Source: http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html

R is developed and contributed to by the open source community

CRAN (the Comprehensive R Archive Network) is the package repository for R

7500+ packages, covering all aspects of statistical analysis, machine learning, natural language processing …

Still growing exponentially

Free!

Source: http://r4stats.com/2014/04/07/r-continues-its-rapid-growth/

1. Seasonal ARIMA

2. Non-Seasonal ARIMA

3. Seasonal ETS

4. Non-Seasonal ETS

5. Average of Seasonal ETS and Seasonal ARIMA

Mean Error (ME) - Average forecasting error on the test dataset (an error is the difference between the predicted value and the actual value)

Root Mean Squared Error (RMSE) - The square root of the average of squared errors of predictions made on the test dataset

Mean Absolute Error (MAE) - The average of absolute errors

Mean Percentage Error (MPE) - The average of percentage errors

Mean Absolute Percentage Error (MAPE) - The average of absolute percentage errors

Mean Absolute Scaled Error (MASE)

Symmetric Mean Absolute Percentage Error (sMAPE)
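The accuracy measures above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: the function name `forecast_errors` is hypothetical, the sign convention (predicted minus actual) follows the ME definition above, and the MASE and sMAPE formulas are common variants chosen here as assumptions because the slide does not define them.

```python
import math

def forecast_errors(actual, predicted, train=None):
    """Compute forecast accuracy measures on a test set (illustrative sketch)."""
    # Sign convention assumed from the slide: error = predicted - actual.
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    # Percentage errors are relative to the actual value (undefined if an actual is 0).
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    metrics = {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(x) for x in pct) / n,
        # Assumed sMAPE variant: mean of 200 * |error| / (|actual| + |predicted|).
        "sMAPE": sum(200.0 * abs(e) / (abs(a) + abs(p))
                     for e, a, p in zip(errors, actual, predicted)) / n,
    }
    if train is not None and len(train) > 1:
        # Assumed MASE variant: test MAE divided by the in-sample MAE of a
        # one-step naive forecast (non-seasonal form).
        naive_mae = sum(abs(train[i] - train[i - 1])
                        for i in range(1, len(train))) / (len(train) - 1)
        metrics["MASE"] = metrics["MAE"] / naive_mae
    return metrics

m = forecast_errors(actual=[100.0, 200.0],
                    predicted=[110.0, 190.0],
                    train=[90.0, 100.0, 120.0])
```

Note that ME and MPE can be close to zero even when individual errors are large, because positive and negative errors cancel; the absolute and squared measures do not have that problem.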

Feature | Open source R (CRAN) | Microsoft R Open | Microsoft R Server
Data size | In-memory | In-memory | In-memory or disk-based
Speed of analysis | Single-threaded | Multi-threaded | Multi-threaded, parallel processing across 1:N servers
Support | Community | Community | Community + commercial
Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
License | Open source | Open source | Commercial license; supported release with indemnity

Supports standard Python library types such as Pandas data frames and NumPy arrays.

Python code runs on Anaconda 2.1, which comes with close to 200 of the most common Python packages (such as NumPy, SciPy, and scikit-learn).

Images generated with Matplotlib can be returned as output.
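A minimal sketch of what such a Python step might look like. The entry-point name `azureml_main` and its two-DataFrame signature follow the convention of the Execute Python Script module in Azure ML Studio as the editor understands it; treat that, and the column name `value`, as assumptions to check against the module documentation.

```python
import numpy as np
import pandas as pd

# Assumed entry point: the module passes up to two pandas DataFrames
# and expects a sequence of DataFrames back.
def azureml_main(dataframe1=None, dataframe2=None):
    # Work on a copy so the input data frame is left untouched.
    result = dataframe1.copy()
    # NumPy functions apply directly to pandas columns.
    result["log_value"] = np.log1p(result["value"])
    # Return a one-element tuple of DataFrames.
    return result,

# Local smoke test with a hypothetical input column named "value".
out, = azureml_main(pd.DataFrame({"value": [0.0, 1.0, 3.0]}))
```

Running the function locally on a small DataFrame like this is a convenient way to debug the script before pasting it into an experiment.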


What is Spark?

Data is growing faster than processing speeds

The only solution is to parallelize data processing on large clusters

Example: HDInsight

Fast, expressive cluster computing system compatible with Apache Hadoop

• Works with any Hadoop-supported storage system (HDFS, S3, Avro, …)

Improves efficiency through:

• In-memory computing primitives

• General computation graphs

Improves usability through:

• Rich APIs in Java, Scala, Python

• Interactive shell

Spark was started by Matei Zaharia at UC Berkeley's AMPLab in 2009, open sourced in 2010, and donated to Apache in 2013

Up to 100× faster

Often 2-10× less code
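The two efficiency ideas above (general computation graphs and in-memory primitives) can be illustrated with a toy model in plain Python. This is not the Spark API: `LazyDataset` and its methods are hypothetical names that only mimic the style of Spark transformations and actions, and everything runs in a single process.

```python
class LazyDataset:
    """Toy dataset whose transformations build a graph instead of running."""

    def __init__(self, parent=None, op=None, data=None):
        self.parent = parent          # upstream node in the computation graph
        self.op = op                  # function applied to the parent's rows
        self.data = data              # only set on source nodes
        self.keep_in_memory = False
        self._cached = None

    def map(self, fn):
        # Transformation: record the step, do not execute it yet.
        return LazyDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return LazyDataset(parent=self,
                           op=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Ask for the result to be kept in memory after the first computation.
        self.keep_in_memory = True
        return self

    def collect(self):
        # Action: walk the graph back to the source and execute it.
        if self._cached is not None:
            return self._cached
        rows = self.data if self.parent is None else self.op(self.parent.collect())
        if self.keep_in_memory:
            self._cached = rows
        return rows


calls = []

def square(x):
    calls.append(x)                   # record every real invocation
    return x * x

squares = LazyDataset(data=[1, 2, 3, 4]).map(square).cache()
calls_before_action = list(calls)     # still empty: nothing has run yet
evens = squares.filter(lambda v: v % 2 == 0).collect()
total = sum(squares.collect())        # served from the in-memory copy
```

Because `squares` is cached, the second `collect()` reuses the stored result and `square` is called only once per element, which is the effect in-memory primitives aim for on iterative workloads.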

What is Spark?

Spark for Azure HDInsight

[Diagram: clients (decision makers) connect to a Spark cluster made up of Spark nodes running on top of a storage layer]

Programming Spark

Spark notebooks:

• Zeppelin, for Scala users

• Jupyter, for Python users

Using the Spark shell to run interactive queries

Using the Spark shell to run Spark SQL queries

Using a standalone Scala program

2015 System

Human Error Rate 4%

Speech Recognition could reach human parity in the next 3 years

Microsoft used deep learning to win first place in every ImageNet 2015 competition category

ImageNet Classification top-5 error (%)

• ILSVRC 2010, NEC America: 28.2
• ILSVRC 2011, Xerox: 25.8
• ILSVRC 2012, AlexNet: 16.4
• ILSVRC 2013, Clarifai: 11.7
• ILSVRC 2014, VGG: 7.3
• ILSVRC 2014, GoogLeNet: 6.7
• ILSVRC 2015, MSRA ResNet: 3.5

All five Microsoft entries took first place this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation

CNTK At the Heart: Computational Networks

• A generalization of machine learning models that can be described as a series of computational steps.

• E.g., DNN, CNN, RNN, LSTM, DSSM, Seq2Seq, Log-linear model

• Representation:

• A list of computational nodes denoted as n = {node name : operation name}

• The parent-children relationship describing the operands: {n : c1, · · · , cKn}

• Kn is the number of children of node n. For leaf nodes Kn = 0.

• Order of the children matters: e.g., XY is different from YX

• Given the inputs (operands) the value of the node can be computed.

• Can flexibly describe deep learning models.

• Adopted by many other popular tools as well
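The representation above can be sketched directly in Python. This is an illustrative toy, not CNTK code: the node names, the operation labels (`Input`, `Parameter`, `Times`, `Plus`), and the `evaluate` helper are all hypothetical, and matrices are plain nested lists.

```python
# n = {node name : operation name}; leaves are inputs or parameters.
nodes = {"X": "Input", "W": "Parameter", "b": "Parameter",
         "T": "Times", "Z": "Plus"}

# Ordered children (operands) of each non-leaf node; leaves have K_n = 0.
children = {"T": ["W", "X"], "Z": ["T", "b"]}

def times(a, b):
    # Matrix product of nested lists; operand order matters (WX != XW).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def plus(a, b):
    # Element-wise sum of two matrices of the same shape.
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

OPS = {"Times": times, "Plus": plus}

def evaluate(node, leaf_values, cache=None):
    """Compute a node's value from its children, visiting each node once."""
    cache = {} if cache is None else cache
    if node not in cache:
        kids = children.get(node, [])
        if not kids:
            cache[node] = leaf_values[node]          # leaf: value is given
        else:
            operands = [evaluate(c, leaf_values, cache) for c in kids]
            cache[node] = OPS[nodes[node]](*operands)
    return cache[node]

leaf_values = {"X": [[1.0], [2.0]],
               "W": [[1.0, 2.0], [3.0, 4.0]],
               "b": [[0.5], [0.5]]}
z = evaluate("Z", leaf_values)   # computes Z = W X + b
```

Swapping the children of `T` to `["X", "W"]` would ask for the product XW, which is not even defined for these shapes; that is why the child order is part of the network description.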


“CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”

Theano only supports 1 GPU

Achieved with the 1-bit gradient quantization algorithm

[Chart: speed comparison (samples/second), higher = better; note: December 2015. Toolkits compared: CNTK, Theano, TensorFlow, Torch 7, Caffe. Configurations: 1 GPU, 1 x 4 GPUs, 2 x 4 GPUs (8 GPUs).]

* TensorFlow added distributed compute support in April 2016
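The slide names 1-bit gradient quantization without describing it. As a rough sketch of the general idea (not CNTK's implementation), each gradient value is reduced to a single bit before being exchanged between workers, and the quantization error is carried over to the next step so it is not lost. The function name, the choice of group means as reconstruction values, and the flat-list gradient are all assumptions made for illustration.

```python
def one_bit_quantize(gradient, residual):
    """Quantize a gradient to 1 bit per value with error feedback (sketch)."""
    # Add the quantization error left over from the previous step.
    corrected = [g + r for g, r in zip(gradient, residual)]
    # One bit per value: 1 for non-negative, 0 for negative.
    bits = [1 if v >= 0 else 0 for v in corrected]
    pos = [v for v in corrected if v >= 0]
    neg = [v for v in corrected if v < 0]
    # Assumed reconstruction values: the mean of each group.
    high = sum(pos) / len(pos) if pos else 0.0
    low = sum(neg) / len(neg) if neg else 0.0
    # What a receiver would rebuild from the bits and the two levels.
    decoded = [high if b else low for b in bits]
    # Error to feed back into the next minibatch's gradient.
    new_residual = [v - d for v, d in zip(corrected, decoded)]
    return bits, (low, high), decoded, new_residual

bits, levels, decoded, residual = one_bit_quantize(
    [0.5, -0.25, 1.5, -0.75], [0.0, 0.0, 0.0, 0.0])
```

Only `bits` and the two `levels` need to be communicated, which is what cuts the data exchanged between GPUs; the residual stays local to each worker.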

Microsoft researcher Slawek Smyl won CIF 2016 using an LSTM neural network

Powered by CNTK

CIF Competition 2016 – Final Results

• Contestant 1 – Slawek Smyl (LSTM-based NN on deseasonalized data)

• Contestant 2 – Slawek Smyl (weighted average of my 3 methods)

• Contestant 3 – prof. Sven Crone (Multilayer Perceptron with a thorough feature search)

• Contestant 4 - Mikhail Artyukhov (previous competition winner, ensemble models)

• Contestant 5 - Joerg Wichard, Bayer Healthcare AG (Adaptive Forecasting Strategy with Hybrid Ensemble Models)

• Contestant 6 – Slawek Smyl (LSTM-based NN)

CNTK Demo

CNTK Architecture

[Diagram labels: CNBuilder, Lambda, CN (description, use, build); IDataReader (task-specific reader; load features & labels); ILearner (SGD, AdaGrad, etc.; get data); IExecutionEngine (CPU/GPU; evaluate, compute gradient)]

(1) Kai Chen and Qiang Huo, “Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering”, in International Conference on Acoustics, Speech and Signal Processing, March 2016, Shanghai, China.

CNTK is a powerful tool that supports CPU/GPU and runs under Windows/Linux

CNTK is extensible thanks to its loosely coupled modular design: adding new readers and new computation nodes is easy with a new reader design

The network definition language, macros, and model editing language (as well as Python and C++ bindings in the future) make network design and modification easy

Compared to other tools CNTK has a great balance between efficiency, performance, and flexibility

microsoft.com/cognitive

Feature | Mahout | Spark ML | Azure ML | R Server
Shared Service | No | No | Yes | No
Deployment Model | PaaS | PaaS | PaaS | IaaS
Extensibility | High | High | Medium | High
Deployment Complexity | Medium | High | Low | Medium
Cost | High | High | Low | High
Programming Languages | Java/Scala | Scala/Java/Python | Python/R | R
Algorithms | Limited (growing) | MLlib/scikit | Many (scikit/CRAN) | Many (CRAN)
Scalability | High | High | Medium | Medium

xgboost Vowpal Wabbit

Rattle

CNTK


https://portal.azure.com/#create/microsoft-ads.standard-data-science-vmstandard-data-science-vm

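On-demand in the cloud; all kinds of data; rapid service deployment; data sharing and collaboration

Open, with support for the complete data analysis workflow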

https://gallery.cortanaintelligence.com/

The only vendor offering a complete solution, from data ingestion to taking action and presenting data

ConnectR

• High-speed & direct connectors

Available for:

• High-performance XDF

• SAS, SPSS, delimited & fixed format text data files

• Hadoop HDFS (text & XDF)

• Teradata Database & Aster

• EDWs and ADWs

• ODBC

ScaleR

• Ready-to-use high-performance big data, big analytics

• Fully-parallelized analytics

• Data prep & data distillation

• Descriptive statistics & statistical tests

• Range of predictive functions

• User tools for distributing customized R algorithms across nodes

DistributedR

• Distributed computing framework

• Delivers cross-platform portability

Available on:

• Windows Servers

• Red Hat and SuSE Linux Servers

• Teradata Database

• Cloudera Hadoop

• Hortonworks Hadoop

• MapR Hadoop

R+CRAN

• Open source R interpreter

• R 3.2.2

• Freely-available huge range of R algorithms

• Algorithms callable by RevoR

• 100% Compatible with existing R scripts, functions and packages

RevoR

• Performance enhanced R interpreter

• Based on open source R

• Adds high-performance math library to speed up linear algebra functions

R Open

Microsoft R Server

DeployR

DevelopR

Data Step

• Data import: Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)

Descriptive Statistics

• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests

• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test

Sampling

• Subsample (observations & variables)
• Random Sampling

Predictive Models

• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models

Cluster Analysis

• K-Means

Classification

• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes

Variable Selection

• Stepwise Regression

Simulation

• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation

Combination

• rxDataStep
• rxExec
• PEMA-R API Custom Algorithms
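The table earlier describes R Server as working on disk-based as well as in-memory data, and PEMA stands for parallel external memory algorithm. One common way to compute descriptive statistics on data that does not fit in memory is to process it chunk by chunk and combine small per-chunk summaries. The sketch below shows that idea in plain Python; it is an assumption about the general approach, not ScaleR's implementation, and the function names are hypothetical.

```python
import math

def chunk_stats(chunk):
    # Per-chunk summary: count, sum, and sum of squares.
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def combine(a, b):
    # Summaries from different chunks (or different nodes) simply add up.
    return tuple(x + y for x, y in zip(a, b))

def summarize(chunks):
    """Mean, variance, and standard deviation from one pass over the chunks."""
    total = (0, 0.0, 0.0)
    for chunk in chunks:
        total = combine(total, chunk_stats(chunk))
    n, s, ss = total
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)   # sample variance
    return {"n": n, "mean": mean, "variance": variance,
            "sd": math.sqrt(variance)}

# Each inner list stands in for one block read from disk.
stats = summarize([[1.0, 2.0], [3.0, 4.0], [5.0]])
```

Because `combine` is just addition, the chunks can be summarized on different cores or servers and merged in any order. The sum-of-squares formula is kept for brevity; it can lose precision on large values, so a production version would use a more stable update.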

Additional Resources

• CNTK:

• https://github.com/Microsoft/CNTK

• Contains all the source code and example setups

• You may understand better how CNTK works by reading the source code

• New features are added constantly

• How to contact:

• CNTK team: ask a question on CNTK GitHub!

• Alexey:

• Email: [email protected]

• LinkedIn: https://www.linkedin.com/in/alexeykamenev
