SAP HANA SPS08 Predictive Analysis Library


Page 1: SAP HANA SPS08 Predictive Analysis Library

SAP HANA SPS 08 - What’s New? Predictive Analysis Library

SAP HANA Product Management May, 2014

(Delta from SPS 07 to SPS 08)

Page 2: SAP HANA SPS08 Predictive Analysis Library

© 2014 SAP AG or an SAP affiliate company. All rights reserved. Public

Agenda

Release Theme

List of Algorithms

New Algorithms
• Distribution Fit
• Cumulative Distribution Function
• Quantile Function
• Random Distribution Sampling
• ARIMA
• FP-Growth
• CART
• K-Medoid Clustering

Enhancements

Documentation

Page 3: SAP HANA SPS08 Predictive Analysis Library

Release Theme

Page 4: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Release Theme

The SPS 08 version of the Predictive Analysis Library includes many new algorithms as well as several enhancements to existing algorithms.

These new features were chosen based on the prioritization of customer and other stakeholder requests.

Page 5: SAP HANA SPS08 Predictive Analysis Library

List of Algorithms

Page 6: SAP HANA SPS08 Predictive Analysis Library

SAP HANA In-Memory Predictive Analytics: Predictive Analysis Library (PAL) - Algorithms Supported

Association Analysis
• Apriori
• Apriori Lite
• FP-Growth *

Classification Analysis
• CART *
• C4.5 Decision Tree Analysis
• CHAID Decision Tree Analysis
• K Nearest Neighbour
• Logistic Regression
• Naïve Bayes
• Support Vector Machine

Regression
• Multiple Linear Regression
• Polynomial Regression
• Exponential Regression
• Bi-Variate Geometric Regression
• Bi-Variate Logarithmic Regression

Cluster Analysis
• ABC Classification
• DBSCAN
• K-Means
• K-Medoid Clustering *
• Kohonen Self Organized Maps
• Agglomerate Hierarchical
• Affinity Propagation

Time Series Analysis
• Single Exponential Smoothing
• Double Exponential Smoothing
• Triple Exponential Smoothing
• Forecast Smoothing
• ARIMA *

Probability Distribution
• Distribution Fit *
• Cumulative Distribution Function *
• Quantile Function *

Outlier Detection
• Inter-Quartile Range Test (Tukey’s Test)
• Variance Test
• Anomaly Detection

Link Prediction
• Common Neighbors
• Jaccard’s Coefficient
• Adamic/Adar
• Katz β

Data Preparation
• Sampling
• Random Distribution Sampling *
• Binning
• Scaling
• Partitioning

Statistic Functions (Univariate)
• Mean, Median, Variance, Standard Deviation
• Kurtosis
• Skewness

Statistic Functions (Multivariate)
• Covariance Matrix
• Pearson Correlations Matrix
• Chi-squared Tests: Test of Quality of Fit, Test of Independence
• F-test (variance equal test)

Other
• Weighted Scores Table
• Substitute Missing Values

* New in SPS 08

Page 7: SAP HANA SPS08 Predictive Analysis Library

New Algorithms

Page 8: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Distribution Fit

Distribution fitting fits a probability distribution to a variable based on a series of measurements of that variable.

In PAL, users choose one probability distribution type from the supported list (Normal, Gamma, Weibull, and Uniform), and PAL then calculates the parameters of that distribution that best fit the observed variable.

There are two distribution fitting interfaces: DISTRFIT and DISTRFITCENSORED. DISTRFIT fits uncensored data, while DISTRFITCENSORED fits censored data.

Two methods are provided for finding the optimized parameters: Maximum-Likelihood and Median-Rank. In SPS 08, the Maximum-Likelihood method supports all distribution types in the supported list for uncensored data. The Median-Rank method supports Weibull distribution fitting for both censored and uncensored data.
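To make the two fitting methods concrete, here is a minimal Python sketch for uncensored data only. It is not the PAL implementation and does not call DISTRFIT. The function names are hypothetical, and using Bernard's approximation for the median ranks is an assumption, since the slide does not say how PAL computes them.

```python
import math


def fit_normal_mle(data):
    """Maximum-likelihood estimates (mean, standard deviation) for a Normal fit."""
    n = len(data)
    mu = sum(data) / n
    # The ML estimate of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


def fit_weibull_median_rank(data):
    """Median-rank regression for a Weibull fit on uncensored data.

    Assumption: median ranks use Bernard's approximation (i - 0.3) / (n + 0.4).
    The Weibull CDF is linearised as ln(-ln(1 - F)) = k * ln(x) - k * ln(scale).
    """
    xs = sorted(data)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    # Ordinary least squares of v on u.
    slope = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / sum(
        (a - mean_u) ** 2 for a in u
    )
    intercept = mean_v - slope * mean_u
    shape = slope
    scale = math.exp(-intercept / slope)
    return shape, scale
```

For example, `fit_normal_mle([2, 4, 4, 4, 5, 5, 7, 9])` returns a mean of 5 and a standard deviation of 2.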

Page 9: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Cumulative Distribution Function

For a given probability distribution, the cumulative distribution function in PAL evaluates the probability at a value x under the cumulative distribution function (CDF) or the complementary cumulative distribution function (CCDF).
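The relationship between CDF and CCDF can be sketched in a few lines of Python. This uses the textbook closed forms for the Normal and Weibull distributions and is not the PAL interface. The function names and parameter names are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a Normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def weibull_cdf(x, shape, scale):
    """P(X <= x) for a Weibull distribution; zero for non-positive x."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def ccdf(cdf_value):
    """The complementary CDF is P(X > x) = 1 - P(X <= x)."""
    return 1.0 - cdf_value
```

With these definitions, `normal_cdf(0.0)` is 0.5, and `ccdf(normal_cdf(x))` gives the upper-tail probability at x.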

Page 10: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Quantile Function

In PAL, the quantile function evaluates the inverse $F^{-1}(p)$ of the cumulative distribution function (CDF) or the inverse $\bar{F}^{-1}(p)$ of the complementary cumulative distribution function (CCDF) for a given probability p and probability distribution.
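As an illustration of inverting a CDF or CCDF, the following Python sketch uses the closed-form Weibull quantile and the standard library's `statistics.NormalDist.inv_cdf`. It is not the PAL interface. The function names and the `upper_tail` flag are hypothetical.

```python
import math
from statistics import NormalDist


def weibull_quantile(p, shape, scale, upper_tail=False):
    """Return x with CDF(x) = p, or with CCDF(x) = p when upper_tail is True."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if upper_tail:
        # CCDF(x) = p is the same as CDF(x) = 1 - p.
        p = 1.0 - p
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def normal_quantile(p, mean=0.0, sd=1.0, upper_tail=False):
    """Same idea for the Normal distribution, using the standard library."""
    if upper_tail:
        p = 1.0 - p
    return NormalDist(mean, sd).inv_cdf(p)
```

The lower-tail and upper-tail quantiles mirror each other: `weibull_quantile(0.2, k, s, upper_tail=True)` equals `weibull_quantile(0.8, k, s)`.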

Page 11: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Random Distribution Sampling

Generates random samples from a given distribution (Normal, Gamma, Weibull, or Uniform).
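For readers who want to see what sampling from these four distributions looks like, here is a small Python sketch built on the standard `random` module. It is not the PAL function. The `draw_samples` helper and its parameter names are hypothetical.

```python
import random


def draw_samples(distribution, n, rng, **params):
    """Draw n random values from one of the four distributions named on the slide."""
    if distribution == "normal":
        return [rng.gauss(params["mean"], params["sd"]) for _ in range(n)]
    if distribution == "gamma":
        # random.gammavariate takes (shape, scale).
        return [rng.gammavariate(params["shape"], params["scale"]) for _ in range(n)]
    if distribution == "weibull":
        # random.weibullvariate takes (scale, shape), in that order.
        return [rng.weibullvariate(params["scale"], params["shape"]) for _ in range(n)]
    if distribution == "uniform":
        return [rng.uniform(params["low"], params["high"]) for _ in range(n)]
    raise ValueError(f"unsupported distribution: {distribution}")


# A seeded generator makes the draws reproducible.
rng = random.Random(42)
uniform_values = draw_samples("uniform", 5, rng, low=0.0, high=10.0)
```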

Page 12: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? ARIMA

The autoregressive integrated moving average (ARIMA) algorithm is widely used in econometrics, statistics, and time series analysis. An ARIMA model is written as ARIMA(p, d, q), where p is the autoregressive order, d is the order of integration (differencing), and q is the moving average order. It helps to understand time series data better and to predict future values in the series. Both training and forecast functions are provided.
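The roles of p and d, and the split into a training step and a forecast step, can be illustrated with the simplest non-trivial case, ARIMA(1, 1, 0). The Python sketch below differences the series once, fits an AR(1) model with intercept by least squares, and integrates the forecasts back. It is a teaching example, not the PAL implementation. All function names are hypothetical, and the moving average part (q) is deliberately left out.

```python
def difference(series):
    """First-order differencing (d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def train_ar1(values):
    """Least-squares fit of values[t] = c + phi * values[t - 1]."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, steps):
    """Train on the differenced series, then forecast and undo the differencing."""
    diffs = difference(series)
    c, phi = train_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # AR(1) step on the differences
        level += last_diff                # integrate back to the original scale
        forecasts.append(level)
    return forecasts
```

For the series `[0, 4, 7, 9.5, 11.75, 13.875]`, whose differences follow d[t] = 1 + 0.5 * d[t-1] exactly, the one-step forecast is 15.9375.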

Page 13: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? FP-Growth

FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate itemsets. In PAL, the FP-Growth algorithm is extended to find association rules. In the first step, the algorithm converts the transactions into a compressed frequent pattern tree (FP-Tree). In the second step, it recursively finds frequent patterns from the FP-Tree. In the last step, PAL generates association rules based on the frequent patterns found in the second step.
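The three steps described above can be sketched in compact Python: build an FP-Tree, mine it recursively through conditional pattern bases, then derive rules from the frequent patterns. This is an illustrative textbook version, not PAL's implementation. All names and the confidence-only rule filter are assumptions.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_transactions, min_count):
    """Step 1: compress (items, count) pairs into an FP-Tree with a header table."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq


def mine(weighted_transactions, min_count, suffix=frozenset()):
    """Step 2: recursively mine frequent patterns from conditional FP-Trees."""
    header, freq = build_tree(weighted_transactions, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        conditional_base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional_base.append((path, node.count))
        patterns.update(mine(conditional_base, min_count, pattern))
    return patterns


def fp_growth(transactions, min_count):
    return mine([(t, 1) for t in transactions], min_count)


def association_rules(patterns, min_confidence):
    """Step 3: split each frequent pattern into antecedent -> consequent rules."""
    rules = []
    for itemset, support in patterns.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support / patterns[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

Calling `fp_growth(transactions, min_count)` returns a dictionary from frozen itemsets to support counts, which `association_rules` then turns into `(antecedent, consequent, confidence)` tuples.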

Page 14: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? CART

Classification and Regression Tree (CART) was introduced by Breiman et al. (1984). It supports only binary splits and can be used for classification or regression. Like C4.5, CART is a recursive partitioning method. It uses the Gini index or Twoing for classification, and least square error for regression. The surrogate split method is used to support missing values when creating the tree model.
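A single CART classification step, choosing the binary split that minimises the weighted Gini index, can be sketched as follows. This Python example handles one numeric feature only and omits Twoing, regression, and surrogate splits. It is an illustration with hypothetical function names, not PAL's implementation.

```python
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

A full tree applies this search to every feature and then recurses on the two resulting partitions.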

Page 15: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? K-Medoid Clustering

The K-Medoids algorithm is a clustering algorithm related to the K-Means algorithm. Both partition n observations into k clusters, with each observation assigned to the cluster with the closest center. In contrast to K-Means, K-Medoids does not calculate means but uses medoids as the new cluster centers. A medoid is the object in a cluster whose average dissimilarity to all the objects in the cluster is minimal. Compared to K-Means, K-Medoids is said to be more robust to noise and outliers.
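The difference from K-Means is easiest to see in the update step: the new center is an actual data point, not an average. The Python sketch below uses a simple alternating scheme (assign points, then re-pick each medoid) with Euclidean distance. The slide does not say which K-Medoids variant or distance PAL uses, so both choices, and the function name, are assumptions.

```python
import math


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate between assigning points and re-selecting medoids until stable."""
    medoids = list(initial_medoids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest medoid.
        clusters = [[] for _ in medoids]
        for point in points:
            nearest = min(range(len(medoids)), key=lambda j: math.dist(point, medoids[j]))
            clusters[nearest].append(point)
        # Update step: the medoid is the member with the smallest total
        # dissimilarity to the other members of its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(math.dist(c, other) for other in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```

For the one-dimensional points 1, 2, 3, 10, 11, 12 (passed as 1-tuples) with initial medoids 1 and 2, the procedure settles on the medoids 2 and 11.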

Page 16: SAP HANA SPS08 Predictive Analysis Library

Enhancements

Page 17: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Enhancements (1 of 2)

Logistic regression
• Support cancellation at runtime.
• Support multi-nomial classification. In many business scenarios we want to train a classifier with more than two classes. Multi-class logistic regression (also referred to as multi-nomial logistic regression) extends the binary logistic regression algorithm (two classes) to multi-class cases. The input and output of multi-class logistic regression are similar to those of logistic regression.

K-Means
• Determine the best k within a given range according to the slight silhouette.

Apriori
• Add a prefix tree implementation for potential performance improvement with regard to memory consumption and time cost.
• Add a rule filter to restrict certain items to the left-hand or right-hand side of the association rules.
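To illustrate how multi-nomial logistic regression extends the binary case, the following Python sketch computes class probabilities with the standard softmax formulation: one linear score per class, normalised so the probabilities sum to 1. The slide does not describe PAL's parameterisation, so the softmax form, the function names, and the example weights are all assumptions.

```python
import math


def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, features):
    """One weight vector and bias per class; returns one probability per class."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + bias
        for class_weights, bias in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical three-class model with two input features.
probabilities = predict_proba(
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    biases=[0.0, 0.0, 0.0],
    features=[2.0, 0.5],
)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

With two classes, softmax reduces to the usual logistic (sigmoid) function of the score difference, which is why the inputs and outputs look similar to binary logistic regression.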

 

Page 18: SAP HANA SPS08 Predictive Analysis Library

HANA Predictive Analysis Library – What’s New in SPS 08? Enhancements (2 of 2)

Forecast Smoothing
• Auto-detect the best model among the single/double/triple exponential smoothing models.

Hierarchical clustering
• Support categorical attributes as input features.

Univariate statistics
• Support population variance and standard deviation.
• Calculate the lower/upper quartile for the data.

Decision tree
• Missing values can be treated as a separate class, in addition to replacing the NULL values.
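The univariate statistics enhancement distinguishes population from sample measures. The Python sketch below shows the difference using the standard `statistics` module. It is not the PAL function. The data values are made up, and the quartile interpolation method is the Python default, which may differ from PAL's.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population measures divide by n.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample measures divide by n - 1 and are therefore slightly larger.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# quantiles(..., n=4) returns the three cut points: lower quartile,
# median, and upper quartile.
lower_quartile, median, upper_quartile = statistics.quantiles(data, n=4)
```

For this data the population variance is 4 and the population standard deviation is 2, while the sample variance is 32/7.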

Page 19: SAP HANA SPS08 Predictive Analysis Library

Disclaimer

This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP.

SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.

This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or through gross negligence.

Page 20: SAP HANA SPS08 Predictive Analysis Library

Documentation

Page 21: SAP HANA SPS08 Predictive Analysis Library

Important Note

SAP Note 2022080 has been created to address the missing EXECUTION privilege for calling AFL_WRAPPER_GENERATOR/ERASER during the HANA SPS 08 upgrade.

Page 22: SAP HANA SPS08 Predictive Analysis Library

How to find SAP HANA documentation on this topic?

SAP HANA Platform SPS What’s New – Release Notes

Installation
– SAP HANA Server Installation Guide

Administration
– SAP HANA Administration Guide

Development
– SAP HANA Predictive Analysis Library (PAL) Reference
– SAP HANA Developer Guide

References
– SAP HANA SQL Reference

• In addition to this learning material, you can find SAP HANA documentation in the SAP Help Portal knowledge center at http://help.sap.com/hana_platform.

• The knowledge center is structured according to the product lifecycle: installation, security, administration, development. For example, you can find the SAP HANA Predictive Analysis Library (PAL) Reference in the Development section.

Page 23: SAP HANA SPS08 Predictive Analysis Library

Thank you

Contact information

Mark Hourani
SAP HANA Product Management
[email protected]

To get the best overview of what’s new in SAP HANA SPS 08, read this blog.

Page 24: SAP HANA SPS08 Predictive Analysis Library

© 2014 SAP AG or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP AG or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP AG or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP AG or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP AG or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP AG’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP AG or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.