Predictive Analytics


- Parag Shrungarkar

Note: Viewpoint is based on SAP Predictive Analysis version 1.0.7

Predictive Analytics, also known as Advanced Analytics, is one of the most sought-after areas within the BI space. It is a set of technologies that includes statistical analysis, data mining and advanced visualization.

SAP’s Predictive Analysis tool offers advanced analytics features for traditional sources of data, but most importantly for SAP HANA. When used in combination with HANA, it can leverage the high-performance in-memory capabilities of HANA to provide powerful analytics and visualization at superfast speeds on Big Data.

Predictive Analytics – Powered by SAP HANA: An Overview

VIEW POINT


What is Predictive Analytics?

Here are some of the popular definitions of Predictive Analytics:

“Predictive analytics describes any approach to data mining with four attributes: an emphasis on prediction (rather than description, classification or clustering), rapid analysis measured in hours or days (rather than the stereotypical months of traditional data mining), an emphasis on the business relevance of the resulting insights (no ivory tower analyses) and (increasingly) an emphasis on ease of use, thus making the tools accessible to business users.”

Gartner [1]

“Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict trends and behaviour patterns. Often the unknown event of interest is in the future, but predictive analytics can be applied to any type of unknown whether it be in the past, present or future.”

Wikipedia [2]

“Predictive analytics is the technology that learns from data to make predictions about what each individual will do -- from thriving and donating to stealing and crashing your car.”

Eric Siegel, author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die” [3]

From the above definitions, we can draw the following key characteristics of Predictive Analytics, and their inferences.

Fig 1: Key Characteristics of Predictive Analytics

• High Data Volumes; Variety of Sources – Analysis of historical and current data

• Find insights and patterns; Predict outcomes – Predicting the “unknown”

• Business rules and models; Business specific parameters – Contextual to business domain

• Intuitiveness and visualizations; Lower IT support – Ease of use


• Analysis of historical and current data – Predictive analytics typically deals with a variety of data sources such as transaction systems, data warehouses, external feeds, events and competitor data. Combined with Big Data, it provides immense opportunities for advanced analytics on multi-structured and semi-structured data.

• Predicting the “unknown” – Although it relies on historical and current data, the objective of predictive analytics is to predict the unknown. It involves mining and analysing existing data to arrive at an expected future outcome. Examples include predicting a business outcome, revenue or cost forecast, customer behaviour, equipment failure, etc.

• Contextual to business domain – This is the heart (and brain) of predictive analytics. It involves applying statistical analysis and data mining to the dataset to create business models that can make predictions. One needs a strong combination of business/domain knowledge and statistical skills to define the business models with different business rules and parameters. Examples include applying techniques such as clustering, classification, association and regression to create predictive business models.

• Ease of use – Predictive analytics is predominantly a business tool used by analysts, data scientists, and advanced business users in an iterative and exploratory approach. The very nature of the tool requires it to be highly intuitive, visual, easy to use, and business owned with minimal IT involvement.

How is Predictive Analytics different from Traditional BI?

Traditional BI and Predictive Analytics both rely on data generated from historical business events. However, there are fundamental differences between the two.

| # | Traditional BI | Predictive Analytics |
|---|---|---|
| 1 | Optimized for answering questions which are already known. | Designed to discover unknown patterns, insights and outcomes. |
| 2 | Suited for tracking and monitoring historical and current business performance. | Ideal for predicting future outcomes based on learning gathered from historical and current data. |
| 3 | Primarily relies on internal and external sources of data generating business transactions (primarily structured data). | Relies on a variety of internal and external data sources for business transactions, external events, real-time feeds, opinions, etc. (both structured and unstructured data). |
| 4 | Structures of data storage, intake, and dissemination need to be pre-defined based on known business requirements. | Statistical models need to be built by mining existing data and applying business rules to establish predictions. |
| 5 | Information presented using a mix of tabular data, charts, and visualizations. | Information analysed using advanced visualizations and visual discovery tools. |
| 6 | Uses a mix of purpose-built reports, dashboards and ad-hoc analysis. | Primarily ad-hoc analysis by business and domain experts. |


Predictive Analysis Tool from SAP

SAP has recently launched its own tool called “SAP Predictive Analysis” (PA). This tool is targeted at the advanced analytics needs of business analysts, data scientists, and advanced business users. It is currently available as a desktop-based tool that combines data mining, statistical modelling and visualization capabilities.

The typical process of performing predictive analysis is shown below:

Fig 2: Predictive Analysis execution steps

• Step 1, Organize Data: extract data from sources; examine and sample data; join, filter, merge datasets

• Step 2, Define Model: define the predictive model by applying business rules using analysis functions

• Step 3, Run and Analyze: execute the model; analyze results using Advanced Visualization

• Step 4, Publish/Share: store model and results in database; access using BI tools; share with other predictive tools using PMML

1. Organize Data – Data from various sources is extracted and organized in the required format. PA can connect to data sources such as Oracle, SQL Server, MySQL, CSV, Excel, BusinessObjects universes and SAP HANA. It also supports “Freehand SQL”, where advanced users can directly write complex SQL statements for data access. PA shares its underpinnings for data acquisition and manipulation with the SAP Visual Intelligence tool. It allows users to view data in grid (row-column) format or as facets, where only the unique values of a column are shown. Users can add filters to data, hide and rearrange columns, and create time and geographical hierarchies. They can also create new measures or manipulate data in existing columns using the available formulas and functions.

2. Define Model – Once the data is in place, the user can create analyses and build predictive models. Generally one would apply a combination of algorithms to analyse the data. PA supports three types of algorithm libraries: the native library within the tool, the HANA predictive library, and the open source “R” library. Together they provide an extremely powerful collection of algorithms and statistical functions. The following types of algorithms are available:

a. Association – finding items that have affinity or association

b. Clustering – identifying groups or clusters

c. Time Series – for time based predictions

d. Regression – predicting continuous variables based on other variables in the dataset

e. Decision trees – classifying or predicting a target variable using tree-structured rules on other variables in the dataset

f. Outlier detection – detecting outliers in the data

g. Neural Network – for forecasting, classification and pattern recognition


The Predictive Analysis tool can work as a standalone desktop tool, connecting to various data sources and running analysis “In-Process” within the tool itself. However, the real differentiator of the PA tool is its ability to use HANA as a data source and run complex, heavy algorithms “In-DB” on the HANA in-memory platform.

Many frequently used algorithms are available in the PA tool, and some of them are also available in the HANA Predictive Analysis Library (PAL). Some advanced algorithms such as Neural Networks, Linear Programming and the Monte Carlo Method are not yet available. However, the PA tool can be integrated with the open source “R” language to access the thousands of predictive algorithms in the “R” library.
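To give a flavour of the kind of algorithm involved, the following is a minimal sketch of simple linear regression in plain Python. It is not PA, PAL or “R” code; the function names and the sample figures (advertising spend versus revenue) are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new value of x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: advertising spend and the revenue observed with it.
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]

model = fit_line(ad_spend, revenue)   # learn from historical data
forecast = predict(model, 60)         # predict the "unknown" for a new spend
print(model, forecast)
```

The model is learned once from historical data and then applied to a value that has not been observed yet, which mirrors the "predicting the unknown" characteristic described earlier in this viewpoint.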

The diagram below depicts the “In-Process” and “In-DB” modes of usage:

Fig 3: PA tool – usage modes

• In-Process (Standalone PA Tool): SAP Predictive Analysis (Desktop), Built-in Algorithms, R Open Source Library, Business Suite, BI Tools, Data

• In-DB (PA tool with HANA): SAP Predictive Analysis (Desktop), Built-in Algorithms, R Open Source Library, Business Suite, BI Tools, Data, HANA, PAL
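The difference between the two modes can be sketched in a few lines of Python. The example uses the built-in sqlite3 module purely as a stand-in for a database; it is not HANA, PAL or the PA tool, and the table and column names are hypothetical. The in-process function pulls every row to the client and computes there, while the in-database function sends the computation to the database and receives only the result.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 100.0), ("North", 300.0), ("South", 50.0), ("South", 150.0)],
    )
    return conn


def avg_in_process(conn):
    """'In-Process' style: fetch all rows, then aggregate on the client."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        running_sum, count = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, count + 1)
    return {region: s / n for region, (s, n) in totals.items()}


def avg_in_db(conn):
    """'In-DB' style: the database aggregates; only results are transferred."""
    rows = conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows)


conn = make_demo_db()
in_process = avg_in_process(conn)
in_db = avg_in_db(conn)
print(in_process, in_db)
```

Both functions return the same averages. With four rows the difference is invisible, but the in-process version transfers data in proportion to the table size, which is the limitation the In-DB mode is meant to avoid.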

3. Run and Analyze – Once the predictive model is defined, the next step is to execute it on the dataset and review the results. For complex algorithms and high data volumes, running on a desktop (In-Process mode) can be limiting. Integration with HANA is a big differentiator here, with its ability to handle large data volumes and complex processing on its in-memory platform.

PA also provides powerful advanced visualizations similar to SAP’s Visual Intelligence tool. These are highly intuitive drag-and-drop charts targeted at the business and analyst community. Some examples of visualizations include:

• Scatter Matrix Chart

• Parallel Coordinates Chart

• Decision Tree

• Regression Tree

• Time Series Chart

• Cluster Chart, etc.

4. Publish / Share – The final step of the analysis process is to publish and share the results with a larger audience. The results of the analysis can be stored in any database, including HANA, and can be integrated into BI tools for reports and dashboards.

Another useful feature of PA is the ability to save and share analysis models with other predictive tools using Predictive Model Markup Language (PMML).
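For readers unfamiliar with PMML, the sketch below shows roughly what exchanging a model as XML looks like. It builds a simplified PMML 4.1 style document for a one-variable linear regression using Python's standard library. This is an illustration only: it is not the output of the PA tool, it has not been validated against the PMML schema, and the field names and coefficients are hypothetical.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"  # namespace used by PMML 4.1


def regression_to_pmml(predictor, target, intercept, slope):
    """Serialize y = intercept + slope * x as a simplified PMML string."""
    root = ET.Element("PMML", {"version": "4.1", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "Illustrative model"})

    # Declare the fields the model reads and produces.
    fields = ET.SubElement(root, "DataDictionary", {"numberOfFields": "2"})
    for name in (predictor, target):
        ET.SubElement(
            fields,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    # Describe the regression itself: one intercept and one coefficient.
    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": predictor})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})
    table = ET.SubElement(model, "RegressionTable", {"intercept": str(intercept)})
    ET.SubElement(
        table, "NumericPredictor", {"name": predictor, "coefficient": str(slope)}
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical model: revenue = 5.0 + 2.0 * ad_spend
pmml_text = regression_to_pmml("ad_spend", "revenue", 5.0, 2.0)
print(pmml_text)
```

Because the model is described as data rather than code, another tool that understands PMML can load the same coefficients and score new records without access to the software that produced them.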


Some Limitations...

Predictive Analysis is a powerful tool with many features; however, there are still some gaps.

1. Some advanced predictive algorithms such as Neural Networks, Linear Programming and the Monte Carlo Method are not yet available.

2. The HANA Predictive Analysis Library (PAL) has only a subset of the predictive algorithms available.

3. The PA tool is currently only a desktop tool and does not have a web-based or application server based deployment.

4. Analysis models need to be run as standalone jobs and cannot be scheduled or queued.

5. Integration with other BusinessObjects tools and SAP Business Suite applications is limited.


References

1. Gartner, IT Glossary: http://www.gartner.com/it-glossary/predictive-analytics/

2. Wikipedia, Predictive Analytics: http://en.wikipedia.org/wiki/Predictive_analytics#Definition

3. Huffington Post: http://www.huffingtonpost.com/phil-simon/predictive-analytics_b_2802994.html

Summary

Predictive Analytics technology enables business analysts and data scientists to extract, organize and analyze large sets of structured and unstructured data, and predict business outcomes through statistical modeling, data mining and advanced visualization.

SAP’s Predictive Analysis Tool provides powerful options through a combination of native algorithms, integration with HANA predictive library and support for open source “R” language.

Predictive Analysis tool can work in a standalone “In-Process” mode or integrate with HANA and run complex and heavy algorithms “In-DB” on HANA in-memory platform.

Author Profile

Parag Shrungarkar
Principal Technology Architect, Information Transformation, Manufacturing

Parag leads the Center of Excellence (CoE) for SAP HANA, BusinessObjects and Mobile BI for Manufacturing vertical within Infosys, and is responsible for providing thought leadership, consulting, architecture, IP development, and pre-sales support. His current areas of focus include conceptualizing and developing solutions using HANA, Advanced Visualizations, Mobile BI and Predictive Analytics.

Parag has been working in IT space for over 16 years in the areas ranging from BI/DW to Portals to Web Applications and System Administration. He has led and delivered several data warehouse/ BI projects for clients in auto, high-tech, semicon, utilities, and telecom industries.


© 2013 Infosys Limited, Bangalore, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/ or any named intellectual property rights holders under this document.
