KNIME Meetup 2016-04-16


Transcript of KNIME Meetup 2016-04-16

Page 1: KNIME Meetup 2016-04-16
Page 2: KNIME Meetup 2016-04-16

Creating Insights at the Speed of Business

W. Daniel Cox III, CPA, CMA, CFM

Chief Executive Officer

Page 3: KNIME Meetup 2016-04-16

WELCOME to the Meetup Group

Page 4: KNIME Meetup 2016-04-16

Energise Organisational Advantage through Awareness and Insight

Registration & Networking

Keynote – Dan Cox, CEO of Data Transformed

KNIME & Harvest Analytics – Tom Park

Office of State Revenue Case Study – Anand Antony

Using Spark with KNIME – Chhitesh Shrestha

Networking & Drinks

Page 5: KNIME Meetup 2016-04-16

Journey to Best-in-Class Analytics: We Help Our Clients Along This Path

[Chart: Value over Time]

• Static: Report and Drill-down (Laggards)
• Reactive: Monitor and Alert (Followers)
• Proactive: Discover and Predict (Performers)
• Dynamic: Analytics-enabled business processes (Innovators)

Page 6: KNIME Meetup 2016-04-16

YOUR DATA. CLEARLY.

Source Your Data

Realise Data Value

• Prepare Your Data: Data Preparation
• Plan With Data: Budget/Planning
• Visualise All Data: Visualisation

Page 7: KNIME Meetup 2016-04-16

Data Transformed Skill Sets

DATA PREPARATION (50%): Data Governance, Data Quality, Master Data Management, Data Warehousing, Data Science, ETL Applications, Data Analytics, SQL Language, Python Language, Scripting, Database Management, Application Development, Database Development, Textual ETL, Text Analytics, Hadoop Ecosphere, Analytical Databases, Relational Databases, Microsoft Analysis Server, OLAP, OLTP, Multi-Dimensional Databases, Data Vault Architectures, Star-Schema Architectures, Data Marting

VISUALISATION (30%): Dashboarding, Reporting, Charting, Location Analytics, Statistical Analytics, Data Analytics, Business Analysis, Story Telling, Semantic Layer, Presentation Layer, Collaboration

BUDGET PLANNING (20%): Budgeting, Forecasting, Planning, Demand Planning, Workforce Management, Accounting, Financing, Cashflow, Sales Forecasting, Modelling, Campaign Forecasting

Page 8: KNIME Meetup 2016-04-16

[Chart: Performance (Slow to Fast) against Enterprise Readiness (Immature to Industrial Strength), with regions labelled Good Enough, Production Ready, Traditional Operational and Open Source, and Vortex plotted on it]

Actian – Fast, Industrialized, Open

Superior Big Data SQL with Industrialized strength

Page 9: KNIME Meetup 2016-04-16

Do YOU Have a BIG DATA Role?

Page 10: KNIME Meetup 2016-04-16

Global Data Snapshot

Total World Population: 7,254,549,796

Internet Users: 3,035,749,340

Active Social Network Users: 2,078,680,860

Mobile Subscribers: 6,572,950,124

Page 11: KNIME Meetup 2016-04-16

Traditional systems under pressure

Challenges:
• Constrains data to app
• Can’t manage new data
• Costly to scale

1. Traditional data: ERP, CRM, SCM
2. New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs and emails, Server logs

Data volume: 2.8 Zettabytes in 2012, projected 44 Zettabytes in 2020

[Chart: Business Value from Laggards to Industry Leaders; a further label reads 12 Zettabytes]

Page 12: KNIME Meetup 2016-04-16

Volume: Exponential Growth

Variety: New Data Types

Velocity: Time To Value

The Digital Floodgates have opened… and will never be turned off again.

Page 13: KNIME Meetup 2016-04-16

Big Data equals Big Opportunity

Data Source & Type: Untouched (88% of big data)

Value: New Possibilities ($15 trillion)

Universal Access

Time To Value

[Statistic: …% of companies]

Page 14: KNIME Meetup 2016-04-16

Trends for BIG DATA

In the Cloud

Page 15: KNIME Meetup 2016-04-16

Trends for BIG DATA

Personal ETL

Page 16: KNIME Meetup 2016-04-16

Trends for BIG DATA

NoSQL

Page 17: KNIME Meetup 2016-04-16

Trends for BIG DATA

Hadoop

Page 18: KNIME Meetup 2016-04-16

Trends for BIG DATA

Data Lake

Page 19: KNIME Meetup 2016-04-16

Trends for BIG DATA

Ecosystem

Page 20: KNIME Meetup 2016-04-16

Trends for BIG DATA

Internet of Things

Page 21: KNIME Meetup 2016-04-16

Big Data Trends

1. Big Data in the Cloud

2. Personal ETL

3. NoSQL

4. Hadoop

5. Data Lakes

6. Big Data Ecosystem

7. Internet of Things

Page 22: KNIME Meetup 2016-04-16

BIG DATA is STILL just Data.

It needs to be translated into Answers.

Page 23: KNIME Meetup 2016-04-16

Acquire, Grow & Retain Customers

Who are your best customers and how can you keep them satisfied? Where can you find more customers like them?

Big data holds the insights into who your customers are and what motivates them.

Page 24: KNIME Meetup 2016-04-16

Optimise Operations & Reduce Fraud

Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business?

Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.

Page 25: KNIME Meetup 2016-04-16

Transform Financial Processes

Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes?

Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.

Page 26: KNIME Meetup 2016-04-16

Manage Risk

How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance?

Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.

Page 27: KNIME Meetup 2016-04-16

Create New Business Models

Are your competitors making bigger strides in changing your industry or creating new markets than you? Does your organisation’s culture support innovative thinking and exploration?

Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.

Page 28: KNIME Meetup 2016-04-16

Improve IT Economics

Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud?

Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.

Page 29: KNIME Meetup 2016-04-16

Analytics Trends

1. Data Governance

2. Social Intelligence

3. Analytics Organisation-Wide

4. Community Collaboration

5. Integration of Everything

6. Cloud Analytics

7. Conversational Data

8. Journalism Data

9. Mature Mobility

10. Smart Analytics

Page 30: KNIME Meetup 2016-04-16

Areas Where BIG DATA is Helping

1. Operations & Optimising

2. Product Development

3. Customer Experience

4. Understanding and Targeting Customers

Page 31: KNIME Meetup 2016-04-16

Performance Examples

Actian is Helping These Companies Achieve Leadership

Digital Marketing: Hyper-segmentation every hour

Banking: Enterprise Risk every 2 minutes

Retail: Enterprise Market Basket Analysis every minute

Defense: Network intrusion models every second

Fraud: Adjustments every nanosecond

Amazon Redshift – Actian Matrix Cloud-based, Petabyte Scale Data Warehouse

Page 32: KNIME Meetup 2016-04-16

The Value of Business Intelligence

Organisations competing with Analytics

Substantially OUTPERFORM their peers by 220%

Page 33: KNIME Meetup 2016-04-16

Data Transformed

Page 34: KNIME Meetup 2016-04-16

Actian Vector: Example

https://youtu.be/dYTF5ZNioEI

Identical 150 Million Transaction Query

Comparison between Actian Vector & Oracle DBMS

Page 35: KNIME Meetup 2016-04-16

Harvest Analytics

Tom Park

Page 36: KNIME Meetup 2016-04-16

Overview KNIME & Big Data

Tom Park

Page 37: KNIME Meetup 2016-04-16

Gartner 2016 Magic Quadrant for Advanced Analytics Platforms

Niche Players (5): FICO, Lavastorm, Megaputer, Prognoz, Accenture

Leaders (5): SAS, IBM, KNIME, RapidMiner, Dell

Visionaries (4): Microsoft, Alteryx, Alpine Data Labs, Predixion

Challengers (2): SAP, Angoss

Page 38: KNIME Meetup 2016-04-16

Changes from 2015 to 2016

Dropped: Salford and TIBCO, for not satisfying the visual composition requirement

Page 39: KNIME Meetup 2016-04-16

Main Big Data Technologies

NoSQL

Page 40: KNIME Meetup 2016-04-16

Big Data Architecture

Page 41: KNIME Meetup 2016-04-16

KNIME Big Data Extensions

Page 42: KNIME Meetup 2016-04-16

Future Trends

Page 43: KNIME Meetup 2016-04-16

Missing Ingredient to Success?

Page 44: KNIME Meetup 2016-04-16

www.dataroos.com

Page 45: KNIME Meetup 2016-04-16
Page 46: KNIME Meetup 2016-04-16

Office of State Revenue

Anand Antony

Page 47: KNIME Meetup 2016-04-16

KNIME @ OSR

Anand Antony

Senior Data Analyst, Operations Analytics and Intelligence

Office of State Revenue

[email protected]. 0414491765

Page 48: KNIME Meetup 2016-04-16

OSR: Who are we?

As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW

◦ Payroll tax

◦ Land tax

◦ Duties

◦ Grants such as First Home Benefits

Page 49: KNIME Meetup 2016-04-16

Data Analytics Team: Who are we?

Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR
◦ Three teams: Business Intelligence, Data Analytics and Data Team

The Data Analytics team consists of 10 analysts

Supports tax auditors by detecting possibly non-compliant clients
◦ Via matching data from various sources and analysing it
◦ 60+ data sources

Page 50: KNIME Meetup 2016-04-16

Data Analytics Scenario - Past

Data matching, preparation and analysis
◦ SPSS Clementine, SAS Enterprise Guide

Data mining
◦ Salford Systems

Reporting/Dashboards
◦ Excel

Fuzzy data matching
◦ SSA Name (Informatica)

Page 51: KNIME Meetup 2016-04-16

Data Analytics Scenario - Current

Data matching, preparation and analysis
◦ KNIME (around 70% transitioned from Clementine/SAS)

Data mining
◦ Salford Systems
◦ Will be evaluating KNIME

Reporting/Dashboards
◦ Excel

Fuzzy data matching
◦ SSA Name (Informatica)

Page 52: KNIME Meetup 2016-04-16

Future: Unified Analytic & Data Management Platform

◦ Internal & External Data Sources
◦ Governance: Data Governance, Data Quality, Data Matching, Metadata Management
◦ MapR Hadoop Distribution
◦ Data Lake: Vortex/MapR
◦ Advanced Data Analytics: Actian/KNIME
◦ Machine Learning: H2O/Spark, Actian/KNIME
◦ Datamart; On the fly / Sandpit
◦ Visualisation and Presentation Layer: Spotfire/Tableau/Graph DBs

Page 53: KNIME Meetup 2016-04-16

Why KNIME?

Enrich with coding via code snippets
◦ Mostly Java Snippet at the moment

Start with canvas programming

Fast and easy learning curve for data scientists

Can tackle almost any analytic task

Page 54: KNIME Meetup 2016-04-16

KNIME - Having the best of both worlds!

◦ Canvas programming
◦ Coding

Page 55: KNIME Meetup 2016-04-16

What do we use KNIME for?

Pretty much for everything! (except reporting and data mining)

◦ Data reading (text files, databases, non-standard formats)

◦ Data merging (potentially fuzzy matching too in future)

◦ Data manipulation

◦ Creating new variables

◦ Data Output

◦ Modelling (possibly in future)
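Data merging with fuzzy matching is listed as a possible future use, and fuzzy matching is currently done in SSA Name. As a rough illustration of what fuzzy name matching involves, the following minimal Python sketch normalises names and scores candidate pairs with the standard library's `difflib.SequenceMatcher`. The suffix list, the 0.85 threshold and the sample names are hypothetical assumptions, and this is not how SSA Name or any KNIME node matches records.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of words ignored when comparing business names
STOP_WORDS = {"pty", "ltd", "limited", "the"}

def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and drop stop words."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in STOP_WORDS)

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def fuzzy_match(left, right, threshold=0.85):
    """For each left name, find its best right-hand candidate and keep the
    pair only if the score reaches the (hypothetical) threshold."""
    matches = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        if score >= threshold:
            matches.append((name, best, round(score, 3)))
    return matches

if __name__ == "__main__":
    audit_clients = ["Acme Pty Ltd", "Smith & Sons Plumbing"]
    payroll_register = ["ACME PTY. LIMITED", "Smith and Sons Plumbing", "Jones Bakery"]
    for pair in fuzzy_match(audit_clients, payroll_register):
        print(pair)
```

Production matchers usually add blocking (only comparing records that share, say, a postcode) so that not every pair of names has to be scored.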

Page 56: KNIME Meetup 2016-04-16

Key nodes/functionalities

◦ Sorter, Column Reorder, Column Filter, Column Rename

◦ Concatenate, Joiner, Reference Row Filter (anti-join)

◦ Missing Value

◦ Math Formula, String Manipulation, Rule Engine, Java Snippet

◦ GroupBy (aggregate, dedupe)

◦ Value Counter, Pivoting

◦ Looping

◦ Regular expressions/wildcards in various nodes
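Two of these nodes have simple relational meanings: the Reference Row Filter, set to exclude, behaves as an anti-join, and GroupBy on a key can remove duplicates. A minimal Python sketch of both operations on lists of dictionaries, assuming a hypothetical `abn` key column; it shows the semantics only, not KNIME's implementation.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

def anti_join(rows: Iterable[Row], reference: Iterable[Row], key: str) -> List[Row]:
    """Keep rows whose key does not occur in the reference table (anti-join)."""
    ref_keys = {r[key] for r in reference}
    return [r for r in rows if r[key] not in ref_keys]

def dedupe(rows: Iterable[Row], key: str) -> List[Row]:
    """Keep the first row for each key value, like grouping on the key
    and taking the first value of every other column."""
    seen = set()
    out: List[Row] = []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out

# Hypothetical data: client records and clients already audited
clients = [
    {"abn": 1, "name": "A"},
    {"abn": 2, "name": "B"},
    {"abn": 2, "name": "B (duplicate)"},
    {"abn": 3, "name": "C"},
]
already_audited = [{"abn": 3}]
to_review = dedupe(anti_join(clients, already_audited, "abn"), "abn")
print([r["name"] for r in to_review])  # ['A', 'B']
```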

Page 57: KNIME Meetup 2016-04-16

Data Preparation Example

Page 58: KNIME Meetup 2016-04-16

Case study 1

Officers fill in a questionnaire on the audited entity: one Excel spreadsheet per entity

Collate all the spreadsheets stored in a location

Massage the data to produce an analysis dataset with one row per entity

Key KNIME nodes/functionalities used

◦ List Files

◦ Table Row to Variable Loop Start, Loop End

◦ Java Snippet
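The List Files, Table Row to Variable Loop Start and Loop End combination iterates over every file in a folder and stacks the per-file results into one table. A plain-Python analogue, assuming (hypothetically) that each questionnaire has been exported as a two-column CSV of question and answer, since reading the original Excel files would need a third-party library such as openpyxl:

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

def read_questionnaire(path: Path) -> Dict[str, str]:
    """Turn one entity's file (question,answer per line) into a single record."""
    record = {"source_file": path.name}
    with path.open(newline="") as f:
        for question, answer in csv.reader(f):
            record[question] = answer
    return record

def collate(folder: Path, pattern: str = "*.csv") -> List[Dict[str, str]]:
    """Loop over every matching file (like List Files feeding a loop) and
    collect one record per file (like Loop End stacking the iterations)."""
    return [read_questionnaire(p) for p in sorted(folder.glob(pattern))]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Two hypothetical questionnaire files
        (folder / "entity_001.csv").write_text("Q1,Yes\nQ2,12\n")
        (folder / "entity_002.csv").write_text("Q1,No\nQ2,7\n")
        for row in collate(folder):
            print(row)
```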

Page 59: KNIME Meetup 2016-04-16

Questionnaire data for one client

Page 60: KNIME Meetup 2016-04-16

Overview of Knime flow

Page 61: KNIME Meetup 2016-04-16

Bring data to tabular form

Within this Meta node, there is one Java Snippet for each question in the questionnaire

Page 62: KNIME Meetup 2016-04-16

Details of a Java Snippet

Page 63: KNIME Meetup 2016-04-16

Result of the Meta Node

To get a single record for a client:
- Just take the last row of a “client block”!
- Explained in the next slide

Page 64: KNIME Meetup 2016-04-16

For each “client block” aggregate the variables
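The "last row of each client block" trick relies on rows arriving grouped by client, with values carried forward so that the final row of a block holds every answer. A minimal Python sketch of that idea; the field names `client`, `q1` and `q2` are hypothetical, and the carry-forward loop stands in for what the Java Snippets and the aggregation step do in the actual workflow:

```python
from typing import Dict, List

def last_row_per_block(rows: List[Dict[str, str]], block_key: str = "client") -> List[Dict[str, str]]:
    """Carry non-empty values forward within each block and keep the last,
    fully populated row of every block."""
    result: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for row in rows:
        if current and row[block_key] != current[block_key]:
            result.append(current)  # block finished: keep its final state
            current = {}
        # Update the running record with any values present on this row
        current.update({k: v for k, v in row.items() if v != ""})
    if current:
        result.append(current)
    return result

rows = [
    {"client": "C1", "q1": "Yes", "q2": ""},
    {"client": "C1", "q1": "", "q2": "12"},
    {"client": "C2", "q1": "No", "q2": ""},
    {"client": "C2", "q1": "", "q2": "7"},
]
print(last_row_per_block(rows))
# [{'client': 'C1', 'q1': 'Yes', 'q2': '12'}, {'client': 'C2', 'q1': 'No', 'q2': '7'}]
```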

Page 65: KNIME Meetup 2016-04-16

End result

1000 spreadsheets in, 1000 rows out

Page 66: KNIME Meetup 2016-04-16

Case study 2 – Use of Flow variables

Technique
◦ Input metadata rules into a file
◦ Read them and convert them into flow variables

Example
◦ Reorder variables in a dataset as per the order in the data dictionary
◦ We use the “Flow Variables” tab of the Column Reorder node to achieve this
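Driving the column order from metadata means the order lives in the data dictionary rather than in the node's manual tab. A minimal Python sketch of the idea, assuming a hypothetical one-name-per-line dictionary file; columns missing from the dictionary are kept at the end in their original order, which is a choice made for this sketch and not necessarily what the Column Reorder node does:

```python
import csv
import io
from typing import List, Sequence, Tuple

def read_dictionary_order(text: str) -> List[str]:
    """Read column names, one per line, from a data dictionary (hypothetical format)."""
    return [row[0].strip() for row in csv.reader(io.StringIO(text)) if row]

def reorder_columns(header: Sequence[str], order: Sequence[str]) -> List[str]:
    """Follow the dictionary order; columns not in the dictionary go last."""
    known = [c for c in order if c in header]
    unknown = [c for c in header if c not in order]
    return known + unknown

def reorder_table(header: Sequence[str], rows: Sequence[Sequence[object]],
                  order: Sequence[str]) -> Tuple[List[str], List[List[object]]]:
    """Apply the new column order to the header and to every row."""
    new_header = reorder_columns(header, order)
    positions = [list(header).index(c) for c in new_header]
    return new_header, [[row[i] for i in positions] for row in rows]

dictionary_file = "abn\nentity_name\nwages\n"
header = ["wages", "notes", "abn", "entity_name"]
rows = [[1000, "x", 42, "Acme"]]
new_header, new_rows = reorder_table(header, rows, read_dictionary_order(dictionary_file))
print(new_header)  # ['abn', 'entity_name', 'wages', 'notes']
print(new_rows)    # [[42, 'Acme', 1000, 'x']]
```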

Page 67: KNIME Meetup 2016-04-16

Use of flow variables

Use this tab

Do not use this “manual” tab

Page 68: KNIME Meetup 2016-04-16

KNIME wishlist! An offset function in some nodes, e.g. Rule Engine, Math Formula

An offset function gives the value of a variable in a previous row. E.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row.

Note: within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can therefore first use the value populated from the previous row inside a formula, and then update the variable with the value from the current row so that it can be used in the next row.
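The Java Snippet workaround keeps a variable alive between rows: read it as the previous row's value, then overwrite it with the current row's value. The same pattern in Python, as a minimal sketch of an @OFFSET(var, 1)-style lag; the `offset_1` helper and the `amounts` values are hypothetical:

```python
from typing import Iterable, List, Optional

def offset_1(values: Iterable[float]) -> List[Optional[float]]:
    """Return, for each row, the value from the previous row (None for the first row)."""
    previous: Optional[float] = None  # retained across rows, like a Java Snippet field
    out: List[Optional[float]] = []
    for current in values:
        out.append(previous)  # 1. use the value kept from the previous row
        previous = current    # 2. overwrite it with this row's value for the next row
    return out

amounts = [100.0, 120.0, 90.0]
lagged = offset_1(amounts)
changes = [None if p is None else c - p for c, p in zip(amounts, lagged)]
print(lagged)   # [None, 100.0, 120.0]
print(changes)  # [None, 20.0, -30.0]
```

The order of the two steps is what matters: swapping them would return the current row's own value instead of the previous one.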

Page 69: KNIME Meetup 2016-04-16

Questions?

Page 70: KNIME Meetup 2016-04-16
Page 71: KNIME Meetup 2016-04-16

Data Transformed

Chhitesh Shrestha

Page 72: KNIME Meetup 2016-04-16

Apache Spark on KNIME

Unleash the power of Big Data on Hadoop

Page 73: KNIME Meetup 2016-04-16

The Big Data Problem: Data Volume

1. Storage is getting cheaper

2. Data sources are increasing

3. Thus, data is growing faster

YARN

But processing it is still a problem. Why?

Page 74: KNIME Meetup 2016-04-16

The Big Data Problem: Processing

Now, memory is cheaper.

Page 75: KNIME Meetup 2016-04-16

Why Apache Spark?

Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics across clustered computers.

• Speed

• Flexible programming platform

• Generality

• Runs everywhere

Page 76: KNIME Meetup 2016-04-16

Spark Components

Page 77: KNIME Meetup 2016-04-16

Spark Comparison on Calculation of Average
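Computing an average over distributed data is a common example for comparing frameworks: averaging the per-partition averages is wrong when partitions differ in size, so each partition instead produces a (sum, count) pair and the pairs are combined. A plain-Python sketch of that pattern, with lists standing in for partitions; it uses no Spark API and is not the code from the slide:

```python
from functools import reduce
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum, count)

def partial_sum_count(partition: List[float]) -> Partial:
    """Map step: summarise one partition as (sum, count)."""
    return (sum(partition), len(partition))

def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial results. The merge is associative,
    so partitions can be combined in any grouping."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(partitions: List[List[float]]) -> float:
    total, count = reduce(combine, map(partial_sum_count, partitions), (0.0, 0))
    return total / count

partitions = [[1.0, 2.0, 3.0], [10.0]]
print(distributed_mean(partitions))  # 4.0
naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)
print(naive)  # 6.0, wrong because the partitions have different sizes
```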

Page 78: KNIME Meetup 2016-04-16

List of Spark Nodes

Page 79: KNIME Meetup 2016-04-16

Getting the data in and out of Spark

Data into Spark

Data out of Spark

Page 80: KNIME Meetup 2016-04-16

Statistics and Data Manipulation Nodes

Statistics

Data Manipulation

Page 81: KNIME Meetup 2016-04-16

Mining Nodes

Learners

Predictors

Page 82: KNIME Meetup 2016-04-16

Other Nodes

Page 83: KNIME Meetup 2016-04-16

KNIME Spark Executor Architecture

Page 84: KNIME Meetup 2016-04-16

Currently Supported Hadoop and KNIME Versions

Hadoop Versions

• Hortonworks HDP 2.2 with Spark 1.2.x

• Hortonworks HDP 2.3 with Spark 1.3.x

• Cloudera CDH 5.3 with Spark 1.2.x

• Cloudera CDH 5.4 with Spark 1.3.x

KNIME Versions

• KNIME Analytics Platform 3.1

• KNIME Server 4.2

Page 86: KNIME Meetup 2016-04-16

Data Transformed

YOUR DATA. CLEARLY.

[email protected]

02 9956 3781

Page 87: KNIME Meetup 2016-04-16

Actian Vortex on Hadoop 10-Minute Demo

http://videos.actian.com/watch/6iEZqvJrEKL2btoqIDImcg

Demonstration of Vortex, Dataflow & Vector

Comparison between Actian Vortex & Cloudera Impala
