NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation

Data: Spark and the Data Federation

Leif Pedersen

Executive IT Specialist, z Analytics, Europe

Email: Leif.Pedersen@dk.ibm.com

Systems of Insight, Systems of Record, Systems of Engagement

Look like a “déjà vu”?

In the new insight economy, winners infuse analytics everywhere to drive better outcomes!

• Create new business models (CEO)
• Attract, grow, retain customers (CMO)
• Transform financial & management processes (CFO)
• Manage risk (CRO)
• Prioritize IT investment for innovation (CIO, CDO)
• Optimize operations
• Fight fraud and counter threats


All Data, New Dev Styles, New Analytics, More People

Business Value

• Embrace all data
• Run at the speed of business
• Enable all analytics

IBM Analytics Point of View - Make DATA SIMPLE and ACCESSIBLE to ALL

DATA Professionals are leading THE Transformation!

The Evolution in the Approach to Getting Value from Data

Stages: Operations → Data Warehousing → Self-service Analytics → New Business Imperatives (maturity axis: High)

Data-Informed Decision Making

• Full dataset analysis (no more sampling)

• Extract value from non-relational data

• 360° view of all enterprise data

• Exploratory analysis and discovery

Warehouse Modernization

• Data lake

• Data offload

• ETL offload

• Queryable archive and staging

Lower the Cost of Storage

Ensure resiliency and availability

Business Transformation

• Create new business models

• Risk-aware decision making

• Fight fraud and counter threats

• Optimize operations

• Attract, grow, retain customers

We are here

Analytics evolution to support all Analytics Apps on all Data – The Mainframe Use Case

Diagram labels:
• Applications, Data
• The Data Lake Evolution: HDFS, Map / Reduce, Spark, Other Data
• Historical data in DB2 for z/OS & IBM DB2 Analytics Accelerator
• BI Reporting, Data Warehouse / Data Marts
• Operational Data stored in VSAM, IMS, DB2
• SoR Core Business supported by CICS, IMS, WAS (z/OS)
• The Predictive Analytics Evolution: Machine Learning, Rules, Score creation, Score execution
• IT Operational Data

z Systems Analytics Areas complement existing Analytics Environments.

• In-transaction rules and score execution
• Intraday capability for ad-hoc queries & predictive analytics
• Availability of historical data (in raw format)
• Accelerated reporting to fulfill internal and regulatory requirements
• Ability to transform data before offload to DWH or reporting
• Ability to create new models at any time
• Quasi-real-time availability of data for analytics
• Instant access to raw data for new report generation in hours instead of days
• Load and merge of ANY non-DB2 z/OS data
• Explore data to uncover hidden insights

Diagram labels: zData, zApps, Scoring, Scoring Rules

The integration of transactions and analytics is an emerging and important market segment

“Hybrid transaction/analytical processing will empower application leaders to innovate via greater situation awareness and improved business agility.”

Gartner Research Note G00259033, 28 January 2014: Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation

• Opportunity to rethink business processes: analytics as an integral part of the process itself, rather than a separate activity performed after the fact
  o Transform business processes, not just provide existing styles of analytics faster and without latency
• Enable business leaders to perform, in the context of operational processes, advanced and sophisticated real-time analysis of their business data

Analytics as part of the flow of business

Insights on every transaction

Hybrid Transaction/Analytical Processing (HTAP) - with DB2 Analytics Accelerator

DB2 for z/OS CPU savings target
• Operational (in-transaction) analytics
• (Complex) OLTP

Accelerator focus
• Ad-hoc queries
• Complex queries scanning large amounts of data
• ETL acceleration/virtual transformation

Diagram labels: OLTP Transactions, High concurrency, Standard reports, Complex queries (more history), Hybrid Transactional & Analytical Processing

Data Warehouse and Data Lake

A Data Lake is…

+ An analytics sandbox for exploring data to gain insight

+ An enterprise-wide catalog to find data across the enterprise and to link from business term to technical metadata

+ An environment for enabling reuse of data transformations and queries

+ An environment where users can access vast amounts of raw data

+ An environment for developing and proving an analytics model and then moving it into production; experience in production may drive further experimentation in the data lake

A Data Lake is not…

- A data warehouse or data mart of all of the data in an enterprise

- A high-performance production environment

- A production reporting application

- A purpose-built system to solve a specific problem

Spark, a Transaction Manager for Analytics Applications

Spark is NOT a datastore, NOT a replacement for Hadoop!

• Fast Runtime Environment
  – Interactive or batch processing
  – Based on in-memory data processing
    • High performance for multi-step processes where Spark can pass the data directly without using disk storage
  – Parallel processing
• Interface to Data
  – Accessing Hadoop-based HDFS data, Cassandra, HBase, …
  – Accessing any traditional database using JDBC
• Interface for Applications
  – Easy-to-use APIs supported by modern languages
  – Stack of libraries including SQL, Machine Learning, GraphX, and Spark Streaming
  – Over 80 high-level operators that make it easy to build parallel applications
  – Many languages supported including Java, Scala, Python and R
    • Spark is actually written in Scala
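The JDBC access path listed above can be made concrete with a minimal PySpark-oriented sketch. It assumes a DB2 for z/OS subsystem reachable through the IBM JDBC driver; the host, port, location name, table, and credentials are placeholders, and the helper names `db2_jdbc_options` and `load_table` are hypothetical. Only `load_table` needs a live SparkSession, so the option-building part can be checked on its own.

```python
from typing import Dict


def db2_jdbc_options(host: str, port: int, location: str, table: str,
                     user: str, password: str) -> Dict[str, str]:
    """Build the option map that Spark's generic JDBC data source expects.

    Every argument is a placeholder supplied by the caller; none of these
    values come from the presentation.
    """
    return {
        # IBM JDBC driver (type 4) URL form: jdbc:db2://host:port/database
        "url": f"jdbc:db2://{host}:{port}/{location}",
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_table(spark, options: Dict[str, str]):
    """Return a DataFrame backed by the remote table.

    Requires a SparkSession and the JDBC driver jar on Spark's classpath.
    """
    return spark.read.format("jdbc").options(**options).load()


# Illustrative values only.
opts = db2_jdbc_options("zhost.example.com", 446, "DSNLOC1",
                        "BANK.TRANSACTIONS", "analyst", "secret")
print(opts["url"])
```

With a session available, `load_table(spark, opts)` would return a DataFrame that can be registered with `createOrReplaceTempView` and queried through Spark SQL, which is what makes Spark an interface to the data rather than a store for it.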

Why Spark matters to a business?

1. Spark makes it easier to access and work with all data
- Enables new data-based use cases
- All data: Internal/External, Structured/Unstructured
- Real-time insights, from all data sources
- Automates analytics with Machine Learning
- Clients that lead in data, lead their industry

2. Spark lets you develop line-of-business applications faster

3. Spark learns from data and delivers in real time

With Hadoop, you ask a question and get back a batch of data. With Spark, you may say, “continue to give me answers to this question”…and when new data comes, the user is smarter.

Diagram labels: Design, Develop, Science

Spark can run on z/OS close to z/OS-based Applications & Data

Diagram labels:
• z/OS: Key Business Transaction & Batch Systems
• Spark Applications: IBM and Partners
• Apache Spark Core: Stream, SQL, MLlib, GraphX
• Optimized data access
• IBM z/OS Platform for Apache Spark
• z/OS data: DB2 z/OS, IMS, Adabas and *many* more including SMF, OPERLOG, SYSLOGs, . . .
• Distributed data: Teradata and *many* more . . .

Values:
• Data-in-place analytics, without need to ETL or move data for analytic purposes
• Optimized access and z/OS-governed ‘in-memory’ capabilities for core business data
• Unique capability to access almost all z/OS sources with Apache Spark SQL & many non-z data sources
• Almost all zIIP eligible
• Integration of analytics across core systems, social data, website information, etc.

Examples of Spark Use Cases

• Client Insight Analytics over transactions & customer interactions
  – Leverage data on z/OS (DB2, VSAM) & distributed (Oracle, SQL Server, HDFS) to enable real-time access from data science teams focused on client insight to develop patterns, models
• Data Distillation - Hybrid Architecture
  – Run Spark z/OS to access, aggregate, filter and *distill* large volumes of data
  – Make available smaller, aggregated analytic results for access by: customer insight solutions, data science environments
• 360 Degree View: Customers, Payments, Transactions
  – Leverage Spark z/OS to get a real-time or near-real-time view of the current status of payments, transactions, customers, combining data from OLTP, distributed sources, & streaming
• IT Analytics
  – Analyze real-time streamed SMF data, combined with archived SMF data and syslog data; visualize and interact with a data science Jupyter Notebook to find patterns

Use Case #1: Hybrid Data Science

Use Case Patterns

Distill the Data:
• Use Spark z/OS for data blending, cleansing, transformation, etc. with data in place
• Store results in ‘Tidy’ Data Repository
• Refresh as needed

Explore the results:
• Data exploration, investigation leveraging ‘Tidy’ Repository

Values:
• Leverage most current business data for data science
• Efficiencies in reducing ETL
• Leverage common analytics ecosystem skills
• Integrate Spark on multiple platforms for optimal analytics infrastructure
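The distill step described above amounts to cleansing and aggregating detailed records into a much smaller result set. A minimal sketch in plain Python follows, standing in for what a Spark job would do at scale; the record layout (`customer`, `amount`, `status`) and the function name `distill` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def distill(transactions: Iterable[dict]) -> List[Dict[str, object]]:
    """Cleanse and aggregate raw transactions into one row per customer."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for tx in transactions:
        # Cleansing: skip records that are incomplete or not settled.
        if tx.get("amount") is None or tx.get("status") != "SETTLED":
            continue
        row = totals[tx["customer"]]
        row["count"] += 1
        row["total"] += float(tx["amount"])
    # The 'tidy' result: small, sorted, one observation per row.
    return [
        {"customer": c, "count": v["count"], "total": round(v["total"], 2)}
        for c, v in sorted(totals.items())
    ]


# Hypothetical sample data.
raw = [
    {"customer": "C1", "amount": 20.0, "status": "SETTLED"},
    {"customer": "C2", "amount": 5.5, "status": "SETTLED"},
    {"customer": "C1", "amount": 30.0, "status": "SETTLED"},
    {"customer": "C2", "amount": None, "status": "SETTLED"},
    {"customer": "C3", "amount": 9.0, "status": "REJECTED"},
]
tidy = distill(raw)
print(tidy)
```

Only the small `tidy` result would be stored in the repository and refreshed as needed; the detailed records stay where they are.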

Use Case #2: Optimized Customer Insight

Customer Insight for Banking Solution

Diagram labels:
• Source data: Customer, Transaction, Merchant, Call Center
• Apache Spark Core: Stream, SQL, MLlib, GraphX
• Optimized data layer
• IBM z/OS Platform for Apache Spark
• Spark Analytic Result Set: subset of data, distilled, filtered, transformed (Tidy Data)
• Transform (if needed), & populate BBCI staging area / Input & Output
• Components: BI Dashboard, Data Cube, Analytical Engines, Web Portal, Analytics, Pre-Built Dashboards, Pre-Built Data Models, Pre-Built Analytical Models

Values:
• Avoid costly and ineffective wholesale copy of data
• Frequent refresh of most relevant data elements to customer insights solution
• Faster time to implementation for business solution to deliver insights on churn, cross-sell, etc.

Use Case #3: Real-Time Application Event Analytics

• CICS Event triggers create an event stream that would be captured by Spark running on its own z/OS LPAR
  – Spark configured for high availability to avoid impacting CICS
• Real-Time Analytics with Spark z/OS:
  – Real-time analytics to provide feedback into the Systems of Engagement or Monitoring Systems on types of banking services and frequency of consumption
  – Real-time monitoring of core business processes and applications
• Historical Analysis leverages IDAA:
  – Batch load of events for historical, trending and reporting

Diagram labels: Channel, System of Engagement, CICS Transactions, Monitor, Event Stream, Logstream, Spark z/OS (Real-Time Analytics, can include scoring), Real-Time Consumption, Batch Load Overnight, Loader, DB2 z/OS, IBM DB2 Analytics Accelerator (Historical Analysis, Reporting)
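The feedback on types of banking services and frequency of consumption mentioned in this use case can be pictured as a rolling count per service type over the event stream. The sketch below is plain Python rather than Spark Streaming; the event shape `(timestamp_seconds, service)` and the class name `RollingServiceCounter` are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Tuple


class RollingServiceCounter:
    """Count events per service type over the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()
        self._counts: Counter = Counter()

    def add(self, timestamp: float, service: str) -> None:
        """Record one event, then expire anything older than the window."""
        self._events.append((timestamp, service))
        self._counts[service] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_service = self._events.popleft()
            self._counts[old_service] -= 1
            if self._counts[old_service] == 0:
                del self._counts[old_service]

    def snapshot(self) -> dict:
        """Current per-service frequencies inside the window."""
        return dict(self._counts)


# Hypothetical event stream: (seconds, service type).
counter = RollingServiceCounter(window_seconds=60)
for ts, svc in [(0, "payment"), (10, "balance"), (30, "payment"), (65, "transfer")]:
    counter.add(ts, svc)
print(counter.snapshot())
```

A snapshot of this kind is what would be fed back to the monitoring side in real time, while the full event history goes to the Accelerator in the overnight batch load.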

Use Case #4: Surface Spark Results to JDBC / ODBC Applications

DF Store:
• Persist specific Spark Result
• Backed by VSAM
• Leverage z/OS SAF, Dataset mgmt

Diagram labels:
• JDBC / ODBC / REST, noSQL
• Client accessing Spark RDDs, example: Cognos, Tableau, …
• Apache Spark Core: SQL, MLlib
• Optimized Data Layer
• DB2 z/OS, IMS, VSAM

Use Case #5: Analyzing SMF Data with Spark

• Spark application is agnostic to data source and number of sources
• MDSS required on at least one system, MDSS agents required on all systems. No IPL required for installation
• Logstream recording mode required for real-time interfaces

Diagram labels: Spark Application using SparkSQL; MDSS Client; Optimized Data Integration Layer (MDSS); SMF sources: Realtime, Logstream, Dump Data Sets

• Analyze real-time in-memory SMF data, combined with archived data
• Analyze data across multiple LPARs
• Augment with SYSLOG and other sources for richer analytic outcome
• Efficiencies in avoiding data movement

Use Cases for Real Time SMF Analytics

• Detect excessive memory consumption – SMF 30
  Monitor the high-water mark for real memory usage for jobs and send alerts if usage exceeds normal consumption

• Detect security violations in real time – SMF 80
  Monitor the volume of datasets/files accessed per user within a given time period and raise alerts for above-normal access rates

• Real-time monitoring of resource usage in cloud environments (CPU, Memory, Disk)
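Both alerting checks above reduce to comparing an observed value with a notion of "normal". The sketch below shows one way to express them in plain Python; the record shapes, the baseline values, and the 1.5x tolerance are illustrative assumptions, not SMF field names or IBM-recommended thresholds.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def memory_alerts(samples: Iterable[Tuple[str, int]],
                  baseline_mb: Dict[str, int],
                  tolerance: float = 1.5) -> List[str]:
    """Return jobs whose real-memory high-water mark exceeds baseline * tolerance.

    `samples` holds (job_name, memory_mb) observations; jobs without a
    baseline are ignored.
    """
    high_water: Dict[str, int] = {}
    for job, mb in samples:
        high_water[job] = max(mb, high_water.get(job, 0))
    return sorted(
        job for job, hwm in high_water.items()
        if job in baseline_mb and hwm > baseline_mb[job] * tolerance
    )


def access_alerts(accesses: Iterable[Tuple[str, str]],
                  normal_per_period: int) -> List[str]:
    """Return users who touched more distinct data sets than normal.

    `accesses` holds (user, dataset) pairs already restricted to one period;
    counting distinct data sets is a choice made for this sketch.
    """
    distinct = {(user, dataset) for user, dataset in accesses}
    per_user = Counter(user for user, _ in distinct)
    return sorted(u for u, n in per_user.items() if n > normal_per_period)


# Hypothetical observations.
mem = memory_alerts(
    [("PAYROLL", 400), ("PAYROLL", 900), ("BILLING", 210)],
    baseline_mb={"PAYROLL": 500, "BILLING": 200},
)
sec = access_alerts(
    [("alice", "A.DATA"), ("alice", "B.DATA"), ("alice", "C.DATA"),
     ("bob", "A.DATA"), ("bob", "A.DATA")],
    normal_per_period=2,
)
print(mem, sec)
```

In a Spark setting the same logic would typically be written as a grouped aggregation over the SMF records followed by a filter against the baseline.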

A list of supported SMF record types can be found in the Redbook “Apache Spark Implementation on IBM z/OS” - page 78

http://www.redbooks.ibm.com/abstracts/sg248325.html

Leveraging IBM Z for Optimized Analytics

Federate analytics leveraging data in place for more current insights at scale, optimized security, privacy and reduced costs

Diagram labels:
• Business Applications; Customer, Transaction, Merchant; Distributed
• IBM Open Data Analytics for z/OS: Apache Spark, Python, Optimized Data Integration Layer
• Distilled Insight, Analytic Result, Acceleration
• IBM Machine Learning for z/OS: Data, Data Prep, ML Algo, Model, Deploy, Predict; Govern, Manage, Algorithm Assist…; Monitor, Feedback
• Platform features: Pauseless GC, New SIMD instructions, 32 TB Memory, Pervasive Encryption

IBM Open Data Analytics for z/OS: Offering Overview

What is in the Offering?

IBM Open Data Analytics for z/OS (IBM product):
• Apache Spark 2.1.1 enabled for z/OS
• Python 3.6.1
• All pre-requisite libraries
• Select Anaconda libraries (approx. 250 including pandas, dask, numpy, scikit-learn, matplotlib…)
• Optimized Data Integration Layer: optimized for Spark & Python db access to z/OS data
• Integration with WLM z/OS for resource management aligned with job priority
• Integration with security (SAF) interfaces
• Support & Service available from IBM for a fee
  – Very aggressive pricing for zIIPs (cores) and memory for Open Data Analytics z/OS workload

Ecosystem
– GitHub zos-spark repository
  • Jupyter Notebooks (Scala, Python Workbenches)
  • Kernel gateway, Jupyter client, kernel toree
  • Sample data & code snippets
– Rocket:
  • Collaboration for Optimized Data Layer
  • Industry vertical mappings, e.g. ISO8583-1, ACH, SMF, etc.
– Continuum:
  • Access to z/OS channel on Anaconda cloud for updates / refreshes & package management
  • Option to license private mirrored environment
  • Services & Consulting for Python

Value: Increase Integration through Persisting Analytic Results for Enterprise Collaboration

DF Store:
• Specific Spark & Python Result
• Backed by
• Leverage z/OS SAF, Dataset mgmt

Diagram labels:
• Apache Spark Core: Stream, MLlib, GraphX, Spark
• Python 3.6.1 Core Packages: numpy, scikit-learn, dask, pandas, Matplotlib, etc.
• Optimized Data Layer
• IMS, DB2 z/OS
• JDBC / ODBC / REST, noSQL
• Client accessing Spark RDDs, example: Cognos, Tableau, …
