Luncheon Webinar Series, November 18th, 2013
What’s new in IS 9.1.2 – Presented by Tony Curcio
Sponsored By:
What’s new in IS 9.1.2 Presentation
Questions and suggestions regarding presentation topics? - send to
Downloading the presentation
• http://www.dsxchange.net/2013IOD.html
• A replay will be available within one day; details will follow by email
Pricing and configuration - send to [email protected] Subject line : Pricing
For those who stay through the entire presentation, we have an extra giveaway!
Bonus Offer: Free premium membership for your DataStage management! Submit
your manager’s email address and we will offer them access on your behalf.
• Email [email protected] with the subject line “Managers special”.
• Join us on LinkedIn: http://tinyurl.com/DSXmembers
Delivering business value via an integrated platform
Agile Integration: Wherever your integration resides, integrate it quickly and flexibly
Business Driven Governance: Make decisions with confidence using trusted data at the point of impact
Sustainable Quality: Ensure information accuracy and quickly adapt to strategic business changes
Analyst Surveys in the past 12 months
• Critical Capabilities: Data Delivery Styles for Data Integration Tools, December 2012
http://www.gartner.com/technology/reprints.do?id=1-1DASS01&ct=121218&st=sb
• Magic Quadrant for Data Integration Tools, July 2013
http://www.gartner.com/technology/reprints.do?id=1-1HYC2VJ&ct=130731&st=sb
Prior Highlights
Information Server Recent Activity
[Timeline figure, 2010 to 2014: releases 8.5, 8.7, 9.1 and 9.1.2, with fix packs FP1, FP2 and FP3 between them]
Data Integration Acceleration
- Advanced transformation features (looping/v.pivot)
- zOS File Stage
- Integrated Balanced Optimizer capabilities
Robust Enterprise Support
- New Suite Installer
- Active/Passive High Availability support
- Source Code Control Integration
Simple Data Quality
- Standardization Quality Assessment
- Match Specification Report
- Match Designer Updates
Stronger Governance
- Operations Console
- Business Glossary Workflow
- Blueprint Task Management
- Metadata Asset Manager
Product Integration
- Leverage Data Validation Rules in DataStage Jobs
- Advanced Data Replication integration
- Next Generation Netezza Connectivity & Optimization
- HDFS Integration
Advanced Admin & Productivity
- Parallel Debugger
- New Backup/restore tooling
- Maintenance Mode
- Stronger Encryption
Agile integration
- InfoSphere Data Click
- Enhanced Workload Mgmt
- ODM Integration
- Hadoop Balanced Optimization
- HDFS Extensions
- InfoSphere Streams Integration
Business Driven Governance
- Policy and rules support for information governance
- Web-based blueprints
- Integrated metadata mgmt enhancements
Sustainable Quality
- Data Quality Console
- Standardization Rules Designer
- Data Rules Advancements
Business Driven Governance
- IDA 8.5
- Additional Workflow Roles
- Data Rules Metadata
- Bulk metadata import
Sustainable Quality
- Profiling Big Data
- Exception Stage
- New QS standardization rulesets (Thailand, Ireland, India update)
Anywhere Integration
- Big Data Features
  * JSON support
  * JDBC connector
- DB2 on z/OS load optimization
- Data Click new data sources/targets
Operations Console: Browser-based project/job monitoring to view and analyze the run-time environment
Workload Management: Provides prioritization of mission-critical tasks or specific project or job workloads
InfoSphere Data Click: Self-service data integration on demand through a simple web-based interface for any user
Information Server & Hadoop: Leverage the same designer UI to read/write to Hadoop and automatically generate MapReduce
[Release badges on this slide: 8.7, 9.1, 9.1, 8.7]
Operations Console
• Provides quick answers for the operators, developers and other stakeholders as they view and analyze run-time environment
• Dashboard style graphs representing current job activity, success/failed jobs status, system health and resource consumption
• Provides control over job state (allows authorized users to run, stop and reset jobs)
• Visual cues alert users to potential issues
• Server wide and project specific views
• High level summary and detailed information for both current and historical activity
• Prebuilt Cognos reports explore operational metrics
Workload Management
• Allows the proactive mgmt of system resources where multiple teams share a common hardware infrastructure.
• Optimize hardware utilization for better performance in a busy system
• Provide prioritization of mission critical tasks or specific project or job workloads
• Throttle job activity when system resource usage exceeds admin-specified thresholds
• Assign the priority of any submitted job. Configurable across various Projects as well as at the job level.
• Manual overrides by privileged user to promote specific jobs to the top of the queue
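The queueing behaviour in the bullets above (job priorities, an admin threshold, and a manual promotion to the top of the queue) can be pictured with a small sketch. This is not how Workload Management is implemented; the `JobQueue` class, its method names, the job names and the priority numbers are all hypothetical, and a simple "maximum running jobs" count stands in for the real resource thresholds.

```python
import heapq
import itertools


class JobQueue:
    """Toy priority queue: a lower number runs sooner (hypothetical model)."""

    def __init__(self, max_running):
        self.max_running = max_running   # stand-in for an admin-specified threshold
        self.running = []                # names of jobs currently running
        self._heap = []                  # entries are (priority, sequence, job name)
        self._seq = itertools.count()    # tie-breaker that preserves submit order

    def submit(self, job, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def promote(self, job):
        """Manual override: move a queued job to the top of the queue."""
        remaining = [entry for entry in self._heap if entry[2] != job]
        if len(remaining) == len(self._heap):
            raise ValueError(f"{job} is not queued")
        top = min((p for p, _, _ in remaining), default=0) - 1
        remaining.append((top, next(self._seq), job))
        heapq.heapify(remaining)
        self._heap = remaining

    def dispatch(self):
        """Start queued jobs until the running-job threshold is reached."""
        started = []
        while self._heap and len(self.running) < self.max_running:
            _, _, job = heapq.heappop(self._heap)
            self.running.append(job)
            started.append(job)
        return started

    def finish(self, job):
        self.running.remove(job)


queue = JobQueue(max_running=2)
queue.submit("nightly_load", 5)
queue.submit("adhoc_report", 9)
queue.submit("billing", 1)
first_batch = queue.dispatch()       # the two highest-priority jobs start
queue.submit("audit", 7)
queue.promote("adhoc_report")        # privileged user jumps the queue
queue.finish("billing")
second_batch = queue.dispatch()      # one slot free, promoted job takes it
```

The throttle here is only a cap on concurrent jobs; a closer model would check CPU or memory readings before each dispatch.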
[Chart: Runtime, comparing "Original" and "With WLM"; bar values 16.1 and 54.75 on a 0 to 60 axis]
InfoSphere Data Click self-service data integration on-demand
• Feature of Information Server that provides a simple web-based interface for any user
• Move data in batch or real-time in just two simple clicks
• User Interface presents policy choices which are then automated without any coding
• Optimized runtime with InfoSphere DataStage and InfoSphere Data Replication
• Automatically captures metadata for built-in governance
Information Server and Hadoop
Read/Write into HDFS
• Uses same paradigm as sequential files - just add your Hadoop server name and port number
• Parallelization techniques for massive scale
• Recent tests with PureData for Hadoop clocked 15+ TB/hr
Balanced Optimization for Hadoop
• Leverage the same UI and stages to build MapReduce.
• Drag and drop stages to the canvas, rather than having to learn MapReduce programming.
• Push the processing to Hadoop for patterns when you don’t want to transport the data on the network.
Connector Accelerators
• New connectors available for download on developerWorks
• Plugs into InfoSphere DataStage and QualityStage and operates just like any other stage.
• Includes features to exploit specific data sources
  o Mongo
  o Cassandra
  o HBase
  o Avro
  o Hive
  o JMS
  o Lotus Notes
  o WebSphere Transformation Extender
  o And more …
https://www.ibm.com/developerworks/community/files/app?lang=en#/folder/4645e12a-7bdb-40ed-a103-f1160b707758
Standardization Rules Designer: Web-based data classification rule design through an intuitive drag-and-drop paradigm
Data Quality Console: Provides data stewards the ability to assess and monitor key data quality performance stats
Enhanced Data Rules: Common data validation rules for analyst and developer tools, with lineage to support governance
Match Acceleration: Easy-to-use wizard for matching basic US name and address data for individuals and businesses
[Release badges on this slide: 9.1, 9.1, 9.1, 8.7]
Sections: Overview, Data Governance, Data Quality, Data Integration, Infrastructure
General Notes
• 9.1.2 was released to Fix Central on August 29, 2013
• Image in Passport Advantage updated the following week
• Information Server 9.1.2 becomes the primary maintenance branch for the 9.1 version (any subsequent fix pack for 9.1 will build on this branch)
• This is a true maintenance branch, including about 200 fixes from across the suite
• Fix list can be reviewed here: http://www-01.ibm.com/support/docview.wss?uid=swg21640382
• A few things we wanted to include in 9.1.2 shipped later in Q4, and will install as a patch on top of 9.1.2
What’s New in Information Server v9.1.2
Support for InfoSphere Data Architect 8.5
• Builds on the new metabroker introduced at 9.1 for InfoSphere Data Architect (IDA), which:
  - introduced better performance at lower resource cost
  - removed the Windows-only dependency
• Certification of IDA v8.5 added
• Tolerance for orphaned and invalid objects (ability to ignore those that don’t impact the rest of the model)
• Improved error/warning logging
Business Glossary Enhancements
Data Rules Metadata
• Display data rule asset types (including unpublished rules) from InfoSphere Information Analyzer in the Browse All Assets page; these can be searched and assigned to terms, governance rules, business labels and data stewards
• Drill down from a Governance Rule to a Data Rule to the database column to which it is applied
Business Glossary Enhancements (continued)
Workflow Roles
• Development Log now captures every history event including creation and reviewer comments
• Security roles have been changed to provide a higher degree of granularity for existing roles: Author, Published and Reader
• Two new workflow roles:
  • Reviewer: can review changes and make comments
  • Approver: can approve changes to a new or existing term (but no edit abilities themselves)
• Can now add comments at every step of the workflow process.
Export Development Glossary
• Can now export either development or published glossary
Metadata Workbench Enhancements
JDBC Connector support
• Display details for JDBC Connector stage, including URL Definition, Schema, Table and SQL statements
• Includes ability to stitch JDBC into lineage flows
XML/JSON Support
• Browse, query and detail display for XML/JSON
• Displays column level information within asset page
• Can be linked via manual binding for lineage
Executive Governance Dashboard
Business-Driven & Measured Governance
Innovation
• Measurements for policies and KPIs
• Rapid creation of tailored dashboards
• Leverages SQL Views across data quality metadata and profiling results
Value
• Immediate insight into governance policy status
• Interception of issues when they start, right at the source
Usage
• Raises data confidence with visual governance status
1000s of data points and policies visualized
SQL Views
• Information Governance Dashboard includes a set of SQL views that provide access to Information Server metadata
• Three sets of views:
  • Common Metadata [CMVIEWS]: subset of common metadata object types that represent database metadata that is usually imported by bridges or connectors (hosts, databases, schemas, tables, views, etc.)
  • Data Quality [IAVIEWS]: object types from the data quality domain from IA, including projects, data rules, data rule sets, data rule definitions, and data rule set definitions.
  • Information Governance [IGVIEWS]: object types from InfoSphere Business Glossary, including categories, terms, labels, IG policies and IG rules. The schema also includes views that link information governance objects to the data rules that implement them and to other assets.
• HTML documentation: http://www.ibm.com/support/docview.wss?uid=swg27039651
Profiling Big Data in Hadoop
• Information Analyzer has been extended to support assessing data quality of Hive sources via the business analyst interface.
• Includes the following data quality capabilities for Hive
• Table and column level profiling for detailed inspection of values, including frequency distribution, cardinality, completeness, etc…
• Advanced analysis and monitoring provides source system profiling and analysis capabilities to help you classify and assess your data.
• Integrated rules analysis uses data quality rules for greater validation, trending and pattern analysis (same rules that are being applied via the Data Rules stage in DataStage and QualityStage)
New Standardization Rules
• Country specific rule sets for India, Ireland and Thailand
• Provide for data standardization of names (individual and organizational), addresses, phone and locality (varies per country)
• All rule sets can be used with the Investigate and Standardize stages
• Delivered as archive files in the QSRules folder of the install directory
• Client = ./InformationServer/Clients/Classic/QSRules
• Server = ./InformationServer/Server/PXEngine/QSRules
Exception Stage
• Collect exception data from any process, allowing any data integration or data quality process to capture exceptions and monitor them over time
• Promote consistency in the way data stewards and business analysts can investigate data issues.
• Insert good data quality controls and governance practices into each project.
• Support clerical review for one-source and two-source variants, so business analysts can review and tune match algorithms
• Data steward dashboarding provides the same charting, searching, reporting, and monitoring as data rules, to facilitate integrated data remediation processes
InfoSphere Data Click self-service data integration on-demand
Overview
• Business users need quick and easy access to information to support their analytical projects.
• Organizations need to avoid data sprawl, so governance best practices must be ensured
New in this release
• Universal connectivity via ODBC now supports DB2, Netezza, Oracle, Teradata, Sybase, SQL Server, Greenplum, and others as source or target
• Automatic filtering of columns with data types not supported by the target data store
• Leverages connector framework enhancement for data sampling via “row limits”
• http://www.youtube.com/watch?v=hUGGudh2iWI&feature=youtu.be
JSON Document Support
• Derive metadata format automatically from sample JSON documents…
• Supports hierarchical formats with simple fields, objects and arrays
• Schema views
• New Parsing and Composing steps provide for complex hierarchical data in JSON syntax; with value and structure validation options
• Multiple options for reading/writing data
- files directly from disk
- as part of a long string
- passed in/out as a LOB
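To make "derive metadata format automatically from sample JSON documents" concrete, here is a minimal sketch of the general idea in Python. It is only an illustration of inferring a hierarchical structure (simple fields, objects and arrays) from one sample; the `derive_schema` function and its output layout are hypothetical and do not represent the product's schema format or its Parsing and Composing steps.

```python
import json


def derive_schema(value):
    """Return a nested description of the types found in a parsed JSON value."""
    if isinstance(value, dict):
        return {"type": "object",
                "fields": {key: derive_schema(item) for key, item in value.items()}}
    if isinstance(value, list):
        # Describe the array by its first element; an empty array stays untyped.
        item = derive_schema(value[0]) if value else {"type": "unknown"}
        return {"type": "array", "items": item}
    if isinstance(value, bool):      # checked before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


# Hypothetical sample document with a simple field, an array and a nested object.
sample = '{"id": 7, "name": "Ann", "tags": ["a", "b"], "address": {"zip": "10001"}}'
schema = derive_schema(json.loads(sample))
```

A single sample cannot reveal optional fields or mixed-type arrays, which is one reason a real tool would offer value and structure validation options on top of the derived format.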
JDBC Connector
• JDBC Connector provides Information Server products with access to JDBC data sources
• Supports data read and write operations and metadata import operations
• Certified in this release with Apache Derby and IBM BigInsights Big SQL drivers
• Simple setup – introduces isjdbc.config file to track JDBC drivers to be used
• Create in the DSEngine subdirectory
• Include CLASSPATH=semicolon separated driver classpaths
• CLASS_NAMES=semicolon separated driver class names (needed only for JDBC 3.0 drivers and older)
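As a rough illustration of the two entries described above, the following Python sketch renders the text of an isjdbc.config file from a list of driver jars and, optionally, driver class names. The `render_isjdbc_config` helper and the jar paths are hypothetical; only the `CLASSPATH` and `CLASS_NAMES` keys and the semicolon separator come from the slide, and the exact directory the file belongs in should be taken from the product documentation.

```python
def render_isjdbc_config(classpaths, class_names=()):
    """Build isjdbc.config text: semicolon-separated CLASSPATH and CLASS_NAMES."""
    lines = ["CLASSPATH=" + ";".join(classpaths)]
    if class_names:
        # Per the slide, only needed for JDBC 3.0 drivers and older.
        lines.append("CLASS_NAMES=" + ";".join(class_names))
    return "\n".join(lines) + "\n"


config_text = render_isjdbc_config(
    ["/opt/drivers/derbyclient.jar", "/opt/drivers/bigsql.jar"],  # hypothetical paths
    ["org.apache.derby.jdbc.ClientDriver"],
)
print(config_text)
```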
JDBC Connector (continued)
Metadata support
• Managed metadata import provided through new capabilities in InfoSphere Metadata Asset Manager (IMAM)
• Filtering by asset type and name patterns
• Express or managed import providing flexibility and rigor
• Staged assets can be analyzed, previewed and then persisted to the repository
JDBC Connector (continued)
Properties
• URL string is in a driver specific format and always starts with the jdbc: prefix
• Apache Derby driver: jdbc:derby://server1:1527/testdb
• Big SQL driver: jdbc:bigsql://server2:7052/myschema
• User name and Password for authentication/authorization, when supported by the driver and the back end data source
• Attributes are for driver-specific special connection properties, for example SSL configuration. It is a multi-line property with each line specified in property=value format
• Connection definitions can be tested from the connector stage editor (Test button) and they can be persisted to the metadata repository (Save button) for reuse in other JDBC Connector stages (Load button)
• Supports Unicode
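The URL and Attributes properties above follow simple textual conventions, which the sketch below checks in Python. It assumes nothing about the connector's internals: `jdbc_subprotocol` and `parse_attributes` are hypothetical helpers, and the attribute names in the example are made up rather than taken from any driver.

```python
def jdbc_subprotocol(url):
    """Return the driver-specific name that follows the jdbc: prefix."""
    if not url.startswith("jdbc:"):
        raise ValueError("a JDBC URL must start with 'jdbc:'")
    return url[len("jdbc:"):].split(":", 1)[0]


def parse_attributes(text):
    """Parse a multi-line block of property=value lines into a dict."""
    attributes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        name, separator, value = line.partition("=")
        if not separator or not name.strip():
            raise ValueError(f"expected property=value, got: {line!r}")
        attributes[name.strip()] = value.strip()
    return attributes


derby = jdbc_subprotocol("jdbc:derby://server1:1527/testdb")
bigsql = jdbc_subprotocol("jdbc:bigsql://server2:7052/myschema")
# Hypothetical attribute names, for illustration only.
attributes = parse_attributes("ssl=true\ntrustStore=/etc/keys/store.jks\n")
```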
JDBC Connector (continued)
Read behavior supports…
• auto-generated & user-defined SELECT statements including quoted identifiers
• schema validation to assist with the job design
• row limits, fetch size configuration, etc…
• reading SELECT statement from the specified file on the engine tier host
• emitting end of wave (EOW) marker records
• Runtime Column Propagation (RCP)
• partitioned reads (user-defined SQL only): the [[node-count]], [[node-number]] and [[node-number-base-one]] placeholders are replaced with their values before the statement runs. For example:
SELECT * FROM TABLE1 WHERE MOD(C1, [[node-count]]) = [[node-number]]
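The placeholder substitution for partitioned reads can be sketched in a few lines of Python. This is an illustration of the idea only, not the connector's code: `expand_placeholders` is a hypothetical helper, and it assumes node numbers are zero-based (suggested by the separate [[node-number-base-one]] placeholder). The final lines check in plain Python that a MOD predicate hands each key to exactly one node.

```python
def expand_placeholders(sql, node_number, node_count):
    """Replace the three placeholders with values for one processing node."""
    return (sql.replace("[[node-number-base-one]]", str(node_number + 1))
               .replace("[[node-number]]", str(node_number))
               .replace("[[node-count]]", str(node_count)))


template = "SELECT * FROM TABLE1 WHERE MOD(C1, [[node-count]]) = [[node-number]]"
node_count = 4
statements = [expand_placeholders(template, n, node_count) for n in range(node_count)]

# Simulate the predicate: every key lands on exactly one node, none is read twice.
keys = range(100)
owners = [[k for k in keys if k % node_count == n] for n in range(node_count)]

for statement in statements:
    print(statement)
```

Because each node runs its own copy of the statement, the predicate has to partition the rows completely and without overlap, which is what the MOD comparison provides for an integer column.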
JDBC Connector (continued)
Write behavior supports…
• auto-generated and user-defined SQL statements (including quoted identifiers) and DDL statements
• schema validation (auto-generated statements only)
• multiple write modes: Insert, Update, Delete, Insert then update, Update then insert, Delete then insert, Insert new rows only and Custom
• batch processing (Insert, Update, Delete and Custom write modes only, and only when reject links are not present)
• reading statements from specified files
• management of unmatched input link fields
• multiple input links and choice of the record processing order across input links (All records, First record or Ordered)
• parallel writes
JDBC Connector (continued)
Lookup behavior supports…
• Normal (in-memory) and Sparse (direct) lookup types (modes)
• auto-generated and user-defined statements for both lookup types
• quoted identifiers
• schema validation to assist with the job design
Reject behavior supports…
• Reject limits, error conditions and error information columns can be configured
• Each reject link must be associated with a dedicated input link
DB2 Z Bulk Load Optimization
Huge Performance Gains For Load
• Moved from a single load stream to parallel streaming via Z pipes.
• Multiple LOAD utilities targeting separate partitions, which performs faster than a single LOAD utility targeting all partitions.
• 9.1.2 is 80 to 160% faster than 9.1 (depending on number of partitions)
• Performance scales almost linearly as you increase the number of partitions, regardless of load method.
• Internal testing loading almost 1TB per hour using 16-way load
Huge Performance Gains For Read
• The connector determines the number of partitions in the table and dynamically configures the number of DataStage nodes to match the number of partitions
• Parallel read using the 9.1.2 DB2 connector is 40% faster than the 9.1.2 DB2Z stage, regardless of the number of partitions.
Resilience
• When Retry on connection failure is set to Yes, the connector will try to establish an FTP connection again when the initial attempt to connect fails.
Metadata Import Performance Optimization (IMAM)
• Performance benefits of BI Simplification in 9.1
  - 46% reduction in execution time of Express Import
• Performance benefits of physical model import (Erwin)
  - 44-60% reduction in execution time of Express Import (Erwin)
• IMAM Express Import in 9.1 is 7 - 1200% faster than in 8.7 for the following workloads
  • IDA:
    - Small workload (55K assets): +1200% (Throughput: 450 objects/s)
    - Large workload (119K – 430K): significantly lower resource requirements
  • Erwin (124K assets): +50% (Throughput: 351 objects/s)
  • BO import (175K assets): +18% (Throughput: 318 objects/s)
  • DB2 PDR (108K assets): +7% (Throughput: 149 objects/s)
  • Cognos (141K assets): -20% (Throughput: 120 objects/s)
• MITI in 9.1 extracts more metadata (+45% reports & +27x more models, etc) than MITI in 8.7
Note: Performance results may vary in other environments
Connector Enhancements
Limit number of returned rows
• New property to support database sampling (required for a feature of Data Click)
• Applies to the following Connectors: ODBC, DB2, Netezza, Oracle, Teradata and JDBC
ODBC Connector expanded binary support
• The ODBC Connector now supports automatically generated 'CREATE TABLE' statements for types Binary, VarBinary or LongVarBinary
Row limit properties:
Name | Label | Description | Default value
LimitRows | Limit number of returned rows. | Select Yes to limit the number of rows that are returned by the connector. | False (No)
Limit | Limit | Enter the maximum number of rows that will be returned by the connector. | 1000
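The effect of the two row-limit properties can be sketched with a generic database cursor. The example below uses Python's built-in sqlite3 module purely as a stand-in data source; `read_rows` is a hypothetical helper whose `limit_rows` and `limit` parameters mirror the LimitRows and Limit properties and their defaults (No and 1000), and it is not the connector's implementation.

```python
import itertools
import sqlite3


def read_rows(cursor, limit_rows=False, limit=1000):
    """Return an iterator over rows, capped at `limit` rows when limit_rows is set."""
    rows = iter(cursor)
    return itertools.islice(rows, limit) if limit_rows else rows


# Stand-in source table with 5000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])

sample = list(read_rows(conn.execute("SELECT id FROM t ORDER BY id"), limit_rows=True))
everything = list(read_rows(conn.execute("SELECT id FROM t ORDER BY id")))
print(len(sample), len(everything))
```

Stopping the read early like this is what makes the property useful for sampling: the source query is unchanged and only the number of rows pulled through is capped.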
Big Data Connectivity for Hive
• High performance and throughput with support for Hive2 and concurrent connections
• Improved authentication for increased data security
• Full driver metadata
• Support for parameter arrays, processing the arrays as a series of executions, one execution for each row in the array
• Support for standard SQL functionality, including Create Index, Create Table, Create View, Drop Index, Drop Table, Drop View
• Support for a wide range of data types: Int, TinyInt, SmallInt, BigInt, String, Double, Binary, Boolean, Float, and Timestamp
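The parameter-array behaviour in the list above (one execution per row of the array) can be illustrated with a short Python sketch. It uses the built-in sqlite3 module as a stand-in database, not Hive or the Hive driver, and `execute_parameter_array` is a hypothetical helper written only to show the execution pattern.

```python
import sqlite3


def execute_parameter_array(conn, sql, rows):
    """Run `sql` once for each row of the parameter array; return the execution count."""
    executions = 0
    for row in rows:
        conn.execute(sql, row)   # one statement execution per array row
        executions += 1
    return executions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
parameter_array = [(1, "a"), (2, "b"), (3, "c")]
count = execute_parameter_array(conn, "INSERT INTO events VALUES (?, ?)", parameter_array)
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count, stored)
```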