INFORMATICA Differences Between Versions 7 & 8
1Informatica confidential. For discussion purposes only.
Informatica User Group
PowerCenter: Differences Between v7 & v8
Mark Murray - Senior Sales Consultant
October 19th, 2006
Goals for New Architecture
• Enterprise Deployment
  • Improved Service Orientation
  • High Availability
  • Grid Deployments
• Centralized Services
  • Administration
  • Logging & Auditing
• Single Point of Administration
  • Traditional Configuration
  • HA Configuration
  • Grid Configuration
What do customers want?
• High Availability and Failover was a top 10 request in the 2004 User Group surveys
• Database Pushdown Optimization was 10th out of 66 features in the 2005 Surveys
• Improved logging capabilities was 2nd out of over 60 feature requests in the 2004 surveys
• Looping support within the Designer
Informatica Data Integration Platform
Continually Raising the Bar
PowerCenter 8.1.1 (Now)
PowerCenter 7 Advanced Edition
Mission-Critical Enterprise Deployment
One Product, Single Install
On-Demand Platform for the Enterprise
Hercules (2007)
Informatica Delivers Continuous Innovation
“With PowerCenter continually leapfrogging on performance and scalability, we are never concerned about our ability to handle increasingly large data volumes in our data integration environment.” -- Kevin Smith, CRM Strategies Manager, AAA Carolina
[Chart: capabilities added per release, V4.x through V8.x]
• V4.x: Pipelining, ERP Connectivity, UNICODE
• V5.x: Partitioning, Debugger, XML, Metadata connectivity
• V6.x: Realtime, Workflow, Data quality, 3-tier architecture, Enterprise metadata
• V7.x: SOA, Web Services, Grid, 64-bit, Team development, Enterprise security, Mainframe Data Server and CDC, Impact analysis
• V8.x: Session On Grid, Adaptive Load Balancing, High Availability, Dynamic Partitioning, Pushdown Optimization, Unstructured Data, Data Federation
[Chart: 1 TB Transform and Load Test (hr:min); plotted values 6:35, 3:36, 0:37, <18 min]
What else is in the Informatica product family?
PowerCenter Options
Data Cleanse and Match
Data Federation (EII)
Enterprise Grid
Pushdown Optimization
High Availability
Unstructured Data
New
Mapping Generation
PowerCenter 8 Standard Edition
PowerCenter 8 Advanced Edition
Metadata Manager
Data Analyzer
Team Based Development
Data Profiling
Partitioning
Real-Time
Updated
Metadata Exchange
PowerCenter Connects
Broader
PowerCenter 8 Base Improvements
Delivering Value for Installed Base Customers
Reduce Time To Results
• Java transformation support
• User defined functions
• Extended expression library
• Mapping generation and templates
• Improved Data Profiling
Cost Effectively Scale
• Centralized administration web-based console
• Extended recovery options
• Connection resilience (RDBMS, Network, PC)
• Flat File Performance Optimization
• Enhanced, centralized logging
• Enhanced Team-Based Development
• Unicode repository option
PowerCenter Standard Edition
PowerCenter Advanced Edition
Metadata Manager
Data Analyzer
Team Based Development
PowerCenter 8 Release Themes
• Service Oriented Architecture
• 24x7 Availability of PowerCenter services
• Order of magnitude performance improvements
• Unlimited scalability
• Improved developer productivity
PowerCenter 8.x Update – Setting the Standard for Data Integration across the Enterprise
• Infrastructure and Server Enhancements
  • Services based Architecture
  • High Availability
  • Grid Enhancements
  • Easy Grid Configuration
  • Centralized administration web-based console
  • Centralized configuration
• Developer Enhancements
  • Functions and Expressions
  • User Defined Functions
  • Java Transformation
  • Dynamic Target Creation
  • Visio Template – mapping generation and templates
  • Upgrade Wizard
• Expand the definition of universal data access
  • Data Federation Option
  • Unstructured Data Option
  • Data Quality Option
  • Extended PowerExchange
• Performance Enhancements
  • Pushdown Optimization
  • Flat Files
  • Partitioning
  • Auto Cache
  • Connection resilience (RDBMS, Network, PC)
PowerCenter 8 Architecture
PowerCenter 6 and 7 Architecture
[Architecture diagram: components on a machine]
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Server Admin Console
• Data Servers (pmserver)
• Repository Server
• Repository Database
• Web Services Hub
• PowerCenter Connects
• PowerExchange
PowerCenter 8 Architecture
[Architecture diagram: Node & Domain]
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Administration Console
• Application Services: Integration Service, Web Services Hub, Repository Service, SAP BW Service
• Core Services: Log Service, Repository Service, Domain/Gateway Services
  • Administration & Authorization
  • Configuration
  • Domain
  • Licensing
• Repository Database
• PowerCenter Connects
• PowerExchange
PowerCenter 8 Terminology
• Services
  • A service is a resource that provides specialized functions.
  • PowerCenter has two types of services: Application Services and Core Services.
  • PowerCenter Application Services represent server-based functions such as the Repository, Integration, SAP BW, and Web Services Hub services.
  • PowerCenter Core Services represent functions that manage and maintain the environment in which PowerCenter operates.
Introducing PowerCenter 8 Terminology
• Node
  • A node is a logical representation of a physical machine. It has physical attributes such as a hostname and port number.
  • Each node runs a Service Manager, which is responsible for the application and core services.
  • The Service Manager is started when you start “Informatica Services”.
• Domain
  • A domain is the fundamental unit of PowerCenter Services administration.
  • A domain is a logical collection or set of nodes and services that you can group in a “folder like” deployment.
PowerCenter 8 Terminology
• Service Manager
  • On the gateway node, the Service Manager is responsible for:
    • Controlling the domain
    • Managing services running on the domain
    • Providing service lookup
  • On all nodes, the Service Manager controls the core services and application services.
PowerCenter Services Framework
[Diagram: PowerCenter Domain]
• Master Gateway (Domain Controller)
• Repository Service
• Repository Database
• Integration Service
• Domain Metadata
• Logs
• Checkpoint
• Administration Console
• Client Tools: Monitor, Workflow Manager, Repository Manager, Designer
High Availability (HA)
High Availability in PC8
• Failover
  • Restart for data integration, repository and other services
  • Primary and backup servers
• Recovery
  • Workflows and sessions will be recovered on running servers on the grid during server failure
  • Checkpoint recovery
  • Repository recovery
• Resilience
  • PowerCenter jobs will sustain transient failures
    • Network errors
    • DB connection failures
Resilience
• DB Connection Resilience
  • When connecting/disconnecting from a DB
  • Oracle, DB2, Sybase, SQL Server and Teradata
  • Retry interval based on timeout setting
• FTP Resilience
  • For connections to FTP server
  • Read/write will recover if connection lost based on timeout parameter
• Internal Resilience
  • PowerCenter components (Integration Service, clients, etc.) resilient to Repository Service failure
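The retry-until-timeout behaviour described for DB and FTP connections can be sketched roughly as follows. This is only an illustration in Python, not PowerCenter code: the function name `connect_with_resilience` and the parameters `timeout_s` and `retry_interval_s` are hypothetical, and the slides do not say how the retry interval is actually derived from the timeout setting.

```python
import time


def connect_with_resilience(connect, timeout_s, retry_interval_s,
                            sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or the resilience timeout expires.

    Hypothetical helper: `connect` is any callable that raises
    ConnectionError on a transient failure. Returns (connection, attempts).
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError:
            # Give up once another wait would run past the timeout.
            if clock() + retry_interval_s > deadline:
                raise
            sleep(retry_interval_s)
```

A transient failure shorter than the timeout is absorbed by the loop, while a failure that outlasts it surfaces as an error to the caller.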
Simple High Availability/Failover Scenario
• Simple environment
  • 1 Domain which consists of:
    • 2 nodes for Integration Services
      • node01 - Primary
      • node02 - Backup
    • 1 server for repository
[Diagram: node01 (Int_Svc01), node02 (Int_Svc02), Repository DB]
Simple High Availability/Failover Scenario
• node01 Integration Service goes down
• node01 Integration Service “fails over” to node02
[Diagram: node01 (Int_Svc01), node02 (Int_Svc02), Repository DB]
Component Failure (HW/SW), Automatic Failover, Restart, Recovery
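The failover sequence above (component failure, automatic failover, restart, recovery) can be pictured with a small simulation. This is a toy model under stated assumptions, not how PowerCenter implements it: the classes `Node` and `IntegrationService`, the function `run_workflow`, and the dictionary used as a checkpoint are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True


@dataclass
class IntegrationService:
    primary: Node
    backups: list = field(default_factory=list)

    def active_node(self):
        """Return the primary if it is up, otherwise the first live backup."""
        for node in [self.primary, *self.backups]:
            if node.alive:
                return node
        raise RuntimeError("no live node available")


def run_workflow(service, tasks, completed, fail_before=None):
    """Run every task not already recorded in `completed` (the checkpoint).

    `fail_before` simulates the primary node crashing just before that task.
    """
    node = service.active_node()
    for task in tasks:
        if task in completed:
            continue  # recovery: skip work finished before the failure
        if task == fail_before and node is service.primary:
            node.alive = False
            raise RuntimeError(f"{node.name} failed before {task}")
        completed[task] = node.name
    return completed


def demo():
    node01, node02 = Node("node01"), Node("node02")
    service = IntegrationService(primary=node01, backups=[node02])
    tasks = ["extract", "transform", "load"]
    completed = {}
    try:
        run_workflow(service, tasks, completed, fail_before="transform")
    except RuntimeError:
        # Restart on the backup node and resume from the checkpoint.
        run_workflow(service, tasks, completed)
    return completed
```

In `demo()`, the first task runs on node01, and after the simulated crash the remaining tasks run on node02 without repeating the finished one.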
Grid Enhancements
Domain Overview Dashboard
Simplified, Web-based Administration
[Screenshot callouts: Services; Domain; Nodes; Example Primary & Backup Repository Service; Services Configuration (remember the pmserver config file?)]
Mission-critical Enterprise Deployment
Cost-effective Scalability with PowerCenter on a Grid
• Automatically recover, restart on live server
• Distributed processing of sessions
[Diagram: PowerCenter Domain on Server Grid; Failed Hardware Server; PowerCenter Domain Controller]
Grid Enhancements
• Grid Object
  • Configured from admin console
  • Services can be assigned to grid
  • Workflows are assigned to be run by services
  • Workflow distributed on Grid (WOnG)
    • Same as version 7
    • Distribute sessions of a workflow across multiple nodes
  • Session distributed on Grid (SOnG)
    • New in version 8
    • Can partition sessions to run on multiple nodes
• Dynamic Partitioning
  • # of partitions dynamically determined at runtime
  • Less configuration for users
• Resource Maps
  • Configure available resources on nodes in grid through admin console
  • Load balancer dispatches jobs based on resource availability on nodes
Grid – PC 7 vs. PC 8
PowerCenter 7
• ServerGrid is a collection of pmservers
• Work is directed to individual pmservers
• Work distributed across Grid in round-robin manner
• Session/task is lowest unit of work
Grid Capabilities in 7.x vs. 8.x
8.x
• Grid object
  • Collection of nodes
• Workflows assigned to Integration Service
• Integration Service assigned to Grid (can run on any node in grid)
• If one node fails, another Integration Service process on another node in grid takes over running the workflow
• A session can be partitioned across nodes
• Load balancer takes into account resource availability on nodes and resource requirements of sessions for dispatch.
7.x
• ServerGrid object
  • Collection of pmservers
• Workflows explicitly assigned to pmservers
• Pmservers belonging to a ServerGrid will dispatch to other pmservers
• Pmservers could fail, causing workflows to fail
• Can’t split sessions across multiple nodes
• Load balancer is round robin only
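To make the difference between the two dispatch styles concrete, here is a minimal Python sketch. It is an illustration only: the function names, the `needs`/`resources` sets, and the `memory_mb`/`free_mb` fields are hypothetical, and the "pick the eligible node with the most free memory" rule is an assumed policy, not the documented PowerCenter load-balancer algorithm.

```python
from itertools import cycle


def round_robin_dispatch(tasks, nodes):
    """7.x style: hand tasks to servers in turn, ignoring load and resources."""
    return {task: node for task, node in zip(tasks, cycle(nodes))}


def resource_aware_dispatch(tasks, nodes):
    """8.x style sketch: match required resources, then prefer free capacity.

    tasks: {name: {"needs": set, "memory_mb": int}}
    nodes: {name: {"resources": set, "free_mb": int}}
    """
    free = {name: spec["free_mb"] for name, spec in nodes.items()}
    assignment = {}
    for task, req in tasks.items():
        eligible = [
            name for name, spec in nodes.items()
            if req["needs"] <= spec["resources"] and free[name] >= req["memory_mb"]
        ]
        if not eligible:
            raise RuntimeError(f"no node can run {task}")
        chosen = max(eligible, key=lambda name: free[name])
        free[chosen] -= req["memory_mb"]  # reserve capacity for this task
        assignment[task] = chosen
    return assignment
```

With round robin, a session that needs a resource present on only one node can land on the wrong node; the resource-aware version never assigns it there.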
Performance Improvements
Pushdown Optimization
Introduction
• What is pushdown optimization?
  • Push transformation processing to data sources & targets w/o moving data out
• Benefits
  • Reduce movement of data when source and target are the same database instance
  • Utilize database-specific processing that may be more optimal
  • Maintain metadata and lineage in PowerCenter
Pushdown Optimization
• Full Pushdown:
  • Source and target are in the same RDBMS
  • All transformations can be processed in database
• Partial Source:
  • One or more transformations can be processed in source database
• Partial Target:
  • One or more transformations can be processed in target database
• Generated SQL:
  • INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))
[Diagram: Source DB, Target DB; Extract, Transform, Load]
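The contrast between engine-side processing and full pushdown can be sketched with Python's built-in `sqlite3` module standing in for the RDBMS. Everything here is illustrative: the tables `src` and `tgt`, the function names, and the SQL text are assumptions, not what PowerCenter generates, and `UPPER` is used in place of `SOUNDEX` because `SOUNDEX` is not available in default SQLite builds.

```python
import sqlite3


def load_row_by_row(conn):
    """No pushdown: extract rows, transform in the engine, insert them back."""
    rows = conn.execute("SELECT id, name FROM src").fetchall()
    transformed = [(row_id + 1, name.upper()) for row_id, name in rows]
    conn.executemany("INSERT INTO tgt (id, name) VALUES (?, ?)", transformed)


def load_with_pushdown(conn):
    """Full pushdown: one statement, so the data never leaves the database."""
    conn.execute("INSERT INTO tgt (id, name) SELECT id + 1, UPPER(name) FROM src")


def demo(loader):
    """Build a tiny source table, run the given loader, return the target rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE tgt (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    loader(conn)
    return conn.execute("SELECT id, name FROM tgt ORDER BY id").fetchall()
```

Both loaders produce the same target rows; the pushdown version simply moves the transformation logic into a single SQL statement executed by the database.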
Example – Full Pushdown
SQL & Business Logic Maintained in Repository
Flat File Performance & Parameter and Variable Enhancements
Flat file enhancements
• FF Reader and Writer have been rewritten to optimize for performance
  • Delimited files with lots of decimal data will see the most significant performance improvements
  • Out of box performance improvements should be between 30% and 300%
• Append to flat file targets
  • Session output can be appended to existing flat file
• Flat file source/target command support
  • Sources: use a command to generate source data or a file list that references multiple source files.
  • Targets: use a command to process the target data or process data for all partitioned targets in a session.
Parameters and Variables Enhancements
• Parameter Enhancements
  • Table owner name for relational sources/targets
  • E-mail address
  • FTP remote file name
• Global section specification in parameter files for use across different workflows / sessions
Partitioning Enhancements
Partitioning Enhancements
• Flat File Partitioning
  • FF targets can now be partitioned
  • All partitions can write to a single file, a merge file or file list can be created that contains the names of the individual files that were written
• Database Partitioning
  • Partitioned Oracle and DB2 sources can be read in parallel
  • No changes to targets. DB2 can be written to in parallel.
• Dynamic Partitioning
  • Based on # of partitions in database
  • Based on the # of nodes in a Grid
Auto Cache
© Informatica Corporation, 2006. All rights reserved.
AutoCache Overview
• Cache in PowerCenter v7
  • Default cache settings not adequate for all situations.
  • Default settings can underestimate new chip technologies.
  • Sometimes necessary to hand tune individual transformations.
  • Development did not always scale when deployed to different production machines.
• Auto Cache in PowerCenter v8.x
  • Automatically distribute session memory to transformations.
  • Automatically scale memory usage based on resource available.
  • Automatically scale memory usage based on mapping complexity.
Memory Attributes
• PowerCenter has two types of memory attributes:
  • Transformation Memory Attributes
  • Session Memory Attributes
• Transformation Memory Attributes are for individual transformations:
  • Lookup, Aggregator, Rank, Joiner
    • Index and Data Cache Size
  • Sorter Cache Size
  • XML Target Cache Size
• Session Memory Attributes are for the session:
  • Default Buffer Block Size
  • DTM Buffer Size
New Memory Attribute Specification
• Previously, only integer byte values were allowed for Memory Attributes, e.g., 1000000 or 2000000.
• Shortcuts are now also allowed: “KB”, “MB”, and “GB”, e.g., 100MB
• The value “Auto” is also allowed
  • This indicates that the user wants PowerCenter to automatically find a good value for that memory attribute
  • “Auto” is supported for both session (e.g. DTM buffers/buffer block size) and transformation memory attributes (e.g. lookup caches)
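The three accepted forms (plain byte count, a value with a KB/MB/GB suffix, and “Auto”) can be illustrated with a small parser. This is a sketch only: the function name `parse_memory_attribute` is hypothetical, and treating 1 KB as 1024 bytes is an assumption, since the slides do not say whether the units are binary or decimal.

```python
import re

# Assumption: binary units (1 KB = 1024 bytes); the slides do not specify.
_UNITS = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_memory_attribute(value):
    """Return a byte count, or None when the value is 'Auto'."""
    text = str(value).strip()
    if text.lower() == "auto":
        return None  # caller decides the actual size later
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)?", text, flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"invalid memory attribute: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[(unit or "").upper()]
```

For example, `parse_memory_attribute("100MB")` yields a byte count under the binary-unit assumption, while `parse_memory_attribute("Auto")` returns `None` to signal that the value is left to be computed.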
AutoCache
• Allows the user to leave the calculations to PowerCenter
• User specifies total amount of memory AutoCache is allowed to use
• Automatically computes a value for ALL memory attributes that have the value “Auto”
• Will NOT affect any memory attributes where the value is not “Auto”
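The rule that only attributes set to “Auto” are computed, within a user-specified total, can be sketched as below. The even split is purely an assumption made for illustration; the slides say AutoCache scales with available resources and mapping complexity but do not describe the actual formula. The function name `resolve_auto` and the attribute names in the usage are hypothetical.

```python
def resolve_auto(attributes, auto_budget_bytes):
    """Fill in every 'Auto' attribute from a shared memory budget.

    attributes: {name: int bytes or the string "Auto"}
    Assumption: the budget is split evenly; explicit values are untouched.
    """
    auto_names = [name for name, value in attributes.items() if value == "Auto"]
    resolved = dict(attributes)
    if auto_names:
        share = auto_budget_bytes // len(auto_names)
        for name in auto_names:
            resolved[name] = share
    return resolved
```

Called with two “Auto” caches and one explicit DTM buffer size, the function divides the budget between the two caches and returns the DTM buffer size unchanged.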
Cache Calculator
• Click drop down
• Calculate based on the number of rows and the ports going into the object
• Value is propagated into the Cache value
Developer Improvements
Functions and Expressions
Function Enhancements
• Over 20 new functions added in the 8.x release
  • Financial Functions, Regular Expression parsing/match, IN(), Compression, Encryption, CRC, MD5 and more
• Custom Functions
  • Extend the functionality of the Expression Transformation via a C API
  • All 20+ functions above were added via this API
Function Enhancements
• User Defined Functions (UDF)
  • Ability for Designer users to create reusable functions entirely within the Expression Language
  • UDFs are folder level objects
  • Can use any valid functions (except aggregation functions) as well as other UDFs (in the same folder)
Java & SQL Transformations
Java Transformation Use Cases
• Looping over data
• Walking data hierarchies
• Calling third-party APIs (Java based)
  • Calling RMI/EJB etc.
  • Other Java Packages
• Calling expression/UDF/unconnected widget (like lookup) from Custom Transformation
• Simple “Custom Transformation”
Improved Developer Productivity
Java Inline Coding Sample
SQL Transformation Use Cases
• New SQL Transformation
  • Allows PowerCenter developers to execute SQL statements midstream in a mapping.
  • You can insert, delete, update, and retrieve rows from a database; the transformation returns database errors.
  • The SQL that is executed can be static, or dynamic, where the SQL statement itself is created on a row-by-row basis.
• The SQL transformation can also be used to execute SQL scripts from within a mapping, e.g. to leverage SQL scripts that already exist
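The static versus dynamic distinction can be illustrated with Python's built-in `sqlite3` module. This is an analogy, not the SQL transformation itself: the `customers` table and the function names are hypothetical. In the static case the statement is fixed and each row only supplies bind values; in the dynamic case each row carries the statement text, and database errors are returned alongside the results instead of stopping the run.

```python
import sqlite3


def make_db():
    """Create a small in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    return conn


def run_static(conn, rows):
    """Static SQL: one fixed statement, row values bound as parameters."""
    out = []
    for (customer_id,) in rows:
        cursor = conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,))
        out.extend(cursor.fetchall())
    return out


def run_dynamic(conn, rows):
    """Dynamic SQL: each input row supplies the statement text itself.

    Returns one (result_rows, error_message) pair per input row.
    """
    out = []
    for (statement,) in rows:
        try:
            out.append((conn.execute(statement).fetchall(), None))
        except sqlite3.Error as exc:
            out.append((None, str(exc)))
    return out
```

Executing statement text taken from row data should only be done with trusted input, since it is open to SQL injection otherwise.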
XML
XML Enhancements
• Filter data with query predicate
• Create a default namespace
• Import part of an XML schema
• Use anySimpleType
Metadata Enhancements
Metadata Exchange Enhancements
• New Data Model Support
  • Sybase Power Designer – bi-directional
  • Oracle Designer – bi-directional
  • ER Studio Design Tool – uni-directional (same as before)
  • CA Erwin – bi-directional
• Business Intelligence Support
  • Business Objects (bi-directional) – added 6.5 & XI & XI R2 XConnects
  • Cognos ReportNet Framework Manager (bi-directional) – added 2.0
  • Microstrategy (bi-directional) – added 8.0
Dynamic Target Creation
Dynamic Target creation
• Ability to dynamically create a target based on a transformation in the workspace or navigator
• Right click on transformation in workspace and select Create and Add Target
• Drag a transformation and drop it in the Target folder
• Has same port definitions as transformation from which it was created
• Target type is same as repository you are using
• Can edit the target definition to change type or ports
• Creation dialog will be added in an upcoming release
Improved Developer Productivity
Target Generation
Simply Right-Click on an object…
…Target is created! All you need to do is Auto link and you are ready to go
Mapping Generation Option
Visio Client for PowerCenter
Mapping Generation Option
• Bi-Directional “engine” for automatically generating mappings from Visio templates or reverse engineering PowerCenter mappings into Visio templates
• Leverages the Informatica Data Stencil and Velocity templates for Visio
Visio Client for PowerCenter
Mapping Template
Template Inputs
Upgrade Wizard
PowerCenter Upgrade to 8.1
• A new Upgrade wizard in Admin Console
  • Integrated UI that takes the user through the various steps in the upgrade
  • Provides a detailed upgrade summary report at the end
  • Allows user to switch in and out of the Upgrade UI to perform any other administrative activities
  • Can handle multiple repositories (global/local) and multiple PowerCenter Servers in one shot
  • Live feedback during repository upgrade as user goes through the upgrade process
• A new post-upgrade reference guide
Summary
Summary - PC 7 vs. PC 8
PC 7.x
• 3 Tier Architecture
• Basic Grid Deployment
• Introduction to Profiling
• Added Transformations
  • Union
  • XML
• Web Services
• Team Based Development
PC 8.x
• Services Oriented Architecture
• Enhanced Grid Deployment
  • High Availability
  • Session on Grid
  • Resilience
• Enhanced Profiling
• Added Transformations
  • Java
  • SQL
• Enhanced Productivity
  • Mapping Generation
  • User Defined Functions
Thank You
Questions at the break