INFORMATICA COOKBOOK
INFORMATICA DEVELOPER’S GUIDE
Author : Sastry Kolluru
Creation Date :
Last Modified :
Version : 1.00
Approvals
Stephen Musgrove :
:
Informatica Cookbook
Change Record
DATE          Author           Version   Reference
19-Apr-2004   Sastry Kolluru   1.00      Added section 7.7 and 7.8
Reviewers
NAME   POSITION
Table of Contents
Version 1.00 Page 2 of 19
1.0 OVERVIEW
2.0 GETTING STARTED
    2.1 ABOUT INFORMATICA
        2.1.1 Version in use
3.0 INFORMATICA DEVELOPMENT CYCLE
    3.1 STARTING A NEW PROJECT
        3.1.1 Project Initialization
        3.1.2 Login
        3.1.3 Folders and Groups setup
    3.2 DEVELOPMENT AND TESTING PROCESS
    3.3 MIGRATION TO PRODUCTION
        3.3.1 Information to be provided
        3.3.2 Review before movement
    3.4 CHANGES TO AN EXISTING PROJECT
4.0 TRANSITION OF PROJECTS FOR SUPPORT
    4.1 REQUIREMENTS FOR SUPPORT
    4.2 SUPPORT PROCESS ON FAILURE
    4.3 SUPPORT WINDOW
5.0 INFORMATICA ENVIRONMENTS
    5.1 DEVELOPMENT
    5.2 PRODUCTION
6.0 ENGINE MANAGEMENT
    6.1 MANAGING THE ENGINE
    6.2 RESTARTING THE ENGINE
7.0 BEST PRACTICES
    7.1 NAMING STANDARDS
        7.1.1 Challenge
        7.1.2 Description
    7.2 TEMPLATES
        7.2.1 Challenge
        7.2.2 Description
    7.3 USAGE OF CONNECTION OBJECTS
        7.3.1 Challenge
        7.3.2 Description
    7.4 FAILURE SCRIPTS
        7.4.1 Challenge
        7.4.2 Description
    7.5 TRUNCATING DATA
        7.5.1 Challenge
        7.5.2 Description
    7.6 BUILT-IN RE-STARTABILITY
        7.6.1 Challenge
        7.6.2 Description
    7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
        7.7.1 Challenge
        7.7.2 Description
    7.8 PARAMETERIZATION OF SESSION INFORMATION
        7.8.1 Challenge
        7.8.2 Description
1.0 OVERVIEW
The objective of the Informatica Cookbook is to provide the Informatica user community at Fidelity Investments with information regarding:
- Informatica infrastructure at FEB
- Processes for the development life cycle
- Best practices, tips and techniques
The cookbook is intended as a starting point for developers, so that they can understand the standards, processes and best practices before starting work on the FEB Informatica infrastructure. It will also act as a refresher for experienced developers on best practices and lessons learned from other users.
We hope to update this document on a regular basis to incorporate better practices and new lessons learned.
2.0 GETTING STARTED
2.1 ABOUT INFORMATICA
Informatica PowerCenter is a data integration platform for building, deploying, and managing enterprise data warehouses, and other data integration projects. Informatica PowerCenter enables users to easily transform data from disparate enterprise systems and sources into reliable information to support strategic business initiatives.
2.1.1 Version in use
The versions of Informatica currently running are 5.1 and 6.2. All new development should be done in version 6.2. All projects currently in Informatica 5.1 will be migrated to Informatica 6.2.
3.0 INFORMATICA DEVELOPMENT CYCLE
3.1 STARTING A NEW PROJECT
3.1.1 Project Initialization
A mail has to be sent to the Informatica Support team before the start of any project. The mail should contain the following information:
1. Project Name
2. Project Contact
3. Folder Name
4. List of users accessing the folder
5. Informatica version planned to be used
6. Expected date of moving to Production
7. Expected number of sessions/mappings in the project
At least 5 days' notice should be given before code is to be moved to production, so that the move can be planned.
3.1.2 Login
Every user should have a login in development as well as production. The Corp id will be used as the login for individual users. In development, users will be given access to both create and execute mappings/sessions, whereas in production only read access will be given. The request for creating a new login may come as a part of the project initialization mail, or a separate mail may be sent to the Informatica Support Group. A selective execute privilege can be requested for some sessions or workflows.
3.1.3 Folders and Groups setup
Folders will be created based on the information provided to the Informatica Support Group as a part of the project initialization process. Groups will be set up in Informatica to manage the access of users to various folders.
3.2 DEVELOPMENT AND TESTING PROCESS
All development should be done in the development instance of Informatica and Oracle. Separate folders should be created in the same development repository for development, QA and SIT.
The folders marked as <folder_name>_prod will be moved to production on request. The naming convention to be followed will be as described in the Naming Convention best practice in section 7.1.
3.3 MIGRATION TO PRODUCTION
3.3.1 Information to be provided
After coding and testing have been completed in development, the following information should be provided to the Informatica Development Team to facilitate movement of the code. The same applies to enhancements and bug fixes to existing mappings/sessions.
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same
5. Date when the movement has to be made
3.3.2 Review before movement
The Informatica Support group will review mappings and sessions before they are moved from development to production. The following are some of the important points:
1. Check if existing database connectors/FTP connectors/external loaders have been used
2. Check if the failure scripts have been added; refer to the Best Practices section for more details
3. Location of scripts/intermediate files and any data files
4. Location of lookup and other caches
5. If any intermediate files are being generated as a part of the process then they should be deleted at the end of the process
6. Check if any existing code or setting can cause a known bug
7. Check if the scheduling could affect the performance of existing sessions
8. Suggest process improvements to improve efficiency. Any project team can approach the Informatica Support Team during the initial stages of the project for a process review. If the methods used to code may affect the existing systems then the code will not be moved into production.
9. Restartability
3.4 CHANGES TO AN EXISTING PROJECT
If enhancements or bug fixes have been made to existing mappings/sessions, they need to be tested in development and the following information should be provided to the Informatica Support Team:
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same
5. Date when the movement has to be made
4.0 TRANSITION OF PROJECTS FOR SUPPORT
4.1 REQUIREMENTS FOR SUPPORT
1. An operations document on the functionality of the sessions/mappings to be supported.
2. A re-start and recovery document explaining, for every session/batch, the actions to be taken if there is a failure. It is recommended to create cleanup scripts so as to avoid unnecessary manual intervention.
3. Information on support contacts, so that they can be contacted if needed. A primary contact and a secondary contact should be provided.
4.2 SUPPORT PROCESS ON FAILURE
1. On failure the session will send out a mail/page to the support team. The Informatica support team shall follow the re-start and recovery process provided.
2. A mail will be sent to the primary and secondary contacts summarizing the reason for failure and the action taken.
4.3 SUPPORT WINDOW
Support for Informatica jobs will be provided during the following hours:
Monday to Friday
OnSite Support - 9:00AM to 6:00PM
OffShore Support - 11:00PM to 6:00PM
Saturday/Sunday and Holidays
OffShore Support - 11:00PM to 6:00PM
5.0 INFORMATICA ENVIRONMENTS
5.1 DEVELOPMENT
The Informatica Development Engine is set up on webstatdev. The repository is in Oracle and is hosted on smmk94 so that backups of it are taken from time to time. There are development instances in versions 5.1 and 6.2.
Power center 5.1
Repository name – EsiteDev
Repository Database -
Power center 6.2
Repository name – PMNEW
Host Name - webstatdev
Port number - 5031
5.2 PRODUCTION
The Informatica Production Engine is set up on smmk94. The repository is in Oracle and is hosted on smmk94. There are production instances in versions 5.1 and 6.2.
Power center 5.1
Repository name – EsiteProd
Repository Database -
Power center 6.2
Repository name – eSite62test
Host Name - smmk94
Port number - 5031
6.0 ENGINE MANAGEMENT
6.1 MANAGING THE ENGINE
The development and production engines shall be managed by the Informatica support Team. Information regarding planned downtime/ upgrades shall be provided to the user community from time to time.
6.2 RESTARTING THE ENGINE
A mail shall be sent to the user community regarding the re-starting of the engine. After the engine has been brought up, a confirmation will be sent so that users can double-check the status of their sessions. If sessions have not been scheduled properly, the users should inform the Informatica Support Team.
7.0 BEST PRACTICES
7.1 NAMING STANDARDS
7.1.1 Challenge
Define standards to be used during development in Informatica.
7.1.2 Description
Folders
Folders are a collection of mappings, sources, targets, sessions, and batches.
Syntax: ProjectName_phase
Description:
Phase - ‘DEV’ - Development
        ‘SIT’ - Integration Testing
        ‘UAT’ - Acceptance Testing
        ‘PROD’ - Production
ProjectName - Acronym of Group Project
Note: not all phases may be required by each development group. Additional folders can be created to meet the testing needs of the development teams.
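The folder syntax above can be sketched as a small helper. This is a minimal Python sketch and not part of the standard: the function name `folder_name` and the choice to upper-case the phase are assumptions, and because additional folders may be created, the fixed phase list is only a starting point.

```python
# Phase codes listed in the folder naming standard.
PHASES = ("DEV", "SIT", "UAT", "PROD")


def folder_name(project: str, phase: str) -> str:
    """Return a folder name in the ProjectName_phase form."""
    phase = phase.upper()  # assumption: phases are written in upper case
    if not project:
        raise ValueError("project acronym must not be empty")
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    return f"{project}_{phase}"


# SAMPLE is a hypothetical project acronym.
print(folder_name("SAMPLE", "dev"))
```

A team that needs extra testing folders would extend `PHASES` rather than bypass the check.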
Ports
Ports are another name for fields. There are many kinds of ports: Input, Output, Variable, Lookup etc.
Variable port names begin with the ‘v_’ prefix. Output ports that have been added during coding should begin with the ‘o_’ prefix.
All other port names are at the discretion of the programming team.
Transforms
The names of these objects should describe what the transform does. Be as clear and concise as possible. Prefixes are:
exp_ - Expression
jnr_ - Joiner
fil_ - Filter
lkp_ - Lookup
agg_ - Aggregator
seq_ - Sequence Generator
sq_ - Source Qualifier
upd_ - Update Strategy
sp_ - Stored Procedure
nrm_ - Normalizer
rnk_ - Rank
rtr_ - Router
xsq_ - XML Source Qualifier
srt_ - Sorter
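The prefix list above lends itself to an automated check during review. The following is a minimal Python sketch under stated assumptions: the function name `check_transform_name` and the sample transform names are hypothetical, and the only rule enforced is that the prefix is followed by some descriptive text.

```python
# Prefixes from the transform naming standard, keyed by transformation type.
TRANSFORM_PREFIXES = {
    "Expression": "exp_",
    "Joiner": "jnr_",
    "Filter": "fil_",
    "Lookup": "lkp_",
    "Aggregator": "agg_",
    "Sequence Generator": "seq_",
    "Source Qualifier": "sq_",
    "Update Strategy": "upd_",
    "Stored Procedure": "sp_",
    "Normalizer": "nrm_",
    "Rank": "rnk_",
    "Router": "rtr_",
    "XML Source Qualifier": "xsq_",
    "Sorter": "srt_",
}


def check_transform_name(kind: str, name: str) -> bool:
    """Return True when name starts with the prefix for kind and has text after it."""
    prefix = TRANSFORM_PREFIXES[kind]
    return name.startswith(prefix) and len(name) > len(prefix)


# Hypothetical transform names used only for illustration.
print(check_transform_name("Lookup", "lkp_customer_by_id"))
print(check_transform_name("Filter", "exp_drop_nulls"))
```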
Sources and Targets
For database tables, default Source and Target names are derived from the ODBC data source name and the table name/view name of the object in the DBMS.
For files, default Source names are derived from FLATFILE:name of file.
Mappings
There are no standards for this category of object. However, it is strongly suggested NOT to use the default name. It is suggested that all mappings begin with the letter m.
Sessions/Batches and workflows
Sessions and Batches are the descriptive components that wrap the mappings and provide the detail regarding how, when and with what sources/targets to use during a mapping execution.
Syntax: Qualifier_Batch/SessionName
Description:
Qualifier - ‘s’ for Session
            ‘b’ for Batch
            ‘wf’ for workflow
            ‘wl’ for worklet
Batch/SessionName - Free form text, usually the Mapping Name without the prefix ‘m’.
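Deriving a session, batch, workflow or worklet name from a mapping name, as described above, can be sketched as follows. This is an illustrative Python helper only: the function name `wrapper_name`, the mapping name `m_load_orders`, and the assumption that mappings are named `m_<text>` are not taken from the standard.

```python
# Qualifiers from the session/batch/workflow naming standard.
QUALIFIERS = {"session": "s", "batch": "b", "workflow": "wf", "worklet": "wl"}


def wrapper_name(kind: str, mapping_name: str) -> str:
    """Build Qualifier_Name from a mapping name, dropping the mapping prefix."""
    qualifier = QUALIFIERS[kind]
    # Assumption: the mapping prefix is written as "m_".
    if mapping_name.startswith("m_"):
        base = mapping_name[2:]
    else:
        base = mapping_name
    return f"{qualifier}_{base}"


# m_load_orders is a hypothetical mapping name.
print(wrapper_name("session", "m_load_orders"))
print(wrapper_name("workflow", "m_load_orders"))
```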
Database Connections at the Server
The PowerMart™ engine requires database connections to be defined on the machine on which the engine is running. In order to establish clear connection names the following standard should be used:
For Oracle Connections:
Syntax: database_LogonID
Description:
database - The Oracle Schema
LogonID - The user id to use when logging into the source/target
Example: CAP1_powerm
For Sybase Connections:
Syntax: server_database_LogonID
Description:
Server - The server name
LogonID - The user id to use when logging into the source/target
Example: dbp1_powerm
For MS-SQLServer Connections:
Syntax: Server_Database_LogonID
Description:
Database - The Database name
LogonID - The user id to use when logging into the source/target
Example: dbp1_powerm
External loader at the Server
The PowerMart™ engine requires an external loader to be defined on the machine on which the engine is running, so that bulk loading utilities can be used to load data into databases. In order to establish clear loader names the following standard should be used:
For Oracle loader:
Syntax: SQLLDR_Schema_LogonID
Description:
Schema - The Oracle Schema
LogonID - The user id to use when logging into the source/target
Example: CAP1_powerm
7.2 TEMPLATES
7.2.1 Challenge
Develop a method by which the code in Informatica can be documented so that development and transitioning to a support team are easy.
7.2.2 Description
A template document has been created to document the logic in the Informatica transforms. This document will be a master list of all activities to be done. One template document will be created for every mapping. The template document consists of the following sections:
Setup
This section would contain the details of source and target, the intermediate data elements and any comments at the template level.
Process overview
This section would consist of the pictorial representation of the mapping for clarifying the data flow.
Target to source mapping
This section would have details on transformations to be done between the source and the target fields. These transformations would be mapped with respect to each target field.
Error handling
This section would contain the error conditions and the actions to be taken for each of the error conditions.
Re-start and Recovery
This section would detail the restart and recovery strategy in cases of failure.
Setup
Setup has the following details:
#    Name              Description
1.   Mapping Name      The name of the mapping document.
2.   Description       Any detailed description found necessary for the document.
3.   Source            Details the source for the mapping.
4.   Target            Details the target for the mapping.
5.   Initial Rows      The average number of records expected to be processed; this will be used for database size estimation and load window.
6.   Load Frequency    The frequency of loads; this could be daily, weekly, monthly etc.
7.   Load Window       The time period during which the upload will take place.
8.   Pre-processor     The activities to be done before processing the transformations. Any specific checks will have to be added here.
9.   Post Processing   The activities to be done after the transformation process is complete. Any specific checks will have to be added here.
10.  Remarks           Any remarks applicable at the Mapping level.
Sources
1.   Tables            The source table name, the schema/owner name and any filter condition to be applied for the table. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2.   File              The source file name, the location of the file, the file type, the file format, relationship between various files and information regarding presence of header and footer.
Target
1.   Tables            The target table name, the schema/owner name. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2.   File              The target file name, the location of the file, the file type, the file format, relationship between various files and information regarding presence of header and footer.
Lookups
1.   Lookup name       The name of the lookup.
2.   Lookup Table      The source of data
3.   Table Owner       The owner of the table
4.   Lookup Columns    The columns that are to be included in the lookup
5.   Filter            The condition to be applied to the data to be fetched from the table
6.   Comments          The context of usage of the lookup
Source to target mapping
#    Name                                   Description
1.   Target Table name                      The table name of the ODS table
2.   Target field name                      Field name in the target field
3.   Target datatype                        The datatype of the Target field
4.   Target mandatory                       To indicate if the field is mandatory
5.   Default value                          The default value if field is null
6.   Source Table/File name                 The table/file name of the source
7.   Source field name                      Field name in the source field
8.   Comments and detailed transformations  The details of all transformations to be done
Error Handling
Any specific error handling needs can be specified in this section of the template.
Re-start and Recovery
Any recovery needs of the mapping should be described in this section. If any special script needs to be run or data needs to be deleted before re-running a session, it should be described here.
7.3 USAGE OF CONNECTION OBJECTS
7.3.1 Challenge
Define and use connection objects such as database connectors, FTP connections and external loader connections so that redundancies are eliminated and management of these objects becomes easy.
7.3.2 Description
- When connecting to the database, the administrative user should not be used; an application-specific batch user should be used.
- The naming convention to be followed is as specified in the naming convention section 7.1.
- The name of the connection object in QA and production should be the same.
- When using the external loader, for the external loader executable name, instead of using /webstatmmk1/oracle/product/9.2.0.2/bin/sqlldr use the shell script /webstatmmk1/ia/pm47/sh_load or /webstatmmk1/ia/pm47/sh_load_parallel_direct.
7.4 FAILURE SCRIPTS
7.4.1 Challenge
Develop a mechanism by which errors can be tracked and comprehended
7.4.2 Description
Implementation of the failure script
Failure Scripts in Informatica 5.1
Failure Scripts in Informatica 6.2
General guidelines – from Failure perspective
- All sessions should have a failure call in the post processing.
- If there is a requirement to call an SQL block before or after a session, it is better to write it as a stored procedure and call that than to write an SQL block.
- It is good practice to call the stored procedure as a part of the mapping rather than calling it in a shell script.
- "Run if previous successful" should be set for every session so as to avoid runaway sessions.
- The "Fail parent if session fails" property should be checked in every session when coding in Informatica 6.2.
- The limit on the number of acceptable errors should always be set. It should preferably be 1000.
7.5 TRUNCATING DATA
7.5.1 Challenge
Truncate data before loading, when an application user is being used to connect to the database.
7.5.2 Description
If existing data needs to be truncated and re-loaded, a procedure should be written in Oracle to truncate the data instead of setting the "truncate before load" property at the target. With this method data can be truncated even when the Informatica sessions connect to the database using a non-DBA user. A sample of the procedure is given below. Only batch IDs should have access to execute this procedure.
This procedure can then be called from Informatica within the mapping, or in the pre-processing using a shell script.
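CREATE OR REPLACE PROCEDURE TruncateTable (p_tname in varchar2, p_towner in varchar2)
is
    v_ddl_line varchar2(1000);
begin
    -- Build the statement as owner.table; DROP STORAGE frees the space the rows occupied.
    v_ddl_line := 'truncate table '||p_towner||'.'||p_tname||' drop storage';
    -- TRUNCATE is DDL, so it has to be run through dynamic SQL.
    execute immediate v_ddl_line;
exception
    when others then
        -- The error is only printed, not re-raised, so the caller does not see the failure.
        dbms_output.put_line('Error : '||to_char(SQLCODE)||' '||SQLERRM);
end;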
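Because the procedure concatenates its two arguments straight into a DDL statement, a calling script may want to check the names before passing them in. The following Python sketch only mirrors the string the procedure builds and is not part of the original procedure: the function name `build_truncate_ddl`, the sample owner and table names, and the rule that names must be plain unquoted Oracle identifiers of at most 30 characters are assumptions.

```python
import re

# Assumption: names are unquoted Oracle identifiers (a letter first, then
# letters, digits, _, $ or #) of at most 30 characters.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")


def build_truncate_ddl(owner: str, table: str) -> str:
    """Return the statement TruncateTable would run, after validating both names."""
    for identifier in (owner, table):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"not a plain identifier: {identifier!r}")
    return f"truncate table {owner}.{table} drop storage"


# powerm and sales_agg are hypothetical owner and table names.
print(build_truncate_ddl("powerm", "sales_agg"))
```

Rejecting anything that is not a plain identifier keeps stray text out of the generated statement.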
7.6 BUILT-IN RE-STARTABILITY
7.6.1 Challenge
Design sessions such that the support and maintenance effort is low.
7.6.2 Description
Sessions should be created with built-in re-startability. In case of failure it should be easy to re-start from the point of failure.
In case aggregates are being populated, the data for the period being loaded should first be deleted before the new data is inserted.
Tasks should be broken into different sessions rather than calling all scripts as a part of one session. That way, if a given script fails, re-starting is easy.
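The delete-before-insert rule for aggregates is what makes a re-run safe: loading the same period twice leaves the same rows as loading it once. The following is a minimal Python sketch of that pattern, using an in-memory SQLite database as a stand-in for the real target. The table `sales_agg`, its columns, and the function name `load_period` are hypothetical.

```python
import sqlite3


def load_period(conn, period, rows):
    """Reload the aggregate rows for one period; safe to re-run after a failure."""
    with conn:  # one transaction: committed on success, rolled back on error
        # Remove whatever an earlier (possibly failed) run left for this period.
        conn.execute("DELETE FROM sales_agg WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO sales_agg (period, region, total) VALUES (?, ?, ?)",
            [(period, region, total) for region, total in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_agg (period TEXT, region TEXT, total REAL)")

rows = [("east", 10.0), ("west", 5.0)]
load_period(conn, "2004-04", rows)
load_period(conn, "2004-04", rows)  # simulated re-start of the same load

row_count = conn.execute("SELECT COUNT(*) FROM sales_agg").fetchone()[0]
print(row_count)
```

Without the delete step the second call would double the rows for the period.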
7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
7.7.1 Challenge
Define a standard for organization of directories in Unix.
7.7.2 Description
All examples are for a project named sample.
The following directories should be created inside the home directory for each project:
- Bin – Directory for all the scripts used in the project (E.g. /webstatmmk1/post/sample/bin)
- Env – Directory for parameter and environment settings files (E.g. /webstatmmk1/post/sample/env)
- Incoming – Directory where the files that act as the source for the project should reside (E.g. /webstatmmk1/post/sample/incoming)
- Outgoing – Directory where the output files created by various processes should reside (E.g. /webstatmmk1/post/sample/outgoing)
- Temp – Directory for temporary files created by various processes; the bad files and lookup cache files created by Informatica should also reside in this directory (E.g. /webstatmmk1/post/sample/temp)
- Log – Directory for the log files generated by various processes in the project. The Informatica log files should be saved into this directory (E.g. /webstatmmk1/post/sample/log)
- Archive – Directory for storing files that need to be archived as a part of the project (E.g. /webstatmmk1/post/sample/archive)
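Creating the directory layout listed above can be scripted so that every project starts from the same structure. This is a minimal Python sketch: the function name `create_project_dirs` is an assumption, and the demonstration runs under a temporary directory rather than under /webstatmmk1/post so that it can be tried anywhere.

```python
import tempfile
from pathlib import Path

# Directory names taken from the example paths in the list above.
PROJECT_DIRS = ("bin", "env", "incoming", "outgoing", "temp", "log", "archive")


def create_project_dirs(project_home):
    """Create the standard sub-directories under project_home and return their paths."""
    home = Path(project_home)
    created = []
    for name in PROJECT_DIRS:
        path = home / name
        # parents=True also creates the project home; exist_ok makes re-runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstration in a throwaway location standing in for the real project home.
with tempfile.TemporaryDirectory() as root:
    created = create_project_dirs(Path(root) / "sample")
    created_names = [path.name for path in created]
    all_exist = all(path.is_dir() for path in created)
    print(created_names)
```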
The directory where the log files are stored should be added to the crontab script that checks the number of errors and warnings in Informatica log files, so that sessions with many errors/warnings are easy to track.
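The kind of check such a script performs can be sketched as follows. This Python sketch is not the script referred to above, whose contents are not shown in this document. It assumes that log files end in .log and that error and warning lines can be recognised by the words ERROR and WARNING, which may not match the actual Informatica log format. The function name `count_errors_and_warnings` and the sample log lines are hypothetical.

```python
import tempfile
from pathlib import Path


def count_errors_and_warnings(log_dir):
    """Return {log file name: (error lines, warning lines)} for *.log files in log_dir."""
    counts = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        n_errors = n_warnings = 0
        with log_file.open(errors="replace") as handle:
            for line in handle:
                text = line.upper()
                # Assumption: severity appears as a plain word in the line.
                if "ERROR" in text:
                    n_errors += 1
                elif "WARNING" in text:
                    n_warnings += 1
        counts[log_file.name] = (n_errors, n_warnings)
    return counts


# Demonstration with a made-up log file in a temporary directory.
with tempfile.TemporaryDirectory() as log_dir:
    sample = Path(log_dir) / "s_m_first_sample_session.log"
    sample.write_text(
        "INFO start\nERROR row rejected\nWARNING slow lookup\nERROR row rejected\n"
    )
    report = count_errors_and_warnings(log_dir)
    for name, (n_errors, n_warnings) in report.items():
        print(name, n_errors, n_warnings)
```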
7.8 PARAMETERIZATION OF SESSION INFORMATION
7.8.1 Challenge
Session information should be parameterized as far as possible so that migration of code between dev/QA and production can be done with minimum changes. The log files, bad files, target files etc. can be kept separate for each application so that applications don't affect each other.
7.8.2 Description
The session information that can be parameterized is
Srl. #   Session Information
1.       Session log file Directory/name
2.       Source database connector
3.       Source file directory/name
4.       Target database connector
5.       Target file directory/name
6.       Reject File directory/name
7.       $Source connection value in the properties tab
8.       $Target connection value in the properties tab
A sample parameter file
[SAMPLE.s_m_first_sample_session]
$PMSessionLogFile=/webstatmmk1/post/sample/log/s_m_first_sample_session.log
$DBConnection_sample_source=sample_source
$DBConnection_sample_target=sample_target
$RejectFile_sample=/webstatmmk1/post/sample/temp
$TargetFileDir_test=/webstatmmk1/post/sample/outgoing
$SrcFileDir_test=/webstatmmk1/post/sample/incoming
Parameter file header
The header should be FolderName.SessionName, enclosed in square brackets as in the sample. The folder name is not required, but including it is advised.
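Since only the paths differ between environments, the sample parameter file above could be generated from the project home directory instead of being edited by hand. The following Python sketch reproduces that sample under stated assumptions: the function names `render_parameter_file` and `sample_session_params` are hypothetical, and the parameter names and values are the ones shown in the sample.

```python
def render_parameter_file(folder, session, params):
    """Render a [FolderName.SessionName] header followed by name=value lines."""
    lines = [f"[{folder}.{session}]"]
    lines.extend(f"{name}={value}" for name, value in params.items())
    return "\n".join(lines) + "\n"


def sample_session_params(project_home, session):
    """Parameters of the sample session, derived from the project home directory."""
    return {
        "$PMSessionLogFile": f"{project_home}/log/{session}.log",
        "$DBConnection_sample_source": "sample_source",
        "$DBConnection_sample_target": "sample_target",
        "$RejectFile_sample": f"{project_home}/temp",
        "$TargetFileDir_test": f"{project_home}/outgoing",
        "$SrcFileDir_test": f"{project_home}/incoming",
    }


session = "s_m_first_sample_session"
parameter_text = render_parameter_file(
    "SAMPLE", session, sample_session_params("/webstatmmk1/post/sample", session)
)
print(parameter_text)
```

Generating the file for another environment then only requires a different `project_home` value.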
Session log file Directory/name
The session log file name and directory can be parameterized. If only the file name needs to be parameterized, the property “Session Log File Name” needs to be set to $PMSessionLogFile. If both the log file name and the directory need to be parameterized, the property “Session Log File directory” should be left blank and the property “Session Log File Name” should be set to $PMSessionLogFile.
Database connection
The source and target database connection information can be parameterized.
Source/Target/reject File Directory/Name
The Source/Target or Reject file names can be parameterized. If only the file name needs to be changed, the file name property can be set to a parameter such as $TargetFileDir_test and the value for the parameter can be set to a different file name. If the file as well as the directory needs to be changed, the property “Output file directory” should be left blank and the file name should be populated as $TargetFileDir_test.
Session Information that cannot be parameterized using a value in the parameter file
1. Information in the transformation tab
   a. Lookup and Stored proc connection information
      i. The $Source and $Target that is defined in the properties tab can be used for the lookup and the stored proc connection information
   b. Cache file location
      i. Unix soft links should be used so that the same string can be used in Development/QA and Production
2. Parameter Filename in the properties tab, an exception being if the session is being scheduled by pmcmd. When using pmcmd the parameter file name is taken as an input parameter.
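The soft-link approach for the cache file location in item 1b can be sketched as follows: each environment has a link with the same name pointing at its own cache directory, so the path string configured in the session never changes. This is an illustrative Python sketch for Unix. The function name `point_cache_link` and the directory names are hypothetical, and a temporary directory stands in for the real file system.

```python
import os
import tempfile


def point_cache_link(link_path, target_dir):
    """Create a soft link at link_path to target_dir, replacing an existing link."""
    if os.path.islink(link_path):
        os.unlink(link_path)  # removes only the link, never the target directory
    os.symlink(target_dir, link_path)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical environment-specific cache directory.
    dev_cache = os.path.join(root, "dev_cache")
    os.mkdir(dev_cache)

    # The session would always be configured with this one path.
    cache_link = os.path.join(root, "cache")
    point_cache_link(cache_link, dev_cache)

    link_resolves = os.path.samefile(cache_link, dev_cache)
    print(link_resolves)
```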