Informatica shortcuts

SORTER __ Sorts data in ascending or descending order and can also return distinct records.

RANK __ Top or bottom 'N' analysis.

JOINER __ Joins two sources coming from the same or different locations.

FILTER __ Filters out the rows that do not meet the condition.

ROUTER __ Useful to test multiple conditions.

AGGREGATOR __ Performs group calculations such as COUNT, MAX, MIN, SUM and AVG (mainly calculations on multiple rows or groups).

NORMALIZER __ Reads COBOL files (denormalized format) and splits a single row into multiple rows.

SOURCE QUALIFIER __ Represents flat file or relational data. It performs many tasks, such as overriding the default SQL query, filtering records, and joining data from two or more tables.

UNION __ Merges data from multiple sources, similar to the UNION ALL SQL statement; like UNION ALL, the Union transformation does not remove duplicate rows.

EXPRESSION __ You can use the Expression transformation to calculate values in a single row before you write to the target.

LOOKUP __ Use a Lookup transformation in a mapping to look up data in a flat file or a relational table, view, or synonym.

STORED PROCEDURE __ Stored procedures automate tasks that are too complicated for standard SQL statements; you can call them using the Stored Procedure transformation.

XML SOURCE QUALIFIER __ When you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation.

UPDATE STRATEGY __ Flags rows for insert, delete, update, or reject.

DATA WAREHOUSE ARCHITECTURE

STAR SCHEMA

Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table, surrounded by dimension tables that allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in database third normal form, while the dimension tables are de-normalized (second normal form).

Fact table

The fact table is not a typical relational database table, as it is de-normalized on purpose to enhance query response times. It typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment. The primary key of the fact table is a composite of all the columns except the numeric values and scores (such as QUANTITY, TURNOVER, or the exact invoice date and time).

Typical fact tables in a global enterprise data warehouse are (apart from those, there may be some company- or business-specific fact tables):

sales fact table - contains all details regarding sales
orders fact table - in some cases the table can be split into open orders and historical orders; sometimes the values for historical orders are stored in a sales fact table
budget fact table - usually grouped by month and loaded once at the end of a year
forecast fact table - usually grouped by month and loaded daily, weekly or monthly
inventory fact table - reports stocks, usually refreshed daily

Dimension table

Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining dimension tables is to allow browsing of the categories quickly and easily. The primary keys of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension.

Typical dimension tables in a data warehouse are:

time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension table
sales office dimension table

Star schema example

An example of a star schema architecture is depicted below.
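
A minimal star schema can also be sketched in SQL DDL; all table and column names below are illustrative, not taken from the original figure:

    -- Dimension tables: one de-normalized table per dimension
    CREATE TABLE dim_time (
        time_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        month_name    VARCHAR(20),
        year_number   INTEGER
    );

    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        brand_name   VARCHAR(100)  -- brand kept in-line, i.e. de-normalized
    );

    -- Fact table: composite primary key of dimension keys plus numeric measures
    CREATE TABLE fact_sales (
        time_key    INTEGER REFERENCES dim_time (time_key),
        product_key INTEGER REFERENCES dim_product (product_key),
        quantity    INTEGER,
        turnover    DECIMAL(12,2),
        PRIMARY KEY (time_key, product_key)
    );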

SNOWFLAKE SCHEMA

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design.

Snowflake schemas are generally used when a dimension table becomes very big and a star schema cannot represent the complexity of the data structure. For example, if a PRODUCT dimension table contains millions of rows, the use of a snowflake schema should significantly improve performance by moving some data out to another table (with BRANDS, for instance).

The problem is that the more normalized the dimension tables are, the more complicated the SQL joins that must be issued to query them: to answer a query, many tables need to be joined and aggregates generated.

An example of a snowflake schema architecture is depicted below.
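
A hedged SQL sketch of the PRODUCT example above, snowflaked by moving BRANDS out to a separate normalized table (names are illustrative):

    -- BRAND data moved out of the product dimension
    CREATE TABLE dim_brand (
        brand_key  INTEGER PRIMARY KEY,
        brand_name VARCHAR(100)
    );

    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        brand_key    INTEGER REFERENCES dim_brand (brand_key)
    );

    -- The normalization costs one extra join in every query that needs the brand:
    SELECT b.brand_name, SUM(f.turnover) AS total_turnover
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_brand   b ON b.brand_key   = p.brand_key
    GROUP BY b.brand_name;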

GALAXY SCHEMA

For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake architecture because it contains multiple fact tables, which allows dimension tables to be shared among many fact tables. That solution is very flexible; however, it may be hard to manage and support.

The main disadvantage of the fact constellation schema is a more complicated design because many variants of aggregation must be considered.

In a fact constellation schema, different fact tables are explicitly assigned to the dimensions that are relevant for the given facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level.

Use of this model is reasonable when, for example, there is a sales fact table (with details down to the exact date and invoice header id) and a forecast fact table calculated based on month, client id and product id.

In that case, using two fact tables at different levels of grouping is realized through a fact constellation model.

Source System

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping

The definition of the relationship and data flow between source and target objects.

Metadata

Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area

A place where data is processed before entering the warehouse.

Cleansing

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation

The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation

The process of moving copied or transformed data from a source to a data warehouse.

Target System

A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Figure 1.12: Sample ETL Process Flow

Informatica

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software.

The important Informatica Components are:

Power Exchange
Power Center
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue

In Informatica, all the metadata information about source systems, target systems and transformations is stored in the Informatica repository. Informatica's Power Center Client and Repository Server access this repository to store and retrieve metadata.

Source and Target

Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source systems like Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems like Oracle, SQL Server or Teradata as its data warehouse target. Many organizations prefer Informatica for that ETL process, because Informatica is powerful in designing and building data warehouses. It can connect to several sources and targets, extract metadata from them, and transform and load the data into the target systems.

Guidelines to work with Informatica Power Center

Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store and manage metadata.

Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.

Repository Server: The Repository Server takes care of all the connections between the repository and the Power Center Client.

Power Center Server: The Power Center Server does the extraction from the sources and the loading of data into the targets.

Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems.

Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target.

Warehouse Designer is used for extracting metadata from target systems; the metadata can also be created in the Designer itself.

Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. A person's address may not be the same in all source systems because of typos, or the postal code and city name may not match the address. These errors can be corrected through the data cleansing process, and the standardized data can then be loaded into the target systems (the data warehouse).

Transformation: Transformations help to transform the source data according to the requirements of the target system. Sorting, filtering, aggregation and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.

Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

Power Center Connect: This component helps to extract data and metadata from systems like IBM's MQSeries, PeopleSoft, SAP, Siebel, etc. and other third-party applications.

Power Center Exchange: This component helps to extract data and metadata from systems like IBM's MQSeries, PeopleSoft, SAP, Siebel, etc. and other third-party applications.

Power Exchange

Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding the manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options on mainframe (DB2, VSAM, IMS, etc.), midrange (AS/400 DB2, etc.) and relational databases (Oracle, SQL Server, DB2, etc.), and flat files on Unix, Linux and Windows systems.

Power Channel

This helps to transfer large amounts of encrypted and compressed data over LAN or WAN, through firewalls, to transfer files over FTP, etc.

Metadata Exchange

Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools such as ERwin, Embarcadero, Oracle Designer or Sybase PowerDesigner for developing data models. Functional and technical teams spend much time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which leverages that time and effort. There is no need for the Informatica developer to create these data structures once again.

Power Analyzer

Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. PowerAnalyzer enables users to gain insight into business processes and develop business intelligence.

With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.

Super Glue

Super Glue is used for loading metadata from several sources into a centralized place. Reports can be run against Super Glue to analyze metadata.

Power Mart

Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, while Power Mart is used for departmental data warehouses such as data marts. Power Center supports global and networked repositories and can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment eases developer productivity.

Informatica - Transformations

In Informatica, transformations help to transform the source data according to the requirements of the target system and ensure the quality of the data being loaded into the target.

Transformations are of two types: Active and Passive.

Active Transformation

An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows that do not meet the condition in the transformation.

Passive Transformation

A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.

Transformations can be Connected or UnConnected.

Connected Transformation

A connected transformation is connected to other transformations or directly to the target table in the mapping.

UnConnected Transformation

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

List of Transformations

Following is the list of transformations available in PowerCenter:

Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Source Qualifier Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Procedure Transformation
Union Transformation

In the following pages, we explain all the above Informatica transformations and their significance in the ETL process in detail.

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. It is useful to perform calculations such as averages and sums (mainly calculations on multiple rows or groups), for example, to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX and SUM can be used in the Aggregator transformation.
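
As a rough SQL analogy of an Aggregator with a group-by port on the sale date (the sales table and its columns are assumed for illustration):

    -- Total and average of daily sales, grouped per day
    SELECT sale_date,
           SUM(amount) AS total_daily_sales,
           AVG(amount) AS average_sale
    FROM sales
    GROUP BY sale_date;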

Expression Transformation

Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target, for example, to calculate the discount on each product, to concatenate first and last names, or to convert a date to a string field.
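
The three examples above translate roughly into row-level SQL expressions (table and column names are assumed for illustration):

    -- One output value per input row; no rows are added or removed
    SELECT order_id,
           price * 0.90                    AS discounted_price,   -- 10% discount
           first_name || ' ' || last_name  AS full_name,          -- concatenation
           CAST(order_date AS VARCHAR(10)) AS order_date_string   -- date to string
    FROM orders;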

Filter Transformation

Filter transformation is an Active and Connected transformation. It can be used to filter out rows in a mapping that do not meet the condition, for example, to find all the employees who are working in Department 10, or the products whose price falls between $500 and $1000.
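
The two examples above correspond roughly to SQL WHERE clauses (table names assumed for illustration):

    -- Employees working in Department 10
    SELECT * FROM employees WHERE department_id = 10;

    -- Products priced between $500 and $1000
    SELECT * FROM products WHERE price BETWEEN 500 AND 1000;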

Joiner Transformation

Joiner transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location, for example, to join a flat file and a relational source, two flat files, or a relational source and an XML source. In order to join two sources, there must be at least one matching port, and one source must be specified as the master and the other as the detail. The Joiner transformation supports the following types of joins:

Normal
Master Outer
Detail Outer
Full Outer

Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.

Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.

Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full outer join keeps all rows of data from both the master and detail sources.
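
The four join types map roughly onto standard SQL joins; note that "master outer" preserves the detail rows and "detail outer" preserves the master rows. The tables master_src and detail_src are assumed for illustration:

    -- Normal join: matching rows only
    SELECT * FROM master_src m INNER JOIN detail_src d ON m.id = d.master_id;

    -- Master outer: all detail rows, plus matching master rows
    SELECT * FROM detail_src d LEFT OUTER JOIN master_src m ON m.id = d.master_id;

    -- Detail outer: all master rows, plus matching detail rows
    SELECT * FROM master_src m LEFT OUTER JOIN detail_src d ON m.id = d.master_id;

    -- Full outer: all rows from both sources
    SELECT * FROM master_src m FULL OUTER JOIN detail_src d ON m.id = d.master_id;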

Lookup Transformation

Lookup transformation is Passive, and it can be either Connected or UnConnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from the source or from the target tables.

For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of using the sales table as one more source, use a Lookup transformation to look up the data for the product with ID 10 in the sales table.

Difference between Connected and UnConnected Lookup Transformation:

Connected lookup receives input values directly from the mapping pipeline, whereas UnConnected lookup receives values from a :LKP expression in another transformation.

Connected lookup returns multiple columns from the same row, whereas UnConnected lookup has one return port and returns one column from each row.

Connected lookup supports user-defined default values, whereas UnConnected lookup does not support user-defined default values.
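
A connected lookup behaves much like a left outer join against the lookup table; a rough SQL sketch of the product example above, with table names assumed for illustration:

    -- Products flow through; sales figures are "looked up" where a match exists
    SELECT p.product_id,
           p.product_name,
           s.sales_amount
    FROM products p
    LEFT OUTER JOIN sales s ON s.product_id = p.product_id
    WHERE p.product_id = 10;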

Normalizer Transformation

Normalizer transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is most often stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.
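
A rough SQL sketch of the row-splitting behaviour, assuming an illustrative store_sales table with one column per quarter:

    -- One denormalized input row per store becomes four output rows
    SELECT store_id, 1 AS quarter, q1_sales AS sales FROM store_sales
    UNION ALL
    SELECT store_id, 2, q2_sales FROM store_sales
    UNION ALL
    SELECT store_id, 3, q3_sales FROM store_sales
    UNION ALL
    SELECT store_id, 4, q4_sales FROM store_sales;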

Rank Transformation

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data, for example, to select the top 10 regions where the sales volume was highest, or to select the 10 lowest-priced products.
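
A rough SQL analogy of the top-10-regions example (the sales table is assumed; the row-limiting syntax varies by database, e.g. LIMIT or TOP):

    -- Top 10 regions by total sales volume
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    GROUP BY region
    ORDER BY total_sales DESC
    FETCH FIRST 10 ROWS ONLY;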

Router Transformation

Router is an Active and Connected transformation. It is similar to the Filter transformation; the only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition. It is useful to test multiple conditions, and it has input, output and default groups. For example, if we want to split data by State=Michigan, State=California, State=New York and all other states, it is easy to route each group to a different table.
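
A rough SQL analogy of the state-routing example, with table names assumed for illustration; unlike these separate statements, a Router reads the source only once and evaluates all group conditions in a single pass:

    INSERT INTO michigan_orders    SELECT * FROM orders WHERE state = 'Michigan';
    INSERT INTO california_orders  SELECT * FROM orders WHERE state = 'California';
    INSERT INTO new_york_orders    SELECT * FROM orders WHERE state = 'New York';
    -- The default group catches everything that matched no condition
    INSERT INTO other_state_orders SELECT * FROM orders
    WHERE state NOT IN ('Michigan', 'California', 'New York');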

Sequence Generator Transformation

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, cycle through a sequential range of numbers, or replace missing keys.

It has two output ports, NEXTVAL and CURRVAL (you cannot add ports to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (NEXTVAL plus one by default).
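
The closest relational analogy is a database sequence, sketched here in Oracle-style SQL (names are illustrative; note the semantics differ slightly, since Oracle's CURRVAL returns the last generated value rather than NEXTVAL plus the increment):

    -- A sequence that starts at 1 and increments by 1
    CREATE SEQUENCE order_key_seq START WITH 1 INCREMENT BY 1;

    -- Generate a surrogate key for each incoming row
    INSERT INTO target_orders (order_key, customer_id)
    SELECT order_key_seq.NEXTVAL, customer_id
    FROM staging_orders;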

Stored Procedure Transformation

Stored Procedure transformation is a Passive transformation and can be Connected or UnConnected. It is useful to automate time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining the free space in a database, specialized calculations, etc.

The stored procedure must exist in the database before you create a Stored Procedure transformation, and it can exist in the source, the target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables and conditional statements. With a Stored Procedure transformation, the procedure is compiled and executed in the relational data source. You need a database connection to import the stored procedure into your mapping.

Sorter Transformation

Sorter transformation is a Connected and Active transformation. It allows you to sort data in ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.

Source Qualifier Transformation

Source Qualifier transformation is an Active and Connected transformation. When you add a relational or flat file source definition to a mapping, you must connect it to a Source Qualifier transformation. The Source Qualifier performs various tasks, such as overriding the default SQL query, filtering records, and joining data from two or more tables.
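
A hedged sketch of what an SQL override might look like, pushing a filter and a two-table join down to the source database (table and column names are assumed for illustration):

    -- Replaces the default generated query of the Source Qualifier
    SELECT o.order_id,
           o.order_date,
           c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_date >= DATE '2024-01-01';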

Update Strategy Transformation

Update Strategy transformation is an Active and Connected transformation. It is used to update data in the target table, either to maintain a history of the data or to keep only the recent changes. You can specify how to treat source rows: insert, update, delete, or data driven.
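
As a rough SQL analogy of data-driven insert-or-update behaviour, a MERGE statement can be sketched (table names assumed for illustration):

    -- Update matching rows, insert new ones
    MERGE INTO dim_customer t
    USING staging_customer s
        ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN
        UPDATE SET t.customer_name = s.customer_name
    WHEN NOT MATCHED THEN
        INSERT (customer_id, customer_name)
        VALUES (s.customer_id, s.customer_name);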

XML Source Qualifier Transformation

XML Source Qualifier is a Passive and Connected transformation. It is used only with an XML source definition, and it represents the data elements that the Informatica Server reads when it executes a session with XML sources.

Advanced External Procedure Transformation

Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with procedures created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful for creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

Union Transformation

The Union transformation is used to merge multiple datasets from various streams or pipelines into one dataset. It works similarly to UNION ALL: it does not remove any duplicate rows. It is recommended to use an Aggregator to remove duplicates if they are not expected at the target.
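
A rough SQL analogy, with pipeline names assumed for illustration; the second query shows the de-duplication step that the text suggests handling downstream:

    -- UNION ALL keeps duplicates, just like the Union transformation
    SELECT customer_id, amount FROM sales_europe
    UNION ALL
    SELECT customer_id, amount FROM sales_asia;

    -- De-duplicate afterwards if duplicates are not expected at the target
    SELECT DISTINCT customer_id, amount
    FROM (SELECT customer_id, amount FROM sales_europe
          UNION ALL
          SELECT customer_id, amount FROM sales_asia) merged;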

External Procedure Transformation

External Procedure transformation is an Active transformation and can be Connected or UnConnected. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases, an External Procedure is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Differences between Advanced External Procedure and External Procedure Transformations:

External Procedure returns a single value, whereas Advanced External Procedure returns multiple values.

External Procedure supports COM and Informatica procedures, whereas Advanced External Procedure supports only Informatica procedures.