a Etl Intro

  • 8/2/2019 a Etl Intro

    1/32

    Global Business Services

    2007 IBM Corporation

    Informatica ETL Tool

    - By Srini Veeravalli, Siebel Analytics Factory

    26th Sept, 2007

    Synopsis:

    Introduction to Informatica 7x

    Importance of Informatica for ETL Applications

    Informatica Architecture

    Informatica Client module

    Informatica Server

    Transformations in Informatica

    Working with Workflow Manager

    Working with Workflow Monitor

    Demonstration of Sample Informatica Mapping

    Informatica - ETL Tool

    Informatica PowerCenter. All product functionality, including the ability to register multiple servers. PowerCenter lets you create a single repository that you can configure as a global repository.

    Informatica PowerMart. Includes all features except distributed metadata and multiple registered servers.

    Informatica Suite

    Extraction, Transformation, Loading

    Informatica can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads.

    It can simplify and accelerate the process of moving data warehouses from development to test to production.

    Importance of Informatica for ETL Applications

    Sources

    PowerCenter and PowerMart access the following sources:

    Relational.

    File.

    Application.

    Mainframe.

    Other. Microsoft Excel and Access.

    Informatica Architecture

    Targets

    PowerCenter and PowerMart can load data into the following targets:

    Relational.

    File.

    Application.

    Other. Microsoft Access.

    You can load data into targets using ODBC or native drivers, FTP, or external loaders.

    Informatica Architecture

    Informatica Architecture

    Informatica provides the following integrated components:

    Informatica repository. The Informatica repository is at the center of the Informatica suite. The Informatica Client and Server access the repository to save and retrieve metadata.

    Informatica Repository Server. The Informatica Repository Server manages connections to the repository from client applications.

    Informatica Client. Use the Informatica Client to manage users, define sources and targets, and build mappings and mapplets with the transformation logic.

    Informatica Server. The Informatica Server extracts the source data, performs the data transformation, and loads the transformed data into the targets.
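
    The extract, transform, load sequence that the Informatica Server performs can be sketched in plain Python. This is only an illustration of the idea, not Informatica code: the CSV text, the orders table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a relational target.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (assumption for illustration only).
SOURCE_CSV = """order_id,customer,amount
1,acme,10.50
2,globex,20.00
3,acme,5.25
"""


def extract(text):
    """Extract: read source rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and standardize the customer name."""
    return [
        (int(r["order_id"]), r["customer"].upper(), float(r["amount"]))
        for r in rows
    ]


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return what reached the target."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a target database
result = run_etl(conn)
```

    In the real product the equivalent logic lives in a mapping stored in the repository, and the server reads that metadata at run time rather than calling functions directly.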

    Informatica Architecture

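    [Diagram: the Informatica Server moves data from Sources to Target. The Server and the Informatica Client (Repository Manager, Designer, Workflow Manager/Monitor) connect to the Repository. The Designer comprises the Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer, and Mapplet Designer.]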

    Connectivity

    Informatica Client

    Repository Manager. Use the Repository Manager to create and administer the metadata repository.

    Designer. Use the Designer to create mappings that contain transformation instructions for the Informatica Server.

    Source Analyzer. Import or create source definitions.

    Warehouse Designer. Import or create target definitions.

    Transformation Developer. Develop reusable transformations to use in mappings.

    Mapplet Designer. Create sets of transformations to use in mappings.

    Mapping Designer. Create mappings that the Informatica Server uses to extract, transform, and load data.

    Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows.

    Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each Informatica Server.

    Informatica Client Rep Server Admin Console

    Use the Administration Console to add repository configurations to the Console Tree.

    When you add a repository configuration, you can perform the following actions:

    Create a repository in a database.

    Change the Repository Server managing the repository.

    Upgrade an existing repository from an earlier version.

    Informatica Client Rep Server Admin Console

    Informatica Client Repository Manager

    Repository Manager Tasks

    You can use the Repository Manager to perform the following tasks:

    Add a repository. You can add multiple repositories.

    Remove a repository. You can remove one or more repositories.

    Connect to a repository. You can connect to one repository or multiple repositories in a domain.

    Export and import repository connection information. You can export repository connection information from the client registry to a file.

    Truncate session and workflow log entries. You can truncate the list of session and workflow logs.

    Search for target definitions containing a keyword. You can use a keyword to search for a target definition.

    Search for repository objects. You can search for repository objects containing specified text or keywords.

    Informatica Client Repository Manager

    The Designer has five tools to help you build mappings and mapplets so you can specify how to move and transform data between sources and targets. The Designer helps you create source definitions, target definitions, and transformations to build your mappings.

    The Designer allows you to work with multiple tools at one time and to work in multiple folders and repositories at the same time.

    Designer Tools

    The Designer provides the following tools:

    Source Analyzer. Use to import or create source definitions for flat file, XML, COBOL, Application, and relational sources.

    Warehouse Designer. Use to import or create target definitions.

    Transformation Developer. Use to create reusable transformations.

    Mapplet Designer. Use to create mapplets.

    Mapping Designer. Use to create mappings.

    Informatica Client Designer

    Informatica Client Designer Source Analyzer

    Importing a Source / Target Definition from a Database

    Informatica Client Designer Source Analyzer

    Informatica Client Designer Warehouse Designer

    Informatica Client Designer Mapping Designer

    Designer Windows

    The Designer consists of the following windows:

    Navigator. Use to connect to and work in multiple repositories and folders. You can also copy and delete objects and create shortcuts using the Navigator.

    Workspace. Use to view or edit sources, targets, mapplets, transformations, and mappings. You can work with a single tool at a time in the workspace.

    Status bar. Displays the status of the operation you perform.

    Output. Provides details when you perform certain tasks, such as saving your work or validating a mapping.

    Overview. An optional window to simplify viewing workbooks containing large mappings or a large number of objects.

    Informatica Client Designer Windows

    Transformations

    Transformation is the manipulation of data from how it appears in the source system(s) into another form in the data warehouse.

    This includes:

    Data merging: The process of standardizing data types and fields.

    Cleansing: This involves identifying and correcting inconsistencies or inaccuracies.

    Eliminating inconsistencies in the data from multiple sources.

    Converting data from different systems into a single consistent data set suitable for analysis.

    Aggregation: The process whereby multiple detailed values are combined into a single summary value, typically a sum of numbers representing dollars spent or units sold.
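
    The three steps above (merging, cleansing, aggregation) can be illustrated with a small Python sketch. The two source systems, their field names, and the cleansing rules are invented for the example; a real warehouse load would take its rules from the business requirements.

```python
from collections import defaultdict

# Hypothetical records from two source systems with inconsistent formats.
system_a = [{"region": " east ", "units": "3"}, {"region": "WEST", "units": "5"}]
system_b = [{"region": "East", "units": 4}, {"region": "west", "units": None}]


def merge(*sources):
    """Data merging: combine sources and standardize the data types."""
    merged = []
    for source in sources:
        for rec in source:
            units = rec["units"]
            merged.append({
                "region": str(rec["region"]),
                "units": int(units) if units is not None else None,
            })
    return merged


def cleanse(records):
    """Cleansing: normalize inconsistent spellings, drop incomplete rows."""
    cleaned = []
    for rec in records:
        if rec["units"] is None:
            continue  # incomplete record: no unit count to report
        cleaned.append({
            "region": rec["region"].strip().lower(),
            "units": rec["units"],
        })
    return cleaned


def aggregate(records):
    """Aggregation: sum the detailed unit counts per region."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["region"]] += rec["units"]
    return dict(totals)


summary = aggregate(cleanse(merge(system_a, system_b)))
```

    Here the four detailed rows collapse into one summary value per region, with the inconsistent spellings of "east" and "west" reconciled before the totals are computed.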

    Transformations in Informatica

    Aggregator Transformation: Allows you to perform aggregate calculations, such as averages and sums.

    Expression Transformation: Use the Expression transformation to calculate values in a single row before you write to the target. You can use it to perform any non-aggregate calculations.

    Advanced External Procedure Transformation: Advanced External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter/PowerMart functionality.

    External Procedure Transformation: External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter/PowerMart functionality.

    Filter Transformation: Allows you to filter rows in a mapping.

    Rank Transformation: Allows you to select only the top or bottom rank of data.

    Router Transformation: Similar to the Filter transformation, but used when there are two or more filter conditions.
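
    As a rough analogy only, the behaviour of the Filter, Router, Rank, and Expression transformations can be mimicked with ordinary Python functions. None of this is Informatica syntax; the employee rows, the function names, and the "default" group label are assumptions made for the sketch.

```python
# Hypothetical input rows (assumption for illustration only).
rows = [
    {"name": "a", "dept": "sales", "salary": 100},
    {"name": "b", "dept": "sales", "salary": 300},
    {"name": "c", "dept": "ops", "salary": 200},
    {"name": "d", "dept": "hr", "salary": 150},
]


def filter_rows(rows, condition):
    """Filter-style: one condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]


def route_rows(rows, groups):
    """Router-style: several named conditions; unmatched rows go to 'default'."""
    out = {name: [] for name in groups}
    out["default"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                out[name].append(r)
                matched = True
        if not matched:
            out["default"].append(r)
    return out


def rank_top(rows, key, n):
    """Rank-style: keep only the top n rows by the given field."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


def add_bonus(rows, rate):
    """Expression-style: a non-aggregate calculation on each single row."""
    return [{**r, "bonus": r["salary"] * rate} for r in rows]


high_paid = filter_rows(rows, lambda r: r["salary"] >= 150)
routed = route_rows(rows, {
    "sales": lambda r: r["dept"] == "sales",
    "ops": lambda r: r["dept"] == "ops",
})
top_two = rank_top(rows, "salary", 2)
with_bonus = add_bonus(rows, 0.5)
```

    The difference between the first two functions is the point of the Router: a filter discards whatever fails its single condition, while a router tests each row against every group and can still pass unmatched rows on through a default group.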

    Joiner Transformation: Joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied. You can use the following sources:

    Two relational tables existing in separate databases

    Two flat files in potentially different file systems

    Two different ODBC sources

    Two instances of the same XML source

    A relational table and a flat file source

    A relational table and an XML source

    Lookup Transformation: Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym.

    Normalizer Transformation: Normalization is the process of organizing data. Use the Normalizer transformation with COBOL sources, which are often stored in a denormalized format.

    Sequence Generator Transformation: Generates numeric values.

    Stored Procedure Transformation: An important tool for populating and maintaining databases.
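
    A minimal Python sketch of what the Joiner, Lookup, and Sequence Generator transformations do conceptually. The order and customer data, the function names, and the "row_key" field are hypothetical, and the join shown is a simple inner join on one key.

```python
import itertools

# Hypothetical rows from two different sources (assumption for illustration).
orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 20},
    {"order_id": 3, "cust_id": 99},  # no matching customer
]
customers = [
    {"cust_id": 10, "name": "Acme"},
    {"cust_id": 20, "name": "Globex"},
]


def join_sources(detail, master, key):
    """Joiner-style: inner join of two sources on a shared key."""
    index = {m[key]: m for m in master}
    return [{**d, **index[d[key]]} for d in detail if d[key] in index]


def lookup(table, key, value, return_field, default=None):
    """Lookup-style: fetch one field from the first row matching a key value."""
    for row in table:
        if row[key] == value:
            return row[return_field]
    return default


def with_sequence(rows, start=1, field="row_key"):
    """Sequence-Generator-style: attach an increasing numeric value per row."""
    counter = itertools.count(start)
    return [{**r, field: next(counter)} for r in rows]


joined = join_sources(orders, customers, "cust_id")
keyed = with_sequence(joined)
missing = lookup(customers, "cust_id", 99, "name", default="UNKNOWN")
```

    The generated sequence values are typically used as surrogate keys in the target, which is why the sketch numbers the rows only after the join has decided which rows survive.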

    Transformations in Informatica

    Transformations in Informatica

    Sorter Transformation: Allows you to sort data. You can sort data from a source transformation in ascending or descending order according to a specified sort key.

    Source Qualifier Transformation: Represents the rows that the Informatica Server reads when it executes a session. The Source Qualifier displays the transformation datatypes. The transformation datatypes in the Source Qualifier determine how the source database binds data when the Informatica Server reads it.

    XML Source Qualifier Transformation: When you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation.

    Update Strategy Transformation: Updates the target based on flag values.
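
    The idea of flagging each row and then acting on the flag can be sketched as follows. The constant names mirror the DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT flags used in PowerCenter update strategy expressions; everything else (the flagging rules, the dictionary used as a target, the row layout) is a hypothetical stand-in. The rows are sorted by key first, in the spirit of the Sorter transformation.

```python
# Flag values named after PowerCenter's update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, target):
    """Decide what should happen to a row (hypothetical rules)."""
    if row.get("amount") is None:
        return DD_REJECT
    if row.get("deleted"):
        return DD_DELETE
    return DD_UPDATE if row["id"] in target else DD_INSERT


def apply_rows(rows, target):
    """Sort rows by key, flag each one, and apply it to the target dict."""
    rejected = []
    for row in sorted(rows, key=lambda r: r["id"]):  # ascending sort key
        flag = flag_row(row, target)
        if flag in (DD_INSERT, DD_UPDATE):
            target[row["id"]] = row["amount"]
        elif flag == DD_DELETE:
            target.pop(row["id"], None)
        else:
            rejected.append(row)
    return rejected


target = {1: 10.0, 2: 20.0}  # existing target rows: id -> amount
incoming = [
    {"id": 3, "amount": 30.0},                   # new key: insert
    {"id": 1, "amount": 11.0},                   # existing key: update
    {"id": 2, "amount": 20.0, "deleted": True},  # marked deleted: delete
    {"id": 4, "amount": None},                   # missing amount: reject
]
rejected = apply_rows(incoming, target)
```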

    Workflow Manager

    Workflow Manager Tools

    The Workflow Manager consists of three tools to help you develop a workflow:

    Task Developer. Use the Task Developer to create tasks you want to execute in the workflow.

    Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks with links. You can also create tasks in the Workflow Designer as you develop the workflow.

    Worklet Designer. Use the Worklet Designer to create a worklet.
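
    A workflow built by connecting tasks with links is, in effect, a dependency graph. The sketch below models that idea with Python's standard graphlib module (Python 3.9 or later). The task names and the run_workflow function are hypothetical, and real workflows also support conditions on links, which this example leaves out.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, links):
    """Run tasks so that every link's source runs before its destination.

    tasks: dict mapping a task name to a callable taking a log list.
    links: list of (from_task, to_task) pairs.
    """
    # TopologicalSorter expects each node mapped to its predecessors.
    graph = {name: set() for name in tasks}
    for src, dst in links:
        graph[dst].add(src)
    log = []
    for name in TopologicalSorter(graph).static_order():
        tasks[name](log)
    return log


def make_task(name):
    """Build a trivial task that records its own name when it runs."""
    return lambda log: log.append(name)


# Hypothetical workflow: a start task, two sessions, and a notification.
tasks = {
    name: make_task(name)
    for name in ("start", "load_customers", "load_orders", "send_email")
}
links = [
    ("start", "load_customers"),
    ("load_customers", "load_orders"),
    ("load_orders", "send_email"),
]
log = run_workflow(tasks, links)
```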

    Workflow Manager

    Workflow Manager Server Editor

    The Informatica Server moves data from sources to targets based on workflow and mapping metadata stored in a repository.

    A session is a type of workflow task. A session is a set of instructions that describes how to move data from sources to targets using a mapping.

    When a workflow starts, the Informatica Server retrieves mapping, workflow, and session metadata from the repository to extract data from the source, transform it, and load it into the target.

    The Informatica Server uses the following processes to run a workflow:

    The Load Manager process. Starts and locks the workflow, runs workflow tasks, and starts the DTM to run sessions.

    The Data Transformation Manager (DTM) process. Performs session validations. Creates threads to initialize the session, read, write, and transform data.
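
    The division of labour between the two processes can be pictured with a small threaded pipeline. This is a conceptual sketch under simplifying assumptions, not how the server is implemented: run_workflow plays the Load Manager role by running session tasks in turn, and run_session plays the DTM role by creating one reader, one transformer, and one writer thread connected by queues. All names are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data stream


def reader(source, out_q):
    """Reader thread: pull rows from the source and pass them on."""
    for row in source:
        out_q.put(row)
    out_q.put(_DONE)


def transformer(in_q, out_q, fn):
    """Transformation thread: apply the mapping logic to each row."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            break
        out_q.put(fn(row))


def writer(in_q, target):
    """Writer thread: load transformed rows into the target."""
    while True:
        row = in_q.get()
        if row is _DONE:
            break
        target.append(row)


def run_session(source, fn):
    """DTM-style role: create the threads for one session and wait for them."""
    read_q, write_q = queue.Queue(), queue.Queue()
    target = []
    threads = [
        threading.Thread(target=reader, args=(source, read_q)),
        threading.Thread(target=transformer, args=(read_q, write_q, fn)),
        threading.Thread(target=writer, args=(write_q, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


def run_workflow(sessions):
    """Load-Manager-style role: run each session task in order."""
    return [run_session(source, fn) for source, fn in sessions]


results = run_workflow([([1, 2, 3], lambda x: x * 10)])
```

    Because each stage has a single thread and the queues are first-in, first-out, rows reach the target in source order in this sketch.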

    Informatica Server

    Workflow Monitor

    [Screenshot: Workflow Monitor, showing the Navigator window, Time window, and Output window]

    Workflow Monitor is a tool that allows you to monitor workflows and tasks. You can view details about a workflow. You can run, stop, abort, and resume workflows from the Workflow Monitor.

    The Workflow Monitor consists of the following windows:

    Navigator window. Displays monitored repositories, servers, and repository objects.

    Output window. Displays messages from the Informatica Server and the Repository Server.

    Time window. Displays progress of workflow runs.

    Gantt Chart view. Displays details about workflow runs in chronological (Gantt Chart) format.

    Task view. Displays details about workflow runs in a report format, organized by task, folder, or status.

    Workflow Monitor

    Questions?

    Thank You