WorkShop BI Adquisition

WORKSHOP BI

Fundamentals

Data Acquisition


    Scope Part 2.

The first lesson describes the flow of data between BI and the source systems that contain data.

The second lesson shows the procedure for loading master data (attributes and texts) from an SAP system.

In the third lesson we will discuss the data transfer process in more depth and detail. We will discuss the available transformation rule types and more advanced start and end routines. In addition, we will visualize our data in the InfoCube upon completion.


Generic Data Warehouse Positioning of the Data Flow

The ETL process, sometimes called the data flow, is the sequence of steps that raw (source) data must follow to be extracted, transformed, and loaded into targets in the BI system.
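To make the three steps concrete, here is a minimal, hypothetical Python sketch of an extract-transform-load pipeline; the record fields and the in-memory target are invented for illustration and are not a real SAP BI API.

```python
# Minimal ETL sketch (illustrative only; not an SAP API).

def extract():
    # Raw records as they might arrive from a source system.
    return [
        {"cost_center": "cc100 ", "amount": "150.00"},
        {"cost_center": "CC200", "amount": "75.50"},
    ]

def transform(rows):
    # Cleanse: normalize keys and convert amounts to numbers.
    return [
        {"cost_center": r["cost_center"].strip().upper(),
         "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, target):
    # "Load" here is just appending to an in-memory target list.
    target.extend(rows)

info_cube = []  # stand-in for a BI target
load(transform(extract()), info_cube)
print(info_cube)
```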


    BI Architecture: Positioning of the ETL Process


    BI Data Flow Details


Source Systems and DataSources

A source system is any system that is available to BI for data extraction and transfer purposes. Examples include mySAP ERP, mySAP CRM, custom systems based on an Oracle DB, PeopleSoft, and many others.

DataSources are BI objects used to extract and stage data from source systems. DataSources subdivide the data provided by a source system into self-contained business areas. Our cost center example includes cost center texts, master data, and Cost Center Transaction DataSources from two different source systems. A DataSource contains a number of logically related fields that are arranged in a flat structure and contain data to be transferred into BI.
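As a rough illustration of "a flat structure of logically related fields", here is a hypothetical Python sketch; the DataSource and field names follow SAP naming conventions but are chosen for illustration only.

```python
# Illustrative model of a DataSource as a flat field list (not an SAP API).
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str            # e.g. a cost center text DataSource
    source_system: str   # the system that delivers the data
    fields: list         # flat structure: no nesting, one row per record

cost_center_texts = DataSource(
    name="0COSTCENTER_TEXT",       # illustrative name
    source_system="ERP_CLNT100",   # illustrative source system
    fields=["COSTCENTER", "LANGU", "TXTSH", "TXTMD"],
)
print(cost_center_texts)
```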


    Source System Types and Interfaces


    Persistent Staging Area

Persistent Staging Area (PSA) is an industry term, but not everyone agrees on an exact definition. In response to a posting on Ask the Experts at DMreview.com, Evan Levy defines a PSA as:

1. The storage and processing to support the transformation of data.
2. Typically temporary.
3. Not constructed to support end-user or tool access.
4. Specifically built to provide working (or scratch) space for ETL processing.
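A minimal sketch of this definition, assuming a PSA can be modeled as request-based scratch storage (the dict-of-requests layout is invented for illustration):

```python
# PSA as temporary, request-based scratch space (illustrative only).
psa = {}  # request id -> list of raw records, one "table" per DataSource

def stage(request_id, raw_records):
    # Store the source data unmodified; no end-user or tool access implied.
    psa[request_id] = list(raw_records)

stage("REQU_001", [{"COSTCENTER": "cc100", "AMOUNT": "150.00"}])

# Downstream ETL reads from psa["REQU_001"]; because the PSA is
# typically temporary, the request can be deleted once it has been
# pushed onward.
del psa["REQU_001"]
```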


    BI 7.0 Transformation

Once the data arrives in the PSA, you then cleanse and transform it prior to physical storage in your targets. These targets include InfoObjects (master data), InfoCubes, and DataStore Objects.


Optional BI InfoSources


    InfoPackages and Data Transfer Processes 1

The design of the data flow uses metadata objects such as DataSources, Transformations, InfoSources, and InfoProviders. Once the data flow is designed, the InfoPackages and the Data Transfer Processes take over to manage the execution and scheduling of the actual data transfer. As you can see from the figure below, there are two processes that need to be scheduled.


InfoPackages and Data Transfer Processes 2

The first process is loading the data from the source system. This involves multiple steps that differ depending on which source system is involved. For example, if it is an SAP source system, a function call must be made to the other system, and an extractor program associated with the DataSource might be initiated. An InfoPackage is the BI object that contains all the settings directing exactly how this data should be uploaded from the source system. The target of the InfoPackage is the PSA table tied to the specific DataSource associated with the InfoPackage. In a production environment, the same data in the same source system should only be extracted once, with one InfoPackage; from there, as many data transfer processes as necessary can push this data to as many InfoProviders as necessary.
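A hedged sketch of this extract-once, fan-out-many pattern; the lambdas and list "targets" are invented stand-ins for transformations and InfoProviders:

```python
# One InfoPackage load, several data transfer processes (illustrative).
psa = []

def info_package_load(extractor):
    psa.extend(extractor())  # one extraction from the source system

def data_transfer_process(transform, target):
    target.extend(transform(r) for r in psa)  # re-reads the same PSA data

info_package_load(lambda: [{"cc": "CC100", "amt": 150.0}])

cube_a, dso_b = [], []
data_transfer_process(lambda r: r, cube_a)               # full copy
data_transfer_process(lambda r: {"cc": r["cc"]}, dso_b)  # subset of fields
```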


InfoPackages and Data Transfer Processes Initiate the Data Flow


InfoPackages and Data Transfer Processes 3

The second process identified in the figure is the data transfer process. It is this object that controls the actual data flow (filters, update mode (delta or full)) for a specific transformation. You might have more than one data transfer process if you have more than one transformation step or target in the ETL flow. This more complex situation is shown below. Note that if you involve more than one InfoProvider, you need more than one data transfer process. Sometimes necessity drives very complex architectures.
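As a rough illustration of what the data transfer process controls, here is a hypothetical Python sketch carrying a filter and an update mode; the delta-pointer bookkeeping is invented and far simpler than real request handling:

```python
# DTP sketch: filter plus full/delta update mode (illustrative only).
class DTP:
    def __init__(self, filter_fn, update_mode="full"):
        self.filter_fn = filter_fn
        self.update_mode = update_mode
        self.last_loaded = 0  # delta pointer into the PSA request list

    def execute(self, psa_requests, target):
        start = self.last_loaded if self.update_mode == "delta" else 0
        for request in psa_requests[start:]:
            target.extend(r for r in request if self.filter_fn(r))
        self.last_loaded = len(psa_requests)

psa_requests = [[{"cc": "CC100", "amt": 150.0},
                 {"cc": "CC200", "amt": 75.5}]]
cube = []
dtp = DTP(filter_fn=lambda r: r["cc"] == "CC100", update_mode="delta")
dtp.execute(psa_requests, cube)  # loads only new, filtered records
```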


More Complex ETL: Multiple InfoProviders and InfoSource Use


Loading SAP Source System Master Data: Scenario


    Global Transfer Routines

Cleansing or transforming the data is accomplished in a dedicated BI transformation.

Each time you want to convert incoming fields from your source system to InfoObjects on your BI InfoProviders, you create a dedicated TRANSFORMATION, consisting of one transformation rule for each object.


DataSource Creation Access and the Generic Extractor


    Replication

In order to access DataSources and map them to your InfoProviders in BI, you must inform BI of the name and fields provided by the DataSource. This process is called replication, or replicating the DataSource metadata. It is accomplished from the context menu on the folder where the DataSource is located. Once the DataSource has been replicated into BI, the final step is to activate it. As of the newest version of BI, you can activate Business Content data flows entirely from within the Data Warehousing Workbench. During this process, the Business Content DataSource activation in the SAP source system and replication to SAP NetWeaver BI take place using a Remote Function Call (RFC).
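Conceptually, replication copies the DataSource's name and field list into BI's own metadata store so that PSA tables and transformations can be built locally. A hedged sketch, with the repositories and the RFC simulated by plain dicts:

```python
# Metadata replication sketch (the RFC is simulated; illustrative only).
source_system_repo = {
    "0COSTCENTER_TEXT": ["COSTCENTER", "LANGU", "TXTSH", "TXTMD"],
}
bi_repo = {}

def replicate(datasource_name):
    # In a real system this metadata would be fetched via a Remote
    # Function Call (RFC); here we just read a local dict.
    bi_repo[datasource_name] = {
        "fields": source_system_repo[datasource_name],
        "active": False,
    }

def activate(datasource_name):
    bi_repo[datasource_name]["active"] = True  # final step after replication

replicate("0COSTCENTER_TEXT")
activate("0COSTCENTER_TEXT")
```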


    DataSource in BI After Replication


    Access Path to Create a Transformation

In this first load process, we are trying to keep it simple. Since we added some custom global transfer logic directly to our InfoObject, we just need field-to-field mapping for our third step: Transformation.
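A field-to-field (direct assignment) mapping is the simplest transformation rule type. A minimal sketch, with invented source-field/InfoObject pairs:

```python
# Direct-assignment mapping sketch (field names are illustrative).
FIELD_MAP = {"KOSTL": "0COSTCENTER", "KTEXT": "0TXTSH"}

def direct_mapping(source_row):
    return {info_object: source_row[src_field]
            for src_field, info_object in FIELD_MAP.items()}

print(direct_mapping({"KOSTL": "CC100", "KTEXT": "Assembly"}))
# {'0COSTCENTER': 'CC100', '0TXTSH': 'Assembly'}
```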


Transformation GUI: Master Data


    InfoPackage: Loading Source Data to the PSA


Creation and Monitoring of the Data Transfer Process


Complete Scenario: Transaction Load from mySAP ERP


    Emulated DataSources


Issues Relating to 3.x DataSources


    Using the Graphical Transformation GUI


The Transformation Process: Technical Perspective


    Start Routine 1


    Start Routine 2


    Transformation Rules: Rule Detail


Transformation Rules: Options and Features


    Transformation: Rule Groups

A rule group is a group of transformation rules. It contains one transformation rule for each key field of the target. A transformation can contain multiple rule groups. Rule groups allow you to combine various rules. This means that you can create different rules for different key figures for a characteristic.
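A minimal sketch of the idea, assuming a plan-versus-actual reading of the two groups (the field names and the plan/actual split are invented): each source record produces one target record per rule group, and the same characteristic is derived differently in each group.

```python
# Rule groups sketch: two groups write to the same target structure.
def rule_group_1(src):
    # "Actual" group: one reading of the shared characteristic.
    return {"0COSTCENTER": src["KOSTL"], "0VTYPE": "actual",
            "AMOUNT": src["AMOUNT_ACTUAL"]}

def rule_group_2(src):
    # "Plan" group: same characteristic, different rule and key figure.
    return {"0COSTCENTER": src["KOSTL"], "0VTYPE": "plan",
            "AMOUNT": src["AMOUNT_PLAN"]}

src = {"KOSTL": "CC100", "AMOUNT_ACTUAL": 150.0, "AMOUNT_PLAN": 120.0}
target_rows = [rule_group_1(src), rule_group_2(src)]
print(target_rows)  # one target record per rule group
```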


    Transformation Groups: Details


    End Routine


    Data Acquisition Layer


    Extraction using DB Connect and UD Connect


    UD Connect Extraction Highlights


    DB Connect Extraction


    Technical View of DB Connect


    XML Extraction


    XML Purchase Order Example


    XML Extraction Highlights


Loading Data from Flat Files: Complete Scenario


    Flat File Sources


Features of the BI File Adapter and File-Based DataSources

Basically, a DataSource based on a flat file is an object that contains all the settings necessary to load and parse the file when it is initiated by the InfoPackage. Some of the features of the BI file adapter are listed below.
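A hedged sketch of how such settings might drive parsing when a load is triggered; the setting names are invented, not the real maintenance-screen fields:

```python
# Flat-file parsing driven by DataSource-like settings (illustrative).
import csv
import io

settings = {
    "separator": ";",
    "header_rows": 1,  # rows to skip at the top of the file
    "fields": ["COSTCENTER", "AMOUNT"],
}

def parse_flat_file(text, settings):
    reader = csv.reader(io.StringIO(text), delimiter=settings["separator"])
    rows = list(reader)[settings["header_rows"]:]
    return [dict(zip(settings["fields"], row)) for row in rows]

sample = "Cost Center;Amount\nCC100;150.00\nCC200;75.50\n"
print(parse_flat_file(sample, settings))
```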


    File System DataSource: Extraction Tab


    File System DataSource: Proposal Tab


File System DataSource: Fields Tab


    File System DataSource: Preview Tab


    BI Flexible InfoSources


A New BI InfoSource in the Data Flow


Complex ETL: DataSource Objects and InfoSources


    DTP: Filtering Data


    Error Handling

The data transfer process supports you in handling data records with errors. The data transfer process also supports error handling for DataStore objects. You can determine how the system responds if errors occur. At runtime, the incorrect data records are sorted and can be written to an error stack (a request-based database table). In addition, another feature, called temporary storage, supports debugging bad transformations.
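A conceptual sketch of the error stack idea, with an invented validity check: valid records reach the target, incorrect ones land in a request-based stack for later correction and reload.

```python
# DTP error handling sketch (illustrative only).
error_stack = {}  # request id -> records that failed the check

def dtp_with_error_handling(request_id, records, target):
    for r in records:
        if r.get("AMOUNT") is not None and r["AMOUNT"] >= 0:
            target.append(r)
        else:
            error_stack.setdefault(request_id, []).append(r)

cube = []
dtp_with_error_handling(
    "REQU_002",
    [{"CC": "CC100", "AMOUNT": 150.0}, {"CC": "CC200", "AMOUNT": None}],
    cube,
)
print(error_stack)  # {'REQU_002': [{'CC': 'CC200', 'AMOUNT': None}]}
```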


    Error Processing


    Features of Error Processing


    More Error Handling Features


    DTP Temporary Storage Features


Access to the Error Stack and Temporary Storage via the DTP Monitor


    Loading and Activation in DataStore Objects

A standard DataStore Object has three tables. Previously, we described the three tables and the purpose of each, but we only explained that a data transfer process is used to load the first one. In the following section, we will examine the DataStore Object activation process, which is the technical term used to describe how these tables get their data. In addition, we will look at an example to illustrate exactly what happens when data is uploaded and subsequently activated in a DataStore Object.

Let us assume that two requests, REQU1 and REQU2, are loaded into the DataStore Object. This can occur sequentially or in parallel. The load process posts both requests into the activation queue.


Loading Data into the Activation Queue of a Standard DataStore Object


    Activation Example: First Load Activated


Activation Example: Offsetting Data Created by Activation Process 1


Activation Example: Offsetting Data Created by Activation Process 2

If the DataStore Object were not in the flow of data in this example, and the source data flowed directly to an InfoCube, the InfoCube would add the 10 to the 30 and get an incorrect value of 40. If, instead, we feed the change log data to the InfoCube, 10, -10, and 30 add up to the correct value of 30.

In this example, a DataStore Object was required in the data flow before the InfoCube. It is not always required, but many times it is desired.
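The arithmetic above can be replayed in a small sketch, assuming the active table overwrites by key while the change log records additive deltas (table layouts simplified for illustration):

```python
# DataStore activation sketch: active table overwrites, change log offsets.
active_table = {}  # key -> current value (overwrite semantics)
change_log = []    # additive delta records

def activate(key, new_value):
    if key in active_table:
        change_log.append((key, -active_table[key]))  # reverse old value
    change_log.append((key, new_value))
    active_table[key] = new_value                     # overwrite

activate("CC100", 10)  # first request activated
activate("CC100", 30)  # second request, corrected value

cube_value = sum(v for _, v in change_log)
print(cube_value)  # 10 - 10 + 30 = 30, the correct result
```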


    Integrating a New Target


    MultiProviders

A MultiProvider is a special InfoProvider that combines data from several InfoProviders, providing it for reporting. The MultiProvider itself (like InfoSets and VirtualProviders) does not contain any data. Its data comes exclusively from the InfoProviders on which it is based. A MultiProvider can be made up of various combinations of the following InfoProviders (a small sketch of the union behavior follows the list):

InfoCubes

DataStore objects

InfoObjects

InfoSets

Aggregation levels (slices of an InfoCube to support BI Integrated Planning)
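Here is the sketch: a MultiProvider query behaves like a union over the part providers, with nothing stored in the MultiProvider itself (provider names and rows are invented):

```python
# MultiProvider-as-union sketch (illustrative only).
plan_cube = [{"cc": "CC100", "vtype": "plan", "amt": 120.0}]
actual_cube = [{"cc": "CC100", "vtype": "actual", "amt": 150.0}]

def multiprovider_query(providers, predicate=lambda r: True):
    # Union of the underlying InfoProviders; no data of its own.
    return [r for provider in providers for r in provider if predicate(r)]

rows = multiprovider_query([plan_cube, actual_cube],
                           predicate=lambda r: r["cc"] == "CC100")
print(rows)  # plan and actual records combined for reporting
```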


    MultiProvider Concept


    Advantages of the MultiProvider

Simplified design: The MultiProvider concept provides you with advanced analysis options, without you having to fill new and extremely large InfoCubes with data. You can construct simpler BasicCubes with smaller tables and less redundancy.

Individual InfoCubes and DataStore Objects can be partitioned separately. Partitioning can relate either to logical partitioning, splitting cubes and DataStore Objects into smaller ones, or to physical partitioning of the underlying database tables.

Performance gains through parallel execution of subqueries.


    MultiProviders Are Unions of Providers



Example: Plan And Actual Cost Center Transactions


    MultiProvider Queries



Selecting Relevant InfoProviders for a MultiProvider


    MultiProvider Design GUI


    Characteristic Identification in a MultiProvider


    Key Figure Selection


    Centralized Administration Tasks


    Process Chains: Automating Warehouse Tasks


    Summary of Dedicated BI Task Monitors


    Administration / Managing InfoCubes

The Manage function allows you to display the contents of the fact table, or the contents with selected characteristic values (through a view of the tables provided by the Data Browser). You can also repair and reconstruct indexes, delete requests that have been loaded with errors, roll up requests into the aggregates, and compress the contents of the fact table. Select the InfoCube that you want to manage and choose Manage from the context menu. Six tab pages appear:

Contents

Performance

Requests

Roll-Up

Compress

Reconstruct (only valid with 3.x data flow objects)


    Managing InfoCubes


    Requests in InfoCubes


    Compressing InfoCubes


    Management Functions of DataStore Objects

The functions on the Manage tab are used to manage standard DataStore Objects. Although there are not as many tabs for managing DataStore Objects as in the equivalent task for InfoCubes, the functions for InfoCubes are more complex. The three tabs under the Manage option for DataStore Objects are: Contents, Requests, and Reconstruction.


    DataStore Object Administration


    Contents and Selective Deletion


    DataStore Object Administration: Requests Tab

The Query icon, indicating readability by BEx queries, is set when activation is started for a request. The system does not check whether the data has been successfully activated.



DataStore Object Change Log: Maintenance Required