Workshop BI Acquisition
-
8/8/2019 WorkShop BI Adquisition
1/90
WORKSHOP BI
Fundamentals
Data Acquisition
-
Scope, Part 2
The first lesson describes the flow of data between BI and the source systems that contain the data.
The second lesson shows the procedure for loading master data (attributes and texts) from an SAP system.
In the third lesson, we will discuss the data transfer process in more depth, covering the available transformation rule types and more advanced start and end routines. In addition, we will visualize our data in the InfoCube upon completion.
-
Generic Data Warehouse Positioning of the
Data Flow
The ETL process, sometimes called the data flow, is the sequence of steps that raw (source) data must follow to be extracted, transformed, and loaded into targets in the BI system.
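The extract-transform-load sequence described above can be sketched in a few lines. This is a language-neutral illustration in Python; the function names and record layout are hypothetical, not SAP BI APIs.

```python
# Illustrative ETL sketch: raw source rows are extracted, cleansed, and
# loaded into a BI target (here just a Python list standing in for a cube).

def extract(source_rows):
    """Pull raw rows from a (simulated) source system."""
    return list(source_rows)

def transform(rows):
    """Cleanse/convert rows, e.g. trim text and cast amounts to float."""
    return [{"costcenter": r["costcenter"].strip().upper(),
             "amount": float(r["amount"])} for r in rows]

def load(rows, target):
    """Append the transformed rows to the BI target."""
    target.extend(rows)
    return target

source = [{"costcenter": " cc100 ", "amount": "10.5"},
          {"costcenter": "cc200",  "amount": "4"}]
infocube = []
load(transform(extract(source)), infocube)
```

The point is the fixed ordering of the steps: transformation always sits between the raw source format and the target's cleansed format.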
-
BI Architecture: Positioning of the ETL Process
-
BI Data Flow Details
-
Source Systems and DataSource
A source system is any system that is available to BI for data extraction and transfer purposes. Examples include mySAP ERP, mySAP CRM, a custom system based on an Oracle DB, PeopleSoft, and many others.
DataSources are BI objects used to extract and stage data from source systems. DataSources subdivide the data provided by a source system into self-contained business areas. Our cost center example includes cost center text, master data, and cost center transaction DataSources from two different source systems. A DataSource contains a number of logically related fields, arranged in a flat structure, that contain the data to be transferred into BI.
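The idea of a DataSource as a flat structure of logically related fields can be sketched as follows. This is a hypothetical Python illustration, not an SAP API; the DataSource and field names are only examples.

```python
# A DataSource is essentially a flat, ordered field list plus the staged
# rows it delivers for one self-contained business area.
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str                      # e.g. a text DataSource for cost centers
    source_system: str             # e.g. "mySAP ERP"
    fields: list                   # flat structure: ordered field names
    rows: list = field(default_factory=list)

    def add_row(self, *values):
        """Stage one flat record; it must match the field structure."""
        if len(values) != len(self.fields):
            raise ValueError("row does not match the flat structure")
        self.rows.append(dict(zip(self.fields, values)))

ds = DataSource("COSTCENTER_TEXT", "mySAP ERP",
                ["COSTCENTER", "LANGU", "TXTMD"])
ds.add_row("CC100", "EN", "Marketing")
```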
-
Source System Types and Interfaces
-
Persistent Staging Area
Persistent Staging Area (PSA) is an industry term, but not everyone agrees on an exact definition. In response to a posting on Ask the Experts at DMreview.com, Evan Levy defines a PSA as:
1. The storage and processing to support the transformation of data.
2. Typically temporary.
3. Not constructed to support end-user or tool access.
4. Specifically built to provide working (or scratch) space for ETL processing.
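The PSA properties listed above (request-based, temporary, scratch space for ETL) can be sketched like this. The class and method names are hypothetical, not SAP objects.

```python
# Sketch of the PSA idea: temporary, request-keyed storage for ETL scratch
# work, not meant for end-user or tool access.

class PSA:
    def __init__(self):
        self._requests = {}        # request id -> list of raw records

    def store(self, request_id, records):
        """Persist one load request's raw records for later transformation."""
        self._requests[request_id] = list(records)

    def read(self, request_id):
        """ETL processing reads the raw records back by request."""
        return self._requests[request_id]

    def delete(self, request_id):
        """PSA data is typically temporary: drop it once processed."""
        del self._requests[request_id]

psa = PSA()
psa.store("REQU1", [{"costcenter": "CC100", "amount": "10"}])
```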
-
BI 7.0 Transformation
Once the data arrives in the PSA, you then cleanse/transform it prior to physically storing it in your targets. These targets include InfoObjects (master data), InfoCubes, and DataStore Objects.
-
Optional BI InfoSources
-
InfoPackages and Data Transfer Processes 1
The design of the data flow uses metadata objects such as DataSources, Transformations, InfoSources, and InfoProviders. Once the data flow is designed, InfoPackages and Data Transfer Processes take over to manage the execution and scheduling of the actual data transfer. As the figure below shows, there are two processes that need to be scheduled.
-
InfoPackages and Data Transfer Processes 2
The first process is loading the data from the source system. This involves multiple steps that differ depending on which source system is involved. For example, if it is an SAP source system, a function call must be made to the other system, and an extractor program associated with the DataSource might be initiated. An InfoPackage is the BI object that contains all the settings directing exactly how this data should be uploaded from the source system. The target of the InfoPackage is the PSA table tied to the specific DataSource associated with the InfoPackage. In a production environment, the same data in the same source system should only be extracted once, with one InfoPackage; from there, as many data transfer processes as necessary can push this data to as many InfoProviders as necessary.
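The "extract once, fan out many times" rule above can be sketched as one InfoPackage-style load into the PSA followed by several DTP-style pushes. The helper names are hypothetical Python stand-ins, not SAP objects.

```python
# One extraction into the PSA; multiple data transfer processes push the
# same staged data to different targets.

psa_table = []                      # PSA table tied to one DataSource

def infopackage_load(source_rows):
    """One extraction from the source system into the PSA."""
    psa_table.extend(source_rows)

def dtp(target, transform=lambda r: r):
    """A data transfer process pushes PSA data to one InfoProvider."""
    target.extend(transform(r) for r in psa_table)

infopackage_load([{"cc": "CC100", "amt": 10}, {"cc": "CC200", "amt": 4}])

cube, dso = [], []
dtp(cube)                                           # DTP 1 -> InfoCube
dtp(dso, transform=lambda r: {"cc": r["cc"]})       # DTP 2 -> DataStore Object
```

Note that the source system was hit only once; both targets were fed from the staged copy.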
-
InfoPackages and Data Transfer Processes
Initiate the Data Flow
-
InfoPackages and Data Transfer Processes 3
The second process identified in the figure is the data transfer process. It is this object that controls the actual data flow (filters, update mode (delta or full)) for a specific transformation. You might have more than one data transfer process if you have more than one transformation step or target in the ETL flow. This more complex situation is shown below. Note that if you involve more than one InfoProvider, you need more than one data transfer process. Sometimes necessity drives very complex architectures.
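What the DTP controls — a filter plus the update mode (full vs. delta) — can be sketched as follows. This is an illustrative simplification (a real delta mechanism is pointer-based in the source), with hypothetical names.

```python
# Sketch of a DTP run: full mode re-reads all staged rows; delta mode reads
# only the rows added since the last run; a filter predicate applies in both.

def run_dtp(psa_rows, target, mode="full", already_loaded=0,
            pred=lambda r: True):
    new_rows = psa_rows if mode == "full" else psa_rows[already_loaded:]
    target.extend(r for r in new_rows if pred(r))
    return len(psa_rows)           # bookkeeping for the next delta run

psa_rows = [{"cc": "CC100"}, {"cc": "CC200"}]
target = []
mark = run_dtp(psa_rows, target, mode="full")        # initial full load
psa_rows.append({"cc": "CC300"})                     # new data arrives
mark = run_dtp(psa_rows, target, mode="delta", already_loaded=mark)
```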
-
More Complex ETL: Multiple InfoProviders and
InfoSource Use
-
Loading SAP source system Master Data
Scenario
-
Global Transfer Routines
Cleansing or transforming the data is accomplished in a dedicated BI transformation. Each time you want to convert incoming fields from your source system to InfoObjects on your BI InfoProviders, you create a dedicated transformation, consisting of one transformation rule for each target object.
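The "one rule per target object" idea can be sketched as a mapping from each target InfoObject to its own conversion rule. The InfoObject and source field names are illustrative only, not tied to any real DataSource.

```python
# A transformation as one rule per target InfoObject: direct mapping,
# type conversion, and a constant rule, each applied to the source record.

transformation = {
    # target InfoObject : rule applied to the source record
    "COSTCENTER": lambda r: r["KOSTL"].upper(),     # direct mapping + case
    "AMOUNT":     lambda r: float(r["WKGBTR"]),     # type conversion
    "CURRENCY":   lambda r: "USD",                  # constant rule
}

def apply_transformation(source_record):
    return {target: rule(source_record)
            for target, rule in transformation.items()}

result = apply_transformation({"KOSTL": "cc100", "WKGBTR": "12.5"})
```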
-
DataSource Creation Access and the Generic
Extractor
-
Replication
In order to access DataSources and map them to your InfoProviders in BI, you must inform BI of the name and fields provided by the DataSource. This process is called replication, or replicating the DataSource metadata. It is accomplished from the context menu on the folder where the DataSource is located. Once the DataSource has been replicated into BI, the final step is to activate it. As of the newest version of BI, you can activate Business Content data flows entirely from within the Data Warehousing Workbench. During this process, the Business Content DataSource activation in the SAP source system and replication to SAP NetWeaver BI take place using a Remote Function Call (RFC).
-
DataSource in BI After Replication
-
Access Path to Create a Transformation
In this first load process, we are trying to keep it simple. Since we added some custom global transfer logic directly to our InfoObject, we just need field-to-field mapping for our third step: the transformation.
-
Transformation GUI: Master Data
-
InfoPackage: Loading Source Data to the PSA
-
Creation and Monitoring of the Data Transfer
Process
-
Complete Scenario: Transaction Load from
mySAP ERP
-
Emulated DataSources
-
Issues Relating to 3.x DataSources
-
Using the Graphical Transformation GUI
-
The Transformation Process: Technical
Perspective
-
Start Routine 1
-
Start Routine 2
-
Transformation Rules: Rule Detail
-
Transformation Rules: Options and
Features
-
Transformation: Rule Groups
A rule group is a group of transformation rules. It contains one transformation rule for each key field of the target. A transformation can contain multiple rule groups. Rule groups allow you to combine various rules. This means that, for a characteristic, you can create different rules for different key figures.
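One way to picture rule groups: a single source record can yield several target records, each produced by a different group of rules — for example, one group posting the actual amount and one posting the plan amount. A hypothetical sketch (the value-type codes and field names are illustrative):

```python
# Two rule groups over the same source record: one builds the "actual"
# target record, the other builds the "plan" record.

rule_groups = [
    # standard group: actual values
    {"COSTCENTER": lambda r: r["cc"],
     "VTYPE":      lambda r: "actual",
     "AMOUNT":     lambda r: r["actual"]},
    # second group: plan values from the same source record
    {"COSTCENTER": lambda r: r["cc"],
     "VTYPE":      lambda r: "plan",
     "AMOUNT":     lambda r: r["plan"]},
]

def transform(record):
    return [{tgt: rule(record) for tgt, rule in group.items()}
            for group in rule_groups]

out = transform({"cc": "CC100", "actual": 95, "plan": 100})
```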
-
Transformation Groups: Details
-
End Routine
-
Data Acquisition Layer
-
Extraction using DB Connect and UD Connect
-
UD Connect Extraction Highlights
-
DB Connect Extraction
-
Technical View of DB Connect
-
XML Extraction
-
XML Purchase Order Example
-
XML Extraction Highlights
-
Loading Data from Flat Files: Complete
Scenario
-
Flat File Sources
-
Features of the BI File Adapter and File-Based
DataSources
Basically, a DataSource based on a flat file is an object that contains all the settings necessary to load and parse the file when it is initiated by the InfoPackage. Some of the features of the BI file adapter are listed below.
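The notion of a file DataSource as "settings plus parsing" can be sketched with the standard library. The setting names here are illustrative, not actual BI file-adapter options.

```python
# A flat-file DataSource bundles the parse settings (separator, header
# rows to skip, field list) used when a load is triggered.
import csv
import io

settings = {"separator": ";", "header_rows": 1,
            "fields": ["COSTCENTER", "AMOUNT"]}

def load_flat_file(text, settings):
    reader = csv.reader(io.StringIO(text), delimiter=settings["separator"])
    rows = list(reader)[settings["header_rows"]:]   # skip header rows
    return [dict(zip(settings["fields"], row)) for row in rows]

sample = "cc;amt\nCC100;10\nCC200;4\n"
records = load_flat_file(sample, settings)
```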
-
File System DataSource: Extraction Tab
-
File System DataSource: Proposal Tab
-
File System DataSource: Fields Tab
-
File System DataSource: Preview Tab
-
BI Flexible InfoSources
-
A New BI InfoSource in the Data Flow
-
Complex ETL: DataSource Objects and
InfoSources
-
DTP: Filtering Data
-
Error Handling
The data transfer process supports you in handling data records with errors; it also supports error handling for DataStore Objects. You can determine how the system responds if errors occur. At runtime, the incorrect data records are sorted out and can be written to an error stack (a request-based database table). In addition, another feature, called temporary storage, supports debugging bad transformations.
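The request-based error-stack behavior can be sketched like this: records failing a validation check are diverted to a stack keyed by request, while valid records continue to the target. A hypothetical illustration, not SAP code.

```python
# Records that fail validation land in an error stack keyed by request id;
# valid records flow on to the target.

def run_with_error_stack(request_id, records, target, error_stack, is_valid):
    for rec in records:
        if is_valid(rec):
            target.append(rec)
        else:
            error_stack.setdefault(request_id, []).append(rec)

target, error_stack = [], {}
run_with_error_stack(
    "REQU1",
    [{"cc": "CC100", "amt": 10}, {"cc": "", "amt": 5}],   # second is bad
    target, error_stack,
    is_valid=lambda r: bool(r["cc"]),
)
```

Keeping the bad records by request is what makes it possible to correct and re-post just those records later.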
-
Error Processing
-
Features of Error Processing
-
More Error Handling Features
-
DTP Temporary Storage Features
-
Access to the Error Stack and Temporary
Storage via the DTP Monitor
-
Loading and Activation in DataStore Objects
A standard DataStore Object has three tables. Previously, we described the three tables and the purpose of each, but we only explained that a data transfer process is used to load the first one. In the following section, we will examine the DataStore Object activation process, which is the technical term used to describe how these tables get their data. In addition, we will look at an example to illustrate exactly what happens when data is uploaded and subsequently activated in a DataStore Object.
Let us assume that two requests, REQU1 and REQU2, are loaded into the DataStore Object. This can occur sequentially or in parallel. The load process posts both requests into the activation queue.
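The activation mechanics — requests sitting in the activation queue, activation updating the active table by semantic key and writing offsetting before/after images to the change log — can be sketched as follows. This is a simplified model of the behavior described on the slides, not SAP code.

```python
# Standard DataStore Object activation sketch: queue -> active table
# (keyed by the semantic key) + change log (delta images).

activation_queue = []       # list of (request_id, records)
active_table = {}           # key -> current record
change_log = []             # before/after images

def load(request_id, records):
    activation_queue.append((request_id, records))

def activate():
    while activation_queue:
        request_id, records = activation_queue.pop(0)
        for rec in records:
            key = rec["cc"]
            old = active_table.get(key)
            if old is not None:                      # offsetting before image
                change_log.append({**old, "amt": -old["amt"]})
            change_log.append(dict(rec))             # after image
            active_table[key] = dict(rec)

load("REQU1", [{"cc": "CC100", "amt": 10}])
load("REQU2", [{"cc": "CC100", "amt": 30}])
activate()
```

After activation, the active table holds only the latest value per key, while the change log holds the full delta history (10, -10, 30 in this example).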
-
Loading Data into the Activation Queue of a
Standard DataStore Object
-
Activation Example: First Load Activated
-
Activation Example: Offsetting Data Created
by Activation Process 1
-
Activation Example: Offsetting Data Created
by Activation Process 2
If the DataStore Object were not in the flow of data in this example, and the source data flowed directly to an InfoCube, the InfoCube would add the 10 to the 30 and get an incorrect value of 40. If, instead, we feed the change log data to the InfoCube, then 10, -10, and 30 add up to the correct value of 30. In this example, a DataStore Object was required in the data flow before the InfoCube. It is not always required, but many times it is desired.
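The offsetting arithmetic from this example is worth spelling out:

```python
# Feeding raw loads straight into an additive InfoCube double-counts;
# feeding the change log (after image, offsetting before image, new after
# image) yields the correct result.

raw_loads  = [10, 30]            # two loads for the same key
change_log = [10, -10, 30]       # after, before (offset), new after

without_dso = sum(raw_loads)     # InfoCube adds everything: wrong (40)
with_dso    = sum(change_log)    # offsets cancel: right (30)
```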
-
Integrating a New Target
-
MultiProviders
A MultiProvider is a special InfoProvider that combines data from several InfoProviders, making it available for reporting. The MultiProvider itself (like InfoSets and VirtualProviders) does not contain any data; its data comes exclusively from the InfoProviders on which it is based. A MultiProvider can be made up of various combinations of the following InfoProviders:
InfoCubes
DataStore Objects
InfoObjects
InfoSets
Aggregation levels (slices of an InfoCube to support BI Integrated Planning)
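The "no data of its own, union at query time" property can be sketched like this. The class and provider names are hypothetical, not SAP objects.

```python
# A MultiProvider holds references to its part providers, not copies of
# their data; a query against it is a union over the underlying providers.

actual_cube = [{"cc": "CC100", "vtype": "actual", "amt": 95}]
plan_cube   = [{"cc": "CC100", "vtype": "plan",   "amt": 100}]

class MultiProvider:
    def __init__(self, *providers):
        self.providers = providers          # no data is copied

    def query(self, pred=lambda r: True):
        """Union the part providers' rows at query time."""
        return [r for p in self.providers for r in p if pred(r)]

mp = MultiProvider(actual_cube, plan_cube)
rows = mp.query(lambda r: r["cc"] == "CC100")
```

A query for cost center CC100 thus sees plan and actual rows side by side, even though they live in separate cubes.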
-
MultiProvider Concept
-
Advantages of the MultiProvider
Simplified design: The MultiProvider concept provides you with advanced analysis options without your having to fill new, extremely large InfoCubes with data. You can construct simpler BasicCubes with smaller tables and less redundancy.
Individual InfoCubes and DataStore Objects can be partitioned separately; this can mean splitting cubes and DataStore Objects into smaller ones.
Performance gains through parallel execution of subqueries.
-
MultiProviders Are Unions of Providers
-
Example: Plan And Actual Cost Center
Transactions
-
MultiProvider Queries
-
Selecting Relevant InfoProviders for a
MultiProvider
-
MultiProvider Design GUI
-
Characteristic Identification in a MultiProvider
-
Key Figure Selection
-
Centralized Administration Tasks
-
Process Chains: Automating Warehouse Tasks
-
Summary of Dedicated BI Task Monitors
-
Administration / Managing InfoCubes
The Manage function allows you to display the contents of the fact table, or the contents with selected characteristic values (through a view of the tables provided by the Data Browser). You can also repair and reconstruct indexes, delete requests that have been loaded with errors, roll up requests into the aggregates, and compress the contents of the fact table. Select the InfoCube that you want to manage and choose Manage from the context menu. Six tab pages appear:
Contents
Performance
Requests
Roll-Up
Compress
Reconstruct (only valid with 3.x data flow objects)
-
Managing InfoCubes
-
Requests in InfoCubes
-
Compressing InfoCubes
-
Management Functions of DataStore Objects
The functions on the Manage tab are used to manage standard DataStore Objects. Although there are not as many tabs for managing DataStore Objects as in the equivalent task for InfoCubes, the functions for InfoCubes are more complex. The three tabs under the Manage option for DataStore Objects are: Contents, Requests, and Reconstruction.
-
DataStore Object Administration
-
Contents and Selective Deletion
-
DataStore Object Administration: Requests Tab
The Query icon, indicating readability by BEx queries, is set when activation is started for a request. The
system does not check whether the data has been successfully activated.
DataStore Object Change Log: Maintenance
-
DataStore Object Change Log: Maintenance Required