ISQS 3358, Business Intelligence
Cubism – Measures and Dimensions
Zhangxi Lin
Texas Tech University
Outline
Where we've been
Populating fact table
Creating a cube with SSIS
Measures
Types of dimensions
Cube design tabs
Structure and Components of Business Intelligence
[Figure: BI components SSMS, SSIS, SSAS, SSRS, SAS EM, SAS EG]
Snowflake Schema of the Data Mart
[Figure: snowflake schema with tables ManufacturingFact, DimProduct, DimProductSubType, DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, DimCountry, carrying numeric labels 1 to 10 in the diagram]
Where we've been and where we are now
Exercise 1: Getting started
Exercise 2: Creating data marts
Exercise 3: Creating a cube from a data mart
Exercise 4: Populating dimensions of a data mart
Exercise 5: Exploring features of ETL data conversion tasks
Exercise 6: Loading fact tables
What do we need to do with the half-done data mart?
Populate the DimBatch dimension table
Populate the ManufacturingFact table
Build an OLAP cube (we already did this before)
Check measures
Check dimensions
LOADING FACT TABLES
Exercise 6: Loading Fact Tables
Project name: MMMFactLoad-lastname
Package name: FactLoad.dtsx
Tasks
  ◦ Create Inventory Fact table
  ◦ Load Dim Batch
  ◦ Load Manufacturing Fact
  ◦ Load Inventory Fact
Deliverable: email a screenshot of the “green” outcome of the ETL project to [email protected]
Inventory Fact Table
Create a table InventoryFact in database MaxMinManufacturingDM-lastname.
  ◦ Compound primary key: DateOfInventory, ProductCode, and Material
  ◦ Define two foreign keys
Column Name        Data Type    Allow Nulls
InventoryLevel     Int          No
NumberOnBackorder  Int          No
DateOfInventory    Datetime     No
ProductCode        Int          No
Material           Varchar(30)  No
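The table definition above can be sketched in runnable form. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the column types are SQLite approximations, the two foreign keys are left out because the slide does not name their targets, and the sample values are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the SQL Server database
# MaxMinManufacturingDM-lastname; DateOfInventory is stored as text here.
DDL = """
CREATE TABLE InventoryFact (
    InventoryLevel    INTEGER     NOT NULL,
    NumberOnBackorder INTEGER     NOT NULL,
    DateOfInventory   TEXT        NOT NULL,
    ProductCode       INTEGER     NOT NULL,
    Material          VARCHAR(30) NOT NULL,
    PRIMARY KEY (DateOfInventory, ProductCode, Material)
)
"""


def create_inventory_fact():
    """Create an in-memory database holding an empty InventoryFact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def insert_row(conn, level, backorder, date, product, material):
    """Insert one row; return False if the compound key already exists."""
    try:
        conn.execute(
            "INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)",
            (level, backorder, date, product, material),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = create_inventory_fact()
# Hypothetical sample rows: the second one repeats the compound key.
first = insert_row(conn, 120, 0, "2005-01-31", 1, "Clay")
second = insert_row(conn, 130, 5, "2005-01-31", 1, "Clay")
print(first, second)
```

The compound key means a product and material combination can appear once per inventory date, which is why the repeated insert is refused.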
Data Sources for Loading Fact
For loading DimBatch table and ManufacturingFact table
  ◦ BatchInfo.CSV
For loading InventoryFact table
  ◦ OREDB.OrderProcessingSystem.Inventory
Control Flow for Loading Facts and the Remaining Dimension
Note: to ease debugging, you may use three packages and test them one by one instead of doing everything in one package.
Flat File Connection
Data types
  ◦ BatchNumber, MachineNumber: four-byte signed integer [DT_I4]
  ◦ ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]
  ◦ TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp]
Select only the BatchNumber column as the input of Dim Batch.
All columns are needed for the fact tables.
Load DimBatch Data Flow
Note: Because of duplication in the source file, we may insert an Aggregate item after the Flat File Source item.
The Flat File Source
Sort Transformation
In the Aggregate item, define “Group by” on BatchNumber.
In the Derived Column item, define BatchName from BatchNumber.
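What the Aggregate and Derived Column items do to the batch rows can be sketched in plain Python. This is an illustration rather than the SSIS implementation: the rule for deriving BatchName is an assumption (the slide only says it comes from BatchNumber), and the input rows are hypothetical.

```python
def distinct_batches(rows):
    """Return one output row per BatchNumber, with a derived BatchName."""
    seen = set()
    batches = []
    for row in rows:
        number = row["BatchNumber"]
        if number not in seen:  # Aggregate step: group by BatchNumber
            seen.add(number)
            # Derived Column step. Assumption: the name is simply the
            # batch number converted to a string.
            batches.append({"BatchNumber": number, "BatchName": str(number)})
    return batches


# Hypothetical source rows with a duplicated batch number.
source_rows = [{"BatchNumber": 1}, {"BatchNumber": 1}, {"BatchNumber": 2}]
print(distinct_batches(source_rows))
```

Removing duplicates before the destination matters because BatchNumber is the key of DimBatch, so a repeated value would fail on insert.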
Load Fact Data Flow
Derived Columns for the Fact Table
Expressions for the Derived Columns
AcceptedProducts
  ◦ [NumberProduced] - [NumberRejected]
ElapsedTimeForManufacture
  ◦ DATEDIFF("mi", [TimeStarted], [TimeStopped])
DateOfManufacture
  ◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR, 25)[TimeStarted], 1, 10)
This expression converts TimeStarted into a string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.
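The three derived columns can be mirrored in Python to check the logic outside SSIS. This is a sketch under assumptions: each row is a dict keyed by the flat file column names, the timestamps are already parsed into datetime objects, and the minute difference counts minute boundaries crossed, which is how DATEDIFF is documented to behave.

```python
from datetime import datetime


def derive_fact_columns(row):
    """Compute the three derived columns for one BatchInfo row (a dict)."""
    started, stopped = row["TimeStarted"], row["TimeStopped"]
    # Truncate to the minute so the difference counts minute boundaries.
    start_min = started.replace(second=0, microsecond=0)
    stop_min = stopped.replace(second=0, microsecond=0)
    return {
        "AcceptedProducts": row["NumberProduced"] - row["NumberRejected"],
        "ElapsedTimeForManufacture": int(
            (stop_min - start_min).total_seconds() // 60
        ),
        # Keep the date of TimeStarted and drop the time portion.
        "DateOfManufacture": datetime(started.year, started.month, started.day),
    }


# Hypothetical sample row.
sample = {
    "NumberProduced": 100,
    "NumberRejected": 3,
    "TimeStarted": datetime(2005, 1, 1, 8, 0, 30),
    "TimeStopped": datetime(2005, 1, 1, 9, 15, 10),
}
print(derive_fact_columns(sample))
```

Building the date from its parts avoids the string round trip used in the SSIS expression while giving the same midnight timestamp.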
OLE DB Destination
For loading the fact table
Load Inventory Fact
OLE DB Source
  ◦ OrderProcessingSystem.InventoryFact
OLE DB Destination
  ◦ MaxMinManufacturingDM-lastname.InventoryFact
No transformation
There are two ways to load the table
  ◦ Create the table and use ETL to load it
  ◦ Import directly from the source to the database MaxMinManufacturingDM-lastname
Debugging Results
Loading DimBatch
Loading ManufacturingFact
BUILDING AN OLAP CUBE
Three Steps to Create a Cube from Data Sources
Defining data source
Defining data source view
  ◦ Add in three new columns of year, quarter, and month for the two fact tables
Building a cube
  ◦ Define a new dimension Dim Time from Manufacturing Fact table
Customize the cube:
  ◦ Link two fact tables in a cube
  ◦ Define new primary key for Dim Time
  ◦ Define calculated measures
  ◦ Relate dimensions to measures
T-SQL Expressions for DS View Definition - Manufacture
YearOfManufacture
CONVERT(char(4), YEAR(DateOfManufacture))
QuarterOfManufacture
CONVERT(char(4), YEAR(DateOfManufacture)) +
CASE
  WHEN MONTH(DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1'
  WHEN MONTH(DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2'
  WHEN MONTH(DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3'
  ELSE 'Q4'
END
MonthOfManufacture
CONVERT(char(4), YEAR(DateOfManufacture)) +
RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfManufacture)), 2)
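To see what keys these expressions produce, the same year, quarter, and month logic can be written as a small Python function. It is a sketch for checking values by hand, not part of the data source view; the function name period_keys is hypothetical.

```python
from datetime import date


def period_keys(d):
    """Return the (year, quarter, month) key strings for a date."""
    year = f"{d.year:04d}"                          # CONVERT(char(4), YEAR(...))
    quarter = f"{year}Q{(d.month - 1) // 3 + 1}"    # months 1-3 give Q1, etc.
    month = f"{year}{d.month:02d}"                  # zero-padded, like RIGHT('0'+..., 2)
    return year, quarter, month


print(period_keys(date(2005, 8, 15)))
```

Prefixing the quarter and month with the year keeps the keys unique across years, so they can serve as attribute keys in the time dimension.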
T-SQL Expressions for DS View Definition - Inventory
YearOfInventory
CONVERT(char(4), YEAR(DateOfInventory))
QuarterOfInventory
CONVERT(char(4), YEAR(DateOfInventory)) +
CASE
  WHEN MONTH(DateOfInventory) BETWEEN 1 AND 3 THEN 'Q1'
  WHEN MONTH(DateOfInventory) BETWEEN 4 AND 6 THEN 'Q2'
  WHEN MONTH(DateOfInventory) BETWEEN 7 AND 9 THEN 'Q3'
  ELSE 'Q4'
END
MonthOfInventory
CONVERT(char(4), YEAR(DateOfInventory)) +
RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfInventory)), 2)
Data Source View
New columns
Select Measures Page
Uncheck ManufactureFact Count
Review New Dimensions Page
Rename ManufacturingFact to Dim Time
The finished cube
New dimension created from the fact table
Cube Structure
MEASURES
Facts
Measurements associated with a specific business process.
Types of measures
  ◦ Most facts are additive (calculative), such as sum; others are semi-additive (those that can be added along some dimensions but not along others), non-additive (such as max, average), or descriptive (e.g. factless fact table).
Many facts can be derived from other facts, so non-additive facts can be avoided by calculating them from additive facts.
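A short worked example shows why a ratio is better derived from additive facts than stored and averaged. The batch figures are hypothetical; the column names follow the fact table.

```python
def percent_rejected(rows):
    """Derive a non-additive ratio from two additive facts."""
    produced = sum(r["NumberProduced"] for r in rows)
    rejected = sum(r["NumberRejected"] for r in rows)
    return rejected / produced if produced else 0.0


# Hypothetical batches of very different sizes.
batches = [
    {"NumberProduced": 100, "NumberRejected": 10},  # 10% rejected
    {"NumberProduced": 900, "NumberRejected": 0},   # 0% rejected
]

overall = percent_rejected(batches)  # ratio of the sums
naive = sum(
    r["NumberRejected"] / r["NumberProduced"] for r in batches
) / len(batches)                     # average of per-batch ratios
print(overall, naive)
```

The ratio of the sums stays correct at any level of aggregation, while the average of per-batch ratios overweights the small batch.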
Calculated measures
The definition of a calculated measure is stored in the OLAP cube itself. The actual values that result from a calculated measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation.
The expressions for the calculation are written in a language known as Multidimensional Expressions (MDX). MDX is different from T-SQL: it is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis, which T-SQL does not provide.
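The store-the-definition, compute-on-query, then cache behaviour described here can be pictured with a toy Python class. This is only a mental model and not how Analysis Services is implemented; the class name, the cell key, and the fact values are all hypothetical.

```python
class CalculatedMeasure:
    """Toy model: the formula is stored up front and run on first query."""

    def __init__(self, formula):
        self.formula = formula  # the stored definition
        self.cache = {}         # results keyed by the queried cell
        self.evaluations = 0    # how many times the formula actually ran

    def query(self, cell, facts):
        if cell not in self.cache:  # compute only when first requested
            self.cache[cell] = self.formula(facts)
            self.evaluations += 1
        return self.cache[cell]     # later requests reuse the cached value


measure = CalculatedMeasure(lambda f: f["rejected"] / f["produced"])
facts = {"produced": 1000, "rejected": 10}
first_answer = measure.query(("2005", "Plant 1"), facts)
second_answer = measure.query(("2005", "Plant 1"), facts)
print(first_answer, second_answer, measure.evaluations)
```

The second query returns the same value without running the formula again, which is the benefit the cube's cache gives to later users.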
Define the format string “#,#” for measures: AcceptedProducts, RejectedProducts
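As a rough picture of what such a format string does to a measure, the Python sketch below renders a number with thousands separators and no decimal places. It is an approximation for non-zero values only, and the function name format_measure is hypothetical.

```python
def format_measure(value):
    """Render a number with thousands separators and no decimal places."""
    return f"{round(value):,}"


print(format_measure(1234567.4))
```

Formatting affects only how the measure is displayed in the browser; the stored value is unchanged.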
Defining a format string
Define Calculated Measures
DIMENSIONS
Managing Dimensions
Relating Dimensions to Measure Groups
Completed Dimension Definitions
Types of Dimensions
Fact dimensions: dimensions created from attributes in a fact table
Parent-child dimensions: built on a table containing a self-referential relationship, such as a parent attribute
Role-playing dimensions: related to the same measure group multiple times; each relationship represents a different role the dimension plays. For example, a time dimension plays three different roles: date of sale, date of shipment, and date of payment
Reference dimensions: not related directly to the measure group but to another regular dimension, which is in turn related to the measure group
Data mining dimensions: the information discovered by data mining
Many-to-many dimensions: e.g. multiple ship-to addresses
Slowly changing dimensions
  ◦ Type 1 SCD – no history tracking
  ◦ Type 2 SCD – tracks the entire history, adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status
  ◦ Type 3 SCD – similar to Type 2 SCD but tracks only the current state and the original state; two additional attributes: SCD Start Date, SCD Initial Value
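A Type 2 change can be sketched as a small Python function that closes the current row and appends a new one, using the four tracking attributes listed above. The status values "Current" and "Expired", the Plant attribute, and the sample dates are assumptions made for illustration.

```python
from datetime import date


def apply_type2_change(history, original_id, new_attrs, change_date):
    """Close the current row for original_id and append a new current row."""
    for row in history:
        if row["SCD Original ID"] == original_id and row["SCD Status"] == "Current":
            row["SCD End Date"] = change_date  # the old version ends here
            row["SCD Status"] = "Expired"
    history.append({
        "SCD Original ID": original_id,
        "SCD Start Date": change_date,
        "SCD End Date": None,                  # open-ended until the next change
        "SCD Status": "Current",
        **new_attrs,
    })
    return history


# Hypothetical dimension member whose Plant attribute changes.
history = [{
    "SCD Original ID": 7,
    "SCD Start Date": date(2004, 1, 1),
    "SCD End Date": None,
    "SCD Status": "Current",
    "Plant": "A",
}]
apply_type2_change(history, 7, {"Plant": "B"}, date(2005, 6, 1))
print(len(history), history[0]["SCD Status"], history[1]["Plant"])
```

A Type 1 change would instead overwrite the Plant value in place, which is why it keeps no history.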
CUBE DESIGN TABS
Understanding the Cube Designer Tabs
Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user.
Translations: Use this tab to create and manage translated names for cube objects, such as month or product names.
Browser: Use this tab to view data in the cube.
ISQS 6339, Data Mgmt & Business Intelligence
Key Performance Indicators (KPIs)
Digital dashboard
Creating a KPI
The MDX expression for KPI Status Expression (MaxMinManufacturingDM)
Case
  When ROUND([Measures].[percent Rejected],4) < 0.0103 Then 1
  When ROUND([Measures].[percent Rejected],4) >= 0.0103 AND ROUND([Measures].[percent Rejected],4) < 0.0104 Then .5
  When ROUND([Measures].[percent Rejected],4) >= 0.0104 AND ROUND([Measures].[percent Rejected],4) < 0.0105 Then 0
  When ROUND([Measures].[percent Rejected],4) >= 0.0105 AND ROUND([Measures].[percent Rejected],4) < 0.0106 Then -.5
  Else -1
End
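The banding logic of the status expression can be checked with a Python equivalent. This sketch assumes each band runs from its lower threshold up to, but not including, the next threshold, so that every rejection rate falls into exactly one band; the function name kpi_status is hypothetical.

```python
def kpi_status(percent_rejected):
    """Map a rejection rate to a KPI status between 1 (good) and -1 (bad)."""
    value = round(percent_rejected, 4)
    if value < 0.0103:
        return 1
    if value < 0.0104:   # 0.0103 <= value < 0.0104
        return 0.5
    if value < 0.0105:   # 0.0104 <= value < 0.0105
        return 0
    if value < 0.0106:   # 0.0105 <= value < 0.0106
        return -0.5
    return -1


for rate in (0.0100, 0.0103, 0.0104, 0.0105, 0.0106):
    print(rate, kpi_status(rate))
```

Because the checks run in order, each branch only needs the upper bound; the lower bound is implied by the previous branch having failed.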
Calculated measure
KPI definition and deployment
KPI Browser
Browser View
Actions
Instructions stored inside the cube
Allow the OLAP cubes to “reach out and touch someone.”
Enable us to define commands, statements, and directives that are to be executed outside of the cube
Linked to certain objects in the cube and presented as a menu when a user is browsing those objects. The user can select one of these actions to accomplish certain tasks.
Types of Actions
Action
  ◦ Dataset
  ◦ Proprietary
  ◦ Rowset - Retrieve a rowset.
  ◦ Statement
  ◦ URL
Drillthrough Action. Defines a dataset to be returned as a drillthrough to a more detailed level
Report Action. Launches a SQL Server 2005 Reporting Services report
Defining Actions
Perspectives
Translations
Q & A
Conceptual level
  ◦ What is the rationale behind the structure of “Data Source”, “Data Source View”, and “Cube”?
  ◦ Why is the time dimension so important in a data mart?
  ◦ Why are multi-level dimensions, such as Material-MachineType-Machine in MaxMinManufacturingDM, useful?
  ◦ Why do you need to change the primary key of DimTime after it was created from the MaxMinManufacturingFact table?
  ◦ Can you summarize the main differences between a regular database design and a data mart design?
Technical level
  ◦ After you make changes in a data source node, why do you have to check “Mapping” in the data destination node again?
  ◦ When there is a red wavy line under an object, such as a table during cube design, what does it imply? How do you resolve it? Specifically, when a fact table has such a problem, how can it be fixed?
  ◦ Why do not all dimensions appear in the cube structure diagram?
  ◦ What is the difference between variable names in the format Name and [Name]?
  ◦ Do you understand the parameters configured in the data flow tasks, such as those in data sources, data destinations, the Aggregate node, the Derived Column node, etc.?
Any other questions?
Data Mart Application Development Debugging
Problem 0: You cannot find your database entry.
Problem 1: The source node is red after running a data flow task
  ◦ Causes?
Problem 2: The destination node is red after running a data flow task
  ◦ Causes?
Problem 3: Even though you redefined the source node, the problem remains.
Open problems
  ◦ What are frequently encountered problems in ETL application implementation?
  ◦ What are the problems you encountered in building a cube?