ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions Zhangxi Lin...

ISQS 6339, Data Management and Business Intelligence
Cubism – Measures and Dimensions
Zhangxi Lin, Texas Tech University


Outline
Measures
Where we’ve been
Populating fact table
Types of dimensions


Structure and Components of Business Intelligence

[Diagram labels: SSMS, SSIS, SSAS, SSRS, SASEM, SASEG]


Snowflake Schema of the Data Mart

[Diagram tables: ManufacturingFact, DimProduct, DimProductSubType, DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, DimCountry]


Where we’ve been and where we are now
Exercise 1: Getting started
Exercise 2: Creating a data mart with SSMS
Exercise 3: Creating data mart with BIDS
Exercise 4: Populating dimensions of a data mart
Exercise 5: Loading fact tables
Exercise 6: Create and customize a cube


What do we need to do with the half-done data mart?
Populate DimBatch dimension table
Populate ManufacturingFact table
Build an OLAP cube (we already did this before)
Check measures
Check dimensions


MEASURES


Facts
Facts are measurements associated with a specific business process.
Many facts can be derived from other facts, including additive and semiadditive facts.
Non-additive facts can be avoided by calculating them from additive facts.
Measures are clustered together in a group, called a measure group.


Types of measures
Three types
◦ Additive measures. Most facts are additive: they can be summed along any dimension.
◦ Semiadditive measures. Measures that can be added along some dimensions but not along others. For example, inventory level can be added along the product dimension but not along the time dimension.
◦ Non-additive measures (such as max, average), or descriptive (e.g. factless fact table).
Aggregate functions
◦ Additive: Sum
◦ Semiadditive: ByAccount, Count, FirstChild, FirstNonEmpty, LastChild, LastNonEmpty, Max, Min
◦ Nonadditive: DistinctCount, None
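The inventory example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Analysis Services computes aggregates; the snapshot rows and function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical inventory snapshots: (month, product, inventory_level).
snapshots = [
    ("2015-01", "A", 10),
    ("2015-01", "B", 5),
    ("2015-02", "A", 7),
    ("2015-02", "B", 9),
]

def total_by_month(rows):
    """Sum along the product dimension, which is valid for inventory level."""
    totals = defaultdict(int)
    for month, _product, level in rows:
        totals[month] += level
    return dict(totals)

def last_level_by_product(rows):
    """Along time, keep the latest snapshot instead of summing (similar in spirit to LastChild)."""
    latest = {}
    for _month, product, level in sorted(rows):
        latest[product] = level  # later months overwrite earlier ones
    return latest

by_month = total_by_month(snapshots)
closing = last_level_by_product(snapshots)
```

Summing product A over both months would give 17, a number that never existed in the warehouse, which is why the time dimension needs a different aggregate function.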


Measures and dimensions
Dimensions are used to aggregate measures. Therefore, they must be somehow related to measures.
Granularity
◦ Important for the analysis
◦ There could be missing values in the fact table


LOADING FACT TABLES


Exercise 5: Loading Fact Tables
Project name: MMMFactLoad-lastname
Package name: FactLoad.dtsx
Tasks
◦ Create Inventory Fact table
◦ Load Dim Batch
◦ Load Manufacturing Fact
◦ Load Inventory Fact
Deliverable: email a screenshot of the “green” outcome of the ETL project to [email protected], with the subject line “ISQS 6339 EX5 - <lastname>”


Inventory Fact Table
Create a table InventoryFact in your database.
◦ Compound primary key: DateOfInventory, ProductCode, and Material
◦ Define two foreign keys

Column Name         Data Type     Allow Nulls
InventoryLevel      Int           No
NumberOnBackorder   Int           No
DateOfInventory     Datetime      No
ProductCode         Int           No
Material            Varchar(30)   No
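For readers without SQL Server at hand, the table definition can be tried out with Python's built-in sqlite3 module. This is a rough sketch under assumptions: SQLite types stand in for the SQL Server ones, the sample row is invented, and the two foreign keys are left out because the slide does not say which columns they reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE InventoryFact (
        InventoryLevel    INTEGER     NOT NULL,
        NumberOnBackorder INTEGER     NOT NULL,
        DateOfInventory   TEXT        NOT NULL,  -- stands in for SQL Server datetime
        ProductCode       INTEGER     NOT NULL,
        Material          VARCHAR(30) NOT NULL,
        PRIMARY KEY (DateOfInventory, ProductCode, Material)
    )
    """
)

row = (120, 3, "2015-01-31", 1, "Clay")  # made-up values
conn.execute("INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)", row)

def insert_is_rejected(connection, values):
    """Return True when the compound primary key blocks a duplicate row."""
    try:
        connection.execute("INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)", values)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_rejected = insert_is_rejected(conn, row)
```

The second insert repeats the same date, product code, and material, so the compound key rejects it even though no single column is unique on its own.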


Data Sources for Loading Fact
For loading DimBatch table and ManufacturingFact table
◦ BatchInfo.CSV
For loading InventoryFact table
◦ Lin.OrderProcessingSystem Database


Control Flow for Loading Facts and the Remaining Dimension

Note: to ease debugging, you may use three packages and test them one by one, instead of doing everything in one package


Flat File Connection
Data types
◦ BatchNumber, MachineNumber: four-byte signed integer [DT_I4]
◦ ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]
◦ TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp]
Check only BatchNumber as the input for Dim Batch
All columns are needed for fact tables
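Outside SSIS, the same typing of the flat file can be sketched in Python. The column names follow the slide (with MachineNumber spelled out), but the sample row, the timestamp format, and the presence of a header row are assumptions, since the actual layout of BatchInfo.CSV is not shown here.

```python
import csv
import io
from datetime import datetime

INT_COLUMNS = ["BatchNumber", "MachineNumber", "ProductCode",
               "NumberProduced", "NumberRejected"]
TIME_COLUMNS = ["TimeStarted", "TimeStopped"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed; the real file may differ

def parse_batch_rows(text):
    """Read CSV text and convert each column to its declared type."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: int(raw[name]) for name in INT_COLUMNS}
        for name in TIME_COLUMNS:
            row[name] = datetime.strptime(raw[name], TIME_FORMAT)
        rows.append(row)
    return rows

# Invented sample data in the assumed layout.
sample = (
    "BatchNumber,MachineNumber,ProductCode,NumberProduced,NumberRejected,"
    "TimeStarted,TimeStopped\n"
    "1,2,3,100,4,2015-03-01 08:00:00,2015-03-01 09:30:00\n"
)
rows = parse_batch_rows(sample)
```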


Some Frequently Used Nodes


Load DimBatch Data Flow


Note: Because of duplication in the source file, we may insert an Aggregate item after the Flat File Source item.


The Flat File Source


Sort Transformation

In the Aggregate item, define “Group by” BatchNumber.

In the Derived Column item, define BatchName from BatchNumber.

Use the expression (DT_WSTR, 50)[BatchNumber] to change the data type of BatchName.
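What the Aggregate and Derived Column items accomplish together can be sketched in Python as follows. The function name and the input list are hypothetical, and sorting the result is a choice made here for readability rather than something the Aggregate item guarantees.

```python
def build_dim_batch(batch_numbers):
    """Group by BatchNumber, then derive BatchName as its string form."""
    unique_numbers = sorted(set(batch_numbers))  # plays the role of the Aggregate item
    return [
        # str(...)[:50] mimics the (DT_WSTR, 50) cast to a string of at most 50 characters
        {"BatchNumber": n, "BatchName": str(n)[:50]}
        for n in unique_numbers
    ]

# Duplicated batch numbers, as they would appear once per source row.
dim_batch = build_dim_batch([3, 1, 3, 2, 1])
```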


Load Fact Data Flow


Derived Columns for the Fact table


Expressions for the Derived Columns
AcceptedProducts
◦ [NumberProduced] - [NumberRejected]
ElapsedTimeForManufacture
◦ DATEDIFF("mi", [TimeStarted], [TimeStopped])
DateOfManufacture
◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)[TimeStarted],1,10)
This expression converts TimeStarted into a string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.
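To check the three derived columns against known inputs, the same logic can be written in Python. This is a sketch, not the SSIS implementation: the function name and sample values are invented, and the minute calculation assumes DATEDIFF counts minute boundaries crossed rather than rounding elapsed seconds.

```python
from datetime import datetime

def derive_fact_columns(number_produced, number_rejected, time_started, time_stopped):
    """Compute the three derived columns for one source row."""
    accepted = number_produced - number_rejected
    # Assumption: DATEDIFF("mi", ...) counts minute boundaries crossed,
    # so seconds are dropped from both timestamps before subtracting.
    start_minute = time_started.replace(second=0, microsecond=0)
    stop_minute = time_stopped.replace(second=0, microsecond=0)
    elapsed_minutes = int((stop_minute - start_minute).total_seconds() // 60)
    # Keep the date and zero the time portion, as the SUBSTRING cast does.
    date_of_manufacture = datetime(time_started.year, time_started.month, time_started.day)
    return {
        "AcceptedProducts": accepted,
        "ElapsedTimeForManufacture": elapsed_minutes,
        "DateOfManufacture": date_of_manufacture,
    }

derived = derive_fact_columns(
    100, 4,
    datetime(2015, 3, 1, 8, 0, 30),
    datetime(2015, 3, 1, 9, 30, 10),
)
```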


OLE DB Destination for loading the fact table


Load Inventory Fact
OLE DB Source
◦ OrderProcessingSystem.InventoryFact
OLE DB Destination
◦ MaxMinManufacturingDM-lastname.InventoryFact
No transformation
There are two ways to load the table
◦ Create the table and use ETL to load it
◦ Import directly from the source to the database MaxMinManufacturingDM-lastname


Debugging Results

[Screenshots: Loading DimBatch; Loading ManufacturingFact]


BUILDING AN OLAP CUBE


Exercise 6: Design a Cube
Project name: ISQS6339_EX6_2015_lastname
Tasks
◦ Add new date items (year, quarter, and month) to the two fact tables
◦ Create time dimension using Manufacturing Fact table
◦ Define calculated measures (Total Products, Percent Rejected)
◦ Define hierarchies of attributes in dimension tables
◦ Create a cube from the MaxMinManufacturing data mart with hierarchical date dimension
Deliverable:
◦ Screenshots: dimension hierarchies, dimensions, relationships of facts and dimensions, deployment result, format of measures, and browsing results.


Three Steps to Create a Cube from Data Sources
Defining data source
Defining data source view
◦ Add in three new columns of year, quarter, and month for the two fact tables
Building a cube
◦ Define a new dimension Dim Time from Manufacturing Fact table
Customize the cube:
◦ Link two fact tables in a cube
◦ Define new primary key for Dim Time
◦ Define calculated measures
◦ Relate dimensions to measures


T-SQL Expressions for DS View Definition - Manufacture

YearOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture))

QuarterOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture)) +
    CASE
        WHEN MONTH(DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1'
        WHEN MONTH(DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2'
        WHEN MONTH(DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3'
        ELSE 'Q4'
    END

MonthOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture)) +
    RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfManufacture)), 2)
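To see what strings these expressions produce, here is an equivalent sketch in Python. The function name is made up; the output formats mirror the T-SQL: a four-character year, the year followed by Q1 to Q4, and the year followed by a zero-padded month.

```python
from datetime import date

def date_keys(d):
    """Return (year, quarter, month) strings in the same shape as the T-SQL expressions."""
    year = f"{d.year:04d}"
    quarter = f"{year}Q{(d.month - 1) // 3 + 1}"  # months 1-3 -> Q1, 4-6 -> Q2, ...
    month = f"{year}{d.month:02d}"                # RIGHT('0' + month, 2) zero-pads
    return year, quarter, month

keys = date_keys(date(2015, 8, 17))
```

Putting the year in front of the quarter and month keeps each value unique across years, which is what allows them to serve as levels in a year, quarter, month hierarchy.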


T-SQL Expressions for DS View Definition - Inventory

YearOfInventory
    CONVERT(char(4), YEAR(DateOfInventory))

QuarterOfInventory
    CONVERT(char(4), YEAR(DateOfInventory)) +
    CASE
        WHEN MONTH(DateOfInventory) BETWEEN 1 AND 3 THEN 'Q1'
        WHEN MONTH(DateOfInventory) BETWEEN 4 AND 6 THEN 'Q2'
        WHEN MONTH(DateOfInventory) BETWEEN 7 AND 9 THEN 'Q3'
        ELSE 'Q4'
    END

MonthOfInventory
    CONVERT(char(4), YEAR(DateOfInventory)) +
    RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfInventory)), 2)


Data Source View

[Screenshot callout: New columns]


Select Measures Page

Uncheck ManufactureFact Count


The finished cube


Cube Structure


Defining a format string


Inventory measures

“Number on Backorder” is also set with these two parameters


Calculated measures – made-up facts
The definition of a calculated measure is stored in the OLAP cube itself.
The actual values that result from a calculated measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation.

The expressions of calculation are created using a language known as Multidimensional Expressions (MDX) script. MDX is different from T-SQL. It is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis, which T-SQL lacks.
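The evaluate-on-first-query, then-cache behavior described above can be illustrated with a small Python sketch. It is not MDX and does not show how Analysis Services stores or caches anything internally; the class, the cache key, and the Percent Rejected formula are all assumptions made for the example.

```python
class CalculatedMeasure:
    """Stores a formula; evaluates it on first request per key and caches the result."""

    def __init__(self, formula):
        self.formula = formula   # only the definition is stored up front
        self.cache = {}
        self.evaluations = 0     # counts how often the formula actually ran

    def value(self, key, facts):
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = self.formula(facts)
        return self.cache[key]

def percent_rejected(facts):
    # Assumed formula; the slides name the measure but do not define it.
    total = facts["accepted"] + facts["rejected"]
    return 100.0 * facts["rejected"] / total

measure = CalculatedMeasure(percent_rejected)
facts = {"accepted": 96, "rejected": 4}   # invented numbers
first = measure.value("2015Q1", facts)    # computed now
second = measure.value("2015Q1", facts)   # served from the cache
```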


DIMENSIONS in SQL Server


Types of Dimensions
Fact dimensions: dimensions created from attributes in a fact table
Parent-child dimensions: built on a table containing a self-referential relationship, such as a parent attribute
Role-playing dimensions: related to the same measure group multiple times; each relationship represents a different role the dimension plays. For example, a time dimension plays three different roles: date of sale, date of shipment, and date of payment.
◦ To create a role-playing dimension, add the dimension to the Dimension Usage tab multiple times. Then create a relationship between each instance of the dimension and the measure group.
Reference dimensions: not related directly to the measure group but to another regular dimension, which is in turn related to the measure group
Data mining dimensions: the information discovered by data mining
Many-to-many dimensions: e.g. multiple ship-to addresses
Slowly changing dimensions
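The role-playing idea from the list above (one date dimension used as date of sale, date of shipment, and date of payment) can be sketched in Python. All keys, column names, and values below are hypothetical; the point is only that a single dimension table is looked up once per role.

```python
# One shared date dimension, keyed by a made-up integer date key.
dim_date = {
    20150301: {"Year": "2015", "Month": "201503"},
    20150305: {"Year": "2015", "Month": "201503"},
    20150402: {"Year": "2015", "Month": "201504"},
}

# A single fact row that references the dimension three times.
sale = {
    "SaleDateKey": 20150301,
    "ShipDateKey": 20150305,
    "PaymentDateKey": 20150402,
    "Amount": 250,
}

# Each role maps to its own foreign-key column in the fact row.
ROLES = {
    "Date of Sale": "SaleDateKey",
    "Date of Shipment": "ShipDateKey",
    "Date of Payment": "PaymentDateKey",
}

def resolve_roles(fact_row, dimension, roles):
    """Look up the same dimension table once per role."""
    return {role: dimension[fact_row[key_column]] for role, key_column in roles.items()}

resolved = resolve_roles(sale, dim_date, ROLES)
```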


Slowly changing dimensions
Type 1 SCD – no history tracked; old values are overwritten
Type 2 SCD – tracks the entire history by adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status
Type 3 SCD – similar to Type 2 SCD but tracks only the current state and the original state; two additional attributes: SCD Start Date, SCD Initial Value
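The difference between Type 1 and Type 2 can be made concrete with a short Python sketch. The attribute names come from the slide, but the "Value" column, the status labels "Current" and "Expired", and the sample data are assumptions for illustration only.

```python
from datetime import date

def apply_type1(rows, original_id, new_value):
    """Type 1: overwrite in place, so no history survives."""
    for row in rows:
        if row["SCD Original ID"] == original_id:
            row["Value"] = new_value

def apply_type2(rows, original_id, new_value, change_date):
    """Type 2: close the current row and append a new current row."""
    for row in rows:
        if row["SCD Original ID"] == original_id and row["SCD Status"] == "Current":
            row["SCD End Date"] = change_date
            row["SCD Status"] = "Expired"   # assumed status label
    rows.append({
        "SCD Original ID": original_id,
        "Value": new_value,
        "SCD Start Date": change_date,
        "SCD End Date": None,
        "SCD Status": "Current",            # assumed status label
    })

overwritten = [{"SCD Original ID": 7, "Value": "Texas"}]
apply_type1(overwritten, 7, "Ohio")

history = [{
    "SCD Original ID": 7,
    "Value": "Texas",
    "SCD Start Date": date(2014, 1, 1),
    "SCD End Date": None,
    "SCD Status": "Current",
}]
apply_type2(history, 7, "Ohio", date(2015, 6, 1))
```

After the Type 2 update the table holds two rows for the same original ID, so a fact dated 2014 can still be reported against the old value.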


Add a time dimension (a fact dimension)


Rename time dimension


Date Hierarchy


Material Hierarchy & Plant Hierarchy


Product Hierarchy


Relating Dimensions in the Cube