Lecture 7

10
Ahsan Abdullah Ahsan Abdullah 1 Data Warehousing Data Warehousing Lecture-7 Lecture-7 De-normalization De-normalization Virtual University of Virtual University of Pakistan Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: [email protected]

description

Data Ware Housing

Transcript of Lecture 7

Page 1: Lecture 7

Ahsan AbdullahAhsan Abdullah

11

Data Warehousing Data Warehousing Lecture-7Lecture-7

De-normalizationDe-normalization

Virtual University of PakistanVirtual University of Pakistan

Ahsan AbdullahAssoc. Prof. & Head

Center for Agro-Informatics Researchwww.nu.edu.pk/cairindex.asp

National University of Computers & Emerging Sciences, IslamabadEmail: [email protected]

Page 2: Lecture 7

Ahsan AbdullahAhsan Abdullah

22

De-normalizationDe-normalization

Page 3: Lecture 7

Ahsan AbdullahAhsan Abdullah

33

Striking a balance between “good” & “evil”Striking a balance between “good” & “evil”

Flat Table

Data Lists

Data Cubes 1st Normal Form

2nd Normal Form

3rd Normal Form

4+ Normal Forms

NormalizationDe-normalization

One big flat file

Too many tables

Page 4: Lecture 7

Ahsan AbdullahAhsan Abdullah

44

What is De-normalization?What is De-normalization?

It is not chaos, more like a “controlled crash” with the aim of performance enhancement without loss of information.

Normalization is a rule of thumb in DBMS, but in DSS ease of use is achieved by way of denormalization.

De-normalization comes in many flavors, such as combining tables, splitting tables, adding data etc., but all done very carefully.

Page 5: Lecture 7

Ahsan AbdullahAhsan Abdullah

55

Why De-normalization In DSS?Why De-normalization In DSS? Bringing “close” dispersed but related data items.Bringing “close” dispersed but related data items.

Query performance in DSS significantly Query performance in DSS significantly dependent on physical data model.dependent on physical data model.

Very early studies showed performance Very early studies showed performance difference in orders of magnitude for different difference in orders of magnitude for different number de-normalized tables and rows per table.number de-normalized tables and rows per table.

The level of de-normalization should be carefully The level of de-normalization should be carefully considered.considered.

Page 6: Lecture 7

Ahsan AbdullahAhsan Abdullah

66

How De-normalization improves performance?How De-normalization improves performance?

De-normalization specifically improves performance by either:

Reducing the number of tables and hence the reliance on joins, which consequently speeds up performance.

Reducing the number of joins required during query execution, or

Reducing the number of rows to be retrieved from the Primary Data Table.

Page 7: Lecture 7

Ahsan AbdullahAhsan Abdullah

77

4 Guidelines for De-normalization4 Guidelines for De-normalization

1. Carefully do a cost-benefit analysis (frequency of use, additional storage, join time).

2. Do a data requirement and storage analysis.

3. Weigh against the maintenance issue of the redundant data (triggers used).

4. When in doubt, don’t denormalize.

Page 8: Lecture 7

Ahsan AbdullahAhsan Abdullah

88

Areas for Applying De-Normalization TechniquesAreas for Applying De-Normalization Techniques

Dealing with the abundance of star schemas.

Fast access of time series data for analysis.

Fast aggregate (sum, average etc.) results and complicated calculations.

Multidimensional analysis (e.g. geography) in a complex hierarchy.

Dealing with few updates but many join queries.

De-normalization will ultimately affect the database size and query performance.

Page 9: Lecture 7

Ahsan AbdullahAhsan Abdullah

99

Five principal De-normalization techniquesFive principal De-normalization techniques

1. Collapsing Tables. - Two entities with a One-to-One relationship. - Two entities with a Many-to-Many relationship.

2. Splitting Tables (Horizontal/Vertical Splitting).

3. Pre-Joining.

4. Adding Redundant Columns (Reference Data).

5. Derived Attributes (Summary, Total, Balance etc).

Page 10: Lecture 7

Ahsan AbdullahAhsan Abdullah

1010

Collapsing TablesCollapsing TablesColA ColB

ColA ColCnorm

aliz

ed

ColA ColB ColC

denormalized

Reduced storage space.

Reduced update time.

Does not changes business view.

Reduced foreign keys.

Reduced indexing.