Database Systems: Design, Implementation, and Management

BTM 382 Database Management Chapter 6: Normalization of Database Tables Chitu Okoli Associate Professor in Business Technology Management John Molson School of Business, Concordia University, Montréal


Transcript of Database Systems: Design, Implementation, and Management

Page 1: Database Systems:  Design, Implementation, and  Management

BTM 382 Database Management

Chapter 6: Normalization of Database Tables

Chitu Okoli, Associate Professor in Business Technology Management

John Molson School of Business, Concordia University, Montréal

Page 2: Database Systems:  Design, Implementation, and  Management

Problems with unnormalized tables

Needless redundancy, hence insert, update and delete anomalies (inconsistencies)

Data updates are less efficient because tables are larger

Indexing is more cumbersome

No simple strategies for creating views (virtual tables)
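The anomalies listed above can be made concrete with a minimal Python sketch. The table, its column names and its values are hypothetical and chosen only for illustration; the point is that a fact stored on several rows can drift out of sync, and that deleting a row can erase an unrelated fact.

```python
# Hypothetical unnormalized table: the project name is repeated on every
# assignment row, so one fact is stored in several places.
assignments = [
    {"project_id": 1, "project_name": "Apollo", "employee_id": 101},
    {"project_id": 1, "project_name": "Apollo", "employee_id": 102},
    {"project_id": 2, "project_name": "Hermes", "employee_id": 101},
]


def names_per_project(rows):
    """Map each project_id to the set of names recorded for it."""
    names = {}
    for row in rows:
        names.setdefault(row["project_id"], set()).add(row["project_name"])
    return names


# Update anomaly: renaming the project on only one row leaves two names
# on record for project 1.
assignments[0]["project_name"] = "Apollo II"
inconsistent = {
    pid for pid, names in names_per_project(assignments).items() if len(names) > 1
}

# Delete anomaly: removing the only assignment row for project 2 also
# erases the fact that project 2 exists and is called "Hermes".
remaining = [
    row for row in assignments
    if not (row["project_id"] == 2 and row["employee_id"] == 101)
]
projects_left = set(names_per_project(remaining))
```

After these steps, `inconsistent` holds the projects with conflicting names and `projects_left` no longer contains project 2.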

Page 3: Database Systems:  Design, Implementation, and  Management

Understanding dependencies to be able to properly normalize tables

Page 4: Database Systems:  Design, Implementation, and  Management

Functional dependency

Functional dependency: A→B or (A,B)→(C,D)

B is functionally dependent on A when knowing the value of A gives you exactly one correct value of B

E.g. Project.ID → Project.Name

Also called determination: “A determines B”

Full functional dependency: (A,B)→C where A↛C and B↛C

All the attributes in the key are required for the determination (none is optional)

E.g. (Project.ID, Project.Manager) → Project.Name: Project.Manager is optional because Project.ID alone determines Project.Name, so this is not a full functional dependency

E.g. (Project.Manager, Project.StartDate) → Project.Name: this is a full functional dependency, assuming a manager can launch no more than one project on a given date
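As a rough illustration of these two definitions, the sketch below tests dependencies against sample rows. The function names `holds` and `is_full` and the sample data are assumptions made for this example. Note that a check against data can only show that a dependency is not contradicted by the rows at hand; whether it is a real rule is a design decision.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, list(subset), rhs):
                return False
    return True


# Hypothetical project rows.
projects = [
    {"id": 1, "manager": "Ana", "start": "2024-01-10", "name": "Apollo"},
    {"id": 2, "manager": "Ana", "start": "2024-03-01", "name": "Hermes"},
    {"id": 3, "manager": "Raj", "start": "2024-01-10", "name": "Zephyr"},
]

id_determines_name = holds(projects, ["id"], ["name"])
id_manager_is_full = is_full(projects, ["id", "manager"], ["name"])
manager_start_is_full = is_full(projects, ["manager", "start"], ["name"])
```

With this data, `(id, manager) → name` is not full because `id` alone is enough, while `(manager, start) → name` is full because neither attribute determines the name on its own.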

Page 5: Database Systems:  Design, Implementation, and  Management

Repeating group (multivalued attribute) and multivalued dependency

Repeating group = multivalued attribute

Attribute whose values contain multiple values (a list or array of values), instead of a single value

Illegal in the relational model; troublesome for normalization if you don’t catch it

Functional dependency: A→B

Multivalued dependency: A→B1/B2/B3/…/Bn

Instead of determining just one value of B in a table, A determines multiple values at the same time

E.g. Project.ID → Project.EmployeeID

Usually indicates a problem with normalization
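One common way to remove a repeating group is to move it into its own table, one value per row. The sketch below assumes hypothetical project rows whose `employee_ids` cell holds a list; the helper name `split_repeating_group` and its parameters are made up for this example.

```python
# Hypothetical rows with a multivalued attribute: employee_ids is a list.
projects = [
    {"id": 1, "name": "Apollo", "employee_ids": [101, 102]},
    {"id": 2, "name": "Hermes", "employee_ids": [101]},
]


def split_repeating_group(rows, key, multivalued, single):
    """Return (parent_rows, child_rows) with exactly one value per cell.

    key         -- column that identifies a parent row
    multivalued -- column that holds a list of values
    single      -- column name used for one value in the child table
    """
    parents, children = [], []
    for row in rows:
        # The parent keeps every column except the multivalued one.
        parents.append({k: v for k, v in row.items() if k != multivalued})
        # The child gets one (key, value) row per list element.
        for value in row[multivalued]:
            children.append({key: row[key], single: value})
    return parents, children


project_rows, assignment_rows = split_repeating_group(
    projects, key="id", multivalued="employee_ids", single="employee_id"
)
```

Here `project_rows` keeps one row per project, and `assignment_rows` holds one row per project/employee pair.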

Page 6: Database Systems:  Design, Implementation, and  Management

Partial and transitive dependencies

Partial dependency: (A,B)→(C,D) and B→C

(A,B) is a candidate key (e.g. primary key)

C doesn’t need both A and B to determine it; it only needs B

E.g. (Project.ID, Project.ManagerID) → Project.Name and Project.ID → Project.Name (here Project.ID alone is enough)

Transitive dependency: A→(B,C) and B→C

A is a candidate key

A determines C, but so does B, even though B is not a candidate key

Technically speaking, a transitive dependency requires that B and C not be part of any candidate key. However, if you expand the meaning to cover cases where they are part of a key, then you will avoid BCNF violations automatically

E.g. Project.ID → (Project.Client, Project.Location) and Project.Client → Project.Location
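The two patterns above can be told apart mechanically once the dependencies are written down. The following is a simplified sketch, assuming a table with a single candidate key and using the strict reading of "transitive" (neither side touches the key); the `classify` function and its labels are hypothetical, not a standard algorithm.

```python
def classify(fds, key):
    """Label each dependency relative to one candidate key.

    fds is a list of (determinant, dependents) pairs of frozensets.
    Simplifying assumption: the table has exactly one candidate key.
    """
    labels = []
    for lhs, rhs in fds:
        if lhs == key:
            labels.append("key")          # the whole key determines rhs
        elif lhs < key:
            labels.append("partial")      # only part of the key is needed
        elif not (lhs & key) and not (rhs & key):
            labels.append("transitive")   # non-key determines non-key
        else:
            labels.append("other")
    return labels


# (Project.ID, Project.ManagerID) -> Project.Name and Project.ID -> Project.Name
key1 = frozenset({"ID", "ManagerID"})
partial_example = classify(
    [(key1, frozenset({"Name"})), (frozenset({"ID"}), frozenset({"Name"}))],
    key1,
)

# Project.ID -> (Project.Client, Project.Location) and Project.Client -> Project.Location
key2 = frozenset({"ID"})
transitive_example = classify(
    [
        (key2, frozenset({"Client", "Location"})),
        (frozenset({"Client"}), frozenset({"Location"})),
    ],
    key2,
)
```

Dependencies labelled "other" are the cases that the broader reading of "transitive" described above would also have to handle.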

Page 7: Database Systems:  Design, Implementation, and  Management

The normal forms

Page 8: Database Systems:  Design, Implementation, and  Management

Summary of attaining normal forms

1NF: Primary key identified and no multivalued attributes

Legitimate primary key selected (unique identifying key)

Only one value per table cell; no lists/arrays (multivalued attributes) in any table cell

If you split multivalued attributes off to separate tables, then you avoid 4NF violations

2NF: 1NF minus partial dependencies

All candidate key dependencies are fully functional

(A,B)→C where A↛C and B↛C

3NF/BCNF: 2NF minus transitive dependencies

Only a candidate key determines any attribute

If A→(B,C), then B↛C

There is a technical distinction between 3NF and BCNF, but if you keep this rule, then you take care of both 3NF and BCNF

4NF: BCNF minus multivalued dependencies

Each row strictly describes just one entity

If you split multivalued attributes into separate tables to attain 1NF, then you also avoid 4NF violations

DKNF, 5NF and 6NF violations are relatively rare, and normalizing to these forms is often not worth the trouble even where they apply
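The 3NF/BCNF step can be illustrated with the Project.Client → Project.Location example from the earlier slide. The sketch below uses Python's built-in `sqlite3` module; the table names (`project_flat`, `client`, `project`) and the sample rows are assumptions for illustration. It moves the transitive dependency into its own table and checks that a join gives back the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: id determines client and location, but client also
    -- determines location (a transitive dependency).
    CREATE TABLE project_flat (
        id INTEGER PRIMARY KEY,
        client TEXT NOT NULL,
        location TEXT NOT NULL
    );
    INSERT INTO project_flat VALUES
        (1, 'Acme', 'Montreal'),
        (2, 'Acme', 'Montreal'),
        (3, 'Birch', 'Toronto');

    -- After: every determinant is the key of its own table.
    CREATE TABLE client (
        name TEXT PRIMARY KEY,
        location TEXT NOT NULL
    );
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        client TEXT NOT NULL REFERENCES client(name)
    );
    INSERT INTO client SELECT DISTINCT client, location FROM project_flat;
    INSERT INTO project SELECT id, client FROM project_flat;
""")

# Joining the two smaller tables should reproduce the original rows.
rejoined = conn.execute("""
    SELECT p.id, p.client, c.location
    FROM project AS p
    JOIN client AS c ON c.name = p.client
    ORDER BY p.id
""").fetchall()
original = conn.execute("SELECT * FROM project_flat ORDER BY id").fetchall()
client_count = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
```

Each client's location is now stored once, so changing it means updating a single row in `client`.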

Page 9: Database Systems:  Design, Implementation, and  Management

Dependency diagram: Basic tool for normalization

Depicts all dependencies found in a given table structure

Gives bird’s-eye view of all relationships among table’s attributes

Makes it less likely that you will overlook an important dependency

Page 10: Database Systems:  Design, Implementation, and  Management
Page 11: Database Systems:  Design, Implementation, and  Management
Page 12: Database Systems:  Design, Implementation, and  Management
Page 13: Database Systems:  Design, Implementation, and  Management

3NF vs BCNF

BCNF is only an issue when the primary key was poorly selected at the 1NF step

Regardless, dealing with all dependencies resolves the table into BCNF

Page 14: Database Systems:  Design, Implementation, and  Management

Fixing 4NF problem

The only reason a table might be in 3NF/BCNF but not in 4NF is that two originally multivalued attributes existed at the 1NF stage

Two multivalued attributes should always be placed in separate tables

If you do this in the first step to resolve 1NF, you will never have problems with 4NF
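To see why two multivalued attributes belong in separate tables, consider a small sketch. The employee, skill and language data below are hypothetical and are not taken from the slides. Keeping both attributes in one table forces every pairing to be stored, even though the two facts are unrelated.

```python
from itertools import product

# Hypothetical independent multivalued facts about one employee.
skills = {"E1": ["SQL", "Python"]}
languages = {"E1": ["English", "French", "Spanish"]}

# One combined table must hold every skill/language pairing to stay
# consistent, although skills and languages have nothing to do with
# each other.
combined = [
    (emp, skill, lang)
    for emp in skills
    for skill, lang in product(skills[emp], languages[emp])
]

# Separate tables store each fact exactly once.
employee_skill = [(emp, s) for emp, values in skills.items() for s in values]
employee_language = [(emp, l) for emp, values in languages.items() for l in values]

combined_row_count = len(combined)
separate_row_count = len(employee_skill) + len(employee_language)
```

The combined table grows with the product of the two lists, while the separate tables grow with their sum; adding one more skill to the combined table would require one new row per language.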

Page 15: Database Systems:  Design, Implementation, and  Management

Denormalization

Page 16: Database Systems:  Design, Implementation, and  Management

Denormalization

Although normalization is important, processing speed and efficiency are also important in database design

Page 17: Database Systems:  Design, Implementation, and  Management

Sources

Most of the slides are adapted from Database Systems: Design, Implementation and Management by Carlos Coronel and Steven Morris. 11th edition (2015) published by Cengage Learning. ISBN 13: 978-1-285-19614-5

Other sources are noted on the slides themselves

Page 18: Database Systems:  Design, Implementation, and  Management