Lecture – 5 Course Code – MIS4102. Edgar F. Codd, the inventor of the relational model,...


Introduction

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970.

Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.

Higher normal forms were defined by other theorists in subsequent years. Fagin introduced the Fourth and Fifth Normal Forms (Fagin 1977, 1979).

The most recent is the Sixth Normal Form (6NF), introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.


Why Normalization?

The step-by-step process of identifying and eliminating data redundancies and inconsistencies is called normalization.

Tables are the basic building blocks of the database, so good database design must be matched with good table structures.

Normalization enables us to recognize bad table structures and allows us to create good ones.


Redundancy & Anomalies

Data redundancy = data stored in several places
  Too much data redundancy causes problems: which value is correct?
  Data integrity and consistency suffer

Data anomaly = abnormal data relationships
  Insertion anomaly - Can’t add data because the entire primary key value is not known, e.g., primary key based on first, middle, and last name
  Deletion anomaly - A deletion unintentionally removes other data as well, e.g., delete an employee but lose transaction data
  Update anomaly - One change requires many updates, e.g., if you store customer names in transaction tables


Normalization (Cont…)

Normalization stages are called normal forms (1NF, 2NF and 3NF), each better than the previous one (fewer anomalies, less redundancy).

The highest level is not always the most desirable.

Most professionally designed databases reach third normal form.

Fourth and Fifth normal forms are seldom used.


Non-loss Decomposition

The process of transforming an unnormalized data set into a fully normalized database is frequently referred to as a process of non-loss decomposition.

This is because we continually fragment our data structure into more tables without losing the fundamental relationships between data items.


Normalization Example

To recognize a good design, first look at a bad one.

Example: a construction company manages several projects, and its charges depend on each employee’s position.


Desired Report

Proj No  Proj Name  Emp No  Emp Name       Job Class  Chg/Hr ($)  Hrs Billed  Tot Chg ($)
1        Hurricane  101     Kamal Hossain  Elec Eng   65          13          845
                    102     David Pol      Comm Eng   60          16          960
                    104     Didar Ahmed    Comm Eng   60          19          1,140
                                                                  Sub Total   2,945
2        Coast      101     Kamal Hossain  Elec Eng   65          15          975
                    103     Younus Mia     Asst Eng   55          17          935
                                                                  Sub Total   1,910
3        Satellite  104     Didar Ahmed    Comm Eng   60          18          1,080
                    102     David Pol      Comm Eng   60          14          840
                                                                  Sub Total   1,920
                                                                  Total       6,775
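The totals in the report follow from Chg/Hr × Hrs Billed, summed per project. A minimal sketch of that arithmetic, using the figures from the report (the function and variable names are illustrative only, not part of the lecture):

```python
# Rows taken from the report: (proj_no, emp_no, chg_per_hr, hrs_billed)
report_rows = [
    (1, 101, 65, 13),
    (1, 102, 60, 16),
    (1, 104, 60, 19),
    (2, 101, 65, 15),
    (2, 103, 55, 17),
    (3, 104, 60, 18),
    (3, 102, 60, 14),
]


def project_subtotals(rows):
    """Sum rate * hours for every project number."""
    subtotals = {}
    for proj_no, _emp_no, rate, hrs in rows:
        subtotals[proj_no] = subtotals.get(proj_no, 0) + rate * hrs
    return subtotals


subtotals = project_subtotals(report_rows)
grand_total = sum(subtotals.values())

print(subtotals)    # {1: 2945, 2: 1910, 3: 1920}
print(grand_total)  # 6775
```

Because Tot Chg and the subtotals can always be derived this way, they do not need to be stored as attributes in the tables that follow.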


Table view of the previous report

P_No  P_Name     E_No  E_Name         Job_Class  Chg_Hr  Hrs
1     Hurricane  101   Kamal Hossain  Elec Eng   65      13
                 102   David Pol      Comm Eng   60      16
                 104   Didar Ahmed    Comm Eng   60      19
2     Coast      101   Kamal Hossain  Elec Eng   65      15
                 103   Younus Mia     Asst Eng   55      17
3     Satellite  104   Didar Ahmed    Comm Eng   60      18
                 102   David Pol      Comm Eng   60      14


Problems

P_No is intended to be the primary key but contains null values

Data redundancies
  Invite data inconsistencies (Elect Eng & EE)
  Waste data entry time and storage space

Anomalies
  Update anomaly – modifying Job_Class for E_No 101 requires many alterations
  Insert anomaly – to add a project row we need an employee
  Deletion anomaly – if we delete E_No 101, we delete other vital data too
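The update and deletion anomalies can be made concrete with a small sketch over the flat table. The row layout below is trimmed to the columns that matter here. The choice of E_No 103 for the deletion is an illustrative assumption, since that employee is the only one holding the Asst Eng job class.

```python
# Flat table, trimmed to (P_No, E_No, Job_Class, Chg_Hr)
charges = [
    (1, 101, "Elec Eng", 65),
    (1, 102, "Comm Eng", 60),
    (1, 104, "Comm Eng", 60),
    (2, 101, "Elec Eng", 65),
    (2, 103, "Asst Eng", 55),
    (3, 104, "Comm Eng", 60),
    (3, 102, "Comm Eng", 60),
]

# Update anomaly: every row for employee 101 must be changed together.
rows_to_update = sum(1 for _, e_no, _, _ in charges if e_no == 101)

# Deletion anomaly: removing employee 103 also removes the only record
# of what the "Asst Eng" job class charges per hour.
remaining = [row for row in charges if row[1] != 103]
known_before = {job for _, _, job, _ in charges}
known_after = {job for _, _, job, _ in remaining}
lost_job_classes = known_before - known_after

print(rows_to_update)    # 2
print(lost_job_classes)  # {'Asst Eng'}
```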


Problems (cont…)

The table above has repeating groups: each P_No has a group of entries

P_No  P_Name     E_No  E_Name         Job_Class  Chg_Hr  Hrs
1     Hurricane  101   Kamal Hossain  Elec Eng   65      13
                 102   David Pol      Comm Eng   60      16
                 104   Didar Ahmed    Comm Eng   60      19


Conversion to 1NF

Eliminate repeating groups by adding entries in the primary key column

P_No  P_Name     E_No  E_Name         Job_Class  Chg_Hr  Hrs
1     Hurricane  101   Kamal Hossain  Elec Eng   65      13
1     Hurricane  102   David Pol      Comm Eng   60      16
1     Hurricane  104   Didar Ahmed    Comm Eng   60      19
2     Coast      101   Kamal Hossain  Elec Eng   65      15
2     Coast      103   Younus Mia     Asst Eng   55      17
3     Satellite  104   Didar Ahmed    Comm Eng   60      18
3     Satellite  102   David Pol      Comm Eng   60      14
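Filling in the missing key entries is a simple "fill down" over the repeating groups. The sketch below assumes the blank P_No and P_Name cells are represented as None, and fill_down is a hypothetical helper name:

```python
# Unnormalized rows: P_No and P_Name are blank (None) inside each repeating group.
raw = [
    (1, "Hurricane", 101, "Kamal Hossain", "Elec Eng", 65, 13),
    (None, None, 102, "David Pol", "Comm Eng", 60, 16),
    (None, None, 104, "Didar Ahmed", "Comm Eng", 60, 19),
    (2, "Coast", 101, "Kamal Hossain", "Elec Eng", 65, 15),
    (None, None, 103, "Younus Mia", "Asst Eng", 55, 17),
    (3, "Satellite", 104, "Didar Ahmed", "Comm Eng", 60, 18),
    (None, None, 102, "David Pol", "Comm Eng", 60, 14),
]


def fill_down(rows):
    """Copy the last seen P_No and P_Name into rows where they are missing."""
    filled = []
    p_no = p_name = None
    for row in rows:
        if row[0] is not None:
            p_no, p_name = row[0], row[1]
        filled.append((p_no, p_name) + row[2:])
    return filled


first_nf = fill_down(raw)
for row in first_nf:
    print(row)
```

After this step every row carries its own P_No, and each (P_No, E_No) pair occurs exactly once.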


Problems

Primary key P_No does not uniquely identify all attributes in row

Must create composite key made up of P_No & E_No


Dependency Diagram

Helps us to discover relationships between entity attributes
  Upper arrows imply dependency on both P_No & E_No
  Lower arrows imply dependency on only one attribute

[Dependency diagram over the attributes: P_No, P_Name, E_No, E_Name, Job_Class, Chg_Hr, Hrs]


Dependencies

Upper arrows
  If you know P_No & E_No you can determine the other row values

Lower arrows
  Partial dependencies – based on only part of the key
  P_Name is dependent only on P_No
  E_Name, Job_Class, Chg_Hr are dependent only on E_No

Dependency diagram may be written:
  P_No, E_No → P_Name, E_Name, Job_Class, Chg_Hr, Hrs
  P_No → P_Name
  E_No → E_Name, Job_Class, Chg_Hr
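These dependencies can be checked against the sample rows. The sketch below uses a hypothetical helper, holds, which reports whether a dependency is consistent with the data. Sample data can only refute a dependency, never prove it in general, so this is a sanity check rather than a design tool.

```python
COLUMNS = ("P_No", "P_Name", "E_No", "E_Name", "Job_Class", "Chg_Hr", "Hrs")
DATA = [
    (1, "Hurricane", 101, "Kamal Hossain", "Elec Eng", 65, 13),
    (1, "Hurricane", 102, "David Pol", "Comm Eng", 60, 16),
    (1, "Hurricane", 104, "Didar Ahmed", "Comm Eng", 60, 19),
    (2, "Coast", 101, "Kamal Hossain", "Elec Eng", 65, 15),
    (2, "Coast", 103, "Younus Mia", "Asst Eng", 55, 17),
    (3, "Satellite", 104, "Didar Ahmed", "Comm Eng", 60, 18),
    (3, "Satellite", 102, "David Pol", "Comm Eng", 60, 14),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def holds(rows, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(rows, ["P_No"], ["P_Name"]))                         # True
print(holds(rows, ["E_No"], ["E_Name", "Job_Class", "Chg_Hr"]))  # True
print(holds(rows, ["P_No", "E_No"], ["Hrs"]))                    # True
print(holds(rows, ["P_No"], ["Hrs"]))                            # False
print(holds(rows, ["E_No"], ["Hrs"]))                            # False
```

The last two checks show that Hrs needs the whole composite key, while P_Name and the employee attributes depend on only one part of it.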


New Table (1NF)

Composite primary key: P_No & E_No

Charges Table
P_No  P_Name     E_No  E_Name         Job_Class  Chg_Hr  Hrs
1     Hurricane  101   Kamal Hossain  Elec Eng   65      13
1     Hurricane  102   David Pol      Comm Eng   60      16
1     Hurricane  104   Didar Ahmed    Comm Eng   60      19
2     Coast      101   Kamal Hossain  Elec Eng   65      15
2     Coast      103   Younus Mia     Asst Eng   55      17
3     Satellite  104   Didar Ahmed    Comm Eng   60      18
3     Satellite  102   David Pol      Comm Eng   60      14


1NF Definition

1. All the key attributes are defined
   (a key attribute is any attribute that is part of the primary key)
2. There are no repeating groups in the table
   (each cell can contain one and only one value, rather than a set)
3. All attributes are dependent on the primary key


Problem - Partial Dependencies

The table contains partial dependencies
  Dependencies based on only part of the primary key

This makes the table subject to data redundancies and hence to data anomalies

Redundancy is caused by the fact that every row entry requires duplicate data
  E.g., suppose E_No 104 is entered 20 times; we must also enter E_Name, Job_Class, Chg_Hr each time

Anomalies are caused by redundancy
  E.g., the employee name may be spelled Didar Ahmed, Dedar Ahmed, Diader Ahmad or D. Ahmed


Conversion to 2NF

1. Starting with 1NF, write each of the key components on separate lines, then write the original key on the last line

  P_No
  E_No
  P_No E_No

Each will become the key in a new table
Here, the original table is split into three tables


Conversion to 2NF (cont…)

2. Write the dependent attributes after each of the new keys using the dependency diagram

  P_No → P_Name
  E_No → E_Name, Job_Class, Chg_Hr
  P_No, E_No → Hrs


Three New Tables (2NF)

Project Table
P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table
E_No  E_Name         Job_Class  Chg_Hr
101   Kamal Hossain  Elec Eng   65
102   David Pol      Comm Eng   60
103   Younus Mia     Asst Eng   55
104   Didar Ahmed    Comm Eng   60

Assign Table
P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14
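The split into three tables is a non-loss decomposition: joining Project, Employee and Assign on their keys gives back the original Charges table. A minimal sketch of that check, using plain Python tuples (the variable names are illustrative only):

```python
# 1NF Charges table: (P_No, P_Name, E_No, E_Name, Job_Class, Chg_Hr, Hrs)
charges = [
    (1, "Hurricane", 101, "Kamal Hossain", "Elec Eng", 65, 13),
    (1, "Hurricane", 102, "David Pol", "Comm Eng", 60, 16),
    (1, "Hurricane", 104, "Didar Ahmed", "Comm Eng", 60, 19),
    (2, "Coast", 101, "Kamal Hossain", "Elec Eng", 65, 15),
    (2, "Coast", 103, "Younus Mia", "Asst Eng", 55, 17),
    (3, "Satellite", 104, "Didar Ahmed", "Comm Eng", 60, 18),
    (3, "Satellite", 102, "David Pol", "Comm Eng", 60, 14),
]

# Decompose by projecting onto each key and its dependent attributes.
project = sorted({(p_no, p_name) for p_no, p_name, *_ in charges})
employee = sorted({(e_no, e_name, job, rate)
                   for _, _, e_no, e_name, job, rate, _ in charges})
assign = [(p_no, e_no, hrs) for p_no, _, e_no, _, _, _, hrs in charges]

# Rejoin on P_No and E_No to confirm that nothing was lost.
project_by_no = dict(project)
employee_by_no = {e_no: rest for e_no, *rest in employee}
rejoined = [
    (p_no, project_by_no[p_no], e_no, *employee_by_no[e_no], hrs)
    for p_no, e_no, hrs in assign
]

print(len(project), len(employee), len(assign))  # 3 4 7
print(rejoined == charges)                       # True
```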


2NF Definition

1. Table is in 1NF and
2. It includes no partial dependencies (no attribute is dependent on only a portion of the primary key)

**Note: Since partial dependencies can exist only if there is a composite key, a table with a single attribute as primary key is automatically in 2NF if it is in 1NF


Problem - Transitive Dependency

Note that Chg_Hr is dependent on Job_Class, but neither Chg_Hr nor Job_Class is part of the primary key

This is called a transitive dependency
  A condition in which an attribute is functionally dependent on a non-key attribute (another attribute that is not part of the primary key)

Transitive dependency yields data anomalies


Conversion to 3NF

Break off the pieces that are identified by the transitive dependency arrows (lower arrows) in the dependency diagram

Store them in a separate table

  P_No → P_Name
  E_No → E_Name, Job_Class
  P_No, E_No → Hrs
  Job_Class → Chg_Hr

**Note: Job_Class must be retained in Employee table to establish a link to the newly created Job table


New Tables (3NF)

Project Table
P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table
E_No  E_Name         Job_Class
101   Kamal Hossain  Elec Eng
102   David Pol      Comm Eng
103   Younus Mia     Asst Eng
104   Didar Ahmed    Comm Eng

Assign Table
P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14

Job Table
Job_Class  Chg_Hr
Elec Eng   65
Comm Eng   60
Asst Eng   55
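With Chg_Hr moved into the Job table, a rate is stored once per job class and an employee's rate is found by following Job_Class. The sketch below illustrates this. The new rate of 62 is an invented value used only to show that one update reaches every employee in that class.

```python
# 2NF Employee table: (E_No, E_Name, Job_Class, Chg_Hr)
employee_2nf = [
    (101, "Kamal Hossain", "Elec Eng", 65),
    (102, "David Pol", "Comm Eng", 60),
    (103, "Younus Mia", "Asst Eng", 55),
    (104, "Didar Ahmed", "Comm Eng", 60),
]

# 3NF split: Employee keeps Job_Class as the link, Job holds the rate.
employee = [(e_no, name, job_class) for e_no, name, job_class, _ in employee_2nf]
job = {job_class: rate for _, _, job_class, rate in employee_2nf}


def rate_for(e_no):
    """Look up an employee's hourly charge through the Job table."""
    job_class = next(jc for number, _, jc in employee if number == e_no)
    return job[job_class]


print(len(job))        # 3
print(rate_for(104))   # 60

job["Comm Eng"] = 62   # hypothetical rate change: a single update
print(rate_for(102), rate_for(104))  # 62 62
```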


3NF Definition

1. Table is in 2NF and
2. It contains no transitive dependencies


Problem

Although the four tables are in 3NF, we have a potential problem

The Job_Class is entered for each new employee in the Employee table

For example, it is too easy to enter Electrical Engr, or EE, or El Eng

Employee Table
E_No  E_Name         Job_Class
101   Kamal Hossain  Elec Eng
102   David Pol      Comm Eng
103   Younus Mia     Asst Eng
104   Didar Ahmed    Comm Eng
105   Ali Ahmed      Asst Eng
106   Nipa Ahmed     Elec Eng


New Attribute

Create a Job_Code attribute to serve as the primary key in the Job table and as a foreign key in the Employee table


New Tables

Project Table
P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table
E_No  E_Name         Job_Code
101   Kamal Hossain  500
102   David Pol      501
103   Younus Mia     502
104   Didar Ahmed    501

Assign Table
P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14

Job Table
Job_Code  Job_Class  Chg_Hr
500       Elec Eng   65
501       Comm Eng   60
502       Asst Eng   55
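One possible way to realise these four tables is sketched below with Python's built-in sqlite3 module. The column types, the NOT NULL and UNIQUE constraints, and the helper can_add_employee are assumptions added for illustration, not part of the lecture. The foreign key on Job_Code is what stops a misspelled or unknown job from being entered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Project  (P_No INTEGER PRIMARY KEY, P_Name TEXT NOT NULL);
CREATE TABLE Job      (Job_Code INTEGER PRIMARY KEY,
                       Job_Class TEXT NOT NULL UNIQUE,   -- assumed constraint
                       Chg_Hr INTEGER NOT NULL);
CREATE TABLE Employee (E_No INTEGER PRIMARY KEY, E_Name TEXT NOT NULL,
                       Job_Code INTEGER NOT NULL REFERENCES Job(Job_Code));
CREATE TABLE Assign   (P_No INTEGER NOT NULL REFERENCES Project(P_No),
                       E_No INTEGER NOT NULL REFERENCES Employee(E_No),
                       Hrs INTEGER NOT NULL,
                       PRIMARY KEY (P_No, E_No));
""")
conn.executemany("INSERT INTO Project VALUES (?, ?)",
                 [(1, "Hurricane"), (2, "Coast"), (3, "Satellite")])
conn.executemany("INSERT INTO Job VALUES (?, ?, ?)",
                 [(500, "Elec Eng", 65), (501, "Comm Eng", 60), (502, "Asst Eng", 55)])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(101, "Kamal Hossain", 500), (102, "David Pol", 501),
                  (103, "Younus Mia", 502), (104, "Didar Ahmed", 501)])
conn.executemany("INSERT INTO Assign VALUES (?, ?, ?)",
                 [(1, 101, 13), (1, 102, 16), (1, 104, 19), (2, 101, 15),
                  (2, 103, 17), (3, 104, 18), (3, 102, 14)])


def can_add_employee(e_no, name, job_code):
    """Try to insert an employee; return False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", (e_no, name, job_code))
        return True
    except sqlite3.IntegrityError:
        return False


added_valid = can_add_employee(105, "Ali Ahmed", 502)     # existing Job_Code
added_invalid = can_add_employee(106, "Nipa Ahmed", 999)  # unknown Job_Code

# Rebuild the per-project charges of the original report with joins.
report = conn.execute("""
    SELECT p.P_No, p.P_Name, SUM(j.Chg_Hr * a.Hrs)
    FROM Assign a
    JOIN Project p  ON p.P_No = a.P_No
    JOIN Employee e ON e.E_No = a.E_No
    JOIN Job j      ON j.Job_Code = e.Job_Code
    GROUP BY p.P_No, p.P_Name
    ORDER BY p.P_No
""").fetchall()

print(added_valid, added_invalid)  # True False
print(report)  # [(1, 'Hurricane', 2945), (2, 'Coast', 1910), (3, 'Satellite', 1920)]
```

The join reproduces the subtotals of the desired report, so the normalized design still answers the original question without storing any value twice.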


Another Example…


The Problem: Keeping Track of a Stack of Invoices


Required Table

Orders
  order_id
  order_date
  customer_id
  customer_name
  customer_address
  customer_city
  customer_state
  item_id
  item_description
  item_qty
  item_price


First Normal Form: No Repeating Elements or Groups of Elements

1NF addresses two issues:

1. A row of data cannot contain repeating groups of similar data (atomicity)

2. Each row of data must have a unique identifier (or Primary Key)

Orders
  order_id (PK)
  order_date
  customer_id
  customer_name
  customer_address
  customer_city
  customer_state
  item_id (PK)
  item_description
  item_qty
  item_price


Second Normal Form: No Partial Dependencies on a Concatenated Key

Here we test each table for partial dependencies on a concatenated key.

This means that for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence.

If any column only depends upon one part of the concatenated key, then we say that the entire table has failed Second Normal Form and we must create another table to rectify the failure.


Checking Partial Dependencies

Orders
  order_id (PK)
  order_date
  customer_id ?
  customer_name ?
  customer_address ?
  customer_city ?
  customer_state ?
  item_id (PK)
  item_description
  item_qty
  item_price


Second Normal Form:

orders
  order_id (PK)
  order_date
  customer_id
  customer_name
  customer_address
  customer_city
  customer_state

items
  item_id (PK)
  item_description
  item_price

order_items
  order_id (PK)
  item_id (PK)
  item_qty


Third Normal Form: No Dependencies on Non-Key Attributes

Here, we return to the problem of the repeating customer information. As our database now stands, if a customer places more than one order then we have to input all of that customer's contact information again. This is because there are columns in the orders table that rely on "non-key attributes".


Third Normal Form:

orders
  order_id (PK)
  customer_id (FK)
  order_date

items
  item_id (PK)
  item_description
  item_price

order_items
  order_id (PK)
  item_id (PK)
  item_qty

customers
  customer_id (PK)
  customer_name
  customer_address
  customer_city
  customer_state
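The four 3NF tables can be derived from the flat Orders table by projecting each key together with the attributes that depend on it. The sketch below shows the idea. The slides give no invoice data, so the sample rows are invented for illustration, and project_rows is a hypothetical helper.

```python
COLUMNS = ("order_id", "order_date", "customer_id", "customer_name",
           "customer_address", "customer_city", "customer_state",
           "item_id", "item_description", "item_qty", "item_price")

# Invented sample rows, one per (order_id, item_id) pair.
flat_rows = [
    (1, "2024-01-05", 10, "Customer A", "1 First St", "Springfield", "IL", 7, "Hammer", 2, 12.5),
    (1, "2024-01-05", 10, "Customer A", "1 First St", "Springfield", "IL", 8, "Nails", 100, 0.05),
    (2, "2024-01-09", 10, "Customer A", "1 First St", "Springfield", "IL", 7, "Hammer", 1, 12.5),
    (3, "2024-01-09", 11, "Customer B", "2 Second St", "Portland", "OR", 8, "Nails", 50, 0.05),
]
rows = [dict(zip(COLUMNS, values)) for values in flat_rows]


def project_rows(rows, key, attrs):
    """Project rows onto key + attrs, keeping one entry per distinct key value."""
    table = {}
    for row in rows:
        table[tuple(row[k] for k in key)] = tuple(row[a] for a in attrs)
    return table


customers = project_rows(rows, ["customer_id"],
                         ["customer_name", "customer_address",
                          "customer_city", "customer_state"])
orders = project_rows(rows, ["order_id"], ["customer_id", "order_date"])
items = project_rows(rows, ["item_id"], ["item_description", "item_price"])
order_items = project_rows(rows, ["order_id", "item_id"], ["item_qty"])

print(len(customers), len(orders), len(items), len(order_items))  # 2 3 2 4
```

Customer A's contact details now appear once in customers, even though that customer placed two orders.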


Summary

1NF – Eliminate repeating groups

2NF – Eliminate partial dependencies

3NF – Eliminate transitive dependencies


Thanks to All