Data Normalization · 2019-04-03 · 3 Data Normalization ! Validates and improves a logical DB...

Data Normalization Database Management Systems Reading: Hoffer Chapter 4 Dr. Wingyan Chung




In logical DB design …

• Relations may not be well-structured, e.g.,
  • ORDERED_PRODUCT (Order_Line_ID, Product_ID, ProductName, ProductPrice, StandardPrice)
• Unnecessary duplication in attribute values
  • StandardPrice vs. ProductPrice
• Some attributes’ values determine other attributes’ values
  • Product_ID → ProductPrice
• Multiple themes exist in a relation, causing anomalies in insertion, deletion, and update
  • CustomerOrder (CustomerID, Name, OrderID, OrderDate, ProductOrdered, Quantity)


Data Normalization

• Validates and improves a logical DB design
• Decomposes relations that have anomalies to produce smaller, well-structured relations
• Different levels of normalization are achieved
  • Our focus: First, second, third normal forms


Example – Figure 4.2b

• Insertion anomaly – can’t enter a new employee without having the employee take a class
• Deletion anomaly – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification anomaly – giving a salary increase to employee 100 forces us to update multiple records
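The deletion and modification anomalies listed for this example can be reproduced on a small in-memory table. This is only a sketch: the column names and sample rows are assumptions modeled on the employee/course example, not the actual contents of Figure 4.2b.

```python
# Hypothetical rows: one row per (employee, course) pair, so employee
# facts such as salary repeat for every course the employee takes.
employee2 = [
    {"emp_id": 100, "name": "Employee A", "salary": 48000, "course": "SPSS"},
    {"emp_id": 100, "name": "Employee A", "salary": 48000, "course": "Surveys"},
    {"emp_id": 140, "name": "Employee B", "salary": 52000, "course": "Tax Acc"},
]


def courses(rows):
    """Return the set of course titles the table knows about."""
    return {r["course"] for r in rows}


# Deletion anomaly: removing employee 140 also removes the only row
# that mentions the Tax Acc class.
after_delete = [r for r in employee2 if r["emp_id"] != 140]
lost_courses = courses(employee2) - courses(after_delete)

# Modification anomaly: a raise for employee 100 touches every one of
# that employee's rows, not a single record.
rows_to_update = [r for r in employee2 if r["emp_id"] == 100]

print(lost_courses)         # {'Tax Acc'}
print(len(rows_to_update))  # 2
```

The insertion anomaly shows up in the same layout: if the key is the (emp_id, course) pair, a new employee with no course has no valid key value to store.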


Problem: table is not atomic

• Is it a relation?
  • No (each cell of a relation must have only one value)


An improved version

• No multivalued attributes
• Every attribute value is atomic
• Fig. 4-25 is not in 1st Normal Form (1NF) → it is not a relation
  • By definition, all relations are in 1st Normal Form (no multivalued attributes)
• Fig. 4-26 is in 1NF, but still not well-structured
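Moving a table with multivalued cells into 1NF amounts to emitting one row per value. The sketch below uses a hypothetical `orders_raw` structure and made-up product names (apart from "Dining Table"); it is not the data of Fig. 4-25 or Fig. 4-26.

```python
# Hypothetical non-1NF table: the "products" cell holds several values.
orders_raw = [
    {"order_id": 1006, "products": ["Dining Table", "Product X", "Product Y"]},
    {"order_id": 1007, "products": ["Dining Table", "Product X"]},
]


def to_1nf(rows):
    """Emit one row per (order_id, product) so that every cell is atomic."""
    return [
        {"order_id": r["order_id"], "product": p}
        for r in rows
        for p in r["products"]
    ]


flat = to_1nf(orders_raw)
for row in flat:
    print(row)
```

The result has one atomic value per cell, but order-level facts would now repeat on every line of the same order, which is why a 1NF table can still be poorly structured.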


Anomalies in this Table

• Insertion – if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication
• Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price
• Update – changing the price of product ID 4 requires update in several records


Why do anomalies exist?

• Multiple themes (entity types) exist in one relation, causing unnecessary dependencies among attributes
  • Normally, all attributes are functionally dependent on primary key only
  • Functional dependency = relationship between attributes such that one attribute’s values are determined by the other attributes
    • E.g., Emp_ID, Course_Title → Salary
    • (LHS = determinant; RHS = non-key attribute)
• Anomalies exist when some other attributes’ values depend on values of non-PK attributes or only part of PK
  • Partial functional dependency
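A functional dependency can be tested against sample data: if two rows agree on the determinant but disagree on the dependent attribute, the dependency does not hold. The helper `fd_holds` and the rows below are hypothetical. Note that sample data can only refute a dependency; whether it truly holds is a business rule, not something the data proves.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later occurrence against it.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows (not taken from the textbook figure).
rows = [
    {"emp_id": 100, "course": "SPSS", "salary": 48000},
    {"emp_id": 100, "course": "Surveys", "salary": 48000},
    {"emp_id": 140, "course": "Tax Acc", "salary": 52000},
    {"emp_id": 110, "course": "SPSS", "salary": 43000},
]

print(fd_holds(rows, ["emp_id"], ["salary"]))            # True
print(fd_holds(rows, ["course"], ["salary"]))            # False
print(fd_holds(rows, ["emp_id", "course"], ["salary"]))  # True
```

In these rows salary is already determined by emp_id alone, which is only part of the (emp_id, course) key.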


Partial functional dependency

• Examples of partial functional dependencies are
  • Product_ID → Product_Description
  • Product_ID → Unit_Price
  • Order_ID → Order_Date
• Must remove partial FD for a relation to be in second normal form (2NF)
  • 2NF = 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key (not just some components of PK)
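The 2NF rule can be checked mechanically once the primary key and the functional dependencies are written down. The sketch below assumes the composite key (Order_ID, Product_ID) and encodes each dependency as a (determinant, dependents) pair; the function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the primary
    key and which determine at least one non-key attribute."""
    pk = set(primary_key)
    partial = []
    for lhs, rhs in fds:
        non_key_rhs = [a for a in rhs if a not in pk]
        if set(lhs) < pk and non_key_rhs:  # "<" means proper subset
            partial.append((tuple(lhs), tuple(non_key_rhs)))
    return partial


invoice_pk = ("Order_ID", "Product_ID")  # assumed composite key
invoice_fds = [
    (("Product_ID",), ("Product_Description",)),
    (("Product_ID",), ("Unit_Price",)),
    (("Order_ID",), ("Order_Date",)),
    (("Order_ID", "Product_ID"), ("Order_Quantity",)),
]

partial = partial_dependencies(invoice_pk, invoice_fds)
is_2nf = not partial
print(partial)
print(is_2nf)  # False
```

Only Order_Quantity depends on the whole key here, so the three dependencies on a single key component are reported as partial.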


Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity

Therefore, NOT in 2nd Normal Form

Figure 4-27 Functional dependency diagram for INVOICE


Getting the relations to 2NF

• Partial dependencies are removed, but there are still transitive dependencies
  • Non-key attributes determine values of some other non-key attributes, e.g.,
    • Order_ID → Customer_ID → Customer_Address

Figure 4-28 Removing partial dependencies
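Removing partial dependencies can be sketched at the schema level: each key component that determines non-key attributes gets its own relation, and the original relation keeps only what depends on the whole key. The function `decompose_to_2nf` is a hypothetical helper; the attribute names come from the INVOICE dependencies, and the output is not claimed to reproduce Figure 4-28 exactly.

```python
def decompose_to_2nf(attributes, primary_key, fds):
    """Map each resulting relation's key to its list of non-key attributes."""
    pk = set(primary_key)
    relations = {}
    moved = set()
    for lhs, rhs in fds:
        if set(lhs) < pk:  # determinant is only part of the key
            non_key = [a for a in rhs if a not in pk]
            relations.setdefault(tuple(lhs), []).extend(non_key)
            moved.update(non_key)
    # Whatever was not moved stays with the full composite key.
    relations[tuple(primary_key)] = [
        a for a in attributes if a not in moved and a not in pk
    ]
    return relations


invoice_attrs = [
    "Order_ID", "Order_Date", "Customer_ID", "Customer_Name",
    "Customer_Address", "Product_ID", "Product_Description",
    "Product_Finish", "Unit_Price", "Order_Quantity",
]
invoice_fds = [
    (("Order_ID",), ("Order_Date", "Customer_ID", "Customer_Name", "Customer_Address")),
    (("Customer_ID",), ("Customer_Name", "Customer_Address")),
    (("Product_ID",), ("Product_Description", "Product_Finish", "Unit_Price")),
    (("Order_ID", "Product_ID"), ("Order_Quantity",)),
]

relations_2nf = decompose_to_2nf(invoice_attrs, ("Order_ID", "Product_ID"), invoice_fds)
for key, non_key in relations_2nf.items():
    print(key, "->", non_key)
```

The relation keyed by Order_ID still carries Customer_Name and Customer_Address, which depend on Customer_ID; that is the remaining transitive dependency.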


Transitive Dependency

• Examples of anomaly
  • Insertion – Must duplicate John Doe’s data if he places more orders
  • Deletion – Permanently loses Mary Smith’s data if order 1004 is canceled
  • Update – Need to update multiple records when John Doe changes his address

Order_ID  Order_Date  Customer_ID  Customer_Name  Customer_Address
1001      10/22/2005  501          John Doe       100 Mesa St.
1002      10/23/2005  501          John Doe       100 Mesa St.
1003      10/24/2005  501          John Doe       100 Mesa St.
1004      10/24/2005  504          Mary Smith     200 Sun Dr.
1005      10/24/2005  505          Susan Young    5243 Hill Blvd.
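The update and deletion anomalies can be counted directly on the five sample rows above. A minimal sketch, with the rows held as plain tuples (the variable names are assumptions):

```python
# (Order_ID, Order_Date, Customer_ID, Customer_Name, Customer_Address)
orders = [
    (1001, "10/22/2005", 501, "John Doe", "100 Mesa St."),
    (1002, "10/23/2005", 501, "John Doe", "100 Mesa St."),
    (1003, "10/24/2005", 501, "John Doe", "100 Mesa St."),
    (1004, "10/24/2005", 504, "Mary Smith", "200 Sun Dr."),
    (1005, "10/24/2005", 505, "Susan Young", "5243 Hill Blvd."),
]

# Update anomaly: John Doe's address is stored once per order.
rows_with_john = [o for o in orders if o[2] == 501]

# Deletion anomaly: cancelling order 1004 removes the only row that
# mentions customer 504 (Mary Smith).
remaining_customers = {o[2] for o in orders if o[0] != 1004}

print(len(rows_with_john))         # 3
print(504 in remaining_customers)  # False
```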


Getting the relations to 3NF

• 3NF = 2NF PLUS no transitive dependencies (no functional dependencies on non-PK attributes)
  • Non-key determinant with transitive dependencies goes into a new table
  • Non-key determinant becomes primary key in the new table and stays as foreign key in the old table

Figure 4-29 Removing transitive dependencies

Getting it into Third Normal Form
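One way to see the effect of the 3NF step is to build the two resulting tables in an in-memory SQLite database. This is a sketch under assumptions: the table names `customer` and `customer_order`, the column types, and the replacement address are hypothetical; the rows are the customer and order sample data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    customer_address TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    -- the non-key determinant stays behind as a foreign key
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (501, "John Doe", "100 Mesa St."),
    (504, "Mary Smith", "200 Sun Dr."),
    (505, "Susan Young", "5243 Hill Blvd."),
])
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)", [
    (1001, "10/22/2005", 501),
    (1002, "10/23/2005", 501),
    (1003, "10/24/2005", 501),
    (1004, "10/24/2005", 504),
    (1005, "10/24/2005", 505),
])

# Update: the address now lives in exactly one row (hypothetical new address).
updated_rows = conn.execute(
    "UPDATE customer SET customer_address = ? WHERE customer_id = ?",
    ("300 Example Ave.", 501),
).rowcount

# Deletion: cancelling order 1004 no longer erases Mary Smith.
conn.execute("DELETE FROM customer_order WHERE order_id = ?", (1004,))
mary_still_known = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE customer_id = ?", (504,)
).fetchone()[0]

print(updated_rows, mary_still_known)  # 1 1
```

A new customer can also be inserted into `customer` before placing any order, which addresses the insertion anomaly.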


Merging Relations

• View Integration – Combining entities from multiple ER models into common relations
• Issues to watch out for when merging entities from different ER models:
  • Synonyms – two or more attributes with different names but same meaning
  • Homonyms – attributes with same name but different meanings
  • Transitive dependencies – even if relations are in 3NF prior to merging, they may not be after merging
  • Supertype/subtype relationships – may be hidden prior to merging
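The synonym issue can be illustrated with a small merge helper that maps alternative attribute names to one agreed name before combining two relations. Everything here is hypothetical: the function `merge_relations`, the two STUDENT relations, and the synonym map are assumptions for illustration only.

```python
def merge_relations(rel_a, rel_b, synonyms=None):
    """Merge two attribute lists, mapping synonyms to one canonical name."""
    synonyms = synonyms or {}
    merged = []
    for attr in list(rel_a) + list(rel_b):
        canonical = synonyms.get(attr, attr)
        if canonical not in merged:  # keep each attribute once, in order
            merged.append(canonical)
    return merged


# Hypothetical relations from two different ER models.
student1 = ["Student_ID", "Name"]
student2 = ["Matriculation_No", "Name", "Address"]

merged = merge_relations(
    student1, student2, synonyms={"Matriculation_No": "Student_ID"}
)
print(merged)  # ['Student_ID', 'Name', 'Address']
```

Homonyms cannot be caught this way, because the names already match; they need a person to confirm that two same-named attributes mean the same thing. The merged relation should also be re-checked for transitive dependencies.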


Figure 4-31 Enterprise keys

a) Relations with enterprise key

b) Sample data with enterprise key

•  Primary keys that are unique in the whole database, not just within a single relation

•  Corresponds with the concept of an object ID in object-oriented systems
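A database-wide key can be approximated by drawing every primary key from a single shared sequence and recording which relation each key belongs to. The class below is a hypothetical sketch of that idea, not the design shown in Figure 4-31.

```python
import itertools


class EnterpriseKeyGenerator:
    """Hands out keys that are unique across all relations, not just one."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self.object_registry = {}  # enterprise key -> relation name

    def new_key(self, relation_name):
        key = next(self._counter)
        self.object_registry[key] = relation_name
        return key


keys = EnterpriseKeyGenerator()
employee_key = keys.new_key("EMPLOYEE")  # relation names are assumptions
customer_key = keys.new_key("CUSTOMER")

print(employee_key, customer_key)  # 1 2
print(keys.object_registry)        # {1: 'EMPLOYEE', 2: 'CUSTOMER'}
```

Because no two rows anywhere share a key, a key value alone identifies both the row and, through the registry, the relation it lives in.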


Summary

• Anomalies exist when attribute values in a table are determined by non-PK attributes or only part of PK
• The table is in 1NF (i.e., a relation) if it contains no multivalued attribute
• The relation is in
  • 2NF if all non-key attributes are determined by the entire PK (not part of it) – i.e., no partial functional dependencies
  • 3NF if all non-key attributes are determined only by the PK (not other non-PK attributes) – i.e., no transitive dependencies
• Solution: Decomposing large relations into smaller relations
  • Remove partial and transitive dependencies
  • Possibly with FK referencing to parent relations