Copyright, Harris Corporation & Ophir Frieder, 1998. The Process of Normalization.

Objective

• Learn how to decompose a relational scheme into 1NF, 2NF, 3NF, and BCNF.

• Learn what is meant by a lossless decomposition.

• Learn what is meant by a dependency preserving decomposition.

Normalization - 1NF

• A repeating group is typically eliminated by “flattening” the table.

Before

SS#        NAME        HOBBIES
032446254  Mary Smith  jogging, wrestling
045242453  Bob Jones   cooking, cycling, gardening
932415223  Sue Clark   quilting, hiking,
...

Normalization - 1NF

After

SS#        NAME        HOBBIES
032446254  Mary Smith  jogging
032446254  Mary Smith  wrestling
045242453  Bob Jones   cooking
045242453  Bob Jones   cycling
045242453  Bob Jones   gardening
:
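The flattening step can be sketched in Python, assuming the repeating group is held as a list inside each row. The row layout and the name `flatten` are illustrative assumptions, not part of the original slides, and the Sue Clark row is left out because her hobby list is truncated in the source.

```python
# Hypothetical in-memory form of the "Before" table: HOBBIES is a repeating group.
before = [
    ("032446254", "Mary Smith", ["jogging", "wrestling"]),
    ("045242453", "Bob Jones", ["cooking", "cycling", "gardening"]),
]

def flatten(rows):
    """Emit one (ss, name, hobby) row per element of the repeating group."""
    return [(ss, name, hobby) for ss, name, hobbies in rows for hobby in hobbies]

after = flatten(before)
for row in after:
    print(row)
```

Each output row holds a single atomic hobby value, which is what 1NF requires; the cost is that SS# and NAME are now repeated.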

Decomposition

• The decomposition of a relational scheme

R={A1, A2,..., An}

is its replacement by a collection

R1,R2,...,Rk

such that R is equal to the union of the Ri’s.

• Note that there is no requirement that the Ri’s be disjoint.

Decomposition, Cont.

• Recall the department store relational scheme:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#, ITEM => PRICE

STORE_ID#  CITY      STATE  ITEM       PRICE
W001       Orlando   FL     Duck Tape   3.95
W001       Orlando   FL     Rope        5.95
W002       Savannah  GA     Plywood    12.50
W002       Savannah  GA     Rope        8.75

Decomposition #1

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Rope        5.95
W002       Plywood    12.50
W002       Rope        8.75

STORE_ID#  CITY      STATE
W001       Orlando   FL
W002       Savannah  GA

Decomposition #1, Cont.

• Note that:

– Some redundancy is eliminated - CITY and STATE are not repeated for every item.

– Redundancy still exists; the STORE_ID# attribute appears in both tables, and multiple times in the first table.

– Whatever the contents of the original table, it can always be obtained by performing a natural join on the two tables, i.e., the decomposition is lossless.

– Insertion, deletion, and update anomalies do not occur.
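The lossless claim for Decomposition #1 can be checked on the sample data with a short Python sketch. Tuples stand in for rows and sets for tables; this representation is an assumption made for illustration only.

```python
original = {
    ("W001", "Orlando", "FL", "Duck Tape", 3.95),
    ("W001", "Orlando", "FL", "Rope", 5.95),
    ("W002", "Savannah", "GA", "Plywood", 12.50),
    ("W002", "Savannah", "GA", "Rope", 8.75),
}

# Project onto (STORE_ID#, ITEM, PRICE) and (STORE_ID#, CITY, STATE).
prices = {(s, item, price) for s, _, _, item, price in original}
stores = {(s, city, state) for s, city, state, _, _ in original}

# Natural join on the shared attribute STORE_ID#.
joined = {
    (s1, city, state, item, price)
    for s1, item, price in prices
    for s2, city, state in stores
    if s1 == s2
}
print(joined == original)
```

A check on one instance does not prove losslessness for every legal table; it only illustrates what the natural join recovers here.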

Decomposition #2

ITEM       PRICE
Duck Tape   3.95
Rope        5.95
Plywood    12.50
Rope        8.75

STORE_ID#  CITY      STATE
W001       Orlando   FL
W002       Savannah  GA

Decomposition #2, Cont.

• Note that:

– No redundancy is introduced by the decomposition.

– Insertion, deletion, and update anomalies are eliminated...sort of...

– The relationship between STORE_ID# and ITEM from the original scheme is lost and, consequently, the contents of the original table cannot be recovered using a join, i.e., the decomposition is lossy.

– The dependency STORE_ID#,ITEM => PRICE is not represented by either table, i.e., the decomposition does not preserve dependencies.
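The lossy behaviour of Decomposition #2 can be seen on the sample data. In this Python sketch (rows as tuples, tables as sets, both assumptions for illustration) the two projections share no attribute, so their natural join degenerates to a Cartesian product.

```python
original = {
    ("W001", "Orlando", "FL", "Duck Tape", 3.95),
    ("W001", "Orlando", "FL", "Rope", 5.95),
    ("W002", "Savannah", "GA", "Plywood", 12.50),
    ("W002", "Savannah", "GA", "Rope", 8.75),
}

# Project onto (ITEM, PRICE) and (STORE_ID#, CITY, STATE).
items = {(item, price) for _, _, _, item, price in original}
stores = {(s, city, state) for s, city, state, _, _ in original}

# With no common attribute, every item row pairs with every store row.
joined = {
    (s, city, state, item, price)
    for item, price in items
    for s, city, state in stores
}
spurious = joined - original
print(len(original), len(joined), len(spurious))
```

The join contains every original row plus rows that were never in the table (for example Duck Tape at W002), so the original contents cannot be told apart from the spurious ones.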

Requirements of Decomposition

• In general, a decomposition should be:

– lossless, i.e., every legal relation of the original schema should be recoverable exactly by joining its projections, with no spurious tuples, and

– dependency preserving, i.e., every functional dependency applying to the original schema should follow from the dependencies that apply to the individual schemas in the decomposition.

• Decomposition #1 is both; decomposition #2 is neither.
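For a decomposition into exactly two schemes, losslessness can be tested without looking at any data. The sketch below uses a standard criterion that is not stated on the slides: for schemes constrained only by functional dependencies, the split is lossless exactly when the common attributes functionally determine all of one of the two schemes. The helper names `closure` and `lossless_binary` are hypothetical, and `STORE_ID` stands for STORE_ID#.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if the attributes shared by r1 and r2 determine all of r1 or all of r2."""
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined

F = [
    (frozenset({"STORE_ID"}), frozenset({"CITY"})),
    (frozenset({"STORE_ID"}), frozenset({"STATE"})),
    (frozenset({"STORE_ID", "ITEM"}), frozenset({"PRICE"})),
]
d1 = ({"STORE_ID", "ITEM", "PRICE"}, {"STORE_ID", "CITY", "STATE"})
d2 = ({"ITEM", "PRICE"}, {"STORE_ID", "CITY", "STATE"})
print(lossless_binary(d1[0], d1[1], F))  # True
print(lossless_binary(d2[0], d2[1], F))  # False
```

In decomposition #1 the shared attribute STORE_ID# determines CITY and STATE, which covers the second table; in decomposition #2 nothing is shared, so nothing is determined.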

Preservation of Dependencies vs Lossless Join

• Currently, proof of the following is beyond the scope of this course. However, it is worth noting that:

– A lossless decomposition is not necessarily dependency preserving.

– A dependency preserving decomposition is not necessarily lossless.

Preservation of Dependencies vs Lossless Join, Cont.

• FACT:

Every relational scheme has a decomposition into 3NF that has a lossless join and preserves dependencies.

• FACT:

Every relational scheme has a lossless decomposition into BCNF. This decomposition, however, is not guaranteed to preserve dependencies.

Dependency Preserving Decomposition Into 3NF

• INPUT:

– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.

• ALGORITHM #1:

– Step #1: If R has any attributes not involved in any dependency in F, then let each such attribute be its own relational scheme and eliminate it from R.

– Step #2: If a single dependency in F involves all of the attributes of R, then let R be the final collection of relational schemes, in addition to any relational schemes resulting from Step #1.

– Step #3: For each dependency X=>A in F, create a relational scheme in the final decomposition containing the attributes of X and A (note that if X=>A and X=>B are in F, then XAB can be used instead and may, in fact, be preferable).
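The three steps of Algorithm #1 can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `synthesize_3nf`, the helper `fd`, and the representation of dependencies as pairs of frozensets are assumptions, the input is taken to be a minimal cover already, and `STORE_ID` stands for STORE_ID#.

```python
def synthesize_3nf(attributes, fds):
    """Sketch of Algorithm #1; fds is a list of (lhs, rhs) frozenset pairs,
    assumed to already be a minimal cover."""
    attributes = set(attributes)
    used = set()
    for lhs, rhs in fds:
        used |= lhs | rhs
    # Step #1: attributes in no dependency each become their own scheme.
    schemes = [frozenset({a}) for a in sorted(attributes - used)]
    remaining = attributes & used
    # Step #2: a single dependency covering all remaining attributes keeps R whole.
    for lhs, rhs in fds:
        if lhs | rhs == remaining:
            return schemes + [frozenset(remaining)]
    # Step #3: one scheme per left-hand side, so X=>A and X=>B share scheme XAB.
    by_lhs = {}
    for lhs, rhs in fds:
        by_lhs[lhs] = by_lhs.get(lhs, lhs) | rhs
    return schemes + list(by_lhs.values())

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

store_fds = [
    fd({"STORE_ID"}, {"CITY"}),
    fd({"STORE_ID"}, {"STATE"}),
    fd({"STORE_ID", "ITEM"}, {"PRICE"}),
]
result = synthesize_3nf({"STORE_ID", "CITY", "STATE", "ITEM", "PRICE"}, store_fds)
for scheme in result:
    print(sorted(scheme))
```

Applied to the department store scheme recalled earlier, the sketch merges the two STORE_ID# dependencies into one scheme and so arrives at the two tables of Decomposition #1.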

Dependency Preserving Decomposition Into 3NF, Cont.

Example #1

• Recall the following (modified) relational scheme for a department store chain:

– Attributes:

STORE_ID# - A department store ID number.

ITEM - An item sold by the department store.

PRICE - The price of the item.

– Functional Dependencies:

STORE_ID#,ITEM=>PRICE

Dependency Preserving Decomposition Into 3NF

Example #1, Cont.

STORE_ID#,ITEM=>PRICE

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Nails       5.95
W002       Plywood    12.50
W002       Paint       8.75

Dependency Preserving Decomposition Into 3NF

Example #1, Cont.

• First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.

• Second, STORE_ID#,ITEM => PRICE contains every attribute in the relation. Consequently, Step #2 dictates that the final decomposition be the initial relational scheme. In other words, no decomposition is necessary.

• By the way, since STORE_ID#,ITEM=>PRICE is the only dependency, and since STORE_ID#,ITEM is a key, it follows that this relational scheme also happens to be in BCNF (in general, this is not guaranteed by the algorithm).

Dependency Preserving Decomposition Into 3NF

Example #2

• Consider the following abstract relational scheme:

– Attributes:

A,B,C,D,E,F

– Functional Dependencies:

A=>B, CB=>D, CD=>A, AE=>F, CE=>D

Note that, based on the above, the only key is CE.
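The claim that CE is the only key can be checked mechanically. The Python sketch below computes attribute closures and searches all attribute subsets for minimal keys; the brute-force search and the helper names (`closure`, `candidate_keys`, `fd`) are assumptions chosen for clarity, and the search is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole scheme."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

FDS = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
keys = candidate_keys("ABCDEF", FDS)
print(["".join(sorted(k)) for k in keys])  # ['CE']
```

The result is also visible by inspection: C and E never appear on a right-hand side, so every key must contain both, and CE alone already determines D, then A, then B and F.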

Dependency Preserving Decomposition Into 3NF

Example #2, Cont.

• First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.

• Second, there is no dependency that contains all of the attributes. Consequently Step #2 of the algorithm does not apply.

• It follows from Step #3 that the relational scheme can be decomposed into the following:

AB, CBD, CDA, AEF, CED

Dependency Preserving Decomposition Into 3NF With a Lossless Join

• INPUT:

– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.

• ALGORITHM #2:

– Step #1: Construct a dependency preserving decomposition of R into 3NF using Algorithm #1.

– Step #2: Let X be a key for R, and add a relational scheme consisting of all of the attributes in X.

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.

• Example #1 (revisited):

Note that there is only one relational scheme in the decomposition resulting from the application of Algorithm #1. Consequently, that decomposition is lossless, by definition.

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Nails       5.95
W002       Plywood    12.50
W002       Paint       8.75

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.

• Example #2 (revisited):

Since CE is the only key for the original relation ABCDEF, Step #2 in Algorithm #2 dictates that CE be added to the result of Algorithm #1.

AB, CBD, CDA, AEF, CED, CE
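Step #2 as stated always adds the key scheme. In this example CE is already contained in CED, so the added scheme carries no extra information. A common refinement, which is not part of the algorithm as given on the slides, is to drop any scheme that is strictly contained in another. The Python sketch below shows both results; the variable names are illustrative.

```python
# Result of Algorithm #1 for Example #2, one frozenset of attributes per scheme.
schemes = [frozenset(s) for s in ("AB", "CBD", "CDA", "AEF", "CED")]
key = frozenset("CE")

# Step #2 as stated: add a scheme made of the key attributes.
with_key = schemes + [key]

# Assumed refinement: discard schemes strictly contained in another scheme.
pruned = [s for s in with_key if not any(s < other for other in with_key)]

print(sorted("".join(sorted(s)) for s in with_key))
print(sorted("".join(sorted(s)) for s in pruned))
```

After pruning, the decomposition is the same five schemes produced by Algorithm #1, which already include a scheme (CED) containing the key.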

Lossless Join Decomposition Into BCNF

• INPUT:

– Relational scheme R and set of functional dependencies F.

• ALGORITHM #3:

– Let D be an initial decomposition consisting of R alone;

– while (D contains a relation R’ that is not in BCNF) loop

Let X=>A be a functional dependency that holds in R’ where X is not a superkey and A is not in X;

Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;

end loop;

Lossless Join Decomposition Into BCNF, Cont.

• Algorithm #3 is actually somewhat more complex.

INPUT:

– Relational scheme R and set of functional dependencies F.

ALGORITHM #3:

– Let D be an initial decomposition consisting of R alone, with F as its set of functional dependencies;

– while (D contains a relation R’ that is not in BCNF with respect to its set of functional dependencies F’) loop

Let X=>A be a functional dependency in F’ that holds in R’ where X is not a superkey and A is not in X;

Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;

Compute F’+, and project it onto S1 and S2 to get F1 and F2;

Convert F1 and F2 to minimal covers;

end loop;
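The loop above can be sketched in Python. Instead of computing F’+ and projecting it, this sketch swaps in attribute closures: a set X violates BCNF in a scheme when its closure, restricted to that scheme, is larger than X but smaller than the whole scheme. That substitution, the subset enumeration (exponential in the scheme size), and the names `closure`, `find_violation`, and `bcnf_decompose` are assumptions for illustration; `STORE_ID` stands for STORE_ID#.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(scheme, fds):
    """Return (X, A) where X=>A holds in scheme, X is not a superkey and A is
    not in X; return None if the scheme is in BCNF."""
    for size in range(1, len(scheme)):
        for combo in combinations(sorted(scheme), size):
            x = frozenset(combo)
            determined = closure(x, fds) & scheme
            if determined != scheme and determined != x:
                return x, min(determined - x)
    return None

def bcnf_decompose(attributes, fds):
    todo, done = [frozenset(attributes)], []
    while todo:
        scheme = todo.pop()
        violation = find_violation(scheme, fds)
        if violation is None:
            done.append(scheme)
        else:
            x, a = violation
            todo.append(x | {a})       # S1: A together with the attributes of X
            todo.append(scheme - {a})  # S2: the scheme without A
    return done

F = [
    (frozenset({"STORE_ID"}), frozenset({"CITY"})),
    (frozenset({"STORE_ID"}), frozenset({"STATE"})),
    (frozenset({"STORE_ID", "ITEM"}), frozenset({"PRICE"})),
]
result = bcnf_decompose({"STORE_ID", "CITY", "STATE", "ITEM", "PRICE"}, F)
for scheme in result:
    print(sorted(scheme))
```

On the department store scheme the sketch splits off CITY, then STATE, and stops with three schemes: (STORE_ID#, CITY), (STORE_ID#, STATE), and (STORE_ID#, ITEM, PRICE). Different choices of violating dependency can lead to different, equally valid BCNF decompositions.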

Lossless Join Decomposition Into BCNF, Cont.

• Note that, in general, F+ includes

– All dependencies in F

– All trivial dependencies

– All dependencies that follow from Armstrong’s axioms

• In particular, the size of F+ can be exponential in the size of F. Thus, computing F+ is, in general, impractical.

• Also note that determining whether a relational scheme in a decomposition violates BCNF is, in general, NP-complete (i.e., no polynomial-time algorithm is known), and is therefore impractical as well.

Collectively, these facts mean that Algorithm #3 is of more theoretical than practical interest, especially for complex relations.

Lossless Join Decomposition Into BCNF

Example #1:

• Recall the following relational scheme for a department store chain (e.g., Walmart):

– Attributes:

STORE_ID# - A store identification number.

CITY - The city in which the store is located.

STATE - The state in which the store is located.

ITEM - An item sold by the store.

PRICE - The price of the item.

– Functional Dependencies:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#, ITEM => PRICE

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• Initial decomposition:

STORE_ID#,CITY,STATE,ITEM,PRICE.

• Minimal cover:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#,ITEM => PRICE

• Key:

STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The relational scheme is not in BCNF since, for example, the dependency STORE_ID# => CITY holds, yet STORE_ID# is not a superkey, and CITY (the RHS) is not part of STORE_ID# (the LHS).

• By Algorithm #3, the relational scheme can be decomposed into

STORE_ID#,CITY

STORE_ID#,STATE,ITEM,PRICE

• STORE_ID# => CITY is a minimal cover for STORE_ID#,CITY, with STORE_ID# as the only key.

• STORE_ID#,CITY is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The remaining relational scheme:

STORE_ID#,STATE,ITEM,PRICE.

• Minimal cover:

STORE_ID# => STATE

STORE_ID#,ITEM => PRICE

• Key:

STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The relational scheme is not in BCNF since STORE_ID# is not a superkey and STORE_ID# => STATE holds.

• By Algorithm #3, the relational scheme can be decomposed into

STORE_ID#,STATE

STORE_ID#,ITEM,PRICE

• STORE_ID# => STATE is a minimal cover for STORE_ID#,STATE, with STORE_ID# as the only key.

• STORE_ID#,STATE is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• STORE_ID#,ITEM => PRICE holds for STORE_ID#,ITEM,PRICE with STORE_ID#,ITEM as the only key.

• STORE_ID#,ITEM,PRICE is in BCNF, and does not need to be decomposed further.

• Final relational schemes:

STORE_ID#,CITY

STORE_ID#,STATE

STORE_ID#,ITEM,PRICE

Lossless Join Decomposition Into BCNF

Example #2

• Consider again the following abstract relational scheme:

– Attributes:

A,B,C,D,E,F

– Functional Dependencies:

A=>B, CB=>D, CD=>A, AE=>F, CE=>D

Note that the only key is CE.

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• The initial decomposition consists of one relational scheme ABCDEF.

• ABCDEF is not in BCNF since, for example, the dependency AE=>F holds, yet AE is not a superkey, and F is not in AE.

• By Algorithm #3, ABCDEF can be decomposed into AEF and ABCDE.

• {AE=>F} holds for AEF, with AE as the only key.

• AEF is in BCNF (why?), and does not need to be decomposed further.

• {A=>B, CB=>D, CD=>A, CE=>D} is a minimal cover for ABCDE, with CE as the only key.

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• ABCDE is not in BCNF since, for example, A is not a superkey and A=>B holds.

• By Algorithm #3, ABCDE can be decomposed into AB and ACDE.

• {A=>B} is a minimal cover for AB, with A as the only key.

• AB is in BCNF (why?), and does not need to be decomposed further.

• {AC=>D, CD=>A, CE=>D} is a minimal cover for ACDE, with CE as the only key.

Question: Where did AC=>D come from?

Answer: AC=>D is not in the original set of functional dependencies. However, it is implied by them: A=>B gives AC=>CB, and CB=>D then gives AC=>D.
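Whether a dependency such as AC=>D is implied by the original set can be checked with an attribute closure, without computing all of F+. The Python sketch below is a minimal illustration; the helper names `closure` and `fd` are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

FDS = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]

ac_closure = closure("AC", FDS)
print(sorted(ac_closure))  # ['A', 'B', 'C', 'D']
print("D" in ac_closure)   # True
```

X=>A is implied by a set of dependencies exactly when A lies in the closure of X, so D being in the closure of AC confirms AC=>D. The closure does not reach E or F, which is why AC is not a superkey of ACDE.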

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• ACDE is not in BCNF since, for example, AC is not a superkey and AC=>D holds.

• By Algorithm #3, ACDE can be decomposed into ACD and ACE.

• {AC=>D, CD=>A} is a minimal cover for ACD, with AC and CD as the keys.

• ACD is in BCNF (why?), and does not need to be decomposed further.

• {CE=>A} is a minimal cover for ACE, with CE as the only key (note that CE=>A is not in the original set of functional dependencies; however, it is implied by them).

• ACE is in BCNF (why?), and does not need to be decomposed further.
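The schemes that were left undecomposed in Example #2 are AEF, AB, ACD, and ACE. The Python sketch below checks which of the original dependencies this decomposition preserves, using the standard iterative test that grows the left-hand side through each scheme in turn. The test is not stated on the slides, and the helper names `closure`, `preserved`, and `fd` are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(lhs, rhs, schemes, fds):
    """True if lhs=>rhs follows from the dependencies projected onto the schemes."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for scheme in schemes:
            # What the projected dependencies of this scheme add to z.
            gained = (closure(z & scheme, fds) & scheme) - z
            if gained:
                z |= gained
                changed = True
    return rhs <= z

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

FDS = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
bcnf_schemes = [frozenset(s) for s in ("AEF", "AB", "ACD", "ACE")]

report = {
    "".join(sorted(l)) + "=>" + "".join(sorted(r)): preserved(l, r, bcnf_schemes, FDS)
    for l, r in FDS
}
for name, ok in report.items():
    print(name, ok)
```

Under this test every original dependency except CB=>D is preserved: no single scheme contains C, B, and D, and the projected dependencies do not imply it. This is an instance of the earlier fact that a lossless decomposition into BCNF is not guaranteed to preserve dependencies.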

General Guidelines

“From a relational point of view, it is standard to have tables that are in Third Normal Form.”

-Sybase SQL Server Performance and Tuning Guide

“It turns out that in some circumstances, Boyce-Codd normal form is too strong a condition,...Thus third normal form has seen use as a condition that has almost the benefits of Boyce-Codd normal form...”

-Principles of Database Systems, by Jeffrey D. Ullman

General Guidelines, Cont.

“It is interesting to conjecture that all functional dependencies that satisfy third normal form but violate Boyce-Codd normal form are in a sense irrelevant.”

-Principles of Database Systems, by Jeffrey D. Ullman

“...we feel that the third normal form is the most important normal form...”

-Database Management, by Ralph B. Bisland, Jr.

General Guidelines, Cont.

• Also, as noted previously, any relational scheme can be decomposed into a collection of 3NF relational schemes that preserves dependencies and has a lossless join.

• Algorithm #3, which produces a lossless decomposition of a relational scheme into BCNF, is, in general, very inefficient.

• It is unlikely that the normalization process will begin with one big relation in 0NF, which will then be converted successively to 1NF, 2NF, etc. In general, it is more likely that the process will start out somewhere in the middle.

• Common sense and utility must guide arbitrary choices.