Copyright, Harris Corporation & Ophir Frieder, 1998. The Process of Normalization.

Post on 20-Jan-2016


The Process of Normalization


Objective

• Learn how to decompose a relational scheme into 1NF, 2NF, 3NF, and BCNF.

• Learn what is meant by a lossless decomposition.

• Learn what is meant by a dependency preserving decomposition.


Normalization - 1NF

• A repeating group is typically eliminated by “flattening” the table.

Before

SS#        NAME        HOBBIES
032446254  Mary Smith  jogging, wrestling
045242453  Bob Jones   cooking, cycling, gardening
932415223  Sue Clark   quilting, hiking,
...


Normalization - 1NF

After

SS#        NAME        HOBBIES
032446254  Mary Smith  jogging
032446254  Mary Smith  wrestling
045242453  Bob Jones   cooking
045242453  Bob Jones   cycling
045242453  Bob Jones   gardening
:
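The flattening step can be sketched in a few lines of Python. This is only an illustration: the in-memory row layout and the name `flatten` are assumptions, not part of the slides.

```python
# Hypothetical in-memory rows mirroring the "Before" table; the HOBBIES
# field holds a repeating group (a list), which violates 1NF.
before = [
    ("032446254", "Mary Smith", ["jogging", "wrestling"]),
    ("045242453", "Bob Jones", ["cooking", "cycling", "gardening"]),
]


def flatten(rows):
    """Return one (ss, name, hobby) tuple per hobby so every field is atomic."""
    return [(ss, name, hobby) for ss, name, hobbies in rows for hobby in hobbies]


after = flatten(before)
for row in after:
    print(row)
```

Each person now appears once per hobby, which is exactly the "After" layout: SS# and NAME are repeated, but no field holds more than one value.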


Decomposition

• The decomposition of a relational scheme R = {A1, A2, ..., An} is its replacement by a collection R1, R2, ..., Rk of subsets of R such that R is equal to the union of the Ri’s.

• Note that there is no requirement that the Ri’s be disjoint.
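As a quick check of the definition, the sketch below tests whether a collection of schemes is a decomposition of R. Attribute names are single letters and the function name `is_decomposition` is made up for this example.

```python
def is_decomposition(r, parts):
    """True when the union of the schemes in parts equals r; overlap is allowed."""
    return set().union(*(set(p) for p in parts)) == set(r)


# The two schemes overlap on B, which is fine.
print(is_decomposition("ABCD", ["AB", "BCD"]))
# D is missing from the union, so this is not a decomposition.
print(is_decomposition("ABCD", ["AB", "C"]))
```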


Decomposition, Cont.

• Recall the department store relational scheme:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#, ITEM => PRICE

STORE_ID#  CITY      STATE  ITEM       PRICE
W001       Orlando   FL     Duck Tape   3.95
W001       Orlando   FL     Rope        5.95
W002       Savannah  GA     Plywood    12.50
W002       Savannah  GA     Rope        8.75


Decomposition #1

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Rope        5.95
W002       Plywood    12.50
W002       Rope        8.75

STORE_ID#  CITY      STATE
W001       Orlando   FL
W002       Savannah  GA


Decomposition #1, Cont.

• Note that:

– Some redundancy is eliminated - CITY and STATE are not repeated for every item.

– Redundancy still exists; the STORE_ID# attribute appears in both tables, and multiple times in the table that also holds ITEM and PRICE.

– Whatever its contents, the original table can always be recovered by performing a natural join on the two tables, i.e., the decomposition is lossless.

– Insertion, deletion, and update anomalies do not occur.
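The lossless property can be checked on the sample data with a small Python sketch. Representing relations as sets of tuples is an assumption made for brevity. The check covers only this one instance, whereas the general guarantee rests on STORE_ID# being a key of the (STORE_ID#, CITY, STATE) table.

```python
# Rows of the original table: (STORE_ID#, CITY, STATE, ITEM, PRICE).
original = {
    ("W001", "Orlando", "FL", "Duck Tape", 3.95),
    ("W001", "Orlando", "FL", "Rope", 5.95),
    ("W002", "Savannah", "GA", "Plywood", 12.50),
    ("W002", "Savannah", "GA", "Rope", 8.75),
}

# Decomposition #1: project onto the two schemes (sets drop duplicates).
prices = {(store, item, price) for store, _, _, item, price in original}
stores = {(store, city, state) for store, city, state, _, _ in original}

# Natural join on the shared attribute STORE_ID#.
rejoined = {
    (store, city, state, item, price)
    for store, city, state in stores
    for store2, item, price in prices
    if store == store2
}

print(rejoined == original)
```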


Decomposition #2

ITEM       PRICE
Duck Tape   3.95
Rope        5.95
Plywood    12.50
Rope        8.75

STORE_ID#  CITY      STATE
W001       Orlando   FL
W002       Savannah  GA


Decomposition #2, Cont.

• Note that:

– No redundancy is introduced by the decomposition.

– Insertion, deletion, and update anomalies are eliminated...sort of...

– The relationship between STORE_ID# and ITEM from the original scheme is lost and, consequently, the contents of the original table cannot be recovered using a join, i.e., the decomposition is lossy.

– The dependency STORE_ID#,ITEM => PRICE is not represented by either table, i.e., the decomposition does not preserve dependencies.
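A short Python sketch makes the loss visible on the sample data. Relations are again modeled as sets of tuples, which is an assumption of this example. Because the two tables of Decomposition #2 share no attribute, their natural join degenerates into a Cartesian product.

```python
# Rows of the original table: (STORE_ID#, CITY, STATE, ITEM, PRICE).
original = {
    ("W001", "Orlando", "FL", "Duck Tape", 3.95),
    ("W001", "Orlando", "FL", "Rope", 5.95),
    ("W002", "Savannah", "GA", "Plywood", 12.50),
    ("W002", "Savannah", "GA", "Rope", 8.75),
}

# Decomposition #2: the two projections have no attribute in common.
item_price = {(item, price) for _, _, _, item, price in original}
stores = {(store, city, state) for store, city, state, _, _ in original}

# With nothing to match on, every store pairs with every (ITEM, PRICE) row.
joined = {
    (store, city, state, item, price)
    for store, city, state in stores
    for item, price in item_price
}

spurious = joined - original
print(len(original), len(joined), len(spurious))
for row in sorted(spurious):
    print(row)
```

The join returns every original row plus rows that were never in the table (for example, Plywood sold in Orlando), so the original contents cannot be told apart from the spurious ones.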


Requirements of Decomposition

• In general, a decomposition should be:

– lossless, i.e., any legal relation on the original schema can be recovered exactly (with no tuples lost and no spurious tuples added) by joining its projections onto the schemas in the decomposition, and

– dependency preserving, i.e., every functional dependency applying to the original schema should follow from the dependencies that apply to the individual schemas in the decomposition.

• Decomposition #1 is both; Decomposition #2 is neither.


Preservation of Dependencies vs Lossless Join

• Proof of the following is beyond the scope of this course. However, it is worth noting that:

– A lossless decomposition is not necessarily dependency preserving.

– A dependency preserving decomposition is not necessarily lossless.

Preservation of Dependencies vs Lossless Join, Cont.

• FACT:

Every relational scheme has a decomposition into 3NF that has a lossless join and preserves dependencies.

• FACT:

Every relational scheme has a lossless decomposition into BCNF. This decomposition, however, is not guaranteed to preserve dependencies.

Dependency Preserving Decomposition Into 3NF

• INPUT:

– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.

• ALGORITHM #1:

– Step #1: If R has any attributes not involved in any dependency in F, then let each such attribute be its own relational scheme and eliminate it from R.

– Step #2: If a single dependency in F involves all of the attributes of R, then let R be the final collection of relational schemes, in addition to any relational schemes resulting from Step #1.

– Step #3: For each dependency X=>A in F, create a relational scheme in the final decomposition containing the attributes of X and A (Note that if X=>A and X=>B are in F, then XAB can be used instead and may, in fact, be preferable).
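The three steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes single-letter attribute names, dependencies given as (left-hand side, single right-hand attribute) pairs, and an input F that is already a minimal cover. The function name `decompose_3nf` is made up for this example. Step #3 uses the XAB merging suggested in the note.

```python
def decompose_3nf(attributes, fds):
    """Sketch of Algorithm #1; fds is assumed to be a minimal cover."""
    attributes = set(attributes)
    used = set()
    for lhs, rhs in fds:
        used |= set(lhs) | {rhs}

    # Step #1: attributes in no dependency become one-attribute schemes.
    schemes = [frozenset({a}) for a in sorted(attributes - used)]
    remaining = attributes & used

    # Step #2: a single dependency covering everything left means no split.
    for lhs, rhs in fds:
        if set(lhs) | {rhs} == remaining:
            return schemes + [frozenset(remaining)]

    # Step #3: one scheme per left-hand side, merging X=>A and X=>B into XAB.
    grouped = {}
    for lhs, rhs in fds:
        grouped.setdefault(frozenset(lhs), set()).add(rhs)
    schemes.extend(lhs | rhss for lhs, rhss in grouped.items())
    return schemes


# A small made-up scheme: G appears in no dependency, and A=>B, A=>C share a LHS.
fds = [("A", "B"), ("A", "C"), ("CD", "E")]
result = decompose_3nf("ABCDEG", fds)
print(["".join(sorted(s)) for s in result])
```

Here Step #1 splits off G, Step #2 does not apply, and Step #3 yields ABC (from the two dependencies on A) and CDE.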

Dependency Preserving Decomposition Into 3NF, Cont.

Example #1

• Recall the following (modified) relational scheme for a department store chain:

– Attributes:

STORE_ID# - A department store ID number.

ITEM - An item sold by the department store.

PRICE - The price of the item.

– Functional Dependencies:

STORE_ID#,ITEM=>PRICE

Dependency Preserving Decomposition Into 3NF

Example #1, Cont.

STORE_ID#,ITEM=>PRICE

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Nails       5.95
W002       Plywood    12.50
W002       Paint       8.75

Dependency Preserving Decomposition Into 3NF

Example #1, Cont.

• First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.

• Second, STORE_ID#,ITEM => PRICE contains every attribute in the relation. Consequently, Step #2 dictates that the final decomposition be the initial relational scheme. In other words, no decomposition is necessary.

• By the way, since STORE_ID#,ITEM=>PRICE is the only dependency, and since STORE_ID#,ITEM is a key, it follows that this relational scheme also happens to be in BCNF (in general, this is not guaranteed by the algorithm).

Dependency Preserving Decomposition Into 3NF

Example #2

• Consider the following abstract relational scheme:

– Attributes:

A,B,C,D,E,F

– Functional Dependencies:

A=>B CB=>D

CD=>A AE=>F

CE=>D

Note that, based on the above, the only key is CE
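The claim about the key can be checked mechanically with attribute closures. The sketch below is a brute-force search over attribute subsets (exponential, so only suitable for small schemes). The helper names `closure` and `candidate_keys` are made up for this example.

```python
from itertools import combinations

FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole scheme."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", FDS)])
```

Neither C nor E appears on the right-hand side of any dependency, so every key must contain both. Since the closure of CE is already ABCDEF, CE is the only key.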

Dependency Preserving Decomposition Into 3NF

Example #2, Cont.

• First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.

• Second, there is no dependency that contains all of the attributes. Consequently Step #2 of the algorithm does not apply.

• It follows from Step #3 that the relational scheme can be decomposed into the following:

AB CBD

CDA AEF

CED

Dependency Preserving Decomposition Into 3NF With a Lossless Join

• INPUT:

– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.

• ALGORITHM #2:

– Step #1: Construct a dependency preserving decomposition of R into 3NF using Algorithm #1.

– Step #2: Let X be a key for R, and add a relational scheme consisting of all of the attributes in X.

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.

• Example #1 (revisited):

Note that there is only one relational scheme in the decomposition resulting from the application of Algorithm #1. Consequently, that decomposition is lossless, by definition.

STORE_ID#  ITEM       PRICE
W001       Duck Tape   3.95
W001       Nails       5.95
W002       Plywood    12.50
W002       Paint       8.75

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.

• Example #2 (revisited):

Since CE is the only key for the original relation ABCDEF, Step #2 in Algorithm #2 dictates that CE be added to the result of Algorithm #1.

AB CBD

CDA AEF

CED CE
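Step #2 of Algorithm #2 amounts to one extra scheme, as the sketch below shows for this example. The function name `add_key_scheme` is an assumption of this illustration, and the key is taken as given rather than computed. The final line checks which existing schemes already contain the key.

```python
def add_key_scheme(schemes, key):
    """Step #2 of Algorithm #2: append a scheme made of the key's attributes."""
    result = [frozenset(s) for s in schemes]
    key = frozenset(key)
    if key not in result:
        result.append(key)
    return result


algorithm1_result = ["AB", "CBD", "CDA", "AEF", "CED"]
decomposition = add_key_scheme(algorithm1_result, "CE")
print(["".join(sorted(s)) for s in decomposition])

# Schemes from Algorithm #1 that already contain the key CE.
covering = [s for s in decomposition if frozenset("CE") < s]
print(["".join(sorted(s)) for s in covering])
```

In this example CED already contains the key CE, so the added scheme carries no information that CED does not. A common refinement, not stated on the slide, is to add the key scheme only when no scheme from Algorithm #1 already contains a key.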

Lossless Join Decomposition Into BCNF

• INPUT:

– Relational scheme R and set of functional dependencies F.

• ALGORITHM #3:

– Let D be an initial decomposition consisting of R alone;

– while (D contains a relation R’ that is not in BCNF) loop

Let X=>A be a functional dependency that holds in R’ where X is not a superkey and A is not in X;

Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;

end loop;

Lossless Join Decomposition Into BCNF, Cont.

• Algorithm #3 is actually somewhat more complex.

INPUT:

– Relational scheme R and set of functional dependencies F.

ALGORITHM #3:

– Let D be an initial decomposition consisting of R alone;

– Let F’ be the set of functional dependencies for the relation R’ under consideration (initially, R’ = R and F’ = F);

– while (D contains a relation R’ that is not in BCNF with respect to F’) loop

Let X=>A be a functional dependency in F’ that holds in R’ where X is not a superkey and A is not in X;

Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;

Compute F’+, and project it onto S1 and S2 to get F1 and F2;

Convert F1 and F2 to minimal covers;

end loop;
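The loop can be sketched in Python under one substitution that should be stated plainly: instead of computing F’+ and projecting it, the sketch tests each subset X of a scheme with an attribute closure taken over the original F and restricted to that scheme. This finds the same violations but is still exponential in the number of attributes. The names `closure`, `find_violation`, and `bcnf_decompose` are made up for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def find_violation(scheme, fds):
    """Return (X, A) where X=>A holds in scheme, X is not a superkey, A not in X."""
    attrs = sorted(scheme)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            determined = closure(x, fds) & scheme
            if determined != scheme and determined - x:
                return x, min(determined - x)
    return None


def bcnf_decompose(attributes, fds):
    """Split schemes on BCNF violations until none remain."""
    done, todo = [], [frozenset(attributes)]
    while todo:
        scheme = todo.pop()
        violation = find_violation(scheme, fds)
        if violation is None:
            done.append(scheme)
        else:
            x, a = violation
            todo.append(x | {a})       # S1: A plus the attributes of X
            todo.append(scheme - {a})  # S2: everything except A
    return done


FDS = [
    (("STORE_ID#",), "CITY"),
    (("STORE_ID#",), "STATE"),
    (("STORE_ID#", "ITEM"), "PRICE"),
]
for scheme in bcnf_decompose(["STORE_ID#", "CITY", "STATE", "ITEM", "PRICE"], FDS):
    print(sorted(scheme))
```

Run on the department store scheme, the sketch ends with three schemes: STORE_ID# with CITY, STORE_ID# with STATE, and STORE_ID# with ITEM and PRICE. The order in which violations are picked is an arbitrary choice of this sketch, and a different order can produce a different BCNF decomposition.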

Lossless Join Decomposition Into BCNF, Cont.

• Note that, in general, F+ includes

– All dependencies in F

– All trivial dependencies

– All dependencies that follow from Armstrong’s axioms

• More specifically, the size of F+ can be exponential in the size of F. Thus, computing F+ is, in general, impractical.

• Also note that determining if a relational scheme is in BCNF is, in general, NP-Complete (i.e., no polynomial-time algorithm is known, and known algorithms may require exponential time) and is therefore impractical as well.

Collectively, these facts mean that Algorithm #3 is of more theoretical than practical interest, especially for complex relations.

Lossless Join Decomposition Into BCNF

Example #1:

• Recall the following relational scheme for a department store chain (e.g., Walmart):

– Attributes:

STORE_ID# - A store identification number.

CITY - The city in which the store is located.

STATE - The state in which the store is located.

ITEM - An item sold by the store.

PRICE - The price of the item.

– Functional Dependencies:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#, ITEM => PRICE

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• Initial decomposition:

STORE_ID#,CITY,STATE,ITEM,PRICE

• Minimal cover:

STORE_ID# => CITY

STORE_ID# => STATE

STORE_ID#,ITEM => PRICE

• Key:

STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The relational scheme is not in BCNF since, for example, the dependency STORE_ID# => CITY holds, yet STORE_ID# is not a superkey, and CITY (the RHS) is not part of STORE_ID# (the LHS).

• By Algorithm #3, the relational scheme can be decomposed into

STORE_ID#,CITY

STORE_ID#,STATE,ITEM,PRICE

• STORE_ID# => CITY is a minimal cover for STORE_ID#,CITY, with STORE_ID# as the only key.

• STORE_ID#,CITY is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The remaining relational scheme:

STORE_ID#,STATE,ITEM,PRICE

• Minimal cover:

STORE_ID# => STATE

STORE_ID#,ITEM => PRICE

• Key:

STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• The relational scheme is not in BCNF since STORE_ID# is not a superkey and STORE_ID# => STATE holds.

• By Algorithm #3, the relational scheme can be decomposed into

STORE_ID#,STATE

STORE_ID#,ITEM,PRICE

• STORE_ID# => STATE is a minimal cover for STORE_ID#,STATE, with STORE_ID# as the only key.

• STORE_ID#,STATE is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF

Example #1, Cont.

• STORE_ID#,ITEM => PRICE holds for STORE_ID#,ITEM,PRICE with STORE_ID#,ITEM as the only key.

• STORE_ID#,ITEM,PRICE is in BCNF, and does not need to be decomposed further.

• Final relational schemes:

STORE_ID#,CITY

STORE_ID#,STATE

STORE_ID#,ITEM,PRICE

Lossless Join Decomposition Into BCNF

Example #2

• Consider again the following abstract relational scheme:

– Attributes:

A,B,C,D,E,F

– Functional Dependencies:

A=>B CB=>D

CD=>A AE=>F

CE=>D

Note that the only key is CE

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• The initial decomposition consists of one relational scheme ABCDEF.

• ABCDEF is not in BCNF since, for example, the dependency AE=>F holds, yet AE is not a superkey, and F is not in AE.

• By Algorithm #3, ABCDEF can be decomposed into AEF and ABCDE.

• {AE=>F} holds for AEF, with AE as the only key.

• AEF is in BCNF (why?), and does not need to be decomposed further.

• {A=>B, CB=>D, CD=>A, CE=>D} is a minimal cover for ABCDE, with CE as the only key.

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• ABCDE is not in BCNF since, for example, A is not a superkey and A=>B holds.

• By Algorithm #3, ABCDE can be decomposed into AB and ACDE.

• {A=>B} is a minimal cover for AB, with A as the only key.

• AB is in BCNF (why?), and does not need to be decomposed further.

• {AC=>D, CD=>A, CE=>D} is a minimal cover for ACDE, with CE as the only key.

Question: Where did AC=>D come from?

Answer: AC=>D is not in the original set of functional dependencies.

However, it is implied by them.
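The implication can be traced step by step with an attribute closure. The sketch below records which dependency contributes each new attribute. The function name `closure_trace` is made up for this example.

```python
FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]


def closure_trace(attrs, fds):
    """Closure of attrs plus the dependencies applied, in order of use."""
    known = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and rhs not in known:
                known.add(rhs)
                steps.append(f"{lhs}=>{rhs}")
                changed = True
    return known, steps


known, steps = closure_trace("AC", FDS)
print(sorted(known))
print(steps)
```

Starting from AC, A=>B adds B, and then CB=>D adds D. D is therefore in the closure of AC, which is exactly the statement that AC=>D follows from the original dependencies.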

Lossless Join Decomposition Into BCNF

Example #2, Cont.

• ACDE is not in BCNF since, for example, AC is not a superkey and AC=>D holds.

• By Algorithm #3, ACDE could be decomposed into ACD and ACE.

• {AC=>D, CD=>A} is a minimal cover for ACD, with AC and CD as the only keys (CD=>A is in the original set of functional dependencies, and all of its attributes lie in ACD).

• ACD is in BCNF (why?), and does not need to be decomposed further.

• {CE=>A} is a minimal cover for ACE, with CE as the only key (note that CE=>A is not in the original set of functional dependencies. However, it is implied by them).

• ACE is in BCNF (why?), and does not need to be decomposed further.
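The final schemes of this example are AEF, AB, ACD, and ACE. Whether this decomposition preserves the original dependencies can be tested without computing F+, using the standard iterative test that grows a set Z by closing it within each scheme in turn. The sketch below is a minimal illustration, and the names `closure` and `is_preserved` are assumptions of this example.

```python
FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]
BCNF_SCHEMES = ["AEF", "AB", "ACD", "ACE"]


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, schemes, fds):
    """True when lhs=>rhs follows from the dependencies projected onto schemes."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for scheme in schemes:
            s = set(scheme)
            gained = (closure(z & s, fds) & s) - z
            if gained:
                z |= gained
                changed = True
    return set(rhs) <= z


lost = [f"{lhs}=>{rhs}" for lhs, rhs in FDS
        if not is_preserved(lhs, rhs, BCNF_SCHEMES, FDS)]
print(lost)
```

Under these assumptions the only dependency reported as lost is CB=>D: no final scheme contains B, C, and D together, and the dependencies that survive in the individual schemes do not imply it. This illustrates the earlier fact that a lossless BCNF decomposition is not guaranteed to preserve dependencies.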


General Guidelines

“From a relational point of view, it is standard to have tables that are in Third Normal Form.”

-Sybase SQL Server Performance and Tuning Guide

“It turns out that in some circumstances, Boyce-Codd normal form is too strong a condition,...Thus third normal form has seen use as a condition that has almost the benefits of Boyce-Codd normal form...”

-Principles of Database Systems, by Jeffrey D. Ullman


General Guidelines, Cont.

“It is interesting to conjecture that all functional dependencies that satisfy third normal form but violate Boyce-Codd normal form are in a sense irrelevant.”

-Principles of Database Systems, by Jeffrey D. Ullman

“...we feel that the third normal form is the most important normal form...”

-Database Management, by Ralph B. Bisland, Jr.


General Guidelines, Cont.

• Also, as noted previously, any relational scheme can be decomposed into a collection of 3NF relational schemes that preserves dependencies and has a lossless join.

• Algorithm #3, which produces a lossless decomposition of a relational scheme into BCNF, is, in general, very inefficient.

• It is unlikely that the normalization process will begin with one big relation in 0NF, which will then be converted successively to 1NF, 2NF, etc. In general, it is more likely that the process will start out somewhere in the middle.

• Common sense and utility must guide arbitrary choices.