Chapter 3.6-7: Normalization of Database Tables (Spring 2007)
Spring 2007 1
3
Chapter 3.6-7
Normalization of Database Tables
Normalization
• Normalization is the process of assigning attributes to entities
  – Reduces data redundancies
  – Helps eliminate data anomalies
  – Produces controlled redundancies to link tables
• Normalization stages
  – 1NF: First normal form
  – 2NF: Second normal form
  – 3NF: Third normal form
  – 4NF: Fourth normal form
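Example:
[E/R diagram: entity sets Stars (name, address), Movies (title, year, length), and Studios (name, address); relationship Stars-in connects Stars and Movies, and Owns connects Studios and Movies.]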
Problem Example
Movies
Title          Year  Length  FilmType  StudioName  StarName       StarPhone
Star Wars      1977  124     Color     Fox         Carrie Fisher  713-872-9282
Star Wars      1977  124     Color     Fox         Harrison Ford  832-999-2002
Star Wars      1977  124     Color     Fox         Mark Hamill    512-472-0282
Fugitive       1995  135     Color     Universal   Harrison Ford  832-999-2002
Wayne’s World  1992  95      Color     Paramount   Dana Carvey    882-333-9999
Wayne’s World  1992  95      Color     Paramount   Mike Meyers    222-355-1111
• Update anomalies: If Harrison Ford’s phone number changes, it must be changed in each of his tuples. If the Length value of Star Wars needs to be changed, every occurrence must be changed.
• Deletion anomalies: If we delete the Wayne’s World entries from the database, we also lose all information about Dana Carvey and Mike Meyers.
Conversion to 1NF
• Repeating groups must be eliminated
  – Proper primary key developed
    » Uniquely identifies each tuple
  – Dependencies can be identified
    » Undesirable dependencies are still allowed:
    » Partial: based on part of a composite primary key
    » Transitive: one nonprime attribute depends on another nonprime attribute
Conversion to 1NF (cont.)
• An attribute that is at least part of a key is known as a prime attribute or key attribute.
Example
• Projects are assigned to employees
• Each project has a number and a name
• Each employee has a number, a name, and a job class
• For each employee working on a project, we need to keep the number of hours spent on the project and the hourly rate
• Project Assignments table:
  (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
• What is the key for this relation?
Data Organization: 1NF
Dependency Diagram (1NF)
PROJ_NUM, EMP_NUM --> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
PROJ_NUM --> PROJ_NAME   (partial dependency)
EMP_NUM --> EMP_NAME, JOB_CLASS, CHG_HOUR   (partial dependency)
JOB_CLASS --> CHG_HOUR   (transitive dependency)
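The dependencies above can be checked mechanically with an attribute-closure computation. The following is a minimal sketch, not part of the original slides: the function names `closure` and `classify` are illustrative, and the "transitive" branch simply assumes that any determinant which is neither a superkey nor part of the key is a nonprime attribute.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ALL = {"PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
       "JOB_CLASS", "CHG_HOUR", "HOURS"}
KEY = {"PROJ_NUM", "EMP_NUM"}
FDS = [
    (("PROJ_NUM", "EMP_NUM"),
     ("PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")),
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]

def classify(lhs):
    """Label a determinant as 'key', 'partial', or 'transitive' (simplified)."""
    lhs = set(lhs)
    if closure(lhs, FDS) >= ALL:
        return "key"
    if lhs < KEY:              # proper subset of the composite key
        return "partial"
    return "transitive"        # assumption: determinant is nonprime

for lhs, rhs in FDS:
    print(", ".join(lhs), "-->", ", ".join(rhs), ":", classify(lhs))
```

The closure of {PROJ_NUM, EMP_NUM} covers every attribute, which is what makes that pair a key, while the closure of either component alone does not.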
1NF Summarized
• All key attributes defined
• Primary key identified
• No repeating groups in table
• All attributes dependent on primary key
2NF Summarized
• In 1NF, but
• Includes no partial dependencies
• Partial dependency: an attribute is functionally dependent on a portion of the primary key
• Example:
  PROJ_NUM --> PROJ_NAME
  EMP_NUM --> EMP_NAME, JOB_CLASS, CHG_HOUR
Conversion to 2NF
1. Start with the 1NF format
2. Write each key component on a separate line
3. Write dependent attributes after each key component
4. Write the original key on the last line
5. Write any remaining attributes after the original key
6. Each component becomes a new table

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
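The six steps can be sketched as a small function. This is an illustration under stated assumptions: the helper name `to_2nf` and the `PARTIAL` mapping (key component to the attributes that depend on it alone) are hypothetical, and the partial dependencies are supplied by hand rather than discovered.

```python
KEY = ("PROJ_NUM", "EMP_NUM")
ATTRS = ["PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
         "JOB_CLASS", "CHG_HOUR", "HOURS"]
# Attributes that depend on only one component of the composite key.
PARTIAL = {
    "PROJ_NUM": ["PROJ_NAME"],
    "EMP_NUM": ["EMP_NAME", "JOB_CLASS", "CHG_HOUR"],
}

def to_2nf(key, attrs, partial):
    """Split a 1NF table into 2NF tables following the steps above."""
    tables = []
    placed = set(key)
    for component in key:                     # steps 2-3
        dependents = partial.get(component, [])
        tables.append([component] + dependents)
        placed.update(dependents)
    remaining = [a for a in attrs if a not in placed]
    tables.append(list(key) + remaining)      # steps 4-5
    return tables                             # step 6: one table per line

for table in to_2nf(KEY, ATTRS, PARTIAL):
    print(table)
```

The three lists printed correspond to PROJECT, EMPLOYEE, and ASSIGN.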
2NF Conversion Results
2NF Summarized
• In 1NF
• Includes no partial dependencies
• Still possible to exhibit transitive dependency
  – Attributes may be functionally dependent on non-key attributes
Conversion to 3NF
• Decompose table(s) to eliminate transitive functional dependencies

PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
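As a rough illustration of what the 3NF schema buys, the sketch below builds the four tables in an in-memory SQLite database using Python's standard `sqlite3` module. The column types and all row values are assumptions made up for the example; the slides give only the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
CREATE TABLE EMPLOYEE (
    EMP_NUM INTEGER PRIMARY KEY,
    EMP_NAME TEXT,
    JOB_CLASS TEXT REFERENCES JOB(JOB_CLASS));
CREATE TABLE ASSIGN (
    PROJ_NUM INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM));
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO PROJECT VALUES (1, 'Alpha')")
conn.execute("INSERT INTO JOB VALUES ('Engineer', 50.0)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(101, "Ann", "Engineer"), (102, "Bob", "Engineer")])
conn.executemany("INSERT INTO ASSIGN VALUES (?, ?, ?)",
                 [(1, 101, 10.0), (1, 102, 5.0)])

# The hourly rate is stored once per job class, so one UPDATE changes it
# for every employee in that class (no update anomaly).
conn.execute("UPDATE JOB SET CHG_HOUR = 60.0 WHERE JOB_CLASS = 'Engineer'")

charges = conn.execute("""
    SELECT e.EMP_NAME, a.HOURS * j.CHG_HOUR
    FROM ASSIGN a
    JOIN EMPLOYEE e ON a.EMP_NUM = e.EMP_NUM
    JOIN JOB j ON e.JOB_CLASS = j.JOB_CLASS
    ORDER BY e.EMP_NUM
""").fetchall()
print(charges)
```

The joins recover the per-assignment charge that the original single table stored redundantly.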
3NF Summarized
• In 2NF
• Contains no transitive dependencies
Boyce-Codd Normal Form (BCNF)
• Formally, R is in BCNF if for every nontrivial FD for R, say X --> A, X is a superkey.
  – “Nontrivial” = right-side attribute not in left side.
  – Examples of trivial FD’s:
    » A --> A
    » AB --> A
• Informally: the only arrows in the FD diagram are arrows out of superkeys.
  – Note: a 3NF relation can violate BCNF only when it has more than one candidate key and those keys overlap.
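The formal definition translates into a direct test: for each nontrivial FD, check whether the closure of its left side covers the whole relation. The sketch below is illustrative only (the name `bcnf_violations` is hypothetical) and applies the test to the Project Assignments table from earlier in the chapter.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List the given FDs that are nontrivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = set(rhs) - set(lhs)
        if nontrivial and not closure(lhs, fds) >= set(relation):
            violations.append((lhs, rhs))
    return violations

R = {"PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
     "JOB_CLASS", "CHG_HOUR", "HOURS"}
FDS = [
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]

violations = bcnf_violations(R, FDS)
for lhs, rhs in violations:
    print("violates BCNF:", ", ".join(lhs), "-->", ", ".join(rhs))
```

Only the FD whose left side is the full key {PROJ_NUM, EMP_NUM} passes; the other three have left sides that are not superkeys.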
3NF Table Not in BCNF
What normal form?
Decomposition of Table Structure to Meet BCNF
Decomposition to Reach BCNF
Setting: relation R with FD’s F.
Suppose relation R has a BCNF violation X --> B, where X is not a superkey.
1. Compute X+. It cannot be all attributes. Why?
2. Decompose R into X+ and (R - X+) ∪ X.
3. Find the FD’s for the decomposed relations.
  – Project the FD’s from F, i.e., calculate all consequents of F that involve only attributes from X+ or only from (R - X+) ∪ X.
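Steps 1 and 2 can be sketched in a few lines. This is a hedged illustration rather than a full BCNF algorithm: `decompose_step` is a hypothetical helper that performs a single split and leaves FD projection (step 3) out. It is applied to the EMPLOYEE table from the 2NF conversion, with the violating FD JOB_CLASS --> CHG_HOUR.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def decompose_step(relation, fds, x):
    """Split relation on a violating determinant x into X+ and (R - X+) ∪ X."""
    relation = set(relation)
    x_plus = closure(x, fds) & relation
    return x_plus, (relation - x_plus) | set(x)

EMPLOYEE = {"EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"}
FDS = [
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]

r1, r2 = decompose_step(EMPLOYEE, FDS, ("JOB_CLASS",))
print(sorted(r1))
print(sorted(r2))
```

The two results match the JOB and EMPLOYEE tables of the 3NF schema shown earlier.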
Decomposition to Reach BCNF
• Identify the violating FD
  – E.g.: X1X2…Xn --> B1B2…Bm
• Add to the right-hand side of the FD as many attributes as are functionally determined by (X1X2…Xn)
• Decompose relation R into two relations:
  – One relation has all the attributes Xs and Bs
  – The second relation has the Xs plus any remaining attributes of R other than the Bs
BCNF Example
Assume R(S, J, T)
• S: Student
• J: Subject
• T: Teacher
• Student S is taught subject J by teacher T.
• Constraints:
  – For each subject, each student of that subject is taught by only one teacher
  – Each teacher teaches only one subject (but each subject is taught by several teachers)
BCNF Example
S      J        T
Smith  Math     Prof. White
Smith  Physics  Prof. Green
Jane   Math     Prof. White
Jane   Physics  Prof. Brown
Functional dependencies:
• S, J --> T
• T --> J
BCNF Example
• Candidate keys: {S, J} and {S, T}
• In 3NF but not in BCNF
• Deletion anomaly: if we delete the information that Jane is studying Physics, we also lose the information that Prof. Brown teaches Physics
• Solution: two relations, R1{S, T} and R2{T, J}
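The candidate keys and the normal-form verdict for R(S, J, T) can be confirmed by brute force. The sketch below is illustrative: `candidate_keys` is a hypothetical helper that tries attribute subsets from smallest to largest, which is fine for three attributes but exponential in general.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(relation):
                keys.append(combo)
    return keys

R = {"S", "J", "T"}
FDS = [(("S", "J"), ("T",)), (("T",), ("J",))]

keys = candidate_keys(R, FDS)
prime = {a for k in keys for a in k}

# BCNF: every determinant is a superkey.
in_bcnf = all(closure(lhs, FDS) >= R for lhs, _ in FDS)
# 3NF: every determinant is a superkey, or the right side is prime.
in_3nf = all(closure(lhs, FDS) >= R or set(rhs) <= prime for lhs, rhs in FDS)

print(keys, in_3nf, in_bcnf)
```

T --> J is the FD that fails the BCNF test, but J belongs to the key {S, J}, so the 3NF condition still holds.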
Decomposition Based on BCNF is Necessarily Correct
Attributes A, B, C. FD: B --> C
Relations: R1[A, B], R2[B, C]
Tuples in R: (a, b, c)
Tuples in R1: (a, b)
Tuples in R2: (b, c)
Natural join of R1 and R2: (a, b, c)
The original relation R can be reconstructed by forming the natural join of R1 and R2.
Decomposition Based on BCNF is Necessarily Correct
Attributes A, B, C. FD: B --> C
Relations: R1[A, B], R2[B, C]
Tuples in R: (a, b, c), (d, b, e)
Tuples in R1: (a, b), (d, b)
Tuples in R2: (b, c), (b, e)
Tuples in the natural join of R1 and R2: (a, b, c), (a, b, e), (d, b, c), (d, b, e)
Can (a, b, e) and (d, b, c) be bogus tuples?
Decomposition Based on BCNF is Necessarily Correct
• Answer: No
• Because B --> C, i.e., if two tuples have the same B attribute then they must have the same C attribute, so c = e and (b, c) = (b, e).
• Hence (a, b, e) = (a, b, c) and (d, b, c) = (d, b, e).
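The argument can be replayed on concrete tuples. In the sketch below (an illustration; `project_and_rejoin` is a hypothetical helper) a relation over (A, B, C) is projected onto R1[A, B] and R2[B, C] and rejoined, once for an instance that violates B --> C and once for an instance that satisfies it.

```python
def project_and_rejoin(r):
    """Project (A, B, C) tuples onto (A, B) and (B, C), then natural-join on B."""
    r1 = {(a, b) for (a, b, c) in r}
    r2 = {(b, c) for (a, b, c) in r}
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

# B --> C fails here: the same b is paired with both c and e.
violating = {("a", "b", "c"), ("d", "b", "e")}
# B --> C holds here: b is always paired with c.
satisfying = {("a", "b", "c"), ("d", "b", "c")}

print(sorted(project_and_rejoin(violating)))
print(project_and_rejoin(satisfying) == satisfying)
```

When the FD does not hold, the join produces two extra tuples; when it holds, the join returns exactly the original relation.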
Theorem
• Any two-attribute relation is in BCNF.
Decomposition Theorem
• Suppose we decompose a relation R(X, Y, Z) into R1(X, Y) and R2(X, Z) and project R onto R1 and R2.
• Then join(R1, R2) is guaranteed to reconstruct R if and only if X --> Y or X --> Z.
• Notice that whenever we decompose because of a BCNF violation, one of the above FD’s holds.
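The theorem gives a simple test for a two-way decomposition: compute the closure of the shared attributes and see whether it covers one of the two sides. A minimal sketch, assuming the only constraints are the listed FDs (the name `lossless_binary` is hypothetical):

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

R1, R2 = {"A", "B"}, {"B", "C"}
with_fd = lossless_binary(R1, R2, [(("B",), ("C",))])
without_fd = lossless_binary(R1, R2, [])
print(with_fd, without_fd)
```

With B --> C the shared attribute B determines all of R2, so the join is guaranteed to reconstruct R; with no FDs at all there is no such guarantee.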
3NF
One FD structure causes problems in BCNF:
• If you decompose, you can’t recover all of the original FD’s.
• If you don’t decompose, you violate BCNF.
Abstractly: AB --> C and C --> B.
• Example: street city --> zip, and zip --> city.
BCNF violation: C --> B has a left side that is not a superkey.
• Based on the previous algorithm, decompose into BC and AC. But the FD AB --> C can no longer be checked in the new tables, because neither table contains A, B, and C together.
Example
A = street, B = city, C = zip.
FD’s: street city --> zip; zip --> city (BCNF violation)

city     street          zip
Houston  1 Main St.      77002
Houston  14000 Main St.  77005

Decompose:

street          zip
1 Main St.      77002
14000 Main St.  77005

city     zip
Houston  77002
Houston  77005

It is a bad idea to decompose the relation because you lose the ability to check the dependency street city --> zip.
Example
FD’s: street city --> zip; zip --> city

city     street          zip
Houston  1 Main St.      77002
Houston  14000 Main St.  77005
Boston   1 Main St.      33555

Decompose:

street          zip
1 Main St.      77002
14000 Main St.  77005
1 Main St.      33555

city     zip
Houston  77002
Houston  77005
Boston   33555

It is a bad idea to decompose the relation because you lose the ability to check the dependency street city --> zip.
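The lost check can be made concrete. In the sketch below the rows are hypothetical (they are not the rows from the slides): 1 Main St. is entered with two different zips, both of which belong to Houston. Each decomposed table satisfies every FD that can be stated over its own columns, yet the join violates street city --> zip. The helper name `satisfies_fd` is illustrative.

```python
def satisfies_fd(rows, lhs_idx, rhs_idx):
    """True if rows agreeing on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        val = tuple(row[i] for i in rhs_idx)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical data: the same street appears with two zips.
street_zip = {("1 Main St.", "77002"), ("1 Main St.", "77005")}
city_zip = {("Houston", "77002"), ("Houston", "77005")}

# zip --> city is the only FD checkable locally, and it holds in city_zip.
zip_to_city_ok = satisfies_fd(city_zip, [1], [0])

# Natural join on zip gives (city, street, zip) tuples.
joined = {(city, street, z)
          for (street, z) in street_zip
          for (city, z2) in city_zip if z == z2}

# street city --> zip can only be checked after joining, and it fails.
street_city_to_zip_ok = satisfies_fd(joined, [0, 1], [2])
print(zip_to_city_ok, street_city_to_zip_ok)
```

Nothing in either small table signals a problem, which is why enforcing street city --> zip would require a join on every update.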
“Elegant” Workaround
Define the problem away.
• A relation R is in 3NF iff (if and only if) for every nontrivial FD X --> A, either:
  1. X is a superkey, or
  2. A is prime = member of at least one key.
• Thus, if we normalize only to 3NF, the problem goes away.
What 3NF and BCNF Give You
• There are two important properties of a decomposition:
  1. Recovery: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.
  2. Dependency preservation: it should be possible to check in the projected relations whether all the given FD’s are satisfied.
3NF and BCNF, Continued
• We can get (1) with a BCNF decomposition.
• We can get both (1) and (2) with a 3NF decomposition.
• But we can’t always get both (1) and (2) with a BCNF decomposition.
  – street-city-zip is an example.
Multivalued Dependencies
Fourth Normal Form
Definition of MVD
• A multivalued dependency is an assertion that two attributes (sets of attributes) are independent of one another.
• Formally: A multivalued dependency (MVD) on R, X ->-> Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.
Example
Actors(name, addr, phones, cars) with MVD name ->-> phones.
If Actors has the two tuples:

name  addr  phones  cars
sue   a     p1      b1
sue   a     p2      b2

it must also have the same tuples with the phones components swapped:

name  addr  phones  cars
sue   a     p2      b1
sue   a     p1      b2

Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.
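The swap condition can be tested over all pairs of tuples, as the note requires. The following is a minimal sketch (the name `satisfies_mvd` is hypothetical) that checks an MVD X ->-> Y against a concrete set of rows, using the Actors tuples above.

```python
from itertools import product

def satisfies_mvd(rows, attrs, x, y):
    """Check X ->-> Y: for every pair agreeing on X, the swapped tuple exists."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y]
    rows = set(rows)
    for t, u in product(rows, repeat=2):
        if all(t[i] == u[i] for i in xi):
            # Take the Y components from t and everything else from u.
            swapped = tuple(t[i] if i in yi else u[i]
                            for i in range(len(attrs)))
            if swapped not in rows:
                return False
    return True

ATTRS = ("name", "addr", "phones", "cars")
two_rows = {("sue", "a", "p1", "b1"), ("sue", "a", "p2", "b2")}
four_rows = two_rows | {("sue", "a", "p2", "b1"), ("sue", "a", "p1", "b2")}

print(satisfies_mvd(two_rows, ATTRS, ["name"], ["phones"]))
print(satisfies_mvd(four_rows, ATTRS, ["name"], ["phones"]))
```

The first two tuples alone do not satisfy name ->-> phones; adding the two swapped tuples makes the MVD hold.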
Example 2
name       street         city       title               year
C. Fisher  123 Maple St.  Hollywood  Star Wars           1977
C. Fisher  5 Locust Ln.   Malibu     Star Wars           1977
C. Fisher  123 Maple St.  Hollywood  Empire              1980
C. Fisher  5 Locust Ln.   Malibu     Empire              1980
C. Fisher  123 Maple St.  Hollywood  Return of the Jedi  1983
C. Fisher  5 Locust Ln.   Malibu     Return of the Jedi  1983
• An actor may have more than one address
• Key? What normal form?
• Note the redundancies
• MVD: name ->-> street city
  – Read: name determines one or more (street, city) pairs independently of all other attributes
MVD Rules
1. Every FD is an MVD.
  – Because if X --> Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples.
  – Example, in Actors: name --> addr.
  – Note: the converse is not true, i.e., not every MVD is an FD.
2. Complementation: if X ->-> Y, then X ->-> Z, where Z is all attributes not in X or Y.
  – Example: since name ->-> phones holds in Actors, so does name ->-> addr cars.
Splitting Doesn’t Hold
• name ->-> street city holds, but
• name ->-> street does not hold
  – Name does not determine one or more streets independently of city.
• name ->-> city does not hold
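These three claims can be verified against the Example 2 rows. The sketch below is illustrative: `satisfies_mvd` is a hypothetical helper implementing the swap test from the MVD definition, and the rows are rebuilt as every address paired with every movie, which is exactly what the Example 2 table contains.

```python
from itertools import product

def satisfies_mvd(rows, attrs, x, y):
    """Check X ->-> Y: for every pair agreeing on X, the swapped tuple exists."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y]
    rows = set(rows)
    for t, u in product(rows, repeat=2):
        if all(t[i] == u[i] for i in xi):
            swapped = tuple(t[i] if i in yi else u[i]
                            for i in range(len(attrs)))
            if swapped not in rows:
                return False
    return True

ATTRS = ("name", "street", "city", "title", "year")
ADDRESSES = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
MOVIES = [("Star Wars", 1977), ("Empire", 1980), ("Return of the Jedi", 1983)]
ROWS = {("C. Fisher", s, c, t, y) for (s, c) in ADDRESSES for (t, y) in MOVIES}

print(satisfies_mvd(ROWS, ATTRS, ["name"], ["street", "city"]))
print(satisfies_mvd(ROWS, ATTRS, ["name"], ["street"]))
print(satisfies_mvd(ROWS, ATTRS, ["name"], ["city"]))
```

Swapping street alone would pair 123 Maple St. with Malibu, a tuple that is not in the relation, so the split MVDs fail even though the combined one holds.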
Example 2 (same relation as above)
• An actor may have more than one address
• MVD: name ->-> street city
  – Read: name determines one or more (street, city) pairs independently of all other attributes
• Also (complement MVD): name ->-> title year
Fourth Normal Form
• The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF.
• There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.
4NF
Eliminate redundancy due to the multiplicative effect of MVD’s.
• Roughly: treat MVD’s as FD’s for decomposition, but not for finding keys.
• Formally: R is in Fourth Normal Form if whenever MVD X ->-> Y is nontrivial (Y is not a subset of X, and X ∪ Y is not all attributes), then X is a superkey.
  – Remember, X --> Y implies X ->-> Y, so 4NF is more stringent than BCNF.
• Decompose R, using 4NF violation X ->-> Y, into XY and X ∪ (R - Y).
3Example
Drinkers(name, addr, phones, cars)• FD: name addr
• Nontrivial MVD’s: name phones
name cars.
• Only key: {name, phones, cars}
• All three dependencies above violate 4NF. Why?
• Successive decomposition yields 4NF relations:
D1(name, addr)
D2(name, phones)
D3(name, cars)
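The effect of the decomposition can be seen on a small instance. The rows below are hypothetical (one drinker with one address, two phones, and two cars); `project` is an illustrative helper. The sketch projects Drinkers onto D1, D2, and D3 and checks that joining them on name gives back the original rows.

```python
def project(rows, attrs, keep):
    """Project rows onto the columns named in keep, removing duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}

ATTRS = ("name", "addr", "phones", "cars")
# Hypothetical instance: every phone is paired with every car.
drinkers = {("sue", "a", p, c) for p in ("p1", "p2") for c in ("b1", "b2")}

d1 = project(drinkers, ATTRS, ("name", "addr"))
d2 = project(drinkers, ATTRS, ("name", "phones"))
d3 = project(drinkers, ATTRS, ("name", "cars"))

# Natural join of D1, D2, D3 on name.
rejoined = {(n, a, p, c)
            for (n, a) in d1
            for (n2, p) in d2
            for (n3, c) in d3
            if n == n2 == n3}

print(len(drinkers), len(d1), len(d2), len(d3))
print(rejoined == drinkers)
```

The original relation needs phones times cars rows per drinker, while the decomposed tables grow only with phones plus cars, and no information is lost.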
Example 2 (same relation as above)
MVD: name ->-> street city
Decompose into:
R1(name, street, city)
R2(name, title, year)