3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

47
Spring 2007 1 3 Chapter 3.6-7 Normalization of Database Tables

Transcript of 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Page 1: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 1

3

Chapter 3.6-7

Normalization of Database Tables

Page 2: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 2

3

Normalization• Normalization is process for assigning attributes to entities

Reduces data redundancies Helps eliminate data anomalies Produces controlled redundancies to link tables

• Normalization stages 1NF - First normal form 2NF - Second normal form 3NF - Third normal form 4NF - Fourth normal form

Page 3: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 3

3Example:

Stars Movies

Owns

Studios

Starts-in

title

length

year

address

Name

Name

address

Page 4: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 4

3

Problem Example

Title Year Length FilmType StudioName StarName StarPhone

Star Wars 1977 124 Color Fox Carrie Fisher 713-872-9282

Star Wars 1977 124 Color Fox Harrison Ford 832-999-2002

Star Wars 1977 124 Color Fox Mark hamill 512-472-0282

Fugitive 1995 135 Color Universal Harrison Ford 832-999-2002

Wayne’s World 1992 95 Color Paramount Dana Carvey 882-333-9999 Wayne’s World 1992 95 Color Paramount Mike Meyers 222-355-1111

Movies

• Update anomalies: If Harrison Ford’s phone # changes, must change it in each of his tuples. If Length value of Star Wars needs to be changed, must change all occurrences

•Deletion anomalies: If we delete Wayne’s World entries from database, we also loose all info about Dana Carvey & Mike Meyers

Page 5: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 5

3

Conversion to 1NF• Repeating groups must be eliminated

Proper primary key developed • Uniquely identifies each tuple

Dependencies can be identified• undesirable dependencies allowed

– Partial » based on part of composite primary key

– Transitive

» one nonprime attribute depends on another nonprime attribute

Page 6: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 6

3

Conversion to 1NF Cont.• An attribute that is at least part of a key is known

as a prime attribute or key attribute or primary key.

Page 7: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 7

3

Example• Projects assigned to employees

• Each project has a number and a name

• Each employee has a number, a name, a job class

• Each employee working on a project, need to keep number of hours spent on project, and hourly rate.

• Project Assignments Table :

( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)

• What’s the Key for this relation?

Page 8: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 8

3

Data Organization: 1NF

Page 9: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 9

3

Dependency Diagram (1NF)

PROJ_ NUM, EMP_NUM --> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

PROJ_NUM --> PROJ_NAME

EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS

JOB_CLASS --> CHG_HOUR

Page 10: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 10

3

1NF Summarized• All key attributes defined• Primary Key identified• No repeating groups in table• All attributes dependent on

primary key

Page 11: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 11

3

2NF Summarized• In 1NF, but• Includes no partial dependencies• Partial dependency:

An attribute is functionally dependent on a portion of the primary key.

• Example: PROJ_NUM PROJ_NAME EMP_NUM-->EMP_NAME, JOB_CLASS,

CHG_HOURS

Page 12: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 12

3Conversion to 2NF

1. Start with 1NF format:

2. Write each key component on a separate line

3. Write dependent attributes after each key component

4. Write original key on last line

5. Write any remaining attributes after original key

6. Each component is new table

PROJECT (PROJ_NUM, PROJ_NAME)EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)ASSIGN (PROJ_NUM, EMP_NUM, HOURS)

Page 13: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 13

3

2NF Conversion Results

Page 14: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 14

3

2NF Summarized• In 1NF• Includes no partial dependencies

• Still possible to exhibit transitive dependency Attributes may be functionally dependent on

non-key attributes

Page 15: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 15

3

Conversion to 3NF• decompose table(s) to eliminate transitive

functional dependencies

PROJECT (PROJ_NUM, PROJ_NAME)ASSIGN (PROJ_NUM, EMP_NUM, HOURS)EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)JOB (JOB_CLASS, CHG_HOUR)

Page 16: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 16

3

3NF Summarized• In 2NF• Contains no transitive dependencies

Page 17: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 18

3Boyce-Codd Normal Form

(BCNF)

• Formally, R is in BCNF if for every nontrivial FD for R, say X A, X is a superkey.

“Nontrivial” = right-side attribute not in left side. Trivial FDs examples

• AA• ABA

• Informally: the only arrows in the FD diagram are arrows out of superkeys Note: BCNF violation when relation has more than

one superkey that overlap

Page 18: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 19

3

3NF Table Not in BCNF

What normal form?

Page 19: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 20

3

Decomposition of Table Structure to Meet BCNF

Page 20: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 21

3Decomposition to Reach BCNF

Setting: relation R with FDs F.

Suppose relation R has BCNF violation X B and X not a superkey.

Page 21: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 22

31. Compute X+. Cannot be all attributes – why?

2. Decompose R into X+ and (R–X+) X.

3. Find the FD’s for the decomposed relations. Project the FD’s from F = calculate all

consequents of F that involve only attributes from X+ or only from (RX+) X.

R X+ X

Page 22: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 23

3

• Identify the violating FD

• E.g. : X1X2…Xn B1B2…Bm

• Add to the right-hand side of FD as many attributes as are functionally determined by (X1X2…Xn)

• Decompose relation R into two relations: One relation has all attributes Xs & Bs Second relation has the Xs plus any other

remaining attributes from R other than Bs

Decomposition to Reach BCNF

Page 23: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 24

3

BCNF--ExampleAssume R(S, J, T)

• S: Student

• J: subject

• T: Teacher

• Student S is taught subject J by teacher T.• Constraints:

For each subject, each student of that subject is taught by only one teacher

Each teacher teaches only one subject (but each subject is taught by several teachers)

Page 24: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 25

3

BCNF--Example

S J T

Smith Math Prof. White

Smith Physics Prof. Green

Jane Math Prof. White

Jane Physics Prof. Brown

Functional Dependencies:

•S , J T

•T J

Page 25: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 26

3

BCNF--Example• Candidate keys: {S, J} and {S, T}

• 3NF but not in BCNF• Update anomaly: if we delete the info that Jane is

studying Physics we also loose the info that Prof. Brown teaches Physics

• Solution: two relations R1{S, T} R2{T, J}

S

J

T

Page 26: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 27

3

Decomposition Based on BCNF is Necessarily Correct

Attributes A, B, C. FD: B C

Relations R1[A,B] R2[B, C]

Tuples in R: (a, b, c) Tuples in R1: (a, b)Tuples in R2: (b, c)

Natural join of R1 and R2 = (a, b, c) original relation R can be reconstructed by forming the natural join of R1 and R2.

Page 27: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 28

3

Decomposition Based on BCNF is Necessarily Correct

Attributes A, B, C. FD: B C

Relations R1[A,B] R2[B, C]

Tuples in R: (a, b, c) , (d, b, e) Tuples in R1: (a, b), (d, b)Tuples in R2: (b, c), (b, e)

Tuples in the natural join of R1 and R2: (a,b,c), (a,b, e) (d, b, c), (d, b, e)

Can (a,b,e), (d, b, c) be a bogus tuples?

Page 28: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 29

3Decomposition Based on BCNF is Necessarily Correct

• Answer: No • Because: B C i.e. if 2 tuples have same B

attribute then they must have the same C attribute. (b,c) = (b,e)

(a, b,e) = (a, b,c) and (d, b, c) = (d, b, e)

Page 29: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 30

3

Theorem• Any two-attribute relation is in BCNF.

Page 30: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 31

3

Decomposition Theorem• Suppose we decompose a relation R(X, Y, Z) into

R1(X, Y) and R2(X,Z) and project the R onto R1 and R2.

• Then join(R1, R2) is guaranteed to reconstruct R if and only if XY or XZ

• Notice that whenever we decompose because of a BNCF violation, one of the above FDs holds.

Page 31: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 32

33NF

One FD structure causes problems in BCNF:

• If you decompose, you can’t recover all of the original FD’s.

• If you don’t decompose, you violate BCNF.

Abstractly: AB C and C B.

• Example : street city zip, and zip city.

BCNF violation: C B has a left side that is not a superkey.

• Based on previous algorithm, decompose into BC and AC. But the FD AB C does not hold in new tables.

Page 32: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 33

3

ExampleA = street, B = city, C = zip.

street zip

1 Main St. 7700214000 Main St. 77005

city zip

Houston 77002Houston 77005

city street zip

Houston 1 Main St. 77002Houston 14000 Main St. 77005

zip city BCNF violation

street city zip

It is a bad idea to decompose relation because you loose the ability to check the dependency:

Decompose:

Page 33: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 34

3

Example

1 Main St. 7700214000 Main St. 770051 Main St. 33555

Houston 77002Houston 77005Boston 33555

Houston 1 Main St. 77002Houston 14000 Main St. 77005Boston 1 Main St. 33555

zip city

street city zip

It is a bad idea to decompose relation because you loose the ability to check the dependency:

Decompose:

Page 34: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 35

3

“Elegant” WorkaroundDefine the problem away.• A relation R is in 3NF iff (if and only if)

for every nontrivial FD X A, either:

1. X is a superkey, or

2. A is prime = member of at least one key.

• Thus, if we just normalize to the 3NF, the problem goes away.

Page 35: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 36

3What 3NF and BCNF Give You• There are two important properties of a

decomposition:

1. Recovery : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.

2. Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

Page 36: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 37

3

3NF and BCNF, Continued• We can get (1) with a BCNF decomposition.

• We can get both (1) and (2) with a 3NF decomposition.

• But we can’t always get (1) and (2) with a BCNF decomposition. street-city-zip is an example.

Page 37: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 38

3

Mutli-valued Dependencies

Fourth Normal Form

Page 38: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 39

3

Definition of MVD

• A multivalued dependency is an assertion that two attributes (sets of attributes) are independent of one another.

• Formally: A multivalued dependency (MVD) on R, X ->->Y , says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.

Page 39: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 40

3ExampleActors(name, addr, phones, cars) with MVD Name phones.

name addr phones cars sue a p1 b1

sue a p2 b2it must also have the same tuples with phones components

swapped: name addr phones cars

sue a p2 b1sue a p1 b2

Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.

Page 40: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 41

3Example 2name street city title yearC. Fisher 123 Maple St. Hollywood Star Wars 1977C. Fisher 5 Locust Ln. Malibu Star Wars 1977C. Fisher 123 Maple St. Hollywood Empire 1980C. Fisher 5 Locust Ln. Malibu Empire 1980C. Fisher 123 Maple St. Hollywood Return of the Jedi1983C. Fisher 5 Locust Ln. Malibu Return of the Jedi1983

•An actor may have more than one address

•Key? What normal form?

•Note the redundancies

•MVD: name street city

• read: name determines 1 or more street & city independent of all other attributes

Page 41: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 42

3

MVD Rules1. Every FD is an MVD.

Because if X Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples.

Example, in Actors: name addr.

• Note: the opposite is not true i.e. not every MVD is a FD

2. Complementation: if X Y, then X Z, where Z is all attributes not in X or Y. Example: since name phones holds in Actors,

the name addr cars.

Page 42: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 43

3

Splitting Doesn’t Hold• name street city holds, but

• name street does not hold Name does not determine 1 or more street independent

of city.

• name city does not hold

Page 43: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 44

3Example 2name street city title yearC. Fisher 123 Maple St. Hollywood Star Wars 1977C. Fisher 5 Locust Ln. Malibu Star Wars 1977C. Fisher 123 Maple St. Hollywood Empire 1980C. Fisher 5 Locust Ln. Malibu Empire 1980C. Fisher 123 Maple St. Hollywood Return of the Jedi1983C. Fisher 5 Locust Ln. Malibu Return of the Jedi1983

•An actor may have more than one address

•MVD: name street city

• read: name determines 1 or more street & city independent of all other attributes

•Also (complement MVD): name title year

Page 44: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 45

3

Fourth Normal Form• The redundancy that comes from MVD’s

is not removable by putting the database schema in BCNF.

• There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.

Page 45: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 46

34NF

Eliminate redundancy due to multiplicative effect of MVD’s.• Roughly: treat MVD’s as FD's for decomposition, but not for finding

keys.• Formally: R is in Fourth Normal Form if whenever MVD

X Y is nontrivial (Y is not a subset of X, and X Y is not all attributes), then X is a superkey. Remember, X Y implies X Y, so 4NF is more stringent

than BCNF.

• Decompose R, using4NF violation X Y,into XY and X (R—Y). R Y X

Page 46: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 47

3Example

Drinkers(name, addr, phones, cars)• FD: name addr

• Nontrivial MVD’s: name phones

name cars.

• Only key: {name, phones, cars}

• All three dependencies above violate 4NF. Why?

• Successive decomposition yields 4NF relations:

D1(name, addr)

D2(name, phones)

D3(name, cars)

Page 47: 3 Spring 20071 Chapter 3.6-7 Normalization of Database Tables.

Spring 2007 48

3

Example 2name street city title yearC. Fisher 123 Maple St. Hollywood Star Wars 1977C. Fisher 5 Locust Ln. Malibu Star Wars 1977C. Fisher 123 Maple St. Hollywood Empire 1980C. Fisher 5 Locust Ln. Malibu Empire 1980C. Fisher 123 Maple St. Hollywood Return of the Jedi1983C. Fisher 5 Locust Ln. Malibu Return of the Jedi1983

name street city

Decompose into:

R1(name, street, city)

R2(name, title, year)