Normalization 2003 319B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041...

Page 1: Normalization 2003 319B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041 steinbus@cs.waikato.ac.nz.

normalization 2003

319B Database Systems
Normal Forms

Wilhelm Steinbuss

Room G1.25, ext. 4041, steinbus@cs.waikato.ac.nz

Introduction

• First develop an ER model

• Map it into a (logical) relational database design

• Verify that the resulting design does not violate any of the normalization principles

1NF 2NF 3NF BCNF 4NF 5NF ..

Why Normalization?

Assume you had the following table (project) in your logical design:

Emp# Proj# Dept# Mgr# deptname percentage

There are many anomalies with this design:

Anomalies

• Insert anomaly: a new department cannot be recorded unless there is an employee in it

• Delete anomaly: the last employee of a department cannot be deleted; otherwise the information about the department disappears

• Update anomaly: the name of a department is repeated once for each employee, so renaming a department means changing every one of those rows
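
The delete and update anomalies can be made concrete with a few rows. This is a minimal sketch: the sample values and the helper `dept_names` are invented for illustration and are not part of the original slides.

```python
# Hypothetical sample rows for the project table (all values are invented).
project = [
    {"emp": 1, "proj": 10, "dept": "D1", "mgr": 7, "deptname": "Sales", "percentage": 60},
    {"emp": 1, "proj": 11, "dept": "D1", "mgr": 7, "deptname": "Sales", "percentage": 40},
    {"emp": 2, "proj": 10, "dept": "D1", "mgr": 7, "deptname": "Sales", "percentage": 100},
    {"emp": 3, "proj": 11, "dept": "D2", "mgr": 8, "deptname": "Research", "percentage": 100},
]


def dept_names(rows, dept):
    """Return every name stored for one department number."""
    return {r["deptname"] for r in rows if r["dept"] == dept}


# Delete anomaly: employee 3 is the last employee of D2, so deleting
# the employee also deletes everything we know about D2.
without_emp3 = [r for r in project if r["emp"] != 3]
lost = dept_names(without_emp3, "D2")

# Update anomaly: renaming D1 in only one of its rows leaves two names behind.
project[0]["deptname"] = "Marketing"
inconsistent = dept_names(project, "D1")
```

After the partial update, `inconsistent` holds two different names for the same department, which is exactly the inconsistency the redundant storage allows.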

1NF

A relational variable is in 1NF if and only if, in every legal value of that relational variable, every tuple contains exactly one value for each attribute.

(A relational variable with strict typing is always in 1NF.)

1NF (cont.)

Example (a relational variable not in 1NF): person

p#   name    ...   language_skills
1    McGee   ...   French, Dutch, English
:    :       ...   :
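
One common way to bring such a table into 1NF is to store each language in its own tuple. The following is a small sketch under that assumption; the function name `to_1nf` and the target attribute `language` are made up for this example.

```python
# The non-1NF tuple from the slide: language_skills holds several values at once.
person = [{"p#": 1, "name": "McGee", "language_skills": "French,Dutch,English"}]


def to_1nf(rows):
    """Split the multi-valued attribute into one (p#, language) tuple per value."""
    return [
        {"p#": row["p#"], "language": language.strip()}
        for row in rows
        for language in row["language_skills"].split(",")
    ]


skills = to_1nf(person)
```

Each resulting tuple now carries exactly one value per attribute; the remaining person attributes would stay in a separate table keyed by `p#`.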

2NF

Example: (project is in 1NF, but with anomalies)

emp# proj# dept# dept_name mgr# percentage

2NF (cont.)

A relational variable is in 2NF if and only if it is in 1NF and every nonkey attribute depends on the whole key (and not just on a proper subset of it).

Example project

emp# proj# percentage

emp# mgr# dept# dept_name

Normalization step

Let Z be a key for R{A1,..,An}; if X → Y, X a proper subset of Z and Y ∩ Z = {}, then R can be losslessly decomposed into R1, R2:

R1{X ∪ Y} and R2{{A1,...,An} − Y}

If R1, R2 are not in 2NF, repeat the step.
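
The step can be tried out on a small relation value. This is only a sketch: the helpers `relation`, `project` and `natural_join`, the sample rows, and the FD {emp#} → {dept#, dept_name, mgr#} are assumptions made for the illustration (the FD is suggested by the decomposition shown on the slides, but is not stated there).

```python
def relation(attrs, rows):
    """Build a relation value: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


ATTRS = ("emp#", "proj#", "dept#", "dept_name", "mgr#", "percentage")
# Invented sample data with key Z = {emp#, proj#}.
R = relation(ATTRS, [
    (1, 10, "D1", "Sales", 7, 60),
    (1, 11, "D1", "Sales", 7, 40),
    (2, 10, "D2", "Research", 8, 100),
])

# Assumed FD X -> Y with X a proper subset of the key and Y disjoint from the key.
X, Y = {"emp#"}, {"dept#", "dept_name", "mgr#"}
R1 = project(R, X | Y)            # R1{X ∪ Y}
R2 = project(R, set(ATTRS) - Y)   # R2{{A1,...,An} − Y}
lossless = natural_join(R1, R2) == R
```

For this sample, joining the two projections gives back exactly the original tuples, and the department data of each employee is stored only once in `R1`.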

Lossless decomposition

Theorem 1: Let X, Y, Z be sets of attributes for R and S a set of FDs; then

R = R{X ∪ Y} ⋈ R{X ∪ Z} ⇔ X → Y ∈ S+ or X → Z ∈ S+

Proof: "⇐" Let (x,y,z) be a shorthand for {X:x, Y:y, Z:z}. We first show that R ⊆ R{X ∪ Y} ⋈ R{X ∪ Z}.

Let (x,y,z) ∈ R; then (x,y) ∈ R{X ∪ Y} and (x,z) ∈ R{X ∪ Z}, and so (x,y,z) ∈ R{X ∪ Y} ⋈ R{X ∪ Z}. Next we show R ⊇ R{X ∪ Y} ⋈ R{X ∪ Z}. Let (x,y,z) be an element of the right-hand side; in order to generate this element, (x,y) ∈ R{X ∪ Y} and (x,z) ∈ R{X ∪ Z}, and therefore

Lossless decomposition (cont.)

(x,y',z) ∈ R for some y' in order to generate (x,z) ∈ R{X ∪ Z}; therefore (x,y') and (x,y) ∈ R{X ∪ Y} and y' = y because X → Y; therefore (x,y,z) ∈ R.

"⇒" Let us assume that neither X → Y nor X → Z is valid. So at least an A ∈ Y and a B ∈ Z exist with neither X → {A} nor X → {B}; so A, B ∉ X+ (Lemma 2.3 FD). Now we choose r = (x,y1,z1) and s = (x,y2,z2) as in Lemma 2.4 FD; now r|X = s|X, but they differ at least at the position for A (within the Y attributes), so r|Y = y1 ≠ y2 = s|Y (the same for Z).

(x,y1,z2) ∈ R{X ∪ Y} ⋈ R{X ∪ Z}, but (x,y1,z2) ∉ R
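
The counterexample from the proof can be replayed directly. The sketch below assumes a simplified representation in which a relation value is a set of (x, y, z) triples; the function name `join_of_projections` is invented for the example.

```python
def join_of_projections(rel):
    """Compute R{X ∪ Y} ⋈ R{X ∪ Z} for a set of (x, y, z) triples."""
    rxy = {(x, y) for x, y, _ in rel}
    rxz = {(x, z) for x, _, z in rel}
    return {(x, y, z) for x, y in rxy for x2, z in rxz if x == x2}


# r and s from the proof: same x, different y and z, so neither X -> Y nor X -> Z holds.
lossy = {("x", "y1", "z1"), ("x", "y2", "z2")}
spurious = join_of_projections(lossy) - lossy

# Here X -> Y holds (only one y per x), so the decomposition is lossless.
ok = {("x", "y1", "z1"), ("x", "y1", "z2")}
```

The set `spurious` contains the tuple (x, y1, z2) named in the proof, which the join produces although it was never in R.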

3NF

Example: the first relational variable (EMP) in the 2NF decomposition still has anomalies:

emp# mgr# dept# dept_name

3NF (cont.)

A relational variable is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
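
A transitive dependency can be found mechanically with the attribute closure X+. The sketch below makes two assumptions that are not stated on the slides: the FDs {emp#} → {mgr#, dept#} and {dept#} → {dept_name} for EMP, and a single candidate key {emp#}. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of FDs given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


EMP = {"emp#", "mgr#", "dept#", "dept_name"}
KEY = {"emp#"}
# Assumed FDs, chosen to match the decomposition shown on the slides.
FDS = [
    (frozenset({"emp#"}), frozenset({"mgr#", "dept#"})),
    (frozenset({"dept#"}), frozenset({"dept_name"})),
]


def transitive_violations(attrs, key, fds):
    """List FDs X -> A where X is not a superkey and A is a non-key attribute outside X."""
    bad = []
    for lhs, rhs in fds:
        if not closure(lhs, fds) >= attrs:      # lhs is not a superkey
            for a in sorted(rhs - lhs - key):
                bad.append((sorted(lhs), a))
    return bad


violations = transitive_violations(EMP, KEY, FDS)
```

Under these assumptions the only violation reported is dept# determining dept_name, i.e. dept_name depends on emp# only via dept#.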

Example project

emp# proj# percentage

emp# mgr# dept#

dept# deptname

Boyce-Codd Normal Form (BCNF)

So far we have focused on FDs X → Y with:

X key and Y non-key attributes

or

X and Y non-key attributes; but what about:

X non-key attributes and Y key?

Example

Example: a course relational variable with FDs:

{stud#, course#} → {teacher#}

{teacher#} → {course#}

student# course# teacher#

Example (cont.)

course is in 3NF with key {stud#, course#} (why?), but has anomalies: e.g. if we delete the last tuple for a student in course A taught by teacher B, we lose the information that B teaches A. The reason is: {teacher#} → {course#} and {teacher#} isn't a (super)key.

Example (cont.)

The situation is:

1. There are two (or more) candidate keys

2. The candidate keys are composite

3. They overlap (i.e. have at least one attribute in common)

(What is the second candidate key?)

BCNF

A relational variable is in BCNF if and only if, whenever X → A holds and A is not in X, X is a superkey.
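
The definition translates into a short check: compute X+ for each given FD and see whether it covers all attributes. This is a sketch using the two FDs of the course example; the function names are invented, and testing only the given FDs is sufficient for the full relational variable but not for projections of it.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of FDs given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


COURSE = {"stud#", "course#", "teacher#"}
FDS = [
    (frozenset({"stud#", "course#"}), frozenset({"teacher#"})),
    (frozenset({"teacher#"}), frozenset({"course#"})),
]


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs X -> Y whose left-hand side X is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if rhs - lhs and not closure(lhs, fds) >= attrs
    ]


violations = bcnf_violations(COURSE, FDS)
```

For the course example, the only FD reported is {teacher#} → {course#}, the one named on the slide as the cause of the anomaly.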

BCNF (cont.)

More informally: each attribute must represent a fact about the entity identified by the key, the whole key and nothing but the key.

Or:

If we assign the attributes in an ER diagram to the suitable entity types, then the resulting relational variables are in BCNF.

Example course

teacher# course#

student# teacher#

What is the key?

Normalization Step

Let R{A1,..,An}; if X → Y (X, Y ⊆ {A1,..,An}, X ∩ Y = {}) and X is not a superkey, then R can be losslessly decomposed into R1, R2:

R1{X ∪ Y} and R2{{A1,...,An} − Y}

If R1, R2 are not in BCNF, repeat the step.
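
Applied to the course example with X = {teacher#} and Y = {course#}, one step looks as follows. This is a sketch: `decompose_step` is a hypothetical helper, and the sample tuples are invented so that both FDs of the example hold.

```python
def decompose_step(attrs, x, y):
    """One normalization step: return the headings R1{X ∪ Y} and R2{attrs − Y}."""
    y = set(y) - set(x)          # keep Y disjoint from X
    return set(x) | y, set(attrs) - y


r1_heading, r2_heading = decompose_step(
    {"stud#", "course#", "teacher#"}, {"teacher#"}, {"course#"}
)

# Invented sample value, written as (stud#, course#, teacher#) triples.
course = {("s1", "c1", "t1"), ("s2", "c1", "t1"), ("s1", "c2", "t2")}
teaches = {(t, c) for _, c, t in course}      # R1{teacher#, course#}
attends = {(s, t) for s, _, t in course}      # R2{stud#, teacher#}
rejoined = {(s, c, t) for s, t in attends for t2, c in teaches if t == t2}
```

The two headings match the decomposition shown on the "Example course" slide, and for this sample the join of the two projections returns the original tuples.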

Exercise bookings

The relational variable bookings:

title: the name of a movie

theater: the name of a theater where the movie is being shown

city: the city where the theater is located

with FDs

{theater} → {city}

{title, city} → {theater} (only for the sake of the example)

Find the two candidate keys (prove that they are keys!) and decompose bookings into relational variables which are in BCNF.

Exercise events

The relational variable events:

event_type: type of the event (e.g. sport)

date: date of the event

event#: the number of a specific event of that type

with FDs

{event_type, date} → {event#} (for each event_type only one event of this type per day)

{event#} → {event_type}

With the (candidate) key {event_type, date}, events is not in BCNF; decompose it into relational variables which are in BCNF.

summary

In BCNF the only (interesting) determinants

are the (candidate) keys; together with

Theorem 1 that is the end of the normalization

process depending on FDs (because there are

no more interesting lossless decompositions)

4NF

Suppose we choose an associative entity type instead of two m:n relationships:

(ER diagrams omitted)

Example article

article_name       colour   size
T-shirt sunshine   green    M
T-shirt sunshine   red      M
T-shirt sunshine   green    L
T-shirt sunshine   red      L
T-shirt sunshine   green    S
T-shirt sunshine   red      S

Example article (cont.)

If the article_name and an arbitrarily chosen value for size are known, then the set of valid values for colour is known (e.g. given 'T-shirt sunshine' with size = 'M', then colour = {'green', 'red'}; the same is true for size = 'S' and size = 'L').

Multivalued Dependency

Let X, Y and Z be a decomposition of the attributes of a relational variable R{X ∪ Y ∪ Z} and R a relational value for R{X ∪ Y ∪ Z}. Let Yxz := {y : (x,y,z) ∈ R}.

X →→ Y (i.e. X multidetermines Y)

if and only if

Yxz = Yxz* for each z, z* whenever Yxz ≠ {} and Yxz* ≠ {}

Note: X → Y is a special case of X →→ Y where Yxz contains exactly one element.
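
The definition can be checked on a relation value by collecting the sets Yxz and comparing them for each x. This sketch assumes relation values written as sets of (x, y, z) triples, here (article_name, colour, size) from the article example; the function name `multidetermines` is invented.

```python
from collections import defaultdict


def multidetermines(rel):
    """Check X ->> Y on a set of (x, y, z) triples using the Yxz definition."""
    y_sets = defaultdict(set)                 # (x, z) -> Yxz, non-empty by construction
    for x, y, z in rel:
        y_sets[(x, z)].add(y)
    by_x = defaultdict(list)                  # x -> all Yxz belonging to that x
    for (x, _z), ys in y_sets.items():
        by_x[x].append(ys)
    # X ->> Y holds if, for every x, all non-empty Yxz are the same set.
    return all(ys == group[0] for group in by_x.values() for ys in group)


article = {
    ("T-shirt sunshine", "green", "M"), ("T-shirt sunshine", "red", "M"),
    ("T-shirt sunshine", "green", "L"), ("T-shirt sunshine", "red", "L"),
    ("T-shirt sunshine", "green", "S"), ("T-shirt sunshine", "red", "S"),
}
holds = multidetermines(article)

# Removing one tuple makes the colour set for size S differ from the others.
broken = article - {("T-shirt sunshine", "red", "S")}
holds_broken = multidetermines(broken)
```

For the article table every size has the same colour set, so article_name →→ colour holds; in the reduced table it does not.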

4NF

A relational variable is in 4NF if and only if X is a superkey for every nontrivial X →→ Y.

Note: Because each FD is a multivalued dependency, this also implies BCNF.

complementary rule

Theorem 2: X →→ Y ⇔ X →→ Z

Conclusion from:

Lemma 3: X →→ Y ⇔ (if (x,y,z) ∈ R and (x,y*,z*) ∈ R, then (x,y*,z) ∈ R and (x,y,z*) ∈ R)

"⇒" Let (x,y,z) ∈ R and Yxz* ≠ {}; then (x,y,z*) ∈ R because Yxz = Yxz* by definition of X →→ Y. Starting with (x,y*,z*), we get (x,y*,z) ∈ R.

Lemma 3 (cont.)

"⇐" Let y* ∈ Yxz*, i.e. (x,y*,z*) ∈ R, and by prerequisite (x,y*,z) ∈ R ⇒ y* ∈ Yxz,

i.e. Yxz* ⊆ Yxz

Starting with y ∈ Yxz, i.e. (x,y,z) ∈ R, and by prerequisite (x,y,z*) ∈ R ⇒ y ∈ Yxz*,

i.e. Yxz ⊆ Yxz*

⇒ Yxz = Yxz* ⇒ X →→ Y by definition
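
The right-hand side of Lemma 3 is a "swap" property that is easy to test directly. A minimal sketch, again assuming relation values written as sets of (x, y, z) triples; the name `swap_closed` is made up for this example.

```python
def swap_closed(rel):
    """Lemma 3 condition: for tuples sharing x, swapping their y values stays inside rel."""
    return all(
        (x, y2, z1) in rel and (x, y1, z2) in rel
        for x, y1, z1 in rel
        for x2, y2, z2 in rel
        if x == x2
    )


article = {
    ("T-shirt sunshine", "green", "M"), ("T-shirt sunshine", "red", "M"),
    ("T-shirt sunshine", "green", "L"), ("T-shirt sunshine", "red", "L"),
    ("T-shirt sunshine", "green", "S"), ("T-shirt sunshine", "red", "S"),
}
closed = swap_closed(article)

# Without (red, S) the swap of (red, M) and (green, S) leaves the relation.
closed_broken = swap_closed(article - {("T-shirt sunshine", "red", "S")})
```

By Lemma 3, the result agrees with checking the multivalued dependency via the sets Yxz.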

Decomposition

Theorem 4: Let X, Y and Z be a decomposition of the attributes of a relational variable R{X ∪ Y ∪ Z}. Then

R = R{X ∪ Y} ⋈ R{X ∪ Z} ⇔ X →→ Y

"⇒" Let (x,y,z), (x,y*,z*) ∈ R; there is a representation (x,y,z) = (x,y) ⋈ (x,z) and (x,y*,z*) = (x,y*) ⋈ (x,z*); but then also (x,y,z*) = (x,y) ⋈ (x,z*) ∈ R and (x,y*,z) = (x,y*) ⋈ (x,z) ∈ R ⇒ X →→ Y by Lemma 3

Decomposition (cont.)

"⇐" For R ⊆ R{X ∪ Y} ⋈ R{X ∪ Z} see the proof of Theorem 1; we have to show "⊇":

Let t ∈ R{X ∪ Y} ⋈ R{X ∪ Z}; then there are t1, t2 ∈ R with t = t1|X∪Y ⋈ t2|X∪Z, with t1 = (x,y,z) and t2 = (x,y*,z*);

then t = (x,y,z*) or t = (x,y*,z) ⇒ t ∈ R by Lemma 3 and X →→ Y

Normalization Step

Let X, Y and Z be a decomposition of the attributes of a relational variable R{X ∪ Y ∪ Z} and X →→ Y. Then R{X ∪ Y ∪ Z} can be losslessly decomposed:

R = R{X ∪ Y} ⋈ R{X ∪ Z}

If R{X ∪ Y}, R{X ∪ Z} are not in 4NF, repeat the step.
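
For the article table, the step with X = {article_name}, Y = {colour} and Z = {size} can be sketched as follows. The tuple representation (article_name, colour, size) and the variable names are assumptions made for the illustration.

```python
article = {
    ("T-shirt sunshine", "green", "M"), ("T-shirt sunshine", "red", "M"),
    ("T-shirt sunshine", "green", "L"), ("T-shirt sunshine", "red", "L"),
    ("T-shirt sunshine", "green", "S"), ("T-shirt sunshine", "red", "S"),
}

# R{X ∪ Y}: which colours an article comes in.
colours = {(name, colour) for name, colour, _ in article}
# R{X ∪ Z}: which sizes an article comes in.
sizes = {(name, size) for name, _, size in article}

# Joining the two projections on article_name rebuilds the original value.
rejoined = {
    (name, colour, size)
    for name, colour in colours
    for name2, size in sizes
    if name == name2
}
```

The two projections hold 2 and 3 tuples instead of the 6 in the original table, and their join is equal to `article`.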

summary

In our example we get the two (original) m:n relationships; so an unnecessarily designed n-ary relationship results in a relational variable which violates 4NF.

4NF marks the end of lossless decomposition into two relational variables.