CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition)


Korra Sathya Babu

Department of Computer Science

NIT Rourkela

Recap

• Design of a DB is needed to reduce redundancy and anomalies
• The theory of functional dependencies has been studied in full
• Better design requires schema refinement
• A solution for schema refinement is synthesis of relations

Relation Decomposition

[Figure: the attributes of R split into the regions R − X⁺, X, and X⁺ − X, with the two sub-relations R1 and R2 drawn over these regions]

Relation Decomposition

• Reason for decomposition
  • A solution for reducing redundancy and anomalies
• Rules for synthesis
  • Lossless Join (Information Preservation)
  • Dependency Preservation (a special case of information preservation)
• Decomposition (synthesis) types
  • By functional dependency
  • By multi-valued dependency
  • By join dependency

Lossless Join

• Definition
  A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:

      ⋈(π_R1(r), ..., π_Rm(r)) = r

  where ⋈ is the natural join of all the relations in D and π_Ri(r) is the projection of r onto the attributes of Ri.

• The word loss in lossless refers to loss of information, not to loss of tuples.

Test for Lossless Join

Input: A relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies

Lossless Join Test Algorithm:

Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.

Step 2: Set S(i, j) := bij for all matrix entries.

Step 3: For each row i representing relation schema Ri do
          {for each column j representing attribute Aj do
             {if relation Ri includes attribute Aj then set S(i, j) := aj;}}

Step 4: Repeat the following loop until a complete loop execution results in no changes to S:
          {for each functional dependency X → Y in F do
             {for all rows in S which have the same symbols in the columns corresponding to attributes in X do
                {make the symbols in each column that corresponds to an attribute in Y be the same in all these rows as follows: if any of the rows has an “a” symbol for the column, set the other rows to that same “a” symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;}}}

Step 5: If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not.
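The matrix test can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `lossless_join` and the tuple encoding of the “a” and “b” symbols are assumptions made here. It also renames a “b” symbol everywhere in its column (a slightly stronger reading of step 4), and it reads the decomposition of Example 2 as R1(ENAME, PLOCATION), R2(SSN, PNUM, hours, PNAME, PLOCATION), which is what that example's matrix suggests.

```python
def lossless_join(attributes, decomposition, fds):
    """Matrix test for the lossless join property.

    attributes    -- list of attribute names of R
    decomposition -- list of attribute sets, one per Ri
    fds           -- list of (X, Y) pairs of attribute sets, meaning X -> Y
    """
    # Steps 1-3: an entry is ("a",) if Ri contains the attribute, else ("b", i).
    # Symbols are only compared within one column, so the column index is implicit.
    S = [
        {A: ("a",) if A in Ri else ("b", i) for A in attributes}
        for i, Ri in enumerate(decomposition)
    ]
    # Step 4: apply the FDs until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            # Group the rows that agree on every attribute of X.
            groups = {}
            for row in S:
                key = tuple(row[A] for A in sorted(X))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for A in Y:
                    symbols = {row[A] for row in rows}
                    if len(symbols) > 1:
                        # ("a",) sorts before every ("b", i), so "a" wins if present.
                        target = min(symbols)
                        # Assumption: rename column-wide, not only inside the group.
                        for row in S:
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Step 5: lossless iff some row consists entirely of "a" symbols.
    return any(all(row[A] == ("a",) for A in attributes) for row in S)


attrs = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "hours"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUM"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUM"}, {"hours"})]
example1 = [{"SSN", "ENAME"},
            {"PNUM", "PNAME", "PLOCATION"},
            {"SSN", "PNUM", "hours"}]
example2 = [{"ENAME", "PLOCATION"},
            {"SSN", "PNUM", "hours", "PNAME", "PLOCATION"}]

print(lossless_join(attrs, example1, F))
print(lossless_join(attrs, example2, F))
```

Under these assumptions the first call reports a lossless decomposition and the second a lossy one.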

Example 1

Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

Decomposition:
R1(SSN, ENAME)
R2(PNUM, PNAME, PLOCATION)
R3(SSN, PNUM, hours)

Example 1

Initial matrix S (steps 1 and 2):

      A1    A2     A3    A4     A5         A6
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   b12    b13   b14    b15        b16
R2    b21   b22    b23   b24    b25        b26
R3    b31   b32    b33   b34    b35        b36

Matrix S after step 3:

      A1    A2     A3    A4     A5         A6
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    b32    a3    b34    b35        a6

Example 1

Matrix S after applying SSN → ENAME:

      A1    A2     A3    A4     A5         A6
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    a2     a3    b34    b35        a6

Matrix S after applying PNUM → {PNAME, PLOCATION}:

      A1    A2     A3    A4     A5         A6
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    a2     a3    a4     a5         a6

Row R3 now consists entirely of “a” symbols, so the decomposition is lossless.

Example 2

Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

Decomposition:
R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)

Example 2

Initial matrix S (steps 1 and 2):

      A1    A2     A3    A4     A5         A6
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   b12    b13   b14    b15        b16
R2    b21   b22    b23   b24    b25        b26

Matrix S after step 3:

      A1    A2     A3    A4     A5         A6
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   a2     b13   b14    a5         b16
R2    a1    b22    a3    a4     a5         a6

Applying SSN → ENAME, PNUM → {PNAME, PLOCATION} and {SSN, PNUM} → hours changes nothing, because the two rows never agree on the left-hand side of any FD. No row consists entirely of “a” symbols, so the decomposition is lossy.

Problems

• Check whether the following decompositions are lossy or lossless
  • Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE.
    Let F = {A → C, B → C, C → D, DE → C, CE → A}
  • R(XYZWQ), FD = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X}.
    R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
  • R(XYZ), F = {X → Y, Z → Y}. R1(XY), R2(YZ)
  • R(XYWZPQ), D = {R1(ZPQ), R2(XYZPQ)}
    F = {XY → W, XW → P, PQ → Z, XY → Q}

Dependency Preservation

R is decomposed (by normalisation) into R1, …, Rn

S: the set of FDs for R
S1, …, Sn: the sets of FDs for R1, …, Rn (each Si refers only to the attributes of Ri)

S’ = S1 ∪ … ∪ Sn (usually, S’ ≠ S)

The decomposition is dependency preserving if S’⁺ = S⁺

Test for Dependency Preservation

Input: a decomposition D = {R1, …, Rk} and a set of FDs F

Dependency Preservation Test:

Step 1: For each X → Y ∈ F, initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration), i.e. set T = X, and continue with step 2.

Step 2: Repeat step 3 until the set T no longer changes. When T no longer changes, continue with step 4.

Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition, apply the corresponding Ri operation (on the set of attributes T with respect to the set of dependencies F), i.e. T = T ∪ ((T ∩ Ri)⁺ ∩ Ri).

Step 4: Test whether Y (the right-hand side of the FD under consideration) satisfies Y ⊆ T. If Y is not a subset of T, stop and report that the decomposition does not preserve the FD. If Y ⊆ T, then X → Y ∈ G⁺, where G is the union of the projections of F onto the relations of D. If other FDs in F remain to be considered, repeat from step 1 with an FD that has not been considered before. If no FDs remain, report that the decomposition preserves the dependencies in F.
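A minimal Python sketch of this test is given below. The names `fd`, `closure` and `lost_dependencies` are illustrative choices, and the relation R(ABC) with F = {A → B, B → C} is a made-up example rather than one from the lecture. The sketch uses the update T = T ∪ ((T ∩ Ri)⁺ ∩ Ri), and instead of stopping at the first failure it collects every FD that is not preserved.

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(decomposition, fds):
    """Return the FDs of F that the decomposition does not preserve."""
    lost = []
    for lhs, rhs in fds:
        T = set(lhs)                      # step 1: T = X
        changed = True
        while changed:                    # step 2: repeat until T is stable
            changed = False
            for Ri in decomposition:      # step 3: grow T through each Ri
                gained = closure(T & Ri, fds) & Ri
                if not gained <= T:
                    T |= gained
                    changed = True
        if not rhs <= T:                  # step 4: is Y contained in T?
            lost.append((lhs, rhs))
    return lost


# Hypothetical example: R(ABC) with F = {A -> B, B -> C}
F = [fd("A", "B"), fd("B", "C")]
print(lost_dependencies([set("AB"), set("BC")], F))
print(lost_dependencies([set("AB"), set("AC")], F))
```

In this example the decomposition {AB, BC} preserves both FDs, while {AB, AC} loses B → C because no single relation lets B reach C.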

Problems

1. Given R(XYZ) and the set F = {Z → X, XY → Z}. Check if the decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A → B, C → D}. Check if the decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine if the decomposition D = {R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F = {X → Y, Y → Z, Z → W, W → X}.
4. Given R(ABCDEF) and the set F = {A → B, C → DF, AC → E, D → F}. Check if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves the set F.

Normalization

• Normalization is the process of successive reduction of a given set of relations to a better form (reduced redundancy and fewer anomalies)
• The level of normalization to aim for depends on the workload (a tradeoff between fast access and maintenance of integrity)
• Assumes that all possible functional dependencies are known
  • First construct a minimal set of FDs
  • Then apply algorithms that construct the required normal form
• Additional criteria may be needed to ensure that the set of relations in a relational database is satisfactory

1 NF

• A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns
• Converting to 1NF is the process of converting complex data structures into simpler, stable data structures
• A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
• First Normal Form (1NF)
  • Unique rows
  • All attributes are atomic

2 NF

• A table is in the second normal form (2NF) if it is in the first normal form and all non-key columns in the table depend on the entire primary key
• The following relation is in 1NF but not 2NF

EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)

Functional dependencies:
1. Emp_ID → Name, Dept, Salary   (partial key dependency)
2. Emp_ID, Course → Date_Completed

Decompose into 2NF

EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependencies: Emp_ID → Name, Dept, Salary

EMPCOURSE(Emp_ID, Course, Date_Completed)
Functional dependency: Emp_ID, Course → Date_Completed
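The partial key dependency in EMPLOYEE2 can be found mechanically. The sketch below is only an illustration: the helper names `closure`, `candidate_keys` and `partial_dependencies` are assumptions made here, and candidate keys are found by brute force, which is fine for a handful of attributes but not for large schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            is_minimal = not any(k <= s for k in keys)
            if is_minimal and closure(s, fds) == set(attributes):
                keys.append(frozenset(s))
    return keys


def partial_dependencies(attributes, fds):
    """Non-prime attributes determined by a proper subset of a candidate key."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(set(part), fds) - set(part) - prime
                if dependents:
                    found.append((set(part), dependents))
    return found


employee2 = {"Emp_ID", "Name", "Dept", "Salary", "Course", "Date_Completed"}
F = [({"Emp_ID"}, {"Name", "Dept", "Salary"}),
     ({"Emp_ID", "Course"}, {"Date_Completed"})]
print(candidate_keys(employee2, F))
print(partial_dependencies(employee2, F))
```

For EMPLOYEE2 the only candidate key is (Emp_ID, Course), and the sketch reports that Emp_ID alone determines Name, Dept and Salary, which is the partial dependency removed by the decomposition.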

3 NF

• A table is in the third normal form (3NF) if it is in the second normal form and all non-key columns in the table depend non-transitively on the entire primary key

SALES(Customer_ID, Customer_Name, SalesPerson, Region)

Functional dependencies:
1. Customer_ID → Customer_Name, SalesPerson, Region
2. SalesPerson → Region   (transitive dependency: Customer_ID → SalesPerson → Region)

Decompose into 3NF

SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependencies: Customer_ID → Customer_Name, SalesPerson

SPERSON(SalesPerson, Region)
Functional dependency: SalesPerson → Region
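A 3NF check can be sketched along the same lines. The code below uses the common textbook formulation (for every FD X → A, either X is a superkey or A is a prime attribute), which is an assumption about how to operationalise the slide's wording. The helper names are hypothetical and the key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(frozenset(s))
    return keys


def third_nf_violations(attributes, fds):
    """FDs X -> A where X is not a superkey and A is not a prime attribute."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(attributes)
        for A in sorted(rhs - lhs):
            if not is_superkey and A not in prime:
                violations.append((set(lhs), A))
    return violations


sales = {"Customer_ID", "Customer_Name", "SalesPerson", "Region"}
F = [({"Customer_ID"}, {"Customer_Name", "SalesPerson", "Region"}),
     ({"SalesPerson"}, {"Region"})]
print(third_nf_violations(sales, F))
```

For SALES the only violation reported is SalesPerson → Region, while the two decomposed tables produce no violations.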

BCNF

• A table is in Boyce-Codd normal form (BCNF) if every column (or set of columns) on which some other column is fully functionally dependent is also a candidate for the primary key of the table
• A table is in BCNF if the only determinants in the table are the candidate keys

SCHOOL(Student, Subject, Teacher)

Functional dependencies:
1. Student, Subject → Teacher
2. Student, Teacher → Subject
3. Teacher → Subject

Decompose into BCNF

SCHOOL1(Student, Teacher)
SCHOOL2(Teacher, Subject)

Splitting on the determinant Teacher keeps the decomposition lossless. Of the listed functional dependencies, only Teacher → Subject (which also implies FD 2) can still be enforced within a single table; Student, Subject → Teacher is not preserved.
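The "every determinant is a candidate key" condition is straightforward to test with attribute closures. The following is a small sketch with hypothetical helper names `closure` and `bcnf_violations`; it checks only the FDs that are listed, which is sufficient for deciding whether the relation as a whole is in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a superkey."""
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


school = {"Student", "Subject", "Teacher"}
F = [({"Student", "Subject"}, {"Teacher"}),
     ({"Student", "Teacher"}, {"Subject"}),
     ({"Teacher"}, {"Subject"})]
print(bcnf_violations(school, F))
```

For SCHOOL the first two determinants are superkeys, so the only violation reported is Teacher → Subject.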

Comparison between 3NF and BCNF

• It is always possible to decompose a relation into relations in 3NF such that:
  • the decomposition is lossless
  • the dependencies are preserved
• It is always possible to decompose a relation into relations in BCNF such that:
  • the decomposition is lossless
  • but it may not be possible to preserve dependencies
  • BCNF may, however, eliminate more redundancy

Multivalued Dependency

Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency

    α ↠ β

holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

    t1[α] = t2[α] = t3[α] = t4[α]
    t3[β] = t1[β]
    t3[R − β] = t2[R − β]
    t4[β] = t2[β]
    t4[R − β] = t1[R − β]

• An MVD is a tuple-generating dependency
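The definition can be checked directly on a concrete relation instance. The sketch below is an illustration only: the function name `satisfies_mvd` is an assumption, and the data is the Student table used in the 4NF slides of this lecture. Because the loop visits every ordered pair (t1, t2), the tuple t4 of the definition is covered when the roles of t1 and t2 are swapped.

```python
def satisfies_mvd(rows, schema, alpha, beta):
    """Check whether alpha ->> beta holds in this one relation instance.

    rows   -- set of tuples
    schema -- tuple of attribute names, in tuple order
    """
    idx = {A: i for i, A in enumerate(schema)}
    for t1 in rows:
        for t2 in rows:
            if any(t1[idx[A]] != t2[idx[A]] for A in alpha):
                continue
            # Required t3: agrees with t1 on beta and with t2 on R - beta.
            t3 = tuple(t1[idx[A]] if A in beta else t2[idx[A]] for A in schema)
            if t3 not in rows:
                return False
    return True


schema = ("Student", "Subject", "Language")
student = {
    ("Geeta", "Mythology", "English"),
    ("Geeta", "Psychology", "English"),
    ("Geeta", "Mythology", "Hindi"),
    ("Geeta", "Psychology", "Hindi"),
    ("Shekher", "Gardening", "English"),
}
print(satisfies_mvd(student, schema, {"Student"}, {"Subject"}))

# Dropping one combination breaks the dependency.
broken = student - {("Geeta", "Psychology", "Hindi")}
print(satisfies_mvd(broken, schema, {"Student"}, {"Subject"}))
```

Note that such a check can only refute an MVD on the data at hand; whether the MVD holds for every legal relation is a design decision about the schema.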

4 NF

• A table is in the fourth normal form (4NF) if it is in BCNF and does not have any independent multi-valued parts of the primary key
• If there are two attributes A and B and, for a given value of A, there exists a set of multiple values of B that is independent of the remaining attributes, then we say that an MVD exists between A and B
• The normal forms after BCNF are mostly of theoretical interest

4 NF

Student Table

Student   Subject      Language
Geeta     Mythology    English
Geeta     Psychology   English
Geeta     Mythology    Hindi
Geeta     Psychology   Hindi
Shekher   Gardening    English

Student ↠ Subject
Student ↠ Language

4 NF

Split the independent multi-valued components of the primary key into two tables
The primary key is (Student, Subject, Language)

Student_Subject Table

Student   Subject
Geeta     Mythology
Geeta     Psychology
Shekher   Gardening

Student_Language Table

Student   Language
Geeta     English
Geeta     Hindi
Shekher   English

Here we take care of the update anomaly

Surprise: Lossless Decomposition

• There exist relations that cannot be nonloss-decomposed into two projections, but can be nonloss-decomposed into three or more

Join Dependency

• Definition: A relation R satisfies the join dependency (JD) *(X, Y, …, Z) iff R is equal to the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.
• Consider the following table of Suppliers (S), Parts (P) and the Locations they supply (L)

SPL Table

S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1

Decomposition into two projections:

S    P
S1   P1
S1   P2
S2   P1

P    L
P1   L2
P2   L1
P1   L1

Join Dependency

Joining the projections on (S, P) and (P, L) over P gives:

S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
S2   P1   L2    (spurious tuple)

Join Dependency

Adding the third projection of SPL, on (L, S):

L    S
L2   S1
L1   S1
L1   S2

Joining all three projections (S, P), (P, L) and (L, S) removes the spurious tuple and returns the original SPL table:

S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
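The SPL example can be replayed with a few lines of Python. This is a sketch using set-of-tuples relations; `project` and `natural_join` are hypothetical helpers written for this illustration, not library functions.

```python
def project(rows, schema, attrs):
    """Projection of a relation (set of tuples) onto attrs."""
    idx = [schema.index(A) for A in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, s1, r2, s2):
    """Natural join of r1(s1) and r2(s2); returns (rows, schema)."""
    common = [A for A in s1 if A in s2]
    extra = [A for A in s2 if A not in s1]
    out = set()
    for t in r1:
        for u in r2:
            if all(t[s1.index(A)] == u[s2.index(A)] for A in common):
                out.add(t + tuple(u[s2.index(A)] for A in extra))
    return out, tuple(s1) + tuple(extra)


schema = ("S", "P", "L")
spl = {("S1", "P1", "L2"), ("S1", "P2", "L1"),
       ("S2", "P1", "L1"), ("S1", "P1", "L1")}

sp = project(spl, schema, ("S", "P"))
pl = project(spl, schema, ("P", "L"))
ls = project(spl, schema, ("L", "S"))

# Joining only two projections introduces a spurious tuple.
two_way, two_schema = natural_join(sp, ("S", "P"), pl, ("P", "L"))
spurious = two_way - spl
print(spurious)

# Joining the third projection as well recovers the original relation.
three_way, _ = natural_join(two_way, two_schema, ls, ("L", "S"))
print(three_way == spl)
```

The two-way join contains the extra tuple (S2, P1, L2); the three-way join equals SPL, so this instance satisfies the join dependency *(SP, PL, LS).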

5 NF

• A table is in fifth normal form (5NF) if it is in the fourth normal form and every join dependency in the table is implied by the candidate keys
• It is also called the Project-Join Normal Form (PJNF)

Normalization

Un-normalized Relation
   ↓  Arrange every atomic value in the cell (intersection of row and column) of a table
First Normal Form (1NF)
   ↓  Eliminate partial dependencies
Second Normal Form (2NF)
   ↓  Eliminate transitive dependencies
Third Normal Form (3NF)
   ↓  Make every determinant a key
Boyce-Codd Normal Form (BCNF)
   ↓  Eliminate multi-valued dependencies that are not functional dependencies
Fourth Normal Form (4NF)
   ↓  Eliminate join dependencies that are not implied by candidate keys
Fifth Normal Form (5NF)

Denormalization

• Denormalization is a process in which we retain or introduce some amount of redundancy for faster data access
• It is applied where tradeoffs arise between access speed and redundancy

Summary

• Normalization helps to reduce redundancy and some anomalies
• The first three normal forms (1NF, 2NF and 3NF) are practical, while BCNF, 4NF and 5NF are more of theoretical interest
• Denormalization is done for fast access