Schema Refinement and Normal Forms
CS3754 Class Notes #7, John Shieh, 2013
CS4753/2006F
Normalization
• It is a process that we can use to remove design flaws from a database
• There are a number of normal forms, which are sets of rules describing what we should and should not do in our table structure
• In practice, 3NF is usually sufficient to avoid most data redundancy problems in a relational database design
Problems caused by redundancy
• Redundant Storage – Some information is stored repeatedly.
• Update Anomalies – If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated.
• Insertion Anomalies – It may not be possible to store certain information unless some other, unrelated, information is stored.
• Deletion Anomalies – It may not be possible to delete certain information without losing some other, unrelated, information.
• Redundant Storage – The hourly wages depend on rating levels. So, for example, hourly wage 10 for rating level 8 is repeated three times.
• Update Anomalies – The hourly_wages in the first tuple could be updated without making a similar change in the second tuple.

Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40
• Insertion Anomalies – We cannot insert a tuple for an employee unless we know the hourly wage for the employee’s rating value.
• Deletion Anomalies – If we delete all tuples with a given rating value (e.g. the tuples of Smethurst and Guldu), we lose the association between the rating value and its hourly_wages value.
Decompositions
• Intuitively, redundancy arises when a relational schema forces an association between attributes that is not natural.
• Functional dependencies can be used to identify such situations and suggest refinements to the schema.
• The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relations.
Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40

Id           name       lot  rating  Hours_worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40

rating  Hourly_wages
8       10
5       7
A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas, each of which contains a subset of the attributes of R and which together include all attributes in R.
Functional dependency: rating determines Hourly_wages (rating → Hourly_wages)
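
The decomposition can be sketched in Python. This is only an illustration, assuming each tuple is stored as a dictionary; the helper names `project` and `natural_join` and the variable names are hypothetical and do not come from the notes. It projects the employee relation onto the two smaller schemas and then joins them on rating to see whether the original tuples come back.

```python
# Employee instance from the notes, one dict per tuple.
COLUMNS = ["id", "name", "lot", "rating", "hourly_wages", "hours_worked"]
DATA = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]
relation = [dict(zip(COLUMNS, row)) for row in DATA]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (set semantics)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right):
    """Join two lists of dict rows on every attribute they have in common."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


emps = project(relation, ["id", "name", "lot", "rating", "hours_worked"])
wages = project(relation, ["rating", "hourly_wages"])

# Each (rating, hourly_wages) pair is now stored once instead of per employee.
print(wages)

# Joining the two smaller relations on rating gives back the original tuples.
rejoined = natural_join(emps, wages)
print(all(row in relation for row in rejoined) and len(rejoined) == len(relation))
```

Because rating determines Hourly_wages, the join on rating matches each employee with exactly one wage row, so no tuples are lost or invented in this instance.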
Functional Dependencies
• A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the concept of a key.
• Let R be a relation schema, and let X and Y be nonempty sets of attributes in R.
– An FD X → Y exists if, in every relation instance of R, any two tuples that agree on the value of X also agree on the value of Y.
– More formally:
• An FD X → Y exists in R if every instance of R preserves the FD X → Y.
• We say that an instance r of R preserves the FD X → Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y
The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X.
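
The pairwise condition translates almost directly into code. The following is a minimal sketch, assuming tuples are stored as dictionaries; the function name `preserves_fd` is made up for illustration. The sample rows reuse three columns of the first three tuples of the employee table earlier in these notes.

```python
from itertools import combinations


def preserves_fd(rows, X, Y):
    """Return True if every pair of tuples that agrees on X also agrees on Y."""
    for t1, t2 in combinations(rows, 2):
        agree_on_x = all(t1[a] == t2[a] for a in X)
        agree_on_y = all(t1[a] == t2[a] for a in Y)
        if agree_on_x and not agree_on_y:
            return False
    return True


rows = [
    {"rating": 8, "hourly_wages": 10, "hours_worked": 40},
    {"rating": 8, "hourly_wages": 10, "hours_worked": 30},
    {"rating": 5, "hourly_wages": 7, "hours_worked": 30},
]

print(preserves_fd(rows, ["rating"], ["hourly_wages"]))  # True
print(preserves_fd(rows, ["rating"], ["hours_worked"]))  # False
```

The second check fails because the first two tuples agree on rating but differ on hours_worked.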
student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms

Examples:
Is course_ID → course_name preserved? Yes.
Is {student_ID, course_ID} → course_name preserved? Yes. No two rows agree on the value of {student_ID, course_ID}, and if no two rows agree on the value of X, then X → Y is trivially preserved.
student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms

The table instance also preserves the following:
student_ID → student_name
{student_ID, course_ID} → {student_name, course_name}
{student_ID, course_ID} → {student_ID, student_name, course_ID, course_name}
student_name → student_name (a trivial dependency)
{student_name, course_name} → student_name (also trivial)
and many more.
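
To get a feel for how many dependencies a single instance preserves, one can enumerate them by brute force. The sketch below is illustrative only: the function names are hypothetical, and it lists just the non-trivial dependencies with one attribute on the right-hand side.

```python
from itertools import combinations

ATTRS = ["student_ID", "student_name", "course_ID", "course_name"]
TABLE = [
    (111, "Chan Tai Man", 3170, "Database"),
    (222, "Wong Siu Ling", 3170, "Database"),
    (333, "Tam Wai Ming", 3160, "Algorithms"),
    (111, "Chan Tai Man", 3160, "Algorithms"),
]
rows = [dict(zip(ATTRS, t)) for t in TABLE]


def preserves_fd(rows, X, Y):
    """Return True if every pair of tuples that agrees on X also agrees on Y."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in X) and any(t1[a] != t2[a] for a in Y):
            return False
    return True


def nontrivial_fds(rows, attrs):
    """List pairs (X, A), with A not in X, such that the instance preserves X -> A."""
    found = []
    for size in range(1, len(attrs)):
        for X in combinations(attrs, size):
            for A in attrs:
                if A not in X and preserves_fd(rows, X, [A]):
                    found.append((X, A))
    return found


for X, A in nontrivial_fds(rows, ATTRS):
    print(", ".join(X), "->", A)
```

The output includes student_name → student_ID, which is preserved here only because no two students in this particular instance share a name.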
How do we know if an FD exists in R?
• Can we check all instances of R to see if the FD is preserved?
– Definitely not possible!
– Whether or not a functional dependency exists must be determined by assumptions given in advance, or by common sense, not by individual relation instances.
• Given an instance r of R, we can check whether r preserves some functional dependency f, but we cannot tell whether f holds over R.

Does course_ID → student_name hold? No. Although it is preserved by this table (no two rows share a course_ID), it does not fit the assumption.

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  2150       Graph Theory
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3000       Compiler
• The assumptions given in advance, or common sense, impose some constraints, and are called the semantics of a database
• Assumptions given in advance impose explicit constraints; common sense imposes implicit constraints
Example:
• The application is to keep track of information about employees in a company.
• Information to be kept track of includes:
eid: employee’s id number
ename: employee name
address: address of the employee
sex: employee’s sex
dname: name of the department that the employee works for
dhname: department head’s name
dhsex: department head’s sex
Let’s construct a relation schema as follows:
Employee(eid, ename, address, sex, dname, dhname, dhsex)
Which of the following dependencies are true?
1. eid → ename
2. ename → eid
3. eid → address
4. eid → sex
5. sex → address
6. dhname → dname
7. dhname → eid
8. dhsex → sex
Assumptions:
a: Employee’s id number is unique
b: Each employee has a unique address
c: Each employee works for only one dept.
d: A person can be the head of at most one department
e: All department heads have different names
Implicit: common sense
• X is a superkey for relation schema R iff X → attri(R), where attri(R) denotes the set of all the attributes in schema R
• X is a candidate key (or simply, key) for R iff
– X → attri(R), and
– X is minimal, i.e., for any proper subset Z of X, Z → attri(R) does not hold
• In other words, a candidate key is a minimal superkey

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms

(student_ID, course_ID) is a candidate key (and the only one)
(student_ID, course_ID, course_name) is a superkey, but not a candidate key
(student_ID, course_ID, student_name) is another non-candidate superkey
(student_ID, course_ID, course_name, student_name) is also a non-candidate superkey
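
Once the FDs of a schema are fixed, the superkey test can be done mechanically. The sketch below uses the standard attribute-closure computation, which these notes do not present, and it assumes that student_ID → student_name and course_ID → course_name are the FDs that hold for the schema. All function names are illustrative.

```python
from itertools import combinations

ATTRS = frozenset({"student_ID", "student_name", "course_ID", "course_name"})

# Assumed FDs for the schema, written as (left side, right side) pairs.
FDS = [
    (frozenset({"student_ID"}), frozenset({"student_name"})),
    (frozenset({"course_ID"}), frozenset({"course_name"})),
]


def closure(X, fds):
    """Return every attribute functionally determined by X under fds."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(X, attrs, fds):
    """X is a superkey iff X determines all attributes of the schema."""
    return closure(X, fds) >= attrs


def candidate_keys(attrs, fds):
    """Find minimal superkeys by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            X = frozenset(combo)
            # Skip any superkey that strictly contains a key found earlier.
            if is_superkey(X, attrs, fds) and not any(k < X for k in keys):
                keys.append(X)
    return keys


print(candidate_keys(ATTRS, FDS))
print(is_superkey({"student_ID", "course_ID", "course_name"}, ATTRS, FDS))
print(is_superkey({"student_ID"}, ATTRS, FDS))
```

The exhaustive search over attribute subsets is exponential in the number of attributes, which is acceptable for a four-attribute schema but not meant as a general-purpose method.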
Normal Forms
1st Normal Form          No repeating data groups
2nd Normal Form          No partial key dependency
3rd Normal Form          No transitive dependency
Boyce-Codd Normal Form   Every determinant is a candidate key
4th Normal Form          No multi-valued dependency
5th Normal Form          No join dependency
5NF ⊂ 4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF
Normal Form (NF)
• 1NF: each attribute (column) value must be atomic
• 2NF: the schema is in 1NF, and all attributes that are not part of the primary key are fully functionally dependent on the primary key
• 3NF: the schema is in 2NF, and all transitive dependencies have been removed
Ex: employeeDept(employeeID, name, job, deptID, deptName) has to be converted to
employee(employeeID, name, job, deptID)
Dept(deptID, deptName)
2NF
• It means that each non-key attribute must be functionally dependent on all parts of the primary key (i.e., on the combination of all the attributes of a composite key), not on just some of them.
• Example: not 2NF
Employee(employeeID, name, job, departmentID, skill)
{employeeID, skill} → name, job, departmentID
employeeID → name, job, departmentID
(Note: → means "determines")
• Break the table into two tables to become 2NF
Employee(employeeID, name, job, departmentID)
employeeSkills(employeeID, skill)
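
The 2NF test in this example can be sketched as a check for partial dependencies. This is a simplified illustration, not a complete 2NF test: it assumes a single primary key and inspects only the FDs that are listed explicitly, not those implied by them. The function name is hypothetical.

```python
# FD from the example, written as a (left side, right side) pair.
FDS = [
    (frozenset({"employeeID"}), frozenset({"name", "job", "departmentID"})),
]


def partial_dependencies(key, fds):
    """Return (lhs, attrs) where non-key attrs depend on a proper subset of key."""
    violations = []
    for lhs, rhs in fds:
        non_key = rhs - key
        # lhs < key means lhs is a proper subset of the primary key.
        if lhs < key and non_key:
            violations.append((lhs, non_key))
    return violations


# Original table: the key is {employeeID, skill}, so the FD is a partial dependency.
before = partial_dependencies(frozenset({"employeeID", "skill"}), FDS)
print(before)

# After the split, Employee has key {employeeID}; the same FD is now a full dependency.
after = partial_dependencies(frozenset({"employeeID"}), FDS)
print(after)
```

The employeeSkills table has no non-key attributes at all, so it cannot contain a partial dependency.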
3NF
• Example: 2NF but not 3NF
Employee(employeeID, name, job, departmentID, departmentName)
Here employeeID → departmentID
employeeID → departmentName
Also departmentID → departmentName, and departmentID is not a key
Therefore, employeeID → departmentName is a transitive dependency
• Convert the schema to 3NF by breaking it into two tables:
Employee(employeeID, name, job, departmentID)
Department(departmentID, departmentName)
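
A rough way to spot the dependency that breaks 3NF is to look for an FD whose left side is not a key but whose right side contains non-key attributes. The sketch below rests on simplifying assumptions that are not from the notes: each table has exactly one candidate key, and only the explicitly listed FDs are examined. The function name is hypothetical.

```python
EMP_FD = (frozenset({"employeeID"}), frozenset({"name", "job", "departmentID"}))
DEPT_FD = (frozenset({"departmentID"}), frozenset({"departmentName"}))


def third_nf_violations(key, fds):
    """Return (lhs, attrs) where non-key attrs depend on something that is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        # Ignore key attributes and the trivial part of the dependency.
        non_key = rhs - key - lhs
        # With a single candidate key, lhs is a superkey iff it contains the key.
        if not key <= lhs and non_key:
            violations.append((lhs, non_key))
    return violations


# Original Employee table: departmentID is not a key but determines departmentName.
original = third_nf_violations(frozenset({"employeeID"}), [EMP_FD, DEPT_FD])
print(original)

# After the split, each table keeps only the FD whose left side is its own key.
employee = third_nf_violations(frozenset({"employeeID"}), [EMP_FD])
department = third_nf_violations(frozenset({"departmentID"}), [DEPT_FD])
print(employee, department)
```

In the original table the check reports departmentID → departmentName; in the two tables after the split it reports nothing.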
Normal Forms Defined Informally
• 1st normal form
– All attributes depend on the key
• 2nd normal form
– All attributes depend on the whole key
• 3rd normal form
– All attributes depend on nothing but the key
SUMMARY OF NORMAL FORMS based on Primary Keys