CSE 4701 Chaps15&16-1 Chap 15 & 16 6e - 14 5e: Relational DB Design, Functional Dependencies and...
-
Upload
stuart-warner -
Category
Documents
-
view
242 -
download
0
Transcript of CSE 4701 Chaps15&16-1 Chap 15 & 16 6e - 14 5e: Relational DB Design, Functional Dependencies and...
CSE4701
Chaps15&16-1
Chap 15 & 16 6e - 14 5e: Relational DB Design,
Functional Dependencies and Normalization
Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department
The University of Connecticut191 Auditorium Road, Box U-155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve(860) 486 - 4818
A portion of these slides are being used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech.
Other slides and figures have been adapted from the AWL web site for the textbook.
CSE4701
Chaps15&16-2
Normalizing a Relational DB Schema Recall: Defining Relations - Deciding which Attributes
belong Together in Each Relation Choosing Appropriate Names for the Relations and
Their Attributes (with Domains and Data Types) Identifying the Candidate Keys and Choosing a PK
for Each Relation, and Specifying All Foreign Keys Two Techniques for Relational Schema Design
Using ER-to-Relational Mapping (Chapter 9) Relational Normalization Theory (Chaps 15/16)
In Normalization, we Strive for an “Optimal” Design in Terms of Redundancy - Improve Performance Anomalies - Eliminate “Problems”
CSE4701
Chaps15&16-3
Design Process - Where are we?Conceptual
Design
ConceptualSchema
(ER Model)
LogicalDesign
Logical Schema(Relational Model)
Analysisof Schema
Step 2: NormalizationAnalyzing the Schema fromPerformance/Efficiency Perspectivesto arrive at “Optimal” Schema
Normalized Schema
CSE4701
Chaps15&16-4
Focus of this Chapter Informal Design Guidelines for Relational Databases
Semantics of the Relation Attributes Redundant Information in Tuples/Update Anomalies Null Values in Tuples Spurious Tuples
Functional Dependencies (FDs) Recall Key Concepts from Chapter 7 Inference Rules for FDs Equivalence of Sets of FDs
Normal Form and Normalization First, Second, and Third Normal Forms Boyce-Codd Normal Form
CSE4701
Chaps15&16-5
Informal DB Design Guidelines
What is Relational Database Design? The Grouping of Attributes to Form "Good"
Relation Schemas Two Levels of Relation Schemas:
The Logical "User View" Level The Storage "Base Relation" Level
Design is Concerned Mainly with Base Relations What are the Criteria for "Good" Base Relations? We’ll Start with Informal Guidelines for Good
Relational Design
CSE4701
Chaps15&16-6
The Four
Commandments:
• Thou Shalt Commit
No Redundancy of
Fact
• Thou Shalt Clutter
No Facts
• Thou Shalt Preserve
Information
• Thou Shalt Preserve
Functional
Dependencies
The Four
Commandments:
• Thou Shalt Commit
No Redundancy of
Fact
• Thou Shalt Clutter
No Facts
• Thou Shalt Preserve
Information
• Thou Shalt Preserve
Functional
Dependencies
What are Commandments for DB Design?
© Leo Mark, Database Group, Georgia Tech
CSE4701
Chaps15&16-7
What is a “Good” DB Schema?
Focus on the “Semantics” of the Relations What Does Each Relation Mean? Do the “Semantics” of Each Relation Make Sense?
Each Relation has a Consistent Meaning Dependencies Between Relations are Clear
What about Keys? Are Primary Keys Well Defined? Do Links to Foreign Keys Make Sense?
How Does Relational Schema Relate to ER or EER Predecessor?
What is an Example of a “Good” Schema? Why?
CSE4701
Chaps15&16-8
A Well Defined DB Schema
CSE4701
Chaps15&16-9
Relational Instances for Prior Example
What does DEPT_LOCATIONS Represent?
CSE4701
Chaps15&16-10
Relational Instances for Prior Example
What does WORKS_ONRepresent?
CSE4701
Chaps15&16-11
Guideline 1: Represent a Single Entity GUIDELINE 1: Informally, Each Tuple in a Relation
Should Represent One Entity or Relationship Instance (Applies to Individual Relations and their Attributes)
Attributes of Different Entities should not be Mixed in the Same Relation
Only FKs should be used to Refer to Other Entities Entity and Relationship Attributes should be Kept
Apart as Much as Possible Bottom Line:
Design a Schema that can be Explained Easily Relation by Relation
The Semantics of Attributes should be Easy to Interpret
CSE4701
Chaps15&16-12
What is a “Lousy” Relation? Why?
Represents a “Single” Employee in Each Line as Identified via SSN
What is it Trying to Represent? Each Employee Works in a Department Identified by
DNUMBER, DNAME, and DMGRSSN What is the Problem with this Design? What Happens When you Update? Delete?
CSE4701
Chaps15&16-13
What is a “Lousy” Relation? Why?
What is the Problem with this Design? Mixing Attributes from EMPLOYEE and PROJECT
Relations! Significant Amounts of Redundant Data More Critically - Anomalies in Update, Delete, Insert
CSE4701
Chaps15&16-14
Guideline 2: Redundant Information and Update Anomalies
Mixing Attributes of Multiple Entities (see Prior Two Slides) May Cause Problems
Key Problem: Information is Stored Redundantly There are Two Consequences:
Wasting Storage Problems with Update Anomalies
Insertion Anomalies - Inserting New TuplesDeletion Anomalies - Removing Existing TuplesModification Anomalies - Changing Existing Tuples
CSE4701
Chaps15&16-15
Insertion Anomalies
What Happens When Insert a new Employee Who Works in Department 5?
Must Enter DNUMBER, DNAME, and DMGRSSN Must Be Exact w.r.t. Other Dept. 5 Employees! What Happens if you Enter: “Resaerch” or
“333445565”? What are Implications?
CSE4701
Chaps15&16-16
Insertion Anomalies
What Happens When you Want to Insert a New Department? (3, “Education”, 123123123)
Can you do the Insert? If so, How? If Not, Why Not?
CSE4701
Chaps15&16-17
Deletion Anomalies
What Happens When you Delete “Borg, James” from the EMP_DEPT Table?
Is the Resulting Table OK? Why or Why Not?
CSE4701
Chaps15&16-18
Modification Anomalies
What Happens When you Want to Change “Research” to “R and D”?
What is Required in this Case? What is the Responsibility of the DB Application
Programmer or Anyone Doing an Update?
CSE4701
Chaps15&16-19
Another Schema with Problems
Slide 10- 19
Two relation schemas suffering from update anomalies
CSE4701
Chaps15&16-20
Database Instances
Slide 10- 20
Base Relations EMP_DEPT and EMP_PROJ formed after a Natural Join : with redundant information
CSE4701
Chaps15&16-21Slide 10- 21
Example of an Update Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly: Changing the name of project number P1 from
“Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.
CSE4701
Chaps15&16-22Slide 10- 22
Example of an Insert Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insert Anomaly: Cannot insert a project unless an employee is
assigned to it. Conversely
Cannot insert an employee unless a he/she is assigned to a project.
CSE4701
Chaps15&16-23Slide 10- 23
Example of an Delete Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Delete Anomaly: When a project is deleted, it will result in deleting
all the employees who work on that project. Alternately, if an employee is the sole employee on a
project, deleting that employee would result in deleting the corresponding project.
CSE4701
Chaps15&16-24
Yet Another Example
1. Each Dept. has several students, and a student may enroll in one Dept. for his/her major
3. A student may register more than one course, and each course may have many students. 4. Each student registered for a course must have a corresponding grade
2. A Dept. has only one head, i.e., the Dept. chair
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
CSE4701
Chaps15&16-25
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
Update Anomalies
Insertion: What Must Occur When a New Student is Inserted?
(Insert “Correct” DName and DHead) How is the New “BioInformatics” Dept. Added? Can you even Add New Dept Easily?
Deletion: What Happens When the Last Student in the
“Puppetry” Department is Deleted? Update:
What Must Occur When a New DHead Takes Over?
CSE4701
Chaps15&16-26
Guideline 3: Null Values in Tuples
Guideline 3: Relations should be Designed such that their Tuples will have as Few NULL Values as Possible
Attributes that are NULL Frequently Could Be Placed in Separate Relations (With the Primary Key)
Reasons for Null ValuesAttribute Not Applicable or InvalidAttribute Value Unkown (May Exist)Value Known to Exist, but Unavailable
So Many Types of Null Values Become Impossible to Assess/Understand
CSE4701
Chaps15&16-27
Guideline 3 Why do we Need NULL in a Relation?
A “Flat” Relation With Many Attributes May Have Faster Queries (Since Fewer Joins Required) More Null Values
Example: Not Every Student Will Enroll in Every Course Offered by the Department
Problems with NULL Waste of Space at Storage Level (less of an issue) Aggregate Operations (Sum, Count) Cannot Apply Cannot Join Multiple Meanings, e.g., Unknown, Not Available,
Known but Absent
CSE4701
Chaps15&16-28
How Else Can Null Values Occur?
Recall Options 3 and 4 on Specialization For Each Specialization with m Subclasses {S1, …, Sm}
and Generalized Superclass C, where the Attributes of C are {k, A1, …, An} (k is the PK), Convert According to the Following:
Option 3: For Disjoint Subclasses:Create a Single Relation U which Contains all the
Attributes of all Si and {k, A1, …, An} and t
Use k as the primary key of Ui
The Attribute t Indicates the Type Attribute According to which Specialization is Performed
CSE4701
Chaps15&16-29
Step 8 – Option 3 Example
Secretary Tech Engr
What is True for the ThreeAttributes Boxed Above?
CSE4701
Chaps15&16-30
How Else Can Null Values Occur?
Recall Options 3 and 4 on Specialization
Option 4: For Overlapping Subclasses:Create a single relation U which contains all
Attributes of all Si and all Attributes of C ({k, A1, …, An}) and {t1, …, tm}
Use k as the Primary Key of Ui
The Attributes ti are Boolean Valued, Indicating if a Tuple Belongs to Subclass Si
Note: May Generate a Large Number of Null Values in the Relation
CSE4701
Chaps15&16-31
Step 8 – Option 4 Example
Boolean
Boolean
What is True regardingthe AttributesIntroduced for
MANUFATURED_PARTand PURCHASED_PART?
CSE4701
Chaps15&16-32
Information Loss and Spurious Tuples
We’ve had Guidelines for: One Concept/Relation, Avoiding Update Anomalies, Null Values
Two Other “Related” Concerns Can Arise First, in Decomposing (Splitting) a Relation Apart,
we May “Lose” Information Second, in Attempting to Reassemble Two or More
Relations into One (via a Join), Spurious Tuples may Result
A Spurious Tuple “Wasn’t” Present Originally and Makes No Sense - Didn’t Exist and its Existence is Inconsistency
CSE4701
Chaps15&16-33
Suppose Split EMP_PROJ
CSE4701
Chaps15&16-34
What are Semantics of Split?
EMP_LOCS Means the Employee ENAME Works on Some Project at PLOCATION
EMP_PROJ1 Means the Employee Identified by SSN Works HOURS per Week on Project Identified by PNAME, PNUMBER, PLOCATION
What has been Lost in the Split? Can the Information Even be Recovered?
CSE4701
Chaps15&16-35
Recall EMP_PROJ
CSE4701
Chaps15&16-36
What are Tuples After Split?
CSE4701
Chaps15&16-37
What is the Issue?
Suppose EMP_PROJ1 and EMP_LOCS used in Place of EMP_PROJ
The Split is Legitimate if we Can Recover the Information Originally in EMP_PROJ
How could you Recover the Information? Natural Join on EMP_PROJ1 and EMP_LOCS What would be the Result?
Note: “*’ed” Entries are Spurious Tuples We do not Obtain the “Correct” Information We have Conducted a “Lossy” Decomposition
CSE4701
Chaps15&16-38
What Happens When we Join?
What do “*”ed Tuples Represent?
CSE4701
Chaps15&16-39
Guideline 4: Spurious Tuples
Guideline 4: The Relations should be Designed to Satisfy the
Lossless Join Condition No Spurious Tuples Should Be Generated by Doing
a Natural-join of Any Relations Two Important Properties of Decompositions:
a. Non-additive(Losslessness) of Corresponding Join
b. Preservation of the Functional Dependencies Property (a) is Extremely Important and Cannot Be
Sacrificed Property (b) is Less Stringent and May Be
Sacrificed
CSE4701
Chaps15&16-40
R = (A, B, C) S = (D, C)
b2b2b4
c1c1c2
A B C
c1c2c2c3
d1d2d4d5
D C
a1a2a3a3
a1a2a3
b2b2b4b4
c1c1c2c2
d1d1d2d4
A B C D
RS(A, B, C, D)
lost info.
Guideline 4: Lost Information
A First Example of Lost Information What is Lost in the Join of R and S?
CSE4701
Chaps15&16-41
Guideline 4: Spurious Tuples
A Second Example of Spurious Tuples What are Spurious in the Join of R1and R2?
a1a2a3a4
b1b2b1b2
c1c2c1c2
d1d1d2d3
A B C D
R(A, B, C, D) R1(B, D)
B C
b1b2b1b2
d1d1d2d3
D
d1d1d2d3
A
a1a2a3a4
R2(A, D)
a1a2a1a2a3a4
b1b1b2b2b1b2
c1c2c2c2c1c2
d1d1d1d1d2d3
A B C D
R1 and R2 Join
CSE4701
Chaps15&16-42
Let’s Review Some Other Examples
Some NULL values for Join Attribute DNUM
CSE4701
Chaps15&16-43
Let’s Review Some Other Examples
Natural Join on Employee Department Shown Below What is the Problem?
Berger and Benitez Employees are no Longer Present
CSE4701
Chaps15&16-44
Let’s Review Some Other Examples
The Outer Join as Described in Chapter 8 Works
CSE4701
Chaps15&16-45
Dangling Tuples
Decompose
CSE4701
Chaps15&16-46
Dangling Tuples
If Decompose when Try to Rejoin on SSN Lose Both Berger and Benitez These are Called Dangling Tuples
CSE4701
Chaps15&16-47
Functional Dependencies (FDs)
FDs are used to Specify Formal Measures of the "Goodness" of Relational Designs
FDs and Keys are used to Define Normal Forms for Relations
FDs are Constraints that are Derived from the Meaning and Interrelationships of the Data Attributes
A Set of Attributes X Functionally Determines a Set of Attributes Y if the Value of X Determines a Unique Value for Y
FDs are Derived from the Real-World Constraints on the Attributes
A Relational Schema is Relations with Keys and FDs!
CSE4701
Chaps15&16-48
Functional Dependencies
A Functional Dependency Exists Between Two (or Two Set Of) Single Valued Attributes X and Y of Relation R, if Each Value of X Corresponds to Precisely One Value of Y
Denoted by X Y X is Called the Left Hand Side of FD Y is Called the Right Hand Side of FD
Read as X Functionally Determines Y in R
For any t1, t2 r(R), if t1[X] =t2[X], then
t1[Y] =t2[Y], We say that X Y hold in R
CSE4701
Chaps15&16-49Slide 10- 49
Functional Dependencies – Another view
X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y] X -> Y in R specifies a constraint on all relation
instances r(R) Written as X -> Y; can be displayed graphically on a
relation schema as in Figures. (denoted by the arrow) FDs are derived from the real-world constraints on the
attributes
CSE4701
Chaps15&16-50
Examples of FD constraints Social security number determines employee name
SSN ENAME Project number determines project name and location
PNUMBER {PNAME, PLOCATION} Employee ssn and project number determines the hours
per week that the employee works on the project {SSN, PNUMBER} HOURS
CSE4701
Chaps15&16-51
Examples of FD constraints An FD is a Property of Attributes in the Schema FDs Must Hold on Every Relation Instance R If K is a Key of R, then K Functionally Determines All
Attributes in R Since we Never have Two Distinct Tuples with
T1[k]=t2[k]
CSE4701
Chaps15&16-52
Example of FDs
{S#, CN} Grade, S# DNAME, DNAME DHead.
STUDENT_DEPT (S#, DName, DHead, CN, Grade)
FDs over STUDENT_DEPT:
S# DHead CN GradeDNAME
fd1
fd2
fd3
CSE4701
Chaps15&16-53
Example of FDs
SSN ENAMEPNUMBER {PNAME, PLOCATION}{SSN, PNUMBER} HOURS
SSN {ENAME, BDATE, ADDRESS, DNUMBER}DNUMBER {DNAME, DMGRSSN}
CSE4701
Chaps15&16-54
Determining FDs
Must Understand the Semantics of Data Based on Schema or Current/Future Instances
What are FDs Below?TEXT COURSE?COURSE TEXT?
What if I add a row to the table? Need to Understand the Potential Future Data
James Web Databases Al-Nour
CSE4701
Chaps15&16-55
Inference Rules for FDs
Given a set of FDs F, we can Infer Additional FDs that Hold whenever the FDs in F Hold
For Example, Consider:F = {SSN {EName, BDate, Address, DNumber},
DNumber {DName, DMGRSSN} } What are Additional FDs?
SSN EName SSN BDate SSN SSNSSN Address SSN DNumber
DNumber Dname DNumber DMGRSSN
CSE4701
Chaps15&16-56
Inference Rules for FDs Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z) IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
IR1, IR2, IR3 form a sound and complete set of inference rules These are rules hold and all other rules that hold can be
deduced from these
CSE4701
Chaps15&16-57
Inference Rules for FDs Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union: If X -> Y and X -> Z, then X -> YZ
Psuedotransitivity: If X -> Y and WY -> Z, then WX -> Z
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
CSE4701
Chaps15&16-58
Summary of Inference Rules Armstrong’s Inference Rules
Derived Inference Rules
1. Reflective: If X Y, then X Y.2. Augmentation: If { X Y} then XZYZ.
3. Transitive: If { XY, YZ } then X Z.
4. Decomposition: If { XYZ } then X Y.
5. Additive (Union): If {XY, XZ } then X YZ.
6. Pseudotransitive: If {XY, WYZ } then W X Z.
CSE4701
Chaps15&16-59
Inference Rules for FDs
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
CSE4701
Chaps15&16-60
Algorithm 14.1 for Closure
Given a Set of FDs, Each X on LHS of a FD, Closure X+ are the Attributes Functionally Determined by X
Algorithm 14.1 Determining X+, Closure of X under F
X+ := X;
repeat
1. oldX+ := X+ ;
2. for each FD Y Z in F do
3. if X+ Y then X+ := X+ Z;
until (X+ = old X+ );
CSE4701
Chaps15&16-61
Recall Schema and its 3 FDs
CSE4701
Chaps15&16-62
Applying Algorithm 14.1Consider the following set of FDs:
Let F = {SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER} HOURS }
Compute Closure by iterating through each FD
Closure of {SSN}+ First FD: SSN ENAMEAdd in All Attributes into {SSN}+Add in SSN and ENAME to get:
{SSN}+ = {SSN, NAME}Are Any Attributes in {SSN}+ LHS of another FD?
No – so stop the process
CSE4701
Chaps15&16-63
Applying Algorithm 14.1Consider the following set of FDs:
Let F = {SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER} HOURS }
Closure So Far{SSN}+ = {SSN, NAME}
Next Closure of {PNUMBER}+ Second FD: PNUMBER {PNAME, PLOCATION}Add in All Attributes into {PNUMBER}+Add in PNAME and PLOCATION to get:{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}Are Any Attributes in {PNUMBER}+
in the LHS of another FD? No – so stop the process
CSE4701
Chaps15&16-64
Applying Algorithm 14.1Next Closure of {SSN, PNUMBER}+
For First FD: SSN ENAME Is {SSN, PNUMBER}+ SSN Yes – Add in ENAME since SSN ENAME to
get: {SSN, PNUMBER, ENAME}+
For Second FD: PNUMBER {PNAME, PLOCATION} Is {SSN, PNUMBER, ENAME}+ PNUMBER
Yes – Add in PNAME, PLOCATION RHS of FD
For Third FD: {SSN, PNUMBER} HOURS Is {SSN, PNUMBER, ENAME, PNAME, PLOC}+
{SSN,NUMBER} Yes – Add in HOURS RHS of FD into Set
{SSN}+ = {SSN, NAME}{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}{SSN,PNUMBER}+ = {SSN, PNUMBER, PNAME,
PLOCATION, HOURS}
CSE4701
Chaps15&16-65
SummaryLet F = {SSN ENAME,
PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER} HOURS }
{SSN}+ = {SSN, ENAME}{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION } {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME,
PNAME, PLOCATION, HOURS}
For Example: Compute {SSN, PNUMBER}+
Is {SSN, PNUMBER} SSN? Yes - Add in ENAME
Is {SSN, PNUMBER, ENAME} PNUMBER? Yes - Add in PNAME, PLOCATION
Is {SSN, PNUMBER, ENAME, PNAME, PLOCATION} {SSN, PNUMBER}?
Yes - Add in HOURS
CSE4701
Chaps15&16-66
Interpreting Closure SetsWhat Does {SSN}+ = {SSN, ENAME} Equate to? SSN ENAME, SSN SSN, SSN SSN, ENAMEWhat about {PNUMBER}+ ={PNUMBER,PNAME,PLOCATION} PNUMBER PNAME
PNUMBER PLOCATION
PNUMBER PNUMBER
PNUMBER {PNAME, PLOCATION}
PNUMBER {PNUMBER, PLOCATION}
PNUMBER {PNAME, PNUMBER}
PNUMBER {PNUMBER, PNAME, PLOCATION}Can you do:{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME,
PNAME, PLOCATION, HOURS}
CSE4701
Chaps15&16-67
Problem 14.20
Compute {DNUMBER}+
Is {DNUMBER} SSN ? No - Continue to Next FD
Is {DNUMBER} DNUMBER ? Yes - Add in DNAME, DMGRSSN
{DNUMBER}+ = {DNUMBER, DNAME, DMGRSSN}
FYI: {SSN} + ={SSN, ENAME, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN}
Let G = {SSN {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER {DNAME, DMGRSSN} }
CSE4701
Chaps15&16-68
Inference Rule Properties
Theorem 1:
X A1A2... An Holds in a Relation Scheme R if and
only if X Ai Holds for all i = 1, ..., n Theorem 2:
Armstrong's Inference Rules are Sound and Complete Given a set F of Functional Dependencies:
By Sound we mean every FD that we can infer from F by using Armstrong’s Inference Rules is in F+
By Complete we mean every FD in F+ (that F implies) can be Inferred from F by using Armstrong’s Inference Rules
CSE4701
Chaps15&16-69
When are Sets of FDs E and F Equivalent?
E is Covered by F (F is said to cover E), if Every FD in E is also in F+
Two sets of FDs E and F are equivalent if: Every FD in E can be Inferred from F, and Every FD in F can be Inferred from E
Hence, F and G are Equivalent if F+=G+
Definition: F covers F if every FD in F can be Inferred from F (i.e., if F+ E+)
E and F are Equivalent if E Covers F and F Covers E How Can we Use Algorithm 14.1 for this Purpose?
CSE4701
Chaps15&16-70
Minimal Sets of FDs
A set of FDs F is Minimal if it satisfies the Conditions:(1) Every FD in F has a Single Attribute for its RHS(2) We can’t Remove any FD from F and have a set of
FDs that is equivalent to F(3) We can’t Replace any FD XA in F with a FD
YA , where Y X and still have a set of FDs that is Equivalent to F
Every set of FDs has an Equivalent Minimal Set There can be Several Equivalent Minimal sets There is no Simple Algorithm for Computing a
Minimal set of FDs that is Equivalent to a set F of FDs
CSE4701
Chaps15&16-71
Algorithm 14.2 for Minimal Cover
Algorithm 14.2 Finding a minimal cover G for F
1. Set G := F;
2. Replace each functional dependency X A1 A2 ... An in G by n functional dependencies X A1, …, X An
3. For each functional dependency X A in G
for each attribute B that is an element of X
if (( G - {X A }) {(X - {B}) A}) is equivalent to G then
replace X A with (X - {B}) A in G.
4. For each remaining functional dependency X A in Gif (G - {X A }) is equivalent to G then
remove X A from G.
CSE4701
Chaps15&16-72
Towards Normalization of Relations We take each Relation Individually and “Improve”
Them in Terms of the Desired Characteristics Normalization Decomposes Relations into Smaller
Relations that Results in No Information Loss Support for Reconstruction
No Spurious Joins Query Execution Time May Increase
Denormalization May Be Necessary Later on Objectives: Minimizing
Redundancy Insertion, Deletion, and Update Anomalies
CSE4701
Chaps15&16-73
What is the Normalization Process?
Provides DB Designers with the Ability to “Improve” their Relations
Deal with Redundancies and Anomalies Normalization Procedure Provides DB Designs with
A Formal Framework for Analyzing Relation Schemas based on their Keys and on the Functional Dependencies among their Attributes
A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so the Relational DB can be Normalized to Desired Degree
CSE4701
Chaps15&16-74
What are Normal Forms?
A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema meets Criteria Primary keys (1NF, 2NF, 3NF) All Candidate Keys ( 2NF, 3NF, BCNF) Multivalued Dependencies (4NF) - Chapter 15 Join Dependencies (5NF) - Chapter 15
5 NF4NF
3NF
2NF
1NF
CSE4701
Chaps15&16-75
How is Normalization Attained?
Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies
In Process, we must Maintain Two Properties: Lossless Join or Nonadditive Join Property
Guarantees the Spurious Tuple Generation Problem does not occur on Decomposed Relations
Dependency Preservation PropertyEnsures that each FD is Represented in some Individual Relation(s) after Decomposition
Premise: Relational Schema with Primary Keys and Functional Dependencies Specified
CSE4701
Chaps15&16-76
Recall Key Constraints
Superkey (SK): Any Subset of Attributes Whose Values are
Guaranteed to Distinguish Among Tuples Candidate Key (CK):
A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
There may be Multiple Candidate Keys
CSE4701
Chaps15&16-77
Recall Key Constraints
Primary Key (PK): Choose One From Candidate Keys The Primary Key Attributed are Underlined
Foreign Key (FK): An Attribute or a Combination of Attributes (Say A)
of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain)
Allows Linkages Between Relations that are Tracked and Establish Dependencies
Useful to Capture ER Relationships
CSE4701
Chaps15&16-78
Superkeys vs. Candidate Keys
Superkey of R: A Superkey SK is a Set of Attributes of R Such that
No Two Tuples in Any Valid Relation Instance R(r) will Have the Same Value for SK
Given R(U), U is the Set of Attributes of R and a Relation Instance of R, Denoted As R(r), For Any Distinct Tuples T1 and T2 in R(r), T1[sk] < > T2[sk]
For Cars, Valid Superkeys Must Contain:SerialNo OR State, Reg# OR Both
For EMPLOYEE {SSN} is a Key and{SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are
all SUPERKEYS
CSE4701
Chaps15&16-79
Superkeys vs. Candidate Keys
Candidate Key of R: A "Minimal" Superkey: a Candidate Key K is a
Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey
Given R(U), U is the Set of Attributes of R and a Relation Instance of R, Denoted as R(r) K is a Candidate Key iff for any A in K, there exists Two Distinct Tuples T1 and T2 in R(r) such that T1[K-A] = T2[K-A]
In Previous (State, Reg#, Make, Model) is SKIs it a CK?Why or Why Not?
CSE4701
Chaps15&16-80
Example and Remaining Definitions
Example: CAR(State, Reg#, SerialNo, Make, Model, Year) Primary key is {State, Reg#} It has two candidate keys (also superkeys)
Key1 = {State, Reg#} Key2 = {SerialNo}
{SerialNo} can also be Chosen as Primary Key Definition: Prime Attribute - Attribute A of R that is
Member of some Candidate Key K or R Definition: Non-Prime Attribute - An Attribute that is
not Prime (i.e., Not a Member of Any Candidate Key) WORKS_ON – SSN, Pnumber PRIME
CSE4701
Chaps15&16-81
First Normal Form (1NF)
All Attributes Must Be Atomic Values: Only Simple and Indivisible Values in the Domain
of Attributes. Each Attribute in a 1NF Relation is a Single Value Disallows Composite Attributes, Multivalued
Attributes, and Nested Relations (Non-Atomic) 1NF Relation cannot have an Attribute Value :
A Set of Values (Set-Value) A Tuple of Values (Nested Relation)
1NF is a Standard Assumption of Relation DBs
CSE4701
Chaps15&16-82
One Example of 1NF
Consider Following Department Relation What is the Inherent Problem?
DLOCATIONS is Multi-valued
CSE4701
Chaps15&16-83
What are Possible Solutions?
Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)
Expand the key to have a Separate Tuple in the DEPARTMENT relation for each location (below)
Introduce DLOC1, DLOC2, DLOC3, if there are Three Maximum Locations
Problems with Each? Best Solution?
CSE4701
Chaps15&16-84
Another 1NF Example - Nested Relations
EMP_PROJ - Table and Tuples
Transition to:
CSE4701
Chaps15&16-85
Second Normal Form (2NF)
Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies
Intuitively: A Relation Schema R is in Second Normal Form
(2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key
R can be Decomposed into 2NF Relations via the Process of 2NF Normalization
Successful Process Typically Involves Decomposing R into Two or More Relations
Iteratively Applying to Each Relation in Schema
CSE4701
Chaps15&16-86
Full Functional Dependency
Full FD - Formally:Given R(U) and X, YU. If XY holds, and there exists no such X’ that X’X, and X’Y holds over R, then Y is fully dependent on X, denoted as XY
Full FD- Intuitively: A FD XY where Removal of any Attribute from X means the FD no Longer Holds {SSN, PNUMBER} HOURS is full since Neither
SSN -> HOURS nor PNUMBER HOURS holds What about in the Following:
f
{S#, CN}Grade
CSE4701
Chaps15&16-87
Partial Functional Dependency
Partial FD - Formally:Given R(U) and X, YU. If XY holds but Y is not fully dependent on X ( XY), then Y is partially functional dependent on X, denoted by XY
Partial FD - Intuitively: Removal of a Attribute from the R.H.S. still Results in a Valid FD {SSN, PNUMBER} ENAME is Partial since
Removing PNUMBER still Results in the Valid FD SSN ENAME
Are Following Full or Partial?
p
{S#, CN}CN, {S#, CN}S#
{S#, CN, DNAME}Grade
f
CSE4701
Chaps15&16-88
Second Normal Form (2NF)
Formal 2NF Definition R 2NF iff (i) R 1NF; (ii) all Non-Key Attributes in R are Fully
Functional Dependent on Every Key. Alternative Definition:
R 2NF iff the Attributes are Either a Candidate Key, or Fully Dependent on Every Key.
Reason: Partial Functional Dependencies may cause Update Problems
CSE4701
Chaps15&16-89
Another Way to View the Problem If the Primary Key Contains a Single Attribute, than No
Need to Test for Problems This is 1NF but not 2NF since
Ename a non-prime attribute in FD2 Violates 2NF since it Depends on Part of Key (SSN)
Pname and Ploc two non-prime attributes in FD3 Violates 2NF Depends on Part of Key (Pnumber)
CSE4701
Chaps15&16-90
One Example of 2NF
Consider the Example Below
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
STUDENT_DEPT 1NF
“{S#, CN} DName, DHead” since S# DName and DName DHead is a Partial FD causes Anomalies
But STUDENT_DEPT 2NF
S# DHead CN GradeDName
fd1
fd2
fd3
CSE4701
Chaps15&16-91
Recall the Anomalies…
Insertion Anomalies: No Department Can Be Recorded if it has No
Student Who Enrolls Courses Deletion Anomalies:
Delete the Last Student in a Department will also Delete the Department
Update Anomalies: Change a Head of a Department must Modify All
Students in that Department Due to Redundancies
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
CSE4701
Chaps15&16-92
One Example of 2NF (Continued)
Decomposition into 2NF by Separating Course Information from Department Information (Link S#)
S_D(S#, DName, DHead)
DHeadDName
fd2
fd3
S#
S_C(S#, CN, Grade)
fd1
S# CN Grade
CSE4701
Chaps15&16-93
Another Example of 2NF
EMP_PROJ is 1NF with Key SSN, PNUMBER but… SSN ENAME - Means ENAME, a Non-Prime
Attribute, Depends Partially on SSN, PNUMBER, i.e., Depend on Only SSN and not Both
PNUMBER {PNAME, PLOCATION} - Means PNAME, PLOCATION, two Non-Prime Attributes, Depends Partially on SSN, PNUMBER, i.e., Depend on Only PNUMEBER and not Both
CSE4701
Chaps15&16-94
Another Example of 2NF
What Does Decomposition Below Accomplish? ENAME Fully Dependent on SSN PNAME, PLOC Fully Dependent on PNUMBER
Result: 2NF for EP1, EP2, and EP3
CSE4701
Chaps15&16-95
Yet Another Example of 2NF
Consider 1NF Lots to Track Building Lots for Towns What is the 2NF Problem?
FD3: COUNTY_NAME TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#}
All Other Non-Prime Attributes are Fine
CSE4701
Chaps15&16-96
Yet Another Example of 2NF
What Does Decomposition Below Accomplish? TAX_RATE Fully Dependent on COUNTY_NAME
Result: 2NF for LOTS1 and LOTS2
CSE4701
Chaps15&16-97
Third Normal Form (3NF)
Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies
Intuitively: A Relation Schema R is in Third Normal Form
(3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on Primary Key
R can be Decomposed into 3NF Relations via the Process of 3NF Normalization
In XY and Y Z , with X as the Primary Key, there is only a a problem only if Y is not a candidate key. EMP(SSN, Emp#, Salary), SSN Emp# Salary isn’t Problem Since Emp# is a Candidate Key
CSE4701
Chaps15&16-98
Transitive Partial FDs
Transitive FD - Formally: Given R(U) and X, YU. If XY, YX and YX, YZ, then Z is called transitively functional dependent on X.
Transitive FD - Intuitively: a FD X Z that can be derived from two FDs XY and YZ SSN ENAME is non-transitive Since there is no set of
Attributes X where SSN X and X ENAME For FD X Z that can be derived from two FDs XY
and YZ, if Y is a Candidate Key – No Problem
CSE4701
Chaps15&16-99
Third Normal Form (3NF)
Formal 3NF Definition R 3NF iff
(i) R 2NF;
(ii) No Non-Key Attribute of R is Transitively Dependent on Every Candidate Key.
Alternative Definition: R 3NF iff for every FD X Y, either X is a superkey, or Y is a key attribute.
Reason: Transitive Functional Dependencies may cause Update Problems
CSE4701
Chaps15&16-100
One Example of 3NFSTUDENT_DEPT(S#, DName, DHead, CN, Grade) 2NF
S_C(S#, CN, Grade) 2NFS_D(S#, DName, DHead) 2NF S_D 3NF
S_C 3NF
“S# DHead” is a Transitive FD in S_D and “DHead” is non-key attribute since S# (X) Dname (Y) and DName (Y) DHead (Z)
S#DHead
S# DHead CN GradeDNAME
fd1
fd2
fd3
CSE4701
Chaps15&16-101
One Example of 3NF
S_C(S#, CN, Grade) 2NF
S_D(S#, DName, DHead) 2NF
S_D (S#, DName)
DEPT(DName, DHead)3NF
fd2 S# DName
fd3 DName DHead
DHeadDNameS#fd S# DHead
Decompose to Eliminate the Transitivity Within S_D
CSE4701
Chaps15&16-102
Another Example of 3NF
EMP_DEPT is 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable) SSN DNUMBER and DNUMBER DNAME
Means DNAME, Neither Key Nor Subset of Key, is Transitively Dependent on SSN
SSN is the Only Candidate Key of EMP_DEPT! Note: Also Similar Problem with SSN and
DMGRSSN via DNUMBER
CSE4701
Chaps15&16-103
Another Example of 3NF
To Attain 3NF, Decompose into ED1 and ED2 Intuitively - we are Separating Out Employees and
Departments from One Another
CSE4701
Chaps15&16-104
Yet Another Example of 3NF
Recall 2NF Solution for Building Lots Problem What is the 3NF Problem? Violate Alternative Defn.
In LOTS1, FD4 AREA PRICEAREA is not a SuperkeyPRICE not a Prime Attribute of LOTS1
CSE4701
Chaps15&16-105
Yet Another Example of 3NF
Decompose to Introduce a Separate Key AREA Result: 3NF for LOTS1A and LOTS1B
CSE4701
Chaps15&16-106
1NF and 2NF – Maintain FDs!
CSE4701
Chaps15&16-107
Transition to 3NF – Maintain FDs!
CSE4701
Chaps15&16-108
Summary of Progression – Maintain FDs!STUDENT_DEPT
1NF
S# DHead CN GradeDName
fd1
fd2
fd3
S_C S_D2NF
eliminate partial FDs
fd1
S# CN Grade DHeadDName
fd2
fd3
S#
DHead
S#S_D
DName
DEPT
S_C
3NF
eliminate transitive FDs
fd1
S# CN Grade
DName
fd3
fd2
CSE4701
Chaps15&16-109
Summary of 1NF, 2NF, 3NF ConceptsTest Remedy (Normalization)
1NF Relation should have Form new relations for each nonatomic no nonatomic attributes attribute or nested relation. or nested relations.
2NF For relations where primary Decompose and set up a new relation key contains multiple for each partial key with its dependent attributes, no nonkey attribute(s). Make sure to keep a attribute should be relation with the original primary key functionally dependent on and any attributes that are fully a part of the primary key. functionally dependent on it.
3NF Relation should not have a Decompose and set up a relation that nonkey attribute functionally includes the nonkey attribute(s) that determined by another nonkey functionally determine(s) other attribute (or by a set of nonkey nonkey attribute(s). attributes.) That is, there should be no transitive dependency of a nonkey attribute on the primary key.
CSE4701
Chaps15&16-110
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs
Intuitively: A Relation Schema R is in Boyce-Codd Normal
Form (BCNF) if Whenever an FD X A Holds in R, then X is a Superkey of R
R can be Decomposed into BCNF Relations via the Process of BCNF Normalization
There exist Relations that are in 3NF but not in BCNF The Goal is to have each Relation in BCNF (or 3NF)
CSE4701
Chaps15&16-111
Boyce-Codd Normal Form (BCNF)
Formal BCNF Definition R BCNF iff
(i) R 1NF;
(ii) for every FD X Y, X is a Superkey, i.e., if X Y and YX, then X Contains a Key.
Properties of BCNF R BCNF iff for every FD X Y, either All Non-key Attributes Fully Dependent on Every Key All Key Attributes Fully Dependent on the Keys that
they do not Belong to No Attribute Fully Dependent on any Set of Non-key
Attributes
CSE4701
Chaps15&16-112
Comparing the Normal Forms
1NF
2NF
3NF
BCNF
Eliminate the non-trivial functional
dependencies of non-key
attributes to key
Eliminate partial FDs of non-key attributes to key
Eliminate transitive FDs of non-key attributes to key
Eliminate partial and transitive FDs of key attributes to key
Poor Relational Schema DesignDeveloped as Stepping Stone
Most 3NF are in BCNF - BCNF Eliminates All Update Anomalies
CSE4701
Chaps15&16-113
One Example of BCNF
Recall 3NF Solution for Building Lots Problem Suppose that AREA is Sizes in Acres with
AREAs in Tolland County 0.5, 0.6, …, 1.0 AREAs in Windham County 1.1, 1.2, …, 2.0
Adding FD5: “AREA COUNTYNAME” What Does Data in LOTS1A Look like for Given Set
of Properties?
CSE4701
Chaps15&16-114
LOTS1A PROPERTY_ID# COUNTY_NAME LOT# AREA T11 Tolland L1 0.5 T12 Tolland L2 0.8 W13 Windham L6 1.5 W11 Windham L1 1.1 W12 Windham L4 1.6 T10 Tolland L3 0.9
One Example of BCNF
What is the Problem Here? What if you Delete W11? You have “Lost” the “Windham, 1.1” Combination
Also - Redundancy since “County Name, Area” is Repeated in Multiple Tuples Throughout LOTS1A
Even Though LOTS1A in 3NF - Still Problems Problems with FD5: “AREA COUNTY_NAME”
CSE4701
Chaps15&16-115
Transition to BCNF – Maintain FDs!
Add new FD5
CSE4701
Chaps15&16-116
One Example of BCNF
FD5: “AREA COUNTY_NAME” Satisfies 3NF: COUNTY_NAME is Prime Attribute Violates BCNF: AREA not a SuperKey of LOTS1A
So Do One More Split
CSE4701
Chaps15&16-117
One Example of BCNF
LOTS1AX PROPERTY_ID# LOT# AREAT11 L1 0.5 T12 L2 0.8 W13 L6 1.5 W11 L1 1.1W12 L4 1.6 T10 L3 0.9
LOTS1AX PROPERTY_ID# COUNTY_NAME LOT# AREA T11 Tolland L1 0.5T12 Tolland L2 0.8 W13 Windham L6 1.5 W11 Windham L1 1.1 W12 Windham L4 1.6T10 Tolland L3 0.9
LOTS1AY AREA COUNTY_NAME0.5 Tolland... Tolland1.0 Tolland1.1 Windham... Windham2.0 Windham
CSE4701
Chaps15&16-118
Consider the TEACH Relation:
in 3NF but NOT BCNF with FD1: {STUDENT, COURSE} INSTRUCTOR FD2: INSTRUCTOR COURSE
3 Possible Decompositions of TEACH: T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE) T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT) T1(INSTRUCTOR, COURSE), T2 (INSTRUCTOR, STUDENT)
All Three “Lose” FD1! 3rd is Best Since After Join, Recaptures FD1 and
Doesn’t Generate any Spurious Tuples
TEACH(STUDENT, COURSE, INSTRUCTOR)
Another Example of BCNF
CSE4701
Chaps15&16-119
What Does Table Look Like?
Note TEACH in 3NF but NOT BCNF
CSE4701
Chaps15&16-120
Reflections on Normalization
Normalization A Tool for Validating the Quality of the Schema,
Rather than Merely as a Method for Designing a Relational Schema
Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema
Normalization Process Actually a Process of Concept Separation Concept Separation is Result of Applying a Top-
down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions
CSE4701
Chaps15&16-121
Relational DB Design Process
Normalization Process Focused on Decomposition Raises Number of Questions
How do we Decompose a Schema into a Desirable Normal Form?
What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema?
Can we Guarantee the Decomposition’s Quality? Can we Prevent the “Loss” of Information? Are Dependencies Maintained in Decomposition?
CSE4701
Chaps15&16-122
S# DName DHeadR = ( U, F ) U = { S#, DName, DHead }F = { S#DName, DName DHead }
S1S2S3S4
D1D1D2D3
JohnJonhSmithBlack
Recall Transitive FD/Update Anomalies
S# Dhead” is a Transitive FD When S4 Graduates, Head Information of D3 Lost Similarly, If D5 has No Students Yet, then the Head
Information cannot be Stored in this Database Update Head of Any Department Requires an
Update to Every Student Enrolled in the Dept.
CSE4701
Chaps15&16-123
What are Possible Decompositions?
S#
S1S2S3S4
D1D1D2D3
DHead
JohnJohnSmithBlack
DName
Information Based
R = ( U, F ) U = { S#, DName, DHead }F = { S#DName, DName DHead }
= { R1(S#, ), R2(DName, R3(DHead, )}
is Neither Lossless nor FD-Preserving
CSE4701
Chaps15&16-124
What are Possible Decompositions?
S# DName
S1S2S3S4
D1D1D2D3
S# DHead
S1S2S3S4
JohnJohnSmithBlack
• Lossless Decomposition but not Dependency-Preserving
• DNameDHead is lost in the decomposition
R = ( U, F ) U = { S#, DName, DHead }F = { S#DName, DName DHead }
= { R1({S# ,DName}, {S#DName}),
R2({S#, DHead}, {S#DHead})}
2is Lossless but not FD-Preserving
CSE4701
Chaps15&16-125
What are Possible Decompositions?
S# DName
S1S2S3S4
D1D1D2D3
DName DHead
JohnJohn
D1D1D2D3
Lossless & dependency-preserving decomposition
R = ( U, F ) U = { S#, DName, DHead }F = { S#DName, DName DHead }
= { R1({S# ,DName}, {S# DName})
R3({DName, DHead}, {Dname DHead})}
is both Lossless and FD-Preserving
CSE4701
Chaps15&16-126
Summary of Normalization
2NF
3NF
BCNF
1NF
Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
Lossless Decompositionbut not Dependency Preserving
Lossless Decompositionand Dependency Preserving
CSE4701
Chaps15&16-127
The Entire Normalization Picture1NF
2NF
3NF
BCNF
Eliminate Partial FDs of Non-prime Attributes to Key
Eliminate Transitive FDs of Non-prime Attributes to Key
Eliminate Partial and Transitive FDs of Prime Attributes to Key
4NF
Eliminate Non-trivial and Non-functional Multi-Valued Dependencies
5NF
Eliminate Join Dependencies that are Not Implied by Candidate Key
CSE4701
Chaps15&16-128
What are Multi-Valued Dependencies?
Focused on the Concept of Multi-Valued Dependencies A MVD X Y Indicates that a Value of X
Corresponds to Multiple Values of Y Consider EMP with MVDs:
ENAME PNAME (E works on many P) ENAME DNAME (E has many Dependents)
CSE4701
Chaps15&16-129
What is Fourth Normal Form (4NF)?
A Relation Schema R is in Fourth Normal Form (4NF) w.r.t Dependencies F (FD and MVD) if for every Non-Trivial MVD X Y in F+, X is a Superkey for R
Reconsider EMP with MVDs: ENAME PNAME (E works on many P) ENAME DNAME (E has many Dependents)
ENAME is Not a Superkey of R since Need Triple of ENAME, PNAME, and DNAME to Distinguish
We need to Decompose EMP!
CSE4701
Chaps15&16-130
Decomposition into 4NF
ENAME PNAME is Trivial MVD: ENAME PNAME is
Equal to EMP_PROJECTS (same for ENAME DNAME)
CSE4701
Chaps15&16-131
What about the Supply Table?
In 4NF But Not in 5NF since: Supplier supplies Parts, Supplier supplies Projects, & Parts Used on Projects
Removes Join Dependencies – Many-many-many
ANN.132
CSE4701
Problem 15.29 6th ed
Consider the table:Orders (O#,I#,Odate, Cust#, Total_amount, Qty_ordered, Total_price, Discount%)
What are the Functional Dependencies?:
Is it in 2 NF ?
O# Odate Cust# Total_amt
O# I# Qty_ordered Total_price Discount%
No! Odate, Cust#, Total_amt Partially Dependent on only Part of Key O# I# namely O#
ANN.133
CSE4701
Problem 15.29 6th ed
How do you Fix the Single Orders Table?
Is it in 3NF? Yes – No transitive functional
dependencies XY and Y Z , with X as the Primary Key, where Y is not a candidate key
Create Two Tables Order (O#,Odate, Cust#, Total_amount)OrderedItem (O#,I#, Qty_ordered, Total_price, Discount%)
ANN.134
CSE4701
Problem 15.30 6th ed
Consider the table:CAR_SALE( Car# , Date_sold, Salesman#, Commision%, Discount_amt)
Assumptions Car can be sold by Multiple Salepersons Thus, Primary Key: Car# , Salesman# w/ FDs
Car# Date_sold Car# Discount_amt Car# Salesman# Salesman# Commission%
Is it in 2NF?
No! Car# Date_sold and Car# Discount_amt are not FFD on Primary Key – since Depend on Only Part
ANN.135
CSE4701
Problem 15.30
How Do you Convert to 2NF? Recall FDs
Split into Three Tables
CAR1( Car# , Date_sold, Discount_amt)CAR2( Car# , Salesman#)CAR3(Salesman#, Commision%)
Car# Date_soldCar# Discount_amtCar# Salesman# Salesman# Commission%
ANN.136
CSE4701
Problem 15.33 6th ed
Consider the table:TreatPatient (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
Assumptions and FDs Patient Treated by Physician on Date with a Diagnosis,
Treatment Code, and Charge Every Treatment Code has a Charge {Doctor#, Patient#, Date}{Diagnosis, Treat_code, Charge} {Treat_code}{Charge}
Is it in 3NF?
No - Convert
ANN.137
CSE4701
Problem 15.33
What do you look for? Transitivity What is the Problem?
TreatPatient (Doctor#, Patient#, Date, Diagnosis, Treat_code)
BillingAmount(Treat_code, Charge)
Charge a non-Key Attribute is Dependent on Treat Code another non-Key Attribute
Split into Two Tables FDs Now are Fine
ANN.138
CSE4701
Problem 15.35
Given Table Below - What is Candidate Key? BOOK (Book_Name, Author, Edition, Year)
Book_Name, Author, Edition Why? Sometimes Edition Issues Twice in 1 Year
What is the main FD? Book_Name, Edition Year
Is it in 2NF? No: Since Year Dependent on only Part of Key
Convert:
BOOK (Book_Name, Author, Edition)BOOK_YEAR (Book_Name, Edition, Year)
ANN.139
CSE4701
Problem 15.35
What are the Multi-Valued Dependencies?
How Do You Separate the Dependencies
BOOK(Book_Name, Author, Edition)BOOK_YEAR(Book_Name, Edition,Year)
Book_Name AuthorBook_Name Edition.
SPLIT INTO THREE TABLES
BOOK (Book_Name, Edition)BOOK_AUTHOR (Book_Name, Edition, Author)BOOK_YEAR (Book_Name, Edition, Year)
CSE4701
Chaps15&16-140
Concluding Remarks What have we Learned in Chapter 14?
Guidelines for “Good” Relational Design Avoiding Anomalies Functional Dependencies Augment Schema Normalization “Improves” Design Lossless Joins and Dependency Preservation Quick Look at 4NF (Informally)
How is Chapter 14 Related to the Semester Project? Phase II in the Semester Project
Step 1: ER to Relational Transformation (Chapter 9)Step 2: Relational Normalization (Chapter 14) which
Includes Identification of FDs!