CSE 4701 Chaps15&16-1 Chap 15 & 16 6e - 14 5e: Relational DB Design, Functional Dependencies and...

CSE4701

Chaps15&16-1

Chap 15 & 16 6e - 14 5e: Relational DB Design,

Functional Dependencies and Normalization

Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department

The University of Connecticut

191 Auditorium Road, Box U-155

Storrs, CT [email protected]

http://www.engr.uconn.edu/~steve

(860) 486 - 4818

A portion of these slides are being used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech.

Other slides and figures have been adapted from the AWL web site for the textbook.

CSE4701

Chaps15&16-2

Normalizing a Relational DB Schema

• Recall: Defining Relations - Deciding which Attributes Belong Together in Each Relation
    • Choosing Appropriate Names for the Relations and Their Attributes (with Domains and Data Types)
    • Identifying the Candidate Keys and Choosing a PK for Each Relation, and Specifying All Foreign Keys
• Two Techniques for Relational Schema Design
    • Using ER-to-Relational Mapping (Chapter 9)
    • Relational Normalization Theory (Chaps 15/16)
• In Normalization, we Strive for an “Optimal” Design in Terms of
    • Redundancy - Improve Performance
    • Anomalies - Eliminate “Problems”

CSE4701

Chaps15&16-3

Design Process - Where are we?

[Figure: Conceptual Design -> Conceptual Schema (ER Model) -> Logical Design -> Logical Schema (Relational Model) -> Analysis of Schema -> Normalized Schema]

Step 2: Normalization. Analyzing the Schema from Performance/Efficiency Perspectives to Arrive at an “Optimal” Schema

CSE4701

Chaps15&16-4

Focus of this Chapter Informal Design Guidelines for Relational Databases

Semantics of the Relation Attributes Redundant Information in Tuples/Update Anomalies Null Values in Tuples Spurious Tuples

Functional Dependencies (FDs) Recall Key Concepts from Chapter 7 Inference Rules for FDs Equivalence of Sets of FDs

Normal Form and Normalization First, Second, and Third Normal Forms Boyce-Codd Normal Form

CSE4701

Chaps15&16-5

Informal DB Design Guidelines

What is Relational Database Design? The Grouping of Attributes to Form "Good"

Relation Schemas Two Levels of Relation Schemas:

The Logical "User View" Level The Storage "Base Relation" Level

Design is Concerned Mainly with Base Relations What are the Criteria for "Good" Base Relations? We’ll Start with Informal Guidelines for Good

Relational Design

CSE4701

Chaps15&16-6

The Four Commandments:

• Thou Shalt Commit No Redundancy of Fact

• Thou Shalt Clutter No Facts

• Thou Shalt Preserve Information

• Thou Shalt Preserve Functional Dependencies

What are Commandments for DB Design?

© Leo Mark, Database Group, Georgia Tech

CSE4701

Chaps15&16-7

What is a “Good” DB Schema?

Focus on the “Semantics” of the Relations What Does Each Relation Mean? Do the “Semantics” of Each Relation Make Sense?

Each Relation has a Consistent Meaning Dependencies Between Relations are Clear

What about Keys? Are Primary Keys Well Defined? Do Links to Foreign Keys Make Sense?

How Does Relational Schema Relate to ER or EER Predecessor?

What is an Example of a “Good” Schema? Why?

CSE4701

Chaps15&16-8

A Well Defined DB Schema

CSE4701

Chaps15&16-9

Relational Instances for Prior Example

What does DEPT_LOCATIONS Represent?

CSE4701

Chaps15&16-10

Relational Instances for Prior Example

What does WORKS_ON Represent?

CSE4701

Chaps15&16-11

Guideline 1: Represent a Single Entity GUIDELINE 1: Informally, Each Tuple in a Relation

Should Represent One Entity or Relationship Instance (Applies to Individual Relations and their Attributes)

Attributes of Different Entities should not be Mixed in the Same Relation

Only FKs should be used to Refer to Other Entities Entity and Relationship Attributes should be Kept

Apart as Much as Possible Bottom Line:

Design a Schema that can be Explained Easily Relation by Relation

The Semantics of Attributes should be Easy to Interpret

CSE4701

Chaps15&16-12

What is a “Lousy” Relation? Why?

Represents a “Single” Employee in Each Line as Identified via SSN

What is it Trying to Represent? Each Employee Works in a Department Identified by

DNUMBER, DNAME, and DMGRSSN What is the Problem with this Design? What Happens When you Update? Delete?

CSE4701

Chaps15&16-13

What is a “Lousy” Relation? Why?

What is the Problem with this Design? Mixing Attributes from EMPLOYEE and PROJECT

Relations! Significant Amounts of Redundant Data More Critically - Anomalies in Update, Delete, Insert

CSE4701

Chaps15&16-14

Guideline 2: Redundant Information and Update Anomalies

Mixing Attributes of Multiple Entities (see Prior Two Slides) May Cause Problems

Key Problem: Information is Stored Redundantly

There are Two Consequences:
    • Wasting Storage
    • Problems with Update Anomalies
        • Insertion Anomalies - Inserting New Tuples
        • Deletion Anomalies - Removing Existing Tuples
        • Modification Anomalies - Changing Existing Tuples

CSE4701

Chaps15&16-15

Insertion Anomalies

What Happens When you Insert a New Employee Who Works in Department 5?
    • Must Enter DNUMBER, DNAME, and DMGRSSN
    • Must Be Exact w.r.t. Other Dept. 5 Employees!
    • What Happens if you Enter: “Resaerch” or “333445565”?
    • What are the Implications?

CSE4701

Chaps15&16-16

Insertion Anomalies

What Happens When you Want to Insert a New Department? (3, “Education”, 123123123)

Can you do the Insert? If so, How? If Not, Why Not?

CSE4701

Chaps15&16-17

Deletion Anomalies

What Happens When you Delete “Borg, James” from the EMP_DEPT Table?

Is the Resulting Table OK? Why or Why Not?

CSE4701

Chaps15&16-18

Modification Anomalies

What Happens When you Want to Change “Research” to “R and D”?

What is Required in this Case? What is the Responsibility of the DB Application

Programmer or Anyone Doing an Update?

CSE4701

Chaps15&16-19

Another Schema with Problems


Two relation schemas suffering from update anomalies

CSE4701

Chaps15&16-20

Database Instances


Base Relations EMP_DEPT and EMP_PROJ Formed after a Natural Join, with Redundant Information

CSE4701

Chaps15&16-21

Example of an Update Anomaly

Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project number P1 from

“Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.

CSE4701


Example of an Insert Anomaly


Insert Anomaly:
    • Cannot insert a project unless an employee is assigned to it.
    • Conversely, cannot insert an employee unless he/she is assigned to a project.

CSE4701


Example of a Delete Anomaly


Delete Anomaly:
    • When a project is deleted, it will result in deleting all the employees who work on that project.
    • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

CSE4701

Chaps15&16-24

Yet Another Example

1. Each Dept. has several students, and a student may enroll in one Dept. for his/her major

2. A Dept. has only one head, i.e., the Dept. chair

3. A student may register for more than one course, and each course may have many students

4. Each student registered for a course must have a corresponding grade

STUDENT_DEPT(S#, DName, DHead, CN, Grade)

CSE4701

Chaps15&16-25


Update Anomalies

Insertion:
    • What Must Occur When a New Student is Inserted? (Insert “Correct” DName and DHead)
    • How is the New “BioInformatics” Dept. Added?
    • Can you even Add a New Dept Easily?

Deletion:
    • What Happens When the Last Student in the “Puppetry” Department is Deleted?

Update:
    • What Must Occur When a New DHead Takes Over?

CSE4701

Chaps15&16-26

Guideline 3: Null Values in Tuples

Guideline 3: Relations should be Designed such that their Tuples will have as Few NULL Values as Possible

Attributes that are NULL Frequently Could Be Placed in Separate Relations (With the Primary Key)

Reasons for Null Values:
    • Attribute Not Applicable or Invalid
    • Attribute Value Unknown (May Exist)
    • Value Known to Exist, but Unavailable

So Many Types of Null Values Become Impossible to Assess/Understand

CSE4701

Chaps15&16-27

Guideline 3

Why do we Need NULL in a Relation?
    • A “Flat” Relation With Many Attributes May Have
        • Faster Queries (Since Fewer Joins Required)
        • More Null Values
    • Example: Not Every Student Will Enroll in Every Course Offered by the Department

Problems with NULL
    • Waste of Space at Storage Level (less of an issue)
    • Aggregate Operations (Sum, Count) Cannot Apply
    • Cannot Join
    • Multiple Meanings, e.g., Unknown, Not Available, Known but Absent

CSE4701

Chaps15&16-28

How Else Can Null Values Occur?

Recall Options 3 and 4 on Specialization

For Each Specialization with m Subclasses {S1, …, Sm} and Generalized Superclass C, where the Attributes of C are {k, A1, …, An} (k is the PK), Convert According to the Following:

Option 3: For Disjoint Subclasses:
    • Create a Single Relation U which Contains all the Attributes of all Si and {k, A1, …, An} and t
    • Use k as the Primary Key of U
    • The Attribute t Indicates the Type Attribute According to which Specialization is Performed

CSE4701

Chaps15&16-29

Step 8 – Option 3 Example

Secretary Tech Engr

What is True for the Three Attributes Boxed Above?

CSE4701

Chaps15&16-30

How Else Can Null Values Occur?

Recall Options 3 and 4 on Specialization

Option 4: For Overlapping Subclasses:
    • Create a Single Relation U which Contains all Attributes of all Si and all Attributes of C ({k, A1, …, An}) and {t1, …, tm}
    • Use k as the Primary Key of U
    • The Attributes ti are Boolean Valued, Indicating if a Tuple Belongs to Subclass Si

Note: May Generate a Large Number of Null Values in the Relation

CSE4701

Chaps15&16-31

Step 8 – Option 4 Example

Boolean

Boolean

What is True Regarding the Attributes Introduced for MANUFACTURED_PART and PURCHASED_PART?

CSE4701

Chaps15&16-32

Information Loss and Spurious Tuples

We’ve had Guidelines for: One Concept/Relation, Avoiding Update Anomalies, Null Values

Two Other “Related” Concerns Can Arise
    • First, in Decomposing (Splitting) a Relation Apart, we May “Lose” Information
    • Second, in Attempting to Reassemble Two or More Relations into One (via a Join), Spurious Tuples may Result

A Spurious Tuple “Wasn’t” Present Originally and Makes No Sense - it Didn’t Exist and its Existence is an Inconsistency

CSE4701

Chaps15&16-33

Suppose Split EMP_PROJ

CSE4701

Chaps15&16-34

What are Semantics of Split?

EMP_LOCS Means the Employee ENAME Works on Some Project at PLOCATION

EMP_PROJ1 Means the Employee Identified by SSN Works HOURS per Week on Project Identified by PNAME, PNUMBER, PLOCATION

What has been Lost in the Split? Can the Information Even be Recovered?

CSE4701

Chaps15&16-35

Recall EMP_PROJ

CSE4701

Chaps15&16-36

What are Tuples After Split?

CSE4701

Chaps15&16-37

What is the Issue?

Suppose EMP_PROJ1 and EMP_LOCS used in Place of EMP_PROJ

The Split is Legitimate if we Can Recover the Information Originally in EMP_PROJ

How could you Recover the Information?
    • Natural Join on EMP_PROJ1 and EMP_LOCS
    • What would be the Result?

Note: “*”ed Entries are Spurious Tuples
    • We do not Obtain the “Correct” Information
    • We have Conducted a “Lossy” Decomposition

CSE4701

Chaps15&16-38

What Happens When we Join?

What do “*”ed Tuples Represent?

CSE4701

Chaps15&16-39

Guideline 4: Spurious Tuples

Guideline 4:
    • The Relations should be Designed to Satisfy the Lossless Join Condition
    • No Spurious Tuples Should Be Generated by Doing a Natural Join of Any Relations

Two Important Properties of Decompositions:
    a. Non-additive (Lossless) Property of the Corresponding Join
    b. Preservation of the Functional Dependencies

Property (a) is Extremely Important and Cannot Be Sacrificed

Property (b) is Less Stringent and May Be Sacrificed

CSE4701

Chaps15&16-40

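Guideline 4: Lost Information

A First Example of Lost Information: What is Lost in the Join of R and S?

R = (A, B, C)

A    B    C
a1   b2   c1
a2   b2   c1
a3   b4   c2

S = (D, C)

D    C
d1   c1
d2   c2
d4   c2
d5   c3

RS(A, B, C, D), the Natural Join of R and S

A    B    C    D
a1   b2   c1   d1
a2   b2   c1   d1
a3   b4   c2   d2
a3   b4   c2   d4

Lost info.: the S tuple (d5, c3) has no matching C value in R, so it does not appear in RS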
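The lost-information example for R and S can be reproduced with a few lines of code. The sketch below is illustrative only: the helper name natural_join and the list-of-dicts representation are assumptions made for this example, not part of the course material. The tuples are the ones shown for R and S on this slide.

```python
def natural_join(r, s):
    """Join two relations (lists of dict rows) on every attribute name they share."""
    out = []
    for t1 in r:
        for t2 in s:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out

# Relation instances copied from the slide.
R = [
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c1"},
    {"A": "a3", "B": "b4", "C": "c2"},
]
S = [
    {"D": "d1", "C": "c1"},
    {"D": "d2", "C": "c2"},
    {"D": "d4", "C": "c2"},
    {"D": "d5", "C": "c3"},
]

RS = natural_join(R, S)

# Tuples of S that found no partner in R are lost by the join.
joined_s = {(t["D"], t["C"]) for t in RS}
lost = [t for t in S if (t["D"], t["C"]) not in joined_s]
print(len(RS), lost)
```

The same helper can be pointed at any pair of projections of a relation to see whether joining them back produces extra (spurious) tuples.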

CSE4701

Chaps15&16-41

Guideline 4: Spurious Tuples

A Second Example of Spurious Tuples: What are Spurious in the Join of R1 and R2?

R(A, B, C, D)

A    B    C    D
a1   b1   c1   d1
a2   b2   c2   d1
a3   b1   c1   d2
a4   b2   c2   d3

R1(B, D)

B    D
b1   d1
b2   d1
b1   d2
b2   d3

R2(A, D)

A    D
a1   d1
a2   d1
a3   d2
a4   d3

R1 and R2 Join

A    B    C    D
a1   b1   c1   d1
a2   b1   c2   d1
a1   b2   c2   d1
a2   b2   c2   d1
a3   b1   c1   d2
a4   b2   c2   d3

CSE4701

Chaps15&16-42

Let’s Review Some Other Examples

Some NULL values for Join Attribute DNUM

CSE4701

Chaps15&16-43


Natural Join on Employee and Department Shown Below: What is the Problem?

Employees Berger and Benitez are no Longer Present

CSE4701

Chaps15&16-44


The Outer Join as Described in Chapter 8 Works

CSE4701

Chaps15&16-45

Dangling Tuples

Decompose

CSE4701

Chaps15&16-46

Dangling Tuples

If we Decompose and then Try to Rejoin on SSN, we Lose Both Berger and Benitez. These are Called Dangling Tuples

CSE4701

Chaps15&16-47

Functional Dependencies (FDs)

FDs are used to Specify Formal Measures of the "Goodness" of Relational Designs

FDs and Keys are used to Define Normal Forms for Relations

FDs are Constraints that are Derived from the Meaning and Interrelationships of the Data Attributes

A Set of Attributes X Functionally Determines a Set of Attributes Y if the Value of X Determines a Unique Value for Y

FDs are Derived from the Real-World Constraints on the Attributes

A Relational Schema is Relations with Keys and FDs!

CSE4701

Chaps15&16-48

Functional Dependencies

A Functional Dependency Exists Between Two Single-Valued Attributes (or Two Sets of Attributes) X and Y of Relation R, if Each Value of X Corresponds to Precisely One Value of Y

Denoted by X -> Y
    • X is Called the Left Hand Side of the FD
    • Y is Called the Right Hand Side of the FD

Read as X Functionally Determines Y in R

For any t1, t2 in r(R), if t1[X] = t2[X], then t1[Y] = t2[Y]; we say that X -> Y Holds in R

CSE4701


Functional Dependencies – Another view

X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]

X -> Y in R specifies a constraint on all relation instances r(R)

Written as X -> Y; can be displayed graphically on a relation schema as in Figures (denoted by the arrow)

FDs are derived from the real-world constraints on the attributes
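The tuple-based definition (if t1[X] = t2[X] then t1[Y] = t2[Y]) translates directly into a check over one relation instance. The sketch below is illustrative: the function name fd_holds and the three sample rows are made up for this example. Note that such a check can only refute an FD on the data at hand; whether the FD really holds is decided by the real-world semantics, which must be true for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this one relation instance."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

# Hypothetical sample rows in the shape of EMP_PROJ.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Smith", "HOURS": 10},
    {"SSN": "111", "PNUMBER": 2, "ENAME": "Smith", "HOURS": 5},
    {"SSN": "222", "PNUMBER": 1, "ENAME": "Wong", "HOURS": 20},
]

ssn_determines_ename = fd_holds(emp_proj, ["SSN"], ["ENAME"])
pnumber_determines_hours = fd_holds(emp_proj, ["PNUMBER"], ["HOURS"])
print(ssn_determines_ename, pnumber_determines_hours)
```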

CSE4701

Chaps15&16-50

Examples of FD constraints

Social security number determines employee name
    SSN -> ENAME

Project number determines project name and location
    PNUMBER -> {PNAME, PLOCATION}

Employee ssn and project number determines the hours per week that the employee works on the project
    {SSN, PNUMBER} -> HOURS

CSE4701

Chaps15&16-51

Examples of FD constraints

An FD is a Property of Attributes in the Schema

FDs Must Hold on Every Relation Instance r(R)

If K is a Key of R, then K Functionally Determines All Attributes in R, Since we Never have Two Distinct Tuples with t1[K] = t2[K]

CSE4701

Chaps15&16-52

Example of FDs

STUDENT_DEPT (S#, DName, DHead, CN, Grade)

FDs over STUDENT_DEPT:

fd1: {S#, CN} -> Grade
fd2: S# -> DName
fd3: DName -> DHead

CSE4701

Chaps15&16-53

Example of FDs

SSN -> ENAME
PNUMBER -> {PNAME, PLOCATION}
{SSN, PNUMBER} -> HOURS

SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER -> {DNAME, DMGRSSN}

CSE4701

Chaps15&16-54

Determining FDs

Must Understand the Semantics of Data Based on Schema or Current/Future Instances

What are FDs Below?
    TEXT -> COURSE?
    COURSE -> TEXT?

What if I add a row to the table? Need to Understand the Potential Future Data

James Web Databases Al-Nour

CSE4701

Chaps15&16-55

Inference Rules for FDs

Given a set of FDs F, we can Infer Additional FDs that Hold whenever the FDs in F Hold

For Example, Consider:
F = {SSN -> {EName, BDate, Address, DNumber},
     DNumber -> {DName, DMGRSSN} }

What are Additional FDs?
    SSN -> EName       SSN -> BDate        SSN -> SSN
    SSN -> Address     SSN -> DNumber
    DNumber -> DName   DNumber -> DMGRSSN

CSE4701

Chaps15&16-56

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold

Armstrong's inference rules:
    IR1. (Reflexive) If Y subset-of X, then X -> Y
    IR2. (Augmentation) If X -> Y, then XZ -> YZ (Notation: XZ stands for X U Z)
    IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

IR1, IR2, IR3 form a sound and complete set of inference rules: these rules hold, and all other rules that hold can be deduced from these

CSE4701

Chaps15&16-57

Inference Rules for FDs

Some additional inference rules that are useful:
    • Decomposition: If X -> YZ, then X -> Y and X -> Z
    • Union: If X -> Y and X -> Z, then X -> YZ
    • Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

CSE4701

Chaps15&16-58

Summary of Inference Rules

Armstrong’s Inference Rules
    1. Reflexive: If Y subset-of X, then X -> Y.
    2. Augmentation: If {X -> Y} then XZ -> YZ.
    3. Transitive: If {X -> Y, Y -> Z} then X -> Z.

Derived Inference Rules
    4. Decomposition: If {X -> YZ} then X -> Y.
    5. Additive (Union): If {X -> Y, X -> Z} then X -> YZ.
    6. Pseudotransitive: If {X -> Y, WY -> Z} then WX -> Z.

CSE4701

Chaps15&16-59

Inference Rules for FDs

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X

X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

CSE4701

Chaps15&16-60

Algorithm 14.1 for Closure

Given a Set of FDs F and a Set of Attributes X (Typically the LHS of an FD), the Closure X+ is the Set of Attributes Functionally Determined by X

Algorithm 14.1 Determining X+, Closure of X under F

X+ := X;
repeat
    oldX+ := X+;
    for each FD Y -> Z in F do
        if X+ superset-of Y then X+ := X+ U Z;
until (X+ = oldX+);
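Algorithm 14.1 maps almost line for line onto a small function. The sketch below is one possible rendering, assuming each FD is stored as a (left-hand side, right-hand side) pair of attribute sets; the name closure and this representation are choices made for the example. The FD set F is the EMP_PROJ set used on the surrounding slides.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:            # for each FD Y -> Z in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # X+ := X+ U Z
                changed = True
    return result

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, F)))
print(sorted(closure({"PNUMBER"}, F)))
print(sorted(closure({"SSN", "PNUMBER"}, F)))
```

Because {SSN, PNUMBER}+ contains every attribute of EMP_PROJ, the same function doubles as a superkey test.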

CSE4701

Chaps15&16-61

Recall Schema and its 3 FDs

CSE4701

Chaps15&16-62

Applying Algorithm 14.1

Consider the following set of FDs:
Let F = {SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS }

Compute Closure by iterating through each FD

Closure {SSN}+
    • First FD: SSN -> ENAME
    • Add in All Attributes into {SSN}+
    • Add in SSN and ENAME to get: {SSN}+ = {SSN, ENAME}
    • Are Any Attributes in {SSN}+ the LHS of another FD? No – so stop the process

CSE4701

Chaps15&16-63

Applying Algorithm 14.1

Consider the following set of FDs:
Let F = {SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS }

Closure So Far: {SSN}+ = {SSN, ENAME}

Next Closure {PNUMBER}+
    • Second FD: PNUMBER -> {PNAME, PLOCATION}
    • Add in All Attributes into {PNUMBER}+
    • Add in PNAME and PLOCATION to get: {PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
    • Are Any Attributes in {PNUMBER}+ the LHS of another FD? No – so stop the process

CSE4701

Chaps15&16-64

Applying Algorithm 14.1

Next Closure {SSN, PNUMBER}+

For First FD: SSN -> ENAME
    • Is {SSN, PNUMBER}+ superset-of SSN?
    • Yes – Add in ENAME since SSN -> ENAME to get: {SSN, PNUMBER, ENAME}+

For Second FD: PNUMBER -> {PNAME, PLOCATION}
    • Is {SSN, PNUMBER, ENAME}+ superset-of PNUMBER?
    • Yes – Add in PNAME, PLOCATION, the RHS of the FD

For Third FD: {SSN, PNUMBER} -> HOURS
    • Is {SSN, PNUMBER, ENAME, PNAME, PLOC}+ superset-of {SSN, PNUMBER}?
    • Yes – Add in HOURS, the RHS of the FD, into the Set

{SSN}+ = {SSN, ENAME}
{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}

CSE4701

Chaps15&16-65

Summary

Let F = {SSN -> ENAME,
         PNUMBER -> {PNAME, PLOCATION},
         {SSN, PNUMBER} -> HOURS }

{SSN}+ = {SSN, ENAME}
{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}

For Example: Compute {SSN, PNUMBER}+
    • Is {SSN, PNUMBER} superset-of SSN? Yes - Add in ENAME
    • Is {SSN, PNUMBER, ENAME} superset-of PNUMBER? Yes - Add in PNAME, PLOCATION
    • Is {SSN, PNUMBER, ENAME, PNAME, PLOCATION} superset-of {SSN, PNUMBER}? Yes - Add in HOURS

CSE4701

Chaps15&16-66

Interpreting Closure Sets

What Does {SSN}+ = {SSN, ENAME} Equate to?
    SSN -> ENAME, SSN -> SSN, SSN -> {SSN, ENAME}

What about {PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}?
    PNUMBER -> PNAME
    PNUMBER -> PLOCATION
    PNUMBER -> PNUMBER
    PNUMBER -> {PNAME, PLOCATION}
    PNUMBER -> {PNUMBER, PLOCATION}
    PNUMBER -> {PNAME, PNUMBER}
    PNUMBER -> {PNUMBER, PNAME, PLOCATION}

Can you do: {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}?

CSE4701

Chaps15&16-67

Problem 14.20

Let G = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN} }

Compute {DNUMBER}+
    • Is {DNUMBER} superset-of SSN? No - Continue to Next FD
    • Is {DNUMBER} superset-of DNUMBER? Yes - Add in DNAME, DMGRSSN

{DNUMBER}+ = {DNUMBER, DNAME, DMGRSSN}

FYI: {SSN}+ = {SSN, ENAME, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN}

CSE4701

Chaps15&16-68

Inference Rule Properties

Theorem 1: X -> A1 A2 ... An Holds in a Relation Scheme R if and only if X -> Ai Holds for all i = 1, ..., n

Theorem 2: Armstrong's Inference Rules are Sound and Complete

Given a set F of Functional Dependencies:
    • By Sound we mean every FD that we can infer from F by using Armstrong’s Inference Rules is in F+
    • By Complete we mean every FD in F+ (that F implies) can be Inferred from F by using Armstrong’s Inference Rules

CSE4701

Chaps15&16-69

When are Sets of FDs E and F Equivalent?

E is Covered by F (F is said to Cover E) if Every FD in E is also in F+

Two sets of FDs E and F are Equivalent if:
    • Every FD in E can be Inferred from F, and
    • Every FD in F can be Inferred from E

Hence, E and F are Equivalent if E+ = F+

Definition: F Covers E if every FD in E can be Inferred from F (i.e., if E+ subset-of F+)

E and F are Equivalent if E Covers F and F Covers E

How Can we Use Algorithm 14.1 for this Purpose?
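One way to answer the closing question: F covers E exactly when, for every FD X -> Y in E, Y is contained in X+ computed under F, so two closure-based cover tests decide equivalence without ever enumerating F+. The sketch below assumes FDs stored as (lhs, rhs) pairs of sets; the function names and the sample sets F1 and F2 are illustrative and do not come from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, e):
    """True if F covers E: every X -> Y in E has Y inside X+ under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent if each covers the other."""
    return covers(f, e) and covers(e, f)

# Illustrative FD sets (hypothetical, single-letter attributes).
F1 = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
F2 = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
F3 = [({"A"}, {"C"})]

print(equivalent(F1, F2))
print(covers(F1, F3), covers(F3, F1))
```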

CSE4701

Chaps15&16-70

Minimal Sets of FDs

A set of FDs F is Minimal if it satisfies the Conditions:
    (1) Every FD in F has a Single Attribute for its RHS
    (2) We can’t Remove any FD from F and have a set of FDs that is Equivalent to F
    (3) We can’t Replace any FD X -> A in F with a FD Y -> A, where Y is a Proper Subset of X, and still have a set of FDs that is Equivalent to F

Every set of FDs has an Equivalent Minimal Set

There can be Several Equivalent Minimal sets

There is no Simple Algorithm for Computing a Minimal set of FDs that is Equivalent to a set F of FDs

CSE4701

Chaps15&16-71

Algorithm 14.2 for Minimal Cover

Algorithm 14.2 Finding a minimal cover G for F

1. Set G := F;

2. Replace each functional dependency X -> A1 A2 ... An in G by the n functional dependencies X -> A1, …, X -> An

3. For each functional dependency X -> A in G
       for each attribute B that is an element of X
           if ((G - {X -> A}) U {(X - {B}) -> A}) is equivalent to G then
               replace X -> A with (X - {B}) -> A in G.

4. For each remaining functional dependency X -> A in G
       if (G - {X -> A}) is equivalent to G then
           remove X -> A from G.
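The four steps of Algorithm 14.2 can be sketched with attribute closure standing in for the equivalence tests: in step 3, B can be dropped from X -> A when A is already in (X - {B})+ under G, and in step 4, X -> A can be removed when A is still in X+ under the remaining FDs. The code below is a hedged sketch under those substitutions; the function names and the three-FD input are illustrative and not taken from the slides. The result depends on the order in which FDs and attributes are visited, which is consistent with there being several equivalent minimal sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Steps 1-2: copy F and split each FD so every RHS is a single attribute.
    g = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 3: drop LHS attribute B when (X - {B}) -> A already follows from G.
    for i in range(len(g)):
        for b in sorted(g[i][0]):
            lhs, rhs = g[i]
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, g):
                g[i] = (reduced, rhs)
    # Step 4: drop an FD when the remaining FDs still imply it.
    result = list(dict.fromkeys(g))      # remove exact duplicates, keep order
    for fd in list(result):
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result

# Illustrative input (hypothetical): B -> A, D -> A, AB -> D.
F = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
cover = minimal_cover(F)
for lhs, rhs in cover:
    print(sorted(lhs), "->", sorted(rhs))
```

For this input, AB -> D shrinks to B -> D in step 3, after which B -> A becomes redundant (it follows from B -> D and D -> A) and is removed in step 4.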

CSE4701

Chaps15&16-72

Towards Normalization of Relations

We take each Relation Individually and “Improve” Them in Terms of the Desired Characteristics

Normalization Decomposes Relations into Smaller Relations that Result in
    • No Information Loss
    • Support for Reconstruction
    • No Spurious Joins

Query Execution Time May Increase; Denormalization May Be Necessary Later on

Objectives: Minimizing
    • Redundancy
    • Insertion, Deletion, and Update Anomalies

CSE4701

Chaps15&16-73

What is the Normalization Process?

Provides DB Designers with the Ability to “Improve” their Relations
    • Deal with Redundancies and Anomalies

Normalization Procedure Provides DB Designers with
    • A Formal Framework for Analyzing Relation Schemas based on their Keys and on the Functional Dependencies among their Attributes
    • A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so the Relational DB can be Normalized to the Desired Degree

CSE4701

Chaps15&16-74

What are Normal Forms?

A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema meets Criteria
    • Primary keys (1NF, 2NF, 3NF)
    • All Candidate Keys (2NF, 3NF, BCNF)
    • Multivalued Dependencies (4NF) - Chapter 15
    • Join Dependencies (5NF) - Chapter 15

[Figure: nested normal forms, from outermost to innermost: 1NF, 2NF, 3NF, 4NF, 5NF]

CSE4701

Chaps15&16-75

How is Normalization Attained?

Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies

In Process, we must Maintain Two Properties:
    • Lossless Join or Nonadditive Join Property: Guarantees the Spurious Tuple Generation Problem does not occur on Decomposed Relations
    • Dependency Preservation Property: Ensures that each FD is Represented in some Individual Relation(s) after Decomposition

Premise: Relational Schema with Primary Keys and Functional Dependencies Specified

CSE4701

Chaps15&16-76

Recall Key Constraints

Superkey (SK):
    • Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples

Candidate Key (CK):
    • A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
    • A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
    • There may be Multiple Candidate Keys

CSE4701

Chaps15&16-77

Recall Key Constraints

Primary Key (PK):
    • Choose One From Candidate Keys
    • The Primary Key Attributes are Underlined

Foreign Key (FK):
    • An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain)
    • Allows Linkages Between Relations that are Tracked and Establish Dependencies
    • Useful to Capture ER Relationships

CSE4701

Chaps15&16-78

Superkeys vs. Candidate Keys

Superkey of R:
    • A Superkey SK is a Set of Attributes of R Such that No Two Tuples in Any Valid Relation Instance r(R) will Have the Same Value for SK
    • Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as r(R): For Any Distinct Tuples t1 and t2 in r(R), t1[SK] <> t2[SK]

For Cars, Valid Superkeys Must Contain: SerialNo OR {State, Reg#} OR Both

For EMPLOYEE, {SSN} is a Key, and {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all SUPERKEYS

CSE4701

Chaps15&16-79

Superkeys vs. Candidate Keys

Candidate Key of R:
    • A "Minimal" Superkey: a Candidate Key K is a Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey
    • Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as r(R): a Superkey K is a Candidate Key iff for any A in K, there exist Two Distinct Tuples t1 and t2 in r(R) such that t1[K - {A}] = t2[K - {A}]

In the Previous Example, {State, Reg#, Make, Model} is a SK. Is it a CK? Why or Why Not?

CSE4701

Chaps15&16-80

Example and Remaining Definitions

Example: CAR(State, Reg#, SerialNo, Make, Model, Year)
    • Primary key is {State, Reg#}
    • It has two candidate keys (also superkeys): Key1 = {State, Reg#}, Key2 = {SerialNo}
    • {SerialNo} can also be Chosen as Primary Key

Definition: Prime Attribute - Attribute A of R that is a Member of some Candidate Key K of R

Definition: Non-Prime Attribute - An Attribute that is not Prime (i.e., Not a Member of Any Candidate Key)

WORKS_ON – SSN, Pnumber are PRIME
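Candidate keys and prime attributes can be derived mechanically once the FDs are written down: a set K is a superkey when K+ contains every attribute, and it is a candidate key when no smaller key sits inside it. The brute-force sketch below tries attribute subsets in increasing size. The function names are illustrative, and the two FDs for CAR are an assumption inferred from the stated keys (the slides list the keys, not the FDs).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(k, fds) == set(attributes):
                keys.append(k)
    return keys

CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumed FDs: each stated key determines all remaining attributes.
CAR_FDS = [
    ({"State", "Reg#"}, {"SerialNo", "Make", "Model", "Year"}),
    ({"SerialNo"}, {"State", "Reg#", "Make", "Model", "Year"}),
]

keys = candidate_keys(CAR, CAR_FDS)
prime = set().union(*keys)
non_prime = CAR - prime
print(keys, sorted(prime), sorted(non_prime))
```

The subset enumeration is exponential in the number of attributes, which is acceptable for classroom-sized schemas like this one.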

CSE4701

Chaps15&16-81

First Normal Form (1NF)

All Attributes Must Be Atomic Values:
    • Only Simple and Indivisible Values in the Domain of Attributes
    • Each Attribute in a 1NF Relation is a Single Value
    • Disallows Composite Attributes, Multivalued Attributes, and Nested Relations (Non-Atomic)

A 1NF Relation cannot have an Attribute Value that is:
    • A Set of Values (Set-Value)
    • A Tuple of Values (Nested Relation)

1NF is a Standard Assumption of Relational DBs

CSE4701

Chaps15&16-82

One Example of 1NF

Consider Following Department Relation What is the Inherent Problem?

DLOCATIONS is Multi-valued

CSE4701

Chaps15&16-83

What are Possible Solutions?

Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)

Expand the key to have a Separate Tuple in the DEPARTMENT relation for each location (below)

Introduce DLOC1, DLOC2, DLOC3, if there are Three Maximum Locations

Problems with Each? Best Solution?

CSE4701

Chaps15&16-84

Another 1NF Example - Nested Relations

EMP_PROJ - Table and Tuples

Transition to:

CSE4701

Chaps15&16-85

Second Normal Form (2NF)

Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies

Intuitively:
    • A Relation Schema R is in Second Normal Form (2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key
    • R can be Decomposed into 2NF Relations via the Process of 2NF Normalization

Successful Process Typically Involves
    • Decomposing R into Two or More Relations
    • Iteratively Applying to Each Relation in Schema

CSE4701

Chaps15&16-86

Full Functional Dependency

Full FD - Formally: Given R(U) and X, Y subset-of U. If X -> Y holds, and there exists no X’ such that X’ is a proper subset of X and X’ -> Y holds over R, then Y is fully dependent on X, denoted as X -f-> Y

Full FD - Intuitively: A FD X -> Y where Removal of any Attribute from X means the FD no Longer Holds
    • {SSN, PNUMBER} -> HOURS is full since Neither SSN -> HOURS nor PNUMBER -> HOURS holds

What about in the Following: {S#, CN} -f-> Grade

CSE4701

Chaps15&16-87

Partial Functional Dependency

Partial FD - Formally: Given R(U) and X, Y subset-of U. If X -> Y holds but Y is not fully dependent on X, then Y is partially functionally dependent on X, denoted by X -p-> Y

Partial FD - Intuitively: Removal of an Attribute from the L.H.S. still Results in a Valid FD
    • {SSN, PNUMBER} -> ENAME is Partial since Removing PNUMBER still Results in the Valid FD SSN -> ENAME

Are Following Full or Partial?
    {S#, CN} -> CN, {S#, CN} -> S#
    {S#, CN, DNAME} -> Grade

CSE4701

Chaps15&16-88

Second Normal Form (2NF)

Formal 2NF Definition: R in 2NF iff
    (i) R in 1NF;
    (ii) all Non-Key Attributes in R are Fully Functionally Dependent on Every Key.

Alternative Definition: R in 2NF iff the Attributes are Either Part of a Candidate Key, or Fully Dependent on Every Key.

Reason: Partial Functional Dependencies may cause Update Problems

CSE4701

Chaps15&16-89

Another Way to View the Problem

If the Primary Key Contains a Single Attribute, then No Need to Test for Problems

This is 1NF but not 2NF since
    • Ename, a non-prime attribute in FD2, Violates 2NF since it Depends on Part of the Key (SSN)
    • Pname and Ploc, two non-prime attributes in FD3, Violate 2NF since they Depend on Part of the Key (Pnumber)
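The 2NF test described here can be automated with attribute closure: for each composite candidate key, compute the closure of every proper subset, and report any non-prime attribute that shows up. The sketch below is illustrative only; the function names are made up for this example, and the FDs are the three EMP_PROJ dependencies used earlier in these slides (with PLOCATION written out in full).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(keys, fds):
    """List (part_of_key, non_prime_attributes) pairs that violate 2NF."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):          # proper, non-empty subsets only
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) - prime
                if dependents:
                    found.append((set(part), dependents))
    return found

EMP_PROJ_FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

violations = partial_dependencies([{"SSN", "PNUMBER"}], EMP_PROJ_FDS)
for part, attrs in violations:
    print(sorted(part), "->", sorted(attrs))
```

An empty result means no non-prime attribute depends on part of a key, which matches the remark that a single-attribute key needs no 2NF test.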

CSE4701

Chaps15&16-90

One Example of 2NF

Consider the Example Below


STUDENT_DEPT is in 1NF

“{S#, CN} -> DName, DHead” is a Partial FD (since S# -> DName and DName -> DHead) and causes Anomalies

So STUDENT_DEPT is not in 2NF

fd1: {S#, CN} -> Grade
fd2: S# -> DName
fd3: DName -> DHead

CSE4701

Chaps15&16-91

Recall the Anomalies…

Insertion Anomalies:
    • No Department Can Be Recorded if it has No Student Who Enrolls in Courses

Deletion Anomalies:
    • Deleting the Last Student in a Department will also Delete the Department

Update Anomalies:
    • Changing the Head of a Department Requires Modifying All Students in that Department

Due to Redundancies


CSE4701

Chaps15&16-92

One Example of 2NF (Continued)

Decomposition into 2NF by Separating Course Information from Department Information (Link S#)

S_D(S#, DName, DHead)
    fd2: S# -> DName
    fd3: DName -> DHead

S_C(S#, CN, Grade)
    fd1: {S#, CN} -> Grade

CSE4701

Chaps15&16-93

Another Example of 2NF

EMP_PROJ is 1NF with Key {SSN, PNUMBER} but…
    • SSN -> ENAME - Means ENAME, a Non-Prime Attribute, Depends Partially on {SSN, PNUMBER}, i.e., Depends on Only SSN and not Both
    • PNUMBER -> {PNAME, PLOCATION} - Means PNAME and PLOCATION, two Non-Prime Attributes, Depend Partially on {SSN, PNUMBER}, i.e., Depend on Only PNUMBER and not Both

CSE4701

Chaps15&16-94


What Does Decomposition Below Accomplish?
    • ENAME Fully Dependent on SSN
    • PNAME, PLOC Fully Dependent on PNUMBER

Result: 2NF for EP1, EP2, and EP3

CSE4701

Chaps15&16-95

Yet Another Example of 2NF

Consider the 1NF Relation LOTS, Used to Track Building Lots for Towns. What is the 2NF Problem?

FD3: COUNTY_NAME -> TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#}

All Other Non-Prime Attributes are Fine

CSE4701

Chaps15&16-96


What Does Decomposition Below Accomplish? TAX_RATE Fully Dependent on COUNTY_NAME

Result: 2NF for LOTS1 and LOTS2

CSE4701

Chaps15&16-97

Third Normal Form (3NF)

Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies

Intuitively:
    • A Relation Schema R is in Third Normal Form (3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on the Primary Key
    • R can be Decomposed into 3NF Relations via the Process of 3NF Normalization

In X -> Y and Y -> Z, with X as the Primary Key, there is a Problem only if Y is not a Candidate Key
    • In EMP(SSN, Emp#, Salary), SSN -> Emp# -> Salary isn’t a Problem Since Emp# is a Candidate Key


Transitive Partial FDs

Transitive FD - Formally: Given R(U) and X, Y, Z ⊆ U. If X → Y, Y ⊈ X, Y ↛ X, and Y → Z, then Z is called Transitively Functionally Dependent on X.

Transitive FD - Intuitively: an FD X → Z that can be Derived from two FDs X → Y and Y → Z

SSN → ENAME is Non-Transitive Since there is no Set of Attributes X where SSN → X and X → ENAME

For an FD X → Z that can be Derived from two FDs X → Y and Y → Z, if Y is a Candidate Key – No Problem


Third Normal Form (3NF)

Formal 3NF Definition: R ∈ 3NF iff

(i) R ∈ 2NF;

(ii) No Non-Key Attribute of R is Transitively Dependent on any Candidate Key.

Alternative Definition: R ∈ 3NF iff for every Non-Trivial FD X → Y, either X is a Superkey, or Y is a Key (Prime) Attribute.

Reason: Transitive Functional Dependencies may cause Update Problems
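The alternative definition lends itself to a mechanical check. The sketch below is illustrative only: it tests just the FDs that are explicitly listed (not every FD in the closure F+), and it uses the S# → DName and DName → DHead dependencies from the STUDENT_DEPT example as assumed input. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attrs, fds, candidate_keys):
    """Return (lhs, attribute) pairs from the listed FDs where lhs is not a
    superkey of the relation and the attribute is not prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if not (lhs | rhs) <= attrs:
            continue  # this FD mentions attributes outside the relation
        is_superkey = closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad

FDS = [
    (frozenset({"S#"}), frozenset({"DName"})),
    (frozenset({"DName"}), frozenset({"DHead"})),
]

s_d = third_nf_violations({"S#", "DName", "DHead"}, FDS, [{"S#"}])
s_d_split = third_nf_violations({"S#", "DName"}, FDS, [{"S#"}])
dept = third_nf_violations({"DName", "DHead"}, FDS, [{"DName"}])

print(s_d)        # [(('DName',), 'DHead')]
print(s_d_split)  # []
print(dept)       # []
```

The only violation is DName → DHead inside the three-attribute relation: DName is not a superkey there and DHead is not prime.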


One Example of 3NF

STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF

S_C(S#, CN, Grade) ∈ 2NF and S_C ∈ 3NF

S_D(S#, DName, DHead) ∈ 2NF but S_D ∉ 3NF

“S# → DHead” is a Transitive FD in S_D and “DHead” is a Non-Key Attribute, since S# (X) → DName (Y) and DName (Y) → DHead (Z)

fd1: {S#, CN} → Grade
fd2: S# → DName
fd3: DName → DHead


One Example of 3NF

S_C(S#, CN, Grade) ∈ 2NF

S_D(S#, DName, DHead) ∈ 2NF, with Transitive fd: S# → DHead

Decompose to Eliminate the Transitivity Within S_D

S_D(S#, DName) ∈ 3NF, with fd2: S# → DName

DEPT(DName, DHead) ∈ 3NF, with fd3: DName → DHead



EMP_DEPT is in 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable)

SSN → DNUMBER and DNUMBER → DNAME Means DNAME, Neither Key Nor Subset of Key, is Transitively Dependent on SSN

SSN is the Only Candidate Key of EMP_DEPT!

Note: Also a Similar Problem with SSN and DMGRSSN via DNUMBER



To Attain 3NF, Decompose into ED1 and ED2
Intuitively - we are Separating Out Employees and Departments from One Another



Recall 2NF Solution for Building Lots Problem
What is the 3NF Problem? LOTS1 Violates the Alternative Definition

In LOTS1, FD4: AREA → PRICE
AREA is not a Superkey
PRICE is not a Prime Attribute of LOTS1



Decompose to Introduce a Separate Key AREA
Result: 3NF for LOTS1A and LOTS1B


1NF and 2NF – Maintain FDs!


Transition to 3NF – Maintain FDs!


Summary of Progression – Maintain FDs!

1NF: STUDENT_DEPT(S#, DName, DHead, CN, Grade) with fd1, fd2, fd3

2NF (eliminate partial FDs):
S_C(S#, CN, Grade) with fd1
S_D(S#, DName, DHead) with fd2, fd3

3NF (eliminate transitive FDs):
S_C(S#, CN, Grade) with fd1
S_D(S#, DName) with fd2
DEPT(DName, DHead) with fd3


Summary of 1NF, 2NF, 3NF Concepts

1NF
Test: Relation should have no nonatomic attributes or nested relations.
Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.

2NF
Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

3NF
Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).


Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs

Intuitively: A Relation Schema R is in Boyce-Codd Normal Form (BCNF) if Whenever a Non-Trivial FD X → A Holds in R, then X is a Superkey of R

R can be Decomposed into BCNF Relations via the Process of BCNF Normalization

There exist Relations that are in 3NF but not in BCNF
The Goal is to have each Relation in BCNF (or 3NF)


Boyce-Codd Normal Form (BCNF)

Formal BCNF Definition: R ∈ BCNF iff

(i) R ∈ 1NF;

(ii) for every FD X → Y, X is a Superkey, i.e., if X → Y and Y ⊈ X, then X Contains a Key.

Properties of BCNF (when R ∈ BCNF):
All Non-key Attributes are Fully Dependent on Every Key
All Key Attributes are Fully Dependent on the Keys that they do not Belong to
No Attribute is Fully Dependent on any Set of Non-key Attributes


Comparing the Normal Forms

1NF → 2NF → 3NF → BCNF

Eliminate the non-trivial functional dependencies of non-key attributes to key

1NF → 2NF: Eliminate partial FDs of non-key attributes to key

2NF → 3NF: Eliminate transitive FDs of non-key attributes to key

3NF → BCNF: Eliminate partial and transitive FDs of key attributes to key

Poor Relational Schema Design

Developed as Stepping Stone

Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies Caused by FDs


One Example of BCNF

Recall 3NF Solution for Building Lots Problem
Suppose that AREA is Sizes in Acres with
AREAs in Tolland County 0.5, 0.6, …, 1.0
AREAs in Windham County 1.1, 1.2, …, 2.0

Adding FD5: “AREA → COUNTY_NAME”
What Does Data in LOTS1A Look like for a Given Set of Properties?


LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

One Example of BCNF

What is the Problem Here?
What if you Delete W11? You have “Lost” the “Windham, 1.1” Combination

Also - Redundancy since “County Name, Area” is Repeated in Multiple Tuples Throughout LOTS1A

Even Though LOTS1A is in 3NF, there are Still Problems with FD5: “AREA → COUNTY_NAME”
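The deletion anomaly can be reproduced directly on the six LOTS1A tuples shown on the slide. The following is a small sketch, assuming the tuple layout (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA); the helper name `area_county_pairs` is hypothetical.

```python
# The six LOTS1A tuples from the slide: (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA).
LOTS1A = [
    ("T11", "Tolland", "L1", 0.5),
    ("T12", "Tolland", "L2", 0.8),
    ("W13", "Windham", "L6", 1.5),
    ("W11", "Windham", "L1", 1.1),
    ("W12", "Windham", "L4", 1.6),
    ("T10", "Tolland", "L3", 0.9),
]

def area_county_pairs(rows):
    """Collect the (AREA, COUNTY_NAME) facts that the rows happen to record."""
    return {(area, county) for _, county, _, area in rows}

before = area_county_pairs(LOTS1A)
after = area_county_pairs([row for row in LOTS1A if row[0] != "W11"])
lost = before - after

print(lost)  # {(1.1, 'Windham')}
```

Deleting one property silently removes the only record that an area of 1.1 acres belongs to Windham, which is the fact FD5 is meant to capture.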


Transition to BCNF – Maintain FDs!

Add new FD5


One Example of BCNF

FD5: “AREA → COUNTY_NAME”
Satisfies 3NF: COUNTY_NAME is a Prime Attribute
Violates BCNF: AREA is not a Superkey of LOTS1A

So Do One More Split
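The split can be sketched as a BCNF check followed by a decomposition on the violating FD. This is a hedged illustration: FD1 (PROPERTY_ID# determines everything) and FD2 ({COUNTY_NAME, LOT#} determines PROPERTY_ID# and AREA) are assumptions made here, since only FD5 is written out on these slides, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Listed non-trivial FDs whose left side is not a superkey of attrs."""
    attrs = set(attrs)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if (lhs | rhs) <= attrs and not rhs <= lhs
        and not closure(lhs, fds) >= attrs
    ]

def split_on(attrs, lhs, rhs):
    """Split R on a violating FD X -> Y into R - (Y - X) and X ∪ Y."""
    attrs = set(attrs)
    return attrs - (rhs - lhs), set(lhs | rhs)

LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
FDS = [
    # FD1 and FD2 are assumptions; FD5 is the one added on the slides.
    (frozenset({"PROPERTY_ID#"}), frozenset({"COUNTY_NAME", "LOT#", "AREA"})),
    (frozenset({"COUNTY_NAME", "LOT#"}), frozenset({"PROPERTY_ID#", "AREA"})),
    (frozenset({"AREA"}), frozenset({"COUNTY_NAME"})),
]

violations = bcnf_violations(LOTS1A, FDS)
lots1ax, lots1ay = split_on(LOTS1A, *violations[0])

print(violations)       # only AREA -> COUNTY_NAME is reported
print(sorted(lots1ax))  # ['AREA', 'LOT#', 'PROPERTY_ID#']
print(sorted(lots1ay))  # ['AREA', 'COUNTY_NAME']
```

Under these assumptions the two resulting attribute sets match LOTS1AX and LOTS1AY. Note that FD2 no longer fits inside either relation, which is the usual price of a BCNF split.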


One Example of BCNF

LOTS1A (Before the Split)
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9

LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham


Another Example of BCNF

Consider the TEACH Relation: TEACH(STUDENT, COURSE, INSTRUCTOR)

in 3NF but NOT BCNF with
FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE

3 Possible Decompositions of TEACH:
1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)

All Three “Lose” FD1!
3rd is Best Since After Join, it Recaptures FD1 and Doesn’t Generate any Spurious Tuples
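Why only the third decomposition avoids spurious tuples can be checked with the standard test for binary decompositions: R1 and R2 join losslessly if their common attributes functionally determine all of R1 or all of R2. The sketch below applies that test to the three TEACH decompositions; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined

FDS = [
    (frozenset({"STUDENT", "COURSE"}), frozenset({"INSTRUCTOR"})),  # FD1
    (frozenset({"INSTRUCTOR"}), frozenset({"COURSE"})),             # FD2
]

DECOMPOSITIONS = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]

results = [is_lossless_binary(r1, r2, FDS) for r1, r2 in DECOMPOSITIONS]
print(results)  # [False, False, True]
```

In the third case the shared attribute INSTRUCTOR determines COURSE via FD2, so it determines all of T1(INSTRUCTOR, COURSE); neither STUDENT alone nor COURSE alone determines anything further.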


What Does Table Look Like?

Note TEACH in 3NF but NOT BCNF


Reflections on Normalization

Normalization
A Tool for Validating the Quality of the Schema, Rather than Merely a Method for Designing a Relational Schema
Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema

Normalization Process
Actually a Process of Concept Separation
Concept Separation is the Result of Applying a Top-down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions


Relational DB Design Process

Normalization Process Focused on Decomposition Raises a Number of Questions

How do we Decompose a Schema into a Desirable Normal Form?

What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema?

Can we Guarantee the Decomposition’s Quality?
Can we Prevent the “Loss” of Information?
Are Dependencies Maintained in Decomposition?


Recall Transitive FD/Update Anomalies

R = (U, F)
U = { S#, DName, DHead }
F = { S# → DName, DName → DHead }

S#   DName  DHead
S1   D1     John
S2   D1     John
S3   D2     Smith
S4   D3     Black

“S# → DHead” is a Transitive FD
When S4 Graduates, the Head Information of D3 is Lost
Similarly, If D5 has No Students Yet, then its Head Information cannot be Stored in this Database
Updating the Head of Any Department Requires an Update to Every Student Enrolled in the Dept.


What are Possible Decompositions?

R = (U, F)
U = { S#, DName, DHead }
F = { S# → DName, DName → DHead }

Information Based

ρ1 = { R1(S#, ∅), R2(DName, ∅), R3(DHead, ∅) }

R1 S#: S1, S2, S3, S4
R2 DName: D1, D1, D2, D3
R3 DHead: John, John, Smith, Black

ρ1 is Neither Lossless nor FD-Preserving



ρ2 = { R1({S#, DName}, {S# → DName}), R2({S#, DHead}, {S# → DHead}) }

R1
S#   DName
S1   D1
S2   D1
S3   D2
S4   D3

R2
S#   DHead
S1   John
S2   John
S3   Smith
S4   Black

• Lossless Decomposition but not Dependency-Preserving

• DName → DHead is lost in the decomposition

ρ2 is Lossless but not FD-Preserving



ρ3 = { R1({S#, DName}, {S# → DName}), R3({DName, DHead}, {DName → DHead}) }

R1
S#   DName
S1   D1
S2   D1
S3   D2
S4   D3

R3
DName  DHead
D1     John
D2     Smith
D3     Black

Lossless & dependency-preserving decomposition

ρ3 is both Lossless and FD-Preserving
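The three decompositions can be compared on the sample instance with a short Python sketch. It projects the four tuples, rejoins them with a natural join, and checks which of the original FDs still follow from the FDs attached to each decomposition. Two caveats: rejoining correctly on one instance illustrates losslessness but does not prove it for every instance, and the helper names are hypothetical.

```python
from functools import reduce

ROWS = [
    {"S#": "S1", "DName": "D1", "DHead": "John"},
    {"S#": "S2", "DName": "D1", "DHead": "John"},
    {"S#": "S3", "DName": "D2", "DHead": "Smith"},
    {"S#": "S4", "DName": "D3", "DHead": "Black"},
]

def project(rows, attrs):
    """Project dict rows onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two projected relations (sets of frozensets of pairs)."""
    joined = set()
    for lt in left:
        for rt in right:
            dl, dr = dict(lt), dict(rt)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined

def rejoins_exactly(rows, schemas):
    """True if joining the projections gives back exactly the original rows."""
    original = project(rows, rows[0].keys())
    return reduce(natural_join, [project(rows, s) for s in schemas]) == original

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(fd, local_fds):
    """True if fd follows from the FDs kept by the decomposed relations."""
    lhs, rhs = fd
    return rhs <= closure(lhs, local_fds)

F = [(frozenset({"S#"}), frozenset({"DName"})),
     (frozenset({"DName"}), frozenset({"DHead"}))]
RHO1 = [{"S#"}, {"DName"}, {"DHead"}]
RHO2 = [{"S#", "DName"}, {"S#", "DHead"}]
RHO3 = [{"S#", "DName"}, {"DName", "DHead"}]
RHO2_FDS = [(frozenset({"S#"}), frozenset({"DName"})),
            (frozenset({"S#"}), frozenset({"DHead"}))]
RHO3_FDS = F

rejoin = [rejoins_exactly(ROWS, rho) for rho in (RHO1, RHO2, RHO3)]
rho2_keeps = [preserved(fd, RHO2_FDS) for fd in F]
rho3_keeps = [preserved(fd, RHO3_FDS) for fd in F]

print(rejoin)      # [False, True, True]
print(rho2_keeps)  # [True, False]
print(rho3_keeps)  # [True, True]
```

The single-attribute decomposition rejoins into a cross product, the second loses DName → DHead, and the third keeps both the data and both FDs.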


Summary of Normalization

1NF → 2NF: Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes

2NF → 3NF: Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes

3NF → BCNF: Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key

Up to 3NF: Lossless Decomposition and Dependency Preserving

BCNF: Lossless Decomposition but not Dependency Preserving


The Entire Normalization Picture

1NF → 2NF: Eliminate Partial FDs of Non-prime Attributes to Key

2NF → 3NF: Eliminate Transitive FDs of Non-prime Attributes to Key

3NF → BCNF: Eliminate Partial and Transitive FDs of Prime Attributes to Key

BCNF → 4NF: Eliminate Non-trivial and Non-functional Multi-Valued Dependencies

4NF → 5NF: Eliminate Join Dependencies that are Not Implied by Candidate Keys


What are Multi-Valued Dependencies?

Focused on the Concept of Multi-Valued Dependencies
An MVD X ↠ Y Indicates that a Value of X Corresponds to a Set of Values of Y, Independently of the Remaining Attributes

Consider EMP with MVDs:
ENAME ↠ PNAME (E works on many P)
ENAME ↠ DNAME (E has many Dependents)
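Whether an MVD holds on a given instance can be tested by checking that, for each value of the left side, every combination of right-side values and remaining-attribute values is present. The sketch below uses a hypothetical EMP instance (one employee, two projects, two dependents) invented for illustration; `satisfies_mvd` is likewise a hypothetical helper.

```python
from itertools import product

def satisfies_mvd(rows, x, y):
    """Check X ->> Y on an instance: within each X-group, the Y-values and the
    values of the remaining attributes must occur in every combination."""
    attrs = set(rows[0])
    x, y = sorted(x), sorted(y)
    z = sorted(attrs - set(x) - set(y))
    groups = {}
    for r in rows:
        xv = tuple(r[a] for a in x)
        pair = (tuple(r[a] for a in y), tuple(r[a] for a in z))
        groups.setdefault(xv, set()).add(pair)
    return all(
        pairs == set(product({p[0] for p in pairs}, {p[1] for p in pairs}))
        for pairs in groups.values()
    )

# Hypothetical instance: Smith works on X and Y and has dependents John and Anna.
EMP = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

full = satisfies_mvd(EMP, {"ENAME"}, {"PNAME"})
missing_one = satisfies_mvd(EMP[:3], {"ENAME"}, {"PNAME"})

print(full)         # True
print(missing_one)  # False
```

Keeping the MVD true forces all four combinations to be stored for one employee, which is the redundancy that 4NF is meant to remove.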


What is Fourth Normal Form (4NF)?

A Relation Schema R is in Fourth Normal Form (4NF) w.r.t. Dependencies F (FDs and MVDs) if for every Non-Trivial MVD X ↠ Y in F+, X is a Superkey for R

Reconsider EMP with MVDs:
ENAME ↠ PNAME (E works on many P)
ENAME ↠ DNAME (E has many Dependents)

ENAME is Not a Superkey of EMP since the Triple of ENAME, PNAME, and DNAME is Needed to Distinguish Tuples

We need to Decompose EMP!


Decomposition into 4NF

ENAME ↠ PNAME is a Trivial MVD in EMP_PROJECTS: ENAME ∪ PNAME is Equal to EMP_PROJECTS (same for ENAME ↠ DNAME)


What about the Supply Table?

In 4NF But Not in 5NF since: Supplier Supplies Parts, Supplier Supplies Projects, & Parts are Used on Projects

5NF Removes Join Dependencies – Many-many-many


Problem 15.29 6th ed

Consider the table: Orders(O#, I#, Odate, Cust#, Total_amount, Qty_ordered, Total_price, Discount%)

What are the Functional Dependencies?

O# → Odate, Cust#, Total_amount

{O#, I#} → Qty_ordered, Total_price, Discount%

Is it in 2NF?

No! Odate, Cust#, and Total_amount are Partially Dependent on the Key {O#, I#}, namely on only O#



How do you Fix the Single Orders Table?

Create Two Tables
Order(O#, Odate, Cust#, Total_amount)
OrderedItem(O#, I#, Qty_ordered, Total_price, Discount%)

Is it in 3NF? Yes – No Transitive Functional Dependencies X → Y and Y → Z, with X as the Primary Key, where Y is not a Candidate Key
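The split of Orders into two tables follows the usual 2NF remedy: move each partial key together with the attributes it determines into its own relation, and keep the full key with whatever remains. A minimal sketch, assuming a single candidate key {O#, I#} and the two FDs of the problem (written here with Total_amount throughout); `decompose_2nf` is a hypothetical helper.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose_2nf(attrs, key, fds):
    """One relation per partial key with its dependents, plus a relation holding
    the full key and the attributes that depend on all of it."""
    attrs, key = set(attrs), frozenset(key)
    non_prime = attrs - key
    placed = set()
    relations = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = (closure(part, fds) & non_prime) - placed
            if dependents:
                relations.append(set(part) | dependents)
                placed |= dependents
    relations.append(set(key) | (non_prime - placed))
    return relations

ORDERS = {"O#", "I#", "Odate", "Cust#", "Total_amount",
          "Qty_ordered", "Total_price", "Discount%"}
FDS = [
    (frozenset({"O#"}), frozenset({"Odate", "Cust#", "Total_amount"})),
    (frozenset({"O#", "I#"}),
     frozenset({"Qty_ordered", "Total_price", "Discount%"})),
]

order, ordered_item = decompose_2nf(ORDERS, {"O#", "I#"}, FDS)
print(sorted(order))
print(sorted(ordered_item))
```

The first set corresponds to Order and the second to OrderedItem.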



Consider the table: CAR_SALE(Car#, Date_sold, Salesman#, Commission%, Discount_amt)

Assumptions
A Car can be Sold by Multiple Salespersons
Thus, Primary Key: {Car#, Salesman#} w/ FDs

Car# → Date_sold
Car# → Discount_amt
{Car#, Salesman#} (Key)
Salesman# → Commission%

Is it in 2NF?

No! Car# → Date_sold and Car# → Discount_amt are not FFDs on the Primary Key, since they Depend on Only Part of it (the same Holds for Salesman# → Commission%)
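That {Car#, Salesman#} is the only candidate key can be confirmed by brute force: try attribute subsets from smallest to largest and keep those whose closure covers the whole relation. This sketch assumes the three FDs Car# → Date_sold, Car# → Discount_amt, and Salesman# → Commission%, treating the Car#/Salesman# pair as the key rather than as an FD; `candidate_keys` is a hypothetical helper and is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys

CAR_SALE = {"Car#", "Date_sold", "Salesman#", "Commission%", "Discount_amt"}
FDS = [
    (frozenset({"Car#"}), frozenset({"Date_sold"})),
    (frozenset({"Car#"}), frozenset({"Discount_amt"})),
    (frozenset({"Salesman#"}), frozenset({"Commission%"})),
]

keys = candidate_keys(CAR_SALE, FDS)
print([sorted(k) for k in keys])  # [['Car#', 'Salesman#']]
```

Because every non-key attribute is determined by only one half of this key, each of the three FDs is a partial dependency.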


Problem 15.30

How Do you Convert to 2NF? Recall FDs

Car# → Date_sold
Car# → Discount_amt
{Car#, Salesman#} (Key)
Salesman# → Commission%

Split into Three Tables

CAR1(Car#, Date_sold, Discount_amt)
CAR2(Car#, Salesman#)
CAR3(Salesman#, Commission%)



Consider the table: TreatPatient(Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)

Assumptions and FDs
Patient Treated by Physician on Date with a Diagnosis, Treatment Code, and Charge
Every Treatment Code has a Charge
{Doctor#, Patient#, Date} → {Diagnosis, Treat_code, Charge}
{Treat_code} → {Charge}

Is it in 3NF?

No - Convert


Problem 15.33

What do you look for? Transitivity
What is the Problem?

Charge, a non-Key Attribute, is Dependent on Treat_code, another non-Key Attribute

Split into Two Tables; FDs Now are Fine

TreatPatient(Doctor#, Patient#, Date, Diagnosis, Treat_code)

BillingAmount(Treat_code, Charge)


Problem 15.35

Given Table Below - What is the Candidate Key?
BOOK(Book_Name, Author, Edition, Year)

{Book_Name, Author, Edition}. Why? Sometimes an Edition Issues Twice in 1 Year

What is the Main FD? {Book_Name, Edition} → Year

Is it in 2NF? No: Since Year is Dependent on only Part of the Key

Convert:

BOOK(Book_Name, Author, Edition)
BOOK_YEAR(Book_Name, Edition, Year)


Problem 15.35

What are the Multi-Valued Dependencies?

Book_Name ↠ Author
Book_Name ↠ Edition

How Do You Separate the Dependencies?

BOOK(Book_Name, Author, Edition)
BOOK_YEAR(Book_Name, Edition, Year)

SPLIT INTO THREE TABLES

BOOK(Book_Name, Edition)
BOOK_AUTHOR(Book_Name, Edition, Author)
BOOK_YEAR(Book_Name, Edition, Year)


Concluding Remarks

What have we Learned in Chapter 14?
Guidelines for “Good” Relational Design
Avoiding Anomalies
Functional Dependencies Augment Schema
Normalization “Improves” Design
Lossless Joins and Dependency Preservation
Quick Look at 4NF (Informally)

How is Chapter 14 Related to the Semester Project?
Phase II in the Semester Project
Step 1: ER to Relational Transformation (Chapter 9)
Step 2: Relational Normalization (Chapter 14) which Includes Identification of FDs!
