The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema...

30
1 1 The Relational Data Model Lecture 6 2 Outline Relational Data Model Functional Dependencies Logical Schema Design Reading Chapter 8

Transcript of The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema...

Page 1: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

1

1

The Relational Data Model

Lecture 6

2

Outline

• Relational Data Model• Functional Dependencies• Logical Schema Design

ReadingChapter 8

Page 2: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

2

3

The Relational Data Model

DataModeling

Relational Schema

Physicalstorage

E/R diagrams Tables: column names: attributes rows: tuples

Complexfile organizationand index structures.

Have seenthis in SQL

Have seenthis too

Discuss next

4

Terminology

Name Price Category Manufacturer

gizmo $19.99 gadgets GizmoWorks

Power gizmo $29.99 gadgets GizmoWorks

SingleTouch $149.99 photography Canon

MultiTouch $203.99 household Hitachi

Tuples or rows or records

Attribute namesTable name or relation name

Products:

Page 3: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

3

5

SchemasRelational Schema:

– Relation name plus attribute names– E.g. Product(Name, Price, Category, Manufacturer)– In practice we add the domain for each attribute

Database Schema– Set of relational schemas– E.g. Product(Name, Price, Category, Manufacturer),

Company(Name, Address, Phone), . . . . . . .

6

Instances• Relational schema = R(A1,…,Ak)

Instance = relation with k attributes (of “type” R)– values of corresponding domains

• Database schema = R1(…), R2(…), …, Rn(…)Instance = n relations, of types R1, R2, ..., Rn

This is all mathematics, not to be confused with SQL tables !(What's a difference?)

Page 4: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

4

7

Example

Name Price Category Manufacturer

gizmo $19.99 gadgets GizmoWorks

Power gizmo $29.99 gadgets GizmoWorks

SingleTouch $149.99 photography Canon

MultiTouch $203.99 household Hitachi

Relational schema:Product(Name, Price, Category, Manufacturer)Instance:

8

First Normal Form (1NF)• A database schema is in First Normal

Form if all tables are flat

3.9Carol

3.7Bob

3.8Alice

CoursesGPAName

OS

DB

Math

OS

DB

OS

Math

Student

3.9Carol3.7Bob3.8Alice

GPAName

Student

Course

OSDBMath

OSCarolOSAliceDBBob

AliceCarolAliceStudent Course

DBMathMath

Takes Course

May needto add keys

Page 5: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

5

9

Functional Dependencies

• A form of constraint– hence, part of the schema

• Finding them is part of the database design• Also used in normalizing the relations

• Warning: this is the most abstract, and“hardest” part of the course.

10

Functional DependenciesDefinition:

If two tuples agree on the attributes

then they must also agree on the attributes

Formally:

A1, A2, …, An B1, B2, …, Bm

A1, A2, …, An

B1, B2, …, Bm

Page 6: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

6

11

Examples

• EmpID Name, Phone, Position• Position Phone• but Phone Position

EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer

12

In General• To check A B, erase all other columns

• check if the remaining relation is many-one(called functional in mathematics)

… A … B

X1 Y1

X2 Y2

… …

Page 7: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

7

13

Example

EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer

Position Phone

14

Typical Examples of FDs

Product: name price, manufacturer

Person: ssn name, age

Company: name stockprice, president

Page 8: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

8

15

Example

Product(name, category, color, department, price)

name colorcategory departmentcolor, category price

Consider these FDs:

What do they say ?

16

ExampleFD’s are constraints on relations:• On some instances they hold• On others they don’t

99ToysGreenGadgetTweaker

49ToysGreenGadgetGizmo

pricedepartmentcolorcategoryname

Does this instance satisfy all the FDs ?

name colorcategory departmentcolor, category price

Page 9: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

9

17

Example

59Office-supp.GreenStationaryGizmo

99ToysBlackGadgetTweaker

49ToysGreenGadgetGizmo

pricedepartmentcolorcategoryname

What about this one ?

name colorcategory departmentcolor, category price

18

Example

If some FDs are satisfied, thenothers are satisfied too

If all these FDs are true:name colorcategory departmentcolor, category price

Then this FD also holds: name, category price

Why ??

Page 10: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

10

19

Inference Rules for FD’s

Is equivalent to

Splitting rule and Combining rule

Bm...B1Am...A1

A1, A2, …, An B1, B2, …, Bm

A1, A2, …, An B1A1, A2, …, An B2 . . . . .A1, A2, …, An Bm

20

Inference Rules for FD’s(continued)

Trivial Rule

Why ?Am…A1

where i = 1, 2, ..., n

A1, A2, …, An Ai

Page 11: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

11

21

Inference Rules for FD’s(continued)

Transitive Closure Rule

If

and

then

Why ?

A1, A2, …, An B1, B2, …, Bm

B1, B2, …, Bm C1, C2, …, Cp

A1, A2, …, An C1, C2, …, Cp

22

...C1 CpBm…B1Am…A1

Page 12: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

12

23

Example (continued)

Start from the following FDs:

Infer the following FDs:

1. name color2. category department3. color, category price

8. name, category price7. name, category color, category6. name, category category5. name, category color4. name, category name

Which Ruledid we apply ?Inferred FD

24

Example (continued)Answers:

Transitivity on 3, 78. name, category priceSplit/combine on 5, 67. name, category color, categoryTrivial rule6. name, category categoryTransitivity on 4, 15. name, category colorTrivial rule4. name, category name

Which Ruledid we apply ?Inferred FD

1. name color2. category department3. color, category price

Page 13: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

13

25

Another Example• Enrollment(student, major, course, room, time)

student majormajor, course roomcourse time

What else can we infer ?

26

Another Rule

If

then

Augmentation follows from trivial rules and transitivityHow ?

A1, A2, …, An B

A1, A2, …, An , C1, C2, …, Cp B

Augmentation

Page 14: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

14

27

Problem: infer ALL FDsGiven a set of FDs, infer all possible FDs

How to proceed ?• Try all possible FDs, apply all 3 rules

– E.g. R(A, B, C, D): how many FDs are possible ?• Drop trivial FDs, drop augmented FDs

– Still way too many• Better: use the Closure Algorithm (next)

28

Closure of a set of AttributesGiven a set of attributes A1, …, An

The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B

name colorcategory departmentcolor, category price

Example:

Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Page 15: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

15

29

Closure AlgorithmStart with X={A1, …, An}.

Repeat until X doesn’t change do:

if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X. {name, category}+ =

{name, category, color, department, price}

name colorcategory departmentcolor, category price

Example:

30

Example

Compute {A,B}+ X = {A, B, }

Compute {A, F}+ X = {A, F, }

R(A,B,C,D,E,F) A, B CA, D EB DA, F B

Page 16: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

16

31

Using Closure to Infer ALL FDsA, B CA, D BB D

Example:

Step 1: Compute X+, for every X:

A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD

Step 2: Enumerate all FD’s X Y, s.t. Y ⊆ X+ and X∩Y = ∅:

AB CD, ADBC, ABC D, ABD C, ACD B

32

Problem: Finding FDs• Approach 1: During Database Design

– Designer derives them from real-world knowledge ofusers

– Problem: knowledge might not be available• Approach 2: From a Database Instance

– Analyze given database instance and find all FD’ssatisfied by that instance

– Useful if designers don’t get enough information fromusers

– Problem: FDs might be artifical for the given instance

Page 17: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

17

33

Find All FDs

020CircuitsEEFrank045DBCSEElsa050JavaCSEDan045DBCSECarol040HWEEAlice020C++CSEBob020C++CSEAlice

RoomCourseDeptStudent

Do all FDsmake sensein practice ?

34

Answer

Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course

Do all FDsmake sensein practice ?

Page 18: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

18

35

Keys

• A key is a set of attributes A1, ..., An s.t. forany other attribute B, we have A1, ..., An B

• A minimal key is a set of attributes which isa key and for which no subset is a key

• Note: book calls them superkey and key

36

Computing Keys

• Compute X+ for all sets X• If X+ = all attributes, then X is a key• List only the minimal keys

Note: there can be many minimal keys !• Example: R(A,B,C), ABC, BCA

Minimal keys: AB and BC

Page 19: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

19

37

Examples of Keys• Product(name, price, category, color)

name, category pricecategory color

Keys are: {name, category} and all supersets

• Enrollment(student, address, course, room, time)student addressroom, time coursestudent, course room, time

38

Relational Schema Design(or Logical Schema Design)

Main idea:• Start with some relational schema• Find out its FD’s• Use them to design a better relational

schema

Page 20: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

20

39

Data AnomaliesWhen a database is poorly designed we get

anomalies:

Redundancy: data is repeated

Update anomalies: need to change in several places

Delete anomalies: may lose data when we don’t want

40

Relational Schema Design

Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:

what is his city ?

Example: Persons with several phones

SSN Name, City

Westfield908-555-2121987-65-4321JoeSeattle206-555-6543123-45-6789FredSeattle206-555-1234123-45-6789FredCityPhoneNumberSSNName

but not SSN PhoneNumber

Page 21: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

21

41

Relation DecompositionBreak the relation into two:

Westfield987-65-4321JoeSeattle123-45-6789FredCitySSNName

908-555-2121987-65-4321206-555-6543123-45-6789206-555-1234123-45-6789PhoneNumberSSN

Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)

Westfield908-555-2121987-65-4321Joe

Seattle206-555-6543123-45-6789Fred

Seattle206-555-1234123-45-6789Fred

CityPhoneNumberSSNName

42

Relational Schema DesignPersonbuysProduct

name

price name ssn

Conceptual Model:

Relational Model:plus FD’s

Normalization:Eliminates anomalies

Page 22: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

22

43

Decompositions in General

R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

44

Decomposition

• Sometimes it is correct:

Camera19.99GizmoCamera24.99OneClickGadget19.99Gizmo

CategoryPriceName

19.99Gizmo24.99OneClick19.99Gizmo

PriceName

CameraGizmoCameraOneClickGadgetGizmo

CategoryName

Lossless decomposition

Page 23: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

23

45

Incorrect Decomposition

• Sometimes it is not:

Camera19.99GizmoCamera24.99OneClickGadget19.99Gizmo

CategoryPriceName

CameraGizmoCameraOneClickGadgetGizmo

CategoryName

Camera19.99Camera24.99Gadget19.99

CategoryPrice

What’sincorrect ??

Lossy decomposition

46

Decompositions in GeneralR(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

If A1, ..., An B1, ..., Bm Then the decomposition is lossless

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

Example: name price, hence the first decomposition is lossless

Note: don’t need necessarily A1, ..., An C1, ..., Cp

Page 24: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

24

47

Normal FormsFirst Normal Form = all attributes are atomic

Second Normal Form (2NF) = old and obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

48

Boyce-Codd Normal FormA simple condition for removing anomalies from relations:

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

A relation R is in BCNF if:

If A1, ..., An B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

Page 25: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

25

49

BCNF Decomposition Algorithm

A’s OthersB’s

R1

Is there a 2-attribute relation that isnot in BCNF ?

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2Until no more violations

R2

50

Example

What are the dependencies?SSN Name, City

What are the keys?{SSN, PhoneNumber}

Is it in BCNF?

Westfield908-555-1234987-65-4321JoeWestfield908-555-2121987-65-4321JoeSeattle206-555-6543123-45-6789FredSeattle206-555-1234123-45-6789FredCityPhoneNumberSSNName

Page 26: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

26

51

Decompose it into BCNF

Westfield987-65-4321JoeSeattle123-45-6789FredCitySSNName

908-555-1234987-65-4321908-555-2121987-65-4321206-555-6543123-45-6789206-555-1234123-45-6789PhoneNumberSSN

SSN Name, City

Let’s check anomalies:• Redundancy ?• Update ?• Delete ?

52

Summary of BCNFDecomposition

Find a dependency that violates the BCNF condition:

A’sOthers B’s

R1 R2

Heuristics: choose B , B , … B “as large as possible”1 2 m

Decompose:

2-attribute relations are BCNF

Continue untilthere are noBCNF violationsleft.

A1, A2, …, An B1, B2, …, Bm

Page 27: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

27

53

Example DecompositionPerson(name, SSN, age, hairColor, phoneNumber)

SSN name, ageage hairColor

Decompose in BCNF (in class):

Step 1: find all keys (How ? Compute S+, for various sets S)

Step 2: now decompose

54

Other Example

• R(A,B,C,D) A B, B C

• Key: AD• Violations of BCNF: A B, A C, ABC• Pick A BC: split into R1(A,BC) R2(A,D)• What happens if we pick A B first ?

Page 28: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

28

55

Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)

R1(A,B) R2(A,C)

R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

Decompose

Recover

56

Lossless Decompositions

• Given R(A,B,C) s.t. AB, thedecomposition into R1(A,B), R2(A,C) islossless

Page 29: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

29

57

3NF: A Problem with BCNFUnit Company Product

Unit Company

Unit Product

FD’s: Unit → Company; Company, Product → UnitSo, there is a BCNF violation, and we decompose.

Unit → Company

No FDs

Notice: we loose the FD: Company, Product Unit

58

So What’s the Problem?

Unit Company Product

Unit Company Unit ProductGalaga99 UW Galaga99 databasesBingo UW Bingo databases

No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again (anomalies?):

Galaga99 UW databasesBingo UW databases

Violates the dependency: company, product -> unit!

Page 30: The Relational Data Model - EPFLlsir...2 3 The Relational Data Model Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex

30

59

Solution: 3rd Normal Form (3NF)A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An → Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.

Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies