Data Modeling, CMPSCI 445 · avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf
Data Modeling
CMPSCI 445
Fall 2008
Exercise
• Design a schema to store the data used by iTunes (or a similar application)
• Things to keep in mind:
– Can all necessary information be represented?
– Are properties represented in more than one place?
– Are common operations efficient?
iTunes information
• Database includes: songs, artists, albums, album artwork, playlists
• Sample fields include: song name, artist, year, albumName, trackNum, genre, lastPlayedDate, playCount, albumArtwork, songRating, albumRating?, albumReleaseDate, artistDOB
• playLists: an ordered list of songs
Relational Schema Design
[Figure: the design process, illustrated with the example "Person buys Product" (attributes name, price, name, ssn).]
• Conceptual design → ER model
• Logical design → relational schema plus integrity constraints
• Schema refinement → normalized schema
Entity / Relationship Diagrams
[Figure: E/R diagram fragment labeling the three kinds of element.]
• Entity sets, e.g. Product
• Attributes, e.g. address
• Relationships, e.g. buys
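[Figure: E/R diagram. Entity sets: Person (address, name, ssn), Product (name, category, price), Company (name, stockprice). Relationships: buys (Person, Product), makes (Company, Product), employs (Company, Person).]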
Keys in E/R Diagrams
• Every entity set must have a key
[Figure: entity set Product with attributes name, category, price.]
Multiplicity of E/R Relations
• one-one
• many-one
• many-many
[Figure: for each case, a mapping between the elements 1, 2, 3 and a, b, c, d.]
[Figure: the E/R diagram of Person, Product, and Company again, with relationships buys, makes, employs.]
“A product is made by at most one company”
Referential Integrity Constraints
[Figure: two versions of the makes relationship between Product and Company.]
• Each product made by at most one company. (Some products made by no company.)
• Each product made by exactly one company.
Multi-way Relationships
[Figure: relationship Purchase connecting the entity sets Product, Person, and Store.]
Arrows in Multiway Relationships
Q: What does the arrow mean?
[Figure: relationship Rental connecting VideoStore, Person, Movie, and Invoice.]
A: If I know the store, person, and invoice, I know the movie too.
Arrows in Multiway Relationships
Q: What do these arrows mean?
[Figure: relationship Rental connecting VideoStore, Person, Movie, and Invoice.]
A: Store, person, and invoice determine the movie, and store, invoice, and movie determine the person.
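Read this way, each arrow is a claim that some of the participating entities determine another. A minimal sketch of checking such a claim against sample data follows; the helper `determines` and the Rental tuples are hypothetical and invented for illustration, not taken from the slides.

```python
def determines(rows, lhs, rhs):
    """Return True if, in rows, the columns in lhs determine the column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs values, different rhs value
        seen[key] = row[rhs]
    return True

# Hypothetical Rental tuples (assumed data, for illustration only).
rentals = [
    {"store": "s1", "person": "ann", "invoice": 1, "movie": "Alien"},
    {"store": "s1", "person": "bob", "invoice": 2, "movie": "Alien"},
    {"store": "s2", "person": "ann", "invoice": 1, "movie": "Brazil"},
]

# The two arrows from the slide hold on this data.
print(determines(rentals, ["store", "person", "invoice"], "movie"))
print(determines(rentals, ["store", "invoice", "movie"], "person"))
# A claim that does not hold here: ann rented two different movies.
print(determines(rentals, ["person"], "movie"))
```

A check like this can only refute an arrow on a given instance; whether the arrow belongs in the diagram is a statement about every instance the application allows.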
Roles in Relationships
What if we need an entity set twice in one relationship?
[Figure: relationship Purchase connecting Product, Store, and Person; Person is connected twice, with the edges labeled salesperson and buyer.]
Attributes on Relationships
[Figure: relationship Purchase connecting Product, Person, and Store, with the attribute date attached to Purchase.]
Subclasses
[Figure: Product (name, category, price) with two isa subclasses: Software Product (platforms) and Educational Product (Age Group).]
Design Principles
[Figure: E/R diagram with Product, Purchase, and Person.]
What’s wrong?
Design Principles: What’s Wrong?
[Figure: relationship Purchase connecting Product and Store, with the attributes date, personName, and personAddr.]
Moral: pick the right kind of entities.
Design Principles: What’s Wrong?
[Figure: relationship Purchase connecting Product, Person, Store, and an entity set Dates with the single attribute date.]
Moral: don’t complicate life unnecessarily.
From E/R Diagrams to a Relational Schema
• Entity set → relation
• Relationship → relation
Entity Set to Relation
[Figure: entity set Product with attributes name, category, price.]
Product(name, category, price)
name category price
gizmo gadgets $19.99
Relationships to Relations
[Figure: relationship makes, with attribute Start Year, between Product (name, category, price) and Company (name, Stock price).]
Makes(product-name, product-category, company-name, year)
(watch out for attribute name conflicts)
Relationships to Relations
[Figure: the same makes relationship, with attribute Start Year, between Product (name, category, price) and Company (name, Stock price).]
No need for Makes. Modify Product:
Product(name, category, price, startYear, companyName)
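Folding makes into Product works when each product is made by at most one company, so that a single companyName column suffices. A minimal sketch using Python's sqlite3 module follows. The column types, the key (name, category) for Product (inferred from the Makes relation on the previous slide), and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Company (
    name       TEXT PRIMARY KEY,
    stockPrice REAL
);
CREATE TABLE Product (
    name        TEXT,
    category    TEXT,
    price       REAL,
    startYear   INTEGER,                        -- attribute of the makes relationship
    companyName TEXT REFERENCES Company(name),  -- nullable: "at most one company"
    PRIMARY KEY (name, category)                -- assumed key
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Company VALUES ('GizmoWorks', 25.0)")
conn.execute("INSERT INTO Product VALUES ('gizmo', 'gadgets', 19.99, 2005, 'GizmoWorks')")
conn.execute("INSERT INTO Product VALUES ('widget', 'gadgets', 9.99, NULL, NULL)")  # made by no company

try:
    conn.execute("INSERT INTO Product VALUES ('gadget', 'gadgets', 5.0, 2006, 'NoSuchCo')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the referenced company does not exist

product_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(rejected, product_count)
```

Declaring companyName as NOT NULL would instead express "exactly one company" in this sketch.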
Multi-way Relationships to Relations
[Figure: relationship Purchase connecting Product (name, price), Person (ssn, name), and Store (name, address).]
Purchase(prodName, stName, ssn)
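A multi-way relationship becomes a relation holding one key per participating entity set. The sketch below, again with sqlite3, assumes that name is the key of Product and of Store and that ssn is the key of Person; the column types and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (name TEXT PRIMARY KEY, price REAL);
CREATE TABLE Store   (name TEXT PRIMARY KEY, address TEXT);
CREATE TABLE Person  (ssn  TEXT PRIMARY KEY, name TEXT);
-- One column per participating entity set, each referencing that set's key.
CREATE TABLE Purchase (
    prodName TEXT REFERENCES Product(name),
    stName   TEXT REFERENCES Store(name),
    ssn      TEXT REFERENCES Person(ssn),
    PRIMARY KEY (prodName, stName, ssn)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Product VALUES ('gizmo', 19.99)")
conn.execute("INSERT INTO Store VALUES ('Main St Shop', '1 Main St')")
conn.execute("INSERT INTO Person VALUES ('123-45-6789', 'Fred')")
conn.execute("INSERT INTO Purchase VALUES ('gizmo', 'Main St Shop', '123-45-6789')")

# Who bought what, and where: join Purchase back to the three entity tables.
rows = conn.execute("""
    SELECT Person.name, Product.name, Store.name
    FROM Purchase
    JOIN Person  ON Purchase.ssn      = Person.ssn
    JOIN Product ON Purchase.prodName = Product.name
    JOIN Store   ON Purchase.stName   = Store.name
""").fetchall()
print(rows)
```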
[Figure: Product (name, category, price) with two isa subclasses: Software Product (platforms) and Educational Product (Age Group).]

Product
Name Price Category
Gizmo 99 gadget
Camera 49 photo
Toy 39 gadget

Sw.Product
Name platforms
Gizmo unix

Ed.Product
Name Age Group
Gizmo toddler
Toy retired
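With one relation per subclass, the attributes of a product are spread over up to three tables and are reassembled by joining on Name. A minimal sqlite3 sketch using the rows from the slide follows; the table names SwProduct and EdProduct, the column name AgeGroup, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product   (Name TEXT PRIMARY KEY, Price INTEGER, Category TEXT);
-- Each subclass table holds only the key plus the subclass-specific attribute.
CREATE TABLE SwProduct (Name TEXT PRIMARY KEY REFERENCES Product(Name), platforms TEXT);
CREATE TABLE EdProduct (Name TEXT PRIMARY KEY REFERENCES Product(Name), AgeGroup TEXT);

INSERT INTO Product   VALUES ('Gizmo', 99, 'gadget'), ('Camera', 49, 'photo'), ('Toy', 39, 'gadget');
INSERT INTO SwProduct VALUES ('Gizmo', 'unix');
INSERT INTO EdProduct VALUES ('Gizmo', 'toddler'), ('Toy', 'retired');
""")

# Full record of every software product: join the subclass table with Product.
software = conn.execute("""
    SELECT p.Name, p.Price, p.Category, s.platforms
    FROM Product p JOIN SwProduct s ON p.Name = s.Name
""").fetchall()

# All products with whatever subclass attributes they have (NULL where absent).
everything = conn.execute("""
    SELECT p.Name, s.platforms, e.AgeGroup
    FROM Product p
    LEFT JOIN SwProduct s ON p.Name = s.Name
    LEFT JOIN EdProduct e ON p.Name = e.Name
    ORDER BY p.Name
""").fetchall()
print(software)
print(everything)
```

Gizmo appears in both subclass tables, which this layout permits; Camera appears in neither.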
Normalization
Evils of Redundancy
• When a database schema is poorly designed we get anomalies.
• Redundancy is at the root of several problems associated with relational schemas:
– Redundant storage: data is repeated
– Update anomalies: need to change in several places
– Insertion anomalies: may not be able to add data we want to
– Deletion anomalies: may lose data when we don’t want to
Anomalies
Hourly_emps(ssn, name, lot, rating, hourly_wages, hours_worked)
Suppose hourly wages is determined by rating: rating → hourly_wages
ssn name lot rating hourly_wages hours_worked
29 brutus 48 8 10 40
85 art 22 8 10 30
95 bob 35 5 7 30
96 frodo 35 5 7 32
22 dustin 35 8 10 40
• Redundant storage: the association between rating 8 and hourly wages 10 is repeated 3 times.
• Update anomalies: hourly_wages could be updated in the first tuple but not the second.
• Insertion anomalies: to insert an employee we must know the hourly_wages for their rating value.
• Deletion anomalies: if we delete all tuples with a certain rating value, we lose the association between that rating and its hourly_wages.
Can null values fix problems?
• Not really.
• Insertion anomaly:
– What if we know rating and hourly_wages for some rating, but there is no employee with that rating?
– No. ssn can’t be null.
• Deletion anomaly:
– If last employee with some rating and hourly_wages value is deleted, replace with nulls?
– No. ssn can’t be null.
Schema Refinement
• Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
• Decomposition should be used judiciously: Is there reason to decompose a relation? What problems (if any) does the decomposition cause?
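As an illustration of decomposition guided by a functional dependency, the sketch below splits the Hourly_emps rows shown earlier along rating → hourly_wages. The names `emps` and `wages` and the new values written into them are hypothetical; this is one possible refinement, not necessarily the one the course goes on to use.

```python
# Hourly_emps rows: (ssn, name, lot, rating, hourly_wages, hours_worked)
hourly_emps = [
    (29, "brutus", 48, 8, 10, 40),
    (85, "art",    22, 8, 10, 30),
    (95, "bob",    35, 5,  7, 30),
    (96, "frodo",  35, 5,  7, 32),
    (22, "dustin", 35, 8, 10, 40),
]

# Wages(rating, hourly_wages): each rating/wage association is stored once.
wages = {rating: wage for (_, _, _, rating, wage, _) in hourly_emps}

# Emps(ssn, name, lot, rating, hours_worked): everything except hourly_wages.
emps = [(ssn, name, lot, rating, hours)
        for (ssn, name, lot, rating, _, hours) in hourly_emps]

# Update: the wage for rating 8 now changes in exactly one place.
wages[8] = 11
# Insertion: a wage for a rating that no employee has yet (assumed value).
wages[3] = 5

# Joining on rating rebuilds rows in the original layout.
rejoined = [(ssn, name, lot, rating, wages[rating], hours)
            for (ssn, name, lot, rating, hours) in emps]
print(len(wages), len(rejoined))
```

Deleting every employee with rating 5 from `emps` would likewise leave the entry for rating 5 in `wages` untouched.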
Data Anomalies
Persons may have several phones:
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
SSN → Name, City but not SSN → PhoneNumber
Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number: what is his city?
Relation Decomposition
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield

Break the relation into two:

Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield

SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121

Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how?)
• Easy to delete all of Joe’s phone numbers (how?)
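One possible answer to the two "how?" questions, sketched with sqlite3. The table names Person and Phone are hypothetical (the slide leaves the two relations unnamed), as are the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Name TEXT, SSN TEXT PRIMARY KEY, City TEXT);
CREATE TABLE Phone  (SSN TEXT REFERENCES Person(SSN), PhoneNumber TEXT,
                     PRIMARY KEY (SSN, PhoneNumber));
INSERT INTO Person VALUES ('Fred', '123-45-6789', 'Seattle'),
                          ('Joe',  '987-65-4321', 'Westfield');
INSERT INTO Phone  VALUES ('123-45-6789', '206-555-1234'),
                          ('123-45-6789', '206-555-6543'),
                          ('987-65-4321', '908-555-2121');
""")

# Move Fred to Bellevue: his city is stored once, so one row changes.
cur = conn.execute("UPDATE Person SET City = 'Bellevue' WHERE SSN = '123-45-6789'")
rows_updated = cur.rowcount

# Delete all of Joe's phone numbers: only the Phone table is touched.
conn.execute("DELETE FROM Phone WHERE SSN = '987-65-4321'")
joe_city = conn.execute("SELECT City FROM Person WHERE Name = 'Joe'").fetchone()[0]
joe_phones = conn.execute(
    "SELECT COUNT(*) FROM Phone WHERE SSN = '987-65-4321'").fetchone()[0]
print(rows_updated, joe_city, joe_phones)
```

In the single-table version the same update would have to touch both of Fred's rows, and deleting Joe's only row would also discard his city.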
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
is decomposed into
R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp), where
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
Lossless Decomposition
• Sometimes it is correct:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera

Name Price
Gizmo 19.99
OneClick 24.99

Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossy Decomposition
• Sometimes it is not:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera

Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera

Price Category
19.99 Gadget
24.99 Camera
19.99 Camera

What’s wrong?
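One way to see what goes wrong is to join the pieces back together and compare the result with the original relation. The sketch below does this in plain Python for both decompositions of the Gizmo / OneClick table; the helpers `project` and `natural_join` are hypothetical and written for illustration only.

```python
def project(rows, columns):
    """Projection with duplicate elimination; each result row is a tuple of (column, value) pairs."""
    return {tuple(sorted((c, row[c]) for c in columns)) for row in rows}

def natural_join(r1, r2):
    """Natural join: combine every pair of rows that agree on all shared columns."""
    joined = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[c] == db[c] for c in da.keys() & db.keys()):
                joined.add(tuple(sorted({**da, **db}.items())))
    return joined

rows = [
    {"Name": "Gizmo",    "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "Gizmo",    "Price": 19.99, "Category": "Camera"},
]
original = project(rows, ["Name", "Price", "Category"])

# Decomposition that shares Name: the join gives back exactly the original.
lossless = natural_join(project(rows, ["Name", "Price"]),
                        project(rows, ["Name", "Category"]))

# Decomposition that shares Category: the join produces extra tuples.
lossy = natural_join(project(rows, ["Name", "Category"]),
                     project(rows, ["Price", "Category"]))

print(lossless == original)
print(len(lossy), len(lossy - original))
```

On this data the second join returns five tuples, two of which (OneClick at 19.99 and Gizmo at 24.99, both in Camera) were never in the original relation, so the original can no longer be told apart from other relations with the same projections.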