Zvi’s changes, if any, are marked in green, they are not copyrighted by the authors, and the...
-
date post
19-Dec-2015 -
Zvi’s changes, if any, are marked in green; they are not copyrighted by the authors, and the authors are not responsible for them.
©Silberschatz, Korth and Sudarshan, Database System Concepts
Chapter 7: Relational Database Design
Our goal in this chapter is to formalize your intuition from the entity-relationship design approach. It’s actually kind of fun.
If you are into the intellectual aesthetic of what we are doing, notice that with ER, we captured notions that are common to all database applications: 1-1, 1-many, many-to-many relationships, ISA relationships, etc.
When I design a database application I draw those entities and relationships, but when I get down to the details, I use “normalization theory” (aka database design theory) to be sure I get this right.
First Normal Form
Domain is atomic if its elements are considered to be indivisible units
• Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up into parts
A relational schema R is in first normal form if the domains of all attributes of R are atomic
Non-atomic values complicate storage and encourage redundant (repeated) storage of data
• E.g. Set of accounts stored with each customer, and set of owners stored with each account
• We assume all relations are in first normal form
First Normal Form (Contd.)
Atomicity is actually a property of how the elements of the domain are used.
• E.g. Strings would normally be considered indivisible
• Suppose that students are given course numbers which are strings of the form CS0012 or EE1127
• If the first two characters are extracted to find the department, the domain of course numbers is not atomic.
• Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
In practice people do it to answer many queries and that is why SQL DML allows it, but we are in relational algebra…
Story of a certain French bank….
Pitfalls in Relational Database Design
Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to
• Repetition of Information.
• Inability to represent certain information.
Design Goals:
• Avoid redundant data
• Ensure that relationships among attributes are represented
• Facilitate the checking of updates for violation of database integrity constraints.
04/18/23 08:39 PM
A canonical example
We first consider the relation schema R=R(E,G,S), where
• E: Employee number
• G: Grade (as in rank within a company)
• S: Salary
The users specified for us a set of semantic constraints:
• Each value of E has a single value of G associated with it.
• Each value of G has a single value of S associated with it.
For simplicity, we will sometimes refer to relation schemas by listing their attributes; thus we may write EGS instead of R above.
We will spend a lot of time discussing this example. If we understand it, we understand more than half of the normalization theory needed in practice!
©Zvi M. Kedem
A sample instance
Consider a sample instance of EGS:
• EGS:
  E  G  S
  α  A  1
  β  B  2
  γ  A  1
  δ  C  1
By examining the above sample instance, we will note that the design suffers from various deficiencies.
Deficiencies of the design
Redundancy: The fact (or more technically, the constraint) that G = A implies S = 1 is stored twice in the relation.
General problems:
• Inability to store partial information: If the salary schedule is augmented by a new grade, G = D, with corresponding salary of S = 3, this fact cannot be stored in the relation, unless E has the value of null.
(But it is quite clear that in this relation E is the only possible primary key, so we cannot have several tuples with the value null in this column.)
• More difficult to enforce consistent updating.
• Information may disappear: if the only employee with grade B leaves, we forget what salary is given to grade B.
Decomposition
The above problems can be solved by decomposing the relation EGS into two relations, each a projection (Π) of R:
• R1(EG) = Π_EG(R): SELECT DISTINCT E, G FROM R;
• R2(GS) = Π_GS(R): SELECT DISTINCT G, S FROM R;
Obtaining:
• R1:
  E  G
  α  A
  β  B
  γ  A
  δ  C
• R2:
  G  S
  A  1
  B  2
  C  1
Reconstructing the original relation
How to reconstruct R from R1, R2? It is easy to see that it is enough to compute R = R1 ⋈ R2 (the natural join of R1 and R2), that is:
  SELECT E, R1.G, S
  FROM R1, R2
  WHERE R1.G = R2.G;
and indeed we obtain:
• R1 ⋈ R2:
  E  G  S
  α  A  1
  β  B  2
  γ  A  1
  δ  C  1
Will this be true of other decompositions?
A second decomposition
Let us try decomposing EGS into EG and ES. We obtain:
• ES:
  E  S
  α  1
  β  2
  γ  1
  δ  1
• EG:
  E  G
  α  A
  β  B
  γ  A
  δ  C
The natural join gives the original relation back: EG ⋈ ES = EGS.
A third decomposition
Let us attempt another “reasonable” decomposition: EGS into ES and GS. We obtain:
• ES:
  E  S
  α  1
  β  2
  γ  1
  δ  1
• GS:
  G  S
  A  1
  B  2
  C  1
The natural join of the “small” relations
Let us compute the natural join of ES and GS. We obtain:
• ES ⋈ GS:
  E  G  S
  α  A  1
  α  C  1
  β  B  2
  γ  A  1
  γ  C  1
  δ  A  1
  δ  C  1
This relation is not equal to the original EGS relation: it has seven tuples, three of which were never in EGS.
Decompositions vs. reconstructions
By examining the 3 decompositions we observe that:
• Some decompositions allow us to reconstruct the original relation.
• Some decompositions do not allow us to reconstruct the original relation
• We want a decomposition that allows us to reconstruct the original relation (otherwise we get false facts).
Lossless join decompositions
We say that some decomposition of a relation R into relations R1, ..., Rm, where each Ri is the projection of R on some attributes is a lossless join decomposition, if and only if:
• Each attribute of R appears in some Ri
• R is the natural join of R1, ..., Rm
We will also use the term “valid” for lossless join
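The definition can be tried out directly on the EGS instance. The following is a minimal sketch, not a definitive implementation: the helper names (relation, project, natural_join, is_lossless) are made up for illustration, and the employee labels α to δ are placeholders for four distinct employees.

```python
def relation(attrs, *tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Relational projection: keep only `attrs`; the set drops duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out

def is_lossless(r, *schemas):
    """True if joining the projections of r gives back exactly r."""
    parts = [project(r, s) for s in schemas]
    joined = parts[0]
    for p in parts[1:]:
        joined = natural_join(joined, p)
    return joined == r

# Each string is one tuple: employee label, grade, salary (labels are assumed).
EGS = relation("EGS", "αA1", "βB2", "γA1", "δC1")

print(is_lossless(EGS, "EG", "GS"))  # True
print(is_lossless(EGS, "EG", "ES"))  # True
print(is_lossless(EGS, "ES", "GS"))  # False: the join has 7 tuples, not 4
```

Note that this only tests one instance. A decomposition is lossless as a design only if the test succeeds for every legal instance of the schema.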
Example
Consider the relation schema:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Redundancy:
• Data for branch-name, branch-city, assets are repeated for each loan that a branch makes
• Wastes space
• Complicates updating, introducing possibility of inconsistency of assets value
Null values:
• Cannot store information about a branch if no loans exist
• Can use null values, but they are difficult to handle.
Decomposition
Decompose the relation schema Lending-schema into:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (customer-name, loan-number, branch-name, amount)
All attributes of an original schema (R) must appear in the decomposition (R1, R2):
R = R1 ∪ R2
Lossless-join decomposition: for all possible relations r on schema R,
r = Π_R1(r) ⋈ Π_R2(r)
A very important property: if R = R1 ∪ R2, then always
r ⊆ Π_R1(r) ⋈ Π_R2(r)
Therefore lossless means: you do not gain spurious tuples by joining, which seems the only reasonable way to reconstruct the original relation.
Example of Non Lossless-Join Decomposition
Decomposition of R = (A, B) into R1 = (A) and R2 = (B)
• r:
  A  B
  α  1
  α  2
  β  1
• Π_A(r):
  A
  α
  β
• Π_B(r):
  B
  1
  2
• Π_A(r) ⋈ Π_B(r):
  A  B
  α  1
  α  2
  β  1
  β  2
Goal: Devise a Theory for the Following
Decide whether a particular relation R is in “good” form.
In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that
• each relation is in good form
• the decomposition is a lossless-join decomposition
• And dependencies are preserved (more about this later)
Our theory is based on:
• functional dependencies
• multivalued dependencies (optional) Dennis doesn’t like these because he has never seen them hold in practice. But look up in the book if you like.
Functional Dependencies
Constraints on the set of legal relations.
Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.)
Let R be a relation schema, with α ⊆ R and β ⊆ R.
The functional dependency
α → β
holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Example: Consider r(A, B) with the following instance of r:
  A  B
  1  4
  1  5
  3  7
On this instance, A → B does NOT hold, but B → A does hold.
An example
Although functional dependencies are properties of a schema, and not only of individual instances, we will practice by checking whether certain FDs are satisfied by a sample instance:
  A   B   C   D   E   F
  a1  b1  c1  d1  e1  f1
  a2  b1  c1  d2  e2  f2
  a2  b2  c3  d3  e3  f3
  a1  b1  c1  d1  e1  f4
  a1  b2  c2  d2  e4  f5
  a2  b3  c3  d2  e5  f6
• A → C: No
• AB → C: Yes
• E → CD: Yes
• D → B: No
• F → ABC: Yes
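The five answers can be checked mechanically. This is a small sketch under the assumption that every attribute name is a single letter; the function name satisfies is made up for illustration. It tests one instance only, which says nothing about whether the FD holds on the schema.

```python
def satisfies(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; a different
        # one later on is a violation of lhs -> rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

ATTRS = "ABCDEF"
DATA = """
a1 b1 c1 d1 e1 f1
a2 b1 c1 d2 e2 f2
a2 b2 c3 d3 e3 f3
a1 b1 c1 d1 e1 f4
a1 b2 c2 d2 e4 f5
a2 b3 c3 d2 e5 f6
"""
rows = [dict(zip(ATTRS, line.split())) for line in DATA.strip().splitlines()]

for lhs, rhs in [("A", "C"), ("AB", "C"), ("E", "CD"), ("D", "B"), ("F", "ABC")]:
    print(f"{lhs} -> {rhs}: {'Yes' if satisfies(rows, lhs, rhs) else 'No'}")
```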
Relative power of some FDs
An important note: A → C is always at least as powerful as AB → C;
that is, if a relation satisfies A → C it must satisfy AB → C.
What we are really saying is that if C = f(A), then of course C = f(A, B).
If not obvious to you, then look at it this way. A → C says that any two rows that agree on their A values also agree on their C values. AB → C says that any two rows that agree on both their A and B values also agree on their C values. Well, if they agree on both their A and B values, then surely they agree on their A values, so if A → C holds, they agree on their C values and we are done. The reverse does not hold.
Functional Dependencies (Cont.)
K is a superkey for relation schema R if and only if K → R
K is a candidate key for R if and only if
• K → R, and
• for no proper subset α ⊂ K, α → R
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
Loan-info-schema = (customer-name, loan-number, branch-name, amount).
We expect this set of functional dependencies to hold:
loan-number → amount
loan-number → branch-name
but would not expect the following to hold:
loan-number → customer-name
Use of Functional Dependencies
We use functional dependencies to:
• test relations to see if they are legal under a given set of functional dependencies.
• If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
• specify constraints on the set of legal relations
• We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number → customer-name.
Trivial FDs
An FD X → Y, where X and Y are sets of attributes, is trivial
if and only if
Y ⊆ X.
(Such an FD gives no constraints, as it is always satisfied.)
Example: AB → A is trivial.
Decomposition and union of some FDs
An FD X → A1 A2 ... Am, where the Ai's are individual attributes,
is equivalent to
the set of FDs: X → A1, X → A2, ..., X → Am.
Example: ABC → DE is equivalent to ABC → D and ABC → E.
Logical implications of FDs
It will be important to us to determine if a given set of FDs forces other FDs to be true (necessarily implies that additional FDs must hold).
Consider again the EGS relation. Which FDs are satisfied?
E → G, G → S, E → S are all true in the real world.
If the real world tells you only:
E → G and G → S
Can you deduce on your own, without understanding the application, that E → S?
Logical implications of FDs (cont.)
Yes, by simple logical argument: transitivity.
Take any (set of) tuples that are equal on E. Then given E → G we know that they are equal on G. Then given G → S we know that they are equal on S. So we have shown that E → S must hold.
We say that E → G, G → S logically imply E → S and we write E → G, G → S |= E → S.
Logical implications of FDs (cont.)
If the real world tells you only:
• E → G and E → S,
Can you deduce on your own, without understanding the application, that
• G → S?
No, because of a counterexample:
  E  G  S
  α  A  1
  β  A  2
This relation satisfies E → G and E → S, but violates G → S.
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
• E.g. If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:
α → β is in F+ ⇔ β ⊆ α+
Algorithm to compute α+, the closure of α under F (guilt by association approach):
  result := α;
  while (changes to result) do
    for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
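The loop translates almost line by line into code. A minimal sketch, assuming single-letter attribute names and FDs given as (left side, right side) pairs of strings; the set F is the one from the schema R = (A, B, C, G, H, I) used in this chapter.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:            # "for each β → γ in F"
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # "result := result ∪ γ"
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print("".join(sorted(closure("AG", F))))  # ABCGHI
print("".join(sorted(closure("A", F))))   # ABCH
print("".join(sorted(closure("G", F))))   # G
```

Since (AG)+ is all of R while A+ and G+ are not, AG is a superkey and no proper subset of it is.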
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a super key?
   1. Does AG → R? Or equivalently: (AG)+ = R?
2. Is any subset of AG a superkey?
   1. Does A → R? Or equivalently: A+ = R?
   2. Does G → R? Or equivalently: G+ = R?
Example (order of FDs slightly changed)
Let R = ABCDEFGHIJ
Given FDs:
• F → BG
• A → DE
• B → D
• C → F
• I → J
Then
ABC+ = ABCDEFG
and ABC → Z (for a set of attributes Z), if and only if Z ⊆ ABCDEFG
Therefore:
• ABC → FC is true
• ABC → IC is false
EGS again
We can easily show that
• {E → G, G → S}+ contains E → S
• {E → G, E → S}+ does not contain G → S
So what we did in an ad-hoc manner previously can be converted to an algorithmic procedure.
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for superkey:
• To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.
Testing functional dependencies:
• To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
• That is, we compute α+ by using attribute closure, and then check if it contains β.
• It is a simple and cheap test, and very useful.
Computing closure of F:
• For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
Keys of relations
The notion of an FD allows us to formally define keys.
Given R, satisfying a set of FDs, a set of attributes X of R is a key, if and only if:
• X+ = R.
• For every Y ⊆ X such that Y ≠ X, we have Y+ ≠ R.
Note that if R does not satisfy any (nontrivial) FDs, then R is the only key of R.
In our example with E → G, G → S, E → S, E was the only key of EGS.
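The two conditions suggest a brute-force key finder: try attribute sets from smallest to largest and keep those whose closure is all of R and that contain no smaller key. The sketch below assumes single-letter attributes; candidate_keys is a made-up name, and the search is exponential in the number of attributes, so it is only meant for classroom-sized schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets X with X+ = R, found by exhaustive search."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

print(candidate_keys("EGS", [("E", "G"), ("G", "S"), ("E", "S")]))  # [{'E'}]
print(len(candidate_keys("AB", [])))  # 1: with no FDs, the whole schema is the only key
```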
Review of EGS
Let us review the relation EGS. The “new relations” were:
• EG, with the key E and a non-trivial FD E → G.
• GS, with the key G and a non-trivial FD G → S.
• ES, with the key E and a non-trivial FD E → S.
Each of those relations was “good”,
• I.e., there were no redundancies in any of them; each nontrivial FD contained a key on the left side.
It turns out that EG, GS is better than EG, ES as a decomposition. Can you see why?
Third Normal Form
A relation schema R is in third normal form (3NF) if for all
α → β in F+
at least one of the following holds:
• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R (to check, test whether α+ = R)
• Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
We now proceed to show how to obtain third normal form with a lossless join decomposition.
Canonical Cover
Sets of functional dependencies may have redundant dependencies that can be inferred from the others
• E.g. A → C is redundant in: {A → B, B → C, A → C}
Parts of a functional dependency may be redundant
• E.g. on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
  Note, this is really a redundant dependency issue, as we can say that in {A → B, B → C, A → C, A → D}, A → C is redundant
• E.g. on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
  This is NOT a redundant dependency issue, but an unnecessarily weak description of dependencies
Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies and no redundant parts of dependencies
Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
• Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
Example: Given F = {A → B, AB → C}
• B is extraneous in AB → C because A → C is implied by F (equivalently, C is in A+)
Example: Given F = {A → C, AB → CD}
• C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α (left side) is extraneous in α
  1. compute the attribute closure ({α} – A)+ using the dependencies in F, including α → β,
  2. check that ({α} – A)+ contains β; if it does, A is extraneous
• To test if attribute A ∈ β (right side) is extraneous in β
  1. compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – A)},
  2. check that α+ contains A; if it does, A is extraneous
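Both tests reduce to one closure computation each. Here is a sketch under the assumption of single-letter attributes and FDs written as pairs of strings; the function names are made up for illustration. The two sample sets are F = {A → B, AB → C} and F = {A → C, AB → CD}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(a, lhs, rhs, fds):
    """A is extraneous on the left if (lhs - A)+ under F still contains rhs."""
    return set(rhs) <= closure(set(lhs) - {a}, fds)

def extraneous_in_rhs(a, lhs, rhs, fds):
    """A is extraneous on the right if lhs+ still contains A once
    lhs -> rhs has been replaced by lhs -> (rhs - A)."""
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, rhs.replace(a, "")))
    return a in closure(lhs, reduced)

F1 = [("A", "B"), ("AB", "C")]
print(extraneous_in_lhs("B", "AB", "C", F1))   # True: C is in A+
print(extraneous_in_lhs("A", "AB", "C", F1))   # False: B+ is just B

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", "AB", "CD", F2))  # True: A -> C still gives C
print(extraneous_in_rhs("D", "AB", "CD", F2))  # False: nothing else yields D
```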
Canonical Cover Algorithm (IMPORTANT)
A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• No functional dependency in Fc contains an extraneous attribute, and
• Each left side of a functional dependency in Fc is unique.
To compute a canonical cover for F:
  repeat
    Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
  until F does not change
Note: Union rule need not be applied until the end. Dennis prefers delaying the union because it makes it easier to spot redundant relations. But the book uses this approach.
Canonical Cover Algorithm (Dennis Preference)
To compute a canonical cover for F:
Convert functional dependencies so the right hand side of every functional dependency has just one attribute
  repeat
    Remove redundant functional dependencies
    Find extraneous attributes from the left side of functional dependencies and remove them
  until F does not change
Apply the union rule.
You may use either algorithm. They are close but not the same.
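The second variant (split the right sides, remove redundant FDs and extraneous left-side attributes, then apply the union rule) might be sketched as follows. This is one possible reading of the steps, not the book's code: the function names and the representation of FDs as pairs of attribute lists are assumptions. The sample input is the employee/tool/project set of FDs used in this chapter.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    # Step 1: split every FD so each right side has a single attribute.
    work = list(dict.fromkeys(
        (frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs))
    changed = True
    while changed:
        changed = False
        # Step 2a: remove an FD that the remaining FDs already imply.
        for fd in list(work):
            rest = [g for g in work if g != fd]
            if fd[1] <= closure(fd[0], rest):
                work, changed = rest, True
        # Step 2b: remove extraneous attributes from left sides.
        for i in range(len(work)):
            lhs, rhs = work[i]
            for b in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {b}, work):
                    lhs = lhs - {b}
                    work[i] = (lhs, rhs)
                    changed = True
        work = list(dict.fromkeys(work))  # drop duplicates created above
    # Step 3: union rule, merging FDs that share a left side.
    cover = {}
    for lhs, rhs in work:
        cover.setdefault(lhs, set()).update(rhs)
    return cover

FDS = [(["Em"], ["To"]), (["Em"], ["Pr"]), (["To"], ["Pr"]),
       (["Em", "To"], ["Ho"]), (["Sk", "Lo"], ["Ro"]), (["Ro"], ["Lo"])]

for lhs, rhs in canonical_cover(FDS).items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

A canonical cover is not unique in general, so a different processing order may produce a different but equivalent result.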
The EmToPrHoSkLoRo relation
The relation deals with employees who use tools on projects and work a certain number of hours per week. An employee may work in various locations and has a variety of skills. All employees having a certain skill and working in a certain location meet in a specified room once a week.
The attributes of the relation are:
• Em: Employee
• To: Tool
• Pr: Project
• Ho: Hours per week
• Sk: Skill
• Lo: Location
• Ro: Room for meeting
The FDs of the relation
The relation satisfies the following FDs:
• Each employee uses a single tool: Em → To.
• Each employee works on a single project: Em → Pr.
• Each tool can be used on a single project only: To → Pr.
• An employee uses each tool for the same number of hours each week: EmTo → Ho.
• All the employees working in a location having a certain skill always meet in the same room: SkLo → Ro.
• Each room is in one location only: Ro → Lo.
Our FDs
1. Em → To
2. Em → Pr
3. To → Pr
4. EmTo → Ho
5. SkLo → Ro
6. Ro → Lo
Run the algorithm
No RHS can be combined, so we check whether there are any redundant attributes.
We start with FD 1.
• We check whether we can remove To from Em → To. This is possible if we can derive Em → To using
  Em → Pr
  To → Pr
  EmTo → Ho
  SkLo → Ro
  Ro → Lo
Computing the closure of Em using the above FDs gives us only EmPr, so the attribute To must be kept.
Run the algorithm
• We check whether we can remove Pr (from FD 2). This is possible if we can derive Em → Pr using
  Em → To
  To → Pr
  EmTo → Ho
  SkLo → Ro
  Ro → Lo
Computing the closure of Em using the above FDs gives us EmToPrHo, so FD 2 can be removed.
Run the algorithm
We now have
1. Em → To
2. To → Pr
3. EmTo → Ho
4. SkLo → Ro
5. Ro → Lo
The next one is 3.
• We check if Em can be removed from the left. This is possible if we can derive To → Ho using all the FDs. Computing the closure of To using the FDs gives ToPr, and therefore Em cannot be removed.
• We check if To can be removed. This is possible if we can derive Em → Ho using all the FDs. Computing the closure of Em using the FDs gives EmToPrHo, and therefore To can be removed.
Run the algorithm
We now have
1. Em → To
2. To → Pr
3. Em → Ho
4. SkLo → Ro
5. Ro → Lo
Run the algorithm
We now attempt to remove an attribute from the LHS of 4, and an entire FD, but neither is possible. We try again, but we cannot change anything.
Therefore we are done.
To form the tables, simply combine the left hand and right hand sides (after combining FDs having a common left hand side via the union rule). Then remove any relation whose attributes are a subset of some other relation.
In our example: EmToHo, ToPr, SkLoRo (RoLo is dropped because it is contained in SkLoRo).
If any of these tables contains a superkey of all attributes, we have a lossless join decomposition. That wasn’t the case so we add EmSkLo. (Check that the attribute closure of EmSkLo gives all attributes and that it is minimal.)
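The table-forming step can be sketched in a few lines. The sketch assumes the canonical cover has already been computed and that a candidate key of the whole schema is supplied by the caller; synthesize is a made-up name, not part of any library.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize(schema, cover, key):
    """Form one table per FD of the cover, drop tables contained in another
    table, and add `key` as its own table if no table is a superkey."""
    fds = [(frozenset(l), frozenset(r)) for l, r in cover]
    tables = list(dict.fromkeys(l | r for l, r in fds))
    tables = [t for t in tables if not any(t < u for u in tables)]
    if not any(closure(t, fds) == set(schema) for t in tables):
        tables.append(frozenset(key))
    return tables

SCHEMA = ["Em", "To", "Pr", "Ho", "Sk", "Lo", "Ro"]
COVER = [(["Em"], ["To", "Ho"]), (["To"], ["Pr"]),
         (["Sk", "Lo"], ["Ro"]), (["Ro"], ["Lo"])]

for table in synthesize(SCHEMA, COVER, key=["Em", "Sk", "Lo"]):
    print("".join(sorted(table)))
```

For the Ln/Fn/Va/Sa schema discussed in this chapter, the same function returns LnVa, FnSa and the added key table LnFn.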
Why is it necessary to store a global key
Why is it necessary to store the global key?
Consider the relation LnFnVaSa:
• Ln: Last Name
• Fn: First Name
• Va: Vacation days per year
• Sa: Salary
The functional dependencies are:
• Ln → Va
• Fn → Sa
The key is
LnFn
The relation is not in 3NF
Why is it necessary to store . . . (cont.)
Our algorithm (without the key being included) will produce the decomposition
• LnVa
• FnSa
This is not a lossless-join decomposition
• In fact we do not know who the employees are (what are the valid pairs of LnFn)
So we decompose
• LnVa
• FnSa
• LnFn
How about EGS
Applying the algorithm to
1. E → G
2. G → S
3. E → S
The only thing we can do is to find that FD 3 is redundant (its attribute S is extraneous, since E → S already follows from FDs 1 and 2), and therefore we get as canonical cover:
1. E → G
2. G → S
Decomposition of EGS
Applying the algorithm to EGS, we get our desired decomposition:
• EG
• GS
Back to SQL DDL
How are we going to express in SQL what we have learned?
We need to express:
• keys
• functional dependencies
Expressing keys is very easy: we use the PRIMARY KEY and UNIQUE keywords
Expressing functional dependencies is also possible, by means of a CHECK condition
• What we need to say for the relation SkLoRo is that each tuple satisfies the following condition:
  There are no tuples in the relation with the same value of Ro and different values of Lo
Back to SQL DDL (cont.)
  CREATE TABLE SkLoRo (
    Sk …,
    Lo …,
    Ro …,
    PRIMARY KEY (Sk, Lo),
    CHECK (NOT EXISTS (SELECT *
                       FROM SkLoRo AS copy
                       WHERE SkLoRo.Ro = copy.Ro
                         AND NOT SkLoRo.Lo = copy.Lo))
  );
Clever but I’ve never seen this in practice.
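One reason it is rarely seen: many widely used systems (SQLite among them) reject a subquery inside CHECK. A common fallback is to run a query that lists the violations of Ro → Lo. The sketch below uses Python's sqlite3 module; the column types and the sample rows are assumptions made up for illustration, since the slide leaves the types elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide does not specify them.
conn.execute(
    "CREATE TABLE SkLoRo (Sk TEXT, Lo TEXT, Ro TEXT, PRIMARY KEY (Sk, Lo))"
)
conn.executemany(
    "INSERT INTO SkLoRo VALUES (?, ?, ?)",
    [
        ("welding", "NYC", "R101"),
        ("design", "NYC", "R101"),
        ("welding", "Boston", "R101"),  # violates Ro -> Lo: R101 in two locations
        ("design", "Boston", "R202"),
    ],
)

# A room that appears with more than one location violates Ro -> Lo.
violations = conn.execute(
    "SELECT Ro FROM SkLoRo GROUP BY Ro HAVING COUNT(DISTINCT Lo) > 1"
).fetchall()
print(violations)  # [('R101',)]
```

The same query could be run from a trigger or a periodic audit job; which one fits depends on the system and the update rate.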
Design Goals
Goal for a relational database design is:
• Third Normal Form.
• Lossless join.
• Dependency preservation.
Denormalization for Performance
May want to use non-normalized schema for performance
E.g. displaying customer-name along with account-number and balance requires join of account with depositor
Alternative 1: Use denormalized relation containing attributes of account as well as depositor with all above attributes
• faster lookup
• Extra space and extra execution time for updates
• extra coding work for programmer and possibility of error in extra code
Alternative 2: use a materialized view defined as account ⋈ depositor
• Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors
Other Design Issues
Some aspects of database design are not caught by normalization
Sometimes performance dictates combining several rows into one:
Instead of earnings(company-id, year, amount), use
• earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings).
• Above are in third normal form (in fact a slightly stronger optional form called BCNF), but make querying across years difficult and needs new table each year
• company-year(company-id, earnings-2000, earnings-2001, earnings-2002)
• Also in BCNF, but also makes querying across years difficult and requires new attribute each year.
• Is an example of a crosstab, where values for one attribute become column names
• Used in spreadsheets, and in data analysis tools
Vertical Partitioning
Can you see any reason why
  Acctbal(acctid, bal), acctaddress(acctid, street, city, zip)
might be better than
  acct(acctid, bal, street, city, zip)?
Both are normalized