Post on 22-Jan-2016
description
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
1
C20.0046: Database Management SystemsLecture #8
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2005
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
2
Roadmap Want to remove redundancy/anomalies
Convert to BCNF
Find FDs – closure alg Check if each FD AB is ok
If A contains a key
If not, decompose into R1(A,B), R2(A,rest) Because AB, this will be lossless
Could check by joining R1 and R2 Would get no rows not in original
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
3
Normal Forms First Normal Form = all attributes are atomic
As opposed to set-valued Assumed all along
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
4
Chappaqua789Bill
DC456Hilary
NY123Michael
Mailing-addressSSNName
Decomposition example
The anomalies are gone No more redundant data Easy to for Bill to move Okay for Bill to lose all phones
Break the relation into two:
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Hilary 456 DC 202-222-2222
Hilary 456 DC 914-222-2222
Bill 789 Chappaqua 914-222-2222
Bill 789 Chappaqua 212-333-3333
212-333-3333789
914-222-2222789
914-222-2222456
202-222-2222456
917-111-1111123
212-111-1111123
PhoneSSN
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
5
Boyce-Codd Normal Form Name/phone example is not BCNF:
{ssn,phone} is key FD: ssn name,mailing-address holds
Violates BCNF: ssn is not a superkey
Its decomposition is BCNF Only superkeys anything else
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Name SSN Mailing-address
Michael 123 NY
SSN PhoneNumber
123 212-111-1111
123 917-111-1111
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
6
BCNF motivation Two big ideas:
Only a key field can determine other fields Key values are unique no FD-caused redundancy
Slogan: “Every FD must contain the key, the whole key and nothing but the key.”
More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields.
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
7
Design examples Consider situation:
Entities: Parts, Suppliers, Departments Relship: Contracts(P,S,D,id,quant)
Draw E/R
New rule: no department can buy multiple parts from the same supplier (why?)
Translate to FD Normalize
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
8
Design examples Consider situation:
Entities: Emp(ssn,name,lot), Dept(id,name,budg) Relship: Works(E,D,since)
Draw E/R
New info: in each dept, everyone parks in same lot
Translate to FD Normalize
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
9
BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs:
Title Year Studio Studio President President Pres-Address Studio President, Pres-Address (why?)
No many-many this time Problem cause: transitive FDs:
Title,year studio president
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
10
BCNF Decomposition Illegal: As Bs, where As don’t include key Decompose: Studio President, Pres-Address
As = {studio} Bs = {president, pres-address} Cs = {title, year}
Result:1. Studios(studio, president, pres-address)2. Movies(studio, title, year)
Is (2) in BCNF? Is in (1) BCNF? Key: Studio FD: President Pres-Address Q: Does president studio? If so, president is a key But if not, it violates BCNF
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
11
BCNF Decomposition Studios(studio, president, pres-address) Illegal: As Bs, where As don’t include key Decompose: President Pres-Address
As = {president} Bs = {pres-address} Cs = {studio}
{Studio, President, Pres-Address} becomes {President, Pres-Address} {Studio, President}
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
12
Decomposition algorithm example R(N,O,R,P) F = {N O, O R, R N}
Key: N,P Violations of BCNF: N O, OR, N OR
which kinds of violations are these? Pick N OR (on board) Can we rejoin? (on board) What happens if we pick N O instead? Can we rejoin? (on board)
Name Office Residence Phone
George Pres. WH 202-…
George Pres. WH 486-…
Dick VP NO 202-…
Dick VP NO 307-…
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
13
BCNF and two-att relations Must a two-attribute relation be in BCNF?
Case 1: there are no non-trivial FDs Case 2: A B but not B A Case 3: B A but not A B Case 4: Both A B and B A
Note that relations may have multiple keys BCNF requires a key on the left, not all keys
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
14
Lossless BCNF decomposition Consider simple relation: R(A,B,C) Only FD: A B (assume C!A) Key: A,C
Also goes through if CA BCNF violation: no key on the left
Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C)
Could this be lossy? We will join R1 and R2 on A to find out
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
15
Lossless BCNF decomposition Suppose R contains the rows: (b,a,c) and (b’,a,c’) In projection onto (B,A):
(b,a,c) (b,a), (b’,a,c’) (b’,a) In projection onto (A,C):
(b,a,c) (a,c), (b’,a,c’) (a,c’) In joining, (b’,a) and (a,c) become (b’,a,c), and (b,a) and (a,c’) become (b,a,c’)
Q: Is/must/can this be correct? A: Yes! A B, so b = b’
So this was lossless We assumed C!A, but argument also goes
through when CA Moral: BCNF decomp alg really is lossless
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
16
BCNF summary BCNF decomposition is lossless
Can reproduce original by joining Saw: Every 2-attribute relation is in BCNF Final set of decomposed relations might be
different depends on order of bad FDs chosen
Saw: But all results will be in BCNF
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
17
A problem with BCNF Relation: R(Title, Theater, Neighboorhood) FDs:
Title,N’hood Theater Assume a movie shouldn’t play twice in same
neighborhood Theater N’hood
Keys: {Title, N’hood} {Theater, Title}
Title Theater N’hood
Aviator Angelica Village
Life Aquatic Angelica Village
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
18
A problem with BCNF BCNF violation: Theater N’hood Decompose:
{Theater, N’Hood} {Theater, Title}
Resulting relations:
VillageAngelica
N’hoodTheater
R1
Life AquaticAngelica
AviatorAngelica
TitleTheater
R2
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
19
Problem - continued Suppose we add new rows to R1 and R2:
Their join:
Life AquaticVillageFilm Forum
Village
Village
N’hood
Aviator
Life Aquatic
Title
Angelica
Angelica
Theater
(R’)
Theater N’hood
Angelica Village
Film Forum Village
Theater Title
Angelica Life Aquatic
Angelica Aviator
Film Forum Life Aquatic
R1 R2
A and B could not enforce FD Title,N’hood Theater
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
20
Third normal form: motivation There are some situations in which
BCNF is not dependency-preserving, and Efficient checking for FD violation on updates is
important In these cases BCNF is too severe a req.
Solution: define a weaker normal form, called Third Normal Form in which FDs can be checked on individual relations
without performing a join (no inter-relational FDs) to which relations can be converted, preserving both
data and FDs
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
21
Third Normal Form BCNF decomposition is not dependency-preserving! We now define the (weaker) Third Normal Form
Turns out: this example was already in 3NF
A relation R is in 3rd normal form if :
For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime
A relation R is in 3rd normal form if :
For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime
Tradeoff:BCNF = no FD anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
22
BCNF: vices and virtues Be clear on the problem just described v. the
arg. that BCNF decomp is data-lossless
BCNF decomp does not lose data Resulting relations can be rejoined to obtain the
original
But: it can can lose dependencies After decomp, now legal to add rows whose
corresponding rows would be illegal in (rejoined) original
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
23
Recap: goals of normalization When we decompose a relation R with FDs F into
R1..Rn we want:
1. lossless-join decomposition – no data lost
2. no/little redundancy: the relations Ri should be in either BCNF or at least 3NF
3. Dependency preservation: if Fi be the set of dependencies in F+ that include only attributes in Ri:
F is the “sum” of the FDs of the new relations (F1 F2 F3 … Fn)+ = F+
Otherwise checking updates for violation of FDs may require computing joins, which is expensive
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
24
Dependency preservation Saw that last req. didn’t hold in move-theater
example Did it hold in R(N,O,R,P) example?
(on board)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
25
Testing for 3NF For each dependency X Y, use attribute closure
to check if X is a superkey If X is not a superkey, verify that each attribute in Y
is prime This test is rather more expensive, since it involves finding
candidate keys Testing for 3NF is NP-complete Interestingly, decomposition into 3NF can be done in
polynomial time Testing for 3NF is harder than decomposing into 3NF!
Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
26
3NF Example R = (J, K, L) F = (JK L, L K) Two candidate keys: JK and JL R is in 3NF
JK L JK is a superkey L K K is prime
BCNF decomposition yields R1 = (L,K), R2 = (L,J)
testing for JK L requires a join There is some redundancy in R
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
27
BCNF and 3NF Comparison Example of problems due to redundancy in 3NF
R = (J, K, L) F = (JK L, L K)
A schema that is in 3NF but not BCNF has the problems of: redundancy (e.g., the relationship between l1 and k1) need to use null values (if allowed!), e.g. to represent the
relationship between l2 and k2 when there is no corresponding value for attribute J
J K L
j1 k1 l1
j2 k1 l1
j3 k1 l1
NULL k2 l2
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
28
Comparison of BCNF and 3NF It is always possible to decompose a relation
into relations in 3NF such that: the decomposition is lossless the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that: the decomposition is lossless but it may not be possible to preserve
dependencies But may eliminate more redundancy
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
29
The Normal Forms (so far) 1NF: every attribute has an atomic value 2NF: no longer used 3NF: for each FD X Y either
it is trivial, or X is a superkey, or Y is a part of some key
BCNF: 3NF and third 3NF option disallowed
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
30
Distinguishing examples 1NF but not 2NF: R(Name, SSN ,Mailing-
address,Phone) Key: SSN,Phone Partial: ssn name, address
3NF but not BCNF: R(Title, Theater, N’hood) Title,N’hood Theater Prime-on-right: Theater N’hood
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
31
Design Goals Goal for a relational database design is:
No redundancy Lossless Join Dependency Preservation
If we cannot achieve this, we accept one of dependency loss use of more expensive inter-relational methods to preserve
dependencies data redundancy due to use of 3NF
Interesting: SQL does not provide a direct way of specifying FDs other than superkeys can specify FDs using assertions, but they are expensive to test
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
32
3NF 3NF means we may have anomalies Example: TEACH(student, teacher, subject)
student, subject teacher (students not allowed in the same subject with two teachers)
teacher subject (each teacher teaches one subject) Subject is prime, so this is 3NF
But we have anomalies: Insertion: cannot insert a teacher until we have a
student taking his subject If we convert to BCNF, we lost student,
subject teacher
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
33
BCNF and over-normalization What is the problem? Schema overload – trying to capture two meanings:
1) subject X can be taught by teacher Y 2) student Z takes subject W from teacher V
What to do? 3NF has anomalies, normalizing to BCNF loses FDs One soln: keep the 3NF TEACH and another
(BCNF) relation SUBJECT-TAUGHT (teacher, subject)
Still (more!) redundancy, but no more insert and delete anomalies
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
34
Normalization Review Q: What’s required for BCNF?
Q: How do we fix a non-BCNF relation?
Q: What’s the loophole for 3NF?
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
35
Normalization Review Q: If AsBs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomp. lose FDs?
Q: Can 3NF decomp. lose FDs?
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
36
New topic: MVDs Consider this relation
People ~ their jobs ~ their residences Person-address/city: many-many Person-job: many-many Address/city-job: independent
Chappaqua333 Some StreetFirst Lady456Hilary
Washington444 Embassy RowFirst Lady456Hilary
New York111 East 60th StreetCEO123Michael
London222 Brompton RoadCEO123Michael
444 Embassy Row
333 Some Street
444 Embassy Row
333 Some Street
222 Brompton Road
111 East 60th Street
Streets
Lawyer
Lawyer
Senator
Senator
Mayor
Mayor
Jobs
Washington456Hilary
Chappaqua789Hilary
Washington789Hilary
Chappaqua456Hilary
London123Michael
New York123Michael
CitysSSNName
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
37
Redundancy in BCNF
Lots of redundancy! Key? All fields
None determined by others! Non-trivial FDs? None! In BCNF? Yes!
Name Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Senator
Hilary 333 Some Street Chappaqua First Lady
Hilary 444 Embassy Row Washington First Lady
Hilary 333 Some Street Chappaqua Lawyer
Hilary 444 Embassy Row Washington Lawyer
Now what? New concept, leading
to another normal form: Multivalued
dependencies
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
38
As Bs if, when As are held fixedvalues in Bs are independent of
values in rest
More precisely: if t1 and t3 agree on As, we then can find t2 such that
t2, t2, t3 agree on As
t2, t1 agree of Bs
t2, t3 agree on Cs
MVD definition
As Bs Cst1
As Bs Cst2
As Bs Cst3
| |
| |
| |
| |
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
39
MVD example Claim: name streets,cities If true: can pick arbitrary t1, t3 and find a t2
We pick: first and last of Hilary’s tuples:
Now: if true, can find another Hilary row with street/address of t1 and job of t3
LawyerWashington444 Embassy RowHilary
JobsCitysStreetsName
SenatorChappaqua333 Some StreetHilaryt1
t3
LawyerChappaqua333 Some StreetHilaryt2
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
40
MVD example Now: if true, can find another Hilary row with
street/address of t1 and job of t3
Sure enough:
Hilary 333 Some Street Chappaqua Lawyert2
Name Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Senator
Hilary 333 Some Street Chappaqua First Lady
Hilary 444 Embassy Row Washington First Lady
Hilary 333 Some Street Chappaqua Lawyer
Hilary 444 Embassy Row Washington Lawyer
t2
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
41
MVD rules No splitting rule:
In the example, name streets,cities Do we have name streets?
No: 444 Embassy Row doesn’t go with Chappaqua
NB: City doesn’t determine street – could have >1 house But city, street aren’t independent
Name Streets Citys Jobs
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Lawyer
t1
t3
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
42
MVD rules Trivial dependencies:
As Bs iff As BsAi
Transitive rule: As Bs, Bs Cs As Cs
Complementation rule: As Bs As rest Intuition: if each value in Bs is assoc’ed w/each value in
rest, then each value of rest is assoc’ed w/each value in BsName Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
43
MVDs and FDs MVD is a generalization of FD Every FD is an MVD Pf: Suppose As Bs
Pick t1, t3 that agree on As.
Must find a t2. Let t2 be t3.
Then1) t2 agrees on As with both
2) t2 agrees on Bs with t1 (why?)
3) t2 agrees on rest with t3 (why?)
QED
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
44
Fourth Normal Form 4NF: like BCNF, but with MVDs not FDs An MVD As Bs is nontrivial if
No Bs are As Some attributes left over (why?)
4NF: for every nontrivial MVD
As Bs, As is a superkey In example name streets,cities, but
name isn’t a superkeyName Streets Citys Jobs
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Lawyer
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
45
Decomposition to 4NF Again, analogous to BCNF If we can find As Bs for R where As isn’t
a superkey, replace R with R1(As,Bs) and R2(As,rest)
Running example: name streets,cities People(name,streets,cities,jobs) becomes
Residences(name,street,city) and Employment(name,job)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
46
4NF: another construal In nontrivial As Bs, As must be superkey After df of 4NF, text says: “That is, … every
nontrivial MVD is really a FD with a superkey on the left” (p123).
We know: FDs are* MVDs but not vice versa So: Why does this follow? Is it true? Yes. As is a superkey As everything As Bs the MVD is an FD Two kinds of MVDs: FDs and “true” MVDs 4NF eliminates exactly the true ones
* The typo swapping these was fixed.
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
47
Summary of normal forms
Guaranteed to 3NF BCFN 4NF
Eliminate FD redundancy
Mostly Yes Yes
Eliminate MVD redundancy
No No Yes
Preserve FDs Yes No No
Preserve MVDs No No No
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
48
Next topic: relational algebra Set operations: union, intersection, difference Projection, selection Cartesian Product Joins: natural joins, theta joins Combining operations to form queries Dependent and independent operations
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
49
What is relational algebra? An algebra for relations “High-school” algebra is an algebra for numbers Formalism for constructing expressions
Operations Operands: Variables, Constants, expressions
Expressions: Vars & constants Operators applied to expressions
Algebra Vars/consts Operators
High-school Numbers + * - / etc.
Relational Relations (=sets of tupes)
union, intersection, join, etc.
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
50
Why do we care about relational algebra? Why construct expressions on relations? The exprs are the form questions about the
take The relations these exprs cash out to are the
answers to our questions First proof of RDBMS/RA concept: System R
(1979) Modern implementation of RA: SQL
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
51
Relation operators Five basic operators:
Union: Intersection: Difference: - Selection: Projection: Cartesian Product:
Derived/auxiliary operators: Intersection, complement Joins (natural, equijoin, theta join, semijoin) Renaming:
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
52
Operators Relations are sets have set-theoretic ops
Venn diagrams
Union: R1 R2 Example:
ActiveEmployees RetiredEmployees
Difference: R1 – R2 Example:
AllEmployees – RetiredEmployees = ActiveEmployees
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
53
Set operations - exampleName Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Hamill 456 Oak M 8/8/88
Name Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Ford 345 Palm M 7/7/77
R:
S:
Name Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Hamill 456 Oak M 8/8/88
Ford 345 Palm M 7/7/77
R S:
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
54
Set operations - exampleName Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Hamill 456 Oak M 8/8/88
Name Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Ford 345 Palm M 7/7/77
R:
S:
R - S: Name Address Gender Birthdate
Hamill 456 Oak M 8/8/88
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
55
Operators Intersection: R1 R2 Example:
UnionizedEmployees RetiredEmployees
Intersection can be derived from and – R1 R2 = R1 – (R1 – R2) R1 R2 = -(-R1 -R2) (allowed?)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
56
Set operations - exampleName Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Hamill 456 Oak M 8/8/88
Name Address Gender Birthdate
Fisher 123 Maple F 9/9/99
Ford 345 Palm M 7/7/77
R:
S:
R S: Name Address Gender Birthdate
Fisher 123 Maple F 9/9/99
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
57
Operators Selection Selects all tuples satisfying a condition Notation: c(R)
Examples salary > 100000(Employee) name = “Smith”(Employee)
The condition c can have comparison ops:=, <, , >, , <> boolean ops: and, or
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
58
Selection example
Select the movies at Angelica: Theater=“Angelica”(Showings)
City of GodVillageFilm Forum
Village
Village
N’hood
Fog of War
City of God
Title
Angelica
Angelica
Theater
Village
Village
N’hood
Fog of War
City of God
Title
Angelica
Angelica
Theater
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
59
Operators Projection: op we used for decomposition
Eliminates columns, then removes duplicates
Notation: A1,…,An(R)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
60
Operators Cartesian Product
Cross product Each tuple in R1 combines w/each tuple in R2
Notation: R1 R2
If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Fairly rare in practice used to express joins
Q: Where does the name come from? Q: If R1 has n1 rows and R2 has n2, how
large is R1 x R2?
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
61
Cartesian product example
Street City
333 Some Street Chappaqua
444 Embassy Row Washington
333 Some Street Chappaqua
Hillary-addresses
Job
Senator
First Lady
Lawyer
Hillary-jobs
Street City Job
333 Some Street Chappaqua Senator
444 Embassy Row Washington Senator
333 Some Street Chappaqua First Lady
444 Embassy Row Washington First Lady
333 Some Street Chappaqua Lawyer
444 Embassy Row Washington Lawyer
Hillary-addresses x Hillary-jobs
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
62
Operators Natural join: our join up to now
But always merging shared attributes Notation: R1 ⋈ R2 Meaning:
R1 ⋈ R2 = every att once(shared atts =(R1 R2)) I.e., first compute the cross product R1 x R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
63
Natural join example
Name Street City
Hilary 333 Some Street Chappaqua
Hilary 444 Embassy Row Washington
Hilary 333 Some Street Chappaqua
Addresses
Name Job
Hilary Senator
Hilary First Lady
Hilary Lawyer
Jobs
Addresses ⋈ JobsName Street City Job
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Senator
Hilary 333 Some Street Chappaqua First Lady
Hilary 444 Embassy Row Washington First Lady
Hilary 333 Some Street Chappaqua Lawyer
Hilary 444 Embassy Row Washington Lawyer
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
64
Natural Join R S
R ⋈ S= ?
Unpaired tuples called dangling
A B
X Y
X Z
Y Z
Z V
B C
Z U
V W
Z V
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
65
Natural Join Given the schemas R(A, B, C, D), S(A, C, E),
what is the schema of R ⋈ S ?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
66
Theta Join Like natural join, but
includes only rows that satisfy arbitrary condition Does not project away shared attributes
R1 ⋈ R2 = (R1 R2)
Here can be any condition If condition is always satisfies, then theta join
becomes natural join
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
67
Theta-join exampleA B C
1 2 3
6 7 8
9 7 8
B C D
2 3 4
2 3 5
7 8 10
A U.B U.C V.B V.C D
1 2 3 2 3 4
1 2 3 2 3 5
1 2 3 7 8 10
6 7 8 7 8 10
9 7 8 7 8 10
U V
U V
A<D
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
68
Equijoin A theta join where is an equality R1 ⋈A=B R2 = A=B(R1 R2) = lower-case sigma Example:
Employee ⋈SSN=SSN Dependents
Most useful join in practice
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
69
Semijoin R ⋉ S = {atts of R}(R ⋈ S) Q: What does this mean?
Natural join of R and S; Then project onto R’s atts
A: The rows of R for which >1 row in S agree on shared atts
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
70
Semijoin example
SSN Name
. . . . . .
DSSN Dname SSN
. . . . . .
EmployeeDependents
network
Employee ⋉ Dependents =
{employees who have dependents}
Employee ⋉ Dependents =
{employees who have dependents}
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
71
Renaming Changes the schema, not the instance Notation: B1,…,Bn(R) is spelled “rho”, pronounced “row” Example:
Employee(ssn,name) social, name)(Employee)
Or just: (Employee)
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
72
Complex RA Expressions Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field
Title Year Length inColor Studio Prdcr#
Star Wars 1977 124 True Fox 12345
M.Ducks 1991 104 True Disney 67890
W.World 1992 95 True Paramount 99999
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
73
Combining operations Schema: Movies (Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.
Title Year Length Filmtype StudioStar wars 1977 124 Color Fox
Mighty ducks 1991 104 Color Disney
Wayne’s world 1992 85 Color Paramount
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
74
Complex RA Expressions Reps(ssn, name, etc.) Clients(ssn, name, rssn) Q: Find George’s client names Clients.name(Reps.name=George(Reps.ssn=rssn(
Reps x Clients))) Or: Clients.name(Reps.name=George and Reps.ssn=rssn(Reps x
Clients)) Or: Clients.name(Reps.name=George(Reps x Clients)
Reps.ssn=rssn(Reps x Clients))
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
75
For next time Finish chapter 5 Come to office hours!
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
76
BCNF Review Q: What’s required for BCNF?
Q: What’s the slogan for BCNF?
Q: Who are B & C?
Q: What are the two types of violations?
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
77
BCNF Review Q: How do we fix a non-BCNF relation?
Q: If AsBs violates BCNF, what do we do? Q: In this case, could the decomposition be lossy?
Q: Under what circumstances could a decomposition be lossy?
Q: How do we combine two relations?