Relational Algebra & Calculus - UMass...
Transcript of Relational Algebra & Calculus - UMass...
1
Relational Algebra & Calculus
Yanlei Diao UMass Amherst
Slides Courtesy of R. Ramakrishnan and J. Gehrke
2
Outline
v Conceptual Design: ER model
v Logical Design: ER to relational model
v Querying and manipulating data
§ Practical language: SQL
• Declarative: say “what you want”, not “how to get it”
§ Formal languages: Relational Algebra and Calculus
• Mathematical foundation for query processing
3
What is an “Algebra”?
v A mathematical system consisting of: § Operands: variables or values from which new values
can be constructed. § Operators: symbols denoting procedures that construct
new values from given values.
4
What is “Relational Algebra”
v Relational Algebra: § Operands are relations. § Each operator takes 1 or 2 relations and produces a relation.
v Closure property: relational algebra is closed under the relational model. § Relational operators can be arbitrarily composed!
5
Relational Algebra v Basic operations:
§ Selection ( σ ) Selects a subset of rows from a relation.
§ Projection ( π ) Deletes unwanted columns from a relation.
§ Cross-product ( × ) Allows us to combine two relations.
§ Set-difference ( - ) Tuples in reln. 1, but not in reln. 2.
§ Union ( ∪ ) Tuples in reln. 1 or in reln. 2.
v Additional operations: § Join ( ), Intersection ( ∩ ), Division ( / ), Renaming ( ρ ) § Can be derived from basic operators. Not essential, but
useful for writing simpler queries.
6
Example Instances
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1
S1
S2
Sailors
Reserves
7
Projection
sname ratingyuppy 9lubber 8guppy 5rusty 10
π sname rating S, ( )2
v Retain only attributes in the projection list; delete others. v Schema of result contains exactly the fields in projection list.
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
8
Projection (contd.)
age35.055.5
πage S( )2
v Projection operator has to eliminate duplicates! § Real systems typically don’t do duplicate elimination
unless the user explicitly asks for it. (Why not?)
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
9
Selection
σ rating S>8 2( )
sid sname rating age28 yuppy 9 35.058 rusty 10 35.0
v Select rows that satisfy the selection condition; discard others.
v Schema of result identical to schema of input.
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
10
Selection (contd.)
€
σ(rating > 5∨rating < 5)∧age < 50(S2)
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
11
A Simple Example of Composition
σ rating S>8 2( )
sid sname rating age28 yuppy 9 35.058 rusty 10 35.0
sname ratingyuppy 9rusty 10
π σsname rating rating S, ( ( ))>8 2
v Composition: result relation of an operator can be the input to another operator.
12
Union, Intersection, Set-Difference
v Set operations: § Union ( ∪ )
§ Intersection ( ∩ ) § Set difference ( - )
v Take two input relations, which must be union-compatible: § Same number of fields. § Corresponding fields have the
the same type.
v What is the schema of result?
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1
S2
13
Example Set Operations
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0
S S1 2∪
Duplicate elimination: remove tuples that have same values in all attributes.
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1
S2
14
Example Set Operations
sid sname rating age31 lubber 8 55.558 rusty 10 35.0
S S1 2∩
sid sname rating age22 dustin 7 45.0
S S1 2−
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1
S2
Duplicates?
15
Cross (Cartesian) Product
v S1 × R1: Each row of S1 is paired with each row of R1.
sid bid day22 101 10/10/9658 103 11/12/96
R1 sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
S1
(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96
S1 × R1
16
Cross-Product (contd.)
v Result schema inherits all fields of S1 and R1. § Conflict: Both S1 and R1 have a field called sid.
ρ ( ( , ), )C sid sid S R1 1 5 2 1 1→ → ×
(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96
• Renaming operator:
What if we want to find pairs s.t. S1.sid = S2.sid?
17
Join
v Condition (theta) Join:
§ Result schema same as that of cross-product. § But often fewer tuples, more efficient for computation.
€
RθS =σ
θ(R×S)
(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96
S RS sid R sid1 11 1 . .<
18
Join
v Equi-Join: A special case of condition join where the condition θ contains only equalities.
§ Result schema contains only one copy of fields for which equality is specified.
v Natural Join ( R wv S ): equi-join on all common fields.
sid sname rating age bid day22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 11/12/96
11 RS sid
19
Self-Join
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
S1 sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
S2
ρ(S1,S)×ρ(S2,S)
€
S1.rating>S2.rating∧S1.age<S2.age
20
Example Schema
v Sailors(sid: integer, sname: string, rating: integer, age: integer)
v Boats(bid: integer, color: string) v Reserves(sid: integer, bid: integer, day: date)
21
v Sailors(sid: integer, sname: string, rating: integer, age: integer)
v Boats(bid: integer, color: string) v Reserves(sid: integer, bid: integer, day: date)
Find names of sailors who’ve reserved boat #103
v How many relations do we need to access? § If more than 1, use cross-products or joins.
v Filtering condition? Use selection. v Attributes to return? Use projection.
22
Find names of sailors who’ve reserved boat #103
v Solution 1: π σsname bid serves Sailors(( Re ) )=103
v Solution 2: ρ σ( , Re )Temp servesbid1 103=
ρ ( , )Temp Temp Sailors2 1
π sname Temp( )2
v Solution 3: π σsname bid serves Sailors( (Re ))=103
Algebraic equivalence!
23
Find names of sailors who’ve reserved a red boat
v Boat color is only available in Boats; so need an extra join:
π σsname color red Boats serves Sailors(( ' ' ) Re )=
v Sailors(sid: integer, sname: string, rating: integer, age: integer)
v Boats(bid: integer, color: string) v Reserves(sid: integer, bid: integer, day: date)
24
Find names of sailors who’ve reserved a red or a green boat
v Sailors(sid: integer, sname: string, rating: integer, age: integer)
v Boats(bid: integer, color: string) v Reserves(sid: integer, bid: integer, day: date)
25
Find names of sailors who’ve reserved a red or a green boat
v Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
ρ σ( , ( ' ' ' ' ))Tempboats color red color green Boats= ∨ =
π sname Tempboats serves Sailors( Re )
v Can also define Tempboats using union. How?
26
Find names of sailors who’ve reserved a red boat and a green boat
v A single selection won’t work. Why? v Instead, intersect sailors who’ve reserved red boats and
sailors who’ve reserved green boats.
ρ π σ( , (( ' ' ) Re ))Tempred sid color red Boats serves=
π sname Tempred Tempgreen Sailors(( ) )∩
ρ π σ( , (( ' ' ) Re ))Tempgreen sid color green Boats serves=
sid is a key for Sailors
Can we project onto name and then do the intersection?
28
Relational Calculus
v Query has the form:
x x xn p x x xn1 2 1 2, ,..., | , ,...,!
"
###
$
%
&&&
'
()
*)
+
,)
-)
v Answer includes all tuples that
make the formula be true.
x x xn1 2, ,...,
p x x xn1 2, ,...,!
"
###
$
%
&&&
domain variables, or constants formula
29
Formulas
v Formula is recursively defined: § Atomic formulas: retrieving tuples from relations, or
making comparisons of values
§ Logical connectives: ¬, ∧, ∨ § Quantifiers: ∃, ∀ (optional)
o ∀x F(x) is equivalent to ¬∃x ¬F(x)
30
Find names of sailors rated > 7 who have reserved boat #103
))()Re(( 7103 Sailorsserves ratingbidsname >=σσπ
{Xsname | ∃ Xsid, Xrating, Xage Sailors(Xsid, Xsname, Xrating, Xage) ∧ Xrating > 7
∧ ∃ Xbid, Xday Reserves(Xsid, Xbid, Xday) ∧ Xbid=103 }
Relational Algebra:
Relational Calculus:
v Where is the join? § Use ∃ to find a tuple in Reserves that `joins with’ the Sailors tuple
under consideration.
31
Find names of sailors who’ve reserved all boats
v To find sailors who’ve reserved all red boats:
{Xsname | ∃ Xsid, Xrating, Xage 〈Xsid, Xsname, Xrating, Xage〉∈Sailors ∧ ∀ 〈Xbid, Xcolor〉∈Boats (∃ Xday 〈Xsid, Xbid, Xday〉 ∈ Reserves) }
{Xsname | ∃ Xsid, Xrating, Xage 〈Xsid, Xsname, Xrating, Xage〉∈Sailors ∧ ∀ 〈Xbid, Xcolor〉∈Boats (Xcolor≠ʹ′redʹ′ ∨ ∃ Xday 〈Xsid, Xbid, Xday〉 ∈ Reserves) }
Sailors(sid: integer, sname: string, rating: integer, age: integer) Boats(bid: integer, color: string) Reserves(sid: integer, bid: integer, day: date)
32
Unsafe Queries, Expressive Power
v Some syntactically correct calculus queries can have an infinite number of answers! Such queries are called unsafe. § e.g.,
v Equivalence result: every query that can be expressed in relational algebra can be expressed as a safe query in relational calculus; the converse is also true.
v Query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus.
S S Sailors| ¬ ∈"
#
$$
%
&
''
(
)*
+*
,
-*
.*
33
Find names of sailors who’ve reserved all boats
{Xsname | ∃ Xsid, Xrating, Xage 〈Xsid, Xsname, Xrating, Xage〉∈Sailors ∧ ∀ 〈Xbid, Xcolor〉∈Boats (∃ Xday 〈Xsid, Xbid, Xday〉 ∈ Reserves) }
{Xsname | ∃ Xsid, Xrating, Xage 〈Xsid, Xsname, Xrating, Xage〉∈Sailors ∧ ¬∃ 〈Xbid, Xcolor〉∈Boats (¬∃ Xday 〈Xsid, Xbid, Xday〉 ∈ Reserves) }
34
Find the names of sailors who’ve reserved all boats
v Step 1: for each sailor, check if there exists a boat that he has not reserved (called formula F).
€
ρ (S _ neg, πsid ( (π sid Re serves) × (πbid Boats) − (π sid,bidRe serves)))
€
π sname ( (π sidRe serves − S _ neg) Sailors )
v Step 2: find sailors for which F is not true and retrieve their names
‘-’: the only way to express negation in relational algebra!