Lecture 4: Relational algebra
Posted 20-Dec-2015
www.cl.cam.ac.uk/Teaching/current/Databases/
Today’s lecture
• What’s the (core) relational algebra?
• How can we write queries using the relational algebra?
• How powerful is the relational algebra?
Relational query languages
• Query languages allow the manipulation and retrieval of data from a database
• The relational model supports simple, powerful query languages
  – Strong formal foundation
  – Allows for much (provably correct) optimisation
• NOTE: Query languages are not (necessarily) programming languages
Formal relational query languages
• Two formal query languages:
  1. Relational algebra
     • Simple ‘operational’ model, useful for expressing execution plans
  2. Relational calculus
     • Logical (‘declarative’) model, useful for theoretical results
• Both languages were introduced by Codd in a series of papers
• They have equivalent expressive power
They are the key to understanding SQL query processing!
Preliminaries
• A query is applied to relation instances, and the result of a query is also a relation instance
  – The schemas of relations are fixed (cf. types)
  – The query will then execute over any valid instance
  – The schema of the result can also be determined
Example relation instances
• A database of boats, sailors, and reservations
R1:
sid bid day
22 101 101001
99 103 111201
S1:
sid sname rating age
11 Sue 7 26
22 Tim 8 26
33 Bob 9 28
55 Kim 10 28
S2:
sid sname rating age
10 Myleene 6 23
22 Tim 8 26
99 Julia 100 20
88 Gavin 100 21
B1:
bid colour
101 red
102 blue
103 green
Core relational algebra
• Five basic operator classes:
  1. Selection
     • Selects a subset of rows
  2. Projection
     • Picking certain columns
  3. Renaming
     • Renaming attributes
  4. Set-theoretic operations
     • The familiar operations: union, intersection, difference, …
  5. Products and joins
     • Combining relations in useful ways
Selection
• Selects rows that satisfy a condition, written
$R1 = \sigma_c(R2)$
• where c is a condition involving the attributes of R2, e.g.
$\sigma_{rating>8}(S2)$
returns the relation instance
sid sname rating age
99 Julia 100 20
88 Gavin 100 21
Selection cont.
• Note:
  1. The schema of the result is exactly the same as the schema of the input relation instance
  2. There are no duplicates in the resulting relation instance (why?)
  3. The resulting relation instance can be used as the input for another relational algebra operator, e.g.
     $\sigma_{sname=\text{“Julia”}}(\sigma_{rating>8}(S2))$
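A minimal Python sketch of selection, assuming (for illustration only, not something the lecture prescribes) that a relation instance is a schema tuple plus a frozenset of rows, and that the condition c is a Python predicate over one row. The hypothetical helper `select` evaluates $\sigma_{rating>8}(S2)$ and then the nested query; the result keeps the input schema, and since the rows form a set, no duplicates can appear.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation instance: attribute names plus a set of rows (no duplicates)."""
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def select(cond, r: Relation) -> Relation:
    """sigma_cond(r): keep the rows of r for which cond holds.

    cond receives each row as a dict from attribute name to value.
    The result has exactly the same schema as r.
    """
    kept = frozenset(t for t in r.rows if cond(dict(zip(r.schema, t))))
    return Relation(r.schema, kept)


S2 = Relation(
    ("sid", "sname", "rating", "age"),
    frozenset({
        (10, "Myleene", 6, 23),
        (22, "Tim", 8, 26),
        (99, "Julia", 100, 20),
        (88, "Gavin", 100, 21),
    }),
)

high = select(lambda row: row["rating"] > 8, S2)
julia = select(lambda row: row["sname"] == "Julia", high)  # output of one operator feeds the next

print(sorted(high.rows))   # [(88, 'Gavin', 100, 21), (99, 'Julia', 100, 20)]
print(sorted(julia.rows))  # [(99, 'Julia', 100, 20)]
```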
Projection
• Deletes fields that are not in the projection list, written
$R1 = \pi_A(R2)$
• where A is a list of attributes from the schema of R2, e.g.
$\pi_{sname,rating}(S2)$
returns the relation instance
sname rating
Myleene 6
Tim 8
Julia 100
Gavin 100
Projection cont.
• Note:
  1. The projection operator has to eliminate duplicates (why?)
  2. Aside: real systems don’t normally perform duplicate elimination unless the user explicitly asks for it (why not?)
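The duplicate-elimination point can be made concrete with the same hypothetical encoding (a schema tuple plus a frozenset of rows, an assumption of this sketch). Projecting S1 onto age collapses four rows into two because the result is a set; collecting the same values into a list (a bag) keeps all four, which is what SQL does unless DISTINCT is requested.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation instance: attribute names plus a set of rows."""
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def project(attrs: tuple[str, ...], r: Relation) -> Relation:
    """pi_attrs(r): keep only the listed attributes, in the given order.

    Rows that become identical after projection collapse into one,
    because they are collected into a frozenset.
    """
    idx = [r.schema.index(a) for a in attrs]
    return Relation(attrs, frozenset(tuple(t[i] for i in idx) for t in r.rows))


S1 = Relation(
    ("sid", "sname", "rating", "age"),
    frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
               (33, "Bob", 9, 28), (55, "Kim", 10, 28)}),
)

ages = project(("age",), S1)
print(sorted(ages.rows))  # [(26,), (28,)]: four input rows, two output rows

# Bag semantics: the same values gathered into a list keep their duplicates.
bag = [t[S1.schema.index("age")] for t in S1.rows]
print(sorted(bag))  # [26, 26, 28, 28]
```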
Renaming
$R1 = \rho_{A:=B}(R2)$
• Returns a relation instance identical to R2 except that field A is renamed B
• For example, $\rho_{sname:=nom}(S1)$ returns
sid nom rating age
11 Sue 7 26
22 Tim 8 26
33 Bob 9 28
55 Kim 10 28
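Renaming touches only the schema, never the rows. A minimal sketch under the same assumed encoding (schema tuple plus frozenset of rows); the hypothetical `rename` also refuses to create a second field with an existing name, a safety check added here rather than taken from the lecture.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation instance: attribute names plus a set of rows."""
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def rename(old: str, new: str, r: Relation) -> Relation:
    """rho_{old:=new}(r): the same rows, with attribute `old` renamed to `new`."""
    if old not in r.schema:
        raise KeyError(f"no attribute {old!r} in {r.schema}")
    if new in r.schema:
        raise ValueError(f"attribute {new!r} already exists in {r.schema}")
    return Relation(tuple(new if a == old else a for a in r.schema), r.rows)


S1 = Relation(
    ("sid", "sname", "rating", "age"),
    frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
               (33, "Bob", 9, 28), (55, "Kim", 10, 28)}),
)

renamed = rename("sname", "nom", S1)
print(renamed.schema)  # ('sid', 'nom', 'rating', 'age')
```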
Familiar set operations
• We have the familiar set-theoretic operators, e.g. $\cup$, $\cap$, $-$
• There is a restriction on their input relation instances: they must be union compatible
  – Same number of fields
  – Same field names and domains
• E.g. $S1 \cup S2$ is valid, but $S1 \cup R1$ is not
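A sketch of the set operations with a union-compatibility check, again assuming a relation is a schema tuple plus a frozenset of rows. As a simplification, compatibility here means identical attribute-name tuples (same names in the same order, hence the same number of fields); domains are not modelled. Python's `|`, `&` and `-` on frozensets do the rest.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def check_compatible(a: Relation, b: Relation) -> None:
    # Simplified union compatibility: same attribute names in the same order.
    # Domains (attribute types) are not checked in this sketch.
    if a.schema != b.schema:
        raise ValueError(f"not union compatible: {a.schema} vs {b.schema}")


def union(a: Relation, b: Relation) -> Relation:
    check_compatible(a, b)
    return Relation(a.schema, a.rows | b.rows)


def intersection(a: Relation, b: Relation) -> Relation:
    check_compatible(a, b)
    return Relation(a.schema, a.rows & b.rows)


def difference(a: Relation, b: Relation) -> Relation:
    check_compatible(a, b)
    return Relation(a.schema, a.rows - b.rows)


SAILOR = ("sid", "sname", "rating", "age")
S1 = Relation(SAILOR, frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
                                 (33, "Bob", 9, 28), (55, "Kim", 10, 28)}))
S2 = Relation(SAILOR, frozenset({(10, "Myleene", 6, 23), (22, "Tim", 8, 26),
                                 (99, "Julia", 100, 20), (88, "Gavin", 100, 21)}))
R1 = Relation(("sid", "bid", "day"),
              frozenset({(22, 101, "101001"), (99, 103, "111201")}))

print(len(union(S1, S2).rows))       # 7: Tim is in both inputs but appears once
print(intersection(S1, S2).rows)     # frozenset({(22, 'Tim', 8, 26)})
print(len(difference(S1, S2).rows))  # 3
try:
    union(S1, R1)
except ValueError as err:
    print(err)  # reports that S1 and R1 are not union compatible
```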
Cartesian products
$A \times B$
• Concatenate every row of A with every row of B
• What do we do if A and B have some field names in common?
  – There are several choices, but we’ll simply assume that the resulting duplicate field names get the suffixes .1 and .2
Example
$S1 \times R1$
sid.1 sname rating age sid.2 bid day
11 Sue 7 26 22 101 101001
11 Sue 7 26 99 103 111201
22 Tim 8 26 22 101 101001
22 Tim 8 26 99 103 111201
33 Bob 9 28 22 101 101001
33 Bob 9 28 99 103 111201
55 Kim 10 28 22 101 101001
55 Kim 10 28 99 103 111201
Note: the field sid, common to S1 and R1, appears twice, as sid.1 and sid.2.
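A sketch of the Cartesian product under the assumed encoding (schema tuple plus frozenset of rows), following the lecture's convention that shared field names get the suffixes .1 and .2. The hypothetical `product` reproduces the schema and the 4 × 2 = 8 rows of $S1 \times R1$.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def product(a: Relation, b: Relation) -> Relation:
    """a x b: every row of a concatenated with every row of b.

    Field names occurring in both inputs are suffixed .1 (from a) and .2 (from b).
    """
    shared = set(a.schema) & set(b.schema)
    left = tuple(f"{f}.1" if f in shared else f for f in a.schema)
    right = tuple(f"{f}.2" if f in shared else f for f in b.schema)
    rows = frozenset(s + t for s in a.rows for t in b.rows)
    return Relation(left + right, rows)


S1 = Relation(("sid", "sname", "rating", "age"),
              frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
                         (33, "Bob", 9, 28), (55, "Kim", 10, 28)}))
R1 = Relation(("sid", "bid", "day"),
              frozenset({(22, 101, "101001"), (99, 103, "111201")}))

P = product(S1, R1)
print(P.schema)     # ('sid.1', 'sname', 'rating', 'age', 'sid.2', 'bid', 'day')
print(len(P.rows))  # 8 = 4 rows x 2 rows
```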
Theta join
• Theoretically, it is a derived operator
$R1 \bowtie_c R2 \equiv \sigma_c(R1 \times R2)$
• E.g., $S1 \bowtie_{sid.1 \le sid.2} R1$:
sid.1 sname rating age sid.2 bid day
11 Sue 7 26 22 101 101001
11 Sue 7 26 99 103 111201
22 Tim 8 26 22 101 101001
22 Tim 8 26 99 103 111201
33 Bob 9 28 99 103 111201
55 Kim 10 28 99 103 111201
Theta join cont.
1. The result schema is the same as for a cross-product
2. Sometimes this operator is called a conditional join
3. Most commonly the condition is an equality on field names, e.g. $S1 \bowtie_{sid.1=sid.2} R1$
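Because the theta join is derived, a sketch needs no new machinery: it composes a selection with a product. The helpers below (`select`, `product`, `theta_join`) are hypothetical, built on the same assumed schema-plus-frozenset encoding; with the condition sid.1 ≤ sid.2 they produce the six rows in the table above.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def select(cond, r: Relation) -> Relation:
    """sigma_cond(r), with cond a predicate over a row given as a dict."""
    return Relation(r.schema,
                    frozenset(t for t in r.rows if cond(dict(zip(r.schema, t)))))


def product(a: Relation, b: Relation) -> Relation:
    """a x b, suffixing shared field names with .1 and .2."""
    shared = set(a.schema) & set(b.schema)
    schema = (tuple(f"{f}.1" if f in shared else f for f in a.schema)
              + tuple(f"{f}.2" if f in shared else f for f in b.schema))
    return Relation(schema, frozenset(s + t for s in a.rows for t in b.rows))


def theta_join(a: Relation, b: Relation, cond) -> Relation:
    """a join_cond b, derived as sigma_cond(a x b)."""
    return select(cond, product(a, b))


S1 = Relation(("sid", "sname", "rating", "age"),
              frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
                         (33, "Bob", 9, 28), (55, "Kim", 10, 28)}))
R1 = Relation(("sid", "bid", "day"),
              frozenset({(22, 101, "101001"), (99, 103, "111201")}))

J = theta_join(S1, R1, lambda row: row["sid.1"] <= row["sid.2"])
print(len(J.rows))  # 6
```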
Equi- and natural join
• Equi-join is a special case of theta join where the condition is equality of field names, e.g. $S1 \bowtie_{sid} R1$
• Natural join is an equi-join on all common fields where the duplicate fields are removed. It is written simply $A \bowtie B$
• Result of the equi-join $S1 \bowtie_{sid} R1$ (both sid fields kept):
sid.1 sname rating age sid.2 bid day
22 Tim 8 26 22 101 101001
Natural join cont.
• Note that the common fields appear only once in the resulting relation instance
• This operator appears very frequently in real-life queries
• It is always implemented directly by the query engine (why?)
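A nested-loop sketch of the natural join, assuming the same schema-plus-frozenset encoding. The hypothetical `natural_join` matches rows on every shared field and keeps one copy of each shared field, so $S1 \bowtie R1$ has a single sid column. This sketch compares every pair of rows; real query engines use dedicated join algorithms (for example hash or sort-merge joins) rather than building the product first.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple[str, ...]
    rows: frozenset[tuple]


def natural_join(a: Relation, b: Relation) -> Relation:
    """a join b: equi-join on all shared fields, keeping one copy of each.

    With no shared fields every pair of rows matches, so the result
    degenerates to the Cartesian product.
    """
    common = [f for f in a.schema if f in b.schema]
    extra = tuple(f for f in b.schema if f not in a.schema)
    rows = set()
    for s in a.rows:
        ds = dict(zip(a.schema, s))
        for t in b.rows:
            dt = dict(zip(b.schema, t))
            if all(ds[f] == dt[f] for f in common):
                rows.add(s + tuple(dt[f] for f in extra))
    return Relation(a.schema + extra, frozenset(rows))


S1 = Relation(("sid", "sname", "rating", "age"),
              frozenset({(11, "Sue", 7, 26), (22, "Tim", 8, 26),
                         (33, "Bob", 9, 28), (55, "Kim", 10, 28)}))
R1 = Relation(("sid", "bid", "day"),
              frozenset({(22, 101, "101001"), (99, 103, "111201")}))

NJ = natural_join(S1, R1)
print(NJ.schema)  # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
print(NJ.rows)    # frozenset({(22, 'Tim', 8, 26, 101, '101001')})
```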
Division
• Not a primitive operator, but useful to express queries such as
  Find sailors who have reserved all the boats
• Consider the simple case, where relation A has fields x and y, and relation B has field y
• A/B is the set of xs (sailors) such that for every y (boat) in B, there is a row (x,y) in A
Division cont.
• Can you code this up in the relational algebra?
x’s that are disqualified: $\pi_x((\pi_x(A) \times B) - A)$
Thus: $A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$
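A quick check that the derived expression agrees with the definition of division. For brevity this sketch drops schemas and uses plain Python sets: A holds (x, y) pairs and B holds one-field (y,) rows. The reservation data is hypothetical, not from the lecture.

```python
def divide_direct(a, b):
    """A/B from the definition: x qualifies if (x, y) is in A for every y in B."""
    xs = {x for (x, _) in a}
    ys = {y for (y,) in b}
    return {x for x in xs if all((x, y) in a for y in ys)}


def divide_derived(a, b):
    """A/B as pi_x(A) - pi_x((pi_x(A) x B) - A)."""
    xs = {x for (x, _) in a}                         # pi_x(A)
    candidates = {(x, y) for x in xs for (y,) in b}  # pi_x(A) x B
    disqualified = {x for (x, _) in candidates - a}  # pi_x((pi_x(A) x B) - A)
    return xs - disqualified


# Hypothetical data: A is (sailor, boat) reservations, B is the set of boats.
A = {(22, 101), (22, 102), (22, 103), (31, 101), (31, 102), (64, 101)}
B = {(101,), (102,), (103,)}

print(divide_direct(A, B), divide_derived(A, B))    # {22} {22}
print(sorted(divide_derived(A, {(101,), (102,)})))  # [22, 31]
```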
Example 1
Find names of sailors who’ve reserved boat 103
Solution 1: $\pi_{sname}(\sigma_{bid=103}(Reserves) \bowtie Sailors)$
Solution 2: $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
Which is more efficient?
(Query optimisation)
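Both solutions can be checked on the example instances. This sketch assumes (it is not stated in the lecture) that Sailors is S1 ∪ S2, Reserves is R1 and Boats is B1, and it models each row as a frozenset of (attribute, value) pairs so that a natural join can match on shared names. It also prints the size of the first intermediate result in each plan: selecting first leaves less to join.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def select(pred, r):
    return frozenset(t for t in r if pred(dict(t)))


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def names(r):
    return {dict(t)["sname"] for t in r}


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
Reserves = table(("sid", "bid", "day"), (22, 101, "101001"), (99, 103, "111201"))

on_103 = select(lambda d: d["bid"] == 103, Reserves)
sol1 = project({"sname"}, join(on_103, Sailors))
sol2 = project({"sname"}, select(lambda d: d["bid"] == 103, join(Reserves, Sailors)))

print(names(sol1), names(sol2))                   # {'Julia'} {'Julia'}
print(len(on_103), len(join(Reserves, Sailors)))  # 1 2
```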
Example 2
Find names of sailors who’ve reserved a red boat
$\pi_{sname}(\sigma_{colour=\text{“red”}}(Boats) \bowtie Reserves \bowtie Sailors)$
Better: $\pi_{sname}(\pi_{sid}(\pi_{bid}(\sigma_{colour=\text{“red”}}(Boats)) \bowtie Reserves) \bowtie Sailors)$
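A sketch confirming that the two expressions agree, under the same assumptions as before (Sailors = S1 ∪ S2, Reserves = R1, Boats = B1; rows as frozensets of attribute/value pairs). The "better" form projects down to just the joining attribute before each join, so intermediate rows carry fewer fields.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def select(pred, r):
    return frozenset(t for t in r if pred(dict(t)))


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def names(r):
    return {dict(t)["sname"] for t in r}


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
Reserves = table(("sid", "bid", "day"), (22, 101, "101001"), (99, 103, "111201"))
Boats = table(("bid", "colour"), (101, "red"), (102, "blue"), (103, "green"))

red = select(lambda d: d["colour"] == "red", Boats)
first = project({"sname"}, join(join(red, Reserves), Sailors))
better = project({"sname"},
                 join(project({"sid"}, join(project({"bid"}, red), Reserves)), Sailors))

print(names(first), names(better))  # {'Tim'} {'Tim'}
```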
Example 3
Find sailors who’ve reserved a red or a green boat
let $T = \sigma_{colour=\text{“red”} \lor colour=\text{“green”}}(Boats)$
in $\pi_{sname}(T \bowtie Reserves \bowtie Sailors)$
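The disjunctive selection can be checked the same way. As in the earlier sketches, Sailors = S1 ∪ S2, Reserves = R1 and Boats = B1 are assumptions, and rows are frozensets of attribute/value pairs. Tim reserved red boat 101 and Julia reserved green boat 103, so both qualify.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def select(pred, r):
    return frozenset(t for t in r if pred(dict(t)))


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def names(r):
    return {dict(t)["sname"] for t in r}


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
Reserves = table(("sid", "bid", "day"), (22, 101, "101001"), (99, 103, "111201"))
Boats = table(("bid", "colour"), (101, "red"), (102, "blue"), (103, "green"))

T = select(lambda d: d["colour"] == "red" or d["colour"] == "green", Boats)
result = project({"sname"}, join(join(T, Reserves), Sailors))
print(sorted(names(result)))  # ['Julia', 'Tim']
```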
Example 4
Find sailors who’ve reserved a red and a green boat
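let $T1 = \pi_{sid}(\sigma_{colour=\text{“red”}}(Boats) \bowtie Reserves)$
    $T2 = \pi_{sid}(\sigma_{colour=\text{“green”}}(Boats) \bowtie Reserves)$
in $\pi_{sname}((T1 \cap T2) \bowtie Sailors)$
NOTE: Can’t just trivially modify the last solution! Replacing $\lor$ by $\land$ in its selection would ask for boats that are both red and green, and there are none.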
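R1 contains no sailor who reserved both colours, so this sketch adds one hypothetical reservation (Tim, sid 22, also reserves green boat 103; this row is invented for illustration). It evaluates the intersection-based solution and, for contrast, the naive conjunctive selection, which finds no boats at all. Rows are frozensets of attribute/value pairs, as in the previous sketches.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def select(pred, r):
    return frozenset(t for t in r if pred(dict(t)))


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def names(r):
    return {dict(t)["sname"] for t in r}


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
# R1 plus one HYPOTHETICAL row: Tim (22) also reserves green boat 103.
Reserves = table(("sid", "bid", "day"),
                 (22, 101, "101001"), (99, 103, "111201"), (22, 103, "121201"))
Boats = table(("bid", "colour"), (101, "red"), (102, "blue"), (103, "green"))

T1 = project({"sid"}, join(select(lambda d: d["colour"] == "red", Boats), Reserves))
T2 = project({"sid"}, join(select(lambda d: d["colour"] == "green", Boats), Reserves))
answer = project({"sname"}, join(T1 & T2, Sailors))
print(names(answer))  # {'Tim'}

# The naive modification: no single boat is both red and green.
both = select(lambda d: d["colour"] == "red" and d["colour"] == "green", Boats)
print(len(both))  # 0
```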
Example 5
Find the names of sailors who’ve reserved at least two boats
let $T = \rho_{sid.1:=sid}(\pi_{sid.1,sname,bid}(Sailors \bowtie_{sid} Reserves))$
in
$\pi_{sname.1}(\sigma_{sid.1=sid.2 \land bid.1 \neq bid.2}(T \times T))$
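A sketch of the self-product technique, using the same hypothetical extra reservation as before (Tim, sid 22, reserves both 101 and 103; invented data). Because this sketch's natural join already keeps a single sid column, it skips the renaming step and goes straight to suffixing every field of each copy of T with .1 or .2. A join of two relations with no shared attributes is then exactly $T \times T$.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def select(pred, r):
    return frozenset(t for t in r if pred(dict(t)))


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def suffix(r, n):
    """Rename every attribute a to a.n, the .1/.2 convention for a product."""
    return frozenset(frozenset((f"{k}.{n}", v) for k, v in t) for t in r)


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
# R1 plus one HYPOTHETICAL row: Tim (22) also reserves boat 103.
Reserves = table(("sid", "bid", "day"),
                 (22, 101, "101001"), (99, 103, "111201"), (22, 103, "121201"))

T = project({"sid", "sname", "bid"}, join(Sailors, Reserves))
pairs = join(suffix(T, 1), suffix(T, 2))  # no shared attributes, so this is T x T
two = select(lambda d: d["sid.1"] == d["sid.2"] and d["bid.1"] != d["bid.2"], pairs)
result = {dict(t)["sname.1"] for t in project({"sname.1"}, two)}
print(result)  # {'Tim'}
```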
Example 6
Find the names of sailors who’ve reserved all boats
let $T = \pi_{sid,bid}(Reserves) / \pi_{bid}(Boats)$
in $\pi_{sname}(T \bowtie Sailors)$
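A sketch of this query using division expressed through the derived formula $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$. The reservation data is hypothetical (Tim reserves all three boats, Julia only 103), and rows are again frozensets of attribute/value pairs; `divide` is an illustrative helper, not a standard API.

```python
def table(attrs, *rows):
    """A relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)


def project(attrs, r):
    return frozenset(frozenset((k, v) for k, v in t if k in attrs) for t in r)


def join(a, b):
    """Natural join: merge each pair of rows that agree on their shared attributes."""
    out = set()
    for s in a:
        for t in b:
            ds, dt = dict(s), dict(t)
            if all(ds[k] == dt[k] for k in ds.keys() & dt.keys()):
                out.add(frozenset({**ds, **dt}.items()))
    return frozenset(out)


def divide(a, b, x):
    """a / b computed as pi_x(a) - pi_x((pi_x(a) x b) - a)."""
    ax = project(x, a)
    # ax and b share no attributes, so join(ax, b) is the Cartesian product.
    return ax - project(x, join(ax, b) - a)


def names(r):
    return {dict(t)["sname"] for t in r}


Sailors = table(("sid", "sname", "rating", "age"),
                (11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28),
                (55, "Kim", 10, 28), (10, "Myleene", 6, 23),
                (99, "Julia", 100, 20), (88, "Gavin", 100, 21))
# HYPOTHETICAL reservations: Tim (22) has reserved every boat, Julia (99) only 103.
Reserves = table(("sid", "bid", "day"),
                 (22, 101, "101001"), (22, 102, "101101"), (22, 103, "111201"),
                 (99, 103, "111201"))
Boats = table(("bid", "colour"), (101, "red"), (102, "blue"), (103, "green"))

T = divide(project({"sid", "bid"}, Reserves), project({"bid"}, Boats), {"sid"})
print(names(project({"sname"}, join(T, Sailors))))  # {'Tim'}
```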
Computational limitations
• Suppose we have a relation SequelOf of movies and their immediate sequels
• We want to compute the relation ‘isFollowedBy’, pairing each movie with every later movie in its series, not just its immediate sequel …
movie          sequel
Naked Gun      Naked Gun 2½
Naked Gun 2½   Naked Gun 33 1/3
Rocky          Rocky II
Rocky II       Rocky III
Rocky III      Rocky IV
Rocky IV       Rocky V
Computational limitations cont.
• We could compute
$\pi_{fst,thd}(\rho_{movie:=fst,sequel:=snd}(SequelOf) \bowtie \rho_{movie:=snd,sequel:=thd}(SequelOf))$
• This provides us with sequels-of-sequels
• We could join three copies of SequelOf to get sequels-of-sequels-of-sequels, and union the results
• What about Friday the 13th (9 sequels)?
• In general we need to be able to write an arbitrarily large union…
• The relational algebra needs to be extended to handle these sorts of queries
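To see what the missing extension has to do, here is a sketch of ‘isFollowedBy’ as a transitive closure, computed by repeating the join above until nothing new appears. The relations are plain Python sets of (movie, sequel) pairs, and the helper names are hypothetical. The `while` loop is the part the core algebra lacks: a fixed expression performs a fixed number of joins, whereas this loop runs as many rounds as the longest chain of sequels requires.

```python
SequelOf = {
    ("Naked Gun", "Naked Gun 2½"),
    ("Naked Gun 2½", "Naked Gun 33 1/3"),
    ("Rocky", "Rocky II"),
    ("Rocky II", "Rocky III"),
    ("Rocky III", "Rocky IV"),
    ("Rocky IV", "Rocky V"),
}


def compose(r, s):
    """pi_{fst,thd}(r(fst, snd) join s(snd, thd)): one join step."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_followed_by(sequel_of):
    """Transitive closure: keep adding longer chains until a fixpoint is reached."""
    closure = set(sequel_of)
    while True:
        bigger = closure | compose(closure, sequel_of)
        if bigger == closure:
            return closure
        closure = bigger


print(len(compose(SequelOf, SequelOf)))  # 4 sequels-of-sequels
follows = is_followed_by(SequelOf)
print(("Rocky", "Rocky V") in follows, len(follows))  # True 13
```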
Summary
You should now understand:
• The core relational algebra
  – Operations and semantics
  – Union compatibility
• Computational limitations of the relational algebra
Next lecture: Relational calculus