Relational Database Systems 1 · 2018. 10. 24. · relational algebra. Key to understanding SQL and...
Topics: The Relational Algebra
Relational Database Systems 1
Dr. Eng. Ramez Alkhatib
Relational Data Manipulation Languages
• A variety of languages are used by relational database management systems:
– Procedural languages: the user tells the system how to manipulate the data, e.g. Relational Algebra
– Declarative languages: the user states what data is needed but not exactly how it is to be located, e.g. Relational Calculus and SQL
– Graphical languages: the user gives an example or an illustration of what data should be found, e.g. QBE
𝝈 𝝅 ⋈
Queries and query languages
• Query: A question about the data in a database.
• Query: A statement requesting the retrieval of information from a database.
• Example: Find the names of students who are taking DB1.
• Query language: a language in which queries are expressed.
• Query languages versus programming languages:
– Query languages are not intended to be used for complex calculations.
– Query languages support easy and efficient access to large data sets.
Relational Algebra
• Relational Algebra is a set of 6 operators that act on tables to produce tables.
• Just as we operate on numbers with arithmetic, we operate on tables with relational algebra.
• Key to understanding SQL and query processing and optimization.
– SQL is, roughly speaking, a generalization of relational algebra.
– Internal language: an SQL query is rewritten as a relational algebra expression, which can in turn be rewritten into a more efficient form and evaluated using well-developed algorithms.
Relational Algebra
• A set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output.
• Basic operations: using these we can build up sophisticated database queries.
– Projection
– Selection
– Union
– Difference
– Product
– Renaming
• Additional operations: Intersection, Join, Division.
Preliminaries
• Review: a relational database is a collection of relations (tables).
• A query is applied to relation instances, and the result of a query is also a relation instance.
– Schemas of input relations for a query are fixed. The query will run regardless of the instances.
– The schema for the result of a given query is also fixed. It is determined by the query.
• Example schemas:
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer, teacher:string)
Enroll(sid:int, cid:string, grade:string)
Example Instances
Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Enroll
sid  cid  grade
1    501  A
2    502  A
3    501  C
3    502  B

Course
cid  cname  credits  teacher
501  db     6        slim
502  tc     6        haytham
Projection
• Given a list of column names A and a relation R.
• π_A(R): extracts the columns in A from the relation R.
• Example: π_{sid,gpa}(Student)

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

π_{sid,gpa}(Student)
sid  gpa
1    1.0
2    2.3
3    0.7

• The result of the projection can be visualized as a vertical partition of the relation into two relations.
• Questions:
– What is the schema of the result? Recall that Student has schema Student(sid:int, sname:string, gpa:real).
– What is the query (in English)?
Projection - continued
• Suppose the result of π_A(R) has duplicate values.
• Example: π_{gpa}(Student)

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

π_{gpa}(Student)
gpa
1.0
2.3
0.7

• In relational algebra, the answer is always a set (duplicates have to be eliminated).
• However, SQL and some other languages return, by default, a bag (duplicates are not eliminated).
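The set semantics of projection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modelled as a tuple of column names plus a set of row tuples, and the helper name `project` is hypothetical, not part of the lecture.

```python
# Assumption: a relation is (schema, rows), i.e. column names plus a set of tuples.
student_schema = ("sid", "sname", "gpa")
student_rows = {
    (1, "Dina", 1.0),
    (2, "Ahmed", 2.3),
    (3, "Maria", 0.7),
    (4, "Ali", 1.0),
}


def project(schema, rows, columns):
    """Sketch of pi_columns(R): keep the named columns only.

    Building a Python set removes duplicate rows, as relational algebra requires.
    """
    positions = [schema.index(c) for c in columns]
    return tuple(columns), {tuple(row[i] for i in positions) for row in rows}


gpa_schema, gpa_rows = project(student_schema, student_rows, ["gpa"])
print(gpa_schema, sorted(gpa_rows))
```

Student has four rows, but the projection on gpa has only three, because Dina and Ali share the value 1.0. Collecting the rows in a list instead of a set would mimic the bag behaviour that SQL shows by default.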
Selection
• Given a condition C and a relation R.
• σ_C(R): extracts those rows from the relation R that satisfy C.
• Example: σ_{gpa ≤ 2.0}(Student)

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

σ_{gpa ≤ 2.0}(Student)
sid  sname  gpa
1    Dina   1.0
3    Maria  0.7
4    Ali    1.0

• The result of the selection can be visualized as a horizontal partition of the relation into two sets of tuples.
• Questions:
– What is the schema of the result? Recall that Student has schema Student(sid:int, sname:string, gpa:real).
– What is the query (in English)?
Selection – What can go into the condition?
• Condition C in σ_C(R) is built up from
– Comparison operations on the field names: <, ≤, =, ≠, ≥, >. Example: gpa ≤ 2.0, sname = Ali.
– Predicates constructed from these using ∧ (and), ∨ (or), ¬ (not).
• Question: What is the result of σ_{gpa ≤ 2.0 ∧ sname = Ali}(Student)?

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

σ_{gpa ≤ 2.0 ∧ sname = Ali}(Student)
sid  sname  gpa
4    Ali    1.0
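A selection can be sketched as a filter driven by a Boolean predicate. The code below is a minimal illustration, assuming relations are modelled as a schema tuple plus a set of row tuples; the helper name `select` and the use of Python lambdas for the condition C are choices made for this sketch.

```python
# Assumption: a relation is (schema, rows), i.e. column names plus a set of tuples.
student_schema = ("sid", "sname", "gpa")
student_rows = {
    (1, "Dina", 1.0),
    (2, "Ahmed", 2.3),
    (3, "Maria", 0.7),
    (4, "Ali", 1.0),
}


def select(schema, rows, condition):
    """Sketch of sigma_C(R): keep the rows for which the condition holds.

    The condition receives each row as a dict from column name to value.
    The schema of the result is the schema of the input.
    """
    return schema, {row for row in rows if condition(dict(zip(schema, row)))}


# sigma_{gpa <= 2.0}(Student)
_, low_gpa = select(student_schema, student_rows, lambda t: t["gpa"] <= 2.0)

# sigma_{gpa <= 2.0 AND sname = Ali}(Student)
_, low_gpa_ali = select(
    student_schema,
    student_rows,
    lambda t: t["gpa"] <= 2.0 and t["sname"] == "Ali",
)

print(sorted(low_gpa))
print(sorted(low_gpa_ali))
```

The connectives ∧, ∨ and ¬ map directly onto Python's `and`, `or` and `not` inside the predicate.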
Set Operations
• Set operations: S ∪ T, S − T, S ∩ T
– Union (S ∪ T): a relation that includes all tuples that are either in S or in T or in both S and T. Duplicate tuples are eliminated.
– Intersection (S ∩ T): a relation that includes all tuples that are in both S and T.
– Difference (S − T): a relation that includes all tuples that are in S but not in T.
• Condition: the operands of all these operations must be union-compatible (i.e., they must consist of the same attributes).
• Question:
– Recall Student and Course given above. Can we write Student ∪ Course?
– What is the schema of the result of a set operation?
Set Operations – Union
Student1
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 ∪ Student2
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0
5    Amira  1.0
Set Operations – Intersection
Student1
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 ∩ Student2
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
Set Operations – Difference
Student1
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
4    Ali    1.0

Student2
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7
5    Amira  1.0

Student1 − Student2
sid  sname  gpa
4    Ali    1.0

Student2 − Student1
sid  sname  gpa
5    Amira  1.0
Set Operations – Intersection
• In relational algebra, the basic set operations are union and set difference only.
• We can implement the other set operations using those basic operations.
• For example, for any relations S and T, we can already express S ∩ T:
S ∩ T = S − (S − T)
• It is mathematically nice to have fewer operators; however, evaluating an intersection through two set differences may be less efficient than a dedicated intersection operator.
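The identity S ∩ T = S − (S − T) can be checked on the Student1 and Student2 instances with Python's built-in sets. This is a small sketch, assuming each relation is modelled as a set of (sid, sname, gpa) tuples; the variable names are chosen for this illustration.

```python
# Assumption: union-compatible relations are modelled as sets of (sid, sname, gpa) tuples.
student1 = {
    (1, "Dina", 1.0),
    (2, "Ahmed", 2.3),
    (3, "Maria", 0.7),
    (4, "Ali", 1.0),
}
student2 = {
    (1, "Dina", 1.0),
    (2, "Ahmed", 2.3),
    (3, "Maria", 0.7),
    (5, "Amira", 1.0),
}

union = student1 | student2          # S ∪ T, duplicates collapse automatically
difference = student1 - student2     # S − T
intersection = student1 & student2   # S ∩ T computed directly

# The same intersection expressed with the basic difference operator only.
derived_intersection = student1 - (student1 - student2)

print(sorted(union))
print(sorted(difference))
print(derived_intersection == intersection)
```

The derived form evaluates two differences where the direct form needs a single pass, which is one way to read the efficiency remark above.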
Product
• Product S × T connects two relations S and T that are not necessarily union-compatible.

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Course
cid  cname  credits  teacher
501  db     6        slim
502  tc     6        haytham

Student × Course
sid  sname  gpa  cid  cname  credits  teacher
1    Dina   1.0  501  db     6        slim
2    Ahmed  2.3  501  db     6        slim
3    Maria  0.7  501  db     6        slim
1    Dina   1.0  502  tc     6        haytham
2    Ahmed  2.3  502  tc     6        haytham
3    Maria  0.7  502  tc     6        haytham
Cartesian Product S × T
• Each row of S is paired with each row of T .
• Schema of the result has one field per field of S and T .
• Example: The schema of Student × Course is
(sid, sname, gpa, cid, cname, credits, teacher)
• Questions:
– What is the primary key of S × T in general? Answer: the primary key of S together with the primary key of T.
– Cardinality: Suppose that S has n rows and T has m rows. What is the cardinality of S × T? Answer: n × m
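The schema and cardinality claims for the product can be checked with a short Python sketch. It assumes relations are modelled as a schema tuple plus a set of row tuples; the helper name `cartesian_product` is hypothetical.

```python
from itertools import product

# Assumption: a relation is (schema, rows), i.e. column names plus a set of tuples.
student_schema = ("sid", "sname", "gpa")
student_rows = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7)}

course_schema = ("cid", "cname", "credits", "teacher")
course_rows = {("501", "db", 6, "slim"), ("502", "tc", 6, "haytham")}


def cartesian_product(s_schema, s_rows, t_schema, t_rows):
    """Sketch of S x T: pair every row of S with every row of T."""
    schema = s_schema + t_schema
    rows = {s + t for s, t in product(s_rows, t_rows)}
    return schema, rows


schema, rows = cartesian_product(
    student_schema, student_rows, course_schema, course_rows
)
print(schema)
print(len(rows))  # n x m rows for inputs with n and m rows
```

With 3 Student rows and 2 Course rows the result has 3 × 2 = 6 rows and one field per field of the inputs.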
Product - continued
• What happens when we form a product of two relations with columns having the same name?

Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Enroll
sid  cid  grade
1    501  A
2    502  A

• Conventions vary among systems; a common answer is to suffix the clashing attribute names with 1 and 2:

Student × Enroll
sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A
2      Ahmed  2.3  1      501  A
3      Maria  0.7  1      501  A
1      Dina   1.0  2      502  A
...    ...    ...  ...    ...  ...
Product - continued
• Products are rarely used alone; they are typically used in conjunction with a selection.
• Example: σ_{sid:1 = sid:2 ∧ cid = 501}(Student × Enroll)

sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A

• What does this query do (in English)?
• Suppose we want to find the names and grades of students who are taking 501. How do we write the query?
π_{sname, grade}(σ_{sid:1 = sid:2 ∧ cid = 501}(Student × Enroll))

sname  grade
Dina   A
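The product, selection and projection steps of this query can be traced one at a time in Python. This is a sketch under stated assumptions: relations are sets of tuples, columns are addressed by position, and cid values are strings as in the Enroll schema.

```python
from itertools import product

# Student(sid, sname, gpa) and Enroll(sid, cid, grade) as sets of tuples.
student = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7)}
enroll = {(1, "501", "A"), (2, "502", "A")}

# Step 1: Student x Enroll, positions are (sid:1, sname, gpa, sid:2, cid, grade).
pairs = {s + e for s, e in product(student, enroll)}

# Step 2: selection with condition sid:1 = sid:2 AND cid = 501.
selected = {t for t in pairs if t[0] == t[3] and t[4] == "501"}

# Step 3: projection on (sname, grade).
result = {(t[1], t[5]) for t in selected}

print(len(pairs))       # size of the intermediate product
print(sorted(result))
```

The intermediate product holds six rows even though only one survives the selection, which motivates the join operators introduced next.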
Joins – Conditional Join
• The combination of a selection and a product is so common that it has a special symbol and name.
S ⋈_C T is defined to be σ_C(S × T)
• Example: Student ⋈_{sid:1 = sid:2} Enroll is

sid:1  sname  gpa  sid:2  cid  grade
1      Dina   1.0  1      501  A
2      Ahmed  2.3  2      502  A

• Questions:
– What is the result schema? Assume S(A1, ..., An) and T(B1, ..., Bm); the join S ⋈_C T results in a relation with the attributes (A1, ..., An, B1, ..., Bm).
– Conditional join is in general more efficient than cross product. Why?
• The condition C in a conditional join is usually an equality or a conjunction of equalities (EquiJoin).
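One possible answer to the efficiency question is that an equijoin never has to materialize all n × m pairs. The sketch below uses a hash index on the join column, which is one common evaluation technique and not necessarily the one intended by the lecture; the function names and the positional column arguments are assumptions of this illustration.

```python
from collections import defaultdict
from itertools import product


def equijoin_via_product(left, right, left_pos, right_pos):
    """Definition from the slide: a selection applied to the full product."""
    return {l + r for l, r in product(left, right) if l[left_pos] == r[right_pos]}


def equijoin_via_hash(left, right, left_pos, right_pos):
    """Sketch of a hash join: index one input, then probe it with the other.

    Only matching pairs are ever constructed.
    """
    index = defaultdict(list)
    for r in right:
        index[r[right_pos]].append(r)
    return {l + r for l in left for r in index.get(l[left_pos], [])}


student = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7)}
enroll = {(1, "501", "A"), (2, "502", "A")}

# Student joined with Enroll on sid:1 = sid:2 (position 0 in both relations).
by_product = equijoin_via_product(student, enroll, 0, 0)
by_hash = equijoin_via_hash(student, enroll, 0, 0)

print(sorted(by_hash))
print(by_product == by_hash)
```

Both functions return the same relation; the hash variant does work roughly proportional to the input sizes plus the output size instead of n × m comparisons.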
Natural Join
• S ⋈ T: special case of conditional join, equality on common fields of S and T
– Equality condition only
– On all common fields
– Leave only one copy of these fields in the resulting relation.
• Example:

S
A  B  C
1  2  a
1  2  b
1  3  c
2  1  g

T
A  B  D
1  2  d
1  2  e
1  4  d

S ⋈ T
A  B  C  D
1  2  a  d
1  2  a  e
1  2  b  d
1  2  b  e

• Question: What if S and T have no fields in common?
Answer: Cartesian Product
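The three rules of the natural join (equality only, on all common fields, one copy kept) can be sketched as follows. The representation of a relation as a schema tuple plus a set of row tuples and the helper name `natural_join` are assumptions of this illustration.

```python
def natural_join(s_schema, s_rows, t_schema, t_rows):
    """Sketch of the natural join of S and T.

    Rows are paired when they agree on every common column; the common
    columns appear once in the result, followed by T's remaining columns.
    """
    common = [c for c in s_schema if c in t_schema]
    t_extra = [c for c in t_schema if c not in common]
    out_schema = tuple(s_schema) + tuple(t_extra)
    out_rows = set()
    for s in s_rows:
        s_named = dict(zip(s_schema, s))
        for t in t_rows:
            t_named = dict(zip(t_schema, t))
            # With no common columns, all() over an empty list is True,
            # so every pair is kept: the Cartesian product.
            if all(s_named[c] == t_named[c] for c in common):
                out_rows.add(s + tuple(t_named[c] for c in t_extra))
    return out_schema, out_rows


s_schema = ("A", "B", "C")
s_rows = {(1, 2, "a"), (1, 2, "b"), (1, 3, "c"), (2, 1, "g")}
t_schema = ("A", "B", "D")
t_rows = {(1, 2, "d"), (1, 2, "e"), (1, 4, "d")}

joined_schema, joined_rows = natural_join(s_schema, s_rows, t_schema, t_rows)
print(joined_schema)
print(sorted(joined_rows))
```

The nested loop is the simplest possible evaluation strategy and is used here only for clarity.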
Natural Join – Example
Student
sid  sname  gpa
1    Dina   1.0
2    Ahmed  2.3
3    Maria  0.7

Enroll
sid  cid  grade
1    501  A
2    502  A

Student ⋈ Enroll
sid  sname  gpa  cid  grade
1    Dina   1.0  501  A
2    Ahmed  2.3  502  A
Queries – Example (I)
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer, teacher:string)
Enroll(sid:int, cid:string, grade:string)
1. Find the names of the students:
π_{sname}(Student)
2. Find the courses taught by Slim:
σ_{teacher = Slim}(Course)
3. Find the titles of courses taught by Slim:
π_{cname}(σ_{teacher = Slim}(Course))
• These queries involve a single relation: unary operations.
• The result of a query is also a relation and can therefore be used as input to another query.
Queries – Example (II)
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer, teacher:string)
Enroll(sid:int, cid:string, grade:string)
Find the names of students who are taking 501.
• Two relations: use (natural) join or product
• Fields: projection
• Condition: selection
Solutions:
• π_{sname}(σ_{cid = 501}(Student ⋈ Enroll))
• π_{sname}(Student ⋈ σ_{cid = 501}(Enroll))
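Both solutions return the same relation, but they build intermediate results of different sizes. The sketch below evaluates each plan on the example instances; the helper `join_on_sid` is a hypothetical, hard-coded natural join of Student and Enroll on sid, written only for this illustration.

```python
# Example instances: Student(sid, sname, gpa) and Enroll(sid, cid, grade).
student = {(1, "Dina", 1.0), (2, "Ahmed", 2.3), (3, "Maria", 0.7), (4, "Ali", 1.0)}
enroll = {(1, "501", "A"), (2, "502", "A"), (3, "501", "C"), (3, "502", "B")}


def join_on_sid(students, enrolls):
    """Natural join on sid; result columns are (sid, sname, gpa, cid, grade)."""
    return {
        (sid, sname, gpa, cid, grade)
        for (sid, sname, gpa) in students
        for (enroll_sid, cid, grade) in enrolls
        if sid == enroll_sid
    }


# Plan A: join first, then select cid = 501, then project sname.
joined_all = join_on_sid(student, enroll)
plan_a = {(t[1],) for t in joined_all if t[3] == "501"}

# Plan B: select cid = 501 on Enroll first, then join, then project sname.
enroll_501 = {e for e in enroll if e[1] == "501"}
joined_501 = join_on_sid(student, enroll_501)
plan_b = {(t[1],) for t in joined_501}

print(sorted(plan_a), len(joined_all))
print(sorted(plan_b), len(joined_501))
```

Applying the selection before the join shrinks the join input, which is the kind of rewriting a query optimizer performs on relational algebra expressions.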
Renaming – Another Operator
• It is simpler to break down a complex sequence of operations by specifying intermediate result relations.
• Example:
π_{sname}(Student ⋈ σ_{cid = 501}(Enroll))
is equivalent to
– temp1 ← σ_{cid = 501}(Enroll)
– temp2 ← Student ⋈ temp1
– Result ← π_{sname}(temp2)
• The same technique can be used to rename the attributes in the intermediate and result relations.
• Example:
R(firstName) ← π_{sname}(temp2)
Renaming – Another Operator
• The general Rename operation, when applied to a relation R of degree n, is denoted by one of the following three forms:
– ρ_{S(B1,...,Bn)}(R): renames both the relation and the attributes
– ρ_S(R): renames the relation only
– ρ_{(B1,...,Bn)}(R): renames the attributes only
• ρ denotes the rename operator
• S is the new relation name
• B1, ..., Bn are the new attribute names
• If the attributes of R are A1, ..., An in that order, then each Ai is renamed as Bi.
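Renaming changes only the schema, never the rows. A minimal sketch, assuming a relation is a schema tuple plus a set of row tuples and using a hypothetical helper `rename` for the attribute-only form:

```python
def rename(schema, rows, new_names):
    """Sketch of the attribute-only rename: the i-th attribute gets the i-th new name."""
    if len(new_names) != len(schema):
        raise ValueError("need exactly one new name per attribute")
    return tuple(new_names), set(rows)


# R(firstName) <- pi_sname(temp2), with temp2 already projected on sname.
projected_schema = ("sname",)
projected_rows = {("Dina",), ("Maria",)}

r_schema, r_rows = rename(projected_schema, projected_rows, ["firstName"])
print(r_schema, sorted(r_rows))
```

Because attributes are matched by position, the degree of the relation must equal the number of new names.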
Queries - Example (III)
Division - Example
R
A   B
a1  b1
a2  b1
a3  b1
a4  b1
a1  b2
a3  b2
a2  b3
a3  b3
a4  b3
a1  b4
a2  b4
a3  b4

S
B
b1
b4

T = R/S
A
a1
a2
a3
Division - Another Operator
Find the sids of students who are taking all courses.
π_{sid,cid}(Enroll) / π_{cid}(Course)
In general: R/S
• The schema of S must be a proper subset of the schema of R, e.g. {cid} ⊂ {cid, sid}.
• The schema of the result is the set difference of the schema of R and the schema of S.
• For every tuple t in the result and every tuple s in S, the tuple ts (t appended onto s) is in the first relation R.
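Division can be sketched directly from the last rule, and it can also be expressed with the basic operators as R/S = π_A(R) − π_A((π_A(R) × S) − R). The code below checks both forms on the R and S instances of the division example; the binary shape R(A, B), S(B) and the helper name `divide` are assumptions of this sketch.

```python
# R(A, B) and S(B) from the division example, as sets of tuples.
r = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
s = {("b1",), ("b4",)}


def divide(r_rows, s_rows):
    """Sketch of R/S: the A values that occur in R with every B value of S."""
    candidates = {a for (a, _) in r_rows}
    return {(a,) for a in candidates if all((a, b) in r_rows for (b,) in s_rows)}


direct = divide(r, s)

# The same result using only projection, product and difference.
all_a = {(a,) for (a, _) in r}                            # pi_A(R)
missing = {(a, b) for (a,) in all_a for (b,) in s} - r    # (pi_A(R) x S) - R
from_basic_operators = all_a - {(a,) for (a, _) in missing}

print(sorted(direct))
print(direct == from_basic_operators)
```

The value a4 is excluded because (a4, b4) is not in R, so a4 is not paired with every tuple of S.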
Division – Example
What We Cannot Compute with Relational Algebra?
• Arithmetic operations, e.g. 3 + 3.
• Aggregates, e.g. the number of students who are taking CSEN501, or the average GPA of all students.
In SQL, these are possible: SQL has numerous extensions to relational algebra.
• Recursive queries: given a relation parent(), compute the ancestors.
These are not possible in basic SQL either, although SQL:1999 added recursive queries (WITH RECURSIVE).
• Complex structures, e.g. lists, arrays, nested relations, ...
Classic SQL cannot handle complex structures either, but they are possible in object-oriented data models and query languages.
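The ancestor example shows why recursion falls outside relational algebra: each step is an ordinary join, projection and union, but the number of steps depends on the data. A minimal sketch, assuming a hypothetical parent relation of (parent, child) pairs with made-up names:

```python
def ancestors(parent):
    """Sketch of a fixpoint computation of the ancestor relation.

    Each round joins the current result with parent and adds the new pairs;
    the loop stops when a round adds nothing.
    """
    closure = set(parent)
    while True:
        new_pairs = {
            (a, c)
            for (a, b) in closure
            for (b2, c) in parent
            if b == b2
        }
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical data: ann is a parent of bob, bob of cat, cat of dan.
parent = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
print(sorted(ancestors(parent)))
```

A single relational algebra expression contains a fixed number of joins, so it can reach only a fixed number of generations, whereas the loop above keeps going until nothing new appears.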
Summary – What you should remember!
• What are query languages?
• Relational Algebra: A set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output.
• Basic Operations: projection, selection, union, difference, product, renaming
• Additional Operations: intersection, division, join (very useful)
• What we cannot do with relational algebra.
Translation of Relational Algebra Exp.
MOVIE(id, name not null, year, type, remark)
COUNTRY(movie, country not null)
π_{country, name}(COUNTRY ⋈_{COUNTRY.movie = MOVIE.id} σ_{year = 1893 ∧ type = 'cinema'}(MOVIE))
From which countries are the movies of the year 1893, and what are their names?
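One way to read this expression as SQL is sketched below with Python's built-in sqlite3 module. The column types, the sample rows ("Film A", "Country X" and so on) and the use of DISTINCT to mimic set semantics are assumptions made for this illustration; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MOVIE ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, year INTEGER, type TEXT, remark TEXT)"
)
conn.execute("CREATE TABLE COUNTRY (movie INTEGER, country TEXT NOT NULL)")

# Hypothetical sample rows, invented only to exercise the query.
conn.executemany(
    "INSERT INTO MOVIE VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Film A", 1893, "cinema", None),
        (2, "Film B", 1893, "tv", None),
        (3, "Film C", 1950, "cinema", None),
    ],
)
conn.executemany(
    "INSERT INTO COUNTRY VALUES (?, ?)",
    [(1, "Country X"), (2, "Country Y"), (3, "Country Z")],
)

# Projection -> SELECT list, join condition -> ON, selection -> WHERE.
rows = conn.execute(
    """
    SELECT DISTINCT COUNTRY.country, MOVIE.name
    FROM COUNTRY
    JOIN MOVIE ON COUNTRY.movie = MOVIE.id
    WHERE MOVIE.year = 1893 AND MOVIE.type = 'cinema'
    """
).fetchall()
conn.close()

print(rows)
```

The projection list becomes the SELECT clause, the join condition the ON clause, and the selection condition the WHERE clause.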