Chapter 7
Relational Algebra
Topics in this Chapter
• Closure Revisited
• The Original Algebra: Syntax and Semantics
• What is the Algebra For?
• Further Points and Additional Operators
• Grouping and Ungrouping
Relational Algebra
• The relational algebra is a collection of operators that take relations as their operands and return a relation as their result
• Eight operators, in two groups of four
• Union, intersect, difference, Cartesian product
• Restrict, project, join, divide
• The set of possible relational operators is essentially unlimited
• The operators are “read only”
Fig. 7.1 The original eight operators (overview)
Closure Revisited
• The output from any relation operator is another relation: the closure property
• Relation expressions can be nested (analogously to arithmetic expressions)
• Every relation has a head and a body; relational algebra must address both
• Attribute type inference must be supported
• RENAME changes the name of an attribute without changing its type or content
RENAME
S RENAME CITY AS SCITY
+------+-------+--------+--------+
| S#   | SNAME | STATUS | SCITY  |
+------+-------+--------+--------+
| S1   | Smith |     20 | London |
| S2   | Jones |     10 | Paris  |
| S3   | Blake |     30 | Paris  |
| S4   | Clark |     20 | London |
| S5   | Adams |     30 | Athens |
+------+-------+--------+--------+
RENAME
expression: (S RENAME CITY AS SCITY)
value: the relation with attributes S#, SNAME, STATUS, SCITY shown in the table above
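As a rough illustration of RENAME, the sketch below models a relation as a Python set of tuples, each tuple being a frozenset of (attribute, value) pairs. The helper names `relation` and `rename` are hypothetical, and only two of the supplier tuples are used to keep the example short.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def rename(rel, old, new):
    """Return a new relation in which attribute `old` is called `new`; values are untouched."""
    return {frozenset((new if a == old else a, v) for a, v in t) for t in rel}


S = relation(
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
)

# S RENAME CITY AS SCITY: the expression yields a new value, S itself is not changed
renamed = rename(S, "CITY", "SCITY")
for t in sorted(renamed, key=lambda t: dict(t)["S#"]):
    print(dict(t))
```

Because `rename` builds a new set, the result can be fed straight into another operator, which is the closure property in miniature.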
The Syntax of the Original Algebra
The algebra exists because of the nature and definition of relations. The algebra is independent of its description.
Note that the Date text defines the algebra using words rather than symbols.
Most texts use symbols to describe the syntax of the algebra. Some who prefer symbols say that the notation “looks more scientific.”
More about the symbols later.
The Syntax of the Original Algebra
BNF grammar for the relational algebra:
• ::= means “is defined as”
• < > indicate category names
• | means “or”
• […] indicate something optional
• Upper case words such as WHERE are elements of the language
• { and } are symbols in the language, not BNF
• “commalist” is used for repetition
The Syntax of the Original Algebra
• Each operator operates on one or more relations and returns a relation
• The result is a new relation value derived from the operand relations; the operands themselves are not altered
• Generically:
<relation expression> ::= RELATION { <tuple expression commalist> }
The Syntax of the Original Algebra –General Format
<relation expression>
::= RELATION {<tuple expression commalist>}
| <relvar name>
| <relation operator invocation>
| <with expression>
| <introduced name>
| ( <relation expression>)
<relation operator invocation> ::= <project> | <nonproject>
<project> ::= <relation expression>
{ [ ALL BUT ] <attribute name commalist> }
(The <relation expression> must not be a <nonproject>)
<nonproject> ::= <rename> | <union> | <intersect> | <minus>
| <times> | <where> | <join> | <divide>
<rename> ::= <relation expression> RENAME <renaming commalist>
(The <relation expression> must not be a <nonproject>)
<union> ::= <relation expression> UNION <relation expression>
(The <relation expression>s must not be <nonproject>s,
except either or both can be another <union>)
<intersect> ::= <relation expression> INTERSECT <relation expression>
(The <relation expression>s must not be <nonproject>s)
(Except either or both can be another <intersect>)
<minus> ::= <relation expression> MINUS <relation expression>
(The <relation expression>s must not be <nonproject>s)
<times> ::= <relation expression> TIMES <relation expression>
(The <relation expression>s must not be <nonproject>s)
(Except either or both can be another <times>)
<where> ::= <relation expression> WHERE <boolean expression>
(The <relation expression> must not be a <nonproject>)
<join> ::= <relation expression> JOIN <relation expression>
(The <relation expression>s must not be <nonproject>s)
(Except either or both can be another <join>)
<divide> ::= <relation expression>
DIVIDEBY <relation expression> PER <per>
(The <relation expression>s must not be <nonproject>s)
<per> ::= <relation expression>
| (<relation expression>, <relation expression> )
(The <relation expression>s must not be <nonproject>s)
<with expression> ::= WITH <name intro commalist> : <expression>
<name intro> ::= <expression> AS <introduced name>
Semantics of the Original Algebra –Union
• Union operates on two sets and returns a set that contains all elements belonging to either
• Both sets must be of the same type - formerly known as union compatibility
• Relations cannot have duplicate tuples; we say loosely that UNION “eliminates duplicates”
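A minimal sketch of UNION, assuming relations are modelled as Python sets of tuples (each tuple a frozenset of attribute/value pairs). The helpers `relation`, `attribute_names`, and `union` are hypothetical, and the "same type" check is simplified to comparing attribute names only.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def attribute_names(rel):
    """Collect the attribute-name set of every tuple (one element for a well-formed relation)."""
    return {frozenset(a for a, _ in t) for t in rel}


def union(a, b):
    """a UNION b. Simplification: 'same type' is checked on attribute names only."""
    if a and b and attribute_names(a) != attribute_names(b):
        raise TypeError("UNION operands must be of the same type")
    return a | b


in_london = relation({"S#": "S1"}, {"S#": "S4"})    # illustrative data
supplies_p1 = relation({"S#": "S1"}, {"S#": "S2"})  # illustrative data

result = union(in_london, supplies_p1)
print(sorted(dict(t)["S#"] for t in result))  # S1 is in both operands but appears once
```

Since the result is a set, the "elimination of duplicates" needs no extra step here.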
Semantics of the Original Algebra –Intersect and Difference
• Intersect operates on two sets and returns a set that contains all tuples belonging to both
• Difference operates on two sets and returns a set containing all tuples occurring in the first but not the second, using MINUS
• For both Intersect and Difference, the sets operated upon must be of the same type - formerly known as union compatibility
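INTERSECT and MINUS can be sketched the same way, assuming relations are Python sets of frozenset tuples. The helper names and the sample data are illustrative only; the type check is omitted for brevity.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def intersect(a, b):
    """a INTERSECT b: tuples belonging to both operands."""
    return a & b


def minus(a, b):
    """a MINUS b: tuples in the first operand that are not in the second."""
    return a - b


def supplier_numbers(rel):
    return sorted(dict(t)["S#"] for t in rel)


in_london = relation({"S#": "S1"}, {"S#": "S4"})
supplies_p1 = relation({"S#": "S1"}, {"S#": "S2"})

print(supplier_numbers(intersect(in_london, supplies_p1)))
print(supplier_numbers(minus(in_london, supplies_p1)))
print(supplier_numbers(minus(supplies_p1, in_london)))  # operand order matters for MINUS
```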
Semantics of the Original Algebra –Cartesian Product
• A Cartesian Product is the set of all ordered pairs such that in each pair, the first element comes from the first set, and the second element comes from the second set
• However, since the result of a relational operator is a relation, the result of each pair is a single tuple containing all the elements of both of the source tuples
• Uses keyword TIMES
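A small sketch of TIMES under the same set-of-frozensets model. The helper names are hypothetical; the check that the operands share no attribute names is included so that each merged pair stays a well-formed tuple.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def attribute_names(rel):
    return {a for t in rel for a, _ in t}


def times(a, b):
    """a TIMES b: one merged tuple for every pairing of a tuple from a with a tuple from b."""
    if attribute_names(a) & attribute_names(b):
        raise ValueError("TIMES operands must have no attribute names in common")
    return {ta | tb for ta in a for tb in b}


suppliers = relation({"S#": "S1"}, {"S#": "S2"})
parts = relation({"P#": "P1"}, {"P#": "P2"}, {"P#": "P3"})

product = times(suppliers, parts)
print(len(product))  # 2 suppliers x 3 parts
```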
Semantics of the Original Algebra –Restrict
• Yields a horizontal subset – a/k/a “SELECT”
• a WHERE p
• p is called the restriction condition
• p is a predicate, and returns boolean
• If it can be evaluated by examining a single tuple it is simple; otherwise it is nonsimple
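Restriction with a simple condition can be sketched as a filter over tuples. This assumes the set-of-frozensets model of a relation; `relation` and `restrict` are hypothetical helper names, and the predicate is passed as an ordinary Python function.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def restrict(rel, predicate):
    """rel WHERE predicate: keep the tuples for which the predicate evaluates to True."""
    return {t for t in rel if predicate(dict(t))}


S = relation(
    {"S#": "S1", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "STATUS": 30, "CITY": "Athens"},
)

# S WHERE CITY = 'London': the condition looks at one tuple at a time, so it is simple
londoners = restrict(S, lambda t: t["CITY"] == "London")
print(sorted(dict(t)["S#"] for t in londoners))
```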
Semantics of the Original Algebra –Project
• Yields a vertical subset
• The general form is a commalist of attributes to be kept in the result
• For the attributes kept, every tuple contributes to the result, but duplicate tuples are eliminated
• An alternative specification is to name the attributes to be excluded:
• P { ALL BUT WEIGHT }
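A sketch of projection in both forms, assuming relations are Python sets of frozenset tuples. `project` and `project_all_but` are hypothetical names; the supplier relation is used as sample data instead of P.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(rel, kept):
    """rel { kept }: keep only the named attributes; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in kept) for t in rel}


def project_all_but(rel, excluded):
    """rel { ALL BUT excluded }: keep every attribute except the named ones."""
    return {frozenset((a, v) for a, v in t if a not in excluded) for t in rel}


S = relation(
    {"S#": "S1", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "STATUS": 30, "CITY": "Athens"},
)

cities = project(S, {"CITY"})
print(sorted(dict(t)["CITY"] for t in cities))  # five suppliers, three distinct cities
no_status = project_all_but(S, {"STATUS"})
print(len(no_status))
```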
Semantics of the Original Algebra –Join – Natural Join
• When unqualified, join means “natural join”
• For any two relations with at least one matching attribute (same name and type), the join operator returns a relation containing one tuple, with all the attributes of both, for each pair of tuples that agree on the matching attributes
• Attributes that do not match from each source relation are retained
• If no attributes match, result is a Cartesian product
• If all attributes match, result is an Intersect
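The natural join and its two degenerate cases can be sketched as follows, assuming relations are Python sets of frozenset tuples. `relation` and `join` are hypothetical helpers and the data is illustrative.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def join(a, b):
    """a JOIN b (natural join): merge every pair of tuples that agree on all common attributes."""
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            common = da.keys() & db.keys()
            if all(da[k] == db[k] for k in common):
                result.add(frozenset({**da, **db}.items()))
    return result


S = relation(
    {"S#": "S1", "CITY": "London"},
    {"S#": "S2", "CITY": "Paris"},
    {"S#": "S3", "CITY": "Paris"},
)
SP = relation(
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P2"},
)

joined = join(S, SP)                            # common attribute: S#
print(len(joined))
print(join(S, S) == S)                          # all attributes common: behaves like INTERSECT
print(len(join(S, relation({"P#": "P1"}))))     # no common attributes: behaves like TIMES
```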
Semantics of the Original Algebra –Join – Theta Join
• Used to join relations on the basis of a comparison between attribute values other than equality
• Given relations a and b, and attributes X (of a) and Y (of b), this can be expressed as follows:
• (a TIMES b) WHERE X theta Y
• When theta is = (an equijoin), the result can be turned into the natural join: project away the duplicate attribute and rename the kept one
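A sketch of the theta join as a product followed by a restriction, assuming the set-of-frozensets model. The name `theta_join`, the attribute names SCITY and PCITY, and the data are illustrative; the comparison operator is passed in from Python's `operator` module.

```python
import operator


def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def theta_join(a, b, x, theta, y):
    """(a TIMES b) WHERE x theta y. Assumes a and b share no attribute names."""
    product = {ta | tb for ta in a for tb in b}
    return {t for t in product if theta(dict(t)[x], dict(t)[y])}


suppliers = relation(
    {"S#": "S1", "SCITY": "London"},
    {"S#": "S2", "SCITY": "Paris"},
    {"S#": "S5", "SCITY": "Athens"},
)
parts = relation(
    {"P#": "P1", "PCITY": "London"},
    {"P#": "P2", "PCITY": "Paris"},
)

# Greater-than join: supplier city follows part city in alphabetical order
after = theta_join(suppliers, parts, "SCITY", operator.gt, "PCITY")
# Equijoin: same operator with theta set to equality; both city attributes are kept
same_city = theta_join(suppliers, parts, "SCITY", operator.eq, "PCITY")
print(len(after), len(same_city))
```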
Semantics of the Original Algebra –Divide
• Used to “divide one relation into another”
• Small Divide uses one relation expression as mediator (the PER operand), Great Divide uses two
• For small divide:
• a DIVIDEBY b PER c
• where a is the dividend, b is the divisor, and c is the mediator
• Used to determine which tuples of a are related, through c, to every tuple of b
Semantics of the Original Algebra –Divide - Example
• Let S be a relation of suppliers, P one of parts, and SP the mediator
• S JOIN ( S {S#} DIVIDEBY P {P#}
PER SP {S#, P#} )
• Returns only those suppliers who supply all parts
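The small divide can be sketched in a few lines, assuming relations are Python sets of frozenset tuples and that the mediator's attributes are exactly those of the dividend plus those of the divisor. The helper names and the data are hypothetical.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def divide(dividend, divisor, mediator):
    """dividend DIVIDEBY divisor PER mediator (small divide).

    Keeps each dividend tuple that is paired, in the mediator, with every divisor tuple.
    """
    return {ta for ta in dividend if all((ta | tb) in mediator for tb in divisor)}


suppliers = relation({"S#": "S1"}, {"S#": "S2"}, {"S#": "S3"})  # S {S#}
parts = relation({"P#": "P1"}, {"P#": "P2"})                   # P {P#}
shipments = relation(                                           # SP {S#, P#}
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P1"},
    {"S#": "S3", "P#": "P2"},
)

supplies_all = divide(suppliers, parts, shipments)
print(sorted(dict(t)["S#"] for t in supplies_all))  # only S1 supplies both parts
```

Joining `supplies_all` back to the full supplier relation, as in the example above, would then recover the remaining supplier attributes.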
Examples
Get supplier names for suppliers who supply part P2.
In SQL:
SELECT SNAME
FROM S
WHERE S# IN
(SELECT S# FROM SP WHERE P# = ‘P2’);
In relational algebra:
( ( SP JOIN S ) WHERE P# = P# (‘P2’) ) { SNAME }
Get supplier names for suppliers who supply at least one red part.
SELECT SNAME
FROM S
WHERE S# IN
(SELECT S# FROM SP WHERE P# IN
(SELECT P# FROM P WHERE COLOR = ‘RED’) );
( ( ( P WHERE COLOR = COLOR (‘RED’) ) { P# } JOIN SP ) { S# }
JOIN S ) {SNAME}
Get supplier names for suppliers who do not supply part P2.
SELECT SNAME
FROM S
WHERE NOT EXISTS
( SELECT S#
FROM SP
WHERE S# = S.S#
AND P# = ‘P2’ ) ;
( ( S {S#} MINUS ( SP WHERE P# = ‘P2’ ) { S# } )
JOIN S ) { SNAME }
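To make the nesting in the last expression concrete, the sketch below evaluates it step by step. It assumes relations are modelled as Python sets of frozenset tuples; the helpers `relation`, `restrict`, `project`, and `join` are hypothetical, and the three suppliers and three shipments are illustrative data only.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def restrict(rel, predicate):
    return {t for t in rel if predicate(dict(t))}


def project(rel, kept):
    return {frozenset((a, v) for a, v in t if a in kept) for t in rel}


def join(a, b):
    """Natural join: merge tuple pairs that agree on all common attributes."""
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(frozenset({**da, **db}.items()))
    return result


S = relation(
    {"S#": "S1", "SNAME": "Smith"},
    {"S#": "S3", "SNAME": "Blake"},
    {"S#": "S5", "SNAME": "Adams"},
)
SP = relation(
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S3", "P#": "P1"},
)

# ( ( S {S#} MINUS ( SP WHERE P# = 'P2' ) {S#} ) JOIN S ) {SNAME}
supplies_p2 = project(restrict(SP, lambda t: t["P#"] == "P2"), {"S#"})
others = project(S, {"S#"}) - supplies_p2
answer = project(join(others, S), {"SNAME"})
print(sorted(dict(t)["SNAME"] for t in answer))
```

Each intermediate name holds a relation, so every step could equally be inlined into one nested expression, as in the algebraic form.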
Get all pairs of supplier numbers where the two suppliers are located in the same city.
SELECT FIRST.S#, SECOND.S#
FROM S FIRST, S SECOND
WHERE FIRST.CITY = SECOND.CITY
AND FIRST.S# < SECOND.S#;
( ( ( S RENAME S# AS FIRSTS# ) {FIRSTS#, CITY} JOIN
( S RENAME S# AS SECONDS# ) {SECONDS#, CITY} )
WHERE FIRSTS# < SECONDS# )
{ FIRSTS#, SECONDS# }
Fig. 7.8 Division Examples
The “Symbolic” Form
Names of Suppliers located in Paris:
π SNAME ( σ CITY = ‘Paris’ (S) )
(S WHERE CITY = CITY (‘Paris’) ){SNAME}
Names of Suppliers of part ‘P2’:
π SNAME ( σ P# = ‘P2’ (S ⋈ SP) )
((S JOIN SP) WHERE P# = ‘P2’) {SNAME}
Relational Algebra Symbols
Unary Operators
Selection σ
Projection π
Aggregate Function
Binary Operators
Union ∪
Intersection ∩
Difference −
Cartesian product ×
Theta Join ⋈θ
Natural Join * (or in some notations ⋈)
Left Outer Join ⟕
Right Outer Join ⟖
Full Outer Join ⟗
Outer Union *
Logic Symbols
Logical AND ∧
Logical OR ∨
Logical NOT ¬
What is the Algebra for?
• The purpose of the algebra is to allow the writing of relational expressions
• Applications of the algebra: retrieval, update, defining integrity constraints, derived relvars, stability and security
• An implemented language can be said to be relationally complete if it is at least as powerful as the algebra
The Original Algebra
• Many operators are associative: Union, intersect, times, join, but not minus
• Many operators are commutative: Union, intersect, times, join, but not minus
• Join, union, intersect were originally defined as dyadic, but are now seen to operate on any number of relations, including DEE and DUM
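Because JOIN is associative and commutative, it can be folded over any number of relations. The sketch below assumes relations are Python sets of frozenset tuples; `join_all` is a hypothetical name, DEE is modelled as the set holding one empty tuple, and DUM as the empty set (in this simplified model an empty relation carries no heading).

```python
from functools import reduce


def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def join(a, b):
    """Natural join: merge tuple pairs that agree on all common attributes."""
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(frozenset({**da, **db}.items()))
    return result


DEE = {frozenset()}  # no attributes, exactly one (empty) tuple
DUM = set()          # no attributes, no tuples


def join_all(relations):
    """n-adic JOIN; DEE is the identity, so joining zero relations yields DEE."""
    return reduce(join, relations, DEE)


A = relation({"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"})
B = relation({"S#": "S1", "P#": "P1"}, {"S#": "S2", "P#": "P2"})
C = relation({"P#": "P1", "COLOR": "Red"})

print(join_all([A, B, C]) == join_all([C, A, B]))  # operand order does not matter
print(join_all([]) == DEE, join_all([A]) == A)
```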
Additional Relational Operators
• Semijoin returns the tuples of one relation that have a counterpart in another (suppliers who supply a specific part number, for example)
• Semidifference is similar (Obtain suppliers who do not supply a particular part, e.g.)
• Extend adds an attribute dynamically, but does not alter the underlying relvar
• Summarize performs vertical or attribute-wise computations
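EXTEND and SUMMARIZE can be sketched as follows, assuming relations are Python sets of frozenset tuples. The helper names are hypothetical, the summary is limited to a sum grouped by attributes of the same relation (a simplification of the full operator), and the weight conversion factor and data are illustrative.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def extend(rel, name, compute):
    """Add attribute `name`, computed per tuple; the operand relation is left as it was."""
    return {t | {(name, compute(dict(t)))} for t in rel}


def summarize_sum(rel, per, attr, name):
    """Sum `attr` within each combination of the `per` attributes (simplified SUMMARIZE)."""
    totals = {}
    for t in rel:
        d = dict(t)
        key = frozenset((a, d[a]) for a in per)
        totals[key] = totals.get(key, 0) + d[attr]
    return {key | {(name, total)} for key, total in totals.items()}


P = relation({"P#": "P1", "WEIGHT": 12}, {"P#": "P2", "WEIGHT": 17})
SP = relation(
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
)

with_grams = extend(P, "GMWT", lambda t: t["WEIGHT"] * 454)  # assumed pounds-to-grams factor
total_qty = summarize_sum(SP, {"P#"}, "QTY", "TOTQTY")
print(sorted((dict(t)["P#"], dict(t)["TOTQTY"]) for t in total_qty))
```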
Semijoin
A SEMIJOIN B is equivalent to:
(A JOIN B) { X, Y }
The JOIN of A and B projected over the attributes of A.
The tuples of A that have “counterparts” in B.
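The definition translates directly into code: join, then project over the attributes of the first operand. This sketch assumes relations are Python sets of frozenset tuples; all helper names, including `semiminus` for the semidifference, are hypothetical.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(rel, kept):
    return {frozenset((a, v) for a, v in t if a in kept) for t in rel}


def join(a, b):
    """Natural join: merge tuple pairs that agree on all common attributes."""
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(frozenset({**da, **db}.items()))
    return result


def semijoin(a, b):
    """(a JOIN b) projected over the attributes of a."""
    return project(join(a, b), {name for t in a for name, _ in t})


def semiminus(a, b):
    """Semidifference: tuples of a that have no counterpart in b."""
    return a - semijoin(a, b)


S = relation(
    {"S#": "S1", "SNAME": "Smith"},
    {"S#": "S2", "SNAME": "Jones"},
    {"S#": "S5", "SNAME": "Adams"},
)
ships_p2 = relation({"S#": "S1", "P#": "P2"}, {"S#": "S2", "P#": "P2"})

print(sorted(dict(t)["SNAME"] for t in semijoin(S, ships_p2)))
print(sorted(dict(t)["SNAME"] for t in semiminus(S, ships_p2)))
```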
Grouping…
• Required because relations can have attributes that are themselves relations
• Provides a map between such relations and “flat” relations
• SP GROUP {P#, QTY} AS PQ
• Returns one tuple per supplier (S#, the attribute not named in the GROUP), with a relation-valued attribute PQ holding that supplier's part numbers and quantities
…and Ungrouping
• Returns the original relation
• In the example, the original SP relation
• If you group, you can always ungroup, but the converse is not necessarily true
• The converse fails when the relations being ungrouped were not validly grouped in the first place
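A sketch of GROUP and UNGROUP, assuming relations are Python sets of frozenset tuples and that a relation-valued attribute is stored as a nested frozenset. The helper names and data are hypothetical. The last part builds a relation that was not produced by grouping, to show a case where ungrouping and then regrouping does not give it back.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def group(rel, attrs, name):
    """rel GROUP {attrs} AS name: nest the named attributes per combination of the others."""
    groups = {}
    for t in rel:
        inner = frozenset((a, v) for a, v in t if a in attrs)
        outer = frozenset((a, v) for a, v in t if a not in attrs)
        groups.setdefault(outer, set()).add(inner)
    return {outer | {(name, frozenset(inner))} for outer, inner in groups.items()}


def ungroup(rel, name):
    """rel UNGROUP name: flatten the relation-valued attribute back into plain tuples."""
    result = set()
    for t in rel:
        d = dict(t)
        nested = d.pop(name)
        outer = frozenset(d.items())
        for inner in nested:
            result.add(outer | inner)
    return result


SP = relation(
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P1", "QTY": 300},
)
grouped = group(SP, {"P#", "QTY"}, "PQ")
print(len(grouped))                    # one tuple per supplier
print(ungroup(grouped, "PQ") == SP)    # group then ungroup restores SP

# Two tuples for the same supplier: not something GROUP would ever produce
odd = {
    frozenset({("S#", "S1"), ("PQ", frozenset({frozenset({("P#", "P1")})}))}),
    frozenset({("S#", "S1"), ("PQ", frozenset({frozenset({("P#", "P2")})}))}),
}
print(group(ungroup(odd, "PQ"), {"P#"}, "PQ") == odd)  # ungroup then group differs
```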