db1-03

download db1-03

of 43

Transcript of db1-03

  • 8/3/2019 db1-03

    1/43

    Relational Algebra169

    After completing this chapter, you should be able to

    enumerate and explain the operations of relational algebra

    (there is a core of 5 relational algebra operators),

    write relational algebra queries of the type

    joinselectproject,

    discuss correctness and equivalence of given relational

    algebra queries.

    Relational Algebra170

    Overview

    1. Introduction; Selection, Projection

    2. Cartesian Product, Join

    3. Set Operations

    4. Outer Join

    5. Formal Definitions, A Bit of Theory

  • 8/3/2019 db1-03

    2/43

    Example Database (recap)171

    STUDENTS

    SID FIRST LAST EMAIL101 Ann Smith ...

    102 Michael Jones (null)

    103 Richard Turner ...

    104 Maria Brown ...

    EXERCISES

    CAT ENO TOPIC MAXPT

    H 1 Rel.Alg. 10H 2 SQL 10

    M 1 SQL 14

    RESULTS

    SID CAT ENO POINTS101 H 1 10

    101 H 2 8

    101 M 1 12

    102 H 1 9

    102 H 2 9

    102 M 1 10

    103 H 1 5

    103 M 1 7

    Relational Algebra (1)172

    Relational algebra (RA) is a query language for therelational model with a solid theoretical foundation.

    Relational algebra is not visible at the user interface level (notin any commercial RDBMS, at least).

    However, almost any RDBMS uses RA to represent queriesinternally (for query optimization and execution).

    Knowledge of relational algebra will help in understandingSQL and relational database systems in general.

  • 8/3/2019 db1-03

    3/43

    Relational Algebra (2)173

    In mathematics, an algebra is a

    set (the carrier), and

    operations that are closed with respect to the set.

    Example: (N, {, +}) forms an algebra.

    In case of RA,

    the carrier is the set of all finite relations.

    We will get to know the operations of RA in the sequel (onesuch operation is, for example, ).

    Relational Algebra (3)174

    Another operation of relational algebra is selection.

    In contrast to operations like + in N, the selection is

    parameterized by a simple predicate.

    For example, the operation SID=101 selects all tuples in theinput relation which have the value 101 in column SID.

    Relational algebra: selection

    SID=101

    RESULTSSID CAT ENO POINTS101 H 1 10101 H 2 8101 M 1 12102 H 1 9

    102 H 2 9102 M 1 10103 H 1 5103 M 1 7

    =SID CAT ENO POINTS101 H 1 10101 H 2 8

    101 M 1 12

  • 8/3/2019 db1-03

    4/43

    Relational Algebra (4)175

    Since the output of any RA operation is some relation Ragain, R may be the input for another RA operation.

    The operations of RA nest to arbitrary depth such that

    complex queries can be evaluated. The final results will always

    be a relation.

    A query is a term (or expression) in this relational algebra.

    A query

    FIRST,LAST(STUDENTS CAT=M(RESULTS))

    Relational Algebra (5)176

    There are some difference between the two query languagesRA and SQL:

    Null values are usually excluded in the definition of relationalalgebra, except when operations like outer join are defined.

    Relational algebra treats relations as sets, i.e., duplicate

    tuples will never occur in the input/output relations of an

    RA operator.

    Remember: In SQL, relations are multisets (bags) and may

    contain duplicates. Duplicate elimination is explicit in SQL(SELECT DISTINCT).

  • 8/3/2019 db1-03

    5/43

    Relational Algebra (6)177

    Relational algebra is the query language when it comes to thestudy of relational query languages (DB Theory):

    The semantics of RA is much simpler than that of SQL.

    RA features five basic operations (and can be completely

    defined on a single page, if you will).

    RA is also a yardstick for measuring the expressiveness of

    query languages. If a query language QL can express all

    possible RA queries, then QL is said to be relationallycomplete.

    SQL is relationally complete. Vice versa, every SQL query

    (without null values, aggregation, and duplicates) can also be

    written in RA.

    Selection (1)178

    Selection

    The selection selects a subset of the tuples of a

    relation, namely those which satisfy predicate .Selections acts like a filter on a set.

    Selection

    A=1

    A B

    1 3

    1 4

    2 5

    =A B

    1 3

    1 4

  • 8/3/2019 db1-03

    6/43

    Selection (2)179

    A simple selection predicate has the form

    Term ComparisonOperator Term.

    Term is an expression that can be evaluated to a datavalue for a given tuple:

    an attribute name,

    a constant value,

    an expression built from attributes, constants, and data

    type operations like +, , , /.

    Selection (3)180

    ComparisonOperator is

    = (equals), = (not equals),

    < (less than), > (greater than), , ,

    or other data type-dependent predicates (e.g., LIKE).

    Examples for simple selection predicates:

    LAST = Smith

    POINTS 8

    POINTS = MAXPT.

  • 8/3/2019 db1-03

    7/43

    Selection (4)181

    (R) may be imlemented as:

    Naive selection

    create a new temporary relation T;

    foreach t R do

    p (t);

    if p then

    insert t into T;

    fi

    od

    return T;

    If index structures are present (e.g., a B-tree index), it ispossible to evaluate (R) without reading every tuple of R.

    Selection (5)182

    A few corner cases

    C=1

    A B1 31 4

    2 5

    = (schema error)

    A=A

    A B1 31 42 5

    =

    A B1 31 42 5

    1=2

    A B1 31 42 5

    = A B

  • 8/3/2019 db1-03

    8/43

    Selection (6)183

    (R) corresponds to the following SQL query:

    SELECT *

    FROM R

    WHERE

    A different relational algebra operation called projection

    corresponds to the SELECT clause. Source of confusion.

    Selection (7)184

    More complex selection predicates may be performed usingthe Boolean connectives:

    1 2 (and), 1 2 (or), 1 (not).

    Note: 12 (R) = 1 (2 (R)).

    and

    Are the Boolean connectives , strictly needed?

    The selection predicate must permit evaluation for each inputtuple in isolation.

    Thus, exists () and for all () or nested relational algebra

    queries are not permitted in selection predicates. Actually, such

    predicates do not add to the expressiveness of RA.

  • 8/3/2019 db1-03

    9/43

    Projection (1)185

    ProjectionThe projection L eliminates all attributes

    (columns) of the input relation but those

    mentioned in the projection list L.

    Projection

    A,C

    A B C1 4 7

    2 5 8

    3 6 9

    =A C1 7

    2 8

    3 9

    Projection (2)186

    The projection Ai1 ,...,Aik (R) produces for each input tuple(A1 : d1, . . . , An : dn) an output tuple (Ai1 : di1 , . . . , Aik : dik).

    may be used to reorder columns.

    discards rows, discards columns.

    DB slang: All attributes not in L are projected away.

  • 8/3/2019 db1-03

    10/43

    Projection (3)187

    In general, the cardinalities of the input and output relationsare not equal.

    Projection eliminates duplicates

    B

    A B

    1 4

    2 5

    3 4

    =B

    4

    5

    Projection (4)188

    Ai1 ,...,Aik (R) may be imlemented as:

    Naive projection

    create a new temporary relation T;

    foreach t = (A1 : d1, . . . , An : dn)

    R dou (Ai1 : di1 , . . . , Aik : dik);

    insert u into T;

    od

    eliminate duplicate tuples in T;

    return T;

    The necessary duplicate elimination makes L one of the more

    costly operations in RDBMSs. Thus, query optimizers try hardto prove that the duplicate eliminaton step is not necessary.

  • 8/3/2019 db1-03

    11/43

    Projection (5)189

    If RA is used to formalize the semantics of SQL, the formatof the projection list is often generalized:

    Attribute renaming:

    B1Ai1 ,...,BkAik (R) .

    Computations (e.g., string concatenation via ) to derive

    the value in new columns, e.g.:

    SID,NAME FIRST LAST (STUDENTS) .

    Such generalized operators are also referred to as mapoperators (as in functional programming languages).

    Projection (6)190

    A1,...,Ak(R) corresponds to the SQL query:

    SELECT DISTINCT A1, . . . ,AkFROM R

    B1A1,...,BkAk(R) is equivalent to the SQL query:

    SELECT DISTINCT A1 [AS] B1, . . . ,Ak [AS] BkFROM R

  • 8/3/2019 db1-03

    12/43

    Selection vs. Projection191

    Selection vs. Projection

    Selection Projection

    A1 A2 A3 A4 A1 A2 A3 A4

    Filter some rows Maps all rows

    Combining Operations (1)192

    Since the result of any relational algebra operation is arelation again, this intermediate result may be the input of a

    subsequent RA operation.

    Example: retrieve the exercises solved by student with ID 102:

    CAT,ENO(SID=102(RESULTS)) .

    We can think of the intermediate result to be stored in anamed temporary relation (or as a macro definition):

    S102 SID=102(RESULTS);CAT,ENO(S102)

  • 8/3/2019 db1-03

    13/43

    Combining Operations (2)193

    Composite RA expressions are typically depicted as operatortrees:

    CAT,ENO

    SID=102

    RESULTS

    +

    x

    2

    ????

    ?y

    ????

    ?

    In these trees, computation proceeds bottom-up. Theevaluation order of sibling branches is not pre-determined.

    Combining Operations (3)194

    SQL-92 permits the nesting of queries (the result of a SQLquery may be used in a place of a relation name):

    Nested SQL QuerySELECT DISTINCT CAT, ENO

    FROM (SELECT *

    FROM RESULTS

    WHERE SID = 102) AS S102

    Note that this is not the typical style of SQL querying.

  • 8/3/2019 db1-03

    14/43

    Combining Operations (4)195

    Instead, a single SQL query is equivalent to an RA operatortree containing , , and (multiple) (see below):

    SELECT-FROM-WHERE Block

    SELECT DISTINCT CAT, ENO

    FROM RESULTS

    WHERE SID = 102

    Really complex queries may be constructed step-by-step (usingSQLs view mechanism), S102 may be used like a relation:

    SQL View DefinitionCREATE VIEW S102

    AS SELECT *

    FROM RESULTS

    WHERE SID = 102

    Relational Algebra196

    Overview

    1. Introduction; Selection, Projection

    2. Cartesian Product, Join

    3. Set Operations

    4. Outer Join

    5. Formal Definitions, A Bit of Theory

  • 8/3/2019 db1-03

    15/43

    Cartesian Product (1)197

    In general, queries need to combine information from severaltables.

    In RA, such queries are formulated using , the Cartesianproduct.

    Cartesian Product

    The Cartesian product R S of two relations R, S iscomputed by concatenating each tuple t R with eachtuple u S. ( denotes tuple concatenation.)

    Cartesian Product (2)198

    Cartesian Product

    A B

    1 2

    3 4

    C D

    6 7

    8 9

    =

    A B C D

    1 2 6 71 2 8 9

    3 4 6 7

    3 4 8 9

    Since attribute names must be unique within a tuple, theCartesian product may only be applied if R, S do not share

    any attribute names. (This is no real restriction because we

    have .)

  • 8/3/2019 db1-03

    16/43

    Cartesian Product (3)199

    If t = (A1 : a1, . . . , An : an) and u = (B1 : b1, . . . , Bm : bm),then t u = (A1 : a1, . . . , An : an, B1 : b1, . . . , Bm : bm).

    Cartesian Product: Nested Loops

    create a new temporary relation T;

    foreach t R doforeach u S do

    insert t u into T;odod

    return T;

    Cartesian Product and Renaming200

    R S may be computed by the equivalent SQL query (SQLdoes not impose the unique column name restriction, a

    column A of relation R may uniquely be identified by R.A):

    Cartesian Product in SQL

    SELECT *

    FROM R, S

    In RA, this is often formalized by means of of a renaming

    operator X(R). If sch(R) = (A1 : D1, . . . , An : Dn), then

    X(R) X.A1A1,...,X.AnAn (R) .

  • 8/3/2019 db1-03

    17/43

    Join (1)201

    The intermediate result generated by a Cartesian product maybe quite large in general (|R| = n, |S| = m |R S| = n m).

    Since the combination of Cartesian product and selection inqueries is common, a special operator join has been

    introduced.

    Join

    The (theta-)join R S between relations R, S is

    defined asR S (R S).

    The join predicate may refer to attribute names of R

    and S.

    Join (2)202

    S(STUDENTS) S.SID=R.SID R(RESULTS)

    S.SID S.FIRST S.LAST S.EMAIL R.SID R.CAT R.ENO R.POINTS

    101 Ann Smith ... 101 H 1 10

    101 Ann Smith ... 101 H 2 8

    101 Ann Smith ... 101 M 1 12

    102 Michael Jones (null) 102 H 1 9

    102 Michael Jones (null) 102 H 2 9

    102 Michael Jones (null) 102 M 1 10

    103 Richard Turner ... 103 H 1 5

    103 Richard Turner ... 103 M 1 7

    Note: student Maria Brown does not appear in the join result.

  • 8/3/2019 db1-03

    18/43

    Join (3)203

    R S can be evaluated by folding the procedures for , :

    Nested Loop Join

    create a new temporary relation T;

    foreach t R doforeach u S do

    if (t u) theninsert t u into T;

    fi

    od

    od

    return T;

    Join (4)204

    Join combines tuples from two relations and acts like a filter:tuples without join partner are removed.

    Note: if the join is used to follow a foreign key relationship,

    then no tuples are filtered:

    Join follows a foreign key relationship (dereference)

    RESULTS SID=S.SID S.SIDSID,FIRST,LAST,EMAIL(STUDENTS)

    There are join variants which act like filters only: left andright semijoin (,):

    R S sch(R)(R S) ,

    or do not filter at all: outer-join (see below).

  • 8/3/2019 db1-03

    19/43

    Natural Join205

    The natural join provides another useful abbreviation (RAmacro).

    In the natural join R S, the join predicate is defined to be

    an conjunctive equality comparison of attributes sharing

    the same name in R, S.

    Natural join handles the necessary attribute renaming and

    projection.

    Natural Join

    Assume R(A,B,C) and S(B ,C ,D). Then:

    R S = A,B,C,D(B=BC=C(R BB,CC,D(S)))

    (Note: shared columns occur once in the result.)

    Joins in SQL (1)206

    In SQL, R S is normally written as

    Join in SQL (classic and SQL-92)

    SELECT FROM R,S

    WHERE

    orSELECT FROM R JOIN S ON

    Note: this left query is exactly the SQL equivalent of(R S) we have seen before.

    SQL is a declarative language: it is the task of the SQLoptimizer to infer that this query may be evaluated using a

    join instead of a Cartesian product.

  • 8/3/2019 db1-03

    20/43

    Algebraic Laws (1)207

    The join satisfy the associativity condition

    (R S) T R (S T) .

    In join chains, parentheses are thus superfluous:

    R S T .

    Join is not commutative unless it is followed by a projection,i.e., a column reordering:

    L(R S) L(S R) .

    Algebraic Laws (2)208

    A significant number of further algebraic laws hold, which areheavily utilized by the query optimizer.

    Example: selection push-down.

    If predicate refers to attributes in S only, then(R S) R (S) .

    Selection push-down

    Why is selection push-down considered one of the most

    significant algebraic optimizations?

    (Such effficiency considerations are the subject ofDatenbanken II.)

  • 8/3/2019 db1-03

    21/43

    A Common Query Pattern (1)209

    The following operator tree structure is very common:

    A1,...,Ak

    1

    2

    ooo

    n1

    Rnooo

    Rn1

    OOR2

    OOOO R1

    OOOO

    1 Join all tables needed to answer the query, 2 select therelevant tuples, 3 project away all irrelevant columns.

    A Common Query Pattern (2)210

    The select-project-join query

    A1,...,Ak((R1 1 R2 2 n1 Rn))

    has the obvious SQL equivalent

    SELECT DISTINCT A1, . . . ,AkFROM R1, . . . ,RnWHERE

    AND 1 AND AND n1

    It is a common source of errors to forget a join condition:think of the scenario R(A, B), S(B, C), T(C, D) whenattributes A, D are relevant for the query output.

  • 8/3/2019 db1-03

    22/43

    Relational Algebra Quiz (Level: Novice)211

    STUDENTS

    SID FIRST LAST EMAIL101 Ann Smith ...

    102 Michael Jones (null)

    103 Richard Turner ...

    104 Maria Brown ...

    EXERCISES

    CAT ENO TOPIC MAXPT

    H 1 Rel.Alg. 10H 2 SQL 10

    M 1 SQL 14

    RESULTS

    SID CAT ENO POINTS101 H 1 10

    101 H 2 8

    101 M 1 12

    102 H 1 9

    102 H 2 9

    102 M 1 10

    103 H 1 5

    103 M 1 7

    Relational Algebra Quiz (Level: Novice)212

    Formulate equivalent queries in RA

    1 Print all homework results for Ann Smith (print exercise

    number and points).

    2 Who has got the maximum number of points for ahomework? Print full name and homework number.

    3 (Who has got the maximum number of points for allhomework exercises?)

  • 8/3/2019 db1-03

    23/43

    Self Joins (1)213

    Sometimes it is necesary to refer to more than one tuple ofthe same relation at the same time.

    Example: Who got more points than the student with ID

    101 for any of the exercises?

    Two answer this query, we need to compare two tuples t, u

    of the relation RESULTS:

    1 tuple t corresponding to the student with ID 101,

    2 tuple u, corresponding to the same exercise as the tuplet, in which u.POINTS > t.POINTS.

    Self Joins (2)214

    This requires a generalization of the select-project-join querypattern, in which two instances of the same relation are

    joined (the attributes in at least one instances must be

    renamed first):

    S := X(RESULTS) X.CAT=Y.CAT X.ENO=Y.ENO

    Y(RESULTS)

    X.SID(X.POINTS>Y.POINTS Y.SID=101)(S)

    Such joins are commonly referred to as self joins.

  • 8/3/2019 db1-03

    24/43

    Relational Algebra215

    Overview

    1. Introduction; Selection, Projection

    2. Cartesian Product, Join

    3. Set Operations

    4. Outer Join

    5. Formal Definitions, A Bit of Theory

    Set Operations (1)216

    Relations are sets (of tuples). The usual family of binaryset operations can also be applied to relations.

    It is a requirement, that both input relations have the sameschema.

    Set Operations

    The set operations of relational algebra are R S,R S, and R S (union, intersection, difference).

    A minimal set of operations

    Which of these set operations is redundant (i.e., may be

    derived using an alternative RA expression, just like )?

  • 8/3/2019 db1-03

    25/43

    Set Operations (2)217

    R

    S

    R S

    R S

    R S

    S R

    Set Operations (3)218

    R S may be implemented as follows:

    Union

    create a new temporary relation T;foreach t R do

    insert t into T;

    od

    foreach t S doinsert t into T;

    od

    remove duplicates in T;return T;

  • 8/3/2019 db1-03

    26/43

    Set Operations (4)219

    R S may be implemented as follows:

    Difference

    create a new temporary relation T;

    foreach t R doremove false;foreach u S do

    remove remove (t = u);od

    if remove theninsert t into T;

    fi

    od

    return T;

    Union (1)220

    In RA queries, a typical application for is case analysis.

    Example: Grading

    MPOINTS := SID,POINTS(CAT=MENO=1(RESULTS)))

    SID,GRADEA(POINTS12(MPOINTS))

    SID,GRADEB(POINTS10 POINTS

  • 8/3/2019 db1-03

    27/43

    Union (2)221

    In SQL, is directly supported: keyword UNION.UNION may be placed between two SELECT-FROM-WHERE blocks:

    SQLs UNION

    SELECT SID, A AS GRADE

    FROM RESULTS

    WHERE CAT = M AND ENO = 1 AND POINTS = 12

    UNION

    SELECT SID, B AS GRADEFROM RESULTS

    WHERE CAT = M AND ENO = 1

    AND POINTS = 10 AND POINTS 12

    UNION

    ...

    Set Difference (1)222

    Note: the RA operators ,, ,, are monotic bydefinition, e.g.:

    R S = (R) (S) .

    Then it follows that every query Q that exclusively uses theabove operators behaves monotonically:

    Let I1 be a database state, and let I2 = I1 {t}(database state after insertion of tuple t).

    Then every tuple u contained in the answer to Q in state I1

    is also contained in the anser to Q in state I2.

    Database insertion never invalidates a correct answer.

  • 8/3/2019 db1-03

    28/43

    Set Difference (2)223

    If we pose non-monotonic queries, e.g.,

    Which student has not solved any exercise?

    Who got the most points for Homework 1?

    Who has solved all exercises in the database?

    then it is obvious that ,, ,, are not sufficient toformulate the query. Such queries require set difference ().

    A non-monotonic query

    Which student has not solved any exercise? (Print

    full name (FIRST, LAST).

    (Example database tables repeated on next slide.)

    Example Database (recap)224

    STUDENTS

    SID FIRST LAST EMAIL

    101 Ann Smith ...

    102 Michael Jones (null)

    103 Richard Turner ...104 Maria Brown ...

    EXERCISES

    CAT ENO TOPIC MAXPT

    H 1 Rel.Alg. 10

    H 2 SQL 10

    M 1 SQL 14

    RESULTS

    SID CAT ENO POINTS

    101 H 1 10

    101 H 2 8

    101 M 1 12102 H 1 9

    102 H 2 9

    102 M 1 10

    103 H 1 5

    103 M 1 7

  • 8/3/2019 db1-03

    29/43

    Set Difference (3)225

    A correct solution?

    FIRST,LAST(STUDENTS SID=SID2 SID2SID(RESULTS))

    A correct solution?

    SID,FIRST,LAST(STUDENTS SID(RESULTS))

    Correct solution!

    Set Difference (4)226

    A typical RA query pattern involving set difference is theanti-join.

    Given R(A, B) and S(B, C), retrieve the tuples of R that donot have a (natural) join partner in S

    (Note: sch(R) sch(S) = {B}):

    R (B(R) B(S)) .

    (The following is equivalent: R sch(R)(R S).)

    There is no common symbol for this anti-join, but RSseems appropriate (complemented semi-join).

  • 8/3/2019 db1-03

    30/43

    Set Difference (5)227

    Suppose that the relations R, S have been computed as:

    S := SELECT A1, . . . , An FROM R1, . . . , Rm WHERE 1 R := SELECT B1, . . . , Bn FROM S1, . . . , Sk WHERE 2

    Set difference R S in SQL4

    SELECT A1, . . . ,AnFROM R1, . . . ,RmWHERE 1 AND NOT EXISTS

    (SELECT *

    FROM S1, . . . ,SkWHERE 2AND B1=A1 AND AND Bn=An)

    4The subquery in () returns TRUE if it returns 0 tuples.

    Set Operations and Complex Selections228

    Note that the availability of , (and ) renders complexselection predicates superfluous:

    Predicate Simplification Rules

    12 (Q)= 1 (Q) 2 (Q)

    12 (Q) = 1 (Q) 2 (Q)

    (Q) = Q (Q)

    RDBMS implement complex selection predicates anyway

    Why?

  • 8/3/2019 db1-03

    31/43

    Relational Algebra Quiz (Level: Intermediate)229

    The RA quiz below refers to the Homework DB. Schema:

    RESULTS (SID STUDENTS,(CAT, ENO) EXERCISES,POINTS)

    STUDENTS (SID,FIRST,LAST,EMAIL)

    EXERCISES (CAT,ENO,TOPIC,MAXPT)

    Formulate equivalent queries in RA

    1 Who got the most points (of all students) forHomework 1?

    2 Which students solved all exercises in the database?

    Union vs. Join230

    Find RA expressions that translate between the two

    Two alternative representations of the homework, midtermexam, and final totals of the students are:

    RESULTS 1STUDENT H M F

    Jim Ford 95 60 75Ann Smith 80 90 95

    RESULTS 2STUDENT CAT PCT

    Jim Ford H 95Jim Ford M 60Jim Ford F 75

    Ann Smith H 80Ann Smith M 90Ann Smith F 95

  • 8/3/2019 db1-03

    32/43

    Summary231

    The five basic operations of relational algebra are:

    1 Selection

    2 L Projection

    3 Cartesian Product

    4 Union

    5 Difference

    Derived (and thus redundant) operations:Theta-Join , Natural Join , Semi-Join , Renaming ,

    and Intersection .

    Relational Algebra232

    Overview

    1. Introduction; Selection, Projection

    2. Cartesian Product, Join

    3. Set Operations

    4. Outer Join

    5. Formal Definitions, A Bit of Theory

  • 8/3/2019 db1-03

    33/43

    Outer Join (1)233

    Join () eliminates tuples without partner:

    A B

    a1 b1a1 b2

    B C

    b2 c2b3 c3

    =A B C

    a2 b2 c2

    The left outer join preserves all tuples in its left argument,

    even if a tuple does not team up with a partner in the join:A B

    a1 b1a1 b2

    B C

    b2 c2b3 c3

    =

    A B C

    a1 b1 (null)

    a2 b2 c2

    Outer Join (2)234

    The right outer join preserves all tuples in its right argument:

    A B

    a1 b1a1 b2

    B C

    b2 c2b3 c3

    =

    A B C

    a2 b2 c2(null) b3 c3

    The full outer join preserves all tuples in both arguments:

    A B

    a1 b1a1 b2

    B C

    b2 c2b3 c3

    =

    A B C

    a1 b1 (null)

    a2 b2 c2(null) b3 c3

  • 8/3/2019 db1-03

    34/43

    Outer Join (3)235

    R S

    create a new temporary relation T;foreach t R do

    haspartner false;foreach u S do

    if (t u) theninsert t u into T;haspartner true;

    fiodif haspartner then

    insert t (null, . . . , null) | {z } # attributes in S

    into T;fiodreturn T;

    Outer Join (4)236

    Example: Prepare a full homework results report, includingthose students who did not hand in any solution at all:

    STUDENTS SID=SID SIDSID,ENO,POINTS(CAT=H(RESULTS))

    SID FIRST LAST EMAIL SID ENO POINTS

    101 Ann Smith ... 101 1 10

    101 Ann Smith ... 101 2 8

    102 Michael Jones (null) 102 1 9

    102 Michael Jones (null) 102 2 9

    103 Richard Turner ... 103 1 5

    104 Maria Brown ... (null) (null) (null)

  • 8/3/2019 db1-03

    35/43

    Outer Join (5)237

    Join vs. Outer Join

    Is there any difference between STUDENTS

    RESULTS andSTUDENTS RESULTS?

    (Can you tell without looking at the table states?)

    Note: Outer join is a derived operation (like , ), i.e., itcan simulated using the five basic relational algebra operations.

    Consider R(A, B) and S(B, C). Then

    R S (R S)

    (R A,B(R S)) {(C:null)}

    SQL-92 provides {FULL, LEFT, RIGHT} OUTER JOIN.

    Relational Algebra238

    Overview

    1. Introduction; Selection, Projection

    2. Cartesian Product, Join

    3. Set Operations

    4. Outer Join

    5. Formal Definitions, A Bit of Theory

  • 8/3/2019 db1-03

    36/43

    Definitions: Syntax (1)239

    Let the following be given:

    A set D of data type names and for each D D a setval(D) of values.

    A set A of valid attribute names (identifiers).

    Relational Database Schema

    A relational database schema S consists of a finite set of relation names R, and for every R R a relation schema sch(R).(We ill ignore constraints here.)

    Definitions: Syntax (2)240

    The set of syntactically correct RA expressions or queries isdefined recursively, together with the resulting schema of

    each expression.

    Syntax of RA (Base Cases)

    1 R (relation name)For every R R, R is an RA expression with schemasch(R).

    2 {(A1:d1, . . . , An:dn)} (relation constant)A relation constant is an RA expression if A1, . . . , An

    A, di val(Di) for 1 i n with D1, . . . , Dn D.The schema of this expression is (A1:D1, . . . , An:Dn).

  • 8/3/2019 db1-03

    37/43

    Definitions: Syntax (3)241

    Let Q be an RA expr. with schema s = (A1:D1, . . . , An:Dn).

    Syntax of RA (Recursive Cases)

    3 Ai=Aj(Q)for i , j {1, . . . , n} is an RA expression with schema s.

    4 Ai=d(Q)for i {1, . . . , n} and d val(Di) is an RA expressionwith schema s.

    5 B1Ai1 ,...,BmAim (Q)for i1, . . . , i m {1, . . . , n} and B1, . . . , Bm A suchthat Bj = Bk for j = k is an RA expression with schema(B1:Di1 , . . . , Bm:Dim ).

    Definitions: Syntax (4)242

    Let Q1, Q2 be an RA expressions with the same schema s.

    Syntax of RA (Recursive Cases)

    6 Q1 Q2 and 7 Q1 Q2

    are RA expressions with schema s.

    Let Q1, Q2 be RA expressions with schemas(A1:D1, . . . , An:Dn) and (B1:E1, . . . , Bm:Em), respectively.

    Syntax of RA (Recursive Cases)

    8 Q1 Q2

    is an RA expression with schema (A1:D1, . . . , An:Dn,B1:E1, . . . , Bm:Em) if {A1, . . . , An} {B1, . . . , Bm} = .

  • 8/3/2019 db1-03

    38/43

    Definitions: Semantics (1)243

    Database StateA database state I (instance) defines a relation I(R) for

    every relation name R in the database schema S.

    The result of a query Q, i.e., an RA expression, in adatabase state I is a relation. This relation is denoted by I[Q]

    and defined recursively corresponding to the syntacticstructure of Q.

    Definition: Semantics (2)244

    I[Q]

    If Q is a relation name R, then I[Q] := I(R).

    If Q is a constant relation

    {(A

    1:d

    1, . . . , A

    n:d

    n)

    }, then

    I[Q] := {(d1, . . . , d n)}.

    If Q has the form Ai=Aj(Q1), thenI[Q] := {(d1, . . . , d n) I[Q1] | di = dj}

    If Q has the form Ai=d(Q1), thenI[Q] := {(d1, . . . , d n) I[Q1] | di = d}

    If Q has the form B1Ai1 ,...,BmAim (Q1), thenI[Q] := {(di1 , . . . , d im ) | (d1, . . . , d n) I[Q1]}

  • 8/3/2019 db1-03

    39/43

    Definition: Semantics (3)245

    I[Q] (continued)

    If Q has the form Q1 Q2, thenI[Q] := I[Q1] I[Q2]

    If Q has the form Q1 Q2, thenI[Q] := I[Q1] I[Q2]

    If Q has the form Q1 Q2, thenI[Q] := { (d1, . . . , d n, e1, . . . , e m) |

    (d1, . . . , d n) I[Q1],(e1, . . . , e m) I[Q2]} .

    Monotonicity246

    Smaller Database State

    A database state I1 is smaller than (or equal to) a database

    state I2, written I1 I2, iff I1(R) I2(R) for all relationnames R R of schema S.

    Theorem: RA \{} is monotonic

    If an RA expression Q does not contain the (set difference)operator, then the following holds for all database states I1, I2:

    I1 I2 = I1[Q] I2[Q] .

    Formulate proof by induction on syntactic structure of Q(structural induction).

  • 8/3/2019 db1-03

    40/43

    Equivalence (1)247

    Equivalence of RA Expressions

    Two RA expressions Q1 and Q2 are equivalent iff theyhave the same (result) schema and for all database

    states I, the following holds:

    I[Q1] = I[Q2] .

    Examples: 1 (2 (Q)) = 2 (1 (Q))

    (Q1 Q2) Q3 = Q1 (Q2 Q3) If A is an attribute in the result schema of Q1, then

    A=d(Q1 Q2) = (A=d(Q1)) Q2.

    Theorem: The equivalence of (arbitrary) relational

    algebra expressions is undecidable.

    Limitations of RA (1)248

    Let R be a relation name and assume sch(R) = (A:D, B:D),i.e., both columns share the same data type D. Let val(D) be

    infinite.

    The transitive closure of I(R) is the set of all(d , e) val(D) val(D) such that there are n N, n 1,and d0, d1, . . . , d n val(D) with d = d0, e = dn and(di1, di) I(R) for i = 1, . . . , n.

    a b++

    c

    d ll

    Rfrom to

    a b

    b cc d

  • 8/3/2019 db1-03

    41/43

    Limitations of RA (2)249

    Theorem: There is no RA expression Q such that I[Q] isthe transitive closure of I(R) for all database states I.

    In the directed graph example, one self-join (of R with

    itself) is needed, to follow the edges in the graph:

    S.from,T.to(S(R) S.to=T.from T(R))

    An n-fold self-join will find all paths in the graph of lengthn + 1. To compute the transitive closure for arbitrary graphs,

    i.e., for all database states I, is impossible in RA.

    Limitations of RA (3)250

    This of course implies that relational algebra is notcomputationally complete.

    There are functions from database states to relations (queryresults), for which we could write a program5, but we will

    not be able to find an equivalent RA expression to do the

    same.

    However, this would have been truly unexpected and

    actually unwanted, because want a guarantee that query

    evaluation always terminates. This is guaranteed for RA.

    Otherwise, we would have solved the halting problem.

    5Pick your favourite programming language.

  • 8/3/2019 db1-03

    42/43

    Limitations of RA (4)251

    All RA queries can be evaluated in time that ispolynomically in the size of the database state.

    This implies that certain complex problems cannot beformulated in relational algebra.

    For example, if you find a way to formulate the Traveling

    Salesman problem in RA, you have solved the famous P = NP

    problem. (With a solution that nobody expects; contact me tocollect your PhD.)

    As the transitive closure example shows, even not all problemsof polynomial complexity can be formulated in classical RA.

    Expressive Power (1)252

    Relational Completeness

    A query language L for the relational model is called

    strong relationally complete if, for every DB schema Sand for every RA epxression Q1 with respect to S there isa query Q2 L such that for all database states I withrespect to S the two queries produce the same results:

    I[Q1] = I[Q2] .

    Read as: It is possible to write an RA-to-L query compiler.

  • 8/3/2019 db1-03

    43/43

    Expressive Power (2)253

    SQL is strong relationally complete. If we can even write RA-to-L as well as L-to-RA compilers,

    both query languages are equivalent. SQL and RA are not equivalent. SQL contains concepts,

    e.g., the aggregate COUNT, which cannot be simulated in RA.

    Equivalent Query Languages

    1 Relational algebra,

    2 SQL without aggregations and with mandatoryduplicate elimination,

    3 Tuple relational calculus,

    4 Datalog (a Prolog variant) without recursion.