Chapter 3 Relational algebra and calculus

Post on 11-Feb-2016

Database Systems (Atzeni, Ceri, Paraboschi, Torlone)Chapter 3 : Relational algebra and calculus

McGraw-Hill and Atzeni, Ceri, Paraboschi, Torlone 1999

Chapter 3

Relational algebra and calculus

Query languages for relational databases

• Operations on databases:
  – queries: "read" data from the database
  – updates: change the content of the database

• Both can be modeled as functions from databases to databases
• Foundations can be studied with reference to query languages:
  – relational algebra, a "procedural" language
  – relational calculus, a "declarative" language
  – (briefly) Datalog, a more powerful language

• Then, we will study SQL, a practical language (with declarative and procedural features) for queries and updates

Relational algebra

• A collection of operators that
  – are defined on relations
  – produce relations as results
  and therefore can be combined to form complex expressions

• Operators
  – union, intersection, difference
  – renaming
  – selection
  – projection
  – join (natural join, cartesian product, theta join)

Union, intersection, difference

• Relations are sets, so we can apply set operators
• However, we want the results to be relations (that is, homogeneous sets of tuples)
• Therefore:
  – it is meaningful to apply union, intersection, difference only to pairs of relations defined over the same attributes

Union

Graduates
Number  Surname   Age
7274    Robinson  37
7432    O'Malley  39
9824    Darkes    38

Managers
Number  Surname   Age
9297    O'Malley  56
7432    O'Malley  39
9824    Darkes    38

$Graduates \cup Managers$
Number  Surname   Age
7274    Robinson  37
7432    O'Malley  39
9824    Darkes    38
9297    O'Malley  56

Intersection

Graduates
Number  Surname   Age
7274    Robinson  37
7432    O'Malley  39
9824    Darkes    38

Managers
Number  Surname   Age
9297    O'Malley  56
7432    O'Malley  39
9824    Darkes    38

$Graduates \cap Managers$
Number  Surname   Age
7432    O'Malley  39
9824    Darkes    38

Difference

Graduates
Number  Surname   Age
7274    Robinson  37
7432    O'Malley  39
9824    Darkes    38

Managers
Number  Surname   Age
9297    O'Malley  56
7432    O'Malley  39
9824    Darkes    38

$Graduates - Managers$
Number  Surname   Age
7274    Robinson  37
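Since relations are sets, the three operators can be mimicked with Python's built-in set operations. The sketch below is illustrative only: the `relation` helper and the representation (an attribute set plus a set of tuples, each tuple a frozenset of attribute/value pairs) are our own assumptions, not part of the book. It rejects operands over different attributes and reproduces the three results above.

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs, so attribute order is irrelevant and duplicates collapse.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def _same_schema(r1, r2):
    # Set operators are only meaningful on relations over the same attributes.
    if r1[0] != r2[0]:
        raise ValueError(f"different attributes: {sorted(r1[0])} vs {sorted(r2[0])}")


def union(r1, r2):
    _same_schema(r1, r2)
    return r1[0], r1[1] | r2[1]


def intersection(r1, r2):
    _same_schema(r1, r2)
    return r1[0], r1[1] & r2[1]


def difference(r1, r2):
    _same_schema(r1, r2)
    return r1[0], r1[1] - r2[1]


def numbers(r):
    # Display helper: the sorted values of the Number attribute.
    return sorted(dict(t)["Number"] for t in r[1])


attrs = ("Number", "Surname", "Age")
graduates = relation(attrs, [(7274, "Robinson", 37), (7432, "O'Malley", 39), (9824, "Darkes", 38)])
managers = relation(attrs, [(9297, "O'Malley", 56), (7432, "O'Malley", 39), (9824, "Darkes", 38)])

print(numbers(union(graduates, managers)))         # [7274, 7432, 9297, 9824]
print(numbers(intersection(graduates, managers)))  # [7432, 9824]
print(numbers(difference(graduates, managers)))    # [7274]
```

Storing each tuple as attribute/value pairs rather than by position mirrors the named-attribute view of the slides: two tuples are equal whenever they agree on every attribute, whatever order the columns were listed in.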

A meaningful but impossible union

Paternity
Father   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael

Maternity
Mother  Child
Eve     Cain
Eve     Seth
Sarah   Isaac
Hagar   Ishmael

$Paternity \cup Maternity$ ???

• the problem: Father and Mother are different names, but both represent a "Parent"

• the solution: rename attributes

Renaming

• unary operator
• "changes attribute names" without changing values
• removes the limitations associated with set operators
• notation: $\rho_{Y \leftarrow X}(r)$
• example: $\rho_{Parent \leftarrow Father}(Paternity)$
• if there are two or more attributes involved then ordering is meaningful: $\rho_{Location,Pay \leftarrow Branch,Salary}(Employees)$

Renaming, example

Paternity
Father   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael

$\rho_{Parent \leftarrow Father}(Paternity)$
Parent   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael

Renaming and union

Paternity
Father   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael

Maternity
Mother  Child
Eve     Cain
Eve     Seth
Sarah   Isaac
Hagar   Ishmael

$\rho_{Parent \leftarrow Father}(Paternity) \cup \rho_{Parent \leftarrow Mother}(Maternity)$
Parent   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael
Eve      Cain
Eve      Seth
Sarah    Isaac
Hagar    Ishmael
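Renaming can be sketched in a few lines. In this illustration (hypothetical helper names; a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs), `rename` takes a dictionary from old to new attribute names, changes only the names, and so makes the Paternity/Maternity union legal:

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def rename(r, mapping):
    # rho: mapping is {old_name: new_name}. Values are untouched, and attributes
    # not mentioned keep their names (clashes with existing names are not checked).
    attrs, rows = r
    new_attrs = frozenset(mapping.get(a, a) for a in attrs)
    new_rows = {frozenset((mapping.get(a, a), v) for a, v in t) for t in rows}
    return new_attrs, new_rows


def union(r1, r2):
    # Union is only allowed between relations over the same attributes.
    if r1[0] != r2[0]:
        raise ValueError("union needs operands over the same attributes")
    return r1[0], r1[1] | r2[1]


paternity = relation(("Father", "Child"), [("Adam", "Cain"), ("Adam", "Abel"),
                                           ("Abraham", "Isaac"), ("Abraham", "Ishmael")])
maternity = relation(("Mother", "Child"), [("Eve", "Cain"), ("Eve", "Seth"),
                                           ("Sarah", "Isaac"), ("Hagar", "Ishmael")])

# union(paternity, maternity) would raise ValueError (Father vs Mother).
parents = union(rename(paternity, {"Father": "Parent"}),
                rename(maternity, {"Mother": "Parent"}))
print(sorted(parents[0]))  # ['Child', 'Parent']
print(len(parents[1]))     # 8
```

A two-attribute renaming such as $\rho_{Location,Pay \leftarrow Branch,Salary}(Employees)$ would correspond to the mapping `{"Branch": "Location", "Salary": "Pay"}`; the dictionary makes explicit the pairing that the ordering expresses in the slide notation.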

Renaming and union, with more attributes

Employees
Surname    Branch  Salary
Patterson  Rome    45
Trumble    London  53

Staff
Surname  Factory  Wages
Cooke    Chicago  33
Bush     Monza    32

$\rho_{Location,Pay \leftarrow Branch,Salary}(Employees) \cup \rho_{Location,Pay \leftarrow Factory,Wages}(Staff)$
Surname    Location  Pay
Patterson  Rome      45
Trumble    London    53
Cooke      Chicago   33
Bush       Monza     32

Selection and projection

• Two unary operators, in a sense orthogonal:
  – selection for "horizontal" decompositions
  – projection for "vertical" decompositions

[Figure: Selection keeps some rows of a relation over A, B, C; Projection keeps some of its columns (A, B)]

Selection

• Produces results
  – with the same schema as the operand
  – with a subset of the tuples (those that satisfy a condition)

• Notation: $\sigma_F(r)$

• Semantics: $\sigma_F(r) = \{\, t \mid t \in r \text{ and } t \text{ satisfies } F \,\}$

Selection, example

Employees
Surname  FirstName  Age  Salary
Smith    Mary       25   2000
Black    Lucy       40   3000
Verdi    Nico       36   4500
Smith    Mark       40   3900

$\sigma_{Age<30 \lor Salary>4000}(Employees)$
Surname  FirstName  Age  Salary
Smith    Mary       25   2000
Verdi    Nico       36   4500

Selection, another example

Citizens
Surname  FirstName  PlaceOfBirth  Residence
Smith    Mary       Rome          Milan
Black    Lucy       Rome          Rome
Verdi    Nico       Florence      Florence
Smith    Mark       Naples        Florence

$\sigma_{PlaceOfBirth=Residence}(Citizens)$
Surname  FirstName  PlaceOfBirth  Residence
Black    Lucy       Rome          Rome
Verdi    Nico       Florence      Florence
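A selection is a filter over the tuples of its operand. In this minimal sketch the helper names are hypothetical, a relation is modeled as an attribute set plus a set of tuples (each a frozenset of attribute/value pairs), and the condition F is given as a Python predicate; it reproduces both example selections, the first of which is a disjunction:

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def select(r, condition):
    # sigma_F(r) = { t | t in r and t satisfies F }; the predicate sees each tuple
    # as a dict. The schema of the result is the schema of the operand.
    attrs, rows = r
    return attrs, {t for t in rows if condition(dict(t))}


employees = relation(("Surname", "FirstName", "Age", "Salary"),
                     [("Smith", "Mary", 25, 2000), ("Black", "Lucy", 40, 3000),
                      ("Verdi", "Nico", 36, 4500), ("Smith", "Mark", 40, 3900)])
citizens = relation(("Surname", "FirstName", "PlaceOfBirth", "Residence"),
                    [("Smith", "Mary", "Rome", "Milan"), ("Black", "Lucy", "Rome", "Rome"),
                     ("Verdi", "Nico", "Florence", "Florence"),
                     ("Smith", "Mark", "Naples", "Florence")])

young_or_rich = select(employees, lambda t: t["Age"] < 30 or t["Salary"] > 4000)
stay_home = select(citizens, lambda t: t["PlaceOfBirth"] == t["Residence"])
print(sorted(dict(t)["FirstName"] for t in young_or_rich[1]))  # ['Mary', 'Nico']
print(sorted(dict(t)["Surname"] for t in stay_home[1]))        # ['Black', 'Verdi']
```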

Projection

• Produces results
  – over a subset of the attributes of the operand
  – with values from all its tuples

• Notation (given a relation $r(X)$ and a subset $Y$ of $X$): $\pi_Y(r)$

• Semantics: $\pi_Y(r) = \{\, t[Y] \mid t \in r \,\}$

Projection, example

Employees
Surname  FirstName  Department  Head
Smith    Mary       Sales       De Rossi
Black    Lucy       Sales       De Rossi
Verdi    Mary       Personnel   Fox
Smith    Mark       Personnel   Fox

$\pi_{Surname,FirstName}(Employees)$
Surname  FirstName
Smith    Mary
Black    Lucy
Verdi    Mary
Smith    Mark

Projection, another example

Employees
Surname  FirstName  Department  Head
Smith    Mary       Sales       De Rossi
Black    Lucy       Sales       De Rossi
Verdi    Mary       Personnel   Fox
Smith    Mark       Personnel   Fox

$\pi_{Department,Head}(Employees)$
Department  Head
Sales       De Rossi
Personnel   Fox
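Projection keeps only the attributes in Y; because the result is again a set, tuples that become identical collapse automatically. A sketch under our own assumptions (the `relation` and `project` helpers are hypothetical; a relation is an attribute set plus a set of tuples, each a frozenset of attribute/value pairs):

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def project(r, ys):
    # pi_Y(r) = { t[Y] | t in r }: keep only the pairs whose attribute is in Y.
    # Building a set makes tuples that agree on Y collapse into one.
    ys = frozenset(ys)
    attrs, rows = r
    if not ys <= attrs:
        raise ValueError("Y must be a subset of the operand's attributes")
    return ys, {frozenset((a, v) for a, v in t if a in ys) for t in rows}


employees = relation(("Surname", "FirstName", "Department", "Head"),
                     [("Smith", "Mary", "Sales", "De Rossi"), ("Black", "Lucy", "Sales", "De Rossi"),
                      ("Verdi", "Mary", "Personnel", "Fox"), ("Smith", "Mark", "Personnel", "Fox")])

print(len(project(employees, ["Surname", "FirstName"])[1]))  # 4
print(len(project(employees, ["Department", "Head"])[1]))    # 2
```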

Cardinality of projections

• The result of a projection contains at most as many tuples as the operand

• It can contain fewer, if several tuples "collapse"
• $\pi_Y(r)$ contains as many tuples as $r$ if and only if $Y$ is a superkey for $r$;
  – this holds also if $Y$ is a superkey "by chance" (not defined as a superkey in the schema, but a superkey for the specific instance), see the example

Tuples that collapse

Students
RegNum  Surname  FirstName  BirthDate  DegreeProg
284328  Smith    Luigi      29/04/59   Computing
296328  Smith    John       29/04/59   Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       01/05/61   Fine Art
965536  Black    Lucy       05/03/58   Fine Art

$\pi_{Surname,DegreeProg}(Students)$
Surname  DegreeProg
Smith    Computing
Smith    Engineering
Black    Fine Art

Tuples that do not collapse, "by chance"

Students
RegNum  Surname  FirstName  BirthDate  DegreeProg
296328  Smith    John       29/04/59   Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       01/05/61   Fine Art
965536  Black    Lucy       05/03/58   Engineering

$\pi_{Surname,DegreeProg}(Students)$
Surname  DegreeProg
Smith    Computing
Smith    Engineering
Black    Fine Art
Black    Engineering
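The superkey criterion gives a simple test on a given instance: compare $|\pi_Y(r)|$ with $|r|$. The sketch below applies it to the two Students instances; the helper names are hypothetical, and a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs.

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def project(r, ys):
    ys = frozenset(ys)
    return ys, {frozenset((a, v) for a, v in t if a in ys) for t in r[1]}


def is_superkey_for_instance(r, ys):
    # |pi_Y(r)| == |r| exactly when no two tuples of r agree on all of Y,
    # i.e. when Y is a superkey for this instance (perhaps only "by chance").
    return len(project(r, ys)[1]) == len(r[1])


attrs = ("RegNum", "Surname", "FirstName", "BirthDate", "DegreeProg")
students_1 = relation(attrs, [
    (284328, "Smith", "Luigi", "29/04/59", "Computing"),
    (296328, "Smith", "John", "29/04/59", "Computing"),
    (587614, "Smith", "Lucy", "01/05/61", "Engineering"),
    (934856, "Black", "Lucy", "01/05/61", "Fine Art"),
    (965536, "Black", "Lucy", "05/03/58", "Fine Art")])
students_2 = relation(attrs, [
    (296328, "Smith", "John", "29/04/59", "Computing"),
    (587614, "Smith", "Lucy", "01/05/61", "Engineering"),
    (934856, "Black", "Lucy", "01/05/61", "Fine Art"),
    (965536, "Black", "Lucy", "05/03/58", "Engineering")])

y = ["Surname", "DegreeProg"]
for students in (students_1, students_2):
    print(len(students[1]), len(project(students, y)[1]), is_superkey_for_instance(students, y))
# 5 3 False
# 4 4 True
```

The test only says something about the instance at hand: the second instance passes, yet Surname, DegreeProg is not a superkey in the schema, since inserting another Smith in Computing would break it.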

Join

• The most typical operator in relational algebra
• allows us to establish connections among data in different relations, taking advantage of the "value-based" nature of the relational model

• Two main versions of the join:
  – "natural" join: takes attribute names into account
  – "theta" join

• They are all denoted by the symbol $\bowtie$

A natural join

$r_1$
Employee  Department
Smith     sales
Black     production
White     production

$r_2$
Department  Head
production  Mori
sales       Brown

$r_1 \bowtie r_2$
Employee  Department  Head
Smith     sales       Brown
Black     production  Mori
White     production  Mori

Natural join: definition

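• $r_1(X_1)$, $r_2(X_2)$

• $r_1 \bowtie r_2$ (natural join of $r_1$ and $r_2$) is a relation on $X_1X_2$ (the union of the two sets):

  $\{\, t \text{ on } X_1X_2 \mid t[X_1] \in r_1 \text{ and } t[X_2] \in r_2 \,\}$

  or, equivalently

  $\{\, t \text{ on } X_1X_2 \mid \text{there exist } t_1 \in r_1 \text{ and } t_2 \in r_2 \text{ with } t[X_1] = t_1 \text{ and } t[X_2] = t_2 \,\}$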
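The second formulation of the definition translates almost literally into a nested loop over pairs of tuples. This is a sketch only, with no attempt at efficiency; the helper names are hypothetical, and a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs.

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def natural_join(r1, r2):
    # The result is a relation on X1 union X2: for every t1 in r1 and t2 in r2
    # that agree on the common attributes (X1 intersect X2), emit the combined tuple.
    (x1, rows1), (x2, rows2) = r1, r2
    common = x1 & x2
    result = set()
    for t1 in rows1:
        d1 = dict(t1)
        for t2 in rows2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in common):
                # The pairs on common attributes coincide, so the union of the two
                # frozensets has exactly one pair per attribute of X1 union X2.
                result.add(t1 | t2)
    return x1 | x2, result


r1 = relation(("Employee", "Department"),
              [("Smith", "sales"), ("Black", "production"), ("White", "production")])
r2 = relation(("Department", "Head"), [("production", "Mori"), ("sales", "Brown")])

joined = natural_join(r1, r2)
print(sorted((d["Employee"], d["Department"], d["Head"]) for d in map(dict, joined[1])))
# [('Black', 'production', 'Mori'), ('Smith', 'sales', 'Brown'), ('White', 'production', 'Mori')]
```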

Natural join: comments

• The tuples in the result are obtained by combining tuples in the operands with equal values on the common attributes

• The common attributes often form a key of one of the operands (remember: references are realized by means of keys, and we join in order to follow references)

Another natural join

Offences
Code    Date        Officer  Dept  Registration
143256  25/10/1992  567      75    5694 FR
987554  26/10/1992  456      75    5694 FR
987557  26/10/1992  456      75    6544 XY
630876  15/10/1992  456      47    6544 XY
539856  12/10/1992  567      47    6544 XY

Cars
Registration  Dept  Owner            …
6544 XY       75    Cordon Edouard   …
7122 HT       75    Cordon Edouard   …
5694 FR       75    Latour Hortense  …
6544 XY       47    Mimault Bernard  …

$Offences \bowtie Cars$
Code    Date        Officer  Dept  Registration  Owner            …
143256  25/10/1992  567      75    5694 FR       Latour Hortense  …
987554  26/10/1992  456      75    5694 FR       Latour Hortense  …
987557  26/10/1992  456      75    6544 XY       Cordon Edouard   …
630876  15/10/1992  456      47    6544 XY       Mimault Bernard  …
539856  12/10/1992  567      47    6544 XY       Mimault Bernard  …

Yet another join

• Compare with the union:
  – the same data can be combined in various ways

Paternity
Father   Child
Adam     Cain
Adam     Abel
Abraham  Isaac
Abraham  Ishmael

Maternity
Mother  Child
Eve     Cain
Eve     Seth
Sarah   Isaac
Hagar   Ishmael

$Paternity \bowtie Maternity$
Father   Child    Mother
Adam     Cain     Eve
Abraham  Isaac    Sarah
Abraham  Ishmael  Hagar

Joins can be "incomplete"

• If a tuple does not have a "counterpart" in the other relation, then it does not contribute to the join ("dangling" tuple)

$r_1$
Employee  Department
Smith     sales
Black     production
White     production

$r_2$
Department  Head
production  Mori
purchasing  Brown

$r_1 \bowtie r_2$
Employee  Department  Head
Black     production  Mori
White     production  Mori

Joins can be empty

• As an extreme, it may happen that no tuple has a counterpart, and all tuples are dangling

$r_1$
Employee  Department
Smith     sales
Black     production
White     production

$r_2$
Department  Head
marketing   Mori
purchasing  Brown

$r_1 \bowtie r_2$
Employee  Department  Head

The other extreme

• If each tuple of each operand can be combined with all the tuples of the other, then the join has a cardinality that is the product of the cardinalities of the operands

$r_1$
Employee  Project
Smith     A
Black     A
White     A

$r_2$
Project  Head
A        Mori
A        Brown

$r_1 \bowtie r_2$
Employee  Project  Head
Smith     A        Mori
Black     A        Brown
White     A        Mori
Smith     A        Brown
Black     A        Mori
White     A        Brown

How many tuples in a join?

Given $r_1(X_1)$, $r_2(X_2)$:

• the join has a cardinality between zero and the product of the cardinalities of the operands:
  $0 \le |r_1 \bowtie r_2| \le |r_1| \times |r_2|$
  ($|r|$ is the cardinality of relation $r$)

• moreover:
  – if the join is complete, then its cardinality is at least the maximum of $|r_1|$ and $|r_2|$
  – if $X_1 \cap X_2$ contains a key for $r_2$, then $|r_1 \bowtie r_2| \le |r_1|$
  – if $X_1 \cap X_2$ is the primary key for $r_2$, and there is a referential constraint between $X_1 \cap X_2$ in $r_1$ and such a key, then $|r_1 \bowtie r_2| = |r_1|$
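The bounds can be checked on the example instances of the preceding slides. The sketch below (hypothetical helpers; a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs) computes the join sizes for the complete, incomplete, empty and "all combinations" cases and asserts $0 \le |r_1 \bowtie r_2| \le |r_1| \times |r_2|$ for each:

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def natural_join(r1, r2):
    # Combine every pair of tuples that agree on the common attributes.
    common = r1[0] & r2[0]
    rows = {t1 | t2 for t1 in r1[1] for t2 in r2[1]
            if all(dict(t1)[a] == dict(t2)[a] for a in common)}
    return r1[0] | r2[0], rows


emp = relation(("Employee", "Department"),
               [("Smith", "sales"), ("Black", "production"), ("White", "production")])
cases = {
    "complete": (emp, relation(("Department", "Head"),
                               [("production", "Mori"), ("sales", "Brown")])),
    "incomplete": (emp, relation(("Department", "Head"),
                                 [("production", "Mori"), ("purchasing", "Brown")])),
    "empty": (emp, relation(("Department", "Head"),
                            [("marketing", "Mori"), ("purchasing", "Brown")])),
    "product": (relation(("Employee", "Project"), [("Smith", "A"), ("Black", "A"), ("White", "A")]),
                relation(("Project", "Head"), [("A", "Mori"), ("A", "Brown")])),
}

sizes = {}
for name, (a, b) in cases.items():
    n = len(natural_join(a, b)[1])
    assert 0 <= n <= len(a[1]) * len(b[1])  # the general bounds
    sizes[name] = n
print(sizes)  # {'complete': 3, 'incomplete': 2, 'empty': 0, 'product': 6}
```

In the first three cases Department is a key of the second operand, so each tuple of the first operand finds at most one counterpart and the size never exceeds $|r_1| = 3$; in the complete case every department of the first operand also appears in the second, as a referential constraint would guarantee, and the size is exactly 3. Checking instances illustrates the bounds but does not prove them.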

Outer joins

• A variant of the join, to keep all pieces of information from the operands

• It "pads with nulls" the tuples that have no counterpart
• Three variants:
  – "left": only tuples of the first operand are padded
  – "right": only tuples of the second operand are padded
  – "full": tuples of both operands are padded

Outer joins

$r_1$
Employee  Department
Smith     sales
Black     production
White     production

$r_2$
Department  Head
production  Mori
purchasing  Brown

$r_1 \bowtie_{LEFT} r_2$
Employee  Department  Head
Smith     sales       NULL
Black     production  Mori
White     production  Mori

$r_1 \bowtie_{RIGHT} r_2$
Employee  Department  Head
Black     production  Mori
White     production  Mori
NULL      purchasing  Brown

$r_1 \bowtie_{FULL} r_2$
Employee  Department  Head
Smith     sales       NULL
Black     production  Mori
White     production  Mori
NULL      purchasing  Brown
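Outer joins can be sketched as a natural join plus padding for the dangling tuples. In this illustration (hypothetical names; a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs), Python's `None` stands for NULL, which assumes the operands themselves contain no `None` values:

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def outer_join(r1, r2, kind):
    # Natural join of r1 and r2, plus dangling tuples padded with None (NULL):
    # those of r1 for "left" and "full", those of r2 for "right" and "full".
    if kind not in ("left", "right", "full"):
        raise ValueError("kind must be 'left', 'right' or 'full'")
    (x1, rows1), (x2, rows2) = r1, r2
    common = x1 & x2

    def match(t1, t2):
        d1, d2 = dict(t1), dict(t2)
        return all(d1[a] == d2[a] for a in common)

    result = {t1 | t2 for t1 in rows1 for t2 in rows2 if match(t1, t2)}
    if kind in ("left", "full"):
        pad = frozenset((a, None) for a in x2 - x1)
        result |= {t1 | pad for t1 in rows1 if not any(match(t1, t2) for t2 in rows2)}
    if kind in ("right", "full"):
        pad = frozenset((a, None) for a in x1 - x2)
        result |= {t2 | pad for t2 in rows2 if not any(match(t1, t2) for t1 in rows1)}
    return x1 | x2, result


def show(r, attrs=("Employee", "Department", "Head")):
    # Sorted rows for display, printing None as NULL.
    return sorted(tuple("NULL" if dict(t)[a] is None else dict(t)[a] for a in attrs)
                  for t in r[1])


r1 = relation(("Employee", "Department"),
              [("Smith", "sales"), ("Black", "production"), ("White", "production")])
r2 = relation(("Department", "Head"), [("production", "Mori"), ("purchasing", "Brown")])

for kind in ("left", "right", "full"):
    print(kind, show(outer_join(r1, r2, kind)))
```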

N-ary join

• The natural join is
  – commutative: $r_1 \bowtie r_2 = r_2 \bowtie r_1$
  – associative: $(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$

• Therefore, we can write n-ary joins without ambiguity:
  $r_1 \bowtie r_2 \bowtie \dots \bowtie r_n$

N-ary join

$r_1$
Employee  Department
Smith     sales
Black     production
Brown     marketing
White     production

$r_2$
Department  Division
production  A
marketing   B
purchasing  B

$r_3$
Division  Head
A         Mori
B         Brown

$r_1 \bowtie r_2 \bowtie r_3$
Employee  Department  Division  Head
Black     production  A         Mori
Brown     marketing   B         Brown
White     production  A         Mori
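Associativity and commutativity can be observed, though of course not proved, on the instance above. A sketch with hypothetical helpers (a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs, so two relations are equal when their attributes and tuples coincide):

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def natural_join(r1, r2):
    # Combine every pair of tuples that agree on the common attributes.
    common = r1[0] & r2[0]
    rows = {t1 | t2 for t1 in r1[1] for t2 in r2[1]
            if all(dict(t1)[a] == dict(t2)[a] for a in common)}
    return r1[0] | r2[0], rows


r1 = relation(("Employee", "Department"), [("Smith", "sales"), ("Black", "production"),
                                          ("Brown", "marketing"), ("White", "production")])
r2 = relation(("Department", "Division"), [("production", "A"), ("marketing", "B"),
                                          ("purchasing", "B")])
r3 = relation(("Division", "Head"), [("A", "Mori"), ("B", "Brown")])

left_first = natural_join(natural_join(r1, r2), r3)
right_first = natural_join(r1, natural_join(r2, r3))
print(left_first == right_first)                        # True
print(natural_join(r1, r2) == natural_join(r2, r1))     # True
print(sorted(dict(t)["Employee"] for t in left_first[1]))  # ['Black', 'Brown', 'White']
```

Smith does not appear in the result because the sales department has no counterpart in $r_2$.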

Cartesian product

• The natural join is defined also when the operands have no attributes in common

• in this case no condition is imposed on tuples, and therefore the result contains tuples obtained by combining the tuples of the operands in all possible ways

Cartesian product: example

Employees
Employee  Project
Smith     A
Black     A
Black     B

Projects
Code  Name
A     Venus
B     Mars

$Employees \bowtie Projects$
Employee  Project  Code  Name
Smith     A        A     Venus
Black     A        A     Venus
Black     B        A     Venus
Smith     A        B     Mars
Black     A        B     Mars
Black     B        B     Mars

Theta-join

• In most cases, a cartesian product is meaningful only if followed by a selection:
  – theta-join: a derived operator
    $r_1 \bowtie_F r_2 = \sigma_F(r_1 \bowtie r_2)$
  – if $F$ is a conjunction of equalities, then we have an equi-join

Equi-join: example

Employees
Employee  Project
Smith     A
Black     A
Black     B

Projects
Code  Name
A     Venus
B     Mars

$Employees \bowtie_{Project=Code} Projects$
Employee  Project  Code  Name
Smith     A        A     Venus
Black     A        A     Venus
Black     B        B     Mars
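Because the natural join of operands with no common attributes is their cartesian product, a theta-join can be sketched exactly as its definition reads: a product followed by a selection. Helper names are hypothetical, a relation is modeled as an attribute set plus a set of tuples (each a frozenset of attribute/value pairs), and the sketch assumes, as in the example, that the operands have disjoint attribute sets:

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def natural_join(r1, r2):
    # With no common attributes the matching condition is vacuously true, so
    # every pair of tuples is combined: the cartesian product.
    common = r1[0] & r2[0]
    rows = {t1 | t2 for t1 in r1[1] for t2 in r2[1]
            if all(dict(t1)[a] == dict(t2)[a] for a in common)}
    return r1[0] | r2[0], rows


def select(r, condition):
    return r[0], {t for t in r[1] if condition(dict(t))}


def theta_join(r1, r2, condition):
    # Derived operator: r1 join_F r2 = sigma_F(r1 x r2). This sketch requires
    # disjoint attribute sets, so that the natural join really is a product.
    if r1[0] & r2[0]:
        raise ValueError("operands must have no attributes in common")
    return select(natural_join(r1, r2), condition)


employees = relation(("Employee", "Project"), [("Smith", "A"), ("Black", "A"), ("Black", "B")])
projects = relation(("Code", "Name"), [("A", "Venus"), ("B", "Mars")])

product = natural_join(employees, projects)
equi = theta_join(employees, projects, lambda t: t["Project"] == t["Code"])
print(len(product[1]), len(equi[1]))  # 6 3
print(sorted((d["Employee"], d["Name"]) for d in map(dict, equi[1])))
# [('Black', 'Mars'), ('Black', 'Venus'), ('Smith', 'Venus')]
```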

Queries

• A query is a function from database instances to relations
• Queries are formulated in relational algebra by means of expressions over relations

A database for the examples

Employees
Number  Name          Age  Salary
101     Mary Smith    34   40
103     Mary Bianchi  23   35
104     Luigi Neri    38   61
105     Nico Bini     44   38
210     Marco Celli   49   60
231     Siro Bisi     50   60
252     Nico Bini     44   70
301     Steve Smith   34   70
375     Mary Smith    50   65

Supervision
Head  Employee
210   101
210   103
210   104
231   105
301   210
301   231
375   252

Example 1

• Find the numbers, names and ages of employees earning more than 40 thousand.

Number  Name         Age
104     Luigi Neri   38
210     Marco Celli  49
231     Siro Bisi    50
252     Nico Bini    44
301     Steve Smith  34
375     Mary Smith   50

Example 2

• Find the registration numbers of the supervisors of the employees earning more than 40 thousand

Head
210
301
375

Example 3

• Find the names and salaries of the supervisors of the employees earning more than 40 thousand

NameH        SalaryH
Marco Celli  60
Steve Smith  70
Mary Smith   65
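The slides show only the results of these queries. One possible way to obtain the first three, sketched in Python with hypothetical helpers (a relation is modeled as an attribute set plus a set of tuples, each a frozenset of attribute/value pairs), is to select the employees earning more than 40, match Supervision.Employee with Employees.Number by renaming Number to Employee and joining, and use a renamed copy of Employees (NameH, SalaryH, ...) to describe the supervisors. The exact algebra expressions are our assumption, not taken from the slides; the renaming-plus-natural-join steps play the role of equi-joins.

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def select(r, cond):
    return r[0], {t for t in r[1] if cond(dict(t))}


def project(r, ys):
    ys = frozenset(ys)
    return ys, {frozenset((a, v) for a, v in t if a in ys) for t in r[1]}


def rename(r, mapping):
    return (frozenset(mapping.get(a, a) for a in r[0]),
            {frozenset((mapping.get(a, a), v) for a, v in t) for t in r[1]})


def natural_join(r1, r2):
    common = r1[0] & r2[0]
    return r1[0] | r2[0], {t1 | t2 for t1 in r1[1] for t2 in r2[1]
                           if all(dict(t1)[a] == dict(t2)[a] for a in common)}


employees = relation(("Number", "Name", "Age", "Salary"), [
    (101, "Mary Smith", 34, 40), (103, "Mary Bianchi", 23, 35), (104, "Luigi Neri", 38, 61),
    (105, "Nico Bini", 44, 38), (210, "Marco Celli", 49, 60), (231, "Siro Bisi", 50, 60),
    (252, "Nico Bini", 44, 70), (301, "Steve Smith", 34, 70), (375, "Mary Smith", 50, 65)])
supervision = relation(("Head", "Employee"), [
    (210, 101), (210, 103), (210, 104), (231, 105), (301, 210), (301, 231), (375, 252)])

well_paid = select(employees, lambda t: t["Salary"] > 40)

# Example 1: selection followed by projection.
ex1 = project(well_paid, ["Number", "Name", "Age"])

# Example 2: renaming Number to Employee lets a natural join express the
# condition Supervision.Employee = Employees.Number.
supervised = natural_join(supervision, rename(well_paid, {"Number": "Employee"}))
ex2 = project(supervised, ["Head"])

# Example 3: a renamed copy of Employees describes the supervisors (joined on Head).
heads = rename(employees, {"Number": "Head", "Name": "NameH", "Age": "AgeH", "Salary": "SalaryH"})
ex3 = project(natural_join(heads, supervised), ["NameH", "SalaryH"])

print(sorted(dict(t)["Number"] for t in ex1[1]))  # [104, 210, 231, 252, 301, 375]
print(sorted(dict(t)["Head"] for t in ex2[1]))    # [210, 301, 375]
print(sorted((dict(t)["NameH"], dict(t)["SalaryH"]) for t in ex3[1]))
# [('Marco Celli', 60), ('Mary Smith', 65), ('Steve Smith', 70)]
```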

Example 4

• Find the employees earning more than their respective supervisors, showing registration numbers, names and salaries of the employees and supervisors

Number  Name        Salary  NumberH  NameH        SalaryH
104     Luigi Neri  61      210      Marco Celli  60
252     Nico Bini   70      375      Mary Smith   65

Example 5

• Find the registration numbers and names of the supervisors whose employees all earn more than 40 thousand

Number  Name
301     Steve Smith
375     Mary Smith
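Examples 4 and 5 can be sketched in the same style; again this is an assumed formulation with hypothetical helpers, not the slides' own expressions. Example 4 pairs each employee with a renamed copy of their supervisor and selects on Salary > SalaryH. Example 5 uses the usual difference pattern for "all": from all supervisors, remove those who supervise at least one employee earning 40 thousand or less.

```python
def relation(attrs, rows):
    # A relation is (attribute set, set of tuples); each tuple is a frozenset of
    # (attribute, value) pairs.
    return frozenset(attrs), {frozenset(zip(attrs, row)) for row in rows}


def select(r, cond):
    return r[0], {t for t in r[1] if cond(dict(t))}


def project(r, ys):
    ys = frozenset(ys)
    return ys, {frozenset((a, v) for a, v in t if a in ys) for t in r[1]}


def rename(r, mapping):
    return (frozenset(mapping.get(a, a) for a in r[0]),
            {frozenset((mapping.get(a, a), v) for a, v in t) for t in r[1]})


def natural_join(r1, r2):
    common = r1[0] & r2[0]
    return r1[0] | r2[0], {t1 | t2 for t1 in r1[1] for t2 in r2[1]
                           if all(dict(t1)[a] == dict(t2)[a] for a in common)}


def difference(r1, r2):
    if r1[0] != r2[0]:
        raise ValueError("difference needs operands over the same attributes")
    return r1[0], r1[1] - r2[1]


def rows(r, attrs):
    # Sorted value tuples, for display.
    return sorted(tuple(dict(t)[a] for a in attrs) for t in r[1])


employees = relation(("Number", "Name", "Age", "Salary"), [
    (101, "Mary Smith", 34, 40), (103, "Mary Bianchi", 23, 35), (104, "Luigi Neri", 38, 61),
    (105, "Nico Bini", 44, 38), (210, "Marco Celli", 49, 60), (231, "Siro Bisi", 50, 60),
    (252, "Nico Bini", 44, 70), (301, "Steve Smith", 34, 70), (375, "Mary Smith", 50, 65)])
supervision = relation(("Head", "Employee"), [
    (210, 101), (210, 103), (210, 104), (231, 105), (301, 210), (301, 231), (375, 252)])

# Example 4: employee, supervision link, renamed copy of the supervisor.
sup = rename(supervision, {"Head": "NumberH", "Employee": "Number"})
bosses = rename(employees, {"Number": "NumberH", "Name": "NameH", "Age": "AgeH", "Salary": "SalaryH"})
pairs = natural_join(natural_join(employees, sup), bosses)
cols4 = ["Number", "Name", "Salary", "NumberH", "NameH", "SalaryH"]
ex4 = project(select(pairs, lambda t: t["Salary"] > t["SalaryH"]), cols4)

# Example 5: all supervisors minus those with some employee earning <= 40.
all_heads = project(supervision, ["Head"])
low_paid = rename(select(employees, lambda t: t["Salary"] <= 40), {"Number": "Employee"})
heads_of_low = project(natural_join(supervision, low_paid), ["Head"])
good_heads = rename(difference(all_heads, heads_of_low), {"Head": "Number"})
ex5 = project(natural_join(good_heads, employees), ["Number", "Name"])

print(rows(ex4, cols4))
# [(104, 'Luigi Neri', 61, 210, 'Marco Celli', 60), (252, 'Nico Bini', 70, 375, 'Mary Smith', 65)]
print(rows(ex5, ["Number", "Name"]))  # [(301, 'Steve Smith'), (375, 'Mary Smith')]
```

A plain selection cannot express "all" directly, because it inspects one tuple at a time; the difference removes every supervisor for whom a counterexample exists.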

Algebra with null values

$\sigma_{Age>30}(People)$
• which tuples belong to the result?
• The first yes, the second no, but the third?

People
Name    Age   Salary
Aldo    35    15
Andrea  27    21
Maria   NULL  42
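The slides leave the question open. One common answer, the one SQL adopts, is three-valued logic: a comparison involving NULL evaluates to unknown, and selection keeps only the tuples for which the condition is true. A minimal sketch of that convention, with Python's `None` standing for both NULL and unknown (the helper names are hypothetical):

```python
def gt(x, y):
    # Comparison in three-valued logic; None stands for NULL and for "unknown".
    return None if x is None or y is None else x > y


def le(x, y):
    return None if x is None or y is None else x <= y


def or3(a, b):
    # Three-valued OR: true if either operand is true, false if both are false,
    # unknown otherwise.
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None


def select(rows, condition):
    # Keep a tuple only when the condition is True (not False, not unknown).
    return [t for t in rows if condition(t) is True]


people = [
    {"Name": "Aldo", "Age": 35, "Salary": 15},
    {"Name": "Andrea", "Age": 27, "Salary": 21},
    {"Name": "Maria", "Age": None, "Salary": 42},
]

over_30 = select(people, lambda t: gt(t["Age"], 30))
either = select(people, lambda t: or3(gt(t["Age"], 30), le(t["Age"], 30)))
print([t["Name"] for t in over_30])  # ['Aldo']
print([t["Name"] for t in either])   # ['Aldo', 'Andrea']
```

The second query shows the practical consequence: even a condition that looks like a tautology does not retrieve Maria, which is why SQL provides explicit IS NULL and IS NOT NULL predicates.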

Material on relational calculus and Datalog will be provided in the near future.

Please contact Paolo Atzeni (atzeni@dia.uniroma3.it) for more information