1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of...

48
1 CSBP430 Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University [email protected]

description

3 BASIS OF THE MODEL The relational Model of Data is based on the concept of a Relation. A Relation is a mathematical concept based on the ideas of sets. The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations.

Transcript of 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of...

Page 1: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

1

CSBP430 – Database Systems

Chapter 7 - The Relational Data Model

Elarbi BadidiCollege of Information TechnologyUnited Arab Emirates [email protected]

Page 2: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

2

In this chapter, you will learn:

Relational Model ConceptsCharacteristics of RelationsRelational Integrity Constraints

Key ConstraintsEntity Integrity ConstraintsReferential Integrity Constraints

Update Operations on RelationsRelational Algebra Operations

SELECT and PROJECT Set OperationsJOIN OperationsAdditional Relational Operations

Page 3: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

3

BASIS OF THE MODEL

The relational Model of Data is based on the concept of a Relation.A Relation is a mathematical concept based on the ideas of sets.The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations.

Page 4: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

4

INFORMAL DEFINITIONS

RELATION: A table of valuesA relation may be thought of as a set of rows.A relation may alternately be though of as a set of columns.Each row of the relation may be given an identifier.Each column typically is called by its column name or column header or attribute name.

Page 5: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

5

FORMAL DEFINITIONS (1)

A Relation may be defined in multiple ways.The Schema of a Relation:

R (A1, A2, .....An)

Relation R is defined over attributes A1, A2, .....An

For Example –

CUSTOMER (Cust-id, Cust-name, Address, Phone#)

Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6 digit numbers.

Page 6: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

6

FORMAL DEFINITIONS (2)

A tuple is an ordered set of valuesEach value is derived from an appropriate domain.Each row in the CUSTOMER table may be called as a tuple in the table and would consist of four values.<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a tuple belonging to the CUSTOMER relation.A relation may be regarded as a set of tuples (rows).Columns in a table are also called as attributes of the relation.

Page 7: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

7

FORMAL DEFINITIONS (3)

The relation is formed over the cartesian product of the sets; each set has values from a domain; that domain is used in a specific role which is conveyed by the attribute name.For example, attribute Cust-name is defined over the domain of strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name of customers.Formally,

Given R(A1, A2, .........., An)r(R) subset-of dom (A1) X dom (A2) X ....X dom(An)R: schema of the relationr of R: a specific "value" or population of R.

R is also called the intension of a relationr is also called the extension of a relation

Let S1 = {0,1}Let S2 = {a,b,c}Let R subset-of S1 X S2for example: r(R) = {<0,a> , <0,b> , <1,c> }

Page 8: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

8

DEFINITION SUMMARY

Informal Terms Formal TermsTable RelationColumn Attribute/DomainRow TupleValues in a column DomainTable Definition Schema of RelationPopulated Table Extension

Notes: Whereas languages like SQL use the informal terms of

TABLE (e.g. CREATE TABLE), COLUMN (e.g. SYSCOLUMN variable), the relational database textbooks present the model and operations on it using the formal terms.

Page 9: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

9

Page 10: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

10

Characteristics of Relations

Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form.Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .(However, a more general alternative definition of relation does not require this ordering).Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.Notation:We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

Page 11: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

11

Page 12: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

12

Relational Integrity Constraints

Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints:

Domain constraintsKey constraints, Entity integrity constraints, and Referential integrity constraints

Page 13: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

13

Domain constraints

Domains constraints specify that the value of each attribute A must be an atomic value from the domaine dom(A).

Page 14: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

14

Key Constraints

Superkey of R: A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK] # t2[SK].Key of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.Example: The CAR relation schema:

CAR(State, Reg#, SerialNo, Make, Model, Year)has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not a key.If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.

Page 15: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

15

Page 16: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

16

Entity Integrity

Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the database.S = {R1, R2, ..., Rn}Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples.t[PK] <> null for any tuple t in r(R)Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

Page 17: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

17

Referential Integrity

A constraint involving two relations (the previous constraints involve a single relation).Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

Page 18: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

18

Page 19: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

19

Page 20: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

20

Page 21: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

21

Update Operations on Relations

INSERT a tuple.DELETE a tuple.MODIFY a tuple.Integrity constraints should not be violated by the update operations.Several update operations may have to be grouped together.Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints.In case of integrity violation, several actions can be taken:

cancel the operation that causes the violation (REJECT option)perform the operation but inform the user of the violationtrigger additional updates so the violation is corrected (CASCADE option, SET NULL option)execute a user-specified error-correction routine

Page 22: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

22

The Relational Algebra

Operations to manipulate relations.Used to specify retrieval requests (queries).Query result is in the form of a relation.Relational Operations:

SELECT σ and PROJECT operations.Set operations: These include UNION U, INTERSECTION ||, DIFFERENCE -, CARTESIAN PRODUCT X.JOIN operations .Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.

σ(sigma) (pi)

Page 23: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

23

SELECT σ and PROJECT operations

SELECT operation (denoted by σ ):Selects the tuples (rows) from a relation R that satisfy a certain selection condition cForm of the operation: σ c(R)The condition c is an arbitrary Boolean expression on the attributes of RResulting relation has the same attributes as RResulting relation includes each tuple in r(R) whose attribute values satisfy the condition cExamples: σDNO=4(EMPLOYEE)

σSALARY>30000(EMPLOYEE) σ(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)

Page 24: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

24

Select

Yields a subset of rows based on specified criterion

Page 25: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

25

PROJECT operation (denoted by ):

Keeps only certain attributes (columns) from a relation R specified in an attribute list LForm of operation: L(R)Resulting relation has only those attributes of R specified in LExample: FNAME,LNAME,SALARY(EMPLOYEE)The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a mathematical set (no duplicate elements)Example: SEX,SALARY(EMPLOYEE)If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation.Duplicate tuples are eliminated by the operation.

Page 26: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

26

Project

Yields all values for selected attributes

Page 27: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

27

Page 28: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

28

Sequences of operationsSeveral operations can be combined to form a relational algebra expression (query)Example: Retrieve the names and salaries of employees who work in department 4: FNAME,LNAME,SALARY (σDNO=4(EMPLOYEE) )

Alternatively, we specify explicit intermediate relations for each step:

DEPT4_EMPS σDNO=4(EMPLOYEE) R FNAME,LNAME,SALARY(DEPT4_EMPS)

Attributes can optionally be renamed in the resulting left-hand-side relation (this may be required for some operations that will be presented later):

DEPT4_EMPS σDNO=4(EMPLOYEE) R(FIRSTNAME,LASTNAME,SALARY) <- FNAME,LNAME,SALARY(DEPT4_EMPS)

Page 29: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

29

Page 30: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

30

Set Operations

Binary operations from mathematical set theory:UNION: R1 U R2,INTERSECTION: R1 | | R2,SET DIFFERENCE: R1 - R2,CARTESIAN PRODUCT: R1 X R2.For U, ||, -, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility.The resulting relation for U, ||, or - has the same attribute names as the first operand relation R1 (by convention).

Page 31: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

31

Union

Combines all rows

Page 32: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

32

Intersect

Yields rows that appear in both tables

Page 33: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

33

Difference

Yields rows not found in other tables

Page 34: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

34

Page 35: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

35

CARTESIAN PRODUCT

R(A1, A2, ..., Am, B1, B2, ..., Bn) <- R1(A1, A2, ..., Am) X R2 (B1, B2, ..., Bn)

A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that: t[A1, A2, ..., Am]=t1 and t[B1, B2, ..., Bn]=t2If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.CARTESIAN PRODUCT is a meaningless operation on its own. It can combine related tuples from two relations if followed by the appropriate SELECT operation.

Example: Combine each DEPARTMENT tuple with the EMPLOYEE tuple of the manager.DEP_EMP DEPARTMENT X EMPLOYEEDEPT_MANAGER σMGRSSN=SSN(DEP_EMP)

Page 36: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

36

Product

Yields all possible pairs from two tables

Page 37: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

37

Page 38: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

38

Join

Information from two or more tables is combined

Page 39: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

39

JOIN OperationsTHETA JOIN: Similar to a CARTESIAN PRODUCT followed by a SELECT. The condition c is called a join condition.R(A1, A2, ..., Am, B1, B2, ..., Bn)

R1(A1, A2, ..., Am) c R2 (B1, B2, ..., Bn) c is of the form: (Ai Bj) AND ... AND (Ah Bk); 1<i,h<m, 1<j,k<n

is one of the following operators: =,>,<,,,

EQUIJOIN: The join condition c includes one or more equality comparisons involving attributes from R1 and R2. That is, c is of the form:(Ai = Bj) AND ... AND (Ah = Bk); 1<i,h<m, 1<j,k<n

In the above EQUIJOIN operation:Ai, ..., Ah are called the join attributes of R1Bj, ..., Bk are called the join attributes of R2

Example of using EQUIJOIN:Retrieve each DEPARTMENT's name and its manager's name:T DEPARTMENT MGRSSN=SSN EMPLOYEERESULT DNAME,FNAME,LNAME(T)

Page 40: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

40

NATURAL JOIN (*)

In an EQUIJOIN R R1 c R2, the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN, the redundant join attributes of R2 are eliminated from R. The equality condition is implied and need not be specified.R R1 *(join attributes of R1),(join attributes of R2) R2

Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she works for:T EMPLOYEE *(DNO),(DNUMBER) DEPARTMENTRESULT FNAME,LNAME,DNAME(T)

If the join attributes have the same names in both relations, they need not be specified and we can write R R1 * R2.

Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR:SUPERVISOR(SUPERSSN,SFN,SLN) SSN,FNAME,LNAME(EMPLOYEE)TEMPLOYEE * SUPERVISORRESULT FNAME,LNAME,SFN,SLN(T)

Page 41: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

41

Page 42: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

42

Note: In the original definition of NATURAL JOIN, the join attributes were required to have the same names in both relations.There can be a more than one set of join attributes with a different meaning between the same two relations. For example:JOIN ATTRIBUTES RELATIONSHIPEMPLOYEE.SSN= EMPLOYEE manages

DEPARTMENT.MGRSSN the DEPARTMENTEMPLOYEE.DNO= EMPLOYEE works for DEPARTMENT.DNUMBER the DEPARTMENT

Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she works for:TEMPLOYEE DNO=DNUMBERDEPARTMENTRESULT FNAME,LNAME,DNAME(T)

Page 43: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

43

A relation can have a set of join attributes to join it with itself :JOIN ATTRIBUTES RELATIONSHIPEMPLOYEE(1).SUPERSSN= EMPLOYEE(2) supervises

EMPLOYEE(2).SSN EMPLOYEE(1)One can think of this as joining two distinct copies of the relation, although only one relation actually existsIn this case, renaming can be useful

Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR:

SUPERVISOR(SSSN,SFN,SLN) SSN,FNAME,LNAME(EMPLOYEE)TEMPLOYEE SUPERSSN=SSSNSUPERVISORRESULT FNAME,LNAME,SFN,SLN(T)

Page 44: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

44

Complete Set of Relational Algebra Operations

All the operations discussed so far can be described as a sequence of only the operations SELECT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT. Hence, the set {σ, , U, - , X } is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete.For database applications, additional operations are needed that were not part of the original relational algebra. These include:

Aggregate functions and grouping.OUTER JOIN and OUTER UNION.

Page 45: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

45

Additional Relational OperationsAGGREGATE FUNCTIONS

Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in database applications<grouping attributes> F <function list> (R)The grouping attributes are optionalExample 1: Retrieve the average salary of all employees (no grouping needed):R(AVGSAL) F AVERAGE SALARY (EMPLOYEE)

Example 2: For each department, retrieve the department number, the number of employees, and the average salary (in the department):R(DNO,NUMEMPS,AVGSAL) DNO, F COUNT SSN, AVERAGE SALARY (EMPLOYEE)DNO is called the grouping attribute in the above example

Page 46: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

46

Page 47: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

47

OUTER JOIN

In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the resultSome queries require all tuples in R1 (or R2 or both) to appear in the resultWhen no matching tuples are found, nulls are placed for the missing attributesLEFT OUTER JOIN: R1 X R2 lets every tuple in R1 appear in the resultRIGHT OUTER JOIN: R1 X R2 lets every tuple in R2 appear in the resultFULL OUTER JOIN: R1 X R2 lets every tuple in R1 or R2 appear in the result

Page 48: 1 CSBP430 – Database Systems Chapter 7 - The Relational Data Model Elarbi Badidi College of Information Technology United Arab Emirates University

48