Relational Algebra Tim Kaddoura CS157A. Introduction Relational query languages are languages for...

Post on 14-Dec-2015

236 views 2 download

Transcript of Relational Algebra Tim Kaddoura CS157A. Introduction Relational query languages are languages for...

Relational Algebra

Tim Kaddoura

CS157A

Introduction

Relational query languages are languages for describing queries on a relational database

Three variants Three variants– Relational Algebra– Relational Calculus– SQL

Query languages V.S. programming languages– Query languages support easy, efficient access to large data sets.

Some Facts

Introduced by Edgar 'Ted' Coddof IBM Research in 1970.

Concept of mathematical relation as the underlying basis.

The standard database model for most transactional databases today.

What is Algebra?

A language based on operators and a domain of values Operators map values taken from the domain into other

domain values Hence, an expression involving operators and arguments

produces a value in the domain Consider arithmetic operations +, - and * over integers.

– Algebra expressions: 2+3, (46–3)+3, (7*x)+(3*x) Relational algebra:

– Domain: the set of all relations– Expression: referred to as a query

Relational Algebra

Domain: set of relations Basic operators: select, project, union, set

difference, Cartesian product Derived operators: set intersection,

division, join Procedural: Relational expression

specifies query by describing an algorithm for determining the result of an expression

Example Database

STUDENT(Id, Name, Password, Address)FACULTY(Id, Name, DeptId, Password, Address)COURSE(CrsCode, DeptId, CrsName, CreditHours)REQUIRES(CrsCode, PrereqCrsCode, EnforcedSince)CLASS(CrsCode, SectionNo, Semester, Year,Textbook,

ClassTime, Enrollment, MaxEnrollment, ClassroomId, InstructorId)

CLASSROOM(ClassroomId, Seats)TRANSCRIPT(StudId, CrsCode, SectionNo, Semester,

Year, Grade)

Examples Queries

Find all the courses student named ‘John Doe’ has completed. Find all students who are taking a course by Prof. Lee in ‘Fall 2005’. Find all courses taught by a faculty from the ‘CS’ department in ‘Fall

2005’. Find all faculty who taught courses both in ‘Fall’ and ‘Spring’ 2005. Find all faculty who did not teach courses both in ‘Fall’ and ‘Spring’

2005. Find all students with 4.0 GPA. Find all students who completed a course without taking one of its

prerequisites.

Relational Algebra

A relation schema is given by R(A1,…,Ak), the name of the relation and the list of the attributes in the relation

A relation is a set of tuples that are valid instances of its schema

Relational algebra expressions take as input relations and produce as output new relations.

After each operation, the attributes of the participating relations are carried to the new relation. The attributes may be renamed, but their domain remains the same.

Basic Operators

Unary (single relation) operators SELECT ( selection-condition )

PROJECT ( attribute-list )

Binary (two relation) operators UNION ( ) SET DIFFERENCE ( - ) CARTESIAN PRODUCT ( )

Set Operators

Set operators connect two relations by set operations.

However, for a set operation R1 op R2 to be valid, R1 and R2 should be union-compatible, that is R1, R2 have the same number of columns The names of the attributes are the same in both

relations Attributes with the same name in both relations have

the same domain

Selection

SELECTION: selection-condition ( R ) Select from R all tuples that satisfy the selection-

condition The condition only refers to the attributes in R An atomic-selection-condition is of the form

relation-attribute operation constant, or relation-attribute operation relation-attribute

A selection-condition is obtained by boolean combination of atomic selection conditions by means of connectives AND, OR, and NOT.

Selection

Enrollment>MaxEnrollment AND Enrollment>100 (CLASS)

Grade=‘A’ (TRANSCRIPT)

Year=2005 AND (Semester=‘Fall’ OR Semester=‘Spring’) (CLASS)

Name like ‘%Lee’ (FACULTY)

Projection

PROJECT: attribute-list (R) Return from R all tuples, but remove from all tuples

any attribute that is not in the attribute list The attribute list can only refer to attributes in RExamples: ID (FACULY) ID ( Name like ‘%Lee’ FACULTY ) CrsCode, Textbook (COURSE) PrereqCrsCode ( CrsCode=‘CSCI4380’ REQUIRES )

Set Operators

Set operators connect two relations by set operations.

However, for a set operation R1 op R2 to be valid, R1 and R2 should be union-compatible, that is R1, R2 have the same number of columns The names of the attributes are the same in both

relations Attributes with the same name in both relations have

the same domain

Set Operators

Given two relations R1, R2 that are union-compatible, we have that R1 R2 returns the set of tuples that are in R1 or

R2. [UNION]∆ R1 R2 returns the set of tuples that are both in R1

and R2. [INTERSECTION]∆ R1 - R2 returns the set of tuples that are in R1, but

not in R2. [SET DIFFERENCE]∆ Note that set difference is the only negative operator,

i.e. not in R2.

Set Operators

Name (FACULY) Name (STUDENT) Address (FACULY) Address (STUDENT) CrsCode (CLASS) - CrsCode (TRANSCRIPT)

Id (STUDENT) - StudId (TRANSCRIPT) Problem: The two relations in this case are not union

compatible (even though the attribute numbers and domains match, the names do not).

Solution For this?

Rename

Rename all the attributes in the relation Given a relation with schema R(A1,…,An) The expression R[B1,…,Bn] is used to

rename attribute A1 as B1, …, An as Bn. The rename operator does not change the

domain of the attributes. The rename operator does not change the

number of attributes in a relation (that can be done by the projection operation).

Rename

Corrected Expression Id (STUDENT) - ( StudId (TRANSCRIPT))[Id]Example: Temp:= InstructorId (Year=2005 AND Semester=‘Spring’ COURSE)

InstructorId (Year=2005 AND Semester=‘Fall’ COURSE) ‘All faculty who taught courses both in ‘Fall’ and ‘Spring’ 2005.’

Id (FACULTY) - Temp[Id]All faculty who did not teach courses both in ‘Fall’ and

‘Spring’ 2005.’’

Set Difference

Find all students with 4.0 GPA

or find students with all As (and at least one A grade)

or find students who never got anything other than an A

Temp1= StudId (Grade=’A’ (TRANSCRIPT))

Students who got at least one non-A grade

Temp2 = StudId (TRANSCRIPT)

Students who got at least one grade

Result = Temp2 - Temp1

Cartesian Product

Given two sets, A={a1,a2,a3}, B={b1,b2}, the Cartesian

Product (A B) returns the set of tuples

A B={(a1,b1), (a1,b2), (a2,b1), (a2,b2), (a3,b1), (a3,b2)}

The Cartesian product for relations is a generalization of this concept.

Cartesian Product

Given two relations with schema R1(A1,…,An) and R2(B1,….,Bm), and let Temp = R1 R2

R1 R2 returns a relation Temp with schema Temp(A1,…,An,B1,…,Bm) such that For all possible tuples t1 in R1, and t2 in R2, there

exists a tuple t in Temp, such that t is identical to t1 with respect to A1,…,An and t is identical to t2 with respect to B1,…,Bm.

Cartesian Product

Find all the courses student named ‘John Doe’ has completed.

CrsCode (Id=StudId (TRANSCRIPT

(Name=‘John Doe’ STUDENT) ) )

Join

Join is a derived operator, obtained from a Cartesian product followed by a selection condition.

Given R1 join-condition R2 Where join-condition is a boolean combination of

combinations of the form relation-attribute operation relation-attribute where the attributes on the left (and right) hand side come from R1 (and R2).

R1 join-condition R2 = join-condition (R1 R2)

Join

Output the names of all employees that earn more than their managers.

Employee.Name (Employee >< MngrId=Manager.Id AND Employee.Salary >

Manager.Salary Manager)

RESULT (a table with the following attributes):

Employee.Name, Employee.Id, Employee.Salary, Manager.Name, Manager.Id, Manager.Salary

Division

o Given R1(A1,…,An,B1,…,Bm) and R2(B1,…,Bm)

R1/R2 is the “maximal” set of tuples from A1,…,An(R1) such that R1 contains all tuples in (R / S) x S.

o Division is often used for “for-all”queries, e.g. “Find tuples in R that are in a relation with ‘all’ tuples in S.”

o Important to note that all the attributes in the dividing relation R2 must exist in the divided relation R1!

Relational Algebra - Division

R A B C a1 b1 c1 a2 b1 c1 a1 b2 c1 a1 b2 c2 a2 b1 c2 a1 b2 c3 a1 b2 c4 a1 b1 c5

S C c1 c2

V B C

b1 c1

Y B C

b1 c1 b2 c1

Find:

• R / S• R / V• R / Y

Conclusion

Based on the concept of mathematical relation

Building block: a relation comprising of attributes within domains

Tuples + Schema = Relation