Copyright © 2003-2012 Curt Hill

The Relational Algebra

What operations can be done?

Algebra
• There exists a relational algebra
• What is an algebra?
• What most of us know as Algebra is actually the Algebra of Real Numbers
• There is also Boolean Algebra among others

Algebra of Real Numbers
• What does this Algebra consist of?
• A set of objects to work on
  – This is the infinite set of real numbers
• A set of operators
  – These include +, −, ×, ÷
  – The − may be subtraction or negation
  – Each operator takes one or two real numbers and computes another real
  – There are rules on how they must be applied

Relational Algebra
• Mostly the same
• The objects that it operates on are not real numbers but tables
  – Sets of relations
• It also possesses operators
  – Each takes one or two tables and produces another table
• These are considered next

Relational Operations
• Selection
• Projection
• Cartesian product
• Set union
• Set intersection
• Set difference
• The first two take one table, the rest take two
• The above are primitive operations
  – There are also composite operations

Operations
• Relational operations are like arithmetic operations
• They are unary or binary
• They take operands and produce a result
• The operands and results are tables

Selection
• Choose or eliminate tuples based on a comparison
• Unary operation
• There is a boolean test which determines if a row is eliminated or not
• The symbol is sigma (σ)

Boolean test of selection
• The test may include
  – Fields compared with constants
  – Fields compared with other fields in the same record
• Tests may be combined with AND, OR and NOT

Selection Example

σ Cnt > 2 (T1)

T1:
ID  Cnt  Src  Use  Frq
A   1    X    18   5
B   5    X    4    5
C   2    W    16   4
E   3    Z    12   9

Result:
ID  Cnt  Src  Use  Frq
B   5    X    4    5
E   3    Z    12   9
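The selection example can be sketched in Python. This is a minimal illustration, assuming a relation is modelled as a list of dictionaries with one dictionary per tuple; the helper name `select` is made up for this sketch and does not come from the slides.

```python
# Sketch only: a relation is modelled as a list of dicts (an assumption).
T1 = [
    {"ID": "A", "Cnt": 1, "Src": "X", "Use": 18, "Frq": 5},
    {"ID": "B", "Cnt": 5, "Src": "X", "Use": 4, "Frq": 5},
    {"ID": "C", "Cnt": 2, "Src": "W", "Use": 16, "Frq": 4},
    {"ID": "E", "Cnt": 3, "Src": "Z", "Use": 12, "Frq": 9},
]


def select(relation, test):
    """Keep only the tuples for which the boolean test is true."""
    return [row for row in relation if test(row)]


# The test compares a field with a constant: Cnt > 2
result = select(T1, lambda row: row["Cnt"] > 2)
for row in result:
    print(row["ID"], row["Cnt"], row["Src"], row["Use"], row["Frq"])
```

A compound test works the same way, for example `select(T1, lambda row: row["Cnt"] > 2 and row["Src"] == "X")`.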

Projection
• Choose or eliminate columns
• Unary operation
• We may also rearrange the columns that are left
• The symbol is pi (π)

Projection Example

π Src, Cnt, ID (T1)

T1:
ID  Cnt  Src  Use  Frq
A   1    X    18   5
B   5    X    4    5
C   2    W    16   4
E   3    Z    12   9

Result:
Src  Cnt  ID
X    1    A
X    5    B
W    2    C
Z    3    E

More Projection Notes
• Usually projection is used to change degree of relation
• A relation is a set
  – It must have unique entries before a projection
  – This may not be so when an attribute is removed
• Projection may eliminate duplicates
  – Not all systems actually do this
  – Why not?

Second Projection Example

π Src, Cnt, ID (T1)

T1:
ID  Cnt  Src  Use  Frq
A   1    X    18   5
A   1    X    4    5
C   2    W    16   4
C   2    W    12   9

Result:
Src  Cnt  ID
X    1    A
W    2    C
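A projection that also removes duplicates, as in the second example, might look like the sketch below. It assumes the same list-of-dictionaries model of a relation; `project` is an illustrative name, not something defined in the slides.

```python
def project(relation, attributes):
    """Keep the named attributes, in the given order, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # a relation is a set, so no duplicates
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


T1 = [
    {"ID": "A", "Cnt": 1, "Src": "X", "Use": 18, "Frq": 5},
    {"ID": "A", "Cnt": 1, "Src": "X", "Use": 4, "Frq": 5},
    {"ID": "C", "Cnt": 2, "Src": "W", "Use": 16, "Frq": 4},
    {"ID": "C", "Cnt": 2, "Src": "W", "Use": 12, "Frq": 9},
]

result = project(T1, ["Src", "Cnt", "ID"])
for row in result:
    print(row["Src"], row["Cnt"], row["ID"])
```

The duplicate check costs extra time and memory on large tables, which is one plausible reason a system might skip it unless asked.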

Cartesian Product
• AKA Cross product
• Binary operation
• Append each row in the second table to each row in the first
• The number of tuples in the result is the product of the number of rows in the operands
  – This could be large and expensive
  – Seldom done without optimization

Cartesian Product Example

T1 × T2

T1:
ID  Cnt
A   1
B   5

T2:
F1  F2  F3
A   6   1
X   4   2
B   3   9

Result:
ID  Cnt  F1  F2  F3
A   1    A   6   1
B   5    A   6   1
A   1    X   4   2
B   5    X   4   2
A   1    B   3   9
B   5    B   3   9

Cartesian Addendum
• Only primitive operator that deals with two different schemas
• If the two tables have a common field, one of the fields must be renamed
  – A tuple must be a set with unique field names

Cartesian Product Example

T1 × T2

T1:
ID  Cnt
A   1
B   5

T2:
ID  Cnt  Src
A   6    1
X   4    2
B   3    9

Result:
ID  Cnt  ID2  Cnt2  Src
A   1    A    6     1
B   5    A    6     1
A   1    X    4     2
B   5    X    4     2
A   1    B    3     9
B   5    B    3     9
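The product with renaming of clashing fields can be sketched as follows. The list-of-dictionaries model, the `product` helper and the choice of "2" as the rename suffix are assumptions of this sketch; the suffix mirrors the ID2 and Cnt2 names in the example.

```python
def product(left, right, suffix="2"):
    """Pair every row of left with every row of right.

    A field of right whose name already exists in left is renamed
    by adding the suffix (ID becomes ID2), so names stay unique.
    The sketch does not guard against the renamed field clashing again.
    """
    result = []
    for l_row in left:
        for r_row in right:
            combined = dict(l_row)
            for name, value in r_row.items():
                new_name = name + suffix if name in l_row else name
                combined[new_name] = value
            result.append(combined)
    return result


T1 = [{"ID": "A", "Cnt": 1}, {"ID": "B", "Cnt": 5}]
T2 = [
    {"ID": "A", "Cnt": 6, "Src": 1},
    {"ID": "X", "Cnt": 4, "Src": 2},
    {"ID": "B", "Cnt": 3, "Src": 9},
]

result = product(T1, T2)
print(len(result))  # number of rows in T1 times number of rows in T2
```

The rows come out in a different order than on the slide, which does not matter because a relation is a set.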

Set Union
• Binary operation
• Two relations must be union compatible
  – They must have same schema, that is the same attributes
• New relation has all the tuples of both tables with duplicates removed

Union Example

T1 ∪ T2

T1:
ID  Cnt
A   1
D   8
B   5

T2:
ID  Cnt
B   5
E   2
A   1

Result:
ID  Cnt
A   1
D   8
E   2
B   5

Set Intersection
• Binary operation
• Two relations must be union compatible
  – They must have same schema, that is the same attributes
• New relation has only the tuples in both tables

Intersection Example

T1 ∩ T2

T1:
ID  Cnt
A   1
D   8
B   5

T2:
ID  Cnt
B   5
E   2
A   1

Result:
ID  Cnt
A   1
B   5

Set Difference
• Binary operation
• Two relations must be union compatible
  – They must have same schema, that is the same attributes
• New relation has the tuples of the first table, with any that also appear in the second table removed
• Not symmetrical or commutative

Set Difference Example

T1:
ID  Cnt
A   1
D   8
B   5

T2:
ID  Cnt
B   5
E   2
A   1

T1 - T2:
ID  Cnt
D   8

T2 - T1:
ID  Cnt
E   2
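The three set operations map directly onto Python's built-in set type. The sketch below assumes each tuple is written as a Python tuple of (ID, Cnt), so both tables share one schema and are union compatible.

```python
# Each relation is a Python set of (ID, Cnt) tuples (an assumed encoding).
T1 = {("A", 1), ("D", 8), ("B", 5)}
T2 = {("B", 5), ("E", 2), ("A", 1)}

union = T1 | T2          # all tuples of both tables, duplicates removed
intersection = T1 & T2   # only the tuples found in both tables
t1_minus_t2 = T1 - T2    # tuples of T1 that are not in T2
t2_minus_t1 = T2 - T1    # difference is not commutative

print(sorted(union))
print(sorted(intersection))
print(sorted(t1_minus_t2), sorted(t2_minus_t1))
```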

Relational Algebra
• The algebra only uses these operations
  – All of our queries translate into these
• Each operation produces a relation
  – Starts with one or two relations
• The algebra is closed
  – Maps from the set of relations back to the set of relations
• There are also composite operations

Join
• Binary composite operation
• It is the composite of three operations
  – Cartesian product
  – Selection
  – Projection (optional)
• Often the only way cartesian products are done
  – Thus the DBMS may optimize it

Join
• A join always connects the two tables through a common field (or fields) in each
• Thus we join on one or more fields that are in common between the two tables
  – The fields must have the same format, and often have the same name

Join Process
• Take the product of the two tables
• Use select to eliminate all records where the two fields are not equal
• Eliminate one of the redundant fields
• The resulting table has as many fields as the two tables combined, minus one
• The number of rows is dependent on data

Natural Join Example

T1 ⋈ ID T2

T1:
ID  Cnt
A   1
B   5

T2:
ID  Src  Dst
A   6    1
X   4    2
A   3    2
B   3    9

Result:
ID  Cnt  Src  Dst
A   1    6    1
A   1    3    2
B   5    3    9
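The natural join example can be reproduced with a short Python sketch that follows the three steps of the join process. It assumes relations are lists of dictionaries and that the only shared field names are the join fields; `natural_join` is an illustrative helper name.

```python
def natural_join(left, right, common):
    """Product, then selection on equal common fields, then projection."""
    result = []
    for l_row in left:              # Cartesian product: every pairing
        for r_row in right:
            # Selection: keep pairings whose common fields are equal
            if all(l_row[c] == r_row[c] for c in common):
                combined = dict(l_row)
                combined.update(r_row)  # Projection: common field kept once
                result.append(combined)
    return result


T1 = [{"ID": "A", "Cnt": 1}, {"ID": "B", "Cnt": 5}]
T2 = [
    {"ID": "A", "Src": 6, "Dst": 1},
    {"ID": "X", "Src": 4, "Dst": 2},
    {"ID": "A", "Src": 3, "Dst": 2},
    {"ID": "B", "Src": 3, "Dst": 9},
]

result = natural_join(T1, T2, ["ID"])
for row in result:
    print(row["ID"], row["Cnt"], row["Src"], row["Dst"])
```

A real DBMS would normally avoid building every pairing, for example by hashing or sorting on the join field; the nested loop here only mirrors the definition.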

Types of Joins
• The relationship between the two joined fields may be anything
  – We specify the fields and comparison
  – Called a Condition Join
  – Same schema as product
• When comparison is equality the join is called an Equijoin
  – Project on equijoin to eliminate redundant column
• If the join is equijoin on all common fields then it is called a Natural Join

Condition Join Example

T1 ⋈ T1.Cnt<T2.Cnt T2

T1:
ID  Cnt
A   5
B   3

T2:
ID  Cnt  Dst
A   6    1
X   4    2
B   3    9

Result:
ID  Cnt  ID2  Cnt2  Dst
A   5    A    6     1
B   3    A    6     1
B   3    X    4     2
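A condition join keeps the full product schema and filters on an arbitrary comparison. The sketch below assumes the list-of-dictionaries model; the `condition_join` helper and the "2" rename suffix are illustrative choices that match the ID2 and Cnt2 columns in the example.

```python
def condition_join(left, right, test, suffix="2"):
    """Product of left and right, keeping only pairings that pass the test."""
    result = []
    for l_row in left:
        for r_row in right:
            if test(l_row, r_row):
                combined = dict(l_row)
                for name, value in r_row.items():
                    # Rename clashing fields so the tuple has unique names
                    new_name = name + suffix if name in l_row else name
                    combined[new_name] = value
                result.append(combined)
    return result


T1 = [{"ID": "A", "Cnt": 5}, {"ID": "B", "Cnt": 3}]
T2 = [
    {"ID": "A", "Cnt": 6, "Dst": 1},
    {"ID": "X", "Cnt": 4, "Dst": 2},
    {"ID": "B", "Cnt": 3, "Dst": 9},
]

# The condition is T1.Cnt < T2.Cnt
result = condition_join(T1, T2, lambda l, r: l["Cnt"] < r["Cnt"])
for row in result:
    print(row["ID"], row["Cnt"], row["ID2"], row["Cnt2"], row["Dst"])
```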

Join Importance
• The cartesian product is the only primitive that may take two different relation types
• Cartesian products are usually inside a join
• Usually only one table in a database has a particular schema
• Almost every multiple-table query will use a join

Division
• Division is a composite, not a primitive, operation
• Deals with three relations of different degree
  – First table of degree m+n
  – Second of degree n
  – Result of degree m
• Columns in the second table are eliminated from the first

Division Process
• The columns in the second table correspond to those in the first
• The remaining columns of the first table supply the candidate rows of the result
• A candidate is copied to the result only if the first table pairs it with every row of the second table
• The common columns are eliminated in the result
• Duplicates are then eliminated

Division Example

T1 / T2

T1:
ID  Src  Cnt  Dst
A   2    4    X
B   2    4    X
A   3    4    Y
B   3    4    Y
B   3    5    Y
B   2    7    Y
B   3    9    Y
B   2    5    X
B   3    8    X
B   2    9    X
C   3    8    Z

T2:
Src  Dst
2    X
3    Y

Result:
ID  Cnt
A   4
B   4
B   5
B   9

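Division Implementation
• Project the first table onto its non-common columns to get the candidates
• Take the product of the candidates with the second table and subtract the first table
  – What is left are the pairings that are missing
• Project the missing pairings onto the non-common columns and subtract them from the candidates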
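The division example can be checked with a small Python sketch. It uses the standard definition of division, in which a combination of the remaining columns qualifies only if the first table pairs it with every row of the second table. The list-of-dictionaries model and the `divide` helper are assumptions of this sketch, and it groups rows directly rather than composing the primitive operators.

```python
def divide(dividend, divisor, divisor_attrs, result_attrs):
    """Return the result_attrs combinations paired with every divisor row."""
    needed = {tuple(row[a] for a in divisor_attrs) for row in divisor}
    found = {}
    for row in dividend:
        key = tuple(row[a] for a in result_attrs)
        pairing = tuple(row[a] for a in divisor_attrs)
        found.setdefault(key, set()).add(pairing)
    # Keep a candidate only if all needed pairings were seen for it
    return {key for key, seen in found.items() if needed <= seen}


rows = [
    ("A", 2, 4, "X"), ("B", 2, 4, "X"), ("A", 3, 4, "Y"), ("B", 3, 4, "Y"),
    ("B", 3, 5, "Y"), ("B", 2, 7, "Y"), ("B", 3, 9, "Y"), ("B", 2, 5, "X"),
    ("B", 3, 8, "X"), ("B", 2, 9, "X"), ("C", 3, 8, "Z"),
]
T1 = [dict(zip(("ID", "Src", "Cnt", "Dst"), r)) for r in rows]
T2 = [{"Src": 2, "Dst": "X"}, {"Src": 3, "Dst": "Y"}]

result = divide(T1, T2, ["Src", "Dst"], ["ID", "Cnt"])
print(sorted(result))
```

For instance, (B, 7) is dropped because its only row pairs it with (2, Y), which is not a row of T2.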

Algebra and Calculus
• The algebra is procedural
  – You specify how to do what needs to be done
  – Must use the operations
• The calculus is declarative
  – You say what you want without saying how to obtain this
• SQL has elements of both
• Calculus must be translated into the algebra before execution

Algebra Shortcomings
• Algebra is a theoretical support for database implementation
• It lacks most of the niceties needed for an actual implementation
  – Reports
  – Formatting
  – Counts
  – Averages
• All it does is deliver the data as a table

Queries using relational algebra
• Consider the college schema tables:
  – Course
  – Students
  – Grade
  – Faculty
  – Faculty_teach
  – Department
  – Division

Example
• Suppose we want to produce a grade report for students
• This should include information from two or three relations:
  – Students
  – Grade
  – Courses (depending how much information is needed)

Find student grades: Tables

Students:
naid  name              address
2156  Betty Reynoldson  315 4th Ave

Grades:
dept  number  naid  score
CS    160     2067  86
CIS   385     2156  94

Find student grades: Operations
• Equijoin on ID
  – Students ⋈ NAID Grades
• Project away unwanted fields
  – π name, dept, course, score
  – Include address if it is to be mailed
• Outside of the algebra
  – Format score as grade
  – Sort by zip code
  – Format into pages
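The grade report query can be sketched in Python using the two sample rows from the tables slide. The list-of-dictionaries model is an assumption, the course field is called `number` as in the grades table, and the letter-grade cutoffs are a guessed 10-point scale (the slides only imply that 80 or above is a B or better).

```python
students = [
    {"naid": 2156, "name": "Betty Reynoldson", "address": "315 4th Ave"},
]
grades = [
    {"dept": "CS", "number": 160, "naid": 2067, "score": 86},
    {"dept": "CIS", "number": 385, "naid": 2156, "score": 94},
]

# Equijoin on naid: product of the two tables filtered on equal naid
joined = [{**s, **g} for s in students for g in grades
          if s["naid"] == g["naid"]]

# Project away unwanted fields
wanted = ("name", "dept", "number", "score")
report = [{field: row[field] for field in wanted} for row in joined]


def letter_grade(score):
    """Outside the algebra: format a score using an assumed 10-point scale."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


lines = [f"{r['name']}: {r['dept']} {r['number']} {letter_grade(r['score'])}"
         for r in report]
print("\n".join(lines))
```

The CS 160 grade drops out because no student row with naid 2067 appears in the sample Students table.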

Find all the courses taught by faculty members
• Equijoin on ID
  – Faculty ⋈ NAID Faculty_teach
• Project away unwanted fields
  – π name, dept, course

Find the departmental chairs for each faculty member
• Equijoin on ID
  – Faculty ⋈ NAID Departments
• This connects departments and chairs
  – π name, dept
• Equijoin on department (Temp is the result above)
  – Faculty ⋈ Dept Temp
• Project out what is not needed
• Two more joins needed to include divisional chairs

Find all the students who got a B or better in any CS class
• Use selection to trim the grades relation to just the rows with the CS acronym and a score greater than or equal to 80 (keep the student ID for the join)
• Join this with students based on student ID equality
• Eliminate those columns that you do not want
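The three steps above can be sketched in Python. The data is partly hypothetical: the name for student 2067 and the 75 score are invented for illustration, and the list-of-dictionaries model is an assumption of the sketch.

```python
students = [
    {"naid": 2156, "name": "Betty Reynoldson"},
    {"naid": 2067, "name": "Student 2067"},  # hypothetical name
]
grades = [
    {"dept": "CS", "number": 160, "naid": 2067, "score": 86},
    {"dept": "CIS", "number": 385, "naid": 2156, "score": 94},
    {"dept": "CS", "number": 160, "naid": 2156, "score": 75},  # hypothetical
]

# Selection: CS rows with a score of 80 or more
cs_b_or_better = [g for g in grades
                  if g["dept"] == "CS" and g["score"] >= 80]

# Join with students on equal naid
joined = [{**s, **g} for s in students for g in cs_b_or_better
          if s["naid"] == g["naid"]]

# Projection onto the student fields, with duplicates removed
result = sorted({(row["naid"], row["name"]) for row in joined})
print(result)
```

Doing the selection before the join keeps the intermediate table small, which is the kind of ordering a DBMS optimizer would also look for.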

Find all the students that each faculty member has
• Join faculty with faculty_teach on NAID
• Join this with the grades table based on equality of both dept acronym and course number
• Project out what you don’t want

Find all the students who got an A in Calculus and an A in CIS 385
• Select from the grades table, leaving only Calculus As
• Join this with students on NAID and call it T1
• Select from the grades table, leaving only CIS 385 As
• Join this with students on NAID and call it T2
• Project T1 and T2 onto the student fields and intersect them
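This query can be sketched in Python with invented data. Several things here are assumptions rather than facts from the slides: Calculus is coded as MATH 165, an A is a score of 90 or more, and all rows except Betty Reynoldson's CIS 385 grade are hypothetical. The sketch projects each intermediate result onto (naid, name) before intersecting, since rows that still carry different course columns could never be equal.

```python
A_CUTOFF = 90  # assumed threshold for an A

students = [
    {"naid": 2156, "name": "Betty Reynoldson"},
    {"naid": 2067, "name": "Student 2067"},  # hypothetical name
]
grades = [
    {"dept": "MATH", "number": 165, "naid": 2156, "score": 93},  # hypothetical
    {"dept": "CIS", "number": 385, "naid": 2156, "score": 94},
    {"dept": "MATH", "number": 165, "naid": 2067, "score": 95},  # hypothetical
    {"dept": "CIS", "number": 385, "naid": 2067, "score": 82},   # hypothetical
]


def students_with_a(dept, number):
    """Select the As for one course, join with students, project to student."""
    picked = [g for g in grades
              if g["dept"] == dept and g["number"] == number
              and g["score"] >= A_CUTOFF]
    joined = [{**s, **g} for s in students for g in picked
              if s["naid"] == g["naid"]]
    return {(row["naid"], row["name"]) for row in joined}


t1 = students_with_a("MATH", 165)  # Calculus As
t2 = students_with_a("CIS", 385)   # CIS 385 As
both = t1 & t2                     # set intersection
print(sorted(both))
```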

Find this semester's GPA of all Math students
• What is a Math student?
  – A Math major or
  – Any student taking any math course
• Is this the average score of math courses?
  – If so do a selection on grades table
• Is this the average score of all students taking any math course?
  – Select grades to just find Math
  – Join this with student on NAID
  – Join this with original grade file
  – Use report program to sort by NAID and then compute and summarize the GPA
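The second interpretation, every grade of any student taking a math course, can be sketched in Python. The data and the MATH department code are hypothetical, and the sketch averages raw scores as a stand-in for GPA because the slides do not give a grade-point scale. The averaging step is the part that lies outside the algebra.

```python
# Hypothetical grades table; MATH as the department acronym is an assumption.
grades = [
    {"dept": "MATH", "number": 165, "naid": 2156, "score": 90},
    {"dept": "CS", "number": 160, "naid": 2156, "score": 80},
    {"dept": "CS", "number": 160, "naid": 2067, "score": 70},
    {"dept": "MATH", "number": 165, "naid": 2099, "score": 60},
]

# Select grades to just Math, then project onto naid
math_takers = {g["naid"] for g in grades if g["dept"] == "MATH"}

# Join back to the original grades table on naid
all_grades_of_takers = [g for g in grades if g["naid"] in math_takers]

# Outside the algebra: sort by naid, then compute the average per student
scores_by_student = {}
for g in sorted(all_grades_of_takers, key=lambda g: g["naid"]):
    scores_by_student.setdefault(g["naid"], []).append(g["score"])

averages = {naid: sum(scores) / len(scores)
            for naid, scores in scores_by_student.items()}
print(averages)
```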