RELATIONAL ALGEBRA SECTION 5 An Introduction. Introduction E.F. Codd in 1970 A revolutionary advance...
-
Upload
austen-gordon -
Category
Documents
-
view
226 -
download
8
Transcript of RELATIONAL ALGEBRA SECTION 5 An Introduction. Introduction E.F. Codd in 1970 A revolutionary advance...
RELATIONAL ALGEBRA
SECTION 5
An Introduction
Introduction
• E.F. Codd in 1970
• A revolutionary advance in data manipulation
• Very significant. Why?
• Manipulation of data solely on their logical characteristics
• Codd proposed two data manipulation languages
• Change in terminology emphasized logical meaning
– procedural
• Relational calculus– nonprocedural
• Relational algebra
Relational Algebra
• Manipulation of relations
• Creation of new relations
• Consists of nine operations:– union, intersection, difference, product, select, project, join, divide, and assignment
Union ()
• Consider the two following relations:
– Representations of sales person
– SALES_SUBORD subordinate salespeople– SALES_MGR sales managers
• Combine data from two relations
SALES_SUBORD
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
37 Elena Hermana 12 B. A. 13
39 Goro Azuma 44 Tokyo 10
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
SALES_MGR
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
If A = SALES_SUBORD and B = SALES_MGR
What is A B?
A B
SALESPERSON
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
37 Elena Hermana 12 B. A. 13
39 Goro Azuma 44 Tokyo 10
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
• Union compatible
• SALESPERSON:= SALES_SUBORD SALES_MGR
Intersection ()
• Using the original two union-compatible tables
• Allows the identification of rows that are common to two relations
If A = SALES_SUBORD and B = SALES_MGR
What is A B?
A B
SALES_SUBORD_MGR
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
• If C is the intersection of A and B, C := A B
• What does C consist of?
Difference (–)
• Again using the original two-union compatible tables
• SALES_MGR_MGR := SALES_ MGR – SALES_ SUBORD
• Creates the set difference of two union-compatible relations
If A = SALES_SUBORD and B = SALES_MGR
What is A - B?
A B
SALES_MGR_MGR
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
27 Terry Cardon Chicago 15
• If C is the difference between A and B, C := A – B
• Then?
• Consider SALES_SUBORD – SALES_ MGR
• What does this give?
• Note:
A – B B – A
A B
Product ()
• Creates the Cartesian product of two relations
C := A B
X Y W Z
10 22 33 54
10 22 37 98
10 22 42 100
11 25 33 54
11 25 37 98
11 25 42 100
A
X Y
10 22
11 25
B
W Z
33 54
37 98
42 100
• Consider the following example of a PRODUCT table and a SALE table
• Want PS := PRODUCT SALE
• Note:– Adjoins the attributes of the two relations
– Attaches to each row in A, each of the rows in B
PRODUCT
PROD ID PROD DESC MANUFACTR ID
COST PRICE
1035 Sweater 210 12.50 20.00
2241 Table Lamp 317 22.50 32.50
2249 Table Lamp 317 35.50 48.00
2518 Brass Sculpture 253 11.00. 22.00
SALE
DATE CUST ID SALPERS ID PROD ID QTY
02/28 100 10 2241 200
02/12 101 23 2518 300
02/15 101 23 1035 150
02/19 100 39 2518 200
02/02 101 23 1035 200
02/05 105 10 2241 100
02/22 110 37 2518 150
02/14 105 10 2249 50
02/01 101 23 2249 75
02/04 101 23 2241 250
• Any problems?
• Any obvious application for a product operation?
• How many columns and rows would PS have?
PROD ID
PROD DESC MANUFACTR ID
COST PRICE
1035 Sweater 210 12.50 20.00
1035 Sweater 210 12.50 20.00
1035 Sweater 210 12.50 20.00
2241 Table Lamp 317 22.50 32.50
2241 Table Lamp 317 22.50 32.50
2241 Table Lamp 317 22.50 32.50
DATE CUST ID SALPERS ID
PROD ID
QTY
02/28 100 10 2241 200
02/12 101 23 2518 300
02/15 101 23 1035 150
02/28 100 10 2241 200
02/12 101 23 2518 300
02/15 101 23 1035 150
Select
• Query:– Give all information for salespeople in the Tokyo office.
• SALES_TOKYO := SELECT (SALESPERSON: OFFICE = ‘TOKYO’)
• Creates a relation from another relation by selecting only those rows that satisfy a given condition
SALES_TOKYO
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
14 Masaji Matsu 44 Tokyo 11
39 Goro Azuma 44 Tokyo 10
44 Albert Ige 27 Tokyo 12
• Essentially an ‘IF’ statement
• Other examples of a SELECT– SALPERS_ID = 23– SALPERS_NAME = ‘Brigit Bovary’– MANAGER_ID >= 20– OFFICE not = ‘B.A.’– COMM_% < 11
• Query:– Which salesperson has ID 23?
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
23 Francois Moire 35 Brussels 9
Solution: SELECT (SALESPERSON: SALPERS_ID = 23)
• Query:– Give all information about salesperson Brigit Bovary
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
35 Brigit Bovary 27 Brussels 11
Solution: SELECT (SALESPERSON: SALPERS_NAME = ‘Brigit Bovary’)
• Query:– Who are the sales people working for managers having an ID greater
than 20?
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
39 Goro Azuma 44 Tokyo 10
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
Solution: SELECT (SALESPERSON:MANAGER_ID > 20)
• Query:– Give information on all salespeople except those in the Buenos Aries
office.
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
39 Goro Azuma 44 Tokyo 10
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
Solution: SELECT (SALESPERSON:OFFICE not = ‘B.A.’)
• Query:– Who are the salespeople in Tokyo getting more thn 10% commission?
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
14 Masaji Matsu 44 Tokyo 11
44 Albert Ige 27 Tokyo 12
Solution: SELECT (SALESPERSON:OFFICE = ‘Tokyo’ and COMM_% > 10)
• Query:– Who is reporting to manager 27 or getting over 10% commission?
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
37 Elena Hermana 12 B. A. 13
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
Solution: SELECT (SALESPERSON:MANAGER_ID = 27 or COMM_% > 10)
Project
• A projection
• Used to eliminate unwanted columns
SALES_TOKYO
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
14 Masaji Matsu 44 Tokyo 11
39 Goro Azuma 44 Tokyo 10
44 Albert Ige 27 Tokyo 12
• Who are the salespeople in the Tokyo office?
Solution: SALES_TOKYO [SALPERS_NAME]
SALPERS NAME
Masaji Matsu
Goro Azuma
Albert Ige
• You can project any number of columns
E.g. SALES_TOKYO [SALESPERS_ID, SALPERS_NAME, MANAGER_ID]
SALPERS ID
SALPERS NAME MANAGER ID
14 Masaji Matsu 44
39 Goro Azuma 44
44 Albert Ige 27
Another Example: SALESPERSON [COMM_%]
COMM %
10
11
9
13
15
12
• What is noticeable in this new relation?
• Query:– Which salespeople are getting less than 11% commission?
SALPERS NAME
Rodney Jones
Francois Moire
Goro Azuma
Buster Sanchez
Solution: SELECT (SALESPERSON:COMM% < 11) [SALPERS_NAME]
Join
• The most important function in any database language
• Natural join– Connects relations when common columns have equal values
• Used to connect data across a number of relations
CUSTOMER
CUST ID
CUST NAME ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
An example:
Logical connections between SALESPERSON, PRODUCT, and CUSTOMER
•Query:–What are the names of the customers who have made purchases from salesperson 10?
SALE
DATE CUST ID SALPERS ID PROD ID QTY
02/28 100 10 2241 200
02/12 101 23 2518 300
02/15 101 23 1035 150
02/19 100 39 2518 200
02/02 101 23 1035 200
02/05 105 10 2241 100
02/22 110 37 2518 150
02/14 105 10 2249 50
02/01 101 23 2249 75
02/04 101 23 2241 250
•The SALE relation
Solution:1. Select from SALES those sales belonging to salesperson 10
SALE _SP10
DATE CUST ID SALPERS ID PROD ID QTY
02/28 100 10 2241 200
02/05 105 10 2241 100
02/14 105 10 2249 50
2. Join SALE_SP10 and CUSTOMER
Syntax JOIN (SALE_SP10, CUSTOMER)
• The operation proceeds as follows
1. The product of SALE_SP10 and CUSTOMER is created
SALE _SP10
DATE CUST ID SALPERS ID PROD ID QTY
02/28 100 10 2241 200
02/05 105 10 2241 100
02/14 105 10 2249 50
CUSTOMER
CUST ID
CUST NAME ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
• Want SALE_SP10 CUSTOMER
DATE CUST ID
SALPERS ID
PROD ID
QTY
02/28 100 10 2241 200
02/28 100 10 2241 200
02/28 100 10 2241 200
02/28 100 10 2241 200
02/05 105 10 2241 100
02/05 105 10 2241 100
02/05 105 10 2241 100
02/05 105 10 2241 100
02/14 105 10 2249 50
02/14 105 10 2249 50
02/14 105 10 2249 50
02/14 105 10 2249 50
CUST ID
CUST NAME
ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
The product of SALE_SP10 and CUSTOMER
DATE CUST ID
SALPERS ID
PROD ID
QTY
02/28 100 10 2241 200
02/28 100 10 2241 200
02/28 100 10 2241 200
02/28 100 10 2241 200
02/05 105 10 2241 100
02/05 105 10 2241 100
02/05 105 10 2241 100
02/05 105 10 2241 100
02/14 105 10 2249 50
02/14 105 10 2249 50
02/14 105 10 2249 50
02/14 105 10 2249 50
CUST ID
CUST NAME
ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
101 Maltzl Salzburg Austria 75,314 77,200
105 Jefferson Box 918, Chicago USA 49,333 57,811
110 Gomez Santiago Chile 27,400 35,414
Notice that the customer id’s do not match. Thus these are invalid rows and must be deleted. This leads to step 2.
2. Eliminate all rows except those in which CUST_ID from SALE_SP10 is equal to CUST_ID from CUSTOMER
DATE CUST ID
SALPERS ID
PROD ID
QTY
02/28 100 10 2241 200
02/05 105 10 2241 100
02/14 105 10 2249 50
CUST ID
CUST NAME
ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
100 Watabe Bros Box 241, Tokyo Japan 45,551 52,113
105 Jefferson Box 918, Chicago USA 49,333 57,811
105 Jefferson Box 918, Chicago USA 49,333 57,811
3. Eliminate one of the CUST_ID columns from this new relation
DATE CUST ID
SALPERS ID
PROD ID
QTY
02/28 100 10 2241 200
02/05 105 10 2241 100
02/14 105 10 2249 50
CUST NAME
ADDRESS COUNTRY BEGINNING
BALANCE
CURRENT
BALANCE
Watabe Bros Box 241, Tokyo Japan 45,551 52,113
Jefferson Box 918, Chicago USA 49,333 57,811
Jefferson Box 918, Chicago USA 49,333 57,811
1. Take the product of A and B. The resulting relation will have two columns for each of C1 ….. Cn
2. Eliminate all rows from the product except those on which the values of the columns C1 ….. Cn in A are equal, respectively, to the values of those in columns in B.
3. Eliminate duplicate columns.
4. Project out one copy of the columns C1 ….. Cn
• General definition of a natural join of two relations A and B which have columns C1 ….. Cn in common.
•Query:–What is the name of the customer involved in each sale?
Solution: A := CUSTOMER [CUST_ID, CUST_NAME]
B := JOIN (SALE, A)
B
DATE SALPERS ID PROD ID QTY CUST ID CUST NAME
02/28 10 2241 200 100 Watabe Bros
02/12 23 2518 300 101 Maltzl
02/15 23 1035 150 101 Maltzl
02/19 39 2518 200 100 Watabe Bros
02/02 23 1035 200 101 Maltzl
02/05 10 2241 100 105 Jefferson
02/22 37 2518 150 110 Gomez
02/14 10 2249 50 105 Jefferson
02/01 23 2249 75 101 Maltzl
02/04 23 2241 250 101 Maltzl
•Query:
Gives the names of customers who have purchased product 2518.
Solution: A := SELECT (SALE: PROD_ID = 2518)
B := JOIN (A, CUSTOMER) [CUST_NAME]
B
CUST NAME
Maltzl
Watabe Bros
Gomez
•Query:
Who has bought table lamps?
Solution:
A := SELECT (PRODUCT: PROD_DESC = ‘Table Lamp’)
B := JOIN (A, SALE)
C := JOIN (B, CUSTOMER) [CUST_NAME]
C
CUST NAME
Watabe Bros
Jefferson
Maltzl
MANUFACTURER
MANUFACTR ID MANUFACTR NAME
ADDRESS COUNTRY
210 Kiwi Klothes Auckland New Zealand
253 Brass Works Lagos Nigeria
317 Llama Llamps Lima Peru
• Add another table: The manufacturer of the products
•Query:
Which salespeople have sold products manufactured in Peru?
Solution:
A := SELECT (MANUFACTURER: COUNTRY = ‘Peru’)
B := JOIN (A, PRODUCT)
C := JOIN (B, SALE)
D := JOIN (C, SALESPERSON) [SALPERS_NAME]
D
SALPERS NAME
Rodney Jones
Francois Moire
Theta Join
– Query: Identify salespeople whose manager gets a commission rate exceeding 11%.
• Consider the following:
SALESPERSON
SALPERS ID
SALPERS NAME MANAGER ID OFFICE COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
37 Elena Hermana 12 B. A. 13
39 Goro Azuma 44 Tokyo 10
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
• Cannot be solved by a simple select operation
• Have to join a relation to itself. How?
• Have to match SALPERS_ID to MANAGER_ID
• Create two copies of SALESPERSON– SP1 := SALESPERSON– SP2 := SALESPERSON
• Then:A := JOIN (SP1, SP2: SP1.MANAGER_ID = SP2.SALESPERS_ID)
• Have a new version of a JOIN– The Theta Join
• The result
SP1.
SALPERS ID
SP1.SALPERS NAME
SP1.
MANAGER ID
SP1.
OFFICE
SP1.
COMM %
10 Rodney Jones 27 Chicago 10
14 Masaji Matsu 44 Tokyo 11
23 Francois Moire 35 Brussels 9
37 Elena Hermana 12 B. A. 13
39 Goro Azuma 44 Tokyo 10
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
SP2.
SALPERS ID
SP2.SALPERS NAME
SP2.
MANAGER ID
SP2.
OFFICE
SP2.
COMM %
27 Terry Cardon Chicago 15
44 Albert Ige 27 Tokyo 12
35 Brigit Bovary 27 Brussels 11
12 Buster Sanchez 27 B. A. 10
44 Albert Ige 27 Tokyo 12
27 Terry Cardon Chicago 15
27 Terry Cardon Chicago 15
27 Terry Cardon Chicago 15
A
• Whose manager gets a commission rate exceeding 11%?B := SELECT (A: SP2.COMM_% > 11) [SP1.SALPERS_NAME]
• Not yet finished
B
SP1.SALPERS NAME
Rodney Jones
Masaji Matsu
Goro Azuma
Albert Ige
Brigit Bovary
Buster Sanchez
• The comparison operators:=, not =, <, >, <=, >=
• General form:– JOIN (A, B: X δ Y)
• The Theta Join is a join with a specified condition involving a column from each relation
Divide
• Query:– List salespeople who have sold every product
• Thus for each product there must be at least one row in SALE containing the SALE_ID of that salesperson
• Selects rows in one relation that match every row in another relation
• Different requirement here
• Have to look at the whole relation
• Key word is every
• Using the example relations
• Obtain a relation consisting of all product ids– PI := PRODUCT [PROD_ID]
• Need instances of salesperson and product– PISI := SALE [PROD_ID, SALPERS_ID]
• Result: Two projections
PI
PROD ID
1035
2241
2249
2518
PISI
SALPERS
ID
PROD ID
10 2241
23 2518
23 1035
39 2518
37 2518
10 2249
23 2249
23 2241
• A := PISI / PI
• What do we want from these relations now?
A
SALPERS
ID
23
• General Description of a Divide
• Assume A, B, and C are relations where B/C = A
1. Columns of C must be a subset of B
2. The columns of A are all and only those columns of B that are not columns of C
Assignment
• Have been using it all the time
A := SELECT (SALESPERSON: COMM_% > 11)
• An operation that gives a name to a relation