Relational Model and Algebra - Duke University
Transcript of Relational Model and Algebra - Duke University
Relational Model and Algebra
Introduction to DatabasesCompSci 316 Spring 2020
Announcements (Tue. Jan. 14)โข You should be on Piazza and Gradescope
โข Otherwise, let the instructor know after class
โข HW1 has been posted, due next Tuesday 11:59 pmโข Instant feedback, multiple submissions allowed until correct!โข 5% / hour late submission penalty โข Use pgweb from course website to try your queries on small Beers
datasetโข If you join the class after Tuesday 01/14, let the instructor know
โข Office hours posted on course websiteโข There is at least one everyday except Saturday
2
Todayโs plan
โข Revisit relational modelโข Revisit simple SQL queries and its semanticโข Start relational algebra
3
The famous โBeersโ database4
BarEach has an address
DrinkerEach has an address
BeerEach has a brewer
Drinker Frequents BarsโXโ times a week
Bar Serves BeerAt price โYโ
Drinker Likes Beer
Your database for HW1
โBeersโ as a Relational Database5See online database for more tuples
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Name brewer
Budweiser Anheuser-Busch Inc.
Corona Grupo Modelo
Dixie Dixie Brewing
name address
Amy 100 W. Main Street
Ben 101 W. Main Street
Dan 300 N. Duke Street
name address
The Edge108 Morris Street
Satisfaction905 W. Main Street
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
drinker beer
Amy Corona
Dan Budweiser
Dan Corona
Ben Budweiser
Bar
Beer
Drinker
Likes
Frequents
Serves
What is an example of a
โข Relationโข Attributeโข Tupleโข Schemaโข Instance
What is
โข Set semantic โข in relational model
โข Bag semanticโข In SQL (why)
โข Set semantic โข No duplicates, Order of tuples does not matter
โข Bag semanticโข Duplicates allowed, for efficiency and flexibilityโข Do not want duplicates? Use SELECT DISTINCT โฆ
Basic queries: SFW statement
โข SELECT ๐ด", ๐ด#, โฆ, ๐ด$FROM ๐ ", ๐ #, โฆ, ๐ &WHERE ๐๐๐๐๐๐ก๐๐๐
โข SELECT, FROM, WHERE are often referred to as SELECT, FROM, WHERE โclausesโ
โข Each query must have a SELECT and a FROM
โข WHERE is optional
6
In HW1, you can only use SFW
Example: reading a table
โข SELECT * FROM Serves
โข Single-table queryโข * is a shorthand for โall columnsโ
7
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves
Example: ORDER BY
โข SELECT * FROM ServesORDER BY beer
โข Equivalent to โORDER BY beer ascโ (asc is default option)โข For descending order, use โdescโโข Can combine multiple ordersโข What does this return?โข ORDER BY beer asc, price desc
8
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves
Example: some columns and DISTINCT
โข SELECT beer FROM Serves
โข Only want unique values? Use DISTINCT
โข SELECT DISTINCT beer FROM Serves
9
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves
Returns a set
Returns a bag
Example: selecting few rowsโข SELECT beer AS mybeer
FROM ServesWHERE price < 2.75
โข SELECT S.beerFROM Serves SWHERE bar = โThe Edgeโ
10
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves
โข SELECT list can contain expressionsCan also use built-in functions such as SUBSTR, ABS, etc.
โข NOT EQUAL TO: Use <>โข LIKE matches a string against a pattern
% matches any sequence of zero or more characters
What does these return?
Example: Joinโข Find addresses of all bars that โDanโ frequents
โข SELECT B.addressFROM Bar B, Frequents FWHERE B.name = F.bar
AND F.drinker = โDanโ
11
name address
The Edge 108 Morris Street
Satisfaction 905 W. Main Street
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Bar
Frequents
Semantics of SFWโข SELECT ๐ธ", ๐ธ#, โฆ, ๐ธ$
FROM ๐ ", ๐ #, โฆ, ๐ &WHERE ๐๐๐๐๐๐ก๐๐๐โข For each ๐ก" in ๐ ":
For each ๐ก# in ๐ #: โฆ โฆFor each ๐ก& in ๐ &:
If ๐๐๐๐๐๐ก๐๐๐ is true over ๐ก", ๐ก#, โฆ, ๐ก&:
Compute and output ๐ธ", ๐ธ#, โฆ, ๐ธ$ as a row
12
1. Apply โFROMโForm โcross-productโ of R1, .., Rm
2. Apply โWHEREโOnly consider satisfying rows
3. Apply โSELECTโOutput the desired columns
Step 1: Illustration of Semantics of SFW
โข NOTE: This is โNOT HOWโ the DBMS outputs the result, but โWHATโ it outputs!
13
โข SELECT B.addressFROM Bar B, Frequents F
WHERE B.name = F.barAND F.drinker = โDanโ
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
name address drinker bar times_a_week
The Edge 108 Morris Street
Ben Satisfaction 2
The Edge 108 Morris Street
Dan The Edge 1
The Edge 108 Morris Street
Dan Satisfaction 2
Satisfaction 905 W. Main Street
Ben Satisfaction 2
Satisfaction 905 W. Main Street
Dan The Edge 1
Satisfaction 905 W. Main Street
Dan Satisfaction 2
Form a โCross productโ of two relations
Step 2: Illustration of Semantics of SFW
โข NOTE: This is โNOT HOWโ the DBMS outputs the result, but โWHATโ it outputs!
14
โข SELECT B.addressFROM Bar B, Frequents F
WHERE B.name = F.barAND F.drinker = โDanโ
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
name address drinker bar times_a_week
The Edge 108 Morris Street
Ben Satisfaction 2
The Edge 108 Morris Street
Dan The Edge 1
The Edge 108 Morris Street
Dan Satisfaction 2
Satisfaction 905 W. Main Street
Ben Satisfaction 2
Satisfaction 905 W. Main Street
Dan The Edge 1
Satisfaction 905 W. Main Street
Dan Satisfaction 2
Discard rows that do not satisfy WHERE condition
Step 3: Illustration of Semantics of SFW
โข NOTE: This is โNOT HOWโ the DBMS outputs the result, but โWHATโ it outputs!
15
โข SELECT B.addressFROM Bar B, Frequents F
WHERE B.name = F.barAND F.drinker = โDanโ
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
name address drinker bar times_a_week
The Edge 108 Morris Street
Ben Satisfaction 2
The Edge 108 Morris Street
Dan The Edge 1
The Edge 108 Morris Street
Dan Satisfaction 2
Satisfaction 905 W. Main Street
Ben Satisfaction 2
Satisfaction 905 W. Main Street
Dan The Edge 1
Satisfaction 905 W. Main Street
Dan Satisfaction 2
Output the โaddressโ output of rows that survived
Final output: Illustration of Semantics of SFWโข NOTE: This is โNOT HOWโ the DBMS outputs the result, but โWHATโ it
outputs!
16
โข SELECT B.addressFROM Bar B, Frequents F
WHERE B.name = F.barAND F.drinker = โDanโ
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
address
108 Morris Street
905 W. Main Street
Output the โaddressโ output of rows that survived
SQL vs. C++, Java, Pythonโฆ17
SQL vs. C++, Java, Pythonโฆ
SQL is declarativeโข Programmer specifies what answers a query should return, โข but not how the query is executedโข DBMS picks the best execution strategy based on
availability of indexes, data/workload characteristics, etc.โข Not a โProceduralโ or โOperationalโ language like C++,
Java, Pythonโข There are several ways to write a query, but equivalent
queries always provide the same (equivalent) resultsโข SQL (+ its execution and optimizations) is based on a strong
foundation of โRelational Algebraโ
18
Relational algebra
A language for querying relational data based on โoperatorsโ
19
RelOp
RelOp
โข Core operators:โข Selection, projection, cross product, union, difference,
and renaming
โข Additional, derived operators:โข Join, natural join, intersection, etc.
โข Compose operators to make complex queries
Selectionโข Input: a table ๐ โข Notation: ๐/๐
โข ๐ is called a selection condition (or predicate)
โข Purpose: filter rows according to some criteriaโข Output: same columns as ๐ , but only rows of ๐ that satisfy ๐
(set!)
20
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Servesbar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
๐๐๐๐๐๐7๐.๐๐ Serves
Example: Find beers with price < 2.75
Equivalent SQL query?No actual deletion!
More on selectionโข Selection condition can include any column of ๐ , constants,
comparison (=, โค, etc.) and Boolean connectives (โง: and, โจ: or, ยฌ: not)โข Example: Serves tuples for โThe Edgeโ or price >= 2.75
๐ABCDEFGH IJKHEโจ/CLMH N #.OP ๐๐๐๐ฃ๐๐ โข You must be able to evaluate the condition over each single
row of the input table!โข Example: the most expensive beer at any bar
๐/CLMH N HVHCW /CLMH L$ XHCVHY ๐๐ ๐๐
21
WRONG!
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves
Projection
โข Input: a table ๐ โข Notation: ๐\๐ โข ๐ฟ is a list of columns in ๐
โข Purpose: output chosen columnsโข Output: same rows, but only the columns in ๐ฟ (๐ ๐๐ก!)
22
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
Serves ๐ ๐๐๐๐,๐๐๐๐๐ Serves
Example: Find all the prices for each beer
beer price
Budweiser 2.50
Corona 3.00
Budweiser 2.25
Output of ๐AHHCServes?
Cross productโข Input: two tables ๐ and ๐โข Natation: ๐ ร๐โข Purpose: pairs rows from two tablesโข Output: for each row ๐ in ๐ and each ๐ in ๐, output a row ๐๐
(concatenation of ๐ and ๐ )
23
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
name address drinker bar times_a_week
The Edge 108 Morris Street
Ben Satisfaction 2
The Edge 108 Morris Street
DanThe Edge
1
The Edge 108 Morris Street
DanSatisfaction
2
Satisfaction 905 W. Main Street
Ben Satisfaction 2
Satisfaction 905 W. Main Street
DanThe Edge
1
Satisfaction 905 W. Main Street
Dan Satisfaction 2
Bar x Frequents
Note: orderingof columns does not matter, so R X S = S X R(commutative)
Derived operator: join(A.k.a. โtheta-joinโ: most general joins)โข Input: two tables ๐ and ๐โข Notation: ๐ โ/ ๐
โข ๐ is called a join condition (or predicate)
โข Purpose: relate rows from two tables according to some criteria
โข Output: for each row ๐ in ๐ and each row ๐ in ๐, output a row ๐๐ if ๐ and ๐ satisfy ๐
โข Shorthand for ๐/ ๐ ร๐
Predicate p only has equality (A = 5 โง B = 7) : equijoin
24
One of the most importantoperations!
Join example
โข Extend Frequents relation with addresses of the bars๐น๐๐๐๐ข๐๐๐ก๐ โABCD$B&H ๐ต๐๐
25
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
name address drinker bar times_a_week
The Edge 108 Morris Street
Ben Satisfaction 2
The Edge 108 Morris Street
Dan The Edge 1
The Edge 108 Morris Street
Dan Satisfaction 2
Satisfaction 905 W. Main Street
Ben Satisfaction 2
Satisfaction 905 W. Main Street
Dan The Edge 1
Satisfaction 905 W. Main Street
Dan Satisfaction 2
Ambiguous attribute?Use Bar.name
(Lecture 3 -- contd)
26
Announcements (Thu. Jan. 16)โข Reminder: HW1 due next Tuesday 01/21โข HW2 on RA to be posted next Tuesday 01/21, due on 01/28
โข HW2-Q1 on gradiance already open if you want to start earlyโข Check gradiance code on Sakai announcements
โข In-class lab on RA next Tuesday 01/21โข Part of HW2 (~ 2 questions) in class to get the set up ready with TAs helpโข Last 30-40 mins of Tuesdayโs lectureโข You can work in groups of size 2 or 3, but would submit your own solutionโข You can submit by the next day -- 10% extra credit for finishing all questions
correctly in class (last timestamp <= 4:20 pm)!
โข In-class quiz next Thursday 01/23โข You can work in groups of size 2 or 3, but would submit your own solutionโข 50% for attempt, 50% for correct answerโข What if you miss a class? We would drop 25% (ceiling) of the lowest grades while
calculating your final score for quiz, i.e. if we have 4 quizzes 1 dropped, 5-8 quizzes 2 dropped, โฆ
โข Quiz or Lab -- you can submit while not being in the class too, but you would miss the fun of discussing with others (+ help from TAs for Labs)!
27
All on course website schedule
Join Types
โข Theta Join
โข Equi-Join
โข Natural Join
โข Later, (left/right) outer join, semi-join
28
Derived operator: natural join
โข Input: two tables ๐ and ๐โข Notation: ๐ โ ๐ (i.e. no subscript)โข Purpose: relate rows from two tables, andโข Enforce equality between identically named columnsโข Eliminate one copy of identically named columns
โข Shorthand for ๐\ ๐ โ/ ๐ , whereโข ๐ equates each pair of columns common to ๐ and ๐โข ๐ฟ is the union of column names from ๐ and ๐ (with
duplicate columns removed)
29
Natural join example30
Serves โ ๐ฟ๐๐๐๐ = ๐? ๐๐๐๐ฃ๐๐ โ? ๐ฟ๐๐๐๐ = ๐ABC,AHHC,/CLMH,JCL$mHC ๐๐๐๐ฃ๐๐ โXHCVHY.AHHCD
\LmHY.AHHC๐ฟ๐๐๐๐
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
drinker beer
Amy Corona
Dan Budweiser
Dan Corona
Ben Budweiser
LikesServes
bar beer price drinker
The Edge Budweiser 2.50 Dan
The Edge Budweiser 2.50 Ben
The Edge Corona 3.00 Amy
The Edge Corona 3.00 Dan
... โฆ. โฆ..
Natural Join is on beer
Only one column for beer in the output
What happens if the tables have two or more common columns?
Serves โ ๐ฟ๐๐๐๐
Union
โข Input: two tables ๐ and ๐โข Notation: ๐ โช ๐โข ๐ and ๐ must have identical schema
โข Output:โข Has the same schema as ๐ and ๐โข Contains all rows in ๐ and all rows in ๐ (with duplicate
rows removed)
31
Example on board
Important for set operations:
Union Compatibility
Difference
โข Input: two tables ๐ and ๐โข Notation: ๐ โ ๐โข ๐ and ๐ must have identical schema
โข Output:โข Has the same schema as ๐ and ๐โข Contains all rows in ๐ that are not in ๐
32
Example on board
Important for set operations:
Union Compatibility
Derived operator: intersection
โข Input: two tables ๐ and ๐โข Notation: ๐ โฉ ๐โข ๐ and ๐ must have identical schema
โข Output:โข Has the same schema as ๐ and ๐โข Contains all rows that are in both ๐ and ๐
โข How can you write it using other operators?
โข Shorthand forโข Also equivalent to ๐ โ ๐ โ ๐ โข And to ๐ โ ๐
33
๐ โ ๐ โ ๐
Important for set operations:
Union Compatibility
Expression tree notation34
๐ต๐๐ ๐น๐๐๐๐ข๐๐๐ก๐
โJCL$mHCD$B&H
๐BJJCHYY
Also called logicalPlan tree
โข Find addresses of all bars that โDanโ frequents
name address
The Edge108 Morris Street
Satisfaction 905 W. Main Street
Bar
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequents
๐ BJJCHYY๐ต๐๐ โJCL$mHCD
$B&H(๐JCL$mHCDEqB$r๐น๐๐๐๐ข๐๐๐ก๐ )
Equivalent to
๐JCL$mHCDEqB$r
What if you move ๐ to the top?Still correct?
More or less efficient?
Using the same relation multiple times35
drinker bar times_a_week
Ben Satisfaction 2
Dan The Edge 1
Dan Satisfaction 2
Frequentsโข Find drinkers who frequent both โThe Edgeโ and โSatisfactionโ
๐JCL$mHC ๐น๐๐๐๐ข๐๐๐ก๐ โ ABCDEFGH IJKHEโงABCDEXBsLYtBMsLu$EโงJCL$mHCDJCL$mHC
๐น๐๐๐๐ข๐๐๐ก๐ WRONG!
๐vLJw
๐ J",A",s" ๐น๐๐๐๐ข๐๐๐ก๐ โA"DEFGH IJKHEโงA#DEXBsLYtBMsLu$EโงJ"DJ#
๐ J#,A#,s# ๐น๐๐๐๐ข๐๐๐ก๐
Rename!
Renamingโข Input: a table ๐ โข Notation: ๐X ๐ , ๐ yw,yz,โฆ ๐ , or ๐X yw,yz,โฆ ๐ โข Purpose: โrenameโ a table and/or its columnsโข Output: a table with the same rows as ๐ , but called
differentlyโข Used to
โข Avoid confusion caused by identical column namesโข Create identical column names for natural joins
โข As with all other relational operators, it doesnโt modify the databaseโข Think of the renamed table as a copy of the original
36
Summary of core operators
โข Selection: ๐/๐ โข Projection: ๐\๐ โข Cross product: ๐ ร๐โข Union: ๐ โช ๐โข Difference: ๐ โ ๐โข Renaming: ๐X yw,yz,โฆ ๐ โข Does not really add โprocessingโ power
37
Summary of derived operators
โข Join: ๐ โ/ ๐โข Natural join: ๐ โ ๐โข Intersection: ๐ โฉ ๐
โข Many moreโข Semijoin, anti-semijoin, quotient, โฆ
38
Exercise
โข Bars that drinkers in address โ300 N. Duke Streetโ do not frequent
39Frequents(drinker, bar, times_of_week)Bar(name, address)Drinker(name, address)
Exercise
โข Bars that drinkers in address โ300 N. Duke Streetโ do not frequent
40
Bars that the drinkers at thisaddress frequentAll bars
โ
๐$B&H
๐ต๐๐ ๐น๐๐๐๐ข๐๐๐ก๐
๐ท๐๐๐๐๐๐
โ ๐๐๐๐๐๐๐ = ๐๐๐๐
๐BJJCHYYDโ๏ฟฝ๏ฟฝ๏ฟฝ ๏ฟฝ.qvmH XsCHHs"
๐ABC
Frequents(drinker, bar, times_of_week)Bar(name, address)Drinker(name, address)
๐ABC
A trickier exerciseโข For each bar, find the drinkers who frequent it max no.
times a week
41Frequents(drinker, bar, times_of_week)Bar(name, address)Drinker(name, address)
A trickier exerciseโข For each bar, find the drinkers who frequent it max no.
times a weekโข Who do NOT visit a bar max no. of times?โข Whose times_of_weeks is lower than somebody elseโs for a given
bar
42
โ
๐น๐๐๐๐ข๐๐๐ก๐ ๐น๐๐๐๐ข๐๐๐ก๐
๐๏ฟฝ" ๐๏ฟฝ#
โ๏ฟฝ".sL&HY๏ฟฝut๏ฟฝ๏ฟฝHHm7๏ฟฝ#.sL&HY๏ฟฝut๏ฟฝ๏ฟฝHHmโง๏ฟฝ".ABCD๏ฟฝ#.ABC
๐๏ฟฝ".ABC,๏ฟฝ".JCL$mHC
A deeper question:When (and why) is โโโ needed?
Frequents(drinker, bar, times_of_week)Bar(name, address)Drinker(name, address)
๐ABC,JCL$mHC
๐น๐๐๐๐ข๐๐๐ก๐
Monotone operators
โข If some old output rows may need to be removedโข Then the operator is non-monotone
โข Otherwise the operator is monotoneโข That is, old output rows always remain โcorrectโ when more rows
are added to the input
โข Formally, for a monotone operator ๐๐:๐ โ ๐ r implies ๐๐ ๐ โ ๐๐ ๐ r for any ๐ , ๐ r
43
RelOpAdd more rows
to the input...
What happensto the output?
Which operators are non-monotone?
โข Selection: ๐/๐ โข Projection: ๐\๐ โข Cross product: ๐ ร๐โข Join: ๐ โ/ ๐โข Natural join: ๐ โ ๐โข Union: ๐ โช ๐โข Difference: ๐ โ ๐โข Intersection: ๐ โฉ ๐
44
MonotoneMonotoneMonotoneMonotoneMonotoneMonotoneMonotone w.r.t. ๐ ; non-monotone w.r.t ๐
Monotone
Why is โโโ needed for โhighestโ?
โข Composition of monotone operators produces a monotone queryโข Old output rows remain โcorrectโ when more rows are
added to the input
โข Is the โhighestโ query monotone?โข No!โข Current highest price 3.0โข Add another row with price 3.01โข Old answer is invalidated
FSo it must use difference!
45
Extensions to relational algebra
โข Duplicate handling (โbag algebraโ)โข Grouping and aggregationโข โExtensionโ (or โextended projectionโ) to allow
new column values to be computed
โข (Coming later)
46