Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL...
Transcript of Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL...
![Page 1: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/1.jpg)
Database Management SystemsCSEP 544
Lecture 3: SQLRelational Algebra, and Datalog
1CSEP 544 - Fall 2017
![Page 2: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/2.jpg)
Announcements
CSEP 544 - Fall 2017 2
• HW2 due tonight (11:59pm)
• PA3 & HW3 released
![Page 3: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/3.jpg)
HW3• We will be using SQL Server in the cloud (Azure)
– Same dataset– More complex queries J
• Logistics– You will receive an email from [email protected]
to join the “Default Directory organization” --- accept it!– You are allocated $100 to use for this quarter– We will use Azure for two HW assignments– Use SQL Server Management Studio to access the DB
• Installed on all CSE lab machines and VDI machines
CSEP 544 - Fall 2017 3
![Page 4: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/4.jpg)
Scythe
CSEP 544 - Fall 2017 4
![Page 5: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/5.jpg)
Plan for Today• Wrap up SQL
• Study two other languages for the relational data model– Relational algebra – Datalog
CSEP 544 - Fall 2017 5
![Page 6: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/6.jpg)
Reading Assignment 2• Normal form
• Compositionality of relations and operators
CSEP 544 - Fall 2017 6
![Page 7: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/7.jpg)
Review• SQL
– Selection– Projection– Join– Ordering– Grouping– Aggregates– Subqueries
• Query Evaluation
CSEP 544 - Fall 2017 7
FWGHOS
![Page 8: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/8.jpg)
Monotone Queries• Definition A query Q is monotone if:
– Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples
pname price cidGizmo 19.99 c001Gadget 999.99 c004Camera 149.99 c003
Product (pname, price, cid)Company (cid, cname, city)
pname price cidGizmo 19.99 c001Gadget 999.99 c004Camera 149.99 c003iPad 499.99 c001
cid cname cityc002 Sunworks Bonnc001 DB Inc. Lyonc003 Builder Lodtz
Product Companypname cityGizmo LyonCamera Lodtz
pname cityGizmo LyonCamera LodtziPad Lyon
Product Company
Q
Qcid cname cityc002 Sunworks Bonnc001 DB Inc. Lyonc003 Builder Lodtz
![Page 9: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/9.jpg)
SQL Idioms
9CSEP 544 - Fall 2017
![Page 10: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/10.jpg)
Including Empty Groups
• In the result of a group by query, there is one row per group in the result
CSEP 544 - Fall 2017 10
SELECT x.manufacturer, count(*)FROM Product x, Purchase yWHERE x.pname = y.productGROUP BY x.manufacturer
Count(*) is never 0
FWGHOS
![Page 11: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/11.jpg)
Including Empty Groups
CSEP 544 - Fall 2017 11
SELECT x.manufacturer, count(y.pid)FROM Product x LEFT OUTER JOIN Purchase yON x.pname = y.productGROUP BY x.manufacturer
Count(pid) is 0when all pid’s in
the group areNULL
![Page 12: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/12.jpg)
GROUP BY vs. Nested Queries
CSEP 544 - Fall 2017 12
SELECT product, Sum(quantity) AS TotalSalesFROM PurchaseWHERE price > 1GROUP BY product
SELECT DISTINCT x.product, (SELECT Sum(y.quantity)FROM Purchase yWHERE x.product = y.productAND y.price > 1)
AS TotalSalesFROM Purchase xWHERE x.price > 1
Why twice ?
Purchase(pid, product, quantity, price)
![Page 13: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/13.jpg)
More Unnesting
CSEP 544 - Fall 2017 13
Author(login,name)Wrote(login,url)
Find authors who wrote ≥ 10 documents:
![Page 14: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/14.jpg)
More Unnesting
CSEP 544 - Fall 2017 14
SELECT DISTINCT Author.nameFROM AuthorWHERE (SELECT count(Wrote.url)
FROM WroteWHERE Author.login=Wrote.login)
>= 10
This isSQL bya novice
Attempt 1: with nested queries
Author(login,name)Wrote(login,url)
Find authors who wrote ≥ 10 documents:
![Page 15: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/15.jpg)
More Unnesting
CSEP 544 - Fall 2017 15
Attempt 1: with nested queries
Author(login,name)Wrote(login,url)
Find authors who wrote ≥ 10 documents:
SELECT Author.nameFROM Author, WroteWHERE Author.login=Wrote.loginGROUP BY Author.nameHAVING count(wrote.url) >= 10
This isSQL by
an expert
Attempt 2: using GROUP BY and HAVING
![Page 16: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/16.jpg)
Finding Witnesses
CSEP 544 - Fall 2017 16
Product (pname, price, cid)Company (cid, cname, city)
For each city, find the most expensive product made in that city
![Page 17: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/17.jpg)
Finding Witnesses
CSEP 544 - Fall 2017 17
SELECT x.city, max(y.price)FROM Company x, Product yWHERE x.cid = y.cidGROUP BY x.city;
Finding the maximum price is easy…
But we need the witnesses, i.e., the products with max price
For each city, find the most expensive product made in that city
Product (pname, price, cid)Company (cid, cname, city)
![Page 18: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/18.jpg)
Finding Witnesses
CSEP 544 - Fall 2017 18
To find the witnesses, compute the maximum pricein a subquery
SELECT DISTINCT u.city, v.pname, v.priceFROM Company u, Product v,
(SELECT x.city, max(y.price) as maxpriceFROM Company x, Product yWHERE x.cid = y.cidGROUP BY x.city) w
WHERE u.cid = v.cidand u.city = w.cityand v.price = w.maxprice;
Product (pname, price, cid)Company (cid, cname, city)
![Page 19: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/19.jpg)
Finding Witnesses
CSEP 544 - Fall 2017 19
Or we can use a subquery in where clause
SELECT u.city, v.pname, v.priceFROM Company u, Product vWHERE u.cid = v.cid
and v.price >= ALL (SELECT y.priceFROM Company x, Product y WHERE u.city=x.cityand x.cid=y.cid);
Product (pname, price, cid)Company (cid, cname, city)
![Page 20: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/20.jpg)
Finding Witnesses
CSEP 544 - Fall 2017 20
There is a more concise solution here:
SELECT u.city, v.pname, v.priceFROM Company u, Product v, Company x, Product yWHERE u.cid = v.cid and u.city = x.cityand x.cid = y.cidGROUP BY u.city, v.pname, v.priceHAVING v.price = max(y.price)
Product (pname, price, cid)Company (cid, cname, city)
![Page 21: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/21.jpg)
SQL: Our first language for the relational model
• Projections• Selections• Joins (inner and outer)• Inserts, updates, and deletes• Aggregates• Grouping• Ordering• Nested queries
CSEP 544 - Fall 2017 21
![Page 22: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/22.jpg)
Relational Algebra
22CSEP 544 - Fall 2017
![Page 23: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/23.jpg)
23
Class overview• Data models
– Relational: SQL, RA, and Datalog– NoSQL: SQL++
• RDMBS internals– Query processing and optimization– Physical design
• Parallel query processing– Spark and Hadoop
• Conceptual design– E/R diagrams– Schema normalization
• Transactions– Locking and schedules– Writing DB applications
CSEP 544 - Fall 2017
Data models
UsingDBMS
Query Processing
![Page 24: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/24.jpg)
Next: Relational Algebra
• Our second language for the relational model– Developed before SQL– Simpler syntax than SQL
CSEP 544 - Fall 2017 24
![Page 25: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/25.jpg)
Why bother with another language?• Used extensively by
DBMS implementations– As we will see in 2 weeks
• RA influences the design SQL
CSEP 544 - Fall 2017 25
![Page 26: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/26.jpg)
Relational Algebra
• In SQL we say what we want• In RA we can express how to get it• Set-at-a-time algebra, which manipulates
relations• Every RDBMS implementations converts a
SQL query to RA in order to execute it
• An RA expression is also called a query plan
CSEP 544 - Fall 2017 26
![Page 27: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/27.jpg)
Basics
CSEP 544 - Fall 2017 27
• Relations and attributes• Functions that are applied to relations
– Return relations– Can be composed together– Often displayed using a tree rather than linearly– Use Greek symbols: σ, π, δ, etc
![Page 28: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/28.jpg)
Sets v.s. Bags
• Sets: {a,b,c}, {a,d,e,f}, { }, . . .• Bags: {a, a, b, c}, {b, b, b, b, b}, . . .
Relational Algebra has two flavors:• Set semantics = standard Relational Algebra• Bag semantics = extended Relational Algebra
DB systems implement bag semantics (Why?)CSEP 544 - Fall 2017 28
![Page 29: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/29.jpg)
Relational Algebra Operators• Union ∪, intersection ∩, difference -• Selection σ• Projection π• Cartesian product X, join ⨝• (Rename ρ)• Duplicate elimination δ• Grouping and aggregation ɣ• Sorting 𝛕
CSEP 544 - Fall 2017 29
RA
Extended RA
All operators take in 1 or more relations as inputs and return another relation
![Page 30: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/30.jpg)
Union and Difference
CSEP 544 - Fall 2017 30
What do they mean over bags ?
R1 ∪ R2R1 – R2
Only make sense if R1, R2 have the same schema
![Page 31: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/31.jpg)
What about Intersection ?
• Derived operator using minus
• Derived using join
CSEP 544 - Fall 2017 31
R1 ∩ R2 = R1 – (R1 – R2)
R1 ∩ R2 = R1 ⨝ R2
![Page 32: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/32.jpg)
Selection• Returns all tuples which satisfy a condition
• Examples– σSalary > 40000 (Employee)– σname = “Smith” (Employee)
• The condition c can be =, <, <=, >, >=, <>combined with AND, OR, NOT
CSEP 544 - Fall 2017 32
σc(R)
![Page 33: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/33.jpg)
σSalary > 40000 (Employee)
SSN Name Salary1234545 John 200005423341 Smith 600004352342 Fred 50000
SSN Name Salary5423341 Smith 600004352342 Fred 50000
Employee
CSEP 544 - Fall 2017 33
![Page 34: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/34.jpg)
Projection• Eliminates columns
• Example: project social-security number and names:– πSSN, Name (Employee) à Answer(SSN, Name)
CSEP 544 - Fall 2017 34
π A1,…,An(R)
Different semantics over sets or bags! Why?
![Page 35: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/35.jpg)
π Name,Salary (Employee)
SSN Name Salary1234545 John 200005423341 John 600004352342 John 20000
Name SalaryJohn 20000John 60000John 20000
Employee
Name SalaryJohn 20000John 60000
Bag semantics Set semantics
CSEP 544 - Fall 2017 35Which is more efficient?
![Page 36: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/36.jpg)
Functional Composition of RA Operators
CSEP 544 - Fall 2017 36
no name zip disease1 p1 98125 flu2 p2 98125 heart3 p3 98120 lung4 p4 98120 heart
Patient
σdisease=‘heart’(Patient)no name zip disease2 p2 98125 heart4 p4 98120 heart
zip disease98125 flu98125 heart98120 lung98120 heart
πzip,disease(Patient)
πzip,disease(σdisease=‘heart’(Patient))zip disease98125 heart98120 heart
![Page 37: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/37.jpg)
Cartesian Product
• Each tuple in R1 with each tuple in R2
• Rare in practice; mainly used to express joins
CSEP 544 - Fall 2017 37
R1 × R2
![Page 38: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/38.jpg)
Name SSNJohn 999999999Tony 777777777
EmployeeEmpSSN DepName999999999 Emily777777777 Joe
Dependent
Employee X DependentName SSN EmpSSN DepNameJohn 999999999 999999999 EmilyJohn 999999999 777777777 JoeTony 777777777 999999999 EmilyTony 777777777 777777777 Joe
Cross-Product Example
CSEP 544 - Fall 2017 38
![Page 39: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/39.jpg)
Renaming
• Changes the schema, not the instance
• Example: – Given Employee(Name, SSN)– ρN, S(Employee) à Answer(N, S)
CSEP 544 - Fall 2017 39
ρB1,…,Bn (R)
![Page 40: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/40.jpg)
Natural Join
• Meaning: R1⨝R2 = πA(σθ (R1 × R2))
• Where:– Selection σθ checks equality of all common
attributes (i.e., attributes with same names)– Projection πA eliminates duplicate common
attributesCSEP 544 - Fall 2017 40
R1 ⨝R2
![Page 41: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/41.jpg)
Natural Join Example
CSEP 544 - Fall 2017 41
A BX YX ZY ZZ V
B CZ UV WZ V
A B CX Z UX Z VY Z UY Z VZ V W
R S
R ⨝ S =πABC(σR.B=S.B(R × S))
![Page 42: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/42.jpg)
Natural Join Example 2
CSEP 544 - Fall 2017 42
age zip disease54 98125 heart20 98120 flu
AnonPatient P Voters V
P V
name age zipAlice 54 98125Bob 20 98120
age zip disease name
54 98125 heart Alice
20 98120 flu Bob
![Page 43: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/43.jpg)
Natural Join
• Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⨝ S ?
• Given R(A, B, C), S(D, E), what is R ⨝ S?
• Given R(A, B), S(A, B), what is R ⨝ S?
CSEP 544 - Fall 2017 43
![Page 44: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/44.jpg)
Theta Join
• A join that involves a predicate
• Here θ can be any condition• No projection in this case!• For our voters/patients example:
44
R1 ⨝θ R2 = σθ (R1 X R2)
P ⨝ P.zip = V.zip and P.age >= V.age -1 and P.age <= V.age +1 V
AnonPatient (age, zip, disease)Voters (name, age, zip)
![Page 45: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/45.jpg)
Equijoin• A theta join where θ is an equality predicate
• By far the most used variant of join in practice• What is the relationship with natural join?
CSEP 544 - Fall 2017 45
R1 ⨝θ R2 = σθ (R1 × R2)
![Page 46: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/46.jpg)
Equijoin Example
CSEP 544 - Fall 2017 46
age zip disease54 98125 heart20 98120 flu
AnonPatient P Voters V
P P.age=V.age V
name age zipp1 54 98125p2 20 98120
P.age P.zip P.disease V.name V.age V.zip
54 98125 heart p1 54 98125
20 98120 flu p2 20 98120
![Page 47: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/47.jpg)
Join Summary• Theta-join: R ⨝θ S = σθ (R × S)
– Join of R and S with a join condition θ– Cross-product followed by selection θ– No projection
• Equijoin: R ⨝θ S = σθ (R × S)– Join condition θ consists only of equalities– No projection
• Natural join: R ⨝ S = πA (σθ (R × S))– Equality on all fields with same name in R and in S– Projection πA drops all redundant attributes
CSEP 544 - Fall 2017 47
![Page 48: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/48.jpg)
So Which Join Is It ?
When we write R ⨝ S we usually mean an equijoin, but we often omit the equality predicate when it is clear from the context
CSEP 544 - Fall 2017 48
![Page 49: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/49.jpg)
More Joins
• Outer join– Include tuples with no matches in the output– Use NULL values for missing attributes– Does not eliminate duplicate columns
• Variants– Left outer join– Right outer join– Full outer join
CSEP 544 - Fall 2017 49
![Page 50: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/50.jpg)
Outer Join Example
CSEP 544 - Fall 2017 50
age zip disease54 98125 heart20 98120 flu33 98120 lung
AnonPatient P
P ⋊ J
P.age P.zip P.disease J.job J.age J.zip
54 98125 heart lawyer 54 98125
20 98120 flu cashier 20 98120
33 98120 lung null null null
AnnonJob Jjob age ziplawyer 54 98125cashier 20 98120
![Page 51: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/51.jpg)
Some ExamplesSupplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,qty,price)
Name of supplier of parts with size greater than 10πsname(Supplier ⨝ Supply ⨝ (σpsize>10 (Part))
Name of supplier of red parts or parts with size greater than 10πsname(Supplier ⨝ Supply ⨝ (σ psize>10 (Part) ∪ σpcolor=‘red’ (Part) ) )πsname(Supplier ⨝ Supply ⨝ (σ psize>10 ∨ pcolor=‘red’ (Part) ) )
Can be represented as trees as wellCSEP 544 - Fall 2017 51
![Page 52: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/52.jpg)
Representing RA Queries as TreesSupplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,qty,price)
πsname(Supplier ⨝ Supply ⨝ (σpsize>10 (Part))
CSEP 544 - Fall 2017 52
Part
Supplyσpsize>10
πsname
Answer
Supplier
![Page 53: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/53.jpg)
Relational Algebra Operators• Union ∪, intersection ∩, difference -• Selection σ• Projection π• Cartesian product X, join ⨝• (Rename ρ)• Duplicate elimination δ• Grouping and aggregation ɣ• Sorting 𝛕
CSEP 544 - Fall 2017 53
RA
Extended RA
All operators take in 1 or more relations as inputs and return another relation
![Page 54: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/54.jpg)
Extended RA: Operators on Bags
• Duplicate elimination δ• Grouping γ
– Takes in relation and a list of grouping operations (e.g., aggregates). Returns a new relation.
• Sorting τ– Takes in a relation, a list of attributes to sort on,
and an order. Returns a new relation.
CSEP 544 - Fall 2017 54
![Page 55: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/55.jpg)
Using Extended RA Operators
CSEP 544 - Fall 2017 55
SELECT city, sum(quantity)FROM salesGROUP BY cityHAVING count(*) > 100
T1, T2 = temporary tables sales(product, city, quantity)
γ city, sum(quantity)→q, count(*) → c
σ c > 100
π city, q
Answer
T1(city,q,c)
T2(city,q,c)
![Page 56: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/56.jpg)
Typical Plan for a Query (1/2)
CSEP 544 - Fall 2017 56
R S
join condition
σselection condition
πfields
join condition
…
SELECT-PROJECT-JOINQuery
AnswerSELECT fieldsFROM R, S, …WHERE condition
![Page 57: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/57.jpg)
Typical Plan for a Query (1/2)
57
πfields
ɣfields, sum/count/min/max(fields)
σhaving condition
σwhere condition
join condition
… …
SELECT fieldsFROM R, S, …WHERE conditionGROUP BY fieldsHAVING condition
![Page 58: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/58.jpg)
How about Subqueries?
CSEP 544 - Fall 2017 58
Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,price)
SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’
and not exists(SELECT *FROM Supply PWHERE P.sno = Q.sno
and P.price > 100)
![Page 59: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/59.jpg)
SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’
and not exists(SELECT *FROM Supply PWHERE P.sno = Q.sno
and P.price > 100)
How about Subqueries?
CSEP 544 - Fall 2017 59
Correlation !
Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,price)
![Page 60: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/60.jpg)
SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’
and not exists(SELECT *FROM Supply PWHERE P.sno = Q.sno
and P.price > 100)
How about Subqueries?
CSEP 544 - Fall 2017 60
De-Correlation
SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’
and Q.sno not in(SELECT P.snoFROM Supply PWHERE P.price > 100)
Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,price)
![Page 61: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/61.jpg)
SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’
and Q.sno not in(SELECT P.snoFROM Supply PWHERE P.price > 100)
How about Subqueries?
CSEP 544 - Fall 2017 61
(SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’)
EXCEPT(SELECT P.sno
FROM Supply PWHERE P.price > 100)
EXCEPT = set difference
Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,price)
Un-nesting
![Page 62: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/62.jpg)
(SELECT Q.snoFROM Supplier QWHERE Q.sstate = ‘WA’)
EXCEPT(SELECT P.sno
FROM Supply PWHERE P.price > 100)
How about Subqueries?
CSEP 544 - Fall 2017 62
Supply
σsstate=‘WA’
Supplier
σPrice > 100
−Finally…
πsnoπsno
Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supply(sno,pno,price)
![Page 63: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/63.jpg)
Summary of RA and SQL
• SQL = a declarative language where we say what data we want to retrieve
• RA = an algebra where we say how we want to retrieve the data
• Both implements the relational data model• Theorem: SQL and RA can express
exactly the same class of queriesRDBMS translate SQL à RA, then optimize RA
![Page 64: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/64.jpg)
Summary of RA and SQL
• SQL (and RA) cannot express ALL queries that we could write in, say, Java
• Example:– Parent(p,c): find all descendants of ‘Alice’– No RA query can compute this!– This is called a recursive query
• Next: Datalog is an extension that can compute recursive queries
CSEP 544 - Fall 2017 64
![Page 65: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/65.jpg)
Summary of RA and SQL
• Translating from SQL to RA gives us a way to evaluate the input query
• Transforming one RA plan to another forms the basis of query optimization
• Will see more in 2 weeks
![Page 66: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/66.jpg)
Datalog
66CSEP 544 - Fall 2017
![Page 67: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/67.jpg)
What is Datalog?• Another declarative query language for
relational model– Designed in the 80’s– Minimal syntax– Simple, concise, elegant– Extends relational queries with recursion
• Today:– Adopted by some companies for data analytics,
e.g., LogicBlox (HW4)– Usage beyond databases: e.g., network protocols,
static program analysis 67
![Page 68: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/68.jpg)
SQL Query vs Datalog(which would you rather write?)
(any Java fans out there?)
Manager(eid) :- Manages(_, eid)
DirectReports(eid, 0) :-Employee(eid),not Manager(eid)
DirectReports(eid, level+1) :-DirectReports(mid, level),Manages(mid, eid)
![Page 69: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/69.jpg)
HW4: Preview
CSEP 544 - Fall 2017 69
![Page 70: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/70.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 70
Facts = tuples in the database Rules = queries
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
Schema
![Page 71: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/71.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 71
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 72: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/72.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 72
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 73: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/73.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 73
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Find Movies made in 1940
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 74: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/74.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 74
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 75: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/75.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 75
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
Find Actors who acted in Movies made in 1940
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 76: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/76.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 76
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
Q3(f,l) :- Actor(z,f,l), Casts(z,x1), Movie(x1,y1,1910),Casts(z,x2), Movie(x2,y2,1940)
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 77: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/77.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 77
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
Q3(f,l) :- Actor(z,f,l), Casts(z,x1), Movie(x1,y1,1910),Casts(z,x2), Movie(x2,y2,1940)
Find Actors who acted in a Movie in 1940 and in one in 1910
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 78: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/78.jpg)
Datalog: Facts and Rules
CSEP 544 - Fall 2017 78
Actor(344759,‘Douglas’, ‘Fowley’).Casts(344759, 29851).Casts(355713, 29000).Movie(7909, ‘A Night in Armour’, 1910).Movie(29000, ‘Arizona’, 1940).Movie(29445, ‘Ave Maria’, 1940).
Facts = tuples in the database Rules = queries
Q1(y) :- Movie(x,y,z), z=‘1940’.
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
Q3(f,l) :- Actor(z,f,l), Casts(z,x1), Movie(x1,y1,1910),Casts(z,x2), Movie(x2,y2,1940)
Extensional Database Predicates = EDB = Actor, Casts, MovieIntensional Database Predicates = IDB = Q1, Q2, Q3
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 79: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/79.jpg)
Datalog: Terminology
CSEP 544 - Fall 2017 79
Q2(f, l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,’1940’).
bodyhead
atom atom atom (aka subgoal)
f, l = head variablesx,y,z = existential variables
In this class we discuss datalog evaluated under set semantics
![Page 80: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/80.jpg)
More Datalog Terminology
• Ri(argsi) is called an atom, or a relational predicate• Ri(argsi) evaluates to true when relation Ri contains
the tuple described by argsi.– Example: Actor(344759, ‘Douglas’, ‘Fowley’) is true
• In addition to relational predicates, we can also have arithmetic predicates– Example: z > ‘1940’.
• Note: Logicblox uses <- instead of :-80
Q(args) :- R1(args), R2(args), .... Your book uses:Q(args) :- R1(args) AND R2(args) AND ....
Q(args) <- R1(args), R2(args), ....
![Page 81: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/81.jpg)
Semantics of a Single Rule• Meaning of a datalog rule = a logical statement !
CSEP 544 - Fall 2017 81
Q1(y) :- Movie(x,y,z), z=‘1940’.
• For all values of x, y, z:if (x,y,z) is in the Movies relation, and that z = ‘1940’ then y is in Q1 (i.e., it is part of the answer)
• Logically equivalent:∀ y. [(∃x.∃ z. Movie(x,y,z) and z=‘1940’) ⇒ Q1(y)]
• That's why head variables are called "existential variables”
• We want the smallest set Q1 with this property (why?)
Actor(id, fname, lname)Casts(pid, mid)Movie(id, name, year)
![Page 82: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/82.jpg)
Datalog program
• A datalog program consists of several rules
• Importantly, rules may be recursive!• Usually there is one distinguished
predicate that’s the output• We will show an example first, then give
the general semantics.
CSEP 544 - Fall 2017 82
![Page 83: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/83.jpg)
Example
1
2
4
3
R encodes a graph
1 22 12 3
1 4
3 44 5
R=
5
![Page 84: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/84.jpg)
Example
1
2
4
3
R encodes a graph
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
What doesit compute?
5
![Page 85: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/85.jpg)
Example
1
2
4
3
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
Initially: T is empty.
5R encodes a graph
What doesit compute?
![Page 86: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/86.jpg)
Example
1
2
4
3
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
Initially: T is empty.
1 2
2 1
2 3
1 4
3 4
4 5
First iteration:T =
5R encodes a graph
What doesit compute?
Second rulegenerates nothing(because T is empty)
First rule generates this
![Page 87: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/87.jpg)
Example
1
2
4
3
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
Initially: T is empty.
1 2
2 1
2 3
1 4
3 4
4 5
Second iteration:T = 1 2
2 1
2 3
1 4
3 4
4 5
1 1
2 2
1 3
2 4
1 5
3 5
First iteration:T =
5R encodes a graph
What doesit compute?
New facts
First rule generates this
Second rule generates this
![Page 88: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/88.jpg)
Example
1
2
4
3
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
Initially: T is empty.
1 2
2 1
2 3
1 4
3 4
4 5
Second iteration:T = 1 2
2 1
2 3
1 4
3 4
4 5
1 1
2 2
1 3
2 4
1 5
3 5
First iteration:T =
5
Third iteration:T =
1 2
2 1
2 3
1 4
3 4
4 5
1 1
2 2
1 3
2 4
1 5
3 5
2 5
R encodes a graphWhat doesit compute?
New fact
First rule
Secondrule
Both rules
![Page 89: Database Management Systems CSEP 544 · Database Management Systems CSEP 544 Lecture 3: SQL Relational Algebra, and Datalog ... count(*) FROM Product x, Purchase y WHERE x.pname =](https://reader036.fdocuments.net/reader036/viewer/2022062306/5edf99e1ad6a402d666aef46/html5/thumbnails/89.jpg)
Example
1
2
4
3
1 22 12 3
1 4
3 44 5
R=
T(x,y) :- R(x,y)T(x,y) :- R(x,z), T(z,y)
Initially: T is empty.
1 2
2 1
2 3
1 4
3 4
4 5
Second iteration:T = 1 2
2 1
2 3
1 4
3 4
4 5
1 1
2 2
1 3
2 4
1 5
3 5
First iteration:T =
5
Third iteration:T =
1 2
2 1
2 3
1 4
3 4
4 5
1 1
2 2
1 3
2 4
1 5
3 5
2 5
R encodes a graphWhat doesit compute?
Nonewfacts.DONE
FourthiterationT =(same)
This is called the fixpoint semanticsof a datalog program