1
CS 564: Database Management Systems: Design and Implementation
Lecture 5: SQL
Chapters 3 and 5 in Cow Book
Slide ACKs: AnHai Doan, Jeff Naughton, and Jignesh Patel
Arun Kumar
2
Introducing SQL
Structured English QUEry Language (SEQUEL)
TL;DR: the name is SQL
Invented at - you guessed it - IBM!
3
What is SQL?
SQL is “the” querying language for relational data
Simple English-based syntax, but precise, formal semantics (compiled down to relational algebra)
Key advantages:
Physical Data Independence (“how” data is stored on the machine is independent of “what” is asked, i.e., the SQL queries)
Logical Data Independence (the notion of views in SQL enables simpler queries on the same schema)
4
Major SQL Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Embedded and Dynamic SQL
Cursors and Triggers
Security
Transaction Management
Remote Database Access
5
Data Definition Language (DDL)
Create Table
Drop Table, Alter Table
6
CREATE TABLE
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … …
Movies
CREATE TABLE Movies (
MovieID INTEGER,
Name CHAR(30),
ReleaseDate DATE,
Director CHAR(20),
PRIMARY KEY (MovieID))
7
Integrity Constraint (IC)
A logical condition (invariant) that must hold true
on any instance of a given database schema
A legal relation instance satisfies all ICs
Overuse/abuse of ICs is a danger!
Part of schema; cannot infer from data exactly!
Two main types:
Key Constraint
Referential Integrity Constraint
8
Key Constraints in SQL
Key vs. Superkey
Primary key vs. Candidate key vs. Alternate key
Movies: MovieID Name ReleaseDate Director IMDB_URL
CREATE TABLE Movies (
MovieID INTEGER,
Name CHAR(30),
ReleaseDate DATE,
Director CHAR(20),
IMDB_URL CHAR(20),
PRIMARY KEY (MovieID),
UNIQUE (IMDB_URL))
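A minimal sketch of these key constraints being enforced, assuming Python's built-in sqlite3 module as a stand-in RDBMS (the lecture does not prescribe a system). The URL values 'url-a' and 'url-b' and the helper try_insert are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        MovieID     INTEGER,
        Name        CHAR(30),
        ReleaseDate DATE,
        Director    CHAR(20),
        IMDB_URL    CHAR(20),
        PRIMARY KEY (MovieID),
        UNIQUE (IMDB_URL)
    )
""")
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '2010-07-13', 'Christopher Nolan', 'url-a')"
)

def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Same MovieID as Inception: violates the primary key.
dup_primary_key = try_insert((20, "Her", "2013-12-18", "Spike Jonze", "url-b"))
# New MovieID but a URL that is already taken: violates UNIQUE.
dup_unique_url = try_insert((21, "Her", "2013-12-18", "Spike Jonze", "url-a"))
# New MovieID and new URL: accepted.
fresh_row = try_insert((21, "Her", "2013-12-18", "Spike Jonze", "url-b"))
print(dup_primary_key, dup_unique_url, fresh_row)
```

Both candidate keys are checked on every insert; only the declaration (PRIMARY KEY vs. UNIQUE) differs.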
9
Referential Integrity Constraints
CREATE TABLE Ratings(
RatingID INTEGER,
NumStars REAL,
Timestamp DATE,
UserID INTEGER,
MovieID INTEGER,
PRIMARY KEY (RatingID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
Ratings
10
Referential Integrity Constraint (RIC)
A Foreign Key value should not be NULL!
[ER diagram: Student (SID, Name, Age), relationship Major, Department (DID, Name, Address)]
Students
SID Name Age MajorDID
79 Alice 19 CS
13 Bob 21 NULL
48 John NULL ST
… … … …
Foreign Key?
Department
DID Name Address
CS Computer Sciences 1210 …
ST Statistics Blah
… … …
11
What if a Department tuple is
deleted?!
12
Enforcing RIC
We have 3 options:
Refuse to allow the deletion!
Delete all tuples in Students that reference the deleted DID in Department
Set the corresponding DID in Students to some default value, or in the worst case, NULL
13
Enforcing RIC in SQL
Refuse to allow the deletion!
CREATE TABLE Student(
SID INTEGER,
Name CHAR(30),
Age INTEGER,
DID INTEGER,
PRIMARY KEY (SID),
FOREIGN KEY (DID) REFERENCES Department(DID)
ON DELETE NO ACTION)
14
Enforcing RIC in SQL
Delete all tuples in Students that reference the
deleted DID in Department
CREATE TABLE Student(
SID INTEGER,
Name CHAR(30),
Age INTEGER,
DID INTEGER,
PRIMARY KEY (SID),
FOREIGN KEY (DID) REFERENCES Department(DID)
ON DELETE CASCADE)
15
Enforcing RIC in SQL
Set the corresponding DID in Students to some
default value, or in the worst case, NULL
CREATE TABLE Student(
SID INTEGER,
Name CHAR(30),
Age INTEGER,
DID INTEGER,
PRIMARY KEY (SID),
FOREIGN KEY (DID) REFERENCES Department(DID)
ON DELETE SET DEFAULT)
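The deletion policies can be tried side by side. This is a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in RDBMS, an integer DID of 1 is invented for the example, SET NULL stands in for the "worst case" of SET DEFAULT, and the function students_after_delete is a hypothetical helper.

```python
import sqlite3

def students_after_delete(on_delete):
    """Delete department 1 and return the surviving Student rows, or 'refused'."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific: foreign key checks are off unless switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Department (DID INTEGER PRIMARY KEY, Name CHAR(30))")
    conn.execute(f"""
        CREATE TABLE Student (
            SID INTEGER, Name CHAR(30), Age INTEGER, DID INTEGER,
            PRIMARY KEY (SID),
            FOREIGN KEY (DID) REFERENCES Department(DID) ON DELETE {on_delete}
        )
    """)
    conn.execute("INSERT INTO Department VALUES (1, 'Computer Sciences')")
    conn.execute("INSERT INTO Student VALUES (79, 'Alice', 19, 1)")
    try:
        conn.execute("DELETE FROM Department WHERE DID = 1")
    except sqlite3.IntegrityError:
        return "refused"
    return conn.execute("SELECT SID, DID FROM Student").fetchall()

for policy in ("NO ACTION", "CASCADE", "SET NULL"):
    print(policy, "->", students_after_delete(policy))
```

NO ACTION rejects the delete, CASCADE removes Alice's tuple, and SET NULL keeps Alice but blanks her DID.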
16
Participation Constraint in SQL
[ER diagram: Student (SID, Name, Age), relationship Major, Department (DID, Name, Address)]
CREATE TABLE Student(
SID INTEGER,
Name CHAR(30),
Age INTEGER,
DID INTEGER NOT NULL,
PRIMARY KEY (SID),
FOREIGN KEY (DID) REFERENCES Department(DID)
ON DELETE NO ACTION)
17
Data Definition Language (DDL)
Create Table
Drop Table, Alter Table
18
DROP TABLE
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … …
Movies
DROP TABLE Movies;
Entire relation (both instance and schema) will be removed
forever from the database! Be careful!
19
ALTER TABLE
MovieID Name ReleaseDate Director
… … … …
Movies
ALTER TABLE Movies
ADD COLUMN IMDB_URL CHAR(20);
IMDB_URL
ALTER TABLE Movies
ADD CONSTRAINT UNIQUE(IMDB_URL);
20
Major SQL Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Embedded and Dynamic SQL
Cursors and Triggers
Security
Transaction Management
Remote Database Access
21
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
22
INSERT
CREATE TABLE Movies (
MovieID INTEGER,
Name CHAR(30),
ReleaseDate DATE,
Director CHAR(20),
PRIMARY KEY (MovieID))
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
Movies: MovieID Name ReleaseDate Director
INSERT INTO Movies
VALUES ( 20,
'Inception',
'07/13/2010',
'Christopher Nolan'
)
23
INSERT
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
Movies: MovieID Name ReleaseDate Director
On an INSERT, the RDBMS always checks all Integrity Constraints; if any IC fails, insertion of the tuple fails!
INSERT INTO Movies VALUES
(20, 'Her', '12/18/2013', 'Spike Jonze');
This insertion fails: MovieID 20 already exists, so the primary key constraint is violated.
24
DELETE
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
Movies: MovieID Name ReleaseDate Director
DELETE FROM Movies
WHERE Name = 'Avatar';
All tuples that match the WHERE constraint/predicate will be
removed forever from the instance! Be careful!
25
UPDATE
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
Movies: MovieID Name ReleaseDate Director
UPDATE Movies
SET Director = 'James Cameron'
WHERE Director = 'Jim Cameron';
All tuples that match the WHERE constraint/predicate in the instance will be updated! Be careful!
After the update, the Director of Avatar reads James Cameron.
26
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
27
Basic Form of an SQL Query
SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]
DISTINCT: optional
target-list: list of attributes to project
relation-list: list of relations (possibly with “aliases”)
condition: selection/join condition (optional)
28
What does it mean logically?
SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]
1. Cross-product of relations in relation-list
2. If condition given, apply it to filter out tuples
3. Remove attributes not present in target-list
4. If DISTINCT given, deduplicate tuples in result
The above is only a logical interpretation. It is NOT a “plan” an RDBMS would use in general to run an SQL query!
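The four logical steps can be written out directly. The following is only a sketch of the logical interpretation in plain Python, not of how any RDBMS executes a query; the function evaluate, the dictionary encoding of tuples, and the DirectorID values are assumptions made for the example.

```python
from itertools import product

def evaluate(relations, condition, target_list, distinct=False):
    """Logical (not physical) evaluation of SELECT ... FROM ... WHERE ..."""
    result = []
    # 1. Cross-product of all relations in the relation-list.
    for combo in product(*relations.values()):
        env = dict(zip(relations.keys(), combo))  # alias -> tuple
        # 2. If a condition is given, use it to filter out tuples.
        if condition is not None and not condition(env):
            continue
        # 3. Keep only the attributes named in the target-list.
        result.append(tuple(env[alias][attr] for alias, attr in target_list))
    # 4. If DISTINCT is given, deduplicate (order of first appearance is kept).
    if distinct:
        result = list(dict.fromkeys(result))
    return result

movies = [
    {"MovieID": 53, "Name": "Gravity", "Year": 2013, "DirectorID": 1},
    {"MovieID": 16, "Name": "Avatar", "Year": 2009, "DirectorID": 2},
]
directors = [
    {"DID": 1, "Name": "Alfonso Cuaron"},
    {"DID": 2, "Name": "Jim Cameron"},
]

# SELECT M.Name FROM Movies M, Directors D
# WHERE D.Name = 'Jim Cameron' AND M.DirectorID = D.DID
names = evaluate(
    {"M": movies, "D": directors},
    lambda e: e["D"]["Name"] == "Jim Cameron" and e["M"]["DirectorID"] == e["D"]["DID"],
    [("M", "Name")],
)
# SELECT DISTINCT M.Year over a bag that contains every movie twice.
years = evaluate({"M": movies + movies}, None, [("M", "Year")], distinct=True)
print(names, years)
```

The cross-product in step 1 grows multiplicatively with the number of relations, which is one reason a real system picks a different plan.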
29
Recall the Netflix Example
RatingID Stars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
UserID Name Age JoinDate
79 Alice 23 01/10/13
80 Bob 41 05/10/13
… … … …
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
… … … …
Ratings
Users
Movies
30
Example SQL Query
SELECT M.Name
FROM Movies M
WHERE M.Year = 2013
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Example: Get the names of movies released in 2013
π_Name(σ_Year = 2013(M))
31
Example SQL Query
SELECT M.Name FROM Movies M WHERE M.Year = 2013
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Name
Gravity
Blue Jasmine
32
Example SQL Query
SELECT * FROM Movies M WHERE M.Year = 2013
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
MovieID Name Year Director
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
33
Example SQL Query
SELECT M.Name
FROM Movies M
WHERE M.Year <> 2013
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Example: Get the names of movies from years other than 2013
π_Name(σ_Year ≠ 2013(M))
34
Example SQL Query
SELECT M.Year
FROM Movies M
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Example: For which years do we have movie data?
π_Year(M)
35
Example SQL Query
SELECT M.Year FROM Movies M
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Year
2010
2009
2013
2013
SQL allows repetitions of tuples in a relation!
Called “bag semantics” vs. RA’s set semantics
36
DISTINCT in SQL
SELECT DISTINCT M.Year FROM Movies M
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Year
2010
2009
2013
DISTINCT needed to achieve set
semantics of RA’s Project in SQL
37
Aliases in SQL
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
SELECT M.Name FROM Movies M WHERE M.Year = 2013
Why bother with the alias? Not needed here!
SELECT Name FROM Movies WHERE Year = 2013
38
Aliases in SQL – Useful for Joins!
Movies (M): MovieID Name Year DirectorID
Directors (D): DID Name Age
Example: Get names of movies directed by “Jim Cameron”
SELECT M.Name
FROM Movies M, Directors D
WHERE D.Name = 'Jim Cameron'
AND M.DirectorID = D.DID
Aliases help disambiguate attributes with the same name from multiple relations (or even a self-join!)
39
More SQL Examples
Example: Get names of movies released in 2013 by Woody
Allen or some other director 50 years or older
SELECT M.Name
FROM Movies M, Directors D
WHERE (D.Name = 'Woody Allen' OR D.Age >= 50)
AND M.Year = 2013
AND M.DirectorID = D.DID
Movies (M): MovieID Name Year DirectorID
Directors (D): DID Name Age
40
SQL vs. TRC
Example: Get names of movies released in 2013 by Woody Allen or some other director 50 years or older
SELECT M.Name
FROM Movies M, Directors D
WHERE (D.Name = 'Woody Allen' OR D.Age >= 50)
AND M.Year = 2013
AND M.DirectorID = D.DID
{t | ∃m ∈ M ∃d ∈ D ((d.Name = “Woody Allen” ∨ d.Age ≥ 50) ∧ m.Year = 2013 ∧ m.DirectorID = d.DID ∧ t.Name = m.Name)}
Movies (M): MovieID Name Year DirectorID
Directors (D): DID Name Age
You can even “LIKE” in SQL!
42
LIKE in SQL
Example: Get the directors of movies whose names start with “Blue”
SELECT DISTINCT M.Director
FROM Movies M
WHERE M.Name LIKE 'Blue%'
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
“%” matches any number of characters; “_” matches one
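A small sketch of both wildcards, assuming Python's sqlite3 module as a stand-in RDBMS and the four-movie instance from the slide. One SQLite-specific caveat, flagged as such: its LIKE is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER, Director TEXT);
    INSERT INTO Movies VALUES
        (20, 'Inception', 2010, 'Christopher Nolan'),
        (16, 'Avatar', 2009, 'Jim Cameron'),
        (53, 'Gravity', 2013, 'Alfonso Cuaron'),
        (74, 'Blue Jasmine', 2013, 'Woody Allen');
""")

# '%' matches any number of characters: directors of movies whose name starts with "Blue".
blue_directors = [row[0] for row in conn.execute(
    "SELECT DISTINCT M.Director FROM Movies M WHERE M.Name LIKE 'Blue%'")]

# '_' matches exactly one character: names that are any single character followed by "ravity".
one_char = [row[0] for row in conn.execute(
    "SELECT M.Name FROM Movies M WHERE M.Name LIKE '_ravity'")]

print(blue_directors, one_char)
```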
43
ORDER BY in SQL
SELECT M.Name
FROM Movies M
WHERE M.Year = 2013
ORDER BY M.Name
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Result without ORDER BY:
Name
Gravity
Blue Jasmine
Result with ORDER BY M.Name:
Name
Blue Jasmine
Gravity
Useful for data readability
Ordering defined by domain semantics
Can specify DESC; multiple attributes
44
LIMIT in SQL
SELECT M.Name
FROM Movies M
WHERE M.Year >= 2010
ORDER BY M.Year
LIMIT 2
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
74 Blue Jasmine 2013 Woody Allen
Result of the query before LIMIT is applied:
Name
Inception
Gravity
Blue Jasmine
LIMIT 2 keeps only the first two tuples of this result
Also useful for data readability
Prevents “flooding” of screen with data
Be wary of using it without ORDER BY!
45
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
46
UNION in SQL
Get the IDs of users that have rated a movie directed by “Ang Lee” or a movie that released in 2013
RID Stars UID MID
MID Name Year Director
Ratings (R)
Movies (M)
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
UNION
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013
Union-compatible!
47
Semantics of UNION in SQL
Movies (M)
MID Name Year Director
20 Inception 2010 Christopher Nolan
42 Life of Pi 2012 Ang Lee
53 Gravity 2013 Alfonso Cuaron
Ratings (R)
RID Stars UID MID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
UNION
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013
Result of the “Ang Lee” subquery (UID): 79, 123
Result of the 2013 subquery (UID): 79
48
Semantics of UNION in SQL
Inputs (UID): 79, 123 and 79
UNION result (UID): 79, 123
UNION implicitly deduplicates tuples (unlike SELECT)!
Q. How to retain duplicates with UNION?
Inputs (UID): 79, 123 and 79
UNION ALL result (UID): 79, 79, 123
49
INTERSECT in SQL
Movies (M)
MID Name Year Director
20 Inception 2010 Christopher Nolan
42 Life of Pi 2012 Ang Lee
53 Gravity 2013 Alfonso Cuaron
Ratings (R)
RID Stars UID MID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
INTERSECT
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013
Result of the “Ang Lee” subquery (UID): 79, 123
Result of the 2013 subquery (UID): 79
INTERSECT result (UID): 79
50
EXCEPT (Set Difference) in SQL
Movies (M)
MID Name Year Director
20 Inception 2010 Christopher Nolan
42 Life of Pi 2012 Ang Lee
53 Gravity 2013 Alfonso Cuaron
Ratings (R)
RID Stars UID MID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
EXCEPT
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013
Result of the “Ang Lee” subquery (UID): 79, 123
Result of the 2013 subquery (UID): 79
EXCEPT result (UID): 123
51
The Contentious Bag Semantics!
First input (UID): 79, 79, 79, 123, 123, 80
Second input (UID): 79, 123, 123, 92
UNION ALL result (UID): 79, 79, 79, 79, 123, 123, 123, 123, 80, 92
Add the number of repetitions
52
The Contentious Bag Semantics!
First input (UID): 79, 79, 79, 123, 123, 80
Second input (UID): 79, 123, 123, 92
EXCEPT ALL result (UID): 79, 79, 80
Subtract the number of repetitions
53
The Contentious Bag Semantics!
First input (UID): 79, 79, 79, 123, 123, 80
Second input (UID): 79, 123, 123, 92
INTERSECT ALL result (UID): 79, 123, 123
Minimum of the number of repetitions
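The three counting rules (add, subtract, minimum) can be checked with Python's collections.Counter, whose +, - and & operators happen to follow the same arithmetic on multiplicities. This is an illustration of the semantics only, not of how an RDBMS implements the operators.

```python
from collections import Counter

# The two input bags of UIDs from the slides.
left = Counter([79, 79, 79, 123, 123, 80])
right = Counter([79, 123, 123, 92])

union_all = left + right      # UNION ALL: add the repetition counts
except_all = left - right     # EXCEPT ALL: subtract counts, dropping anything that reaches 0
intersect_all = left & right  # INTERSECT ALL: minimum of the counts

print(sorted(union_all.elements()))
print(sorted(except_all.elements()))
print(sorted(intersect_all.elements()))

# The set versions (UNION, EXCEPT, INTERSECT) ignore multiplicities altogether.
print(sorted(set(left) | set(right)), sorted(set(left) - set(right)), sorted(set(left) & set(right)))
```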
54
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
55
Nested Queries
A powerful feature of SQL!
A query within a query!
Get the names of movies that have a 5-star rating
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE M.MID IN (
SELECT R.MID
FROM Ratings R
WHERE R.Stars = 5 )
QUERCEPTION
57
Nested Queries
Get the names of movies that do not have a 5-star rating
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE M.MID NOT IN (
SELECT R.MID
FROM Ratings R
WHERE R.Stars = 5 )
Be careful! Not the same as IN with R.Stars <> 5
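To see why the two formulations differ, here is a sketch on a tiny made-up instance (movies A, B, C and three ratings are invented for the example), run through Python's sqlite3 module as a stand-in RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (MID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ratings (RID INTEGER PRIMARY KEY, Stars REAL, UID INTEGER, MID INTEGER);
    INSERT INTO Movies VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Movie A has a 5-star and a 3-star rating, B has one 4-star rating, C is unrated.
    INSERT INTO Ratings VALUES (1, 5, 79, 1), (2, 3, 80, 1), (3, 4, 79, 2);
""")

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# Movies that do not have a 5-star rating.
no_five_star = names("""
    SELECT M.Name FROM Movies M
    WHERE M.MID NOT IN (SELECT R.MID FROM Ratings R WHERE R.Stars = 5)
    ORDER BY M.Name
""")
# Movies that have at least one rating other than 5 stars: a different question.
some_non_five = names("""
    SELECT M.Name FROM Movies M
    WHERE M.MID IN (SELECT R.MID FROM Ratings R WHERE R.Stars <> 5)
    ORDER BY M.Name
""")
print(no_five_star, some_non_five)
```

Movie A appears only in the second result (it has a 3-star rating besides its 5-star one), and the unrated movie C appears only in the first.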
58
Nested Queries with Correlation
Get the names of movies that have a 5-star rating
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE EXISTS (
SELECT *
FROM Ratings R
WHERE R.Stars = 5
AND R.MID = M.MID )
EXISTS returns TRUE if the nested query’s result is NOT empty
59
Nested Queries with Correlation
Get the names of movies that do not have a 5-star rating
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE NOT EXISTS (
SELECT *
FROM Ratings R
WHERE R.Stars = 5
AND R.MID = M.MID )
60
Nested Queries with Correlation
Get the names of movies with at most one 5-star rating
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE UNIQUE (
SELECT R.Stars
FROM Ratings R
WHERE R.Stars = 5
AND R.MID = M.MID )
UNIQUE returns TRUE if the nested query’s result has NO duplicates
Empty set yields TRUE!
61
Nested Queries with Correlation
Get the names of movies that have multiple 5-star ratings
Ratings (R): RID Stars UID MID
Movies (M): MID Name Year Director
SELECT M.Name
FROM Movies M
WHERE NOT UNIQUE (
SELECT R.Stars
FROM Ratings R
WHERE R.Stars = 5
AND R.MID = M.MID )
62
Set Comparison Operators in SQL
We saw IN, NOT IN, EXISTS, NOT EXISTS,
UNIQUE, NOT UNIQUE
Also available: op ANY, op ALL
op is an arithmetic/equality comparator (=, <>, >, >=, <, <=)
63
Set Comparison Operators in SQL
Get the names of movies that released after some movie directed by Jim Cameron
Movies (M): MID Name Year Director
SELECT M1.Name
FROM Movies M1
WHERE M1.Year > ANY (
SELECT M2.Year
FROM Movies M2
WHERE M2.Director = 'Jim Cameron' )
64
Set Comparison Operators in SQL
Get the names of movies that released after Jim Cameron stopped making movies
Movies (M): MID Name Year Director
SELECT M1.Name
FROM Movies M1
WHERE M1.Year > ALL (
SELECT M2.Year
FROM Movies M2
WHERE M2.Director = 'Jim Cameron' )
65
Rewriting INTERSECT using IN
Get the IDs of users that have rated a movie directed by “Ang Lee” and a movie that released in 2013
RID Stars UID MID
MID Name Year Director
Ratings (R)
Movies (M)
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Director = 'Ang Lee'
INTERSECT
SELECT R.UID
FROM Ratings R, Movies M
WHERE R.MID = M.MID AND M.Year = 2013
66
Rewriting INTERSECT using IN
Get the IDs of users that have rated a movie directed by “Ang Lee” and a movie that released in 2013
RID Stars UID MID
MID Name Year Director
Ratings (R)
Movies (M)
SELECT R1.UID
FROM Ratings R1, Movies M1
WHERE M1.Director = 'Ang Lee'
AND R1.MID = M1.MID
AND R1.UID IN (
SELECT R2.UID
FROM Ratings R2, Movies M2
WHERE R2.MID = M2.MID
AND M2.Year = 2013 )
Similarly, EXCEPT can be rewritten using NOT IN
67
Division in SQL
RID Stars UID MID
MID Name Year Director
Ratings (R)
Movies (M)
List the IDs of users that have rated ALL movies
π_UID,MID(R) / π_MID(M)
SELECT R1.UID
FROM Ratings R1
WHERE NOT EXISTS (
(SELECT M.MID FROM Movies M)
EXCEPT
(SELECT R2.MID FROM Ratings R2
WHERE R2.UID = R1.UID)
)
68
Division without using EXCEPT
RID Stars UID MID
MID Name Year Director
Ratings (R)
Movies (M)
List the IDs of users that have rated ALL movies
π_UID,MID(R) / π_MID(M)
SELECT R1.UID
FROM Ratings R1
WHERE NOT EXISTS (
(SELECT M.MID
FROM Movies M
WHERE NOT EXISTS (
(SELECT R2.MID
FROM Ratings R2
WHERE R2.MID = M.MID
AND R2.UID = R1.UID ))))
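Both division queries can be compared on a small made-up instance. This sketch assumes Python's sqlite3 module as a stand-in RDBMS; two adaptations are mine and not from the slides: DISTINCT is added so each qualifying user is listed once, and the inner parentheses around the EXCEPT operands are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (MID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ratings (RID INTEGER PRIMARY KEY, Stars REAL, UID INTEGER, MID INTEGER);
    INSERT INTO Movies VALUES (20, 'Inception'), (42, 'Life of Pi');
    -- User 79 has rated both movies; user 80 has rated only one of them.
    INSERT INTO Ratings VALUES (1, 3.5, 79, 42), (2, 4.0, 80, 20), (3, 2.5, 79, 20);
""")

# "No movie is left over after removing the movies this user has rated."
with_except = conn.execute("""
    SELECT DISTINCT R1.UID FROM Ratings R1
    WHERE NOT EXISTS (
        SELECT M.MID FROM Movies M
        EXCEPT
        SELECT R2.MID FROM Ratings R2 WHERE R2.UID = R1.UID)
""").fetchall()

# "There is no movie for which this user has no rating."
without_except = conn.execute("""
    SELECT DISTINCT R1.UID FROM Ratings R1
    WHERE NOT EXISTS (
        SELECT M.MID FROM Movies M
        WHERE NOT EXISTS (
            SELECT R2.MID FROM Ratings R2
            WHERE R2.MID = M.MID AND R2.UID = R1.UID))
""").fetchall()

print(with_except, without_except)
```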
69
70
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
71
5 Native Aggregate Functions in SQL
COUNT ([DISTINCT] attribute)
AVG ([DISTINCT] attribute)
SUM ([DISTINCT] attribute)
MAX (attribute)
MIN (attribute)
72
Aggregate Functions in SQL
SELECT COUNT(*)
FROM Movies M
WHERE M.Year > 2010
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
91 Interstellar 2014 Christopher Nolan
How many movies came out after 2010?
73
Aggregate Functions in SQL
SELECT COUNT(*) FROM Movies M WHERE M.Year > 2010
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
91 Interstellar 2014 Christopher Nolan
How many movies came out after 2010?
COUNT(*)
2
74
Aggregate Functions in SQL
SELECT COUNT(DISTINCT M.Director)
FROM Movies M
Movies (M)
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
91 Interstellar 2014 Christopher Nolan
For how many directors do we have movies?
75
Aggregate Functions in SQL
What is the average age of the users?
UserID Name Age JoinDate
79 Alice 23 01/10/13
80 Bob 41 05/10/13
123 Carol 19 08/09/14
420 Dan 20 03/01/15
Users (U)
SELECT AVG(U.Age) FROM Users U
76
Aggregate Functions in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT MAX(R.Stars) AS MaxStars, MIN(R.Stars) AS MinStars
FROM Ratings R
WHERE R.MovieID = 42
What are the highest and lowest ratings for MovieID 42?
77
Aggregate Functions in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT MAX(R.Stars) AS MaxStars, MIN(R.Stars) AS MinStars
FROM Ratings R
WHERE R.MovieID = 42
MaxStars MinStars
4.5 3.5
78
Aggregate Functions in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT R.MovieID, MAX(R.Stars)
FROM Ratings R
Which MovieID(s) have the highest rating?
Other attributes NOT allowed in the target-list as such!
79
Aggregate Functions in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT DISTINCT R.MovieID
FROM Ratings R
WHERE R.Stars = (SELECT MAX(R2.Stars)
FROM Ratings R2)
Which MovieID(s) have the highest rating?
80
Group By Aggregate in SQL
(R)
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT R.UserID, COUNT(DISTINCT R.MovieID)
FROM Ratings R
GROUP BY R.UserID
How many movies has each user rated?
81
Group By Aggregate in SQL
(R)
SELECT [DISTINCT] target-list
FROM relation-list
[WHERE condition]
GROUP BY grouping-list
HAVING group-condition
grouping-list: a set of attributes X
group-condition: a condition on each group, in terms of aggregates
target-list must be in this form: X’, Agg(Y), where X’ is a subset of X
82
Group By Aggregate in SQL
(R)
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
What is the average rating for each movie?
SELECT R.MovieID, AVG(R.Stars) AS AvgRating
FROM Ratings R
GROUP BY R.MovieID
83
Group By Aggregate in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT R.MovieID, AVG(R.Stars) AS AvgRating
FROM Ratings R
GROUP BY R.MovieID
MovieID AvgRating
20 4.0
42 4.0
53 2.5
One tuple in output per unique value of R.MovieID (aka “group”)
84
Group By Aggregate in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 3.5 79 71
10 2.5 79 50
Ratings (R)
How many times has each user given each rating?
SELECT R.UserID, R.Stars, COUNT(*) AS RateCount
FROM Ratings R
GROUP BY R.UserID, R.Stars
85
Group By Aggregate in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 3.5 79 71
10 2.5 79 50
R
SELECT R.UserID, R.Stars, COUNT(*) AS RateCount
FROM Ratings R
GROUP BY R.UserID, R.Stars
UserID Stars RateCount
79 2.5 1
79 3.5 2
80 4.0 1
86
Group By Aggregate in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 3.5 79 71
10 2.5 79 50
Ratings (R)
How many times has each user given each rating above 3?
SELECT R.UserID, R.Stars, COUNT(*) AS RateCount
FROM Ratings R
WHERE R.Stars > 3
GROUP BY R.UserID, R.Stars
87
Group By Aggregate in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 3.5 79 71
10 2.5 79 50
R
SELECT R.UserID, R.Stars, COUNT(*) AS RateCount
FROM Ratings R
WHERE R.Stars > 3
GROUP BY R.UserID, R.Stars
UserID Stars RateCount
79 3.5 2
80 4.0 1
88
Group By Aggregate with HAVING
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
Get the average rating by users with at least 2 ratings
SELECT R.UserID, AVG(R.Stars) AS AvgStars
FROM Ratings R
GROUP BY R.UserID
HAVING COUNT(*) >= 2
89
Group By Aggregate with HAVING
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
3 2.5 79 53
4 4.5 123 42
Ratings (R)
SELECT R.UserID, AVG(R.Stars) AS AvgRating
FROM Ratings R
GROUP BY R.UserID
HAVING COUNT(*) >= 2
UserID AvgRating
79 3.0
For each “group”, apply the group-condition in HAVING
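A sketch of grouping with and without HAVING on the four-rating instance from the slide, assuming Python's sqlite3 module as a stand-in RDBMS. ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ratings (RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
    INSERT INTO Ratings VALUES (1, 3.5, 79, 42), (2, 4.0, 80, 20), (3, 2.5, 79, 53), (4, 4.5, 123, 42);
""")

# One output tuple per group, i.e. per distinct UserID.
all_groups = conn.execute("""
    SELECT R.UserID, AVG(R.Stars) AS AvgRating
    FROM Ratings R
    GROUP BY R.UserID
    ORDER BY R.UserID
""").fetchall()

# HAVING filters whole groups: only users with at least 2 ratings survive.
frequent_raters = conn.execute("""
    SELECT R.UserID, AVG(R.Stars) AS AvgRating
    FROM Ratings R
    GROUP BY R.UserID
    HAVING COUNT(*) >= 2
""").fetchall()

print(all_groups)
print(frequent_raters)
```

WHERE filters tuples before grouping, whereas HAVING filters groups after aggregation; only user 79 has two ratings here.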
90
Surprise Review Question!
Ratings (R): RatingID Stars UserID MovieID
Movies (M): MovieID Name Year Director
Get the number of ratings for each movie directed by “Christopher Nolan” wherein the average rating is over 4
SELECT R.MovieID, COUNT(R.Stars) AS NumHighRatings
FROM Ratings R, Movies M
WHERE M.Director = 'Christopher Nolan'
AND R.MovieID = M.MovieID
GROUP BY R.MovieID
HAVING AVG(R.Stars) > 4
91
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
92
NULL Values in SQL
Recall that NULL is useful to represent:
- Unknown entries (data collection/entry issue)
- Inapplicable attributes (bad schema design!)
A “headache” for SQL and RDBMS developers!
93
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 NULL 79 71
10 2.5 79 50
Ratings (R)
Get the movies and ratings with over 3 stars
SELECT R.MovieID, R.Stars
FROM Ratings R
WHERE R.Stars > 3
SQL will “fail” a tuple it is NOT *sure* will satisfy the condition!
94
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 NULL 79 71
10 2.5 79 50
Ratings (R)
SELECT R.MovieID, R.Stars
FROM Ratings R
WHERE R.Stars > 3
MovieID Stars
42 3.5
20 4.0
95
Wait, what about complex predicates?
96
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 NULL 79 71
10 2.5 79 50
Ratings (R)
Get the movies and ratings over 3 stars or by UserID 79
SELECT R.MovieID, R.Stars
FROM Ratings R
WHERE R.Stars > 3
OR UserID = 79
97
Binary Logic (the usual)
AND TRUE FALSE
TRUE TRUE FALSE
FALSE FALSE FALSE
OR TRUE FALSE
TRUE TRUE TRUE
FALSE TRUE FALSE
NOT TRUE FALSE
FALSE TRUE
98
This great edifice of scientific history
collapses in the face of NULL!
99
Three-Valued (Ternary) Logic in SQL
OR TRUE FALSE UNKNOWN
TRUE TRUE TRUE TRUE
FALSE TRUE FALSE UNKNOWN
UNKNOWN TRUE UNKNOWN UNKNOWN
Along with the regular TRUE and FALSE, SQL adds a new logical value, “UNKNOWN”, to handle predicates on NULL
100
Three-Valued (Ternary) Logic in SQL
AND TRUE FALSE UNKNOWN
TRUE TRUE FALSE UNKNOWN
FALSE FALSE FALSE FALSE
UNKNOWN UNKNOWN FALSE UNKNOWN
NOT TRUE FALSE UNKNOWN
FALSE TRUE UNKNOWN
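The truth tables above fit in a few lines of Python. This is a sketch of the logic only: None stands in for UNKNOWN (and for NULL in the data), and the function names not3, and3, or3 and gt are made up for the example.

```python
UNKNOWN = None  # stand-in for SQL's third truth value

def not3(a):
    return UNKNOWN if a is UNKNOWN else (not a)

def and3(a, b):
    if a is False or b is False:   # one FALSE is enough to decide AND
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True

def or3(a, b):
    if a is True or b is True:     # one TRUE is enough to decide OR
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False

def gt(x, y):
    """x > y, where a comparison involving NULL (None) yields UNKNOWN."""
    return UNKNOWN if x is None or y is None else x > y

# (RatingID, Stars, UserID, MovieID); rating 8 has NULL stars.
ratings = [(1, 3.5, 79, 42), (22, 2.0, 80, 21), (8, None, 79, 71), (10, 2.5, 79, 50)]

# WHERE Stars > 3 OR UserID = 79: a tuple passes only if the predicate is exactly TRUE.
passed = [(movie, stars) for (_rid, stars, user, movie) in ratings
          if or3(gt(stars, 3), user == 79) is True]
print(passed)
```

Rating 8 passes because UNKNOWN OR TRUE is TRUE, even though its Stars value is NULL.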
101
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
22 2.0 80 21
8 NULL 79 71
10 2.5 79 50
Ratings (R)
Get the movies rated over 3 stars or by UserID 79
SELECT R.MovieID, R.Stars
FROM Ratings R
WHERE R.Stars > 3
OR UserID = 79
OR T F U
T T T T
F T F U
U T U U
SQL will only “pass” tuples that evaluate to “TRUE”!
102
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
22 2.0 80 21
8 NULL 79 71
10 2.5 79 50
Ratings (R)
SELECT R.MovieID, R.Stars
FROM Ratings R
WHERE R.Stars > 3
OR UserID = 79
MovieID Stars
42 3.5
71 NULL
50 2.5
UNKNOWN OR TRUE = TRUE
103
NULL Values in SQL
RatingID Stars UserID MovieID
1 3.5 79 42
21 3.5 81 20
8 NULL 79 71
11 NULL 99 50
Ratings (R)
SELECT DISTINCT R.Stars
FROM Ratings R
Stars
3.5
NULL
So, “UNKNOWN = UNKNOWN” for duplicate elimination
(a quirk of SQL!)
104
Surprise Review Question!
RatingID Stars UserID MovieID
1 3.5 79 42
2 4.0 80 20
8 NULL 79 71
10 2.5 79 50
What is the output of this query?
SELECT R.RatingID
FROM Ratings R
WHERE R.Stars > 3 OR R.Stars <= 3
RatingID
1
2
10
105
“The Law of Excluded Middle” Fails!
A fundamental law in binary logic:
Given a predicate p, we always have p OR NOT p = TRUE
Example: R.Stars > 3 OR R.Stars <= 3
The above law does NOT hold in three-valued logic!
Given a predicate p, p OR NOT p can be TRUE or UNKNOWN!
OR T F U
T T T T
F T F U
U T U U
NOT T F U
F T U
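The failure of the law can be observed directly. A sketch assuming Python's sqlite3 module as a stand-in RDBMS, using the four-rating instance from the review question; Python's None is how sqlite3 reports NULL/UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ratings (RatingID INTEGER, Stars REAL, UserID INTEGER, MovieID INTEGER);
    INSERT INTO Ratings VALUES
        (1, 3.5, 79, 42), (2, 4.0, 80, 20), (8, NULL, 79, 71), (10, 2.5, 79, 50);
""")

# In two-valued logic this WHERE clause would keep every tuple.
kept = [row[0] for row in conn.execute("""
    SELECT R.RatingID FROM Ratings R
    WHERE R.Stars > 3 OR R.Stars <= 3
    ORDER BY R.RatingID
""")]

# The predicate itself, evaluated on NULL, is UNKNOWN rather than TRUE.
on_null = conn.execute("SELECT NULL > 3 OR NULL <= 3").fetchone()[0]

print(kept, on_null)
```

Rating 8 is dropped: both comparisons are UNKNOWN for it, and UNKNOWN OR UNKNOWN is UNKNOWN.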
106
Are there other uses for NULL?
107
“Outer Joins” in SQL
Ratings (R): RatingID Stars UserID MovieID
Movies (M): MovieID Name Year Director
Example: Get each movie's name and its average rating
SELECT M.Name, AVG(R.Stars)
FROM Ratings R, Movies M
WHERE R.MovieID = M.MovieID
GROUP BY M.Name
Q. But what if some movie has no ratings in R?
108
“Outer Joins” in SQL
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
Movies (M)
RatingID Stars UserID MovieID
1 3.5 79 20
2 4.0 80 20
3 2.5 79 16
4 4.5 80 16
Ratings (R)
109
“Outer Joins” in SQL
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
RatingID Stars UserID MovieID
1 3.5 79 20
2 4.0 80 20
3 2.5 79 16
4 4.5 80 16
SELECT M.Name, AVG(R.Stars)
FROM Ratings R, Movies M
WHERE R.MovieID = M.MovieID
GROUP BY M.Name
Name AVG(Stars)
Inception 3.75
Avatar 3.5
110
“Outer Joins” in SQL
Ratings (R): RatingID Stars UserID MovieID
Movies (M): MovieID Name Year Director
Get each movie's name and its average rating
SELECT M.Name, AVG(R.Stars)
FROM Ratings R
NATURAL RIGHT OUTER JOIN Movies M
GROUP BY M.Name
Outer Joins in SQL ensure tuples from one (or both) relation(s)
are present in the output even if they do not have matches!
111
“Outer Joins” in SQL
SELECT M.Name, R.Stars
FROM Ratings R NATURAL RIGHT OUTER JOIN Movies M
Name Stars
Inception 3.5
Inception 4.0
Avatar 2.5
Avatar 4.5
Gravity NULL
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
RatingID Stars UserID MovieID
1 3.5 79 20
2 4.0 80 20
3 2.5 79 16
4 4.5 80 16
Outer Joins introduce
NULL values!
This is okay – we
have already made
peace with NULL!
112
“Outer Joins” in SQL
MovieID Name Year Director
20 Inception 2010 Christopher Nolan
16 Avatar 2009 Jim Cameron
53 Gravity 2013 Alfonso Cuaron
RatingID Stars UserID MovieID
1 3.5 79 20
2 4.0 80 20
3 2.5 79 16
4 4.5 80 16
SELECT M.Name, R.Stars
FROM Ratings R NATURAL RIGHT OUTER JOIN Movies M
Name Stars
Inception 3.5
Inception 4.0
Avatar 2.5
Avatar 4.5
Gravity NULL
SELECT M.Name, AVG(R.Stars)
FROM Ratings R NATURAL RIGHT OUTER JOIN Movies M
GROUP BY M.Name
Name AVG(Stars)
Inception 3.75
Avatar 3.5
Gravity NULL
AVG over a group whose only Stars value is NULL yields NULL! (Aggregates other than COUNT(*) skip NULL values.)
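A sketch comparing the inner and outer join on the instance from the slides, assuming Python's sqlite3 module as a stand-in RDBMS. To stay portable across SQLite versions it writes the join as Movies LEFT OUTER JOIN Ratings with an explicit ON condition, which keeps the same tuples as Ratings NATURAL RIGHT OUTER JOIN Movies on this schema; ORDER BY is added for a deterministic printout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER, Director TEXT);
    CREATE TABLE Ratings (RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
    INSERT INTO Movies VALUES
        (20, 'Inception', 2010, 'Christopher Nolan'),
        (16, 'Avatar', 2009, 'Jim Cameron'),
        (53, 'Gravity', 2013, 'Alfonso Cuaron');
    INSERT INTO Ratings VALUES (1, 3.5, 79, 20), (2, 4.0, 80, 20), (3, 2.5, 79, 16), (4, 4.5, 80, 16);
""")

# Inner join: Gravity has no ratings, so it disappears from the output.
inner = conn.execute("""
    SELECT M.Name, AVG(R.Stars)
    FROM Ratings R, Movies M
    WHERE R.MovieID = M.MovieID
    GROUP BY M.Name ORDER BY M.Name
""").fetchall()

# Outer join: every movie is kept; Gravity is padded with NULL for the Ratings attributes.
outer = conn.execute("""
    SELECT M.Name, AVG(R.Stars), COUNT(R.Stars)
    FROM Movies M LEFT OUTER JOIN Ratings R ON R.MovieID = M.MovieID
    GROUP BY M.Name ORDER BY M.Name
""").fetchall()

print(inner)
print(outer)
```

COUNT(R.Stars) is 0 for Gravity, which shows that the aggregate skipped the padded NULL rather than counting it.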
113
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
114
Recall Integrity Constraints in DDL
Key Constraints
Referential Integrity Constraints
115
CREATE TABLE
Movies: MovieID Name ReleaseDate Director
CREATE TABLE Movies (
MovieID INTEGER,
Name CHAR(30),
ReleaseDate DATE,
Director CHAR(20),
PRIMARY KEY (MovieID))
116
Key Constraints in SQL
Key vs. Superkey
Primary key vs. Candidate key vs. Alternate key
Movies: MovieID Name ReleaseDate Director IMDB_URL
CREATE TABLE Movies (
MovieID INTEGER,
Name CHAR(30),
ReleaseDate DATE,
Director CHAR(20),
IMDB_URL CHAR(20),
PRIMARY KEY (MovieID),
UNIQUE (IMDB_URL))
117
Referential Integrity Constraints
CREATE TABLE Ratings(
RatingID INTEGER,
Stars REAL,
UserID INTEGER,
MovieID INTEGER,
PRIMARY KEY (RatingID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))
Ratings: RatingID Stars UserID MovieID
118
DML-based Integrity Constraints
CREATE TABLE Ratings(
RatingID INTEGER,
Stars REAL,
UserID INTEGER,
MovieID INTEGER,
PRIMARY KEY (RatingID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (MovieID) REFERENCES Movies(MovieID),
CHECK (Stars >= 0 AND Stars <= 5))
Ratings: RatingID Stars UserID MovieID
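A sketch of the CHECK constraint in action, assuming Python's sqlite3 module as a stand-in RDBMS. The foreign keys are left out to keep the example short, and the helper accepted is hypothetical. Note the NULL case: a CHECK is violated only when its condition is FALSE, so an UNKNOWN result lets the tuple in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Ratings (
        RatingID INTEGER PRIMARY KEY,
        Stars    REAL,
        UserID   INTEGER,
        MovieID  INTEGER,
        CHECK (Stars >= 0 AND Stars <= 5)
    )
""")

def accepted(stars):
    """Hypothetical helper: True if a rating with this Stars value can be inserted."""
    try:
        conn.execute("INSERT INTO Ratings (Stars, UserID, MovieID) VALUES (?, 79, 42)", (stars,))
        return True
    except sqlite3.IntegrityError:
        return False

in_range = accepted(3.5)    # condition is TRUE
too_high = accepted(6.0)    # condition is FALSE: rejected
null_stars = accepted(None) # condition is UNKNOWN: not a violation
print(in_range, too_high, null_stars)
```

Adding NOT NULL to the Stars column would be one way to close the NULL gap if that is not wanted.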
119
DML-based Integrity Constraints
CREATE TABLE Ratings(
RatingID INTEGER,
Stars REAL,
UserID INTEGER,
MovieID INTEGER,
PRIMARY KEY (RatingID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (MovieID) REFERENCES Movies(MovieID),
CONSTRAINT NoSpectreRatings CHECK ('Spectre' <> (
SELECT M.Name
FROM Movies M
WHERE M.MovieID = Ratings.MovieID)))
Ratings: RatingID Stars UserID MovieID
120
ASSERTIONS in SQL
Nicer way to enforce constraints that straddle relations
Movies (M): MovieID Name Year Director
Users (U): UserID Name Age JoinDate
Ensure that the number of movies is no more than users
CREATE ASSERTION NoMoreMovies
CHECK (
(SELECT COUNT(*) FROM Movies) <= (SELECT COUNT(*) FROM Users))
121
Data Manipulation Language (DML)
Insert, Delete, Update
Basic SQL queries
Set and bag operations in SQL
Nested queries
Aggregates in SQL
Null Values in SQL
Usage in Integrity Constraints in DDL
Views
122
What is a View?
A “virtual relation” – exists only as a query, not physically
Existence depends upon the real parent relations
Useful to avoid writing some (sub) queries repeatedly
Also useful to “split up” complex SQL queries
Ratings (R): RatingID Stars UserID MovieID
Movies (M): MovieID Name Year Director
Get the average ratings, year, and director for each movie
123
Creation of Views with DDL+DML
Ratings (R) RatingID Stars UserID MovieID
Movies (M) MovieID Name Year Director
Get the average rating, year, and director for each movie
CREATE VIEW MovieAvg AS
SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
FROM Movies M, Ratings R
WHERE M.MovieID = R.MovieID
GROUP BY M.MovieID, M.Year, M.Director
MovieAvg MovieID Year Director AvgStars
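A runnable sketch of the MovieAvg view, assuming sqlite3 and a few made-up rows. It averages R.Stars (the goal is an average rating) and groups by every non-aggregated column. The second query shows that the view holds no rows of its own: a rating added later is reflected on the next read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER, Director TEXT);
CREATE TABLE Ratings(RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
-- Illustrative sample data
INSERT INTO Movies VALUES (20, 'Inception', 2010, 'Christopher Nolan'),
                          (21, 'Interstellar', 2014, 'Christopher Nolan');
INSERT INTO Ratings VALUES (1, 5.0, 1, 20), (2, 4.0, 2, 20), (3, 3.0, 1, 21);

CREATE VIEW MovieAvg AS
SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
FROM Movies M, Ratings R
WHERE M.MovieID = R.MovieID
GROUP BY M.MovieID, M.Year, M.Director;
""")

rows = conn.execute(
    "SELECT MovieID, AvgStars FROM MovieAvg ORDER BY MovieID").fetchall()

# The view is recomputed from its parent relations on every read.
conn.execute("INSERT INTO Ratings VALUES (4, 5.0, 2, 21)")
updated = conn.execute(
    "SELECT AvgStars FROM MovieAvg WHERE MovieID = 21").fetchone()[0]
```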
124
Querying Views with DML
Get the overall average rating of movies by each director
SELECT M.Director, AVG(M.AvgStars)
FROM MovieAvg M
GROUP BY M.Director
MovieAvg MovieID Year Director AvgStars
Get the overall average rating of movies by each year
SELECT M.Year, AVG(M.AvgStars)
FROM MovieAvg M
GROUP BY M.Year
The RDBMS will implicitly replace MovieAvg in the FROM clause with the query that defines MovieAvg!
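To make the implicit substitution concrete, the sketch below runs the per-director query twice: once against the view, and once with the view's defining query pasted into the FROM clause by hand. It assumes sqlite3, made-up rows, and a view that averages R.Stars; the alias V is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER, Director TEXT);
CREATE TABLE Ratings(RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
-- Illustrative sample data
INSERT INTO Movies VALUES (20, 'Inception', 2010, 'Christopher Nolan'),
                          (21, 'Interstellar', 2014, 'Christopher Nolan'),
                          (22, 'Spectre', 2015, 'Sam Mendes');
INSERT INTO Ratings VALUES (1, 5.0, 1, 20), (2, 4.0, 2, 20),
                           (3, 3.0, 1, 21), (4, 2.0, 1, 22);
CREATE VIEW MovieAvg AS
SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
FROM Movies M, Ratings R
WHERE M.MovieID = R.MovieID
GROUP BY M.MovieID, M.Year, M.Director;
""")

# Query written against the view.
via_view = conn.execute("""
    SELECT V.Director, AVG(V.AvgStars)
    FROM MovieAvg V
    GROUP BY V.Director
    ORDER BY V.Director""").fetchall()

# Same query with the view name replaced by its definition.
expanded = conn.execute("""
    SELECT V.Director, AVG(V.AvgStars)
    FROM (SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
          FROM Movies M, Ratings R
          WHERE M.MovieID = R.MovieID
          GROUP BY M.MovieID, M.Year, M.Director) V
    GROUP BY V.Director
    ORDER BY V.Director""").fetchall()
```

Both forms return the same rows; the view simply saves writing the inner query each time.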
125
Major SQL Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Embedded SQL and Cursors
Triggers
Security
Transaction Management
Remote Database Access
126
Embedded SQL
Issue SQL commands from within a “host” programming
language, e.g., a C++/Java program
- ODBC/JDBC with “connect” statement
- “Host variables” in the C++/Java program
SQL relations are multi-sets of tuples; no a priori bound
on the number of records!
- Dynamic linked lists in host languages, e.g., C
- A SQL-supported mechanism called “Cursor”
127
Cursor
An SQL construct to obtain the results of an SQL query one tuple at a time in a host PL (“a pointer that moves”)
Syntax to “open”, “fetch”, and “move” a cursor’s pointer
ORDER BY is often needed to be sure of the tuples’ order
EXEC SQL DECLARE MyCursor CURSOR FOR
  SELECT U.Name
  FROM Users U
  WHERE U.Age >= 18
  ORDER BY U.Name
Can also use cursors to check complex conditions in the host PL, and to update the relation!
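The cursor idea carries over to most host-language database APIs. The sketch below uses the cursor object from Python's sqlite3 module as an analogy to an embedded-SQL cursor, not as the EXEC SQL syntax itself; the sample users are made up. Each fetchone() call returns one tuple and advances the pointer, so the host program never needs to know the result size in advance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, JoinDate TEXT);
-- Illustrative sample data
INSERT INTO Users VALUES (1, 'Carol', 34, '2015-01-10'),
                         (2, 'Alice', 17, '2015-02-11'),
                         (3, 'Bob',   18, '2015-03-12');
""")

cur = conn.cursor()                          # roughly: declare the cursor
cur.execute("""SELECT U.Name FROM Users U
               WHERE U.Age >= 18
               ORDER BY U.Name""")           # roughly: open it on a query

adult_names = []
while True:
    row = cur.fetchone()                     # roughly: fetch one tuple, then advance
    if row is None:                          # no tuples left
        break
    adult_names.append(row[0])               # host-language logic per tuple
cur.close()                                  # roughly: close the cursor
```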
128
Major SQL Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Embedded SQL and Cursors
Triggers
Security
Transaction Management
Remote Database Access
129
What is a Trigger?
An SQL construct to “monitor” a DBMS instance; behaves like a “daemon process” that is always running
It enables us to specify some “actions” every time some pre-defined “change” occurs, e.g., an INSERT/DELETE
CREATE TRIGGER triggername
EVENT: what “triggers” this statement
CONDITION: a boolean-valued SQL statement
ACTION: a procedure in PL/SQL (a procedural extension to SQL)
130
Triggers: Example
CREATE TRIGGER RateCounterInit
BEFORE INSERT ON Ratings
DECLARE
  rcount INTEGER;
BEGIN
  rcount := 0;
END
CREATE TRIGGER RateCounterIncr
AFTER INSERT ON Ratings
WHEN (new.Stars < 1)
FOR EACH ROW
BEGIN
  rcount := rcount + 1;
END
BEFORE/AFTER ... ON ...: defines “when” and “what” the change is
  BEFORE/AFTER for “when”
  INSERT/UPDATE/DELETE for “what”
WHEN (...): act only if the condition is satisfied
BEGIN ... END: a PL/SQL procedure
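A runnable approximation of the counting trigger in SQLite's trigger dialect, which differs from PL/SQL: there are no trigger-local variables, so the counter lives in a hypothetical one-row table RateCounter. The trigger is attached to Ratings, where the Stars column lives, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ratings(RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
CREATE TABLE RateCounter(rcount INTEGER);   -- hypothetical one-row counter table
INSERT INTO RateCounter VALUES (0);         -- plays the role of the "init" step

-- Event: AFTER INSERT ON Ratings; condition: WHEN; action: BEGIN ... END
CREATE TRIGGER RateCounterIncr AFTER INSERT ON Ratings
FOR EACH ROW WHEN (NEW.Stars < 1)
BEGIN
    UPDATE RateCounter SET rcount = rcount + 1;
END;
""")

# Three inserts; only the two with Stars < 1 satisfy the condition.
conn.executemany("INSERT INTO Ratings VALUES (?, ?, ?, ?)",
                 [(1, 0.5, 1, 20), (2, 4.0, 2, 20), (3, 0.0, 3, 21)])
rcount = conn.execute("SELECT rcount FROM RateCounter").fetchone()[0]
```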
131
Triggers: Options
“When” of the event:
BEFORE
AFTER
INSTEAD OF (applicable only to views)
“What” of the event:
INSERT
DELETE
UPDATE (can also do UPDATE OF a single attribute)
132
Triggers: Options
Action can be performed at two different granularities:
FOR EACH ROW
FOR EACH STATEMENT
Values that are changed can be referred to using:
“new.attribute”: the new value, after the event
“old.attribute”: the old value, before the event
Set of changed records as a “temporary relation” too!
133
Triggers: Example
Referencing old and new values of an attribute
CREATE TRIGGER RatingGetsReduced
AFTER UPDATE OF Stars ON Ratings
REFERENCING
  OLD ROW AS oldr
  NEW ROW AS newr
FOR EACH ROW
WHEN (newr.Stars < oldr.Stars)
  INSERT INTO ReducedRatings
  VALUES (newr.UserID, newr.MovieID, newr.Stars - oldr.Stars)
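The same trigger can be tried out in SQLite, with two dialect differences that are assumptions of this sketch: SQLite has no REFERENCING clause, so the row images are always named OLD and NEW, and the action goes inside BEGIN ... END. The column names of ReducedRatings and the sample rows are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ratings(RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
CREATE TABLE ReducedRatings(UserID INTEGER, MovieID INTEGER, Delta REAL);  -- assumed columns

CREATE TRIGGER RatingGetsReduced AFTER UPDATE OF Stars ON Ratings
FOR EACH ROW WHEN (NEW.Stars < OLD.Stars)
BEGIN
    -- OLD holds the row before the update, NEW the row after it.
    INSERT INTO ReducedRatings
    VALUES (NEW.UserID, NEW.MovieID, NEW.Stars - OLD.Stars);
END;

INSERT INTO Ratings VALUES (1, 4.0, 1, 20), (2, 3.0, 2, 20);
UPDATE Ratings SET Stars = 2.5 WHERE RatingID = 1;  -- lowered: gets logged
UPDATE Ratings SET Stars = 5.0 WHERE RatingID = 2;  -- raised: condition is false
""")

log = conn.execute("SELECT UserID, MovieID, Delta FROM ReducedRatings").fetchall()
```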
134
Triggers: Example
Set of changed records as a “temporary relation”:
CREATE TRIGGER LowNolanRatings
AFTER INSERT ON Ratings
REFERENCING NEW TABLE AS NewRatings
FOR EACH STATEMENT
  INSERT INTO LowNolanRatings
  SELECT N.UserID, N.MovieID, N.Stars
  FROM NewRatings N
  WHERE N.Stars <= 3 AND
    N.MovieID IN (SELECT M.MovieID
                  FROM Movies M
                  WHERE M.Director = 'Christopher Nolan')
135
Triggers and Views: Example
Ratings (R) RatingID Stars UserID MovieID
Movies (M) MovieID Name Year Director
CREATE VIEW MovieAvg AS
SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
FROM Movies M, Ratings R
WHERE M.MovieID = R.MovieID
GROUP BY M.MovieID, M.Year, M.Director
MovieAvg MovieID Year Director AvgStars
Triggers help propagate changes when views are “modified”
136
Triggers and Views: Example
MovieAvg MovieID Year Director AvgStars
Triggers help propagate changes when views are “edited”
Ratings (R) RatingID Stars UserID MovieID
Movies (M) MovieID Name Year Director
CREATE TRIGGER MovieAvgAdd
INSTEAD OF INSERT ON MovieAvg
REFERENCING NEW ROW AS newr
FOR EACH ROW
  INSERT INTO Movies (MovieID, Year, Director)
  VALUES (newr.MovieID, newr.Year, newr.Director)
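A sketch of the INSTEAD OF trigger running in SQLite, which supports INSTEAD OF triggers on views but, as an assumption of this dialect, names the new row NEW rather than taking a REFERENCING clause. The inserted row is made up, and the view here averages R.Stars. Note the side effect: the new movie lands in Movies, yet it does not appear in MovieAvg until it has at least one rating, because the view joins Movies with Ratings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER, Director TEXT);
CREATE TABLE Ratings(RatingID INTEGER PRIMARY KEY, Stars REAL, UserID INTEGER, MovieID INTEGER);
CREATE VIEW MovieAvg AS
SELECT M.MovieID, M.Year, M.Director, AVG(R.Stars) AS AvgStars
FROM Movies M, Ratings R
WHERE M.MovieID = R.MovieID
GROUP BY M.MovieID, M.Year, M.Director;

-- Redirect inserts on the view to the underlying Movies relation.
CREATE TRIGGER MovieAvgAdd INSTEAD OF INSERT ON MovieAvg
FOR EACH ROW
BEGIN
    INSERT INTO Movies (MovieID, Year, Director)
    VALUES (NEW.MovieID, NEW.Year, NEW.Director);
END;
""")

# The view is not directly insertable; the trigger runs in place of the insert.
conn.execute(
    "INSERT INTO MovieAvg (MovieID, Year, Director) VALUES (22, 2015, 'Sam Mendes')")

movies = conn.execute("SELECT MovieID, Name, Year, Director FROM Movies").fetchall()
visible_in_view = conn.execute("SELECT COUNT(*) FROM MovieAvg").fetchone()[0]
```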
137
Triggers vs. Constraints
Both help maintain data consistency/integrity
Constraints are “declarative”; triggers are “operational”
Having many interrelated triggers can cause weird /
unexpected behaviors; constraints are easier to
understand/reason about
Triggers are more powerful and expressive; several use-cases:
- For complex app actions (e.g., enforce credit limits)
- Auto-complete “forms” in some apps!
- Generate logs for specific auditing/security reasons
- and many others …