Using Relational Databases and SQL Department of Computer Science California State University, Los...
-
Upload
candace-wade -
Category
Documents
-
view
216 -
download
0
Transcript of Using Relational Databases and SQL Department of Computer Science California State University, Los...
Using Relational Databases and SQL
Department of Computer ScienceCalifornia State University, Los Angeles
Lecture 7:
Topics for Today
GROUP BY
HAVING
Set Operations
Scripts
Views
Aggregating in Groups
Can we calculate this with a single query?
Yes, but we need a way to group data together
Solution: Use the GROUP BY clause
Grouping Data
You can make an aggregate function return multiple values per table by grouping the table
-- Count the number of male and female members.SELECT Gender, COUNT(*)FROM MembersGROUP BY Gender;
Because there are two gender groups, male and female, the COUNT function will return two values, one for each gender group
How GROUP BY Works
GROUP BY begins by sorting the table based on the grouping attributes (in our case, Gender)
If any aggregate functions are present, GROUP BY causes each aggregate to be applied per-group rather than per-table
GROUP BY then condenses the table so that each group only appears once in the table (if listed) and displays any aggregated values along with it
Grouping on Multiple Fields
GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames)
SELECT Genre, ArtistID, COUNT(*) FROM Titles GROUP BY Genre, ArtistID order by Genre
Notice that the more fields you group by, the more results you get!
GROUP BY AND DISTINCTIn some cases, using GROUP BY outside an aggregate would produce the same result as using DISTINCT
SELECT Gender FROM MembersGROUP BY Gender;
SELECT DISTINCT GenderFROM Members;
This is sloppy practice!
GROUP BY and DISTINCT
GROUP BY works in conjunction with aggregate functions, DISTINCT does not
GROUP BY affects how data is aggregated and thus has more work to do. DISTINCT just checks to see whether the value is a repeat
In other words, if you are using GROUP BY, you had better have an aggregate function somewhere!
From now on, using GROUP BY as a DISTINCT replacement is an error!
GROUP BY and Primary Keys
Let’s say you want to display each title and the number of members who played on it.
We could write
SELECT Title, COUNT(M.FirstName) FROM titles JOIN XRefArtistsMembers USING(artistID) join members M using(memberID) GROUP BY Title
However, if there are multiple albums with the same title, we will get garbage!
GROUP BY and Primary Keys
The solution is to GROUP BY the primary key, TitleID, followed by Title
SELECT Title, COUNT(M.FirstName) FROM titles JOIN XRefArtistsMembers USING(artistID) join members M using(memberID) GROUP BY titleID, title
We now have a query that will always work, even if there are multiple titles with the same name
Filtering Aggregated Results
Using the previous example, once we have our aggregated result table, is it possible to filter out certain groups, say where COUNT(*) = 1?
The HAVING Clause
Yes, but we must have a way of filtering results AFTER aggregation!
Solution is to use the HAVING clause
The HAVING clause filters AFTER aggregation (this is why you CAN use aggregate functions in the HAVING clause)
The WHERE clause filters BEFORE aggregation (this is why you CANNOT use aggregate functions in the WHERE clause)
HAVING Summary
Using HAVING to filter out groups in an aggregated table
In a HAVING clause, you may use:
aggregate functions
regular functions
constant values
grouping attributes
HAVING Clause Example
Example:
SELECT Genre, COUNT(*) FROM Titles J GROUP BY Genre HAVING COUNT(*) > 1;
Aggregating Distinct Values
A normal SELECT DISTINCT query filters out duplicates after aggregation
Therefore, if a field contains duplicate values, and you aggregate on that field, SELECT DISTINCT WILL NOT filter out duplicate values from being aggregated.
This could produce incorrect answers with, for example, count() or avg()
Aggregating Distinct Values
The solution is to use the DISTINCT keyword within the aggregate function
Example:
WRONG: SELECT DISTINCT COUNT(Firstname) FROM Members;
There is only one value, the number 23!
RIGHT: SELECT COUNT(DISTINCT(Firstname)) FROM Members;
Aggregating Distinct Valuescreate table musicians(
id int primary key,name varchar(30)
);create table bands(
id int primary key,name varchar(30)
);create table xref(
musicianId int,bandId int,instrument varchar(30),primary key ( musicianid, bandId, instrument)
);insert into musicians values(1, "Mick");insert into musicians values(2, "Keith");insert into musicians values(3, "Charlie");insert into musicians values(4, "Ron");insert into musicians values(5, "Jan");insert into musicians values(6, "Dean");insert into bands values(1, "Rolling Stones");insert into bands values(2, "Jan and Dean");insert into xref values(1, 1, "Vocals");insert into xref values(2, 1, "Guitar");insert into xref values(3, 1, "Drums");insert into xref values(4, 1, "Guitar");insert into xref values(5, 2, "Vocals");insert into xref values(6, 2, "Vocals");insert into xref values(6, 2, "Vuvuzuela");
Aggregating Distinct Values
Find the number of different musicians in each band
-- WRONG!!!
SELECT b.name, COUNT(m.id) AS 'Musicians’FROM bands b JOIN xref x ON b.id = x.bandId JOIN musicians m ON x.musicianId = m.id GROUP BY bandID;
This is incorrect since some musicians in this DB are listed more than once in xref because they play more than one instrument
Aggregating Distinct Values
-- CORRECT!!!SELECT b.name, COUNT(distinct m.id) AS 'Musicians’ FROM bands b JOIN xref x ON b.id = x.bandId JOIN musicians m ON x.musicianId = m.id GROUP BY bandID;
This is the correct answer!
Views
Views are stored queries
Views are part of the database schema
Use views as though they were actual tables
With complex DBs, may be better to query views than tables
Views
Syntax:
CREATE [OR REPLACE]
VIEW view_name
AS select_statement
• It’s a good practice to include the primary keys of the tables you select from.
• You may want to get the SELECT right first, then build the CREATE VIEW around it.
Views
Example:
CREATE VIEW SalespersonMembers AS SELECT s.lastname AS "salespersonName", s.salesID , m.lastname AS "memberName", m.memberID FROM salespeople s JOIN members m USING(salesID);
Views
Select from views just as if they were tables:
SELECT * FROM SalespersonMembers;
ViewsBenefits:
Consider an application with queries and DML based on a third-party database
Database manager changes the table definitions: application code breaks!
Any changes require careful coordination between DB manager and application programmers
Consider an app that queries against database views
DB manager can change the tables but rewrite the views to maintain an unchanged interface
Application programmers don’t have to worry about underlying changes
Union
List all regions that contain members, studios, or both
This is a SET operation, so there will be no duplicates.
SELECT region FROM members UNION SELECT region FROM studios;
Summary Statistics with UNIONSelect 1, '< 2 minutes' As Length, Count(*) As NumTracks From Tracks Where LengthSeconds<120
Union
Select 2, '2-3 minutes', Count(*)From Tracks Where LengthSeconds Between 120 and 180
Union
Select 3, '>3 minutes', Count(*)
From Tracks Where LengthSeconds>180;
UNION ALL• Counts duplicates multiple time; breaks set definition of
UNION
• Get number of electronic versions available for all tracks:
Select Count(SoundFile) As Num_Electronic_Files from (Select MP3 As SoundFile From Tracks Where MP3=1 Union All Select RealAud From Tracks Where RealAud=1) as elec;
In Class Exercise
Write SQL queries to find the following:
By how many seconds does the length of the longest track in the db exceed that of the average track?
List the title names and IDs with the number of tracks on each one