Using Relational Databases and SQL Department of Computer Science California State University, Los...

Post on 21-Dec-2015

216 views 0 download

Transcript of Using Relational Databases and SQL Department of Computer Science California State University, Los...

Using Relational Databases and SQL

Department of Computer ScienceCalifornia State University, Los Angeles

Lecture 7:

Topics for Today

GROUP BY

HAVING

Set Operations

Scripts

Views

Aggregating in Groups

Can we calculate this with a single query?

Yes, but we need a way to group data together

Solution: Use the GROUP BY clause

Grouping Data

You can make an aggregate function return multiple values per table by grouping the table

-- Count the number of male and female members.SELECT Gender, COUNT(*)FROM MembersGROUP BY Gender;

Because there are two gender groups, male and female, the COUNT function will return two values, one for each gender group

How GROUP BY Works

GROUP BY begins by sorting the table based on the grouping attributes (in our case, Gender)

If any aggregate functions are present, GROUP BY causes each aggregate to be applied per-group rather than per-table

GROUP BY then condenses the table so that each group only appears once in the table (if listed) and displays any aggregated values along with it

Grouping on Multiple Fields

GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames)

SELECT Genre, ArtistID, COUNT(*) FROM Titles GROUP BY Genre, ArtistID order by Genre

Notice that the more fields you group by, the more results you get!

GROUP BY AND DISTINCTIn some cases, using GROUP BY outside an aggregate would produce the same result as using DISTINCT

SELECT Gender FROM MembersGROUP BY Gender;

SELECT DISTINCT GenderFROM Members;

This is sloppy practice!

GROUP BY and DISTINCT

GROUP BY works in conjunction with aggregate functions, DISTINCT does not

GROUP BY affects how data is aggregated and thus has more work to do. DISTINCT just checks to see whether the value is a repeat

In other words, if you are using GROUP BY, you had better have an aggregate function somewhere!

From now on, using GROUP BY as a DISTINCT replacement is an error!

GROUP BY and Primary Keys

Let’s say you want to display each title and the number of members who played on it.

We could write

SELECT Title, COUNT(M.FirstName) FROM titles JOIN XRefArtistsMembers USING(artistID) join members M using(memberID) GROUP BY Title

However, if there are multiple albums with the same title, we will get garbage!

GROUP BY and Primary Keys

The solution is to GROUP BY the primary key, TitleID, followed by Title

SELECT Title, COUNT(M.FirstName) FROM titles JOIN XRefArtistsMembers USING(artistID) join members M using(memberID) GROUP BY titleID, title

We now have a query that will always work, even if there are multiple titles with the same name

Filtering Aggregated Results

Using the previous example, once we have our aggregated result table, is it possible to filter out certain groups, say where COUNT(*) = 1?

The HAVING Clause

Yes, but we must have a way of filtering results AFTER aggregation!

Solution is to use the HAVING clause

The HAVING clause filters AFTER aggregation (this is why you CAN use aggregate functions in the HAVING clause)

The WHERE clause filters BEFORE aggregation (this is why you CANNOT use aggregate functions in the WHERE clause)

HAVING Summary

Using HAVING to filter out groups in an aggregated table

In a HAVING clause, you may use:

aggregate functions

regular functions

constant values

grouping attributes

HAVING Clause Example

Example:

SELECT Genre, COUNT(*) FROM Titles J GROUP BY Genre HAVING COUNT(*) > 1;

Aggregating Distinct Values

A normal SELECT DISTINCT query filters out duplicates after aggregation

Therefore, if a field contains duplicate values, and you aggregate on that field, SELECT DISTINCT WILL NOT filter out duplicate values from being aggregated.

This could produce incorrect answers with, for example, count() or avg()

Aggregating Distinct Values

The solution is to use the DISTINCT keyword within the aggregate function

Example:

WRONG: SELECT DISTINCT COUNT(Firstname) FROM Members;

There is only one value, the number 23!

RIGHT: SELECT COUNT(DISTINCT(Firstname)) FROM Members;

Aggregating Distinct Valuescreate table musicians(

id int primary key,name varchar(30)

);create table bands(

id int primary key,name varchar(30)

);create table xref(

musicianId int,bandId int,instrument varchar(30),primary key ( musicianid, bandId, instrument)

);insert into musicians values(1, "Mick");insert into musicians values(2, "Keith");insert into musicians values(3, "Charlie");insert into musicians values(4, "Ron");insert into musicians values(5, "Jan");insert into musicians values(6, "Dean");insert into bands values(1, "Rolling Stones");insert into bands values(2, "Jan and Dean");insert into xref values(1, 1, "Vocals");insert into xref values(2, 1, "Guitar");insert into xref values(3, 1, "Drums");insert into xref values(4, 1, "Guitar");insert into xref values(5, 2, "Vocals");insert into xref values(6, 2, "Vocals");insert into xref values(6, 2, "Vuvuzuela");

Aggregating Distinct Values

Find the number of different musicians in each band

-- WRONG!!!

SELECT b.name, COUNT(m.id) AS 'Musicians’FROM bands b JOIN xref x ON b.id = x.bandId JOIN musicians m ON x.musicianId = m.id GROUP BY bandID;

This is incorrect since some musicians in this DB are listed more than once in xref because they play more than one instrument

Aggregating Distinct Values

-- CORRECT!!!SELECT b.name, COUNT(distinct m.id) AS 'Musicians’ FROM bands b JOIN xref x ON b.id = x.bandId JOIN musicians m ON x.musicianId = m.id GROUP BY bandID;

This is the correct answer!

Views

Views are stored queries

Views are part of the database schema

Use views as though they were actual tables

With complex DBs, may be better to query views than tables

Views

Syntax:

CREATE [OR REPLACE]

VIEW view_name

AS select_statement

• It’s a good practice to include the primary keys of the tables you select from.

• You may want to get the SELECT right first, then build the CREATE VIEW around it.

Views

Example:

CREATE VIEW SalespersonMembers AS SELECT s.lastname AS "salespersonName", s.salesID , m.lastname AS "memberName", m.memberID FROM salespeople s JOIN members m USING(salesID);

Views

Select from views just as if they were tables:

SELECT * FROM SalespersonMembers;

ViewsBenefits:

Consider an application with queries and DML based on a third-party database

Database manager changes the table definitions: application code breaks!

Any changes require careful coordination between DB manager and application programmers

Consider an app that queries against database views

DB manager can change the tables but rewrite the views to maintain an unchanged interface

Application programmers don’t have to worry about underlying changes

Union

List all regions that contain members, studios, or both

This is a SET operation, so there will be no duplicates.

SELECT region FROM members UNION SELECT region FROM studios;

Summary Statistics with UNIONSelect 1, '< 2 minutes' As Length, Count(*) As NumTracks From Tracks Where LengthSeconds<120

Union

Select 2, '2-3 minutes', Count(*)From Tracks Where LengthSeconds Between 120 and 180

Union

Select 3, '>3 minutes', Count(*)

From Tracks Where LengthSeconds>180;

UNION ALL• Counts duplicates multiple time; breaks set definition of

UNION

• Get number of electronic versions available for all tracks:

Select Count(SoundFile) As Num_Electronic_Files from (Select MP3 As SoundFile From Tracks Where MP3=1 Union All Select RealAud From Tracks Where RealAud=1) as elec;

In Class Exercise

Write SQL queries to find the following:

By how many seconds does the length of the longest track in the db exceed that of the average track?

List the title names and IDs with the number of tracks on each one