Post on 05-Jan-2016
Copyright © 2003-2008 Curt Hill
Queries in SQL: More options

Duplicates
• A select usually joins several tables, creating large unique tuples
• The temporary table has an unspecified key
• If the select removes portions of the key, then duplicates can occur
• Consider the query that links faculty to the students taking any of their classes

The query
SELECT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
This produces 238 rows
What is the key?

The Key
• Does not need to be specified
• In this case it is the linking fields
– F_naid (or ct_naid)
– Ct_dept
– Ct_number
– S_id
• Since the Select removes some of these fields, duplicates occur

Removing duplicates
• In this query duplicates occur when a student takes multiple classes from the teacher
• The result is not a set (which eliminates duplicates) but a multi-set (which allows duplicates)
• Placing the reserved word DISTINCT immediately after the Select removes these
• The new query follows:

Revised query
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
This produces 213 rows
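
The effect of DISTINCT can be tried on a toy database. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle or MySQL. Only the table and column names come from the slides; the column types and every row of data are invented for illustration, so the row counts are not the 238 and 213 quoted above.

```python
import sqlite3

# Toy data, invented for illustration; only the table and column
# names are taken from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 340),
                                (11, 'CS', 160), (11, 'MATH', 165);
""")

# The join and its linking conditions, shared by both queries.
JOIN = """
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
"""

plain = con.execute("SELECT f_name, s_name" + JOIN).fetchall()
distinct = con.execute("SELECT DISTINCT f_name, s_name" + JOIN).fetchall()

print(len(plain), "rows without DISTINCT")
print(len(distinct), "rows with DISTINCT")
```

In this toy data Ann takes two classes from Adams, so the pair (Adams, Ann) appears twice in the plain result and once in the DISTINCT result.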

How does this work?
• Removing duplicates is not trivial
• There are several ways, but all take work
• One possibility is to sort the tuples
– Duplicates then must be adjacent
• Another is to hash them
– Duplicates have the same hash key
• Small queries could be done in memory, larger ones cannot
• We will consider sorting and hashing later
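
The two strategies can be sketched in a few lines of Python. This is an in-memory illustration only, not how any particular DBMS implements DISTINCT; the function names and sample rows are hypothetical.

```python
def dedup_by_sort(rows):
    """Sort so duplicates become adjacent, then keep a row only
    if it differs from the row emitted just before it."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def dedup_by_hash(rows):
    """Keep a hash set of rows already seen; emit a row only the
    first time it appears. Input order is preserved."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Invented sample rows: (f_name, s_name) pairs with one duplicate.
rows = [("Adams", "Ann"), ("Baker", "Bob"), ("Adams", "Ann"), ("Adams", "Bob")]
sorted_unique = dedup_by_sort(rows)
hashed_unique = dedup_by_hash(rows)
print(sorted_unique)
print(hashed_unique)
```

Both functions return the same set of rows, but in different orders: the sort-based version returns them sorted, the hash-based version in order of first appearance.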

Deception
• The difference between the two queries is just one keyword
• That keyword forces the DBMS to do substantial extra work
• It looks like no big deal but actually is one
• Hence the query is deceptively different
• However, let the database do its job

All
• The opposite of Distinct is All
• Specifies that duplicates should not be eliminated
• Since elimination is expensive, it is not done by default
– Thus All gives the same result whether present or absent
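
A quick check of this claim, using Python's sqlite3 module as a stand-in DBMS. The c_teach rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE c_teach (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    -- Invented rows: two CS classes and one MATH class.
    INSERT INTO c_teach VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
""")

default_rows = con.execute("SELECT ct_dept FROM c_teach").fetchall()
all_rows = con.execute("SELECT ALL ct_dept FROM c_teach").fetchall()
distinct_rows = con.execute("SELECT DISTINCT ct_dept FROM c_teach").fetchall()

print(len(default_rows), len(all_rows), len(distinct_rows))
```

The default query and the ALL query both keep the duplicate 'CS' row; only DISTINCT removes it.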

Order
• The order of the output table is dependent on many unpredictable things
• Different DBMSs may give different orderings, even with the same data
– Based on how they process the data
• The order of the above queries is different on Oracle and MySQL
• Worse yet, neither will put all the students of one faculty member together

Order by clause
• Order by follows the Where
• It specifies a sort order for the output
• May specify one or more fields
• Fields do not have to be displayed

Sorted query 1
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_name, s_name

Sorting
• The default behavior is to sort:
– Case sensitive way (the DBMS collation can change this)
– Ascending order (lowest to highest)
• Usually we sort on the display values
– With DISTINCT, Oracle allows only this
– SQL Server and MySQL allow sorts on other fields

Sorted query 2
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_naid, s_id

Sort Order
• The default is to sort in ascending order for all sort keys
• The key may be followed by ASC or DESC
• ASC makes ascending order
• DESC is descending order
• These abbreviations may not be spelled out in full
• If left out ASC is default

Sorted query 3
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_name DESC, s_name ASC
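
A mixed sort like this one can be tried on toy data. The sketch uses Python's sqlite3 module as a stand-in DBMS; the column types and all rows are invented, only the table and column names come from the slides.

```python
import sqlite3

# Toy data, invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 340),
                                (11, 'CS', 160), (11, 'MATH', 165);
""")

# Faculty names descending, student names ascending within each faculty.
ordered = con.execute("""
    SELECT DISTINCT f_name, s_name
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    ORDER BY f_name DESC, s_name ASC
""").fetchall()

for f_name, s_name in ordered:
    print(f_name, s_name)
```

Baker sorts ahead of Adams because of DESC, while the students of Adams still appear in ascending order.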

Aggregate operations
• We can collapse several rows into one
• This produces a summary report
• Several rows of the table become one row of output
• This requires the Group By clause with Aggregate functions
• The Group By follows Where
• Aggregate functions are in Select

Group By and Aggregate functions
• Each of these Aggregate functions specifies a field:
– Count
– Avg
– Sum
– Max
– Min
• Usually used with Group by but not always
• Group by follows Where
• It specifies the groups as changes in fields

Grouped Query 1
SELECT f_name, count(s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
This produces 16 rows

Commentary
• Group by forces a sort
• This is only a means to ensure that the items are together
• The DISTINCT keyword may be used within aggregate functions:
– Count
– Avg
– Sum

Grouped Query 2
SELECT f_name, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
This produces 16 rows but different counts
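
The difference between the two counts shows up even on toy data. The sketch below runs both aggregates in one query, using Python's sqlite3 module as a stand-in DBMS; the column types and all rows are invented.

```python
import sqlite3

# Toy data, invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 340),
                                (11, 'CS', 160), (11, 'MATH', 165);
""")

rows = con.execute("""
    SELECT f_name, count(s_name), count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name
""").fetchall()

# Map each faculty name to (enrollments, distinct students).
counts = {f_name: (total, unique) for f_name, total, unique in rows}
print(counts)
```

The number of groups is the same either way. Only the count inside a group changes, here for Adams, whose student Ann is enrolled in two classes.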

Secondary Selection
• The Where does an initial selection
– It eliminates numerous combinations of tuples of no interest
• We may also wish to remove aggregated rows
• This must occur after the Where but before the final table
• This is done with the HAVING clause of the GROUP BY

Having
• The Having clause follows the Group By fields
• It gives a selection criterion for rows
• Usually based upon the aggregate functions
• Form:
Having comparison
• See following

Grouped Query 3
SELECT f_name, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
HAVING count(*) > 10

Commentary
• This produces 9 rows
• Notice that * is the parameter of count
• Other Aggregate functions could be used as well
• A Having without a Group By is like a Where
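
A small sketch of HAVING as a filter on groups, using Python's sqlite3 module as a stand-in DBMS. The data is invented, and the threshold is lowered from 10 to 1 so that the tiny data set still shows a group being removed.

```python
import sqlite3

# Toy data, invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 340),
                                (11, 'CS', 160), (11, 'MATH', 165);
""")

# WHERE filters joined rows first; HAVING then filters whole groups.
kept = con.execute("""
    SELECT f_name, count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name
    HAVING count(*) > 1
""").fetchall()

print(kept)
```

Baker's group has a single joined row, so HAVING drops it. Adams has three joined rows and stays, reported with two distinct students.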

Ungrouped Query
• Suppose we just want a count or sum
• Then we can use an aggregate function without Group By
• This will generally collapse the entire table into a single row
• Consider the next screen

Aggregates
• Counting rows:
SELECT count(*)
FROM faculty
– Results in one row with count of 19
• Sum of student balances:
SELECT sum(s_balance)
FROM students
– Results in one row with the sum: 93240.34
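
Both ungrouped aggregates can be checked on toy data with Python's sqlite3 module. The rows and balances are invented, so the results differ from the 19 and 93240.34 quoted above; the REAL type for s_balance is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE students (s_id INTEGER, s_name TEXT, s_balance REAL);
    -- Invented rows for illustration.
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO students VALUES (10, 'Ann', 100.50), (11, 'Bob', 50.25);
""")

# With no GROUP BY, each query collapses the whole table into one row.
faculty_count = con.execute("SELECT count(*) FROM faculty").fetchone()[0]
balance_sum = con.execute("SELECT sum(s_balance) FROM students").fetchone()[0]

print(faculty_count, balance_sum)
```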

Variations
• Recall this query
SELECT f_name, count(DISTINCT s_name)
…
GROUP BY f_name
• Suppose f_naid were included in the Select
SELECT f_name, f_naid, …
• In Oracle and SQL Server it would also have to be part of the Group By
– But not in MySQL (unless its ONLY_FULL_GROUP_BY mode is enabled)

Bad Oracle Query
SELECT f_name, f_naid, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
– Receives an error: ORA-00979: not a GROUP BY expression
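
One way to make the query acceptable to the stricter systems is to list f_naid in the Group By as well. The sketch below runs that corrected form with Python's sqlite3 module on invented data. SQLite itself tolerates the ungrouped column, much as MySQL traditionally did, so it cannot reproduce the Oracle error; the sketch only shows that the corrected query runs and what it returns.

```python
import sqlite3

# Toy data, invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 340), (2, 'MATH', 165);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 340),
                                (11, 'CS', 160), (11, 'MATH', 165);
""")

# Every non-aggregated column in the Select is also in the Group By.
rows = con.execute("""
    SELECT f_name, f_naid, count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name, f_naid
""").fetchall()

per_faculty = {(f_name, f_naid): n for f_name, f_naid, n in rows}
print(per_faculty)
```

Grouping on both fields changes nothing here because each f_naid has one f_name; it would split a group only if two faculty members shared a name.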