SQL Unit 8 Subqueries with IN, Joins, and Other Topics Kirk Scott 1.

95
SQL Unit 8 Subqueries with IN, Joins, and Other Topics Kirk Scott 1

Transcript of SQL Unit 8 Subqueries with IN, Joins, and Other Topics Kirk Scott 1.

1

SQL Unit 8Subqueries with IN, Joins, and Other Topics

Kirk Scott

2

• 8.1 Queries That Can Be Done with a Join or with IN• 8.2 Queries That Can Easily Be Done with a Join But

Not Done with IN• 8.3 It is Not Clear that NOT IN Queries Can Be Done

with a Join• 8.4 Mixing IN Queries and Joins• 8.5 Doubly Nested IN Queries• 8.6 Subqueries with Operators• 8.7 Counting the Number of Distinct Values• 8.8 Finding Aggregates of Aggregates with IN Queries

3

8.1 Queries That Can Be Done with a Join or with IN

4

• In the kinds of IN queries considered so far, what is happening is a match on equality between the corresponding fields of the inner and outer queries.

• This means that it is possible to get the same results with a join query.

5

• For example, for this IN query:• • SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale)

6

• You would get the same results by doing this join query:

• • SELECT DISTINCT name• FROM Salesperson, Carsale• WHERE Salesperson.spno = Carsale.spno

7

• Keep in mind that the IN query will eliminate duplicate spno values.

• Assuming that there are no duplicate salesperson name values, then the join query with DISTINCT will give exactly the same results as the query with IN.

• If there are duplicate salesperson names, then the queries would not produce exactly the same results.

8

• Here is another example of an IN query:• • SELECT stickerprice• FROM Car• WHERE vin IN• (SELECT vin• FROM Carsale)

9

• You could do this with the following join query:

• • SELECT stickerprice• FROM Car, Carsale• WHERE Car.vin = Carsale.vin

10

• Keep in mind that since vin is the primary key of the Carsale table, the IN would not have the effect of squeezing out duplicate vin values.

• There are none to squeeze out.• However, various different cars may have the

same stickerprice, so there could be duplicate stickerprice values in the IN query results.

11

• If that were the case, then putting DISTINCT in the join query would make sure that the two queries gave different results.

• The most likely equivalent join query would not have DISTINCT in it.

12

8.2 Queries That Can Easily Be Done with a Join But Not Done with IN

13

• There are some things you can do with a join query that you can't do with an IN query.

• The join query shown below includes fields from two tables in its results:

• • SELECT stickerprice, salesprice• FROM Car, Carsale• WHERE Car.vin = Carsale.vin

14

• An IN query can only show fields from the outer table in the results.

• The IN query shown below attempts to select fields from both the inner and outer tables, but the query is incorrect.

• It will not show the desired results:• • SELECT stickerprice, salesprice• FROM Car• WHERE vin IN• (SELECT vin• FROM Carsale)

15

• You can't use a field in the outer query that only exists in the table of the inner query.

• The parentheses serve as a barrier. • The field salesprice does not exist in the outer

query.

16

• If you typed this query in and tried to run it, you would get a text box prompting you for the value of the variable salesprice.

• The outer part of the query does not recognize this as the name of the field in the Carsale table in the inner query.

17

• The reason the query doesn’t work has to do with the meaning of the keyword IN.

• We have seen inline views, which are a kind of subquery, before.

• In that kind of query it was possible to select a field from a table in the subquery.

18

• The difference with an IN query is this:• The only purpose served by the subquery is to

present a set.• The outer query asks whether a value is in the

set.• In effect, the IN can only return a yes or a no.• It cannot return an actual value from the inner

query.

19

8.3 It is Not Clear that NOT IN Queries Can Be Done with a Join

20

• Here is a NOT IN query that finds the stickerprices of cars that haven't sold:

• • SELECT stickerprice• FROM Car• WHERE vin NOT IN• (SELECT vin• FROM Carsale)

21

• How to do this as a join is not readily apparent, so the NOT IN syntax is preferable.

• Shown below is an attempt to get the same results with an inequality join query.

• This will not give the same results as the join query:• • SELECT stickerprice• FROM Car, Carsale• WHERE Car.vin <> Carsale.vin

22

• Not only does this query not give the same results as the NOT IN query;

• it is highly likely that its results are not useful at all. • The query matches every car in the Car table with

every car in the Carsale table that does not have the same vin.

• This is a case where the results are the Cartesian product minus the equijoin, and it is hard to imagine a situation where they would be desirable.

23

• The previous two points, summarized, are:• With joins, you can pull fields from the inner

table, but you can’t do that with IN.• With IN, you can do NOT IN, but the join

equivalent is not apparent.

24

More on the Join Equivalent

• A student suggested this correct approach to solving the problem using a join:

• SELECT C.stickerprice• FROM car C LEFT JOIN carsale CS• ON CS.vin = C.vin• WHERE CS.vin IS NULL

25

• This emphasizes the fact that outer joins are different from inner (vanilla) joins

• Outer joins include records that don’t have matches

• I still find the NOT IN approach more intuitive• Using the outer join in this way never occurred

to me

26

8.4 Mixing IN Queries and Joins

27

• You may wonder whether it is possible to get values from the inner table with IN queries.

• The answer is yes, because it is possible to mix joins and IN.

• Suppose you would like a query to return the names of salespeople and the salesprices of cars they sold to customers who live in Anchorage.

28

• Salesperson name and salesprice come from the Salesperson and Carsale tables, respectively.

• The proposed query also has a condition that applies to the Customer table.

• All together, the query would depend on 3 tables.

29

• Because there are three tables, if this were to be done as an IN query, it would have to be doubly nested.

• This is covered in the next section.• Since the query needs fields from more than

one table, it wouldn’t be possible to write it using only IN anyway.

30

• It could be written as a three-way join.• That might be the simplest option. • However, it could also be written as a query

containing both an IN and a join.• That is the point of this section.• You can mix and match IN and join as needed

and desired.• Here is the query written in this way.

31

• SELECT name, salesprice• FROM Salesperson, Carsale• WHERE Salesperson.spno = Carsale.spno• AND custno IN• (SELECT custno• FROM Customer• WHERE city = 'Anchorage')

32

8.5 Doubly Nested IN Queries

33

• The example of the previous section illustrated a query that could be done with a 3-way join or a single join combined with an IN subquery.

• Suppose that fields from only one table were wanted in the results of a query that depended on three tables.

• It would be possible to do this with a doubly nested IN query.

34

• Once you start nesting queries and mixing IN and NOT IN, analyzing what the queries mean can get complex.

• The following examples will be based on the general question of identifying salespeople according to whether they sold to customers in Anchorage or outside of Anchorage.

• The Venn diagram shows the possibilities.

35

Salespeoplewho sold to customers in Anchorage.

Salespeople who sold to customers outside of Anchorage.

36

• 1. The crescent moon shaped area on the left represents those salespeople who sold only to customers in Anchorage.

• 2. The whole circle on the left represents those salespeople who sold to at least one customer in Anchorage, although they may also have sold to a customer outside of Anchorage.

37

• 3. The crescent moon shaped area on the right represents those salespeople who sold only to customers outside of Anchorage.

• 4. The whole circle on the right represents those salespeople who sold to at least one customer outside of Anchorage, although they may also have sold to a customer in Anchorage.

38

• 5. The lens shaped area in the middle, the intersection, represents those salespeople who sold to customers both in Anchorage and outside of Anchorage.

• In practice it seems reasonable to suppose that that would be the largest subset.

• It is only in the interests of clear labeling that it appears smallest in the Venn diagram.

39

• The following example query finds the names of salespeople who sold cars to customers who live in Anchorage.

40

• SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale• WHERE custno IN• (SELECT custno• FROM Customer• WHERE city = 'Anchorage'))

41

• In terms of the Venn diagram, it selects the whole circle on the left.

42

Salespeople who sold to customers outside of Anchorage.

Salespeoplewho sold to customers in Anchorage.

43

• Recall that nested queries are executed from the inside out.

• If you choose to write a query like this one, you might find that it is most logical to work your way from the inside out.

• You could also get the same results using a three-way join, something you already know how to do.

44

• It is possible to negate the conditions of a doubly nested IN query.

• Once you start doing this, the queries become harder to understand.

• Here is an example with a NOT IN query nested inside an IN query.

• It makes sense to show the SQL first.• Then the explanation will be given.

45

• SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale• WHERE custno NOT IN• (SELECT custno• FROM Customer• WHERE city = 'Anchorage'))

46

• The innermost query finds the sales to customers who are outside of Anchorage.

• In combination with this the middle query finds the spno of salespeople who made such sales.

• The outer query finds the names of these salespeople.

47

• The set of spno’s specified by this query includes salespeople who sold at least one car to a customer outside of Anchorage.

• The critical point is this:• The set of spno’s includes salespeople who

sold at least one car outside of Anchorage, but may also have sold one or more cars in Anchorage.

48

• The results of this query could only include salespeople who made a sale.

• Salespeople who hadn’t made a sale would not be included.

• The results of this query are represented by the whole circle on the right hand side of the Venn diagram.

49

Salespeoplewho sold to customers in Anchorage.

Salespeople who sold to customers outside of Anchorage.

50

• In the next example an IN query is nested inside of a NOT IN query.

• Because there is just one NOT in the query overall, superficially it might seem to be similar to the previous example.

• However, the results will be different.

51

• SELECT name• FROM Salesperson• WHERE spno NOT IN• (SELECT spno• FROM Carsale• WHERE custno IN• (SELECT custno• FROM Customer• WHERE city = 'Anchorage'))

52

• Informally, you could say that this finds the names of salespeople who didn't sell cars to people who live in Anchorage.

• Just as the presence of only one NOT makes the query look similar, the informal description sounds similar to the previous example.

53

• The difference is that the results could not include salespeople who sold a car to a customer in Anchorage.

• Also, if there were salespeople who hadn't sold any cars, they would be included in the results.

54

• The results of this query are represented by the crescent moon shaped area on the right hand side of the Venn diagram.

55

Salespeople who sold to customers outside of Anchorage.

Salespeoplewho sold to customers in Anchorage.

56

• Now consider the following doubly nested NOT IN query.

• It is the first example of a doubly negated query.

• Double negation will appear again in the next unit.

57

• Unfortunately, the double negative is hard to understand.

• Hopefully, after the verbal explanation is finished the Venn diagram will help make the query understandable.

58

• SELECT name• FROM Salesperson• WHERE spno NOT IN• (SELECT spno• FROM Carsale• WHERE custno NOT IN• (SELECT custno• FROM Customer• WHERE city = 'Anchorage'))

59

• Informally, you can say that the innermost query finds the customers who live in Anchorage.

• The middle query finds the spno's for carsales where the customers were not in Anchorage.

• The outermost query finds the names of salespeople who were not among those who did not sell to customers in Anchorage.

• But what does this mean precisely?• Who is included and who isn’t?

60

• Consider the innermost and the middle queries together again.

• Row by row, the spno’s for individual sales to customers not in Anchorage are selected.

• The important thing to keep in mind is that a salesperson might have sold both to customers in Anchorage and customers not in Anchorage.

61

• If a salesperson made a sale to at least one customer not in Anchorage, then that salesperson’s spno will be in the results of the innermost and middle queries.

• You could reword the results of the innermost and middle queries as, “spno’s for salespeople who sold at least one car to a customer outside of Anchorage.”

62

• Then the outermost query negates this.• It asks for the names of salespeople where the

spno was not among those who sold at least one car to a customer outside of Anchorage.

• In other words, it asks for the names of salespeople who sold only to customers in Anchorage.

63

• In other words, the results of this query are represented by the crescent moon shaped area on the left hand side of the Venn diagram.

64

Salespeoplewho sold to customers in Anchorage.

Salespeople who sold to customers outside of Anchorage.

65

• To make this presentation complete, it would be necessary to write a query that found those salespeople who had sold to customers both in Anchorage and outside of Anchorage.

• In other words, that would be a query that selected salespeople that corresponded to the lens shaped area in the Venn diagram.

66

• It would be possible to do this by combining parts of the queries given so far.

• Such a query would be quite lengthy and difficult to understand.

• This will not be done because the techniques presented in the next unit make it possible to devise a better solution.

67

8.6 Subqueries with Operators

68

• There's another kind of subquery that's similar to a subquery with IN.

• It's known as a subquery with an operator. Instead of IN, a comparison operator is used.

• In order for this to work, the subquery has to find exactly one value.

69

• Here is an example of a query that we don't know how to do yet:

• Find the make and model of cars with stickerprices greater than the average dealercost.

• The following query is incorrect because Microsoft Access SQL does not support this syntax:

70

• SELECT make, model• FROM Car• WHERE stickerprice > AVG(dealercost)

71

• However, the problem can be solved easily with a query with the inequality operator and a subquery containing the AVG function, which by definition only returns a single value:

• • SELECT make, model• FROM Car• WHERE stickerprice >• (SELECT AVG(dealercost)• FROM Car)

72

• Subqueries and joins are both important elements of SQL.

• As seen, some things you can do with one, some things with the other, and in many cases you can simply take your choice and use whichever syntax you find more understandable.

73

• Remember that on a test you might not be given a choice, and you might be asked to write a query that finds certain results using a particular kind of syntax.

• It's important to be familiar with subquery syntax and how it works because it will come up again later in a slightly different form in order to solve other problems.

74

8.7 Counting the Number of Distinct Values

75

• You may recall from a previous unit that nesting keywords as shown below resulted in a query that could not be done in Microsoft Access SQL.

• For example, it's not possible to count the number of distinct stickerprices in the Car table directly.

• This query is wrong:• • SELECT COUNT(DISTINCT stickerprice)• FROM Car

76

• The solution to the problem was to use either a temporary table or a view, which found the distinct stickerprices, and then to write an outer query that counted the number of items in the results.

• Written as an inline view, this is a correct solution:• • SELECT COUNT(stickerprice)• FROM• (SELECT DISTINCT stickerprice• FROM Car)

77

8.8 Finding Aggregates of Aggregates with IN Queries

78

• Here is the next kind of query that can't be done directly in Microsoft Access SQL.

• Here, two aggregate functions are nested. • You would like to find to find the make and the

sum of the stickerprices for that make which has the smallest sum of stickerprices in the Car table.

79

• This query is wrong:• • SELECT make, MIN(SUM(stickerprice))• FROM Car• GROUP BY make

80

• In general, this problem can be solved using a temporary table or a view.

• Anything that can be done with a view can also be done with an inline view, a form of subquery, so in general, the answer is obtainable as an outer query with a nested inner query.

81

• There are a couple of details to review before looking at the answer.

• First of all, keep in mind that if you select an aggregate, if you don't specify an alias for the new field in the result, the system will generate one.

• You would like to have a name for this which you can refer to, so in general, you need to recall the syntax for making a column alias and use this when writing your query:

• • SELECT SUM(stickerprice) AS sumstick• …

82

• Although it seemed pointless earlier to have GROUP BY queries that didn't include the GROUP BY field in the SELECT, it turns out to be a simplification in cases like these to only include the aggregate:

• • SELECT SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make

83

• There are two stages to finding the correct answer to the query mentioned above, where you want to find the minimum of the sum of stickerprices.

• The first stage is to find the grouped sum of stickerprices in an inner, subquery.

• Then in an outer query you select the minimum from the subquery, which plays the role of a table in the query overall.

84

• SELECT MIN(sumstick)• FROM• (SELECT SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make)

85

• Notice that this kind of subquery is basically an inline view.

• It plays the role of the table in the FROM clause of the outer query.

• When the inline view is an aggregation query with GROUP BY, as long as the aggregate is given an alias in the subquery, it is available for use in the outer query as shown.

86

• Suppose you'd like to see both the make and sum of the stickerprices for the make with the minimum sum.

• Unlike in simple inline views, any other field from the inline view, like make, is not available in the outer query.

• Several alternatives that don’t work are shown below.

87

• This won’t work because make isn’t selected in the subquery:

• SELECT make, MIN(sumstick)• FROM• (SELECT SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make)

88

• This won’t work because if you select make in the outer query along with an aggregate, then you would have to group by make:

• • SELECT make, MIN(sumstick)• FROM• (SELECT make, SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make)

89

• This is syntactically correct, but it still doesn’t give the desired results.

• If you group by make, you will get the MIN for each group.• However, each group is already just the SUM, so the results are the

same as just finding the grouped by SUM.

• SELECT make, MIN(sumstick)• FROM• (SELECT make, SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make)• GROUP BY make

90

• If you want to include the make in the results, this can be done with a subquery with an operator, in this case, the equality operator.

• This produces the desired results:

91

• SELECT make, SUM(stickerprice)• FROM Car• GROUP BY make HAVING SUM(stickerprice) =• (SELECT MIN(sumstick)• FROM• (SELECT SUM(stickerprice) AS sumstick• FROM Car• GROUP BY make))

92

• Admittedly, the overall solution is not very straightforward, although each piece individually should be clear enough.

• There is no doubt that a more desirable solution is to have a version of SQL that handles nested aggregates directly.

93

• It is also sad, but true, that if you try to write a query like this, save it, and then open it again in the design view, whether to refresh your memory about a correct solution or to try and fix an incorrect solution, Microsoft Access will have filled your work with needless grouping symbols and other sources of confusion that will make it difficult to work with.

94

The End

95