Intro to tsql unit 4


Transcript of Intro to tsql unit 4

Page 1: Intro to tsql   unit 4

Introduction To SQL Unit 4

Modern Business Technology

Introduction To TSQL Unit 4

Developed by

Michael Hotek

Page 2: Intro to tsql   unit 4

Unit 4

Goals

• Primary keys

• Foreign keys

• Joining tables

• Sub-selects

• Advantages/disadvantages of joins and sub-selects

Page 3: Intro to tsql   unit 4

Relationships

• A database derives its usefulness from containing a group of tables that have some relationship to each other

• An entity is a person, place, or thing of importance to an organization

• An entity generally becomes a table

• Relationships are the connections between tables

• Relationships are usually implemented as keys in a database design

Page 4: Intro to tsql   unit 4

Relationships

• For example, take the titles and publishers tables

• Every publisher publishes a title

• And every title has a publisher

• This relationship is implemented by means of the key pub_id

Page 5: Intro to tsql   unit 4

Relationships

• Relationships come in three different varieties

• One to one

– One row in a table is related to exactly one row in another table

• One to many

– One row in a table is related to one or more rows in another table

• Many to many

– Many rows in a table are related to one or more rows in another table

Page 6: Intro to tsql   unit 4

Relationships

• Implementing a many to many relationship directly is extremely poor database design

• This type of relationship can cause a large amount of confusion

• The problem is that many to many relationships do exist and must be stored in a database

• This is usually resolved by breaking the relationship into multiple one to many relationships through what is known as an intersection table

Page 7: Intro to tsql   unit 4

Relationships

• An intersection table is an artificial construct that is commonly used in RDBMSs

• It does not have any physical meaning, but instead serves to break up a many to many relationship

• The titleauthor table is an example of an intersection table

Page 8: Intro to tsql   unit 4

Relationships

• Relationships are implemented in a database as keys

• Keys are a logical construct; they are not a physical part of the database

• This means that a key does not represent any physically quantifiable item

• You will generally see numbers used as keys in a database (au_id, pub_id, stor_id)

Page 9: Intro to tsql   unit 4

Primary Key

• A primary key is a special type of key that consists of one or more columns that uniquely identify a row

• Primary keys must be unique and cannot contain null values

• A table will only have one primary key

• A primary key will reside on the one side of a 1 - N relationship

Page 10: Intro to tsql   unit 4

Primary Key

• pub_id is the primary key for the publishers table

• This will uniquely identify each publisher in the table

• We do not use the publisher's name, because this could be the same as another publisher

• Also, it is easy to control data input to ensure it is valid

• It is much easier to check 4 digits than 40 characters. Also a name can be null.

Page 11: Intro to tsql   unit 4

Foreign Key

• A foreign key is one or more columns that refer to a primary key of another table

• pub_id is the primary key of publishers

• pub_id is a foreign key in titles

• Example:

A publisher can publish many titles, but a title must have a publisher

This relationship is shown by the primary key of the publishers table being stored with the title that publisher published.
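The publisher/title relationship described above can be sketched as table definitions. This is a minimal illustration, not the actual pubs schema: the demo_publishers and demo_titles tables and the sample rows are hypothetical stand-ins, and the column sizes are assumptions.

```sql
-- Hypothetical demo tables (prefixed demo_ so they do not clash with pubs)
create table demo_publishers (
    pub_id   char(4)     not null primary key,   -- the "one" side of the 1 - N relationship
    pub_name varchar(40) null
)

create table demo_titles (
    title_id varchar(6)  not null primary key,
    title    varchar(80) not null,
    pub_id   char(4)     not null
        references demo_publishers (pub_id)      -- foreign key: a title must have a publisher
)

insert into demo_publishers values ('0736', 'New Moon Books')
insert into demo_titles values ('BU2075', 'You Can Combat Computer Stress!', '0736')

-- This insert is rejected: there is no publisher 9999 for the foreign key to refer to
insert into demo_titles values ('XX0001', 'Orphan Title', '9999')

-- Clean up (child table first, because of the foreign key)
drop table demo_titles
drop table demo_publishers
```

Regular tables are used here rather than temporary tables because SQL Server does not enforce foreign key constraints on temporary tables.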

Page 12: Intro to tsql   unit 4

Composite Keys

• A primary key and a foreign key can consist of more than one column

• When a key contains more than one column, it is known as a composite key

• The primary key of the titleauthor table is a composite (au_id,title_id)
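A composite key on an intersection table might look like the sketch below. The demo_titleauthor table is a hypothetical simplification of titleauthor (only the two key columns are kept), and the rows are illustrative.

```sql
-- Hypothetical, simplified intersection table between authors and titles
create table demo_titleauthor (
    au_id    varchar(11) not null,
    title_id varchar(6)  not null,
    primary key (au_id, title_id)    -- composite key: the pair must be unique
)

insert into demo_titleauthor values ('213-46-8915', 'BU1032')
insert into demo_titleauthor values ('409-56-7008', 'BU1032')   -- same title, second author
insert into demo_titleauthor values ('213-46-8915', 'BU2075')   -- same author, second title

-- This insert is rejected: the (au_id, title_id) pair already exists
insert into demo_titleauthor values ('213-46-8915', 'BU1032')

drop table demo_titleauthor
```

Neither column is unique on its own, which is what lets one author have many titles and one title have many authors.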

Page 13: Intro to tsql   unit 4

Indexes

• A discussion of indexes is well beyond the scope of this course, but there are a few naming items that can be noted

• Keys will be implemented in a database as an object called an index

• An index could be a primary key, a foreign key, or neither

• A primary key is the same as a primary index or unique index

• A foreign key is the same as a foreign index

Page 14: Intro to tsql   unit 4

Joins

• Up to this point we have confined our queries to a single table

• While this is done primarily for retrieving data into transaction processing applications, it doesn't represent a real world application of data querying

• All of the data in a database is segmented into tables and we generally need data from more than one table to show what we need

• To accomplish this, we use a join

Page 15: Intro to tsql   unit 4

Joins

• You will notice that there is no such thing as a join clause in our SQL syntax

• A join is simply a where clause

• A join is generally constructed between one primary key and another primary key or between a primary key and a foreign key

• (Discussion of PK/FK symbols on the ER Diagram)

Page 16: Intro to tsql   unit 4

Joins

• Suppose we want to view a list of sales for each store

• We could simply do the following:

select * from sales

stor_id ord_num ord_date qty payterms...

------- ------------ ---------------------- ------ --------

6380 6871 Sep 14 1994 12:00AM 5 Net 60...

6380 722a Sep 13 1994 12:00AM 3 Net 60...

7066 A2976 May 24 1993 12:00AM 50 Net 30...

7066 QA7442.3 Sep 13 1994 12:00AM 75 ON invoice

7067 D4482 Sep 14 1994 12:00AM 10 Net 60...

7067 P2121 Jun 15 1992 12:00AM 40 Net 30...

7067 P2121 Jun 15 1992 12:00AM 20 Net 30...

7067 P2121 Jun 15 1992 12:00AM 20 Net 30...

7131 N914008 Sep 14 1994 12:00AM 20 Net 30...

7131 N914014 Sep 14 1994 12:00AM 25 Net 30...

7131 P3087a May 29 1993 12:00AM 20 Net 60...

...

(22 row(s) affected)

• But, the stor_id is meaningless to us

Page 17: Intro to tsql   unit 4

Joins

• What we want to see is the store name along with the order number and quantity for each sale

select stor_name,ord_num,qty from stores,sales where stores.stor_id = sales.stor_id

stor_name ord_num qty

---------------------------------------- -------------------- ------

Eric the Read Books 6871 5

Eric the Read Books 722a 3

Barnum's A2976 50

Barnum's QA7442.3 75

News & Brews D4482 10

News & Brews P2121 40

News & Brews P2121 20

News & Brews P2121 20

Doc-U-Mat: Quality Laundry and Books N914008 20

Doc-U-Mat: Quality Laundry and Books N914014 25

Doc-U-Mat: Quality Laundry and Books P3087a 20

Doc-U-Mat: Quality Laundry and Books P3087a 25

Doc-U-Mat: Quality Laundry and Books P3087a 15

Doc-U-Mat: Quality Laundry and Books P3087a 25

Fricative Bookshop QQ2299 15

Fricative Bookshop TQ456 10

Fricative Bookshop X999 35

Bookbeat 423LL922 15

...

(22 row(s) affected)

Page 18: Intro to tsql   unit 4

Joins

• What does this mean?

• The select clause simply designates which columns we want to see. If we were retrieving a column that had the same name in the two tables, we would have to specify which table the data was coming from

select stores.stor_id,stor_name, ord_num, qty

from stores,sales

where stores.stor_id = sales.stor_id

Page 19: Intro to tsql   unit 4

Joins

• From clause

• We are retrieving data from more than one table, so each table must be specified in the from clause

• The from clause can be seen as the main driver of a SQL statement

• If the table isn't in the from clause, none of its columns can be used in any other clause

Page 20: Intro to tsql   unit 4

Joins

• The where clause

where stores.stor_id = sales.stor_id

• This tells the DBMS to take the first store ID in the stores table and add the data from the corresponding store ID in the sales table to the data retrieved from the stores table

• It then continues with the second store ID, etc. until it reaches the end of the table
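The matching process described above can be tried on a small scale. The sketch below uses temporary tables #stores and #sales as hypothetical cut-down copies of the pubs tables, and the store 9999 is invented to show a store with no sales. The second query uses the ANSI-92 inner join syntax, which is assumed to be available on the server being used.

```sql
create table #stores (stor_id char(4) not null, stor_name varchar(40) null)
create table #sales  (stor_id char(4) not null, ord_num varchar(20) not null, qty smallint not null)

insert into #stores values ('6380', 'Eric the Read Books')
insert into #stores values ('7066', 'Barnum''s')
insert into #stores values ('9999', 'Store With No Sales')   -- hypothetical row

insert into #sales values ('6380', '6871', 5)
insert into #sales values ('6380', '722a', 3)
insert into #sales values ('7066', 'A2976', 50)

-- Join written as a where clause, as in the slides: returns the 3 sales rows
select s.stor_name, sa.ord_num, sa.qty
from #stores s, #sales sa
where s.stor_id = sa.stor_id

-- The same join in ANSI-92 syntax: returns the same 3 rows
select s.stor_name, sa.ord_num, sa.qty
from #stores s
inner join #sales sa on s.stor_id = sa.stor_id

drop table #sales
drop table #stores
```

Store 9999 appears in neither result, because it has no matching row in #sales.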

Page 21: Intro to tsql   unit 4

Joins

• A join can be seen as a special type of selection criteria.

• If there is a stor_id in the stores table that does not exist in the sales table, the data for that particular store will not be returned

• You can also add additional selection criteria

select stores.stor_id,stor_name,city, state,ord_num,qty

from stores,sales

where stores.stor_id = sales.stor_id and state = 'CA'

Page 22: Intro to tsql   unit 4

Joins

• As we have seen, you can specify just those columns that you want to see

• For instance we are just concerned with the quantity of sales in CA

select sum(qty) from stores,sales where stores.stor_id = sales.stor_id and state = 'CA'

-----------

275

(1 row(s) affected)

Page 23: Intro to tsql   unit 4

Joins

• The type of join we have examined so far is also referred to as an equi-join or an inner join

• In the case of stores and sales, we could have a store that doesn't have any sales

• If we use the equi-join, we will not see these stores that do not have any sales

• So, how do we get the list of sales for all stores regardless of whether they have any sales?

Page 24: Intro to tsql   unit 4

Outer Joins

• We accomplish this via an outer join

select stores.stor_id,stor_name, ord_num, qty from stores,sales where stores.stor_id *= sales.stor_id

stor_id stor_name ord_num qty

------- ------------------------------------ -------------------- ----

6380 Eric the Read Books 6871 5

6380 Eric the Read Books 722a 3

7066 Barnum's A2976 50

7066 Barnum's QA7442.3 75

7067 News & Brews D4482 10

7067 News & Brews P2121 40

7067 News & Brews P2121 20

7067 News & Brews P2121 20

7131 Doc-U-Mat: Quality Laundry and Books N914008 20

7131 Doc-U-Mat: Quality Laundry and Books N914014 25

7131 Doc-U-Mat: Quality Laundry and Books P3087a 20

7131 Doc-U-Mat: Quality Laundry and Books P3087a 25

7131 Doc-U-Mat: Quality Laundry and Books P3087a 15

7131 Doc-U-Mat: Quality Laundry and Books P3087a 25

7896 Fricative Bookshop QQ2299 15

7896 Fricative Bookshop TQ456 10

7896 Fricative Bookshop X999 35

8042 Bookbeat 423LL922 15

8042 Bookbeat 423LL930 10

8042 Bookbeat 756756 5

8042 Bookbeat P723 25

8042 Bookbeat QA879.1 30

(22 row(s) affected)

Page 25: Intro to tsql   unit 4

Outer Joins

• Notice the use of the asterisk (*)

where stores.stor_id *= sales.stor_id

• This tells the DBMS to return all of the rows in the stores table with the corresponding data in the sales table, and not to drop any store IDs that are not in the sales table

• Note: The use of an asterisk to designate an outer join is specific to SQL Server (Sybase and MS). Most other DBMSs support a slightly different syntax, as does the ANSI-92 standard

Page 26: Intro to tsql   unit 4

Outer Joins

• Outer joins come in three different flavors

– Left

– Right

– Full

• A left outer join is the same thing as a right outer join except for the order in which the tables are listed

• Left: stores.stor_id *= sales.stor_id

• Right: sales.stor_id =* stores.stor_id

• Every left outer join can also be expressed as a right outer join and vice versa
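Recent versions of Microsoft SQL Server no longer accept the *= and =* operators, so the ANSI-92 form is worth seeing alongside them. The sketch below is an illustration only: #stores and #sales are hypothetical cut-down tables, and store 9999 is invented so that one store has no sales.

```sql
create table #stores (stor_id char(4) not null, stor_name varchar(40) null)
create table #sales  (stor_id char(4) not null, ord_num varchar(20) not null, qty smallint not null)

insert into #stores values ('6380', 'Eric the Read Books')
insert into #stores values ('9999', 'Store With No Sales')   -- hypothetical row

insert into #sales values ('6380', '6871', 5)
insert into #sales values ('6380', '722a', 3)

-- Left outer join: every store is kept; ord_num and qty are null for store 9999
select s.stor_id, s.stor_name, sa.ord_num, sa.qty
from #stores s
left outer join #sales sa on s.stor_id = sa.stor_id

-- The same result written as a right outer join, with the tables in the opposite order
select s.stor_id, s.stor_name, sa.ord_num, sa.qty
from #sales sa
right outer join #stores s on sa.stor_id = s.stor_id

drop table #sales
drop table #stores
```

Both queries return three rows: two sales for store 6380 and one row for store 9999 with nulls in the sales columns.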

Page 27: Intro to tsql   unit 4

Full Outer Join

• A full outer join is included here for completeness

• You should use a full outer join ONLY under very specific circumstances

• A full outer join returns every row from both tables: rows that satisfy the join condition are combined, and rows without a match on either side are returned with nulls in the columns of the other table

• It should not be confused with a cross product, which is what you get when two tables are joined without a join condition. If you have one table with 100 rows and another with 1000 rows, a cross product will produce a result set of 100,000 rows

Page 28: Intro to tsql   unit 4

Full Outer Join

• This is because without a join condition, you are telling the database to give every combination of rows possible

• i.e. Each row is matched to every row in the other table

• This type of query will almost never be performed and should be avoided at all costs.

• The first time you inadvertently fire one of these off, you will get a rather angry call from your DBA
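The row counts involved can be checked on two tiny tables. This is a sketch using hypothetical temporary tables #a and #b and the ANSI-92 full outer join and cross join keywords, which are assumed to be supported by the server in use.

```sql
create table #a (id int not null)
create table #b (id int not null)

insert into #a values (1)
insert into #a values (2)
insert into #a values (3)

insert into #b values (2)
insert into #b values (3)
insert into #b values (4)
insert into #b values (5)

-- Full outer join: 2 matched rows (ids 2 and 3)
-- + 1 unmatched row from #a (id 1) + 2 unmatched rows from #b (ids 4 and 5) = 5 rows
select a.id as a_id, b.id as b_id
from #a a
full outer join #b b on a.id = b.id

-- Cross product: every row of #a paired with every row of #b = 3 * 4 = 12 rows
select a.id as a_id, b.id as b_id
from #a a
cross join #b b

drop table #a
drop table #b
```

Writing from #a, #b with no where clause produces the same 12 rows as the cross join, which is how a cross product is usually fired off by accident.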

Page 29: Intro to tsql   unit 4

Subqueries

• Subqueries are simply a SQL statement nested inside of another SQL statement

• The most common place to do this is in a where or having clause.

select [distinct] select_list

from table_list

where {expression {[not] in | comparison [any|all]}|[not] exists}

(select [distinct] subquery_select_list from table_list where conditions)

[group by group_by_list]

[having conditions]

[order by order_by_list]

Page 30: Intro to tsql   unit 4

Subqueries

• Subqueries come in two basic kinds: correlated and noncorrelated

• A noncorrelated subquery is one in which the inner query is independent, gets evaluated first, and passes its result set back to the outer query

• A correlated subquery is one in which the inner query is dependent upon the results from the outer query

Page 31: Intro to tsql   unit 4

Subqueries

• Below are examples of these two kinds

• noncorrelated:

select pub_name from publishers

where pub_id in (select pub_id from titles

where type = 'business')

• correlated:

select pub_name from publishers p

where 'business' in (select type from titles where pub_id = p.pub_id)

• As is the case with most of the subqueries, you can also express them as a join
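The three forms can be compared side by side. In this sketch #publishers and #titles are hypothetical cut-down tables with a handful of illustrative rows; all three queries are expected to return the same two publishers.

```sql
create table #publishers (pub_id char(4) not null, pub_name varchar(40) null)
create table #titles (title_id varchar(6) not null, type varchar(12) not null, pub_id char(4) null)

insert into #publishers values ('0736', 'New Moon Books')
insert into #publishers values ('0877', 'Binnet & Hardley')
insert into #publishers values ('1389', 'Algodata Infosystems')

insert into #titles values ('BU1032', 'business', '1389')
insert into #titles values ('BU2075', 'business', '0736')
insert into #titles values ('MC2222', 'mod_cook', '0877')

-- Noncorrelated: the inner query can be run on its own
select pub_name from #publishers
where pub_id in (select pub_id from #titles where type = 'business')

-- Correlated: the inner query refers to p.pub_id from the outer query
select pub_name from #publishers p
where 'business' in (select type from #titles t where t.pub_id = p.pub_id)

-- The same question expressed as a join
select distinct p.pub_name
from #publishers p, #titles t
where p.pub_id = t.pub_id and t.type = 'business'

drop table #titles
drop table #publishers
```

With these rows each query returns New Moon Books and Algodata Infosystems.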

Page 32: Intro to tsql   unit 4

Subqueries

• Subqueries also come in three different types:

• They return zero or more items

• They return exactly one item

• They test for existence of a value

• If you have a subquery of the first type it must be preceded by an IN. where column = (select…) will return an error if the subquery returns more than one item
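The difference between = and IN can be seen with a small table. The #titles table and its rows below are hypothetical; the last query is included on purpose to show the error.

```sql
create table #titles (title_id varchar(6) not null, type varchar(12) not null, price money null)

insert into #titles values ('BU1032', 'business', 19.99)
insert into #titles values ('PS1372', 'psychology', 21.59)
insert into #titles values ('MC3021', 'mod_cook', 2.99)

-- Exactly one item: the aggregate guarantees a single value, so = is allowed
select title_id from #titles
where price = (select max(price) from #titles)

-- Zero or more items: the inner query returns two types here, so IN is required
select title_id from #titles
where type in (select type from #titles where price > 15)

-- Error: the same inner query behind = returns more than one value
select title_id from #titles
where type = (select type from #titles where price > 15)

drop table #titles
```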

Page 33: Intro to tsql   unit 4

Noncorrelated Subqueries

• At a conceptual level, a noncorrelated subquery is executed in two parts.

• First the inner query is executed

• It then passes its results back to the outer query which then finds the rows that match the list passed back

• The column names are resolved implicitly based upon the from clause of the corresponding query. You can always explicitly define the table name.

• This is recommended for complex subqueries

Page 34: Intro to tsql   unit 4

Correlated Subqueries

• Processing of a correlated subquery is much more complicated, but these can handle queries you can't easily do with noncorrelated subqueries or joins

• A correlated subquery depends on data from the outer query

• The inner query will execute once for each row in the outer query

Page 35: Intro to tsql   unit 4

Correlated Subqueries

• The outer query retrieves the first row of data and passes the data values to the inner query

• The inner query finds all rows that match the data passed from the outer query

• Finally the rows from the inner query are checked against the conditions in the outer query

• If one or more rows match the conditions, the data corresponding to that row will be returned to the user
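A query that is awkward to write without a correlated subquery is "which titles cost more than the average for their own type". The sketch below uses a hypothetical #titles table with illustrative rows; the inner query is re-evaluated for the type of each outer row.

```sql
create table #titles (title_id varchar(6) not null, type varchar(12) not null, price money not null)

insert into #titles values ('BU1032', 'business', 19.99)
insert into #titles values ('BU1111', 'business', 11.95)
insert into #titles values ('PS1372', 'psychology', 21.59)
insert into #titles values ('PS2091', 'psychology', 10.95)

-- For each row t1, the inner query averages only the titles of t1's type
-- (business average 15.97, psychology average 16.27)
select t1.title_id, t1.type, t1.price
from #titles t1
where t1.price > (select avg(t2.price) from #titles t2 where t2.type = t1.type)

drop table #titles
```

With these rows the query returns BU1032 and PS1372, the one title in each type priced above that type's average.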

Page 36: Intro to tsql   unit 4

Joins or Subqueries

select distinct pub_name from publishers, authors where publishers.city = authors.city

AND

select pub_name from publishers where city in

(select city from authors)

will return the same results

• But if you want data from both the publishers and authors tables, you must use a join.

select pub_name,au_fname,au_lname

from publishers,authors

where publishers.city = authors.city

Page 37: Intro to tsql   unit 4

Joins or Subqueries

select au_lname,au_fname,city from authors where city in (select city from authors where au_fname = 'Dick' and au_lname = 'Straight')

can also be expressed as

select a1.au_lname,a1.au_fname,a1.city from authors a1, authors a2 where a1.city = a2.city and a2.au_fname = 'Dick' and a2.au_lname = 'Straight'

• This is referred to as a self join
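A self join needs two aliases for the same table so that each column reference is unambiguous. The sketch below runs both forms against a hypothetical three-row #authors table; the rows are illustrative.

```sql
create table #authors (au_lname varchar(40) not null, au_fname varchar(20) not null, city varchar(20) null)

insert into #authors values ('Straight', 'Dick', 'Oakland')
insert into #authors values ('Green', 'Marjorie', 'Oakland')
insert into #authors values ('White', 'Johnson', 'Menlo Park')

-- Subquery form
select au_lname, au_fname, city from #authors
where city in (select city from #authors where au_fname = 'Dick' and au_lname = 'Straight')

-- Self join form: a2 locates Dick Straight, a1 lists everyone in the same city
select a1.au_lname, a1.au_fname, a1.city
from #authors a1, #authors a2
where a1.city = a2.city
  and a2.au_fname = 'Dick' and a2.au_lname = 'Straight'

drop table #authors
```

Both queries return the two Oakland authors, including Dick Straight himself.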

Page 38: Intro to tsql   unit 4

Joins or Subqueries

• Whether you use joins or subqueries is usually a matter of choice

• Most joins can be expressed as subqueries and vice versa

• Calculating an aggregate and using this in the selection criteria is an advantage of subqueries

select title,price from titles

where price = (select min(price) from titles)

• Displaying data from multiple tables is usually done with a join

Page 39: Intro to tsql   unit 4

Common Restrictions

• The select list of an inner query introduced by an IN can have only one column. This column must also be join compatible with the column in the where clause of the outer query

• Subqueries introduced by an unmodified comparison operator (not followed by ANY or ALL) cannot include a group by or having clause unless this will force the inner query to return a single value

• Subqueries cannot manipulate their results internally. i.e. They cannot contain an order by or the keyword INTO

Page 40: Intro to tsql   unit 4

Any and All

• You use the ANY and ALL keywords with a comparison operator in a subquery

• > ALL means greater than every value in the results of the inner query (> maximum value)

• > ANY means greater than at least one value in the results of the inner query (> minimum value)
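The behavior of ANY and ALL can be checked against the list (1,2,3). In this sketch #vals holds that list and #candidates holds the values being tested; both tables are hypothetical.

```sql
create table #vals (n int not null)
create table #candidates (n int not null)

insert into #vals values (1)
insert into #vals values (2)
insert into #vals values (3)

insert into #candidates values (0)
insert into #candidates values (1)
insert into #candidates values (2)
insert into #candidates values (3)
insert into #candidates values (4)

-- > all (1,2,3) behaves like > 3: returns 4
select n from #candidates where n > all (select n from #vals)

-- > any (1,2,3) behaves like > 1: returns 2, 3, 4
select n from #candidates where n > any (select n from #vals)

-- < all (1,2,3) behaves like < 1: returns 0
select n from #candidates where n < all (select n from #vals)

-- = any (1,2,3) behaves like in (1,2,3): returns 1, 2, 3
select n from #candidates where n = any (select n from #vals)

drop table #candidates
drop table #vals
```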

Page 41: Intro to tsql   unit 4

Any and All

ALL              Results                  ANY              Results

> all (1,2,3)    > 3                      > any (1,2,3)    > 1

< all (1,2,3)    < 1                      < any (1,2,3)    < 3

= all (1,2,3)    = 1 and = 2 and = 3      = any (1,2,3)    = 1 or = 2 or = 3

select title from titles where advance > all

(select advance from publishers,titles where titles.pub_id = publishers.pub_id and pub_name = 'New Age Books')

title

--------------------------------------------------------------------------

The Busy Executive's Database Guide

Cooking with Computers: Surreptitious Balance Sheets

You Can Combat Computer Stress!

Straight Talk About Computers

Silicon Valley Gastronomic Treats

The Gourmet Microwave

The Psychology of Computer Cooking

But Is It User Friendly?

Secrets of Silicon Valley

Net Etiquette

Computer Phobic AND Non-Phobic Individuals: Behavior Variations

Is Anger the Enemy?

Life Without Fear

Prolonged Data Deprivation: Four Case Studies

...

(18 row(s) affected)

Page 42: Intro to tsql   unit 4

Exists

• The last type of subquery is used to test for the existence of something

• To find all of the publishers who publish business books we would do the following:

select distinct pub_name from publishers where exists (select 1 from titles where pub_id = publishers.pub_id and type = 'business')

pub_name

----------------------------------------

Algodata Infosystems

New Moon Books

(2 row(s) affected)

Page 43: Intro to tsql   unit 4

Additional Restrictions

• A subquery that tests for existence will either contain one column or an asterisk in the select list. It makes no sense to include a column list, because this type of query simply tests to see if a row exists and does not return any data

Page 44: Intro to tsql   unit 4

Nesting Subqueries

• A subquery may contain another subquery

• In fact you can nest as many levels as you need. However, for most applications more than four levels is an indication of poor database design

select au_lname,au_fname from authors where au_id in (select au_id from titleauthor where title_id in (select title_id from titles where type = 'popular_comp'))

• This will return the list of authors who have written at least one popular computing book
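The nested query can be followed from the inside out on a small data set. The temporary tables below are hypothetical cut-down versions of authors, titleauthor, and titles with illustrative rows.

```sql
create table #authors (au_id varchar(11) not null, au_lname varchar(40) not null, au_fname varchar(20) not null)
create table #titleauthor (au_id varchar(11) not null, title_id varchar(6) not null)
create table #titles (title_id varchar(6) not null, type varchar(12) not null)

insert into #authors values ('213-46-8915', 'Green', 'Marjorie')
insert into #authors values ('238-95-7766', 'Carson', 'Cheryl')

insert into #titleauthor values ('213-46-8915', 'BU1032')
insert into #titleauthor values ('238-95-7766', 'PC1035')

insert into #titles values ('BU1032', 'business')
insert into #titles values ('PC1035', 'popular_comp')

-- Innermost query: popular_comp titles (PC1035)
-- Middle query: authors of those titles (238-95-7766)
-- Outer query: names for those author IDs
select au_lname, au_fname from #authors
where au_id in
    (select au_id from #titleauthor
     where title_id in
         (select title_id from #titles where type = 'popular_comp'))

drop table #titles
drop table #titleauthor
drop table #authors
```

With these rows the query returns a single author, Carson, Cheryl.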

Page 45: Intro to tsql   unit 4

Unit 4 Review

• Relationships are connections between tables

• A primary key is that set of columns that define a unique row

• You can have only one primary key per table

• A foreign key is one or more columns that refer to a primary key of the same or another table

• You can have up to 255 foreign keys per table

• A composite key consists of more than one column

• Joins are used when you need to retrieve data from more than one table

• The two kinds of joins are: equijoin and outer join

• An outer join comes in three flavors: left, right, full

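• A full outer join returns all rows from both tables, matched where possible; joining two tables without a join condition will produce the cross product of the two tables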

• Subqueries are nested SQL statements

Page 46: Intro to tsql   unit 4

Unit 4 Review

• Subqueries come in two kinds: correlated and noncorrelated

• Subqueries are also of three different types: return exactly one item, return zero or more items, and test for existence

• Most joins can be written as a subquery and vice versa

• A subquery is used when you need to include an aggregate in the where conditions

• A join is used when you want to retrieve data from more than one table

• All means every value (> all means greater than every value)

• Any means at least one value (> any means greater than at least one value)

• Exists allows us to test to see if a value exists

• Exists queries are used with correlated subqueries

• Subqueries can be nested any number of levels

Page 47: Intro to tsql   unit 4

Unit 4 Exercises

• Time allotted for exercises is 1 hour