Tech Jam 01 - Database Querying

Post on 11-Apr-2017


Database Querying

Tech Jam - 25 November 2015

Rodger Oates

Database Querying

• STUFF

• COALESCE vs ISNULL

• The LIKE Predicate

• Combining Predicates

• Joining Queries

• Table Expressions

• Window Functions

A Quick Note on Query Processing

1. FROM

2. JOINS

3. WHERE

4. GROUP BY

5. CUBE | ROLLUP

6. HAVING

7. SELECT

8. DISTINCT

9. ORDER BY

10. TOP

STUFF

REPLACE

• Used to replace parts of a string

• Replaces every occurrence of the search string throughout the original string

• Format

• REPLACE(<original>, <string to replace>, <replacement>)

• Note: <string to replace> and <replacement> do not need to be equal in length

STUFF

• Used to insert one string into another

• Replaces characters in range specified with given string

• Format:

• STUFF(<original>, <start index>, <length>, <string to stuff>)

• Examples of use:

• www.sql-server-helper.com/
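To make the difference between REPLACE and STUFF concrete, here is a minimal sketch in Python. It assumes SQLite (through the standard `sqlite3` module) as a stand-in for SQL Server. SQLite has a REPLACE function but no STUFF, so `stuff` below is a hypothetical helper that only approximates the T-SQL behaviour described above (1-based start index, NULL for an out-of-range start or a negative length).

```python
import sqlite3

def stuff(original, start, length, insert):
    """Rough Python model of T-SQL STUFF; not an official implementation."""
    # T-SQL returns NULL when start is out of range or length is negative.
    if start < 1 or start > len(original) or length < 0:
        return None
    head = original[:start - 1]           # characters before the range
    tail = original[start - 1 + length:]  # characters after the range
    return head + insert + tail

conn = sqlite3.connect(":memory:")

# REPLACE swaps every occurrence; the replacement may differ in length.
replaced = conn.execute("SELECT REPLACE('a-b-c', '-', ' and ')").fetchone()[0]

# STUFF removes 3 characters starting at position 2, then inserts the new string.
stuffed = stuff("abcdef", 2, 3, "ijklmn")

print(replaced)  # a and b and c
print(stuffed)   # aijklmnef
```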

For XML PATH

• Can be used with STUFF to translate a table of results into a single result.

• Real world example:

• Comma delimited list of names

• E.g. in names of users assigned to a given site

• See: Usage #6 in www.sql-server-helper.com

COALESCE vs ISNULL

ISNULL

• T-SQL specific

• Takes two inputs

• Format:

• ISNULL(<expression 1>, <expression 2>)

• Returns <expression 1> if not null, else <expression 2>

• Note: Data type of result is same as <expression 1> regardless of which expression is returned

COALESCE

• Standard SQL

• Takes any number of inputs

• Format:

• COALESCE(<expression 1>, <expression 2>, …, <expression n>)

• Returns the first input expression that is not NULL

• Note: Data type of result is that of the input expression with the highest data type precedence, not necessarily that of the returned expression
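A minimal sketch of the "first non-NULL input" behaviour, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. ISNULL is T-SQL specific, so SQLite's two-argument `IFNULL` is used here as its nearest counterpart. The `contacts` table and its columns are hypothetical, and the data type rules noted above are not reproduced by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT, email TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?)", [
    ("Ann", "0700", None, "ann@example.com"),
    ("Bob", None, None, "bob@example.com"),
    ("Cat", None, None, None),
])

rows = conn.execute("""
    SELECT name,
           COALESCE(mobile, landline, email) AS first_known,  -- any number of inputs
           IFNULL(mobile, 'none')            AS mobile_or_default  -- exactly two inputs
    FROM contacts
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
# ('Ann', '0700', '0700')
# ('Bob', 'bob@example.com', 'none')
# ('Cat', None, 'none')
```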

The LIKE Predicate

Why LIKE?

• Used to find strings that are similar to a search term

• Without wildcards, the predicate only matches the search term in its entirety; whether the match is case-insensitive depends on the collation in use

• With wildcards, we can find matches where only part of the string matches our term

• Search terms that begin with wildcard(s) prevent SQL Server from using index ordering to speed up the query

• E.g. UserFullName LIKE '%oates%'
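The effect of wildcards can be sketched as follows, assuming SQLite via Python's `sqlite3` in place of SQL Server. SQLite's LIKE is case-insensitive for ASCII characters by default, which happens to match the behaviour described above. The `users` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (UserFullName TEXT)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [("Rodger Oates",), ("OATES",), ("Ann Smith",)])

def names(pattern):
    """Return the names matching a LIKE pattern, in alphabetical order."""
    sql = "SELECT UserFullName FROM users WHERE UserFullName LIKE ? ORDER BY UserFullName"
    return [r[0] for r in conn.execute(sql, (pattern,))]

whole_term = names("oates")    # no wildcard: the entire string must match
anywhere = names("%oates%")    # leading and trailing wildcard: match anywhere

print(whole_term)  # ['OATES']
print(anywhere)    # ['OATES', 'Rodger Oates']
```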

Combining Predicates

Types of combinations

• AND

• Predicates on both sides must be true

• OR

• At least one of the predicates must be true

• NOT

• Negates the predicate that follows it

• Note: when a predicate evaluates to NULL (unknown), the NOT of this predicate is still NULL

Natural Precedence Rules for Combinations

1. NOT

2. AND

3. OR

e.g.

WHERE col1 = 'w' AND col2 = 'x' OR col3 = 'y' AND col4 = 'z'

Use of parentheses

• Use parentheses when predicates need a different precedence

• Previous example would equate to:

• WHERE (col1 = 'w' AND col2 = 'x') OR (col3 = 'y' AND col4 = 'z')

• However, we could use:

• WHERE col1 = 'w' AND (col2 = 'x' OR col3 = 'y') AND col4 = 'z'
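The precedence rules can be checked with a small sketch, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. The table `t` and its three rows are hypothetical, chosen so that the two parenthesised forms give different results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, "w", "x", "-", "-"),
    (2, "w", "-", "y", "z"),
    (3, "-", "-", "y", "z"),
])

def ids(where):
    """Return the ids of the rows that satisfy the given WHERE clause."""
    return [r[0] for r in conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")]

natural = ids("col1 = 'w' AND col2 = 'x' OR col3 = 'y' AND col4 = 'z'")
explicit = ids("(col1 = 'w' AND col2 = 'x') OR (col3 = 'y' AND col4 = 'z')")
regrouped = ids("col1 = 'w' AND (col2 = 'x' OR col3 = 'y') AND col4 = 'z'")

# NOT applied to an unknown (NULL) result is still NULL.
not_unknown = conn.execute("SELECT NOT (NULL = 1)").fetchone()[0]

print(natural)      # [1, 2, 3]
print(explicit)     # [1, 2, 3]
print(regrouped)    # [2]
print(not_unknown)  # None
```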

Joining Queries

Joining Queries

• Types of Join

• Multi-Join Queries

• EXISTS

• Set Operators

• UNION vs UNION ALL

• INTERSECT

• EXCEPT

Types of Join

• CROSS JOIN

• Cartesian product of two tables

• Does not require a link between the tables

• INNER JOIN

• Requires a predicate between the tables (specified using the ON keyword)

• Filters out any rows that do not match the ON predicate

• SELF-JOIN

• Is simply a join (typically an INNER JOIN) of a table to itself, using table aliases

• Joins records in a table with other records in the same table

OUTER JOINS

• Outer joins preserve at least one side of the join if there is no match

• LEFT OUTER JOIN preserves the left table

• E.g. <table A> LEFT OUTER JOIN <table B> preserves <table A>

• RIGHT OUTER JOIN preserves the right table

• E.g. <table A> RIGHT OUTER JOIN <table B> preserves <table B>

• FULL OUTER JOIN preserves both tables where there is no match

• Further information on joins at TechNet
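A minimal sketch of the join types, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. Only CROSS, INNER and LEFT OUTER joins are shown, because older SQLite versions lack RIGHT and FULL OUTER JOIN. The `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid INTEGER, name TEXT);
    CREATE TABLE orders (orderid INTEGER, custid INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');  -- Bob has no orders
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# CROSS JOIN: every customer paired with every order, no ON predicate.
cross_rows = conn.execute(
    "SELECT c.name, o.orderid FROM customers AS c CROSS JOIN orders AS o"
).fetchall()

# INNER JOIN: only rows that satisfy the ON predicate survive.
inner_rows = conn.execute("""
    SELECT c.name, o.orderid
    FROM customers AS c
    INNER JOIN orders AS o ON o.custid = c.custid
    ORDER BY c.name, o.orderid
""").fetchall()

# LEFT OUTER JOIN: customers are preserved even without a matching order.
left_rows = conn.execute("""
    SELECT c.name, o.orderid
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.custid = c.custid
    ORDER BY c.name, o.orderid
""").fetchall()

print(len(cross_rows))  # 4
print(inner_rows)       # [('Ann', 10), ('Ann', 11)]
print(left_rows)        # [('Ann', 10), ('Ann', 11), ('Bob', None)]
```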

A Note on NULLs within JOINs

• Note: When determining row inclusion via a predicate, a row with a NULL on either side of the comparison will be discarded, because the comparison evaluates to unknown

• NULL is not a value; it marks a value as unknown

Multi-Join Queries

• Often need to look at multiple tables within the same query

• Use joins to link each table together

• Joins are logically evaluated in the order in which they are written

• Need to be careful when combining outer joins with others, as the results may not be what you expect

Why was a supplier not returned?

Source: Exam 70-461 Training Kit
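One way a supplier can go missing is sketched below, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. The tables and rows are invented for illustration and are not the Training Kit's data. A supplier preserved by a LEFT OUTER JOIN carries NULLs in the product columns, so a later INNER JOIN on those columns discards it again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplierid INTEGER, companyname TEXT);
    CREATE TABLE categories (categoryid INTEGER, categoryname TEXT);
    CREATE TABLE products (productid INTEGER, productname TEXT,
                           supplierid INTEGER, categoryid INTEGER);
    INSERT INTO suppliers VALUES (1, 'S1'), (2, 'S2');  -- S2 has no products
    INSERT INTO categories VALUES (1, 'C1');
    INSERT INTO products VALUES (100, 'P1', 1, 1);
""")

query = """
    SELECT s.companyname, p.productname, c.categoryname
    FROM suppliers AS s
    LEFT OUTER JOIN products AS p ON p.supplierid = s.supplierid
    {second_join} categories AS c ON c.categoryid = p.categoryid
    ORDER BY s.companyname
"""

# The INNER JOIN compares NULL = categoryid for S2, so S2 is filtered out.
lost = conn.execute(query.format(second_join="INNER JOIN")).fetchall()

# Keeping the second join as an outer join preserves S2.
kept = conn.execute(query.format(second_join="LEFT OUTER JOIN")).fetchall()

print(lost)  # [('S1', 'P1', 'C1')]
print(kept)  # [('S1', 'P1', 'C1'), ('S2', None, None)]
```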

EXISTS

• Accepts a subquery as input:

• WHERE EXISTS (SELECT … FROM <table>)

• Returns true if the subquery returns at least one row

• EXISTS doesn’t need to return the result of the subquery

• Rather, true or false depending on whether rows would have been returned by the subquery

• Can be negated:

• WHERE NOT EXISTS (SELECT … FROM <table>)
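A short sketch of EXISTS and NOT EXISTS with a correlated subquery, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. The `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid INTEGER, name TEXT);
    CREATE TABLE orders (orderid INTEGER, custid INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1);
""")

template = """
    SELECT c.name
    FROM customers AS c
    WHERE {negate} EXISTS (SELECT * FROM orders AS o WHERE o.custid = c.custid)
    ORDER BY c.name
"""

# Only the existence of a matching row matters, not what the subquery selects.
with_orders = [r[0] for r in conn.execute(template.format(negate=""))]
without_orders = [r[0] for r in conn.execute(template.format(negate="NOT"))]

print(with_orders)     # ['Ann']
print(without_orders)  # ['Bob']
```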

Set Operator Guidelines

• Complete rows are matched between the input sets

• Number of columns in all queries must be the same

• Column types for corresponding columns must be compatible

• Treats two NULLs as equal when comparing rows

• This is different to JOINs

• Result column names are defined in the first query

• Result ordering can only be defined after the last input query

UNION vs UNION ALL

• UNION unifies the results of two input queries

• Both input queries must have the same number of columns, with compatible types

• UNION has an implicit DISTINCT property

• No duplicates in result

• UNION ALL returns all results, including any duplicates

[Diagram: UNION / UNION ALL]

INTERSECT

• Only rows that exist in all input queries are returned

• Has an implicit DISTINCT operator

EXCEPT

• Works with two input queries

• Returns rows that exist in the first query, but not the second
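The set operators can be compared side by side in a small sketch, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server. The single-column tables `a` and `b` are hypothetical. Each contains a NULL, to show that set operators treat NULLs as equal while a join predicate does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (NULL);
    INSERT INTO b VALUES (2), (3), (NULL);
""")

def column(sql):
    """Run a query and return its first column as a list."""
    return [r[0] for r in conn.execute(sql)]

# ORDER BY is only allowed after the last input query (SQLite sorts NULL first).
union = column("SELECT x FROM a UNION SELECT x FROM b ORDER BY x")
union_all = column("SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x")
intersect = column("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x")
except_ = column("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x")

# For contrast: the join predicate never matches NULL with NULL.
joined = column("SELECT a.x FROM a INNER JOIN b ON a.x = b.x")

print(union)           # [None, 1, 2, 3]
print(len(union_all))  # 7
print(intersect)       # [None, 2]
print(except_)         # [1]
print(joined)          # [2, 2]
```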

Table Expressions

Types of Table Expression

1. Derived Table

2. Common Table Expression (CTE)

3. View

4. Inline Table-Valued Function

• 1 and 2 are only visible within the scope of the current statement and are not reusable

• 3 and 4 have their definitions stored in the database and can be reused

Optimisation Notes

• Table expressions are not tables, but definitions

• They do not hold any data

• They interact directly with the underlying tables

• They do not have a performance impact in themselves

• Keep the above points in mind when using a table expression within a query operating on a large data set: each reference to the expression is evaluated against the underlying tables

Derived Tables

• Closely resemble a subquery

• Defined in the FROM clause of the outer query

• E.g. I want the two lowest priced products per category

Source: Exam 70-461 Training Kit
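A sketch of the "two lowest priced products per category" requirement using a derived table, assuming SQLite 3.25 or later (for window functions) via Python's `sqlite3` as a stand-in for SQL Server. The `products` table, its rows and the alias `D` are illustrative assumptions, not the Training Kit's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (categoryid INTEGER, productname TEXT, unitprice INTEGER);
    INSERT INTO products VALUES
        (1, 'A', 10), (1, 'B', 5), (1, 'C', 7),
        (2, 'D', 3),  (2, 'E', 9);
""")

two_cheapest = conn.execute("""
    SELECT categoryid, productname, unitprice
    FROM (
        -- Derived table: number the products within each category by price.
        SELECT categoryid, productname, unitprice,
               ROW_NUMBER() OVER (PARTITION BY categoryid
                                  ORDER BY unitprice, productname) AS rownum
        FROM products
    ) AS D
    WHERE rownum <= 2
    ORDER BY categoryid, unitprice
""").fetchall()

print(two_cheapest)
# [(1, 'B', 5), (1, 'C', 7), (2, 'D', 3), (2, 'E', 9)]
```

The row number cannot be filtered in the inner query's own WHERE clause, because window functions are evaluated in the SELECT phase; wrapping the query in a derived table makes `rownum` available to the outer WHERE.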

Common Table Expressions (CTEs)

• A derived table cannot be referenced more than once in the same statement; if this is required, the statement must define multiple instances of the same query

• Higher risk of mistake

• This is a CTE version of the previous query

Note: we could reuse C as many times as we want, but only within the scope of the current statement

Source: Exam 70-461 Training Kit
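A sketch of a CTE named C that finds the two lowest priced products per category and is then referenced twice in a second statement. It assumes SQLite 3.25 or later via Python's `sqlite3` as a stand-in for SQL Server. The `products` table and its rows are hypothetical, and this is not the Training Kit's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (categoryid INTEGER, productname TEXT, unitprice INTEGER);
    INSERT INTO products VALUES
        (1, 'A', 10), (1, 'B', 5), (1, 'C', 7),
        (2, 'D', 3),  (2, 'E', 9);
""")

cte = """
    WITH C AS (
        SELECT categoryid, productname, unitprice,
               ROW_NUMBER() OVER (PARTITION BY categoryid
                                  ORDER BY unitprice, productname) AS rownum
        FROM products
    )
"""

# C is defined once and referenced once.
two_cheapest = conn.execute(cte + """
    SELECT categoryid, productname, unitprice
    FROM C
    WHERE rownum <= 2
    ORDER BY categoryid, unitprice
""").fetchall()

# C is defined once and referenced twice: cheapest product next to the runner-up.
pairs = conn.execute(cte + """
    SELECT cur.categoryid, cur.productname, nxt.productname
    FROM C AS cur
    INNER JOIN C AS nxt
        ON nxt.categoryid = cur.categoryid AND nxt.rownum = cur.rownum + 1
    WHERE cur.rownum = 1
    ORDER BY cur.categoryid
""").fetchall()

print(two_cheapest)  # [(1, 'B', 5), (1, 'C', 7), (2, 'D', 3), (2, 'E', 9)]
print(pairs)         # [(1, 'B', 'C'), (2, 'D', 'E')]
```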

A Note on Scoped Tables

• If you need to refer to the same data in multiple statements within the same batch or procedure, it is usually better to retrieve the data into a table first. You have two options:

• Create a temporary table

• CREATE TABLE #<table name>

• Remember to use a DROP TABLE #<table name> command

• Declare a table variable

• DECLARE @<table name> TABLE

• This way, you are not having to go back to the raw tables to retrieve the same data

Views and Inline Table-Valued Functions

• Define queries in the database, which can then be reused

• Do not store any data

• Any performance issues arise from how they are used

• Views do not allow input parameters

• Inline table-valued functions do allow input parameters

Window Functions

Window Functions

• Window Aggregate Functions

• Window Ranking Functions

Window Aggregate Functions

• Same as group aggregate functions

• SUM, COUNT, AVG, MIN, MAX

• Are applied to a window of rows defined by the OVER clause

• Do not hide row detail

• Can mix detail and aggregated elements in the same query

• Without having to list every detail column in the GROUP BY clause

Example
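A minimal sketch of window aggregates, assuming SQLite 3.25 or later via Python's `sqlite3` as a stand-in for SQL Server. The `ordervalues` table and its rows are hypothetical. Each order row keeps its detail while also showing its customer's total and the grand total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ordervalues (custid INTEGER, orderid INTEGER, val INTEGER);
    INSERT INTO ordervalues VALUES (1, 10, 100), (1, 11, 50), (2, 12, 30);
""")

rows = conn.execute("""
    SELECT custid, orderid, val,
           SUM(val) OVER (PARTITION BY custid) AS custtotal,  -- window: same customer
           SUM(val) OVER ()                    AS grandtotal  -- window: all rows
    FROM ordervalues
    ORDER BY orderid
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 100, 150, 180)
# (1, 11, 50, 150, 180)
# (2, 12, 30, 30, 180)
```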

Window Ranking Functions

• ROW_NUMBER

• Unique number based on ORDER BY clause

• RANK

• Number based on ORDER BY clause

• Assigns same number when ordering values are tied

• Numbers may therefore not be sequential

Window Ranking Functions

• DENSE_RANK

• Similar to RANK

• Numbers are sequential (no gaps after ties)

• NTILE

• Arranges the rows within a partition into a specified number of equally sized tiles

• Number of rows per tile is calculated by

• <total number of rows> / <number of tiles>

• If this calculation produces a remainder, one additional row is assigned to each of the leading tiles, in the order defined by the ORDER BY clause, until the remainder is used up (see example below)

Example

• Assuming 800 rows of data, and the following query

Source: Exam 70-461 Training Kit

Example

custid  orderid  val    rownum  rnk  densernk  ntile100
12      10782    12.50  1       1    1         1
27      10807    18.40  2       2    2         1
66      10586    23.80  3       3    3         1
76      10767    28.00  4       4    4         1
54      10898    30.00  5       5    5         1
88      10900    33.75  6       6    6         1
48      10883    36.00  7       7    7         1
41      11051    36.00  8       7    7         1
71      10815    40.00  9       9    8         2
38      10674    45.00  10      10   9         2
53      11057    45.00  11      10   9         2
75      10271    48.00  12      12   10        2

Source: Exam 70-461 Training Kit
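The ranking columns in the table above can be reproduced with a sketch like the following, assuming SQLite 3.25 or later via Python's `sqlite3` as a stand-in for SQL Server. The twelve sample rows come from the table. The 788 filler rows are invented purely to reach 800 rows, so that NTILE(100) yields 8 rows per tile. The query is an assumption about what such a query might look like, not the Training Kit's own code. T-SQL would use TOP rather than LIMIT.

```python
import sqlite3

sample = [
    (12, 10782, 12.50), (27, 10807, 18.40), (66, 10586, 23.80),
    (76, 10767, 28.00), (54, 10898, 30.00), (88, 10900, 33.75),
    (48, 10883, 36.00), (41, 11051, 36.00), (71, 10815, 40.00),
    (38, 10674, 45.00), (53, 11057, 45.00), (75, 10271, 48.00),
]
# Invented padding rows, all with larger values than the sample.
filler = [(0, 20000 + i, 49.0 + i) for i in range(788)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ordervalues (custid INTEGER, orderid INTEGER, val REAL)")
conn.executemany("INSERT INTO ordervalues VALUES (?, ?, ?)", sample + filler)

rows = conn.execute("""
    SELECT custid, orderid, val,
           ROW_NUMBER() OVER (ORDER BY val) AS rownum,
           RANK()       OVER (ORDER BY val) AS rnk,
           DENSE_RANK() OVER (ORDER BY val) AS densernk,
           NTILE(100)   OVER (ORDER BY val) AS ntile100
    FROM ordervalues
    ORDER BY val, rownum
    LIMIT 12
""").fetchall()

rnk = [r[4] for r in rows]
densernk = [r[5] for r in rows]
ntile100 = [r[6] for r in rows]

print(rnk)       # [1, 2, 3, 4, 5, 6, 7, 7, 9, 10, 10, 12]
print(densernk)  # [1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 9, 10]
print(ntile100)  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Which of two tied rows receives the lower ROW_NUMBER is not guaranteed when the ORDER BY values are equal, so only the rank and tile columns are deterministic here.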

PARTITION

• PARTITION BY can be applied within the OVER clause of window ranking functions

• Each partition has its own numbering

• E.g. to get the latest child event for each parent
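A sketch of the "latest child event for each parent" pattern, assuming SQLite 3.25 or later via Python's `sqlite3` as a stand-in for SQL Server. The `events` table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (parentid INTEGER, eventid INTEGER, eventdate TEXT);
    INSERT INTO events VALUES
        (1, 1, '2015-01-01'),
        (1, 2, '2015-03-01'),
        (2, 3, '2015-02-01');
""")

latest = conn.execute("""
    SELECT parentid, eventid, eventdate
    FROM (
        -- Numbering restarts at 1 for each parent; the newest event gets 1.
        SELECT parentid, eventid, eventdate,
               ROW_NUMBER() OVER (PARTITION BY parentid
                                  ORDER BY eventdate DESC) AS rn
        FROM events
    ) AS numbered
    WHERE rn = 1
    ORDER BY parentid
""").fetchall()

print(latest)  # [(1, 2, '2015-03-01'), (2, 3, '2015-02-01')]
```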