©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced...

25
1 ©Silberschatz, Korth and Sudarsha 22.1 atabase System Concepts 4 th Edition SQL:1999 Advanced Querying SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data Analysis and OLAP This also apply to SQL: 2003 that is just a minor revision of SQL:1999
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    220
  • download

    1

Transcript of ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced...

Page 1: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

1

©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4th Edition

SQL:1999 Advanced QueryingSQL:1999 Advanced Querying

Decision-Support Systems

Data Warehousing

Data Analysis and OLAP

This also apply to SQL: 2003 that is just a minor revision of SQL:1999

Page 2: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

2

©Silberschatz, Korth and Sudarshan22.2Database System Concepts 4th Edition

Decision Support SystemsDecision Support Systems

Decision-Support systems are used to make business decisions often based on data collected by on-line transaction-processing systems.

Examples of business decisions: what items to stock?

What insurance premium to change?

Who to send advertisements to?

Examples of data used for making decisions Retail sales transaction details

Customer profiles (income, age, sex, etc.)

Page 3: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

3

©Silberschatz, Korth and Sudarshan22.3Database System Concepts 4th Edition

Decision-Support Systems: OverviewDecision-Support Systems: Overview

A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site. Important for large businesses which generate data from multiple

divisions, possibly at multiple sites Data may also be purchased externally

Data analysis tasks are simplified by specialized tools and SQL extensions Example tasks

For each product category and each region, what were the total sales in the last quarter and how do they compare with the same quarter last year

As above, for each product category and each customer category

Statistical analysis packages (e.g., : S++) can be interfaced with databases Partial support through OLAP Functions

Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from Large databases.

Page 4: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

4

©Silberschatz, Korth and Sudarshan22.4Database System Concepts 4th Edition

Data WarehousingData Warehousing

A data warehouse is a repository of information gathered from multiple sources.

Page 5: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

5

©Silberschatz, Korth and Sudarshan22.5Database System Concepts 4th Edition

Data Warehousing (Cont.)Data Warehousing (Cont.)

Provides a single consolidated interface to data

Data stored for an extended period, providing access to historical data

Data/updates are periodically downloaded form online transaction processing (OLTP) systems. Typically, download happens each night.

Data may not be completely up-to-date, but is recent enough for analysis.

Running large queries at the warehouse ensures that OLTP systems are not affected by the decision-support workload.

Page 6: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

6

©Silberschatz, Korth and Sudarshan22.6Database System Concepts 4th Edition

Issues in Building a WarehouseIssues in Building a Warehouse

When and how to gather data. Source driven: data source initiates data transfer Destination driven: warehouse initiates data transfer

What schema to use. Schema integration Cleaning and conversion of incoming data

What data to summarize. Raw data may be too large to store on-line Aggregate values (totals/subtotals) often suffice Queries on raw data can often be transformed by query optimizer to use

aggregate values

How to propagate updates. Date at warehouse is a view on source data Efficient view maintenance techniques required

Page 7: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

7

©Silberschatz, Korth and Sudarshan22.7Database System Concepts 4th Edition

Data-Warehouse ArchitectureData-Warehouse Architecture

Page 8: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

8

©Silberschatz, Korth and Sudarshan22.8Database System Concepts 4th Edition

Star Schema For A Data WarehouseStar Schema For A Data Warehouse

Page 9: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

9

©Silberschatz, Korth and Sudarshan22.9Database System Concepts 4th Edition

Online Analytical ProcessingOnline Analytical Processing

The operation of changing the dimensions used in a cross-tab is called pivoting.

An OLAP system provides other functionality as well. For instance, the analyst may wish to see a cross-tab on item-name and color for a fixed value of size, for example, large, instead of the sum across all sizes. Such an operation is referred to as slicing. The operation is sometimes called dicing, particularly when values for multiple dimensions are fixed.

The operation of moving from finer-granularity data to a coarser granularity is called a rollup.

The opposite operation - that of moving from coarser-granularity data to finer-granularity data – is called a drill down.

Page 10: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

10

©Silberschatz, Korth and Sudarshan22.10Database System Concepts 4th Edition

Relational Representation of the Data in Relational Representation of the Data in Figure 22.1Figure 22.1

Page 11: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

11

©Silberschatz, Korth and Sudarshan22.11Database System Concepts 4th Edition

Cross Tabulation of Cross Tabulation of salessales by by item-name item-name and and colorcolor

The table above is an example of a cross-tabulation(or cross-tab) also referred to as a pivot-table. In general, a cross-table is a table where values for one attribute form the row headers, values for another attribute form the column headers, and the values in an individual cell are derived as follows.

A cross tab with summary rows/columns can be represented by introducing a special value all to represent subtotals.

Page 12: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

12

©Silberschatz, Korth and Sudarshan22.12Database System Concepts 4th Edition

Hierarchies on DimensionsHierarchies on Dimensions

Page 13: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

13

©Silberschatz, Korth and Sudarshan22.13Database System Concepts 4th Edition

Cross Tabulation of Cross Tabulation of salessales With Hierarchy With Hierarchy on on item-nameitem-name

Page 14: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

14

©Silberschatz, Korth and Sudarshan22.14Database System Concepts 4th Edition

Three-Dimensional Data CubeThree-Dimensional Data Cube

Page 15: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

15

©Silberschatz, Korth and Sudarshan22.15Database System Concepts 4th Edition

OLAP ImplementationOLAP Implementation

The earliest OLAP systems used multidimensional arrays in memory to store data cubes, and are referred to as mutidimensional OLAP (MOLAP) systems.

Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems.

Page 16: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

16

©Silberschatz, Korth and Sudarshan22.16Database System Concepts 4th Edition

Data Analysis (Cont.)Data Analysis (Cont.)

Rollup: Moving from finer-granularity data to a coarser granularity by means of aggregation.

Drill down: Moving from coarser-granularity data finer-granularity data.

Proposed extensions to SQL, such as the cube operation help to support generation of summary data

The following query generates the previous table.

select color, size, sum (number)

from sales

groupby color, size with cube

Page 17: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

17

©Silberschatz, Korth and Sudarshan22.17Database System Concepts 4th Edition

Data Analysis (Cont.)Data Analysis (Cont.)

Figure shows the combinations of dimensions size, color, price

In general computing cube operation with n groupby columns gives 2nd different groupby combinations.

Page 18: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

18

©Silberschatz, Korth and Sudarshan22.18Database System Concepts 4th Edition

SQL:1999 Extended GroupBySQL:1999 Extended GroupBy also supports generalizations of the group by constructs, using the cube and

rollup constructs. A representative use of the cube construct is;

select item-name, color, size, sum(number)from salesgroup by cube(item-name, color, size)

This query computes the union of eight different groupings of thesales relation:

{ (item-name, color, size), (item-name, color), (item-name, size), (color, size), (item-name), (color), (size), ( ) }

Where ( ) denotes an empty group by list.

For each grouping, the result contains the null value for attributes not present in the grouping. For instance, with occurrences of all replaced by null, can be computed by the query

select item-name, color, sum(number)from salesgroup by cube(item-name, color)

Page 19: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

19

©Silberschatz, Korth and Sudarshan22.19Database System Concepts 4th Edition

Extended Group By (Cont.)Extended Group By (Cont.)

A representative rollup construct is

select item-name, color, size, sum(number)from salesgroup by rollup(item-name, color, size)

Here only four grouping are generated:

{ (item-name, color, size), (item-name, color), (item-name), ( ) }

Rollup can be used to generate aggregates at multiple levels of ahierarchy on a column. For instance, we have a table itemcategory(item-name, category) giving the category of each item. Then the query

select category, item-name, sum(number)from sales, categorywhere sales.item-name = itemcategory.item-namegroup by rollup(category, item-name)

would give a hierarchical summary by item-name and by category.

Page 20: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

20

©Silberschatz, Korth and Sudarshan22.20Database System Concepts 4th Edition

Extended Group By (Cont.)Extended Group By (Cont.)

Multiple rollups and cubes can be used in a single group by clause.For instances, the following query

select item-name, color, size, sum(number)from salesgroup by rollup(item-name), rollup(color, size)

generates the groupings

{ (item-name, color, size), (item-name, color), (item-name), (color, size), (color), ( ) }

The function grouping can be applied on an attribute; it returns 1 if the value is a null value representing all, and returns 0 in all other cases. Consider the following query:

select item-name, color, size, sum(number),grouping(item-name) as item-name-flag,grouping(color) as color-flag,grouping(size) as size-flag,

from salesgroup by cube(item-name, color, size)

Page 21: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

21

©Silberschatz, Korth and Sudarshan22.21Database System Concepts 4th Edition

Extended Aggregation: OLAP FunctionsExtended Aggregation: OLAP Functions

Rank, row_number and other functions based on explicit order

Many new statistical functions: see, e.g., DB2 UDB's High-Function Business Intelligence in e-business---Red Book http://www.redbooks.ibm.com/redbooks/SG246546.html STDDEV, etc.

CORRELATION,

COVARIANCE.

Regression functions: Y = a X + b

REGR_SLOPE -> a

REGR_INTERCEPT -> b

Windows on aggregates: Physical windows: based on the number of rows in the window

Logical windows: based on the value span of the window.

Page 22: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

22

©Silberschatz, Korth and Sudarshan22.22Database System Concepts 4th Edition

RankingRanking Ranking is done in conjunction with an order by specification. Suppose we

are given a relation student-marks(student-id, marks) which stores the marks obtained by each student. The following query gives the rank of each student.

select student-id, rank( ) over (order by (marks) desc) as s-rankfrom student-marks

An extra order by clause is needed to get them in sorted order, as shown below.

select student-id, rank ( ) (order by (marks) desc) as s-rankfrom student-marks order by s-rank

Page 23: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

23

©Silberschatz, Korth and Sudarshan22.23Database System Concepts 4th Edition

Partition ByPartition By

Ranking can be done within partition of the data. The following query then gives the rank of students within each section:

student-marks(student-id, marks); student- section(student-id, section)

select student-id, section,rank( ) over (partition by section order by marks desc)

as sec-rankfrom student-marks, student-sectionwhere student-marks.student-id = student-section.student-idorder by section, sec-rank

dense_rank(): no holes after ties

row_number(): provide row numbering given a specific partitioning and ordering of rows.

Page 24: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

24

©Silberschatz, Korth and Sudarshan22.24Database System Concepts 4th Edition

Ranking (Cont.)Ranking (Cont.)

For a given constant n, the ranking the function ntile(n) takes the tuples in each partition in the specified order, and divides them into n buckets with qual numbers of tuples. For instance, we an sort employees by salary, and use ntile(3) to find which range (bottom third, middle third, or top third) each employee is in, and compute the total salary earned by employees in each range:

select threetile, sum(salary)from (

select salary, ntile(3) over (order by (salary) as threetile

from employee) as sgroup by threetile

SQL:1999 permits the user to specify where they should occur by using nulls first or nulls last, for instance

select student-id, rank ( ) over (order by marks desc nulls last) as s-rankfrom student-marks

Page 25: ©Silberschatz, Korth and Sudarshan22.1Database System Concepts 4 th Edition 1 SQL:1999 Advanced Querying Decision-Support Systems Data Warehousing Data.

25

©Silberschatz, Korth and Sudarshan22.25Database System Concepts 4th Edition

Aggregates on WindowsAggregates on Windows

An example of window query is that, given sales values for each date, calculates for each date the average of the sales on that day, the previous day, and the next day; such moving average queries are used to smooth out random variations.

In contrast to group by, the same tuple can exist in multiple windows. Suppose we are given a relation transaction(account-number, date-time, value),

select account-number, date-time, sum(value) over (partition by account-number

order by date-time range unbounded preceding)

This query returns a new sum for each new tuple—cumulative sum!

An actual window can also be specified: e.g.

range 10 rows preceding

range 30 minutes preceding