Data Warehouses Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Page 1: Data Warehouses - Politecnico di Milanohome.deib.polimi.it/schreibe/TeSI/Materials/.../DataWarehouses.pdfData Warehouses Letizia Tanca Politecnico di Milano (with the kind support

Data Warehouses

Letizia Tanca, Politecnico di Milano
(with the kind support of Rosalba Rossato)

Outline

• What is a Data Warehouse
• Data Warehouse Architecture
• Data Warehouse Design

What is a Data Warehouse

• Data should be integrated across the enterprise(s)

• Summary data provide real value to the organization

• Historical data hold the key to understanding data over time

• What-if capabilities are required

What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users, so that they can understand and use it in a business context.

[Barry Devlin]

Business Processes' Pyramid

[Figure: pyramid of business processes with three levels: operations, management, direction]

An alternative definition of Data Warehouse

A data warehouse is (the result of) a process for transforming data into information and for making it available to users in a timely enough manner to make a difference.

[Forrester Research, April '96]

[Figure: Data → Information]

Data Warehouse

• As a dataset: decision support database maintained separately from the organization’s operational database

• As a process: technique for assembling and managing data from various sources with the purpose of answering business questions, thus enabling decisions that were not previously possible

Data Warehouse

• A Data Warehouse is a
– subject-oriented,
– integrated,
– time-varying,
– non-volatile
collection of data that is used primarily in organizational decision making.

[Bill Inmon, Building the Data Warehouse, 1996]

Size of a Data Warehouse

• Data warehouses are very large databases
o Terabytes (10^12 bytes)
o Petabytes (10^15 bytes): e.g. Geographic Information Systems
o Exabytes (10^18 bytes): e.g. National Medical Records
o Zettabytes (10^21 bytes): e.g. Weather reports, including images
o Yottabytes (10^24 bytes): e.g. Intelligence Agency Videos

DW is a specialized DB

Standard DB (OLTP)
• Mostly updates
• Many small transactions
• MB - GB of data
• Current snapshot
• Index/hash on primary key
• Raw data
• Thousands of users (e.g., clerical users)

Warehouse (OLAP)
• Mostly reads
• Queries are long and complex
• GB - TB of data
• History
• Lots of scans
• Summarized, reconciled data
• Hundreds of users

Where is a DW useful

• Commerce: sales and complaints analysis, customer loyalty, shipping and stock control

• Manufacturing plants: production cost control, provision and order support

• Financial services: risk and credit card analysis, fraud detection

• Telecommunications: call flow analysis, subscribers' profiles

• Healthcare structures: patients' ingoing and outgoing flows, cost analysis

Architecture for a data warehouse

[Figure: architecture diagram with the following components]
• Data sources: operational databases, external sources
• Enterprise Data Warehouse
• Data Marts
• Analysis tools: dimensional analysis, data mining, visualization
• Metadata
• Data Monitoring & Administration

OLAP-oriented data models

• Must support sophisticated analyses and computations over different dimensions and hierarchies
• Most appropriate data model: data cube
• Cube dimensions are the search keys
• Each dimension may be hierarchical
– DATE {DAY - MONTH - TRIMESTER - YEAR}
– PRODUCT {BRAND - TYPE - CATEGORY} (e.g. LAND ROVER - CARS - VEHICLES)
• Cube cells contain metric values
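
As an illustration of the points above, here is a minimal Python sketch of a cube whose cells are keyed by dimension coordinates and whose dimensions can be read at different hierarchy levels. The cell values, the VESPA entry and the helper names (`date_level`, `product_level`, `aggregate`) are assumptions made up for this example; only the LAND ROVER - CARS - VEHICLES path comes from the slide.

```python
from collections import defaultdict

# PRODUCT hierarchy: BRAND - TYPE - CATEGORY.
# Only the LAND ROVER entry comes from the slide; VESPA is a made-up example.
PRODUCT_HIERARCHY = {
    "LAND ROVER": {"type": "CARS", "category": "VEHICLES"},
    "VESPA": {"type": "SCOOTERS", "category": "VEHICLES"},
}


def date_level(day, level):
    """Map an ISO day 'YYYY-MM-DD' to a level of the DATE hierarchy."""
    year, month, _ = day.split("-")
    if level == "day":
        return day
    if level == "month":
        return f"{year}-{month}"
    if level == "trimester":
        return f"{year}-T{(int(month) - 1) // 3 + 1}"
    if level == "year":
        return year
    raise ValueError(f"unknown DATE level: {level}")


def product_level(brand, level):
    """Map a brand to a level of the PRODUCT hierarchy."""
    return brand if level == "brand" else PRODUCT_HIERARCHY[brand][level]


# Cube cells: the dimension coordinates (day, brand) are the search keys,
# the cell content is a metric value (units sold, hypothetical numbers).
cells = {
    ("2000-01-15", "LAND ROVER"): 3,
    ("2000-02-03", "LAND ROVER"): 2,
    ("2000-05-20", "VESPA"): 7,
    ("2001-01-09", "VESPA"): 4,
}


def aggregate(cells, date_lvl, product_lvl):
    """Sum the metric at the requested level of each dimension."""
    out = defaultdict(int)
    for (day, brand), value in cells.items():
        key = (date_level(day, date_lvl), product_level(brand, product_lvl))
        out[key] += value
    return dict(out)
```

For instance, `aggregate(cells, "year", "category")` collapses the four cells into one total per year for the VEHICLES category.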

Examples of data warehouse queries

• Show total sales across all products at increasing aggregation levels for a geography dimension, from state to country to region, for 1999 and 2000.

• Create a cross-tabular analysis of our operations showing expenses by territory in South America for 1999 and 2000. Include all possible subtotals.

• List the top 10 sales representatives in Asia according to sales revenue for automotive products in year 2000, and rank their commissions.

LOGICAL MODELS FOR OLAP: AN INSURANCE COMPANY DATA CUBE

[Figure: three-dimensional data cube]
• Dimensions:
– AGE: <20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, >80
– YEAR: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997
– TYPE (of policy): CAR, FIRE-THEFT, FAMILY, LIFE
• Metric values: N. of policies, premium value

Dimensional Fact Model

• Allows one to describe a set of fact schemata

• The components of a fact schema are:
– Facts
– Measures
– Dimensions
– Dimension hierarchies

DFM: elements

• A fact is a concept that is relevant for the decisional process; typically it models a set of events of the organization

• A measure is a numerical property of a fact

• A dimension is a fact property defined w.r.t. a finite domain; it describes an analysis coordinate for the fact

Examples

• Store chain
– Fact: sales
– Measures: sold quantity, gross income
– Dimensions: product, time, zone

• Telecom Operator
– Fact: phone call
– Measures: cost, duration
– Dimensions: caller subscriber, called subscriber, time
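
The two fact schemata above can be written down as plain data. The following Python sketch is only one possible encoding: the `FactSchema` class, its `is_valid_event` check and the sample event values are assumptions introduced for illustration, not part of the Dimensional Fact Model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactSchema:
    """A fact with its measures (numerical) and dimensions (coordinates)."""
    fact: str
    measures: tuple
    dimensions: tuple

    def is_valid_event(self, event):
        """An event needs a coordinate for every dimension
        and a numerical value for every measure."""
        has_dims = all(d in event for d in self.dimensions)
        has_measures = all(
            isinstance(event.get(m), (int, float)) for m in self.measures
        )
        return has_dims and has_measures


# The two examples from the slide
sales = FactSchema(
    fact="sales",
    measures=("sold_quantity", "gross_income"),
    dimensions=("product", "time", "zone"),
)
phone_call = FactSchema(
    fact="phone_call",
    measures=("cost", "duration"),
    dimensions=("caller_subscriber", "called_subscriber", "time"),
)

# A hypothetical sales event (the values are invented for the example)
event = {
    "product": "pasta",
    "time": "2000-02-01",
    "zone": "north",
    "sold_quantity": 15000,
    "gross_income": 43_000_000,
}
```

Here `sales.is_valid_event(event)` holds, while the same event is rejected by `phone_call` because it has no subscriber coordinates.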

Multidimensional Representation

[Figure: the Sales cube, with dimensions Products, Time Periods and Markets; each cell holds a Quantity]

Multidimensional data views

[Figure: cube with axes Time, Markets, Product]

The area manager examines product sales of his/her own markets

[Figure: cube with axes Markets, Time, Products]

The product manager examines the sales of a specific product in all periods and in all markets

[Figure: cube with axes Products, Time, Markets]

The financial manager examines product sales in all markets, for the current period and the previous one

[Figure: cube with axes Time, Products, Markets]

The strategic manager concentrates on a category of products, a specific region and a medium time span

[Figure: cube with axes Products, Time, Markets]

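Dimensions and hierarchies

[Figure: dimension hierarchies, from the finest to the coarsest level]
– store - city - reg. area - region
– day - month - trimester - year
– product - category; product - brand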

OLAP OPERATIONS

• roll-up
– Aggregates data at a higher level, e.g. last year's sales volume per product category and per region
• drill-down
– De-aggregates data at a lower level, e.g. for a given product category and a given region, show daily sales
• slice-and-dice
– Applies selections and projections, which reduce data dimensionality
• pivoting
– Selects two dimensions to re-aggregate data (cube re-orientation)
• ranking
– Sorts data according to predefined criteria
• traditional operations (select, project, join, derived attributes, etc.)
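
A rough Python sketch of slice, dice, pivoting and ranking on a tiny cube may help fix the ideas. The rows below are invented cells of an insurance-style cube with dimensions year, type and age; the function names are assumptions of this example and do not correspond to any particular OLAP tool.

```python
from collections import defaultdict

# Hypothetical cells: one row per (year, type, age) coordinate
facts = [
    {"year": 1996, "type": "CAR", "age": "20-30", "policies": 120},
    {"year": 1996, "type": "LIFE", "age": "20-30", "policies": 40},
    {"year": 1997, "type": "CAR", "age": "20-30", "policies": 150},
    {"year": 1997, "type": "CAR", "age": "30-40", "policies": 90},
    {"year": 1997, "type": "LIFE", "age": "30-40", "policies": 70},
]


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a value and drop it from the result."""
    return [
        {k: v for k, v in r.items() if k != dim}
        for r in rows
        if r[dim] == value
    ]


def dice(rows, **allowed):
    """Dice: keep the sub-cube whose coordinates fall in the allowed sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def pivot(rows, row_dim, col_dim, measure):
    """Pivoting: re-aggregate the measure over two chosen dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {k: dict(v) for k, v in table.items()}


def rank(rows, measure, top):
    """Ranking: sort by a measure, largest first, and keep the top rows."""
    return sorted(rows, key=lambda r: r[measure], reverse=True)[:top]
```

For example, `pivot(facts, "type", "year", "policies")` lays the policy counts out with policy types as rows and years as columns, summing over the age dimension.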

OLAP OPERATIONS: roll-up and drill-down

[Figure: the insurance cube shown at two granularities, with axes (YEAR, TYPE, AGE) and (MONTH, TYPE, AGE); DRILL-DOWN goes from YEAR to MONTH, ROLL-UP from MONTH to YEAR]

[Figures: two roll-up examples and one drill-down example]

OLAP OPERATIONS: pivoting

[Figure: the (TYPE, AGE, YEAR) cube shown in different orientations obtained by PIVOTING, followed by a pivoting example]

OLAP OPERATIONS: slice and dice

[Figure: SLICE AND DICE applied to the (YEAR, TYPE, AGE) cube, followed by two slice-and-dice examples]

VISUALIZATION and REPORTS

Data may be visualized graphically, in an Excel-like format: tables, histograms, graphics, 3D surfaces, etc.

Examples of operators: initial table

month      product   quantity   sales amount
February   pasta       45,000    130,000,000
March      pasta       50,000    140,000,000
April      pasta       51,000    135,000,000

Drill-down: adding a dimension

Drill-down on the zone

month      zone     product   quantity
February   north    pasta       15,000
February   east     pasta       17,000
February   center   pasta       13,000
March      north    pasta       18,000
March      east     pasta       18,000
March      center   pasta       14,000
April      north    pasta       18,000
April      east     pasta       17,000
April      center   pasta       16,000

Roll-up: dimension elimination

Roll-up on the month

zone     product   quantity
north    pasta       51,000
east     pasta       52,000
center   pasta       43,000
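
The roll-up can be reproduced from the drill-down table with a few lines of Python. This is only a sketch: the function name `roll_up` and the tuple layout are choices made for this example, while the quantities are the ones shown in the tables above.

```python
from collections import defaultdict

DIMS = ("month", "zone", "product")

# (month, zone, product, quantity) rows of the drill-down table
detail = [
    ("February", "north", "pasta", 15000),
    ("February", "east", "pasta", 17000),
    ("February", "center", "pasta", 13000),
    ("March", "north", "pasta", 18000),
    ("March", "east", "pasta", 18000),
    ("March", "center", "pasta", 14000),
    ("April", "north", "pasta", 18000),
    ("April", "east", "pasta", 17000),
    ("April", "center", "pasta", 16000),
]


def roll_up(rows, drop):
    """Eliminate the dimension `drop` and sum the quantity over it."""
    keep = [i for i, name in enumerate(DIMS) if name != drop]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in keep)] += row[-1]
    return dict(totals)


by_zone = roll_up(detail, "month")   # the roll-up on the month
by_month = roll_up(detail, "zone")   # back to the quantities of the initial table
```

Rolling up on the zone instead of the month gives back the quantities of the initial table, which shows that drill-down and roll-up are inverse navigation steps over the same data.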

Aggregate Queries

Examples:

• Total sales per product category, per supermarket, per day

• Total monthly sales for all the products, per supermarket

• Total monthly sales per category per supermarket

• Avg. monthly sales per category, for all supermarkets

OLAP logical models

• MOLAP (Multidimensional On-Line Analytical Processing) stores data by using a multidimensional data structure (a "physical" data cube)

• ROLAP (Relational On-Line Analytical Processing) uses the relational data model to represent multidimensional data
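
The difference between the two storage styles can be sketched in Python. The example below is a simplification under stated assumptions: a two-dimensional cube (model, year), the sales figures of the Fiat/Ford example used later in these slides, and 0 standing for an empty cell. Real MOLAP and ROLAP engines are far more elaborate.

```python
# Dimension members, in a fixed order that defines the array positions
models = ["fiat", "ford"]
years = [1994, 1995]

# ROLAP style: the cube is a relation, one row per non-empty cell
rows = [
    ("fiat", 1994, 50),
    ("fiat", 1995, 85),
    ("ford", 1994, 80),
]

# MOLAP style: a dense array with one cell per coordinate combination
# (0 marks an empty cell in this sketch)
cube = [[0] * len(years) for _ in models]
for model, year, sales in rows:
    cube[models.index(model)][years.index(year)] = sales


def rolap_lookup(rows, model, year):
    """Relational access: select the matching rows and sum them."""
    return sum(s for m, y, s in rows if m == model and y == year)


def molap_lookup(cube, model, year):
    """Multidimensional access: compute the position and read the cell."""
    return cube[models.index(model)][years.index(year)]
```

The array gives direct positional access but reserves space for the empty (ford, 1995) cell, whereas the relation stores only the three existing facts and finds them by selection.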

The data cube in SQL

• It expresses all the possible tuple aggregations of a table

• It uses the new polymorphic value ALL

Data cube in SQL (ROLAP)

    -- Standard SQL syntax; some systems write GROUP BY ... WITH CUBE instead
    SELECT Model, Year, Color, SUM(Sales)
    FROM Sales
    WHERE Model IN ('Fiat', 'Ford')
      AND Color = 'Red'
      AND Year BETWEEN 1994 AND 1995
    GROUP BY CUBE (Model, Year, Color)

Relevant Facts

model   color   year   sales
fiat    red     1994      50
fiat    red     1995      85
ford    red     1994      80

All the data in the cube

model   color   year   sum(sales)
fiat    red     1994           50
fiat    red     1995           85
fiat    ALL     1994           50
fiat    ALL     1995           85
fiat    red     ALL           135
fiat    ALL     ALL           135
ford    red     1994           80
ford    ALL     1994           80
ford    red     ALL            80
ford    ALL     ALL            80
ALL     red     1994          130
ALL     red     1995           85
ALL     red     ALL           215
ALL     ALL     1994          130
ALL     ALL     1995           85
ALL     ALL     ALL           215
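
The 16 rows of the cube can be derived mechanically from the three relevant facts. The following Python sketch imitates what the cube operator computes; it is not how a DBMS implements it, and the names `ALL`, `facts` and `cube` are choices made for this example.

```python
from itertools import combinations

ALL = "ALL"

# (model, year, color, sales): the three relevant facts
facts = [
    ("fiat", 1994, "red", 50),
    ("fiat", 1995, "red", 85),
    ("ford", 1994, "red", 80),
]


def cube(facts, n_dims):
    """Sum the measure for every subset of the grouping columns.

    Columns left out of a subset are replaced by the value ALL.
    """
    totals = {}
    for size in range(n_dims + 1):
        for kept in combinations(range(n_dims), size):
            for row in facts:
                key = tuple(
                    row[i] if i in kept else ALL for i in range(n_dims)
                )
                totals[key] = totals.get(key, 0) + row[n_dims]
    return totals


cube_rows = cube(facts, 3)
```

With three grouping columns there are 2^3 = 8 subsets, which on these facts yield the 16 distinct rows listed in the table.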

Data cube visualization

[Figure: the cube drawn with axes model (fiat, ford, ALL), year (1994, 1995, ALL) and color (red, ALL)]

Roll up

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. Unlike CUBE, ROLLUP only drops grouping columns from right to left, so the result contains no row with ALL (NULL) in the first column apart from the grand total.

    SELECT Model, Year, Color, SUM(Sales)
    FROM Sales
    WHERE Model IN ('Fiat', 'Ford')
      AND Color = 'Red'
      AND Year BETWEEN 1994 AND 1995
    GROUP BY ROLLUP (Model, Year, Color)

The data after roll-up

model   color   year   sum(sales)
fiat    red     1994           50
fiat    red     1995           85
ford    red     1994           80
fiat    ALL     1994           50
fiat    ALL     1995           85
ford    ALL     1994           80
fiat    ALL     ALL           135
ford    ALL     ALL            80
ALL     ALL     ALL           215
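
The nine rows above can be obtained from the three relevant facts by aggregating over shorter and shorter prefixes of the grouping columns (Model, Year, Color). The Python sketch below imitates that behaviour; the names `ALL`, `facts` and `rollup` are assumptions of this example.

```python
ALL = "ALL"

# (model, year, color, sales): the three relevant facts,
# with the columns in the order used by the grouping (Model, Year, Color)
facts = [
    ("fiat", 1994, "red", 50),
    ("fiat", 1995, "red", 85),
    ("ford", 1994, "red", 80),
]


def rollup(facts, n_dims):
    """Sum the measure for every prefix of the grouping columns.

    Columns after the prefix are replaced by the value ALL.
    """
    totals = {}
    for kept in range(n_dims, -1, -1):
        for row in facts:
            key = tuple(row[i] if i < kept else ALL for i in range(n_dims))
            totals[key] = totals.get(key, 0) + row[n_dims]
    return totals


rollup_rows = rollup(facts, 3)
```

Only 4 of the 8 possible column subsets are used, so combinations such as (ALL, 1994, ALL), which appear in the cube, are absent here.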

A Simple Cross-Tabular Report With Subtotals

• A database containing: products, customers, sale channels, with the measures amount_sold and quantity_sold.

• Amounts in dollars, aggregated by country, sliced on France and US and the month of September, aggregated by channel

• Half of the values needed for this report would not be calculated just by means of a query that requested SUM(amount_sold) and did a GROUP BY(channel_desc, country_id).

                         COUNTRY
CHANNEL         France        US     Total
Internet         9,597   124,224   133,821
Direct Sales    61,202   638,201   699,403
Total           70,799   762,425   833,224

Cube creation

    SELECT channels.channel_desc,
           countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc = '2000-09'
      AND customers.country_id = countries.country_id
      AND countries.country_iso_code IN ('US', 'FR')
    GROUP BY CUBE (channels.channel_desc, countries.country_iso_code);

Resulting table

CHANNEL_DESC   COUNTRY   SALES ($)
ALL            ALL         833,224
ALL            FR           70,799
ALL            US          762,425
INTERNET       ALL         133,821
INTERNET       FR            9,597
INTERNET       US          124,224
DIRECT SALES   ALL         699,403
DIRECT SALES   FR           61,202
DIRECT SALES   US          638,201
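
The claim that a plain GROUP BY yields only part of the report can be checked numerically. In the Python sketch below, the four base cells are the values a plain GROUP BY on channel and country would return; the subtotals are then added the way the cube does. The function name `with_subtotals` is an assumption of this example.

```python
# The four cells returned by a plain GROUP BY (channel, country)
base = {
    ("Internet", "FR"): 9597,
    ("Internet", "US"): 124224,
    ("Direct Sales", "FR"): 61202,
    ("Direct Sales", "US"): 638201,
}


def with_subtotals(base):
    """Add the per-channel, per-country and grand totals (marked ALL)."""
    table = dict(base)
    for (channel, country), amount in base.items():
        for key in ((channel, "ALL"), ("ALL", country), ("ALL", "ALL")):
            table[key] = table.get(key, 0) + amount
    return table


report = with_subtotals(base)
extra_rows = len(report) - len(base)  # rows a plain GROUP BY does not return
```

Five of the nine values in the report are subtotals, which is the "half of the values" that the plain GROUP BY misses.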

Roll-up query

    SELECT channels.channel_desc, calendar_month_desc,
           countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc IN ('2000-09', '2000-10')
      AND countries.country_iso_code IN ('GB', 'US')
    GROUP BY ROLLUP (channels.channel_desc, calendar_month_desc,
                     countries.country_iso_code);

Roll-up result

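CHANNEL_DESC   CALENDAR   COUNTRY      SALES$
Internet       2000-09    GB          228,241
Internet       2000-09    US          228,241
Internet       2000-09    ALL         456,482
Internet       2000-10    GB          239,236
Internet       2000-10    US          239,236
Internet       2000-10    ALL         478,473
Internet       ALL        ALL         934,955
Direct Sales   2000-09    GB        1,217,808
Direct Sales   2000-09    US        1,217,808
Direct Sales   2000-09    ALL       2,435,616
Direct Sales   2000-10    GB        1,225,584
Direct Sales   2000-10    US        1,225,584
Direct Sales   2000-10    ALL       2,451,169
Direct Sales   ALL        ALL       4,886,784
ALL            ALL        ALL       5,821,739

Note that GB and US show identical figures in every group. This is most likely because the query has no customers.country_id = countries.country_id condition, so each sale is paired with both selected countries.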

A ROLL-UP QUERY PRODUCES…

… the following sets of rows:
• Regular aggregation rows that would be produced by GROUP BY without using ROLLUP.
• First-level subtotals aggregating across country_iso_code for each combination of channel_desc and calendar_month_desc.
• Second-level subtotals aggregating across calendar_month_desc and country_iso_code for each channel_desc value.
• A grand total row.

Note that the roll-up operation, unlike the cube, does not produce the rows with the "ALL" value in the first column (it still retains the row containing all "ALL" values, though).
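
The row sets listed above correspond to the grouping sets that each operator aggregates over. A small Python sketch can enumerate them for the three columns of the roll-up query; the function names are assumptions of this example.

```python
from itertools import combinations

columns = ["channel_desc", "calendar_month_desc", "country_iso_code"]


def rollup_grouping_sets(columns):
    """ROLLUP: every prefix of the column list, longest first."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """CUBE: every subset of the column list, largest first."""
    return [
        subset
        for k in range(len(columns), -1, -1)
        for subset in combinations(columns, k)
    ]


rollup_sets = rollup_grouping_sets(columns)
cube_sets = cube_grouping_sets(columns)
```

With n grouping columns, the roll-up aggregates over n + 1 grouping sets and the cube over 2^n; here that is 4 against 8. The empty grouping set is the grand total row.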

Typical DW dimensions

time: 730 days
stock houses: 300
products: 30,000
daily sales: 3,000
promotions: not more than one per product sold
sales: 730 x 300 x 3,000 x 1 = 657 million
size: 657 million x 8 attributes x 4 bytes = 21 GB
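
The estimate can be checked with a few lines of arithmetic. The variable names below are chosen for this example, and 1 GB is taken as 10^9 bytes, which is an assumption about the unit used on the slide.

```python
days = 730                     # two years of history
stock_houses = 300
products_sold_per_day = 3000   # daily sales per stock house
promotions_per_sale = 1        # at most one promotion per product sold

# Number of rows in the sales fact table
fact_rows = days * stock_houses * products_sold_per_day * promotions_per_sale

# Each row holds 8 attributes of 4 bytes each
attributes_per_row = 8
bytes_per_attribute = 4
size_bytes = fact_rows * attributes_per_row * bytes_per_attribute

size_gb = size_bytes / 10**9   # assuming 1 GB = 10^9 bytes

print(f"{fact_rows:,} rows, about {size_gb:.0f} GB")
```

Note that the row count depends on the 3,000 products actually sold per day, not on the 30,000 products in the catalogue.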

References

• M. Golfarelli, S. Rizzi: Data Warehouse: teoria e pratica della progettazione. McGraw-Hill, 2002.

• Ralph Kimball: The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley, 1996.

• On the Internet: Oracle® Database Data Warehousing Guide (see link on the course page)