BUSINESS INTELLIGENCE Reminds on Data...
Transcript of BUSINESS INTELLIGENCE Reminds on Data...
![Page 1: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/1.jpg)
BUSINESS INTELLIGENCE
Reminds on Data Warehousing
(details at the Decision Support Database course)
Data Sciencce & Business Informatics Degree
![Page 2: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/2.jpg)
BI Architecture2
Laboratory of Data Science
![Page 3: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/3.jpg)
Star-schema datawarehouse
A fact table with star-schema dimension tables only
3
time_key
day
day_of_the_week
month
quarter
year
time
branch_key
street
city
province
country
branch
Sales Fact Table
time_key
item_key
employee_key
branch_key
units_sold
dollars_sold
dollars_costMeasures
item_key
item_name
brand
type
supplier_type
item
employee_key
employee_name
employee_type
employee
Laboratory of Data Science
![Page 4: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/4.jpg)
Snowflake-schema datawarehouse
A fact table with star-schema, snowflake and parent-child dimension tables
4
time_key
day
day_of_the_week
month
quarter
year
time
branch_key
street
city_key
branch
Sales Fact Table
time_key
item_key
employee_key
branch_key
units_sold
dollars_sold
dollars_costMeasures
item_key
item_name
brand
type
supplier_key
item
supplier_key
supplier_type
supplier
city_key
city
province
country
cityemp_key
emp_name
emp_boss_key
employees
Laboratory of Data Science
![Page 5: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/5.jpg)
Which DBMS technology for DW?
Storage technology
Architecture
5
Laboratory of Data Science
![Page 6: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/6.jpg)
Storage
RDBMS: record oriented structure
Laboratory of Data Science
6
Columnar: column oriented structure
Advantages:• Faster Scan
• Data Compression (e.g. State)
![Page 7: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/7.jpg)
Storage
Correlation value-based database
Data cells contain the index to an order-set value
In-memory database
Data is stored in compressed format in main memory
Extraction-based system
Storage of attribute extracted from continuous data
flows (eg., web traffic, sensors)
...
Laboratory of Data Science
7
![Page 8: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/8.jpg)
Architecture
Sequential
SQL query processing by a single
processor
Parallel
SQL query plan processing by a
multi-processor machine, with shared
memory
Distributed (Map-reduce)
SQL query processing distributed to
a set of independent machines
◼ Teradata SQL-MR, Hadoop HiveQL
Laboratory of Data Science
8
![Page 9: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/9.jpg)
BI Architecture10
Laboratory of Data Science
![Page 10: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/10.jpg)
K-dimensional cuboid11
Milk Bread … ... Pasta
Jan 13
… ...
Dec 13
Feb 13
Product.Category
Store.City
Time.Month
OrangeRome
PiseFlorence
An hyper-cube with K axes, with a level of some hierarchy at each axis. A
cell of the cuboid contains the values of metrics for the conditions given by
the cell coordinates.
Lucca
Laboratory of Data Science
![Page 11: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/11.jpg)
Cube navigation by different users12
product
Branch manager look at sales
of his/her stores
for any product and any period
Product managers look at sales
of some products
in any period and in any market
Finance manager look at sales
of a period compared to the previous period
for any product and any market
tim
e
Laboratory of Data Science
![Page 12: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/12.jpg)
Cuboids in SQL
13
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key
GROUP BY L.city, I.brand, T.month
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
Laboratory of Data Science
![Page 13: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/13.jpg)
Roll-up
How many cuboids?14
Product
Time
All
Time
Time
Product
All
All
Drill-Down
Roll-up
Drill-Down
Drill-Down
Roll-up
Laboratory of Data Science
![Page 14: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/14.jpg)
15
Data Cube (extended cube, hypercube)
Total annual sales
of TVs in U.S.A.Date
Cou
ntr
y
*
*TV
VCRPC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
*
Laboratory of Data Science
![Page 15: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/15.jpg)
Data cube in SQL Server16
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key
GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
GROUP BY ROLLUP(L.city, I.brand, T.month) - all initial subsequences of the group-by attributes
Laboratory of Data Science
![Page 16: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/16.jpg)
Slice and Dice17
Product
Time
Slice
Product
Time
Product =A,B,C
Laboratory of Data Science
![Page 17: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/17.jpg)
Slice in SQL Server18
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key AND T.year = 2016
GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
Slice
Laboratory of Data Science
![Page 18: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/18.jpg)
Star-join executions in SQL Server
Star-join optimization
automatically detected (vs to be setup in Oracle)
Bitmap join indexes
not available (vs available in Oracle)
Columnstore indexes (since SQL Server 2012)
see docs
◼ http://msdn.microsoft.com/en-us/library/gg492088.aspx
Example (on a copy of sales_fact)◼ CREATE CLUSTERED COLUMNSTORE INDEX cci_sales ON sales_fact_copy
Laboratory of Data Science
19
![Page 19: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/19.jpg)
Materialized views in SQL Server
Create a (normal) viewCREATE VIEW schema.view_name
WITH SCHEMABINDING -- binds the reference tables schema
AS SELECT …, COUNT_BIG(*) as n
FROM …
WHERE …
GROUP BY …
Make an index on itCREATE UNIQUE CLUSTERED INDEX index_name
ON schema.view_name (attributes);
They are called Indexed Views
http://msdn.microsoft.com/en-us/library/ms191432.aspx
20
Laboratory of Data Science
![Page 20: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/20.jpg)
Materialized views in SQL Server21
Laboratory of Data Science
Estimated execution plan Actual execution plan
![Page 21: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/21.jpg)
Analytic SQL
Aggregate functions
MIN, MAX, SUM, COUNT, AVG, VAR, VARP, STDEV, STDEVP
Ranking functions
RANK, DENSE_RANK, NTILE, ROW_NUMBER
Analytic functions (since SQL Server 2012)
CUME_DIST, LEAD, FIRST_VALUE, PERCENTILE_CONT, LAG,
PERCENTILE_DISC, LAST_VALUE, PERCENT_RANK
Laboratory of Data Science
22
![Page 22: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,](https://reader030.fdocuments.net/reader030/viewer/2022041003/5ea6c1cef97c293eee5184b7/html5/thumbnails/22.jpg)
Exercise
For each product family, store country, and quarter,
show the percentage of sales over the total of the
product family
Laboratory of Data Science
23