Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with...

30
Outline Introduction Related Work Goal Challenges The Approach Conclusions Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed [email protected] Universit´ e Libre de Bruxelles Poznan University of Technology July 9, 2015 T BI I D C

Transcript of Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with...

Page 1: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Modelling Data Warehouses with Multiversionand Temporal Functionality

Waqas [email protected]

Universite Libre de BruxellesPoznan University of Technology

July 9, 2015

T BII

D C

Page 2: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 3: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 4: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Data Warehouse Architecture

Data Warehouse

ETL

Extraction, Transformation

Loading

Operational systems

Documents

CRM

ERP

OLAP Analysis

Reporting

Data Mining

Data Warehouse Architecture

Page 5: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Type of Changes in External Data Sources

External data sources (EDSs) change in their:

1 Content• Daily operations

2 Structure• Changes in the business requirements• Changes in the modeled reality• Adaptation of the new technologies• Changes in legislation

Page 6: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Propagating Changes in EDS into Data Warehouses

• As a result of changes in EDSs, DWs also change in their:1 Content

• changes in the product prices

2 Structure• changes in the geographical hierarchies

• The user may be interested in keeping the history of changesin the content and structure

• to reconstruct the state of the business world in the past• to simulate the effect of the future changes• for the audit and accountability purposes

Page 7: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Data Warehouse

ETL

Extraction, Transformation

Loading

Operational systems

Documents

CRM

ERP

OLAP Analysis

Reporting

Data Mining

Data Warehouse Architecture

• What if the user wants to keep the history of changes in thecontent and structure of the data warehouse?

Page 8: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 9: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Available Approaches

1 Slowly Changing Dimensions

2 Temporal Data Warehouses

3 Multiversion Data Warehouses

Page 10: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Slowly Changing Dimensions (SCDs)

• Idea• Three basic and four hybrid responses to changes in the

contents• Basic types are supported by commercial tools such as SSIS,

SSAS

• Drawbacks• Not suitable for temporal data – performance issues• Unable to handle structural changes

• References• R. Kimball and M. Ross. The Data Warehouse Toolkit: The

Definitive Guide to Dimensional Modeling. 2013• Faisal, S. et M. Sarwar. Handling slowly changing dimensions

in data warehouses. JSS, 2014

Page 11: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Temporal Data Warehouses

• Idea• Systems which provide built in support for storing and

querying time varying data• Temporal features are supported by commercial and open

source systems such as Teradata, PostgreSQL

• Drawbacks• Limited support for temporal operations such as coalescing• Unable to handle the structural changes

• References• M. Golfarelli, and S. Rizzi. A survey on temporal data

warehousing. IJDWM, 2009• E. Malinowski, and E. Zimanyi. A conceptual model for

temporal data warehouses and its transformation to the ERand the object-relational models. DaWak 2008.

• J. Eder, C. Koncilia, and T. Morzy. The COMET metamodelfor temporal data warehouses. In Proc. of CAiSE 2006

Page 12: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Temporal Data Warehouses

• Capture the evolution of attribute values by associatingindependent time dimensions

• Valid time – Time when a fact is valid in the modeled reality• Transaction time – Time when a fact is registered in the

system

• Temporal DWs cannot handle the structural changes

Store ID Address Manager VS VE TS TEs1 ABC John 1-Jan-2010 now 3-Jan-2010 UC

Table before change

Store ID Address Manager VS VE TS TEs1 ABC John 1-Jan-2010 now 3-Jan-2010 1-Dec-2012s1 ABC John 1-Jan-2010 1-Jan-2013 1-Dec-2012 UCs1 ABC Doe 1-Jan-2013 now 1-Dec-2012 UC

Table after change

Page 13: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Temporal Data Warehouses

• Capture the evolution of attribute values by associatingindependent time dimensions

• Valid time – Time when a fact is valid in the modeled reality• Transaction time – Time when a fact is registered in the

system

• Temporal DWs cannot handle the structural changes

Store ID Address Manager VS VE TS TEs1 ABC John 1-Jan-2010 now 3-Jan-2010 UC

Table before change

Store ID Address Manager VS VE TS TEs1 ABC John 1-Jan-2010 now 3-Jan-2010 1-Dec-2012s1 ABC John 1-Jan-2010 1-Jan-2013 1-Dec-2012 UCs1 ABC Doe 1-Jan-2013 now 1-Dec-2012 UC

Table after change

Page 14: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Multiversion Data Warehouses (MVDW)

• Idea• Consists of the sequence of DW versions• Each DW version is composed of schema version and data

version

• Drawbacks• Inter and intra-version querying is complex• Storage and performance issues• No support for creating temporal data versions in a schema

version

• References• S. Rizzi and M. Golfarelli. X-time: Schema versioning and

cross-version querying in data warehouses. ICDE, 2007• R. Wrembel, B. Bebel. Metadata management in a

multiversion data warehouse. JODS, 2007

Page 15: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Something More about Cross-Version Queries

Product

ProductKey

Name

UnitPrice

CategoryKey

Sales

StoreKey

ProductKey

TimeKey

Quantity

Amount

Time

TimeKey

Day

Week

Month

Year

Store

StoreKey

Name

Address

Area

CityKey

City

CityKey

Name

Representative

RegionKey

Region

RegionKey

Name

Representative

Category

CategoryKey

Name

Description

Product

ProductKey

Name

UnitPrice

SubcatKey

Sales

StoreKey

ProductKey

TimeKey

Quantity

Amount

Time

TimeKey

Day

Week

Month

Year

City

CityKey

Name

Representative

RegionKey

Region

RegionKey

Name

Director

Subcategory

SubcatKey

Name

CategoryKey

Category

CategoryKey

Name

Description

Store

StoreKey

Name

Address

Manager

CityKey

Product

ProductKey

Name

UnitPrice

SubcatKey

Sales

CityKey

ProductKey

Customerkey

TimeKey

Quantity

Amount

Discount

Time

TimeKey

Day

Week

Month

Year

Customer

CustomerKey

CompanyName

PostalCode

CityKey

City

CityKey

Name

Manager

RegionKey

Region

RegionKey

Name

Director

Subcategory

SubcatKey

Name

CategoryKey

Category

CategoryKey

Name

Description

Figure: Schema versions in the example MVDW

Page 16: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Something More about Cross-Version Queries

8/2014 1/2015 Time

Partially answerable query

today

Not answerable queryAnswerable query

V1 V2 V3

9/2013

Figure: Three versions of a DW and possible cases for a query computingthe value of a schema element present in the current version only

Page 17: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 18: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Project Goal

Design, implement, and evaluate a multiversion data warehousethat can:

• Store temporal data versions in a each schema version• Efficiently query the data stored within a schema version

(intra-version)• Efficiently query the data stored across multiple schema

versions (inter-version)

Schema version 1 Schema version 2 Schema version 3

Inter-version query

Intr

a-ve

rsio

n q

ue

ry

Page 19: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 20: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Challenges

• Modelling• Schema modification operators (SMOs) to derive schema

versions• A meta-model to store the details of the multiple schema

versions• Integrity constraints to ensure inter-schema and intra-schema

data consistency

• Performance• No comprehensive study of temporal algebraic operators for

multidimensional model with temporal and multiversionfunctionality

• Storage• Time varying aggregates are complex e.g. coalescing• Data redundancy overhead in case of multiple versions• An efficient storage model for temporal and multiversion data

Page 21: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Challenges (Cont.)

• Querying• The temporal data stored in multiple schema versions is not

trivial• Missing data and data with different structure across multiple

schema versions

• Technological challenge• Limited support for the temporal features in the

SQL-Standards and thus in commercial systems

Page 22: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 23: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Our Approach

1 Model that supports• Temporal data in schema versions

• W.Ahmed, E.Zimanyi, and R.Wrembel. A Logical Model forMultiversion Data Warehouses. DaWaK 2014

• W.Ahmed, E.Zimanyi, and R.Wrembel. Temporal DataWarehouses: Logical Models and Querying. EDA 2015

2 Query mechanism that supports• Inter-schema and intra-schema querying of data

• W.Ahmed, E.Zimanyi. On Querying Multiversion DataWarehouses. ADBIS 2015

3 Performance• Benchmark

4 Prototype• System as a proof of concept

Page 24: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Our Approach

1 Model that supports• Temporal data in schema versions• W.Ahmed, E.Zimanyi, and R.Wrembel. A Logical Model for

Multiversion Data Warehouses. DaWaK 2014• W.Ahmed, E.Zimanyi, and R.Wrembel. Temporal Data

Warehouses: Logical Models and Querying. EDA 2015

2 Query mechanism that supports• Inter-schema and intra-schema querying of data• W.Ahmed, E.Zimanyi. On Querying Multiversion Data

Warehouses. ADBIS 2015

3 Performance• Benchmark

4 Prototype• System as a proof of concept

Page 25: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Ongoing Work

• Extension of the multiversion DW model with SMOs andtemporal features

• Extension of TPC-DS to use it as the benchmark

• Query rewriting algorithm for the cross-version queries

• Prototype

Page 26: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Planned Publications

1 A Query Mechanism for Multiversion DatawarehouesThe paper will present a system capable of storing multipleschema versions of a DW and querying these versions.Outlet: Bulliten of the Polish Academy of Science. Theproposal has already been accepted by the committee.

2 A Data Warehouse with Multiversion and TemporalFunctionality: A Prototype ImplementationThe paper will present the details of prototype systemimplementation and empirical results obtained from runningthis application on the reference data set.Outlet: ACM SIGMOD (SIGMOD 2016).Expected Submission Deadline: December 2015.

Page 27: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Planned Publications (cont.)

3 Modelling Data Warehouses with Multiversion and TemporalFunctionalityThe paper, with an aim to be published in a journal, willdetail all aspects of the doctoral project.

1 Outlet: Information Systems Journal (IS).2 Outlet: Information and Software Technology (IST).

Expected Submission Deadline: March 2016.

Page 28: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Plan for Fall 2015 - Spring 2016

# Activity Time

1 Return to the home university (ULB) September 20152 Paper presentation in ADBIS 2015 September 20152 Submission of the first journal paper October 20153 Preparation and submission of conference paper December 20154 Preparation and submission of the second journal paper March/April 20155 Thesis writing May, June, July 20156 Thesis defense August/September 2015

Page 29: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Outline

1 Introduction

2 Related Work

3 Goal

4 Challenges

5 The Approach

6 Conclusions

Page 30: Modelling Data Warehouses with Multiversion and Temporal ... · Modelling Data Warehouses with Multiversion and Temporal Functionality Waqas Ahmed waqas.ahmed@ulb.ac.be Universit

Outline Introduction Related Work Goal Challenges The Approach Conclusions

Conclusions

• Maintaining the history of changes in the content andstructure is an important, yet a challenging issue in the fieldof data management

• The temporal and multiversion features allow the user tore-create the state of the business retrospectively orprospectively

• A natural solution to the challenge of the evolution of contentand structure is to combine the both, temporal andmultiversion features as a single solution

• For this, a model, efficient data storage, retrieval and queryingmechanism are needed

• This project aims to address the challenges and present itsoutcome as a proof of the concept application