A Transactional Model for Data Warehouse Maintenance
1
A Transactional Model for Data Warehouse Maintenance
Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner
Published in ER’2002, Finland
Database Systems Research Group, Worcester Polytechnic Institute
2
Data Warehousing
[Figure: architecture diagram. Labels: Data Warehouse, DWMS, and three Wrapper / Base pairs.]
Data Integration from Remote Base Sources
  Difficult and Labor-Intensive
  Better Do it only ONCE and Materialize the Results
  Share Materialized Data by Many Applications
3
Data Warehouse Maintenance
Motivation: Keep Data Warehouse (DW) Up-to-Date
Base Changes over Time
  Source Data Updates: insert, delete, update
  Source Schema Changes: add, drop, rename
Basic Idea: Incremental Maintenance instead of Re-computation
  Re-computation may take weeks
4
General Maintenance Algorithms
View Maintenance (VM)
  Incrementally incorporate source data updates
  [BLT86], [GMS93], [ZGH+95], [SBC+00]
View Synchronization (VS)
  Rewrite data warehouse view definition after the schema of the source changed
  [NLR98], [LNR02]
View Adaptation (VA)
  Adapt view extent after the view definition changed
  [NR99], [GMR+01]
5
DW Maintenance Example
CREATE VIEW Asia_Traveller AS
SELECT C.Name, C.Address, F.FlightNo
FROM Customer C, FlightRes F
WHERE C.Name = F.Name AND F.Dest = 'Asia';

Customer
Name   Address
Dave   WPI
Ellen  MA

FlightRes
Name   Age  FlightNo  Dest
Dave   22   AA8384    Asia
Steve  22   UA77788   Europe

View: Asia_Traveller
Name   Address  FlightNo
Dave   WPI      AA8384

Insert ('Steve', 'Boston')
Select FlightNo from FlightRes where Name = 'Steve'
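
The incremental step on this slide can be tried out with a small sketch. It assumes Python's built-in `sqlite3` module and stores the materialized view as an ordinary table; the function names `build_sources`, `maintain_on_customer_insert`, and `recompute` are hypothetical. The maintenance query here also filters on `Dest = 'Asia'`, which the slide's query leaves to a later step, so that the delta can be inserted into the view directly.

```python
import sqlite3

def build_sources():
    """Create the slide's two source tables and materialize the view."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (Name TEXT, Address TEXT);
        CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
        INSERT INTO Customer VALUES ('Dave', 'WPI'), ('Ellen', 'MA');
        INSERT INTO FlightRes VALUES ('Dave', 22, 'AA8384', 'Asia'),
                                     ('Steve', 22, 'UA77788', 'Europe');
        -- Assumption of this sketch: the view extent lives in a plain table.
        CREATE TABLE Asia_Traveller AS
            SELECT C.Name, C.Address, F.FlightNo
            FROM Customer C, FlightRes F
            WHERE C.Name = F.Name AND F.Dest = 'Asia';
    """)
    return db

def recompute(db):
    """Full re-computation of the view, used only to check the result."""
    return db.execute("""
        SELECT C.Name, C.Address, F.FlightNo
        FROM Customer C, FlightRes F
        WHERE C.Name = F.Name AND F.Dest = 'Asia'
    """).fetchall()

def maintain_on_customer_insert(db, name, address):
    """Apply the source insert, then add only the delta rows to the view."""
    db.execute("INSERT INTO Customer VALUES (?, ?)", (name, address))
    # Maintenance query: probe FlightRes for the new customer only.
    flights = db.execute(
        "SELECT FlightNo FROM FlightRes WHERE Name = ? AND Dest = 'Asia'",
        (name,),
    ).fetchall()
    delta = [(name, address, flight_no) for (flight_no,) in flights]
    db.executemany("INSERT INTO Asia_Traveller VALUES (?, ?, ?)", delta)
    return delta
```

With the slide's data, inserting ('Steve', 'Boston') produces an empty delta because Steve's only reservation is to Europe, and the view stays equal to a full re-computation.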
6
Maintenance Anomaly Problem

Customer
Name   Address
Dave   WPI
Ellen  MA

FlightRes
Name   Age  FlightNo  Dest
Dave   22   AA8384    Asia
Steve  22   UA77788   Europe

View: Asia_Traveller
Name   Address  FlightNo
Dave   WPI      AA8384

1. Insert ('Steve', 'Boston')
2. Rename (FlightRes, FlightReservation)
3. Select FlightNo from FlightRes where Name = 'Steve'
Broken Query!
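
The three numbered steps can be replayed against a single local database as a minimal sketch. This assumes Python's built-in `sqlite3` module in place of the remote sources; `run_anomaly_demo` is a hypothetical name, and in a real deployment the rename would be committed at an autonomous source rather than in the same connection.

```python
import sqlite3

def run_anomaly_demo():
    """Replay the slide's three steps; return the error text of the broken query."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (Name TEXT, Address TEXT);
        CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
        INSERT INTO FlightRes VALUES ('Steve', 22, 'UA77788', 'Europe');
    """)
    # 1. Source data update.
    db.execute("INSERT INTO Customer VALUES ('Steve', 'Boston')")
    # 2. Schema change that lands before the maintenance query is answered.
    db.execute("ALTER TABLE FlightRes RENAME TO FlightReservation")
    # 3. The maintenance query still refers to the old relation name.
    try:
        db.execute("SELECT FlightNo FROM FlightRes WHERE Name = 'Steve'").fetchall()
    except sqlite3.OperationalError as err:
        return str(err)
    return None
```

The query in step 3 fails with an `OperationalError` because the relation it names no longer exists, which is the broken query of the slide.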
7
Inside Broken Query
Two Transactions
  Base Update Transaction: w(B_i) c(B_i)
  DW Maintenance Transaction: r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
Read-write conflicts between two transactions
  Two Independent Transactions: w(B_i) / r(B_i)
  Data Update w(B_i): Incorrect Query Results [ZGH+95]
  Schema Change w(B_i): Broken Query
8
A Transactional Approach
A Global Transaction Model
  DWMS_Transaction
    Integrates both base update transaction and its corresponding DW maintenance transaction
    w(B_i) c(B_i) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
Maintenance Anomaly
  Rephrased to read-write conflicts of DWMS_Transactions
    w(B_i) c(B_i) r(B_1) r(B_2) … r(B_j) … r(B_n) w(DW) c(DW)
    w(B_j) c(B_j) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
9
Serializability of DWMS_Transaction
Theorem
  A history of DWMS_Transactions S is serializable iff it is equivalent to some serial schedule S’ of the same DWMS_Transactions.
Basis for Solving Anomaly Problems
  To solve the anomaly problem, we need all DWMS_Transactions to be serializable.
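
One standard way to test a history is the precedence-graph check for conflict serializability, which is a sufficient condition for the equivalence named in the theorem. The sketch below is illustrative and is not taken from the slides: the encoding of a history as `(transaction, operation, item)` triples, the omission of commit operations, and the function names are all assumptions.

```python
from itertools import combinations

def precedence_edges(history):
    """history: list of (txn, op, item) in execution order, op is 'r' or 'w'."""
    edges = set()
    # combinations() keeps list order, so the first element ran earlier.
    for (t1, op1, x1), (t2, op2, x2) in combinations(history, 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))  # t1 must precede t2 in any equivalent serial order
    return edges

def is_conflict_serializable(history):
    """True iff the precedence graph of the history has no cycle."""
    edges = precedence_edges(history)
    remaining = {txn for txn, _, _ in history}
    while remaining:
        # Transactions with no incoming edge from a remaining transaction.
        free = {n for n in remaining
                if not any(a in remaining and b == n for a, b in edges)}
        if not free:
            return False  # every remaining node has a predecessor: a cycle
        remaining -= free
    return True

# T1 updates B1 and T2 updates B2; each then reads all sources and writes DW.
ANOMALOUS = [
    ("T1", "w", "B1"), ("T2", "w", "B2"),
    ("T1", "r", "B1"), ("T1", "r", "B2"), ("T1", "w", "DW"),
    ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
SERIAL = [
    ("T1", "w", "B1"), ("T1", "r", "B1"), ("T1", "r", "B2"), ("T1", "w", "DW"),
    ("T2", "w", "B2"), ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
```

In `ANOMALOUS`, T1 reads B2 after T2 wrote it while T2 reads B1 after T1 wrote it, so the graph contains a cycle between T1 and T2.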
10
Traditional Serializability Algorithms
Lock-based
  Reads / writes acquire locks for access to shared resources
  Transactions block each other
Multiversion-based
  Write on a version, read on another version
  Transactions do not block each other
11
Traditional Serializability Algorithms
Lock-based
  Read / write would need to lock data in sources?
  Not desirable in DW environment
    Data sources are autonomous
    Not realistic to impose locking on them
Multiversion-based
  Do not block each other
  Desirable in DW environment
    DW and data sources do not block each other
    Need to maintain versions somewhere
12
TxnWrap: A Multiversion Algorithm
CREATE VIEW Asia_Traveller AS
SELECT C.Name, C.Address, F.FlightNo
FROM Customer C, FlightRes F
WHERE C.Name = F.Name AND F.Dest = 'Asia';

Customer
Name   Address
Dave   WPI
Ellen  MA

FlightRes
Name   Age  FlightNo  Dest
Dave   22   AA8384    Asia
Steve  22   UA77788   Europe

View: Asia_Traveller
Name   Address  FlightNo
Dave   WPI      AA8384

CREATE VIEW Asia_Traveller AS
SELECT C.Name, C.Address, F.FlightNo
FROM Customer’ C, FlightRes’ F
WHERE C.Name = F.Name AND F.Dest = 'Asia';

Wrapper: FlightRes’
N.  A.  F.  D.
…   …   …   …
…   …   …   …
Meta Relation
Rel    Attr
Fli’   Name
…      …

Wrapper: Customer’
Name   Address
Dave   WPI
Ellen  MA
Meta Relation
Rel    Attr
Cust’  Name
Cust’  Address
13
Versioned Wrapper
Semantics: life time of a tuple is #born <= time < #dead

Wrapper for Customer
Relation Customer’
Name   Address  #born  #dead
Dave   WPI      0
Ellen  MA       0

Meta Relation
Rel  Attr   Rel’  Attr’  #born  #dead
C’   Name   -     -      0
C’   Addr.  -     -      0
14
Source Updates on Versioned Wrapper
Transaction 1:
1. DELETE FROM Customer C WHERE C.Name = 'Dave';
2. INSERT ('Steve', 'Boston');
Transaction 2:
Drop Customer.Address;

Relation Customer’ (Init)
Name   Address  #born  #dead
Dave   WPI      0
Ellen  MA       0

Relation Customer’ (state 1)
Name   Address  #born  #dead
Dave   WPI      0      1
Ellen  MA       0
Steve  Boston   1

Relation Customer’ (state 2)
Name   Address  #born  #dead
Dave   WPI      0      1
Ellen  MA       0
Steve  Boston   1

Meta Relation (state 2)
Rel  Attr   Rel’  Attr’  #born  #dead
C’   Name   -     -      0
C’   Addr.  -     -      0      2
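
The version stamping shown in these states can be sketched in a few lines. Everything named here (`Versioned`, `VersionedWrapper`, `snapshot`, the use of infinity for an empty #dead cell) is a hypothetical illustration of the #born / #dead bookkeeping, not the TxnWrap implementation itself.

```python
import math
from dataclasses import dataclass, field

INF = math.inf  # stands in for an empty #dead cell (assumption of this sketch)

@dataclass
class Versioned:
    value: tuple
    born: int
    dead: float = INF

    def alive_at(self, t):
        # Slide semantics: #born <= time < #dead
        return self.born <= t < self.dead

@dataclass
class VersionedWrapper:
    attrs: list = field(default_factory=list)  # meta relation rows
    rows: list = field(default_factory=list)   # data rows

    def insert(self, values, txn):
        self.rows.append(Versioned(tuple(values), born=txn))

    def delete(self, predicate, txn):
        # Nothing is removed physically; the tuple only gets a #dead stamp.
        for row in self.rows:
            if row.alive_at(txn) and predicate(row.value):
                row.dead = txn

    def drop_attribute(self, name, txn):
        # A schema change is a delete on the meta relation.
        for attr in self.attrs:
            if attr.value == (name,) and attr.alive_at(txn):
                attr.dead = txn

    def snapshot(self, t):
        """Columns and rows visible at version t."""
        names = [a.value[0] for a in self.attrs]
        keep = [i for i, a in enumerate(self.attrs) if a.alive_at(t)]
        rows = sorted(tuple(r.value[i] for i in keep)
                      for r in self.rows if r.alive_at(t))
        return [names[i] for i in keep], rows

def build_customer_wrapper():
    w = VersionedWrapper()
    w.attrs = [Versioned(("Name",), 0), Versioned(("Address",), 0)]
    w.insert(("Dave", "WPI"), 0)
    w.insert(("Ellen", "MA"), 0)
    w.delete(lambda v: v[0] == "Dave", 1)   # Transaction 1, step 1
    w.insert(("Steve", "Boston"), 1)        # Transaction 1, step 2
    w.drop_attribute("Address", 2)          # Transaction 2
    return w
```

Because old versions are never overwritten, `snapshot(0)` still returns the initial relation even after both transactions have been applied, which is what lets a maintenance query read a consistent state without locking the source.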
15
DW Maintenance Query Rewritten for Versioned Wrapper
The maintenance query issued in Transaction 2:
SELECT Name, Address
FROM Customer
WHERE condition;
Rewritten versioned maintenance query:
SELECT Name, Address
FROM Customer’
WHERE condition and
#born <= 2 and #dead > 2;
Relation Customer’ (State 1)
Name   Address  #born  #dead
Dave   WPI      0      1
Ellen  MA       0
Steve  Boston   1
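
The rewritten query can be run as a minimal sketch with Python's built-in `sqlite3` module. The identifiers `Customer_v`, `born`, and `dead` are hypothetical stand-ins, since `Customer’` and `#born` are not valid SQL names. An empty #dead cell is stored as NULL here, so the predicate needs an explicit `IS NULL` branch that the slide's shorter form does not show.

```python
import sqlite3

def versioned(where_clause):
    """Append the version predicate to a WHERE clause (named placeholder :v)."""
    return f"({where_clause}) AND born <= :v AND (dead IS NULL OR dead > :v)"

def build_wrapper_db():
    """Relation Customer' in state 1, with born/dead version columns."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer_v (Name TEXT, Address TEXT, born INTEGER, dead INTEGER);
        INSERT INTO Customer_v VALUES ('Dave',  'WPI',    0, 1),
                                      ('Ellen', 'MA',     0, NULL),
                                      ('Steve', 'Boston', 1, NULL);
    """)
    return db

def query_customer_at(db, version, where_clause="1 = 1"):
    """Run the rewritten maintenance query against a given version."""
    sql = ("SELECT Name, Address FROM Customer_v WHERE "
           + versioned(where_clause) + " ORDER BY Name")
    return db.execute(sql, {"v": version}).fetchall()
```

Calling `query_customer_at(db, 2)` sees Ellen and Steve but not Dave, whose tuple died at version 1, while `query_customer_at(db, 0)` still sees the initial relation.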
16
Performance Evaluation
Implementation
  In Java
  Platform: Oracle, JDBC on Windows NT
  Embedded in DyDa [CCZ+01] System at WPI
Testbed
  6 data sources with one relation each
  Each relation has 4 attributes and 100,000 tuples
  One materialized join view over these data sources
  TxnWrap vs. compensation (SWEEP [AAS+97] & DyDa)
17
Data Update Processing
[Figure: SWEEP vs. TxnWrap. Axes: # Concurrent DUs (0 to 1000), Time (s) (0 to 0.5).]
18
Schema Change Processing
[Figure: DyDa, Abort of DyDa, TxnWrap, Abort of TxnWrap. Axes: Time Interval (s), Time (s).]
19
Related Work
View Maintenance
  View Maintenance / Synchronization / Adaptation
Maintenance Anomaly
  ECA [ZGH+95], SWEEP [AAS+97] handle only concurrent data updates
  Compensation-based
    Performance degrades at a high load
Multi-version Algorithms
  2-version, n-version, unlimited-version algorithms [MPL92]
20
Conclusions
Identify the Maintenance Anomaly Problem in a mixed model environment
Design a global transaction model, DWMS_Transaction, that integrates both the source update transaction and the maintenance transaction
Rephrase the maintenance anomaly in terms of serializability of DWMS_Transactions
Propose a multiversion algorithm to achieve serializability
Implemented the maintenance solution in DyDa
  Achieves stable performance under various workloads
21
Other Activities and Future Work
Batching of updates into more complex maintenance plans
Parallelism of maintenance processes
Support more complex views, e.g., aggregation
Generalize to more change types
Provide alternate view synchronization algorithms
Discovery of changes by non-cooperating sources
Discovery of meta data in terms of source relationships of distributed sources
Move beyond relational middle-layer model
22
Questions?