Change-Centric Management of Versions in an XML Warehouse
Amélie Marian
Columbia University
Serge Abiteboul, Grégory Cobéna, Laurent Mignet
INRIA-Rocquencourt
VLDB-Sept 2001 Amélie Marian 2
Overview
The Xyleme Project
Change Management
Version Management
– XIDs
– XML Diff
– Deltas
– Storage of XML document versions
– Implementation and experiments
The Xyleme Project
A dynamic XML Data Warehouse with high-level services:
– User-friendly Query Engine
– Semantic Data Integration
– Version Management
– Query Subscription, Change Monitoring services
The Xyleme project is now finished; there is also a start-up called Xyleme.
Change Management
Version Management
Learning about Changes
Monitoring Changes: Query Subscription
Querying the Past: Temporal Queries
Version Management
Our requirements:
– Obtain the current version
– Get the modifications since time t
– Subscribe to change notifications, query changes
– Compute temporal queries
– Rebuild the version $V_i$ of a document at time $t_i$
Getting the Documents
XML documents are fetched from the web; we only have snapshots of the documents.
[Figure: two snapshots of a Catalog document whose Product (Pr) elements each have a Name (N) and a Price (P). Version 1: Camera 300, TV 100, VCR 200. Version 2: TV 100, DVD 500, VCR 150.]
XIDs
Unique identifiers needed to track XML nodes through time:
• Track changes on a specific node (ex: a product in a catalog)
• Reconstruct the history of a node
But physically adding an ID attribute to each node is expensive in terms of storage
XIDs: attach persistent IDs to every node in a storage-efficient manner
XIDs
XIDs are stored separately as a list (the XID-map):
– List of the node IDs in a postorder traversal of the tree
– XIDnext: gives the next available XID
Compact representation; the document itself is not modified
[Figure: a document tree with each node labeled by its XID]
XID-map (1-3,14-15,7-13|16)
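A minimal Python sketch of the XID-map encoding, assuming it is simply run-length compression of the postorder XID list followed by `|` and XIDnext. The function names are hypothetical, and writing a run of length one as a bare number (e.g. `(1,3|4)`) is our assumption, since the slide does not show that case.

```python
# Hypothetical helpers: a sketch of the XID-map text format, not Xyleme code.


def compress_xid_map(postorder_xids, xid_next):
    """Turn a postorder list of XIDs into the compact XID-map text."""
    ranges = []  # [first, last] pairs of consecutive XIDs
    for xid in postorder_xids:
        if ranges and ranges[-1][1] + 1 == xid:
            ranges[-1][1] = xid  # extend the current run
        else:
            ranges.append([xid, xid])  # start a new run
    # Assumption: a run of length one is written without a dash.
    body = ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
    return f"({body}|{xid_next})"


def expand_xid_map(xid_map):
    """Inverse of compress_xid_map: return (postorder_xids, xid_next)."""
    body, xid_next = xid_map.strip("()").split("|")
    xids = []
    for part in body.split(","):
        first, _, last = part.partition("-")
        xids.extend(range(int(first), int(last or first) + 1))
    return xids, int(xid_next)


# XIDs of the example tree, read in postorder.
postorder = [1, 2, 3, 14, 15, 7, 8, 9, 10, 11, 12, 13]
print(compress_xid_map(postorder, 16))  # (1-3,14-15,7-13|16)
```

Twelve node IDs shrink to three ranges, and the document itself never has to carry an ID attribute.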
XML Diff
We implemented an XML diff algorithm to compute the changes between two versions of a document:
– Use of the XML structure for matching
– Content matching
Linear in the size of the document
XML diff has two roles:
– Match nodes
– Build the delta
Ongoing work on improving the XML diff
Node Matching using a Diff Algorithm
[Figure: Version 1 (products Camera 300, TV 100, VCR 200) and Version 2 (products TV 100, DVD 500, VCR 150) of the catalog, with each node labeled by its XID; the Camera product is deleted, the VCR price is updated and the DVD product is inserted]
XID-map of Version 1: (1-16|17)
Diff(V1,V2): delete(5) update(13,150) insert(16,2,(17-21))
New XID-map: (6-10,17-21,11-16|22)
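The XID bookkeeping on this slide can be replayed in a short Python sketch. It does not implement the diff (node matching); it applies the three operations of Diff(V1,V2) by hand to a hypothetical `Node` tree in which text values are leaf nodes, assigns XIDs to the inserted subtree starting at XIDnext, and recomputes both XID-maps. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical tree model; text values are leaf nodes
    label: str
    children: list = field(default_factory=list)
    xid: int = 0


def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node


def assign_xids(node, next_xid):
    """Give a new subtree consecutive XIDs in postorder; return the new XIDnext."""
    for child in node.children:
        next_xid = assign_xids(child, next_xid)
    node.xid = next_xid
    return next_xid + 1


def product(name, price):
    # Each product has 5 nodes: two text leaves, Name, Price and Product.
    return Node("Product", [Node("Name", [Node(name)]), Node("Price", [Node(price)])])


def find(root, xid):
    return next(n for n in postorder(root) if n.xid == xid)


def delete(root, xid):
    """Remove the subtree whose root has this XID."""
    for parent in postorder(root):
        if any(c.xid == xid for c in parent.children):
            parent.children = [c for c in parent.children if c.xid != xid]
            return


def xid_map(root, xid_next):
    """Run-length compress the postorder XID list."""
    ranges = []
    for xid in (n.xid for n in postorder(root)):
        if ranges and ranges[-1][1] + 1 == xid:
            ranges[-1][1] = xid
        else:
            ranges.append([xid, xid])
    body = ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
    return f"({body}|{xid_next})"


# Version 1 of the catalog gets XIDs 1-16, so XIDnext = 17.
catalog = Node("Catalog", [product("Camera", "300"), product("TV", "100"), product("VCR", "200")])
xid_next = assign_xids(catalog, 1)
before = xid_map(catalog, xid_next)

# Diff(V1,V2): delete(5) update(13,150) insert(16,2,(17-21)), applied by hand.
delete(catalog, 5)
find(catalog, 13).label = "150"
dvd = product("DVD", "500")
xid_next = assign_xids(dvd, xid_next)            # the DVD subtree gets XIDs 17-21
find(catalog, 16).children.insert(2 - 1, dvd)   # position 2 under node 16
after = xid_map(catalog, xid_next)
print(before, after)  # (1-16|17) (6-10,17-21,11-16|22)
```

In this sketch XIDnext only grows, so the XIDs 1-5 of the deleted Camera product are not reused and the DVD product receives fresh XIDs 17-21.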
Edit-Scripts = SEQUENCE
Sequences of basic operations over XML trees:
• Delete(n)
• Update(n, v)
• Insert(m,k,T)
• Move(n,k,m)
An Edit Script can be applied to a document D if its operations are consistent with D
An Edit Script applied to a document D will result in a unique document D’
Several Edit Scripts applied to a document D can result in the same document D’
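To make the sequence semantics concrete, here is a small Python sketch restricted to a single parent P whose children are identified by labels instead of XIDs; only Insert and Delete are modeled, and `apply_edit_script` is a hypothetical name. It uses Version 0 (P with child A) and Version 1 (P with children C, B, A) from the "Edit-Scripts vs. Deltas" example slide.

```python
def apply_edit_script(children, script):
    """Apply an edit script to the child list of one parent (hypothetical helper).

    Operations are ("insert", label, k) or ("delete", label); k is a 1-based
    position relative to the list as it is when the operation runs.
    """
    doc = list(children)
    for op in script:
        if op[0] == "insert":
            _, label, k = op
            doc.insert(k - 1, label)
        elif op[0] == "delete":
            doc.remove(op[1])
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    return doc


v0 = ["A"]
script_1 = [("insert", "B", 1), ("insert", "C", 1)]
script_2 = [("insert", "C", 1), ("insert", "B", 2)]
print(apply_edit_script(v0, script_1))  # ['C', 'B', 'A']
print(apply_edit_script(v0, script_2))  # ['C', 'B', 'A']
# Order matters: running script_1 backwards gives a different document.
print(apply_edit_script(v0, script_1[::-1]))  # ['B', 'C', 'A']
```

Two different scripts reach the same document, while reordering the operations of a single script changes the result.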
Deltas (Δ) = SET
We introduce an alternative way of representing changes: Deltas
$\Delta_{i,j}$ (unit delta) contains the set of operations needed to go from $V_i$ to $V_j$, i.e., $\mathrm{Diff}(V_i, V_j)$
A delta ($\Delta$) over a document D is the sequence of unit deltas over D:
$\Delta = \{\Delta_{1,2}, \ldots, \Delta_{k-1,k}\}$
There is an (almost) unique delta from $V_i$ to $V_j$
We represent Deltas as XML documents
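Because delta positions are absolute (positions in the final version), a delta can be applied as a set. A possible Python sketch, again for the children of a single parent P identified by labels, with the hypothetical function `apply_delta`: deletions are applied first, then inserts in increasing order of final position, so the order in which the operations are listed does not matter. The examples reuse Δ(0,1) and Δ(1,2) from the example slides.

```python
def apply_delta(children, delta):
    """Apply a delta (a set of operations) to one parent's children (hypothetical).

    Operations are ("delete", label) or ("insert", label, k), where k is the
    1-based position of the inserted node in the *final* version.
    """
    deleted = {op[1] for op in delta if op[0] == "delete"}
    doc = [c for c in children if c not in deleted]
    # Inserting in increasing order of final position puts each node at its
    # final position: every node that ends up before it is already in place.
    inserts = sorted((op for op in delta if op[0] == "insert"), key=lambda op: op[2])
    for _, label, k in inserts:
        doc.insert(k - 1, label)
    return doc


delta_0_1 = frozenset({("insert", "B", 2), ("insert", "C", 1)})
print(apply_delta(["A"], delta_0_1))  # ['C', 'B', 'A']

delta_1_2 = frozenset({("delete", "C"), ("insert", "D", 2)})
print(apply_delta(["C", "B", "A"], delta_1_2))  # ['B', 'D', 'A']
```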
Shortcomings of Deltas
Storage Policies
a) $V_1, \Delta_{1,2}, \ldots, \Delta_{now-1,now}$
b) $\Delta_{2,1}, \ldots, \Delta_{now,now-1}, V_{now}$
c) $V_1, \Delta_{2,1}, \ldots, \Delta_{now,now-1}$
d) $\Delta_{1,2}, \ldots, \Delta_{now-1,now}, V_{now}$
Only a) and b) are lossless
But we would like fast access to:
– $V_{now}$
– $\Delta_{i,now}$
Deltas are not reversible and cannot be composed (information on positions is missing)
Completed Deltas (Δ+)
Completed deltas contain more information:
• Delete(m,k,T)
• Update(n, ov, nv)
• Insert(m,k,T)
• Move(n,k,m,p,q)
Completed Deltas can be reversed and composed
Completed Deltas are in the spirit of the logs used in some DB systems
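A sketch of reversal in Python, assuming one frozen dataclass per completed operation. The field names, and the reading of Move(n,k,m,p,q) as a node plus its old and new parent and position, are our assumptions; subtrees are kept as XML strings for brevity. The operations are those of the Example of XML Δ+ (versions 1 to 2).

```python
from dataclasses import dataclass

# Hypothetical encoding of completed operations; field names are assumptions.


@dataclass(frozen=True)
class Delete:
    parent: int    # m: XID of the parent
    position: int  # k: position under m in the old version
    subtree: str   # T: the deleted subtree itself


@dataclass(frozen=True)
class Insert:
    parent: int
    position: int  # position under the parent in the new version
    subtree: str


@dataclass(frozen=True)
class Update:
    node: int
    old_value: str
    new_value: str


@dataclass(frozen=True)
class Move:
    node: int
    old_parent: int
    old_position: int
    new_parent: int
    new_position: int


def reverse_op(op):
    if isinstance(op, Delete):
        return Insert(op.parent, op.position, op.subtree)
    if isinstance(op, Insert):
        return Delete(op.parent, op.position, op.subtree)
    if isinstance(op, Update):
        return Update(op.node, op.new_value, op.old_value)
    if isinstance(op, Move):
        return Move(op.node, op.new_parent, op.new_position, op.old_parent, op.old_position)
    raise TypeError(op)


def reverse(delta):
    """A completed unit delta is a set, so it is reversed operation by operation."""
    return frozenset(reverse_op(op) for op in delta)


CAMERA = "<Product><Name>Camera</Name><Price>300</Price></Product>"
DVD = "<Product><Name>DVD</Name><Price>500</Price></Product>"
d12 = frozenset({Delete(16, 1, CAMERA), Update(13, "200", "150"), Insert(16, 2, DVD)})
d21 = reverse(d12)
```

Reversal only needs what each completed operation already carries; a plain Delete(n) would give no subtree, parent or position to re-insert.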
<delta>
  <unit_delta>…</unit_delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>
Example of XML Δ+
Operations on Deltas
Compute with version:
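– $V_i \circ \Delta^+_{i,j} = V_j$
– $V_i \circ \Delta_{i,j} = V_j$
Reverse: $(\Delta^+_{i,j})^{-1} = \Delta^+_{j,i}$
Compose: $\Delta^+_{i,j} ; \Delta^+_{j,k} = \Delta^+_{i,k}$
Simplify: $\Delta^+_{i,j} \rightarrow \Delta_{i,j}$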
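The Simplify step can be sketched on the Example of XML Δ+ with Python's standard `xml.etree.ElementTree`. The function names and the tuple form of the simple delta are hypothetical; taking the last XID of a deleted subtree's XID-map as the n of Delete(n) is our inference from the postorder numbering. On this input the sketch yields the same operations as Diff(V1,V2) on the node-matching slide.

```python
import xml.etree.ElementTree as ET

# Second unit delta of the XML Δ+ example (time 1 to 2).
UNIT_DELTA = """<unit_delta>
  <time from="1" to="2"/>
  <delete parent="16" position="1" xid-map="(1-5)">
    <Product><Name>Camera</Name><Price>300</Price></Product>
  </delete>
  <update xid="13" new_value="150" old_value="200"/>
  <insert parent="16" position="2" xid-map="(17-21)">
    <Product><Name>DVD</Name><Price>500</Price></Product>
  </insert>
</unit_delta>"""


def subtree_root_xid(xid_map):
    # XIDs are listed in postorder, so the subtree root has the last one.
    last_range = xid_map.strip("()").split(",")[-1]
    return int(last_range.split("-")[-1])


def simplify(unit_delta_xml):
    """Map a completed unit delta to a simple one (hypothetical sketch)."""
    simple = []
    for el in ET.fromstring(unit_delta_xml):  # the <time> element is skipped
        if el.tag == "delete":
            # Keep only the deleted node: parent, position and content are dropped.
            simple.append(("delete", subtree_root_xid(el.get("xid-map"))))
        elif el.tag == "update":
            # The old value is dropped.
            simple.append(("update", int(el.get("xid")), el.get("new_value")))
        elif el.tag == "insert":
            # Inserts are unchanged: the new subtree is still needed.
            simple.append(("insert", int(el.get("parent")), int(el.get("position")),
                           el.get("xid-map"), el[0]))
    return simple


for op in simplify(UNIT_DELTA):
    print(op[:4])
# ('delete', 5)
# ('update', 13, '150')
# ('insert', 16, 2, '(17-21)')
```

Simplification is one-way: the dropped parent, position and old value are exactly what reversal and composition need.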
Storage of Versions
For a document D (or a query result Q), we store:
– Current version: $V_k$
– XID-map (as text) of $V_k$
– Current $\Delta^+ = \{\Delta^+_{1,2}, \ldots, \Delta^+_{k-1,k}\}$
When a new version k+1 arrives:
– Compute the XML diff between $V_k$ and $V_{k+1}$, and compute $\Delta^+_{k,k+1}$
– Replace the current version with $V_{k+1}$
– Replace the XID-map
– Append $\Delta^+_{k,k+1}$ to $\Delta^+$
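The storage scheme can be sketched in Python on a deliberately flat model: a version is a tuple of (xid, value) pairs under one root, and a completed delta is a set of ("insert", pos, xid, value), ("delete", pos, xid, value) and ("update", xid, old, new) tuples. The class and function names are hypothetical, the XID-map is not stored, and the Δ+ is passed in rather than computed by the XML diff. Older versions are rebuilt by applying reversed completed deltas backwards from the current version.

```python
# Hypothetical flat model of the version store; a sketch, not the Xyleme code.


def reverse_op(op):
    if op[0] == "insert":
        return ("delete",) + op[1:]
    if op[0] == "delete":
        return ("insert",) + op[1:]
    _, xid, old, new = op  # ("update", xid, old_value, new_value)
    return ("update", xid, new, old)


def apply(version, delta):
    """Apply a completed delta (a set) to a version: a tuple of (xid, value)."""
    deleted = {op[2] for op in delta if op[0] == "delete"}
    new_values = {op[1]: op[3] for op in delta if op[0] == "update"}
    doc = [(xid, new_values.get(xid, value)) for xid, value in version if xid not in deleted]
    # Insert positions refer to the resulting version: insert in increasing order.
    for _, pos, xid, value in sorted(op for op in delta if op[0] == "insert"):
        doc.insert(pos - 1, (xid, value))
    return tuple(doc)


class VersionStore:
    """Keeps the current version and all completed deltas."""

    def __init__(self, first_version):
        self.current = tuple(first_version)
        self.deltas = []  # self.deltas[j] is Δ+ from version j+1 to version j+2

    def add_version(self, delta_plus):
        # In the full system, Δ+ would come from the XML diff of V_k and V_{k+1}.
        self.current = apply(self.current, delta_plus)
        self.deltas.append(delta_plus)

    def version(self, i):
        """Rebuild V_i (1-based) by applying reversed deltas to the current version."""
        doc = self.current
        for delta in reversed(self.deltas[i - 1:]):
            doc = apply(doc, frozenset(reverse_op(op) for op in delta))
        return doc


# The catalog example, with each product collapsed into one node.
store = VersionStore([(5, "Camera 300"), (10, "TV 100"), (15, "VCR 200")])
store.add_version(frozenset({
    ("delete", 1, 5, "Camera 300"),
    ("update", 15, "VCR 200", "VCR 150"),
    ("insert", 2, 21, "DVD 500"),
}))
print(store.current)     # ((10, 'TV 100'), (21, 'DVD 500'), (15, 'VCR 150'))
print(store.version(1))  # ((5, 'Camera 300'), (10, 'TV 100'), (15, 'VCR 200'))
```

Keeping $V_{now}$ gives immediate access to the current version, while the completed deltas still allow any earlier version to be rebuilt.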
Levels of Versioning
Full versioning is expensive, so we support different levels of versioning:
– Full Versioning: $V_{now} + \Delta^+$
– Partial Versioning: $V_{now} + \Delta$
– Last Version Update: $V_{now} + \Delta_{now-1,now}$
– Change Support: $V_{now}$ + XML diff computed for Query Subscription
– Not Versioned: $V_{now}$
Implementation
Version Manager and XML diff implemented in C++
A change simulator was implemented for tests
A GUI was implemented
Deltas Statistics
Delta size is reasonable when there are not many modifications
Deltas are relatively expensive for small documents
Delta size depends on the quality of the diff
[Figure: size of delta / size of document (axis from 0 to 2) as a function of the modification probability, for insert, update and delete; document size = 331K]
Deltas Statistics (2)
30% of modifications on the document. From left to right:
– Snapshots
– Completed Deltas
– Deltas: composition and previous-version reconstruction are not possible
– Composed Completed Deltas: advantages of Completed Deltas, but coarser granularity and higher cost
[Figure: storage size / size of all snapshots (axis from 0 to 160) for document sizes 0.5K, 4K, 45K and 331K with 30% of modifications; series: All Snapshots, Completed Deltas, Simple Deltas, Composed Deltas]
Conclusion
Management of Versions based on Change Representation:
– Representation in tree data (XML)
– Study of storage policies
– Implementation of running prototypes
Completed Deltas: a Set of Modifications
– Mathematical properties of completed deltas (algebraic group)
Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents
Example: Edit-Scripts vs. Deltas
[Figure: Version 0 is a node P with one child A; Version 1 is P with children C, B, A]
A possible edit script: Insert(B,1,P), Insert(C,1,P)
The delta: Insert(B,2,P), Insert(C,1,P)
Edit scripts use relative positions (at the time of the operation); deltas use absolute positions (in the final version).
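Example: Missing Information for Delta Composition (Δ(0,2))
Deltas do not give information on the parents and positions of deleted elements
Positions of inserted elements in the composition cannot be computed
[Figure: Version 0 is P with child A; Version 1 is P with children C, B, A; Version 2 is P with children B, D, A]
Δ(0,1): Insert(B,2,P), Insert(C,1,P)
Δ(1,2): Delete(C), Insert(D,2,P)
Δ+(1,2): Delete(C,1,P), Insert(D,2,P)
For instance, B is at position 2 in Version 1 but at position 1 in Version 2; computing this shift requires the position of the deleted C, which Δ(1,2) does not record but Δ+(1,2) does.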
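With completed deltas, the composition on this example can be computed from the deltas alone. The Python sketch below is our own illustration, not the algorithm from the talk: it handles only inserts and deletes under a single parent P, identifies nodes by label, and assumes (as in Δ+) that a delete records its position in the source version and an insert its position in the target version. All function names are hypothetical.

```python
# Hypothetical helpers: operations are ("insert", pos, label) or
# ("delete", pos, label) on the children of one parent; a delete carries its
# position in the source version, an insert its position in the target version.


def nth_free(n, taken):
    """Return the n-th positive integer (1-based) that is not in `taken`."""
    pos = 0
    while n:
        pos += 1
        if pos not in taken:
            n -= 1
    return pos


def compose(d01, d12):
    """Compose Δ+(0,1) and Δ+(1,2) into Δ+(0,2) without looking at any version."""
    ins1 = {label: pos for op, pos, label in d01 if op == "insert"}  # positions in V1
    del1 = {pos for op, pos, _ in d01 if op == "delete"}             # positions in V0
    del2 = {label: pos for op, pos, label in d12 if op == "delete"}  # positions in V1
    ins2 = {pos for op, pos, _ in d12 if op == "insert"}             # positions in V2
    out = {op for op in d01 if op[0] == "delete"} | {op for op in d12 if op[0] == "insert"}
    for label, p in ins1.items():
        if label in del2:
            continue  # inserted, then deleted: the two operations cancel
        rank = p - sum(q < p for q in del2.values())  # rank among V1 nodes kept in V2
        out.add(("insert", nth_free(rank, ins2), label))
    for label, p in del2.items():
        if label in ins1:
            continue
        rank = p - sum(q < p for q in ins1.values())  # rank among V1 nodes from V0
        out.add(("delete", nth_free(rank, del1), label))
    return frozenset(out)


def apply(children, delta):
    deleted = {label for op, _, label in delta if op == "delete"}
    doc = [c for c in children if c not in deleted]
    for _, pos, label in sorted(op for op in delta if op[0] == "insert"):
        doc.insert(pos - 1, label)
    return doc


d01 = frozenset({("insert", 2, "B"), ("insert", 1, "C")})  # Δ(0,1): inserts are the same in Δ and Δ+
d12 = frozenset({("delete", 1, "C"), ("insert", 2, "D")})  # Δ+(1,2)
d02 = compose(d01, d12)
print(sorted(d02))  # [('insert', 1, 'B'), ('insert', 2, 'D')]
```

The new position of B is derived from the position 1 recorded in Delete(C,1,P); with only Delete(C), that rank computation has nothing to work with.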