
Change-Centric Management of Versions in an XML Warehouse

Amélie Marian

Columbia University

Serge Abiteboul, Grégory Cobéna, Laurent Mignet

INRIA-Rocquencourt

VLDB-Sept 2001 Amélie Marian 2

Overview

• The Xyleme Project
• Change Management
• Version Management
  – XIDs
  – XML Diff
  – Deltas
  – Storage of XML document versions
  – Implementation and experiments


The Xyleme Project

A dynamic XML Data Warehouse with high-level services:
– User-friendly Query Engine
– Semantic Data Integration
– Version Management
– Query Subscription and Change Monitoring services

The Xyleme project is now finished; there is also a start-up called Xyleme.


Change Management

• Version Management
• Learning about Changes
• Monitoring Changes: Query Subscription
• Querying the Past: Temporal Queries


Version Management

Our requirements:
• Obtain the current version
• Get the modifications since time t
• Subscribe to change notifications; query changes
• Compute temporal queries
• Rebuild the version Vi of a document at time ti


Getting the Documents

XML documents are fetched from the web. We only have snapshots of the documents.

[Figure: two snapshots of a Catalog document. Each Catalog holds Product (Pr) elements with Name (N) and Price (P) children. Version 1: Camera 300, TV 100, VCR 200. Version 2: TV 100, DVD 500, VCR 150.]


XIDs

Unique identifiers are needed to track XML nodes through time:

• Track changes on a specific node (e.g., a product in a catalog)

• Reconstruct the history of a node

But physically adding an ID attribute to each node is expensive in storage.

XIDs make it possible to attach persistent IDs to every node in a storage-efficient manner.
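To make the numbering used on the following slides concrete, here is a minimal sketch of assigning XIDs in postorder to the Version 1 catalog. It assumes a hypothetical `Node` class in which text values are leaf nodes (which is how the Version 1 tree reaches 16 nodes); it is an illustration, not the Xyleme code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                   # element name or text value
    children: List["Node"] = field(default_factory=list)
    xid: Optional[int] = None                    # persistent identifier


def assign_xids(root: Node, next_xid: int = 1) -> int:
    """Give every node that has no XID a fresh one, in postorder.

    Returns the next available XID (XIDnext)."""
    for child in root.children:
        next_xid = assign_xids(child, next_xid)
    if root.xid is None:
        root.xid = next_xid
        next_xid += 1
    return next_xid


def product(name: str, price: str) -> Node:
    # Hypothetical helper: a Product with Name and Price children.
    return Node("Product", [Node("Name", [Node(name)]),
                            Node("Price", [Node(price)])])


# Version 1 of the catalog from the "Getting the Documents" slide.
v1 = Node("Catalog", [product("Camera", "300"),
                      product("TV", "100"),
                      product("VCR", "200")])
xid_next = assign_xids(v1)
print(xid_next)               # 17
print(v1.xid)                 # 16: the root comes last in postorder
print(v1.children[0].xid)     # 5: the Camera product
```

Because children are numbered before their parent, every subtree occupies a contiguous XID range (the Camera product is 1-5), which is what keeps the XID-map short.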


XIDs

XIDs are stored separately as a list (the XID-map):
– the list of the nodes' IDs in a postorder traversal of the tree
– XIDnext: the next available XID

Compact representation; the document itself is not modified.

[Figure: an example tree with XIDs 1-3 and 7-15 attached to its nodes, root 13. Its XID-map is (1-3,14-15,7-13|16), with XIDnext = 16.]
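The compact form can be read as runs of consecutive XIDs in postorder, followed by XIDnext after the bar. A small encoder and decoder under that reading might look like this; writing a run of length one as a bare number (e.g. `5`) is an assumption, since the slides only show longer runs.

```python
from typing import List, Tuple


def encode_xid_map(xids: List[int], xid_next: int) -> str:
    """Compress a postorder list of XIDs into runs of consecutive integers."""
    runs: List[str] = []
    i = 0
    while i < len(xids):
        j = i
        # Extend the run while XIDs keep increasing by exactly one.
        while j + 1 < len(xids) and xids[j + 1] == xids[j] + 1:
            j += 1
        runs.append(str(xids[i]) if i == j else f"{xids[i]}-{xids[j]}")
        i = j + 1
    return "(" + ",".join(runs) + f"|{xid_next})"


def decode_xid_map(text: str) -> Tuple[List[int], int]:
    """Inverse of encode_xid_map: expand the runs back into a list."""
    body, xid_next = text.strip("()").split("|")
    xids: List[int] = []
    for run in body.split(","):
        lo, _, hi = run.partition("-")          # "7-13" or a bare "5"
        xids.extend(range(int(lo), int(hi or lo) + 1))
    return xids, int(xid_next)


postorder = [1, 2, 3, 14, 15, 7, 8, 9, 10, 11, 12, 13]
print(encode_xid_map(postorder, 16))   # (1-3,14-15,7-13|16)
```

Twelve node IDs shrink to three runs, and the map stays short as long as edits touch few subtrees.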


XML Diff

We implemented an XML diff algorithm to compute the changes between two versions of a document:
– Use of the XML structure for matching
– Content matching

Linear in the size of the document.

The XML diff has two roles:
– Match nodes
– Build the delta

Ongoing work on improving the XML diff


Node Matching using a Diff Algorithm

Version 1, XID-map: (1-16|17)

Diff(V1,V2): delete(5) update(13,150) insert(16,2,(17-21))

Version 2, new XID-map: (6-10,17-21,11-16|22)

[Figure: the Version 1 and Version 2 catalog trees with an XID on every node, numbered in postorder. Delete: the Camera product (subtree 1-5) is removed. Update: the VCR price value (node 13) changes from 200 to 150. Insert: a DVD product with new XIDs 17-21 becomes the 2nd child of the Catalog root (node 16).]
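The slide's example can be replayed to check how the new XID-map arises. This sketch applies the three diff operations by hand (it does not compute the diff), reuses a hypothetical `Node` model with text values as leaves, and gives fresh XIDs to the inserted subtree starting at XIDnext.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str                                   # element name or text value
    children: List["Node"] = field(default_factory=list)
    xid: Optional[int] = None


def assign_xids(root: Node, next_xid: int) -> int:
    """Number the nodes that have no XID yet, in postorder; return XIDnext."""
    for child in root.children:
        next_xid = assign_xids(child, next_xid)
    if root.xid is None:
        root.xid, next_xid = next_xid, next_xid + 1
    return next_xid


def postorder(root: Node) -> List[int]:
    return [x for c in root.children for x in postorder(c)] + [root.xid]


def index(root: Node, table: Dict[int, Node]) -> Dict[int, Node]:
    table[root.xid] = root
    for child in root.children:
        index(child, table)
    return table


def encode(xids: List[int], xid_next: int) -> str:
    """Compress a postorder XID list into runs, e.g. (1-16|17)."""
    runs, start = [], 0
    for i in range(1, len(xids) + 1):
        if i == len(xids) or xids[i] != xids[i - 1] + 1:
            a, b = xids[start], xids[i - 1]
            runs.append(str(a) if a == b else f"{a}-{b}")
            start = i
    return "(" + ",".join(runs) + f"|{xid_next})"


def product(name: str, price: str) -> Node:
    return Node("Product", [Node("Name", [Node(name)]),
                            Node("Price", [Node(price)])])


v = Node("Catalog", [product("Camera", "300"), product("TV", "100"),
                     product("VCR", "200")])
xid_next = assign_xids(v, 1)
print(encode(postorder(v), xid_next))          # (1-16|17)
nodes = index(v, {})

# Diff(V1,V2) = delete(5) update(13,150) insert(16,2,(17-21))
catalog = nodes[16]
catalog.children = [c for c in catalog.children if c is not nodes[5]]
nodes[13].label = "150"
dvd = product("DVD", "500")
xid_next = assign_xids(dvd, xid_next)          # the new subtree gets 17-21
catalog.children.insert(2 - 1, dvd)            # 2nd child of node 16

print(encode(postorder(v), xid_next))          # (6-10,17-21,11-16|22)
```

Matched nodes keep their XIDs (TV keeps 6-10, VCR keeps 11-15), so the new map is just the surviving runs reordered around the fresh run 17-21.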


Edit-Scripts = SEQUENCE

Sequences of basic operations over XML trees:
• Delete(n)
• Update(n, v)
• Insert(m,k,T)
• Move(n,k,m)

An Edit Script can be applied to a document D if its operations are consistent with D.

An Edit Script applied to a document D results in a unique document D'.

Several Edit Scripts applied to a document D can result in the same document D'.
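A sketch of why edit scripts are sequences: each position is interpreted against the document as it stands when that operation runs. The model below is deliberately tiny and hypothetical (only the ordered children of one parent P, inserts only) and replays the "Edit-Scripts vs. Deltas" example from the end of the deck.

```python
from typing import List, Tuple

# (new label, 1-based position at the time the operation runs)
Insert = Tuple[str, int]


def apply_script(children: List[str], script: List[Insert]) -> List[str]:
    """Apply insert operations one after another, in script order."""
    result = list(children)
    for label, position in script:
        result.insert(position - 1, label)
    return result


version0 = ["A"]                  # P has a single child A
script1 = [("B", 1), ("C", 1)]    # Insert(B,1,P); Insert(C,1,P)
script2 = [("C", 1), ("B", 2)]    # Insert(C,1,P); Insert(B,2,P)
print(apply_script(version0, script1))        # ['C', 'B', 'A']
print(apply_script(version0, script2))        # ['C', 'B', 'A']
print(apply_script(version0, script1[::-1]))  # ['B', 'C', 'A']: order matters
```

Two different scripts yield the same D', while reordering one script changes the result.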


Deltas (Δ) = SET

We introduce an alternative way of representing changes: Deltas.

Δi,j (unit delta) contains the set of operations needed to go from Vi to Vj (Diff(Vi,Vj)).

A Delta (Δ) over a document D is the sequence of unit deltas over D:

Δ = {Δ1,2, ..., Δk-1,k}

There is an (almost) unique delta from Vi to Vj.

We represent Deltas as XML documents.
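By contrast, a unit delta is a set: its insert positions are absolute, referring to the final document. The sketch below, using the same hypothetical one-parent, inserts-only model, applies such a set by sorting on position, so the way the set is listed does not matter. Handling deletes and moves is deliberately left out.

```python
from itertools import permutations
from typing import Iterable, List, Tuple

# (new label, 1-based position in the final document)
Insert = Tuple[str, int]


def apply_delta(children: List[str], inserts: Iterable[Insert]) -> List[str]:
    """Apply inserts whose positions are absolute (final). Processing them in
    ascending position order puts each node at its final place."""
    result = list(children)
    for label, position in sorted(inserts, key=lambda op: op[1]):
        result.insert(position - 1, label)
    return result


version0 = ["A"]
delta = {("B", 2), ("C", 1)}          # Insert(B,2,P), Insert(C,1,P) as a set
print(apply_delta(version0, delta))   # ['C', 'B', 'A']
# Every enumeration order of the set gives the same document.
print(all(apply_delta(version0, list(p)) == ["C", "B", "A"]
          for p in permutations(delta)))   # True
```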


Shortcomings of Deltas

Storage Policies

a) V1, Δ1,2, …, Δnow-1,now

b) Δ2,1, …, Δnow,now-1, Vnow

c) V1, Δ2,1, …, Δnow,now-1

d) Δ1,2, …, Δnow-1,now, Vnow

Only a) and b) are lossless.

But we would like to have fast access to:
– Vnow
– Δi,now

Deltas are not reversible and cannot be composed (information on positions is missing).


Completed Deltas (Δ+)

Completed deltas contain more information:
• Delete(m,k,T)
• Update(n, ov, nv)
• Insert(m,k,T)
• Move(n,k,m,p,q)

Completed Deltas can be reversed and composed.

Completed Deltas are in the spirit of some logs in DB systems.


Example of XML Δ+

<delta>
  <unit_delta>…</unit_delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>
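Since deltas are plain XML, any XML parser can read them. The sketch below walks the second unit delta of the example with Python's standard `xml.etree.ElementTree` and lists its operations; the elided first unit delta is left out, and the `summarize` function is a hypothetical reader, not part of the system described here.

```python
import xml.etree.ElementTree as ET
from typing import List

DELTA = """<delta><unit_delta>
  <time from="1" to="2"/>
  <delete parent="16" position="1" xid-map="(1-5)">
    <Product><Name>Camera</Name><Price>300</Price></Product>
  </delete>
  <update xid="13" new_value="150" old_value="200"/>
  <insert parent="16" position="2" xid-map="(17-21)">
    <Product><Name>DVD</Name><Price>500</Price></Product>
  </insert>
</unit_delta></delta>"""


def summarize(delta_xml: str) -> List[str]:
    """List the operations of each unit delta as readable strings."""
    ops = []
    for unit in ET.fromstring(delta_xml).iter("unit_delta"):
        time = unit.find("time")
        span = f"{time.get('from')}->{time.get('to')}"
        for op in unit:
            if op.tag in ("delete", "insert"):
                tree = op[0]                     # the stored subtree T
                ops.append(f"{span} {op.tag} {tree.findtext('Name')} under "
                           f"{op.get('parent')} at {op.get('position')} "
                           f"xids {op.get('xid-map')}")
            elif op.tag == "update":
                ops.append(f"{span} update {op.get('xid')}: "
                           f"{op.get('old_value')} -> {op.get('new_value')}")
    return ops


for line in summarize(DELTA):
    print(line)
# 1->2 delete Camera under 16 at 1 xids (1-5)
# 1->2 update 13: 200 -> 150
# 1->2 insert DVD under 16 at 2 xids (17-21)
```

The deleted subtree, its parent and position, and the old value of the update are all present, which is the extra information that makes Δ+ reversible.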


Operations on Deltas

Compute with version:
– Vi ∘ Δ+i,j = Vj
– Vi ∘ Δi,j = Vj

Reverse: (Δ+i,j)⁻¹ = Δ+j,i

Compose: Δ+i,j ; Δ+j,k = Δ+i,k

Simplify: Δ+i,j → Δi,j
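The "compute with version" and "reverse" operations can be checked on the catalog example. This is a sketch under stated assumptions: hypothetical `Insert`/`Delete`/`Update` classes mirroring Insert(m,k,T), Delete(m,k,T) and Update(n, ov, nv); Move and composition are omitted; inserts must target parents that already exist. `apply` removes deleted subtrees by XID, applies updates, then performs inserts in ascending order of their absolute positions.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    label: str
    xid: int
    children: List["Node"] = field(default_factory=list)


@dataclass
class Insert:
    parent: int      # XID of the parent node
    position: int    # 1-based position in the target version
    tree: Node       # inserted subtree, XIDs included


@dataclass
class Delete:
    parent: int
    position: int    # 1-based position in the source version
    tree: Node       # deleted subtree, kept so the operation can be undone


@dataclass
class Update:
    xid: int
    old: str
    new: str


Op = Union[Insert, Delete, Update]


def reverse(delta: List[Op]) -> List[Op]:
    """(Δ+ i,j)^-1 = Δ+ j,i: inserts and deletes swap, update values swap."""
    out: List[Op] = []
    for op in delta:
        if isinstance(op, Insert):
            out.append(Delete(op.parent, op.position, op.tree))
        elif isinstance(op, Delete):
            out.append(Insert(op.parent, op.position, op.tree))
        else:
            out.append(Update(op.xid, op.new, op.old))
    return out


def index(root: Node) -> Dict[int, Node]:
    table = {root.xid: root}
    for child in root.children:
        table.update(index(child))
    return table


def apply(doc: Node, delta: List[Op]) -> Node:
    """Vi ∘ Δ+ i,j, returned as a new tree; doc is left unchanged."""
    doc = copy.deepcopy(doc)
    nodes = index(doc)
    for op in delta:
        if isinstance(op, Delete):
            parent = nodes[op.parent]
            parent.children = [c for c in parent.children if c.xid != op.tree.xid]
    for op in delta:
        if isinstance(op, Update):
            if nodes[op.xid].label != op.old:
                raise ValueError(f"delta does not match node {op.xid}")
            nodes[op.xid].label = op.new
    for op in sorted((o for o in delta if isinstance(o, Insert)),
                     key=lambda o: o.position):
        nodes[op.parent].children.insert(op.position - 1, copy.deepcopy(op.tree))
    return doc


def product(name: str, price: str, first: int) -> Node:
    """A Product subtree numbered first..first+4 in postorder."""
    return Node("Product", first + 4, [
        Node("Name", first + 1, [Node(name, first)]),
        Node("Price", first + 3, [Node(price, first + 2)])])


V1 = Node("Catalog", 16, [product("Camera", "300", 1),
                          product("TV", "100", 6), product("VCR", "200", 11)])
V2 = Node("Catalog", 16, [product("TV", "100", 6),
                          product("DVD", "500", 17), product("VCR", "150", 11)])
delta12 = [Delete(16, 1, product("Camera", "300", 1)),
           Update(13, "200", "150"),
           Insert(16, 2, product("DVD", "500", 17))]

print(apply(V1, delta12) == V2)             # True
print(apply(V2, reverse(delta12)) == V1)    # True
```

Reversal only works because Delete keeps the parent, position and subtree; with a plain Delete(n) the second check would have nothing to reinsert.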


Storage of Versions

For a document D (or a query result Q), we store:
– Current version: Vk
– XID-map (as text) of Vk
– Current Δ+ = {Δ+1,2, ..., Δ+k-1,k}

When a new version k+1 arrives:
– Compute the XML diff between k and k+1, and compute Δ+k,k+1
– Replace the current version by Vk+1
– Replace the XID-map
– Append Δ+k,k+1 to Δ+
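The update procedure above, and the rebuilding of an older version Vi from Vnow, can be sketched with a deliberately simplified flat model: a document is a hypothetical mapping from XID to value, and a completed delta records (old, new) pairs with `None` meaning absent. Tree structure, positions and the stored XID-map are ignored here; the third version is made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Doc = Dict[int, str]                                     # XID -> value
Delta = Dict[int, Tuple[Optional[str], Optional[str]]]   # XID -> (old, new)


def completed_diff(old: Doc, new: Doc) -> Delta:
    """Keep both old and new values (None = absent) so the delta is reversible."""
    return {x: (old.get(x), new.get(x))
            for x in old.keys() | new.keys() if old.get(x) != new.get(x)}


def apply(doc: Doc, delta: Delta, backwards: bool = False) -> Doc:
    out = dict(doc)
    for x, (old, new) in delta.items():
        value = old if backwards else new
        if value is None:
            out.pop(x, None)      # node absent in the target version
        else:
            out[x] = value
    return out


class VersionStore:
    """Full versioning: the current version plus all completed deltas."""

    def __init__(self, first: Doc) -> None:
        self.current: Doc = dict(first)
        self.deltas: List[Delta] = []    # deltas[k-1] is Δ+ k,k+1

    def add_version(self, new: Doc) -> None:
        self.deltas.append(completed_diff(self.current, new))  # append Δ+ k,k+1
        self.current = dict(new)                               # replace Vk by Vk+1

    def version(self, i: int) -> Doc:
        """Rebuild Vi (1-based) from Vnow by undoing deltas, newest first."""
        doc = dict(self.current)
        for delta in reversed(self.deltas[i - 1:]):
            doc = apply(doc, delta, backwards=True)
        return doc


v1 = {1: "Camera", 3: "300", 6: "TV", 8: "100", 11: "VCR", 13: "200"}
v2 = {6: "TV", 8: "100", 11: "VCR", 13: "150", 17: "DVD", 19: "500"}
v3 = {**v2, 19: "450"}          # a made-up third version
store = VersionStore(v1)
store.add_version(v2)
store.add_version(v3)
print(store.version(1) == v1)   # True
```

Vnow is always available directly, and any Vi costs one backward pass over the deltas newer than it, matching policy b) on the storage-policies slide.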


Levels of Versioning

Full versioning is expensive, so we support different levels of versioning:
– Full Versioning: Vnow + Δ+
– Partial Versioning: Vnow + Δ
– Last Version Update: Vnow + Δnow-1,now
– Change Support: Vnow + XML diff computed for Query Subscription
– Not Versioned: Vnow


Implementation

The Version Manager and the XML diff are implemented in C++.

A change simulator was implemented for tests.

A GUI was implemented.


GUI


Deltas Statistics

Reasonable when there are not many modifications

Relatively expensive for small documents

Depends on the quality of the diff

Document Size = 331K

[Figure: size of delta / size of document (y-axis, 0 to 2) against modification probability (x-axis), broken down into insert, update and delete.]


Deltas Statistics (2)

30% of modifications on the document. From left to right:
– Snapshots
– Completed Deltas
– Deltas: composition and previous version reconstruction are not possible
– Composed Completed Deltas: advantages of Completed Deltas but coarser granularity and higher cost

[Figure: storage size / size of all snapshots (y-axis, 0 to 160) against document size (x-axis: 0.5K, 4K, 45K, 331K), with 30% of modifications. Series: All Snapshots, Completed Deltas, Simple Deltas, Composed Deltas.]


Conclusion

Management of versions based on change representation:
– Representation in tree data (XML)
– Study of storage policies
– Implementation of running prototypes

Completed Deltas: a set of modifications
– Mathematical properties of completed deltas (algebraic group)

Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents



Example: Edit-Scripts vs. Deltas

Version 0: P with child A. Version 1: P with children C, B, A.

A possible Edit-Script (relative positions, at the time of each operation):
Insert(B,1,P)
Insert(C,1,P)

The Delta (absolute positions, in the final document):
Insert(B,2,P)
Insert(C,1,P)


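Example: Missing Information for Delta Composition (Δ(0,2))

Deltas do not give information on the parents and positions of deleted elements, so the positions of inserted elements in a composition cannot be computed. For instance, B is inserted at position 2 in Δ(0,1), but its position in Δ(0,2) depends on whether the deleted C preceded it, which Δ(1,2) does not say.

Version 0: P with children C, A. Version 1: P with children C, B, A. Version 2: P with children B, D, A.

Δ(0,1): Insert(B,2,P)

Δ(1,2): Delete(C)
        Insert(D,2,P)

Δ+(1,2): Delete(C,1,P)
         Insert(D,2,P)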