ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML

Fusheng Wang

University of California, Los Angeles

Motivation: Temporal Applications

Most database applications are temporal in nature:

Financial applications

Record-keeping applications

Scheduling applications

Scientific applications

Temporal Databases: the Reality

Over 40 temporal data models and query languages have been proposed in the past

A long struggle to get around the limitations of RDBMS

No DBMS vendors have moved aggressively to extend SQL with temporal support

What’s Needed?

A temporal database system that provides:

Expressive temporal representations and data models with minimal or no extension

Powerful languages for temporal queries with minimal or no extension

Indexing, clustering and query optimization techniques for efficient query support

Architectures that bring these together

Outline

Motivation

Viewing Relation History in XML

Temporal Queries with XQuery

The ArchIS System

Performance Study

Database Compression

Conclusion

Background: Publishing Relational Database as XML

Publishing relational DBs as XML

as actual XML documents: SQL/XML

as XML views: SilkRoute, XPeranto

Our proposal: view the history of relational DBs as XML documents:

Such history can be naturally represented in XML, without any extension to the data model

Temporal queries can be expressed in XQuery as is—without any extension to the language

Amenable to efficient implementation

Temporal Grouping in XML

Temporal data models can be classified as:

Temporally ungrouped

Temporally grouped

Temporally grouped data models have more expressive power and are more natural for users

It is difficult to fit temporally grouped models into RDBMS

Temporally grouped data models can be represented well in XML

Example: Transaction-Time History of Tables

Timestamped tuple snapshots (temporally ungrouped)

name  empno  salary  title        deptno  DOB         start       end
Bob   10003  60000   Engineer     d01     1945-04-09  1995-01-01  1995-05-31
Bob   10003  70000   Engineer     d01     1945-04-09  1995-06-01  1995-09-30
Bob   10003  70000   Sr Engineer  d02     1945-04-09  1995-10-01  1996-01-31
Bob   10003  70000   Tech Leader  d02     1945-04-09  1996-02-01  1996-12-31

Temporally grouped history of employees

attribute  value        interval
name       Bob          1995-01-01:1996-12-31
empno      10003        1995-01-01:1996-12-31
salary     60000        1995-01-01:1995-05-31
salary     70000        1995-06-01:1996-12-31
title      Engineer     1995-01-01:1995-09-30
title      Sr Engineer  1995-10-01:1996-01-31
title      Tech Leader  1996-02-01:1996-12-31
deptno     d01          1995-01-01:1995-09-30
deptno     d02          1995-10-01:1996-12-31
DOB        1945-04-09   1995-01-01:1996-12-31
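The step from the ungrouped snapshots to the grouped history can be sketched in a few lines of Python. This is only an illustration, not ArchIS code: the function name `group_history` is hypothetical, the input is assumed to hold the snapshots of a single employee, and two periods are treated as adjacent when the second starts one day after the first ends.

```python
from collections import defaultdict
from datetime import date, timedelta

# Timestamped tuple snapshots for Bob, copied from the ungrouped table.
COLUMNS = ("name", "empno", "salary", "title", "deptno", "DOB")
SNAPSHOTS = [
    ("Bob", "10003", "60000", "Engineer", "d01", "1945-04-09", "1995-01-01", "1995-05-31"),
    ("Bob", "10003", "70000", "Engineer", "d01", "1945-04-09", "1995-06-01", "1995-09-30"),
    ("Bob", "10003", "70000", "Sr Engineer", "d02", "1945-04-09", "1995-10-01", "1996-01-31"),
    ("Bob", "10003", "70000", "Tech Leader", "d02", "1945-04-09", "1996-02-01", "1996-12-31"),
]


def group_history(rows):
    """Build one value history per attribute from the snapshots of ONE employee."""
    history = defaultdict(list)  # attribute -> [[value, start, end], ...]
    for row in sorted(rows, key=lambda r: r[-2]):  # process in start order
        start = date.fromisoformat(row[-2])
        end = date.fromisoformat(row[-1])
        for column, value in zip(COLUMNS, row):
            periods = history[column]
            unchanged = periods and periods[-1][0] == value
            if unchanged and periods[-1][2] + timedelta(days=1) == start:
                periods[-1][2] = end  # same value, adjacent period: extend it
            else:
                periods.append([value, start, end])
    return {
        column: [(v, s.isoformat(), e.isoformat()) for v, s, e in periods]
        for column, periods in history.items()
    }


grouped = group_history(SNAPSHOTS)
for value, start, end in grouped["title"]:
    print(f"title {value}: {start}:{end}")
```

Each attribute ends up with its own list of timestamped values, which is the shape the XML representation on the next slide uses.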

XML Representation of DB History

<employees>
  <employee>
    <empno tstart="1995-01-01" tend="1996-12-31">10003</empno>
    <name tstart="1995-01-01" tend="1996-12-31">Bob</name>
    <salary tstart="1995-01-01" tend="1995-05-31">60000</salary>
    <salary tstart="1995-06-01" tend="1996-12-31">70000</salary>
    <title tstart="1995-01-01" tend="1995-09-30">Engineer</title>
    <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title>
    <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title>
    <deptno tstart="1995-01-01" tend="1995-09-30">d01</deptno>
    <deptno tstart="1995-10-01" tend="1996-12-31">d02</deptno>
    <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB>
  </employee>
</employees>

Advantages of XML Representations

The attribute value history is grouped, and can be queried directly

The H-document has a well-defined schema generated from the current table

The interval constraints are maintained in the updates

Temporal Queries with XQuery

XQuery: the coming standard query language for XML

With XQuery, we can specify temporal queries without any extension:

Temporal projection, snapshot queries, temporal joins, interval queries

Complex queries: A SINCE B, continuous periods, period containment

Temporal projection: retrieve the salary history of “Bob”:

element salary_history {
  for $s in doc("employees.xml")/employees/employee[name="Bob"]/salary
  return $s
}

Snapshot queries: retrieve the departments on 1996-01-31:

for $d in doc("depts.xml")/depts/dept[tstart(.) <= "1996-01-31" and tend(.) >= "1996-01-31"]
let $n := $d/name[tstart(.) <= "1996-01-31" and tend(.) >= "1996-01-31"]
let $m := $d/manager[tstart(.) <= "1996-01-31" and tend(.) >= "1996-01-31"]
return (element dept { $n, $m })

Temporal Functions

Shield the user from the low-level details used in representing time, e.g., “now”

Eliminate the need for the user to write complex functions, e.g., coalescing

Predefined functions:

Restructuring: coalesce($l)

Period comparison: toverlaps, tprecedes, tcontains, tequals, tmeets

Duration and date/time: tstart($e), tend($e), timespan($e)

telement(Ts, Te): constructs an empty element timestamped as tstart=Ts, tend=Te
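To make the period functions concrete, here is a minimal Python sketch of one plausible reading of them. The slides only name the functions, so the semantics below are assumptions: periods are closed intervals of dates, `tmeets` means the second period starts the day after the first ends, and `coalesce` merges overlapping or adjacent periods.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# A period is a (tstart, tend) pair of dates, both ends inclusive (assumption).


def toverlaps(a, b):
    """True when the two periods share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def tprecedes(a, b):
    """True when period a ends before period b starts."""
    return a[1] < b[0]


def tcontains(a, b):
    """True when period b lies entirely inside period a."""
    return a[0] <= b[0] and b[1] <= a[1]


def tequals(a, b):
    """True when both periods have the same start and end."""
    return a[0] == b[0] and a[1] == b[1]


def tmeets(a, b):
    """True when b starts on the day right after a ends."""
    return a[1] + ONE_DAY == b[0]


def coalesce(periods):
    """Merge overlapping or adjacent periods into maximal periods."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + ONE_DAY:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


engineer = (date(1995, 1, 1), date(1995, 9, 30))
senior = (date(1995, 10, 1), date(1996, 1, 31))
print(tmeets(engineer, senior), toverlaps(engineer, senior))
print(coalesce([senior, engineer]))
```

With Bob's two title periods, `tmeets` holds and `toverlaps` does not, so coalescing them yields a single continuous period.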

Support for ‘now’

‘now’: no change until now

Internally, “end of time” values are used to denote ‘now’, e.g., 9999-12-31

Intervals are only accessed through built-in functions: tstart() returns the start of an interval; tend() returns its end if that end is different from 9999-12-31, and CURRENT_DATE otherwise

In the output, the tend value can be:

“9999-12-31”

CURRENT_DATE, by using rtend($e), which recursively replaces all occurrences of 9999-12-31 with the current date

“now”, by using externalnow($e), which recursively replaces all occurrences of “9999-12-31” with the string “now”
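The handling of ‘now’ can be illustrated with a small Python sketch. The function names mirror the slide, but the implementations are assumptions: elements are parsed with `xml.etree.ElementTree`, timestamps live in `tstart`/`tend` attributes, and the replacement is done in place on the parsed tree.

```python
import xml.etree.ElementTree as ET
from datetime import date

END_OF_TIME = "9999-12-31"  # internal encoding of 'now'


def tend(element, today):
    """Return the element's end, or today's date when the element is still current."""
    stored = element.get("tend")
    return today.isoformat() if stored == END_OF_TIME else stored


def _replace_end_of_time(element, replacement):
    # Walk the element and all of its descendants, rewriting open-ended tend values.
    for node in element.iter():
        if node.get("tend") == END_OF_TIME:
            node.set("tend", replacement)
    return element


def rtend(element, today):
    """Recursively replace 9999-12-31 with the current date (modifies the tree)."""
    return _replace_end_of_time(element, today.isoformat())


def externalnow(element):
    """Recursively replace 9999-12-31 with the string 'now' (modifies the tree)."""
    return _replace_end_of_time(element, "now")


doc = ET.fromstring(
    '<employee tstart="1995-01-01" tend="9999-12-31">'
    '<salary tstart="1995-06-01" tend="9999-12-31">70000</salary>'
    '<title tstart="1995-01-01" tend="1995-09-30">Engineer</title>'
    "</employee>"
)
print(tend(doc.find("salary"), date(2004, 1, 15)))
print(ET.tostring(externalnow(doc), encoding="unicode"))
```

Closed periods, such as the title above, are left untouched by both output functions.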

The ArchIS System

Two approaches are possible for storing and querying H-documents (H-views)

Native XML database approach: store H-documents directly into XML DB

XML-enabled RDBMS. Design issues include:

mapping (shredding) the XML views representing the H-documents into tables (H-tables)

translation of queries from the XML views to the H-tables

indexing, clustering and query mapping techniques

ArchIS: Archival Information System

The ArchIS System: Architecture

[Figure: ArchIS architecture. Labeled components: current database (relational data), active rules/update logs, H-tables, H-views (H-documents), temporal XML queries, SQL queries, temporal XML data.]

H-tables

Assumptions

Each entity or relation has a unique key (or composite key) that identifies it and does not change over its history, e.g., employee: empno

H-tables:

attribute history table: stores the history of each attribute

key table: built for the key

global relation table: records the history of relations

e.g., current database: employee(empno, name, sex, DOB, deptno, salary, title)

H-tables (cont’d)

current table   H-tables
employee        global relation table: relations(relationname, tstart, tend)
empno           key table: employee_id(id, tstart, tend)
name            attribute history table: employee_name(id, name, tstart, tend)
…               …
salary          employee_salary(id, salary, tstart, tend)
title           employee_title(id, title, tstart, tend)

H-tables (cont’d)

Sample contents of employee_salary:

ID      SALARY  TSTART      TEND
======= ======= ========== ==========
100022  58805   02/04/1985  02/04/1986
100022  61118   02/05/1986  02/04/1987
100022  65103   02/05/1987  02/04/1988
100022  64712   02/05/1988  02/03/1989
100022  65245   02/04/1989  02/03/1990
100023  43162   07/13/1988  07/13/1989
...

Updating Table Histories

Changes in the current database can be tracked with either update logs or triggers

DB2: triggers

ArchIS: update logs
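How one change in the current database turns into H-table rows can be sketched as follows. This is an illustration under stated assumptions rather than the ArchIS update path: it uses Python's `sqlite3` with an in-memory database, ISO dates instead of the MM/DD/YYYY format shown in the sample, and a hypothetical helper `apply_salary_change` that plays the role of the trigger or log processor.

```python
import sqlite3
from datetime import date, timedelta

END_OF_TIME = "9999-12-31"  # marks the tuple that is currently valid


def open_archive():
    """Create an in-memory archive holding one attribute history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_salary (id INTEGER, salary INTEGER, tstart TEXT, tend TEXT)"
    )
    return conn


def apply_salary_change(conn, emp_id, salary, change_date):
    """Close the current salary tuple and append the new one."""
    previous_day = (change_date - timedelta(days=1)).isoformat()
    # Close the live tuple, if any, on the day before the change.
    conn.execute(
        "UPDATE employee_salary SET tend = ? WHERE id = ? AND tend = ?",
        (previous_day, emp_id, END_OF_TIME),
    )
    # The new value is valid from the change date until further notice.
    conn.execute(
        "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
        (emp_id, salary, change_date.isoformat(), END_OF_TIME),
    )
    conn.commit()


conn = open_archive()
apply_salary_change(conn, 10003, 60000, date(1995, 1, 1))
apply_salary_change(conn, 10003, 70000, date(1995, 6, 1))
rows = conn.execute(
    "SELECT salary, tstart, tend FROM employee_salary ORDER BY tstart"
).fetchall()
print(rows)
```

The history table only ever grows: an update closes one interval and opens another, which is what keeps the interval constraints consistent.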

Query Mapping

General purpose query mapping: XPeranto

In ArchIS, we have well-defined mapping between H-documents (or H-views) and H-tables

We map temporal XQuery queries into SQL, utilizing SQL/XML

SQL/XML is a new standard to map between RDBMS and XML

Both tag binding and structure construction are pushed inside the relational engine, which makes this approach very efficient

SQL/XML Publishing Functions

XMLElement and XMLAttribute

XMLAgg

select XMLElement(Name "dept",
         XMLAttributes(tstart as "tstart", tend as "tend"),
         deptname)
from dept
where deptname = 'Sales'

<dept tstart="02/04/1985" tend="12/31/9999">Sales</dept>

select XMLElement(Name "new_employees",
         XMLAttributes('02/04/2003' as "Since"),
         XMLAgg(XMLElement(Name "employee", e.name)))
from employee_name as e
where e.tstart >= '02/04/2003'

<new_employees Since="02/04/2003">
  <employee>Bob</employee>
  <employee>Jack</employee>
</new_employees>

XQuery Mapping to SQL with SQL/XML

Temporal projection: retrieve the salary history of “Bob”:

element salary_history {
  for $s in doc("employees.xml")/employees/employee[name="Bob"]/salary
  return $s
}

Equivalent SQL/XML query on the H-tables:

select XMLElement(Name "salary_history",
         XMLAgg(XMLElement(Name "salary",
                  XMLAttributes(S.tstart as "tstart", S.tend as "tend"),
                  S.salary)))
from employee_salary as S, employee_name as N
where N.id = S.id and N.name = 'Bob'
group by N.id

XQuery Mapping to SQL with SQL/XML: Steps

Identification of variable range: map variables in FOR/LET clauses into the underlying H-tables

Generation of join conditions: there is a join condition for any pair of distinct tuple variables; join them by ids

Translation of built-in functions: map built-in temporal functions in XQuery into functions in ArchIS

Output generation: use XMLElement and XMLAgg constructs
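The mapping steps can be tried out end to end with a small Python sketch for the salary-history query. It is only an approximation of what ArchIS does: SQLite has no SQL/XML publishing functions, so the join runs in SQL while the output generation step is emulated with `xml.etree.ElementTree`; the function name `salary_history` and the sample rows are illustrative.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_name (id INTEGER, name TEXT, tstart TEXT, tend TEXT);
    CREATE TABLE employee_salary (id INTEGER, salary INTEGER, tstart TEXT, tend TEXT);
    INSERT INTO employee_name VALUES (10003, 'Bob', '1995-01-01', '1996-12-31');
    INSERT INTO employee_name VALUES (10004, 'Jack', '1995-01-01', '1996-12-31');
    INSERT INTO employee_salary VALUES (10003, 60000, '1995-01-01', '1995-05-31');
    INSERT INTO employee_salary VALUES (10003, 70000, '1995-06-01', '1996-12-31');
    INSERT INTO employee_salary VALUES (10004, 50000, '1995-01-01', '1996-12-31');
    """
)


def salary_history(conn, name):
    """Answer the temporal projection on the H-tables and publish it as XML."""
    # Steps 1 and 2: the path variables map to employee_name (N) and
    # employee_salary (S), joined on the shared id.
    rows = conn.execute(
        "SELECT S.salary, S.tstart, S.tend "
        "FROM employee_salary AS S, employee_name AS N "
        "WHERE N.id = S.id AND N.name = ? "
        "ORDER BY S.tstart",
        (name,),
    )
    # Step 4: stands in for XMLElement / XMLAttributes / XMLAgg.
    root = ET.Element("salary_history")
    for salary, tstart, tend in rows:
        element = ET.SubElement(root, "salary", {"tstart": tstart, "tend": tend})
        element.text = str(salary)
    return ET.tostring(root, encoding="unicode")


xml_text = salary_history(conn, "Bob")
print(xml_text)
```

In a DBMS that supports SQL/XML, the element construction would stay inside the engine, which is the efficiency argument made on the Query Mapping slide.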

Temporal Clustering and Indexing

Tuples in H-tables are stored in the order of updates, thus neither temporally clustered nor clustered by objects

Traditional indexes such as B+ Tree will not help on snapshot queries, and better temporal clustering is needed

For every segment, usefulness: U = Nlive/Nall

At the beginning, U =100%, and it decreases with updates

The minimum tolerable usefulness: Umin

Segment-based Clustering Scheme

[Figure: timeline divided into Segment 1 (segstart1 to segend1), Segment 2 (segstart2 to segend2), and Segment 3 (segstart3 to segend3).]

Tuples stored in a segment SEG satisfy: tstart_tuple <= segend_SEG and tend_tuple >= segstart_SEG

Segment-based Clustering Scheme

Initially all tuples for an attribute history table are archived in a live segment SEGlive with usefulness U =100%. With updates, when U drops below Umin:

1. A new segment is allocated;

2. The interval of this segment is recorded in the table segment(segno, segstart, segend);

3. All tuples in SEGlive are copied into a new segment Si sorted by id;

4. All live tuples in SEGlive are copied into a new live segment SEGlive', and the old live segment is dropped;

After that, the new segment SEGlive' becomes the new starting segment for updates.
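The archiving steps above can be simulated in memory with a short Python sketch. This is a simplified model, not the ArchIS implementation: the class `SegmentedHistory` and its methods are hypothetical names, time is an integer day counter, a closed tuple ends the day before the update, and the archived segment covers the interval from the live segment's start to the day of archiving.

```python
END_OF_TIME = 10**9  # stands in for 9999-12-31 on an integer day axis


class SegmentedHistory:
    """Attribute history with a live segment and archived segments."""

    def __init__(self, u_min, start):
        self.u_min = u_min
        self.live = []           # rows (id, value, tstart, tend) in update order
        self.live_start = start
        self.segments = []       # (segstart, segend, rows sorted by id)

    def usefulness(self):
        """U = Nlive / Nall for the live segment."""
        if not self.live:
            return 1.0
        n_live = sum(1 for row in self.live if row[3] == END_OF_TIME)
        return n_live / len(self.live)

    def update(self, key, value, now):
        # Close the current tuple for this key, then append the new value.
        for i, row in enumerate(self.live):
            if row[0] == key and row[3] == END_OF_TIME:
                self.live[i] = (row[0], row[1], row[2], now - 1)
        self.live.append((key, value, now, END_OF_TIME))
        if self.usefulness() < self.u_min:
            self.archive(now)

    def archive(self, now):
        # Steps 1-3: record the interval and copy all tuples, sorted by id.
        rows = sorted(self.live, key=lambda r: (r[0], r[2]))
        self.segments.append((self.live_start, now, rows))
        # Step 4: only live tuples move on to the new live segment.
        self.live = [row for row in self.live if row[3] == END_OF_TIME]
        self.live_start = now + 1

    def snapshot(self, t):
        """Answer a snapshot query by reading exactly one segment."""
        rows = self.live
        for segstart, segend, seg_rows in self.segments:
            if segstart <= t <= segend:
                rows = seg_rows
                break
        return {key: value for key, value, tstart, tend in rows if tstart <= t <= tend}


history = SegmentedHistory(u_min=0.5, start=0)
for key, value, day in [(1, 100, 1), (2, 200, 2), (1, 110, 3), (1, 120, 4), (2, 210, 5)]:
    history.update(key, value, day)
print(len(history.segments), history.usefulness(), history.snapshot(3))
```

Because the live tuples are copied forward, every tuple valid at some time inside a segment's interval is present in that segment, which is why the snapshot lookup never needs a second segment.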

Segment-based Clustering Scheme (cont’d)

Sample segments:

Segment1 (01/01/1985 - 10/17/1991):
ID      SALARY  TSTART      TEND
100002  40000   02/20/1988  02/19/1989
100002  42010   02/20/1989  02/19/1990
100002  42525   02/20/1990  02/19/1991
100002  42727   02/20/1991  12/31/9999
...

Segment2 (10/18/1991 - 07/08/1995):
ID      SALARY  TSTART      TEND
100002  42727   02/20/1991  02/19/1992
100002  45237   02/20/1992  02/18/1993
100002  46465   02/19/1993  02/18/1994
100002  47418   02/19/1994  02/18/1995
100002  47273   02/19/1995  12/31/9999
...

Advantages of Segment-based Clustering Scheme

The current live segment always has a high usefulness, assuring efficient updates;

Records are globally temporally clustered on segments;

For snapshot queries, only one segment is used; for interval queries, only segments involved are used;

Flexibility to control the number of redundant tuples in segments with Umin

Storage Usage of Segment-based Clustering

[Figure: relative storage size with different Umin (0.0 to 0.4). Series: the bound 1/(1-Umin) and the testing data.]

Nseg <= N0/(1-Umin)
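As a quick worked example of the bound, the sketch below evaluates 1/(1-Umin) for the Umin values on the plot's axis. It assumes N0 denotes the number of tuples stored without segment redundancy, which the slide does not spell out; the function name `storage_bound` is illustrative.

```python
def storage_bound(n0, u_min):
    """Upper bound on tuples stored across segments: N0 / (1 - Umin)."""
    if not 0 <= u_min < 1:
        raise ValueError("u_min must be in [0, 1)")
    return n0 / (1 - u_min)


# Relative storage size (N0 = 1) for the Umin values shown on the plot's axis.
for u_min in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"Umin={u_min:.1f}: at most {storage_bound(1, u_min):.2f}x")
```

Under this reading, a threshold of Umin = 0.4 caps the redundancy at roughly two thirds more storage than the non-segmented history.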

Query Performance on Temporal Data with Segment-based Clustering

[Figure: execution times of queries Q1 to Q6 for ArchIS without segment-based clustering and ArchIS with segment-based clustering.]

Queries: Point: Q1; Snapshot: Q2; Interval: Q5; History: Q3, Q4, Q6

Performance Study: Experimental Setup

Systems: Tamino, DB2, and ArchIS

ArchIS uses BerkeleyDB as its storage manager and builds a SQL query engine on top of it

Temporal data set: the history of 300,024 employees over 17 years

The simulation models real-world salary increases, changes of titles, and changes of departments

The size of the XML data is 334MB

The single large XML document is cut into a collection of 15,000 small XML documents of around 25KB each

Machine: Pentium IV 2.4GHz PC with RedHat 8.0

Performance Study: Query Performance

[Figure: execution times of queries Q1 to Q6 on DB2, ArchIS, and Tamino. DB2 and ArchIS: with clustering; Tamino: without clustering.]

Snapshot query Q2 on ArchIS is 137 times faster than on Tamino; interval query Q5 is 91 times faster; history query Q6 is 25 times faster; Q4 is 4 times faster; and Q3 is nearly 3 times faster.

Tamino with clustering: snapshot query Q2 is 3.3 times faster than without clustering (still 41 times slower than on ArchIS); interval query Q5 is 2.9 times faster than without clustering (still 31 times slower than on ArchIS); history queries are much slower

Storage Utilization

[Figure: storage size for DB2, ArchIS, and Tamino (with compression).]

Database Compression

The disparity between CPU/memory and disk speeds is becoming larger and larger

Cost to read one IDE disk page: 14ms

Cost to uncompress one page: 1.1ms(500MHz CPU) 0.26ms(2.4GHz CPU)

Cost to retrieve one compressed page: 14ms + 0.26ms = 14.26ms (about 14.3ms)

Cost to retrieve uncompressed pages (3.6 pages): 14ms x 3.6 = 50.4ms
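The comparison above is simple arithmetic, reproduced here as a small Python sketch using only the figures from the slide. The function names are illustrative, and the model assumes one compressed page holds the content of 3.6 uncompressed pages and ignores caching and sequential-read effects.

```python
DISK_READ_MS = 14.0        # one IDE disk page read
UNCOMPRESS_MS = 0.26       # uncompress one page on the 2.4GHz CPU
PAGES_PER_COMPRESSED = 3.6  # uncompressed pages held by one compressed page


def compressed_cost_ms(read_ms, uncompress_ms):
    """One disk read plus one decompression."""
    return read_ms + uncompress_ms


def uncompressed_cost_ms(read_ms, pages):
    """One disk read per uncompressed page holding the same data."""
    return read_ms * pages


with_zip = compressed_cost_ms(DISK_READ_MS, UNCOMPRESS_MS)
without_zip = uncompressed_cost_ms(DISK_READ_MS, PAGES_PER_COMPRESSED)
speedup = without_zip / with_zip
print(f"compressed: {with_zip:.2f} ms, uncompressed: {without_zip:.1f} ms, "
      f"ratio: {speedup:.2f}")
```

Under these assumptions the CPU time spent on decompression is small compared with the disk reads it saves.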

Page-based Compression: PageZIP

Traditional data compression tools: compress a file as a whole

PageZIP: page-based compression and uncompression at the granularity of a page

Based on gzip library: zlib

Benefit: save space; point, snapshot or interval queries only retrieve a small fraction of the history, and can be efficient

PageZIP

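[Figure: PageZIP layout. Each segment (Segment 1 ... Segment n) is stored as a sequence of compressed pages, each labeled with the ID range it covers: ID: 1001 - 1100; page 2: ID: 1100 - 1203; page 3: ID: 1203 - 1331; …]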

Storage Utilization with Compression

For each attribute history table, we compress it as a sequence of pages and store each page as a BLOB in an RDBMS:

employee_salary(sid, salary, tstart, tend) => employee_salary_blob(pageno, startsid, endsid, pageblob)
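Page-level compression into BLOBs can be sketched with Python's `zlib` and `sqlite3` modules. The slides state that PageZIP builds on zlib and stores pages in `employee_salary_blob(pageno, startsid, endsid, pageblob)`; everything else here is an assumption for illustration, including the JSON page encoding, the fixed number of rows per page, and the helper names `compress_pages` and `lookup`.

```python
import json
import sqlite3
import zlib


def compress_pages(rows, rows_per_page):
    """Split rows sorted by sid into pages and compress each page separately."""
    rows = sorted(rows)
    pages = []
    for pageno, offset in enumerate(range(0, len(rows), rows_per_page), start=1):
        chunk = rows[offset:offset + rows_per_page]
        blob = zlib.compress(json.dumps(chunk).encode("utf-8"))
        # Keep the sid range outside the blob so pages can be located
        # without decompressing them.
        pages.append((pageno, chunk[0][0], chunk[-1][0], blob))
    return pages


def lookup(conn, sid):
    """Decompress only the pages whose sid range can contain the requested sid."""
    matches = []
    cursor = conn.execute(
        "SELECT pageblob FROM employee_salary_blob WHERE startsid <= ? AND endsid >= ?",
        (sid, sid),
    )
    for (blob,) in cursor:
        for row in json.loads(zlib.decompress(blob)):
            if row[0] == sid:
                matches.append(tuple(row))
    return matches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_salary_blob "
    "(pageno INTEGER, startsid INTEGER, endsid INTEGER, pageblob BLOB)"
)
history = [(sid, 40000 + sid, "1995-01-01", "9999-12-31") for sid in range(1001, 1251)]
pages = compress_pages(history, rows_per_page=100)
conn.executemany("INSERT INTO employee_salary_blob VALUES (?, ?, ?, ?)", pages)
print(len(pages), lookup(conn, 1150))
```

A point lookup touches a single page here, which reflects the stated benefit: queries that read a small fraction of the history pay the decompression cost only for the pages they need.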

[Figure: storage size for Tamino, DB2, and ArchIS, without compression and with compression.]

Query Performance with Compression

[Figure: execution times in seconds of queries Q1 to Q6 for ArchIS without compression, ArchIS with compression, DB2 without compression, DB2 with compression, and Tamino.]

Update Performance

For RDBMS, only the current segment is used for updates. For Tamino, current data and historical data are clustered together

Update an employee’s salary:

DB2: 0.29 seconds; Tamino: 1.2 seconds

Assume that every employee gets updated once a year: about 1/260 of all employees get updated every day on average

DB2: 1.52 seconds; Tamino: 15 seconds

In the worst case for segment-based archiving: 39 seconds for copying segments and 36 segments for compression, but only once

Summary

We built a transaction-time temporal database on RDBMS and XML, with:

XML to support temporally grouped (virtual) representations of the database history

XQuery to express powerful temporal queries on such views

temporal clustering for managing the actual historical data in an RDBMS

SQL/XML for executing the queries on the XML views as equivalent queries on the relational DB

compression as an option for efficient storage

ArchIS provides a unified solution for a wide spectrum of temporal application problems

Future Work

Friendly temporal query interfaces based on temporally grouped models

Other clustering and indexing techniques to be investigated

Other efficient data compression techniques proposed for XML data to be investigated

Apply the approach to valid-time DB and bi-temporal DB

Apply the approach to OODBMS and semi-structured data model
