Achieving Adaptivity for OLAP-XML Federations


Transcript of Achieving Adaptivity for OLAP-XML Federations

Page 1: Achieving Adaptivity  for OLAP-XML Federations

Achieving Adaptivity for OLAP-XML Federations

Torben Bach Pedersen
Aalborg University

Joint work with Dennis Pedersen, TARGIT

Page 2: Achieving Adaptivity  for OLAP-XML Federations

Torben Bach Pedersen · DOLAP 2003 · 04/22/232

Overview
• Background: OLAP-XML federations
• New challenges
  – XML data changes
  – Slow or unreliable XML sources
  – Schema changes in data sources
  – Other challenges
• Integration in TARGIT architecture
• Other applications of the techniques
• Conclusion and future work
• Related work

Page 3: Achieving Adaptivity  for OLAP-XML Federations


Data Warehousing & OLAP
Multidimensional analysis: TARGIT Analysis

Page 4: Achieving Adaptivity  for OLAP-XML Federations


OLAP
• Good for complex ad hoc queries
  – Simple: natural, graphical queries
  – Fast: pre-aggregation
• A number of problems with physical integration
  – Short-term and varying data needs
    • Population, product info, ...
  – Dynamic data
    • Stock quotes, competitor pricing, ...
  – Data with limited access
    • Competitor product info, public databases, ...

Page 5: Achieving Adaptivity  for OLAP-XML Federations


OLAP-XML Federations

Traditional OLAP architecture:
[Figure: Client, OLAP-server, Cube]

Page 6: Achieving Adaptivity  for OLAP-XML Federations


OLAP-XML Federations
• Logical integration of XML data
  – External dimensions
  – External measures
• Data combined at query time

[Figure: Client, Federation, XML, Cube]

Page 7: Achieving Adaptivity  for OLAP-XML Federations


OLAP-XML Federations
• Logical integration of XML data
  – External dimensions
  – External measures
• Data combined at query time
• Transparent for users
• Flexible: many XML sources
• Quick: running in a few minutes
• Data is always fresh
• Performance often comparable to physical integration

[Figure: Client, Federation, XML, Cube]

Page 8: Achieving Adaptivity  for OLAP-XML Federations


XPath Queries for Fetching XML

<Books>
  <Book>
    <Title>1984</Title>
    <Author>Orwell</Author>
  </Book>
  <Book>
    <Title>Of Mice and Men</Title>
    <Author>Steinbeck</Author>
  </Book>
</Books>

/Books/Book[Author="Steinbeck"]/Title

[Figure: Client, Federation, XML, Cube; the XPath query is sent to the XML source and returns a dimension value]
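The XPath lookup on this slide can be tried out with a few lines of Python. This is only a sketch: it uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and evaluates paths relative to the root element, so the slide's absolute path `/Books/Book[...]/Title` is written as `./Book[...]/Title`. The helper name `titles_by_author` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XML document from the slide.
BOOKS_XML = """
<Books>
  <Book><Title>1984</Title><Author>Orwell</Author></Book>
  <Book><Title>Of Mice and Men</Title><Author>Steinbeck</Author></Book>
</Books>
""".strip()

def titles_by_author(xml_text, author):
    """Return the titles of all books whose <Author> child equals `author`."""
    root = ET.fromstring(xml_text)  # root is the <Books> element
    # Equivalent of /Books/Book[Author="..."]/Title, relative to the root.
    path = f"./Book[Author='{author}']/Title"
    return [title.text for title in root.findall(path)]

print(titles_by_author(BOOKS_XML, "Steinbeck"))  # ['Of Mice and Men']
```

In a federation, the returned values would serve as the external dimension values attached to the cube data.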

Page 9: Achieving Adaptivity  for OLAP-XML Federations


Old And New TARGIT Architecture

Page 10: Achieving Adaptivity  for OLAP-XML Federations


New Challenges
• Our previous work focused on basic aspects
  – Flexibility
  – General performance
  – Implementation
• New: what can go wrong? (need for adaptivity)
  – XML data changes
  – XML sources slow or unreliable
  – Schema changes (XML, OLAP, federation)
• We often have no control over the XML sources
• A solution has broad interest: views over XML sources

Page 11: Achieving Adaptivity  for OLAP-XML Federations


XML Data Changes
• Basic federation
  – XML data is integrated at query time => XML data changes handled automatically
• However, XML data is cached for performance
  – Cache timeout value ensures fresh data (set manually or automatically)
  – 0 cache timeout => always fetch from source
• Only a few current XML databases inform about changes
  – Xyleme allows users to subscribe to changes
  – Only delta should be transferred
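The cache timeout rule can be sketched as follows. This is a minimal illustration, not the federation's actual cache: the class `TimedCache`, its parameters and the injectable `clock` are assumptions made for the example. A timeout of 0 means every lookup goes to the source.

```python
import time

class TimedCache:
    """Cache query results and refetch once an entry is older than the timeout."""

    def __init__(self, fetch, timeout_s, clock=time.monotonic):
        self.fetch = fetch          # function: query -> data (hits the XML source)
        self.timeout_s = timeout_s  # 0 means "always fetch from source"
        self.clock = clock          # injectable so the behaviour can be tested
        self._entries = {}          # query -> (fetched_at, data)

    def get(self, query):
        now = self.clock()
        entry = self._entries.get(query)
        fresh = (
            entry is not None
            and self.timeout_s > 0
            and now - entry[0] < self.timeout_s
        )
        if fresh:
            return entry[1]
        data = self.fetch(query)    # expired, missing, or timeout is 0
        self._entries[query] = (now, data)
        return data
```

A short timeout trades performance for freshness; a long one does the opposite, which is why the slide allows it to be set manually or automatically.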

Page 12: Achieving Adaptivity  for OLAP-XML Federations


ICE: Information and Content Exchange
• Protocol submitted to the W3C for automatically informing about and requesting changes
  – Supported by major vendors
  – Push: subscribe to changes and keep cache up-to-date
  – Pull: request changes from source at query time
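The difference between push and pull can be sketched in a few lines. This does not implement ICE or use any ICE library; `ChangeSource`, `PushCache` and `PullCache` are hypothetical names, and the "delta" is simply a (key, value) pair. In both modes only the changes are transferred, never the whole document.

```python
class ChangeSource:
    """Toy XML source that records every change as a (key, value) delta."""

    def __init__(self):
        self.log = []           # ordered list of deltas
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, key, value):
        self.log.append((key, value))
        for notify in self.subscribers:
            notify(key, value)  # push: send only the delta

    def changes_since(self, version):
        """Pull: return the deltas after `version` and the new version number."""
        return self.log[version:], len(self.log)


class PushCache:
    """Subscribes once; the source keeps the cache up to date."""

    def __init__(self, source):
        self.data = {}
        source.subscribe(self._apply)

    def _apply(self, key, value):
        self.data[key] = value


class PullCache:
    """Asks the source for outstanding deltas at query time."""

    def __init__(self, source):
        self.source = source
        self.data = {}
        self.version = 0

    def get(self, key):
        deltas, self.version = self.source.changes_since(self.version)
        self.data.update(deltas)
        return self.data.get(key)
```

Push keeps query latency low at the cost of receiving updates that may never be queried; pull pays one round trip per query but transfers nothing in between.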

Page 13: Achieving Adaptivity  for OLAP-XML Federations


Slow and Unreliable XML Sources
• Overload, maintenance, HW breakdown, attacks
  – Often we have no influence on this
• Incremental presentation for user
  – What if source is too slow or no reply at all?
• Inform user that the system is not working…?
• Specification of alternative sources
  – Several queries per external dimension/measure
  – Increased fault tolerance, also better performance

[Figure: Source, Server, Client]

Page 14: Achieving Adaptivity  for OLAP-XML Federations


Slow and Unreliable XML Sources
• Start several queries and use the fastest
  – Always uses the fastest, but heavy load on sources
  – Use first response time as indicator for total time
• Start one query at a time
  – Minimal load on sources, but slower

[Figure: the federation (Fed) querying alternative sources 1, 2 and 3]
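The two strategies can be compared with a small sketch. It is not the authors' implementation: the sources are simulated with `time.sleep`, and `fetch_fastest`, `fetch_one_at_a_time` and `make_source` are names invented here. The sequential variant only moves on when a source fails, and a thread that has already started cannot be interrupted in Python, so the slower queries are abandoned rather than truly cancelled.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_fastest(sources, query):
    """Start all sources at once and return (name, data) of the first success."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                return futures[future], future.result()
        raise RuntimeError("all sources failed")
    finally:
        pool.shutdown(wait=False)  # do not block on the slower sources

def fetch_one_at_a_time(sources, query):
    """Try the sources in the given order; move on only when one fails."""
    for name, fn in sources.items():
        try:
            return name, fn(query)
        except Exception:
            continue
    raise RuntimeError("all sources failed")

def make_source(delay_s, fail=False):
    """Simulated XML source (hypothetical) with a fixed delay."""
    def fetch(query):
        time.sleep(delay_s)
        if fail:
            raise ConnectionError("source down")
        return f"data for {query}"
    return fetch

SOURCES = {
    "slow": make_source(0.3),
    "down": make_source(0.01, fail=True),
    "fast": make_source(0.05),
}

print(fetch_fastest(SOURCES, "q"))        # all three sources are contacted
print(fetch_one_at_a_time(SOURCES, "q"))  # only the first source is contacted
```

The parallel version always gets the quickest answer but loads every source; the sequential version contacts as few sources as possible and pays for it in latency.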

Page 15: Achieving Adaptivity  for OLAP-XML Federations


Slow and Unreliable XML Sources
Alternative sources of lower quality: better than no data?

Alternatives
  – Expired cache data
  – Google, Xyleme, the Wayback Machine
  – Backup disk, tape
  – Etc.

Source           Speed      Quality
Local cache      Fastest    Fresh
Original source  Fast?      Freshest
Expired cache    Fastest    Old
Backup source    Fast/slow  Very old

Page 16: Achieving Adaptivity  for OLAP-XML Federations


Slow and Unreliable XML Sources
In practice?
Sources with equal priority chosen at random
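One way to read "sources with equal priority chosen at random" is sketched below. It is an illustration only, not the Algorithm 1 of the talk: the numeric priorities, the source names and the functions `source_order` and `fetch_with_fallback` are assumptions made for the example.

```python
import random
from itertools import groupby

def source_order(sources, rng=random):
    """sources: list of (priority, name); a lower number is tried earlier.

    Sources that share a priority are shuffled among themselves.
    """
    ordered = []
    ranked = sorted(sources, key=lambda s: s[0])
    for _, group in groupby(ranked, key=lambda s: s[0]):
        tier = [name for _, name in group]
        rng.shuffle(tier)  # equal priority: random order
        ordered.extend(tier)
    return ordered

def fetch_with_fallback(sources, fetchers, query, rng=random):
    """Try sources in priority order; fall back to the next one on failure."""
    for name in source_order(sources, rng):
        try:
            return name, fetchers[name](query)
        except Exception:
            continue
    return None, None  # no source could answer

# Hypothetical priorities, loosely following the speed/quality table.
SOURCES = [
    (0, "local cache"),
    (1, "original source"),
    (1, "mirror"),
    (2, "expired cache"),
    (3, "backup source"),
]
print(source_order(SOURCES))
```

Randomising within a tier spreads the load over equivalent sources, while the tiers themselves keep lower-quality data as a last resort.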

Page 17: Achieving Adaptivity  for OLAP-XML Federations


Result: Algorithm for Fetching XML Data

Page 18: Achieving Adaptivity  for OLAP-XML Federations


Experiments
• 1st experiment: fetching a 137 KB dimension
  – Start 8 queries; when the first 3 respond, (cancel) the last 5; when the fastest query finishes, (cancel) the remaining 2
  – Fast reply = good indication of overall speed
• 2nd experiment: search local cache, then Google cache
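The staged strategy of the first experiment can be simulated on paper. In the sketch below, the timings are invented numbers, not the measurements from the experiment, and `staged_choice` is a hypothetical helper. It keeps the sources that respond first, then picks whichever of those finishes first, and compares that with the truly fastest source.

```python
def staged_choice(timings, keep):
    """timings: name -> (first_response_s, total_s).

    Stage 1 keeps the `keep` sources with the earliest first response.
    Stage 2 picks the survivor that finishes the whole transfer first.
    """
    by_first_response = sorted(timings, key=lambda name: timings[name][0])
    survivors = by_first_response[:keep]
    winner = min(survivors, key=lambda name: timings[name][1])
    return survivors, winner

# Invented timings in seconds: (first response, complete transfer).
TIMINGS = {
    "s1": (0.4, 3.0),
    "s2": (0.1, 1.2),
    "s3": (0.9, 6.0),
    "s4": (0.2, 0.9),
    "s5": (0.3, 2.5),
}

survivors, winner = staged_choice(TIMINGS, keep=3)
true_fastest = min(TIMINGS, key=lambda name: TIMINGS[name][1])
print(survivors, winner, winner == true_fastest)
```

When a fast first reply does predict overall speed, as the slide reports, the truly fastest source survives stage 1 and the strategy loses nothing while sparing the other sources most of the load.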

Page 19: Achieving Adaptivity  for OLAP-XML Federations


Schema Changes In XML Sources
How to synchronize XML views after schema change?
(solution described in separate paper)

/Bibliography/Author[AName="Orwell"]/Book/Title

[Figure: two Bibliography schema trees, before and after a schema change, over the elements Bibliography, Publisher, PName, Book, Author, AName, Title, Price]
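The problem can be made concrete with a small sketch. Both documents below are hypothetical: the exact before and after schemas of the slide's figure are not recoverable from this transcript, so the restructuring shown (Book moving from under Author to under Publisher) is an assumption, as are the sample values. The sketch only shows how the view breaks and what a rewritten view would have to do; it does not reproduce the synchronization method of the separate paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document before the schema change: Book is nested in Author.
OLD = ET.fromstring("""
<Bibliography>
  <Author><AName>Orwell</AName>
    <Book><Title>1984</Title><Price>10</Price></Book>
  </Author>
</Bibliography>
""".strip())

# Hypothetical document after the change: Book is nested in Publisher.
NEW = ET.fromstring("""
<Bibliography>
  <Publisher><PName>P1</PName>
    <Book><Title>1984</Title><Price>10</Price>
      <Author><AName>Orwell</AName></Author>
    </Book>
  </Publisher>
</Bibliography>
""".strip())

# The view from the slide, relative to the <Bibliography> root.
VIEW = "./Author[AName='Orwell']/Book/Title"

def old_view(root):
    return [t.text for t in root.findall(VIEW)]

def rewritten_view(root, author="Orwell"):
    """Same question, asked against the restructured document."""
    return [
        book.findtext("Title")
        for book in root.findall("./Publisher/Book")
        if book.findtext("Author/AName") == author
    ]

print(old_view(OLD))        # the view works on the old structure
print(old_view(NEW))        # and silently returns nothing on the new one
print(rewritten_view(NEW))  # a synchronized view recovers the answer
```

Note that the broken view does not raise an error; it just returns an empty result, which is why schema changes in sources outside one's control are easy to miss.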

Page 20: Achieving Adaptivity  for OLAP-XML Federations


Additional Challenges
• Changes to federation schema
  – Cache may be invalidated
  – Discard affected cache results (unproblematic)
• OLAP data changes
  – Cache may be invalidated
  – Less frequent than XML data changes => cache will often have expired anyway
• OLAP schema changes
  – Federated schema may be invalidated
  – Rare and easy to detect (and correct)

Page 21: Achieving Adaptivity  for OLAP-XML Federations


Integrating Techniques - Architecture

Page 22: Achieving Adaptivity  for OLAP-XML Federations


Integrating Techniques – Query Processing
• Query Evaluator splits query into XML+OLAP parts and determines query plan based on cost
• Execution Engine coordinates and executes plan
• Cache Manager maintains cache, e.g., through ICE
• XML Component interface fetches XML data, chooses between available XML sources (Algorithm 1)
• View Synchronizer handles schema changes
• Metadata Manager manages info about external dimensions and measures + XML component characteristics

Page 23: Achieving Adaptivity  for OLAP-XML Federations


Other Applications
All XPath-based views on XML data
Links to parts of XML documents
• Web pages
• Documents (DocBook)
• Software applications
and many more…

Automatic recreation of broken links
Increased fault tolerance and performance using alternative sources

Page 24: Achieving Adaptivity  for OLAP-XML Federations


Conclusion and Future Work
• Operational problems in OLAP-XML federations
• XML data changes
• Slow and unreliable XML sources
  – Using several sources (Algorithm 1)
  – Experiment with Algorithm 1
• Techniques integrated into federation architecture
• Schema evolution and other challenges
• Future work
  – TARGIT implementation and testing
  – Using techniques in other applications

Page 25: Achieving Adaptivity  for OLAP-XML Federations


Related Work
• Data changes in XML/semistructured documents
  – Xyleme + Zhuge
• Schema changes in scientific documents
  – Not XML
• Adaptive/dynamic query optimization
  – Telegraph project
  – We adapt once per source, rather than per tuple
• Existing work does not consider one or more of: OLAP+XML concepts, schema changes, slow and unreliable sources
• Own previous OLAP-XML work is not adaptive