Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang


Chapter 9: Web Services and Databases

• Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases

• Authors: Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang


NiagaraCQ: A Scalable Continuous Query System for Internet Databases

• Problem
  – Problem statement
  – Why is this problem important?
  – Why is this problem hard?
• Approaches
  – Approach description, key concepts
  – Contributions (novelty, improvements)
  – Assumptions


Why is this problem important?

• Transform the Internet environment
  – From passive pages, e.g. Google search
    • Pull by users
  – To active content, e.g. triggers
    • Push new events to users
• Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.
• Continuous queries
  – Deliver large amounts of frequently changing information.
  – Question: Is this query change-based or timer-based?


Problem Statement

• Given
  – Frequently changing information
  – A large group of continuous queries
• Find
  – A method to answer all queries
• Objectives
  – Scalability to a very large set of continuous queries
• Constraints
  – Many queries are similar (especially on the Internet).
  – Information required by the continuous queries and intermediate results do not fit in memory.
  – XML data set, XML-QL query language
  – Queries may be added or removed asynchronously.


Why is this problem Hard?

• Scale of the Internet
  – Support millions of continuous queries
  – Potentially large number of web users
• Complex queries
  – Support a large number of triggers,
  – expressed as complex queries,
  – against web-resident data sets.
• Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.


Novelty of Contribution

• Related work
  – Simple approach: optimize queries independently
  – Grouping approach: group similar queries
• Limitations of previous work
  – Focused on an optimal plan for a small number of similar queries.
  – Too expensive for a large number of continuous queries.
  – Not designed for the web.
• Contributions
  – A novel grouping approach for scalability
    • Incremental group optimization strategy with dynamic re-grouping
    • A new query usually does not require re-grouping
  – The query-split scheme requires minimal changes to a query engine.
  – Supports change-based and timer-based queries in a uniform way.


Key Ideas - Expression Signature

• Expression signature
  – Mechanism to identify queries that share monitored data
  – Same syntactic structure, but different constant values across queries
• Example: two queries sharing events on stock quotes
  – The signature replaces the constants in the predicates with a placeholder.
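
The placeholder idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predicate strings, the `?` placeholder, and the function name `expression_signature` are hypothetical, and the regular expression handles a simplified predicate syntax, not real XML-QL.

```python
import re

# Matches double-quoted string literals or bare numbers.
# Assumption: predicates are plain strings such as 'Symbol = "INTC"'.
_CONSTANT = re.compile(r'"[^"]*"|\b\d+(?:\.\d+)?\b')


def expression_signature(predicate):
    """Return (signature, constants) for a selection predicate.

    The signature is the predicate with every constant replaced by '?';
    the constants are returned in the order they appeared.
    """
    constants = _CONSTANT.findall(predicate)
    signature = _CONSTANT.sub("?", predicate)
    return signature, constants


# Two hypothetical queries that differ only in the constant they test.
sig_a, const_a = expression_signature('Symbol = "DELL"')
sig_b, const_b = expression_signature('Symbol = "INTC"')
```

Both calls yield the same signature string, which is what would let the two queries be placed in one group while their constants are kept separately.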


Key Ideas - Query Groups

• Basic idea
  – Group queries by expression signature
  – Query group signature = the expression signature common to all queries in the group
  – Group constant table
    • Keeps the signature constants for the group
  – Group execution plan
    • Shared by all queries in the group
• Split operation in group plans
  – Distributes result tuples to their destinations
  – Uses the destination buffer name stored in each tuple of the constant table (Fig. 3.4).
  – Pros: reduces the number of output buffers
  – Cons: split may become a bottleneck
    • High variance of update rates within a group
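
A minimal sketch of a group with its constant table and split step, in Python. The class name `QueryGroup`, the dictionary-based tuples, and the destination names are assumptions made for illustration; a plain lookup stands in for whatever join the real group plan performs against the constant table.

```python
from collections import defaultdict


class QueryGroup:
    """Hypothetical model of one query group (not the paper's code)."""

    def __init__(self, signature, attribute):
        self.signature = signature      # e.g. "Symbol = ?"
        self.attribute = attribute      # attribute compared with the constant
        # Group constant table: constant value -> destination buffer names.
        self.constant_table = defaultdict(list)

    def add_query(self, constant, destination):
        """Register one query: its constant and where its results go."""
        self.constant_table[constant].append(destination)

    def split(self, tuples):
        """Route each input tuple to every destination whose constant it matches."""
        buffers = defaultdict(list)
        for t in tuples:
            for dest in self.constant_table.get(t[self.attribute], ()):
                buffers[dest].append(t)
        return dict(buffers)


group = QueryGroup("Symbol = ?", "Symbol")
group.add_query("INTC", "dest.q1")
group.add_query("DELL", "dest.q2")
group.add_query("INTC", "dest.q3")
routed = group.split([
    {"Symbol": "INTC", "Price": 30},
    {"Symbol": "MU", "Price": 9},
])
```

All routing passes through the single `split` method, which is the reason the slide lists it as a possible bottleneck.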


Key Ideas – Incremental Group Assignment

• Incremental group optimization
  – Assign group(s) to a new query
  – Match the query signature to group signatures in a bottom-up fashion.
  – Example: consider the query in Fig. 3.6
    • Its plan is in Fig. 3.7
    • Add lower part to group plan
    • Add upper part to group constant table
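
The incremental aspect can be illustrated with a deliberately simplified Python sketch: a new query is matched against the existing group signatures and joins a matching group, and a new group is created only when nothing matches. Existing groups are never rebuilt. The `GroupRegistry` class is hypothetical, and it does a flat lookup on a single signature; bottom-up matching over a full query plan is not modeled here.

```python
class GroupRegistry:
    """Hypothetical registry mapping a signature to its group's constant table."""

    def __init__(self):
        # signature -> list of (constant, destination) rows
        self.groups = {}

    def register(self, signature, constant, destination):
        """Add a query incrementally.

        Returns True when an existing group was reused, False when a new
        group had to be created for this signature.
        """
        reused = signature in self.groups
        self.groups.setdefault(signature, []).append((constant, destination))
        return reused


registry = GroupRegistry()
first = registry.register("Symbol = ?", "DELL", "dest.q1")   # creates a group
second = registry.register("Symbol = ?", "INTC", "dest.q2")  # reuses it
third = registry.register("Price < ?", "30", "dest.q3")      # creates another
```

Because registering a query only appends one row, the cost of adding it does not depend on how many queries the group already holds.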


Key Ideas - Decomposition, Materialized Intermediate Files

• Limitation of the split operation
  – Potential bottleneck
  – If update rates vary across queries
• Solution
  – Write outputs to intermediate files
  – Add a file scan operator to the upper query
  – Decompose into several sub-queries
  – Challenge
    • Impact on the query engine
    • Monitoring the sub-queries' inputs
• Q: How many queries at most can result, as a function of
  – Number of query groups (G)
  – Number of original user queries (U)

Figure: Removing the split bottleneck using intermediate files
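
The decomposition can be sketched as two independent steps that communicate only through a file. The Python below is an assumption-laden illustration: the JSON-lines file format, the `dest` column name, and the functions `materialize` and `scan` are all hypothetical choices, not the system's actual storage layout.

```python
import json


def materialize(path, tuples, constant_table, attribute):
    """Lower (group) sub-query: append matching tuples to the intermediate
    file, each tagged with the destination it belongs to."""
    with open(path, "a", encoding="utf-8") as f:
        for t in tuples:
            for dest in constant_table.get(t[attribute], ()):
                f.write(json.dumps({"dest": dest, **t}) + "\n")


def scan(path, dest):
    """Upper sub-query: an ordinary file scan over the intermediate file,
    keeping only the rows tagged for one destination."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            if row["dest"] == dest:
                yield row
```

Because the upper sub-query is a plain file scan, it can run later and at its own rate, and the query engine sees nothing but regular queries over files.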


Support for Change-based & Timer-based Queries

• Change-based queries are fired
  – as soon as new data becomes available.
• Timer-based queries
  – are executed only periodically.
  – This reduces computation and makes the system more scalable.
• Timer-based queries pose two challenges:
  – It is hard to monitor the timer events of queries.
  – Sharing common computation becomes difficult because the time intervals differ.
• NiagaraCQ handles both types of queries uniformly.
• Implementing destination buffers
  – Pipeline or materialization
  – Q: Which is better for timer-based queries?
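
One way to picture handling both query types uniformly is a scheduler that turns data changes and timer ticks into the same kind of "fire these queries" result. The Python sketch below is hypothetical: the `Scheduler` class, its method names, and the rescheduling policy (next firing is one interval after the tick that fired it) are assumptions, not the system's event detector.

```python
import heapq
from collections import defaultdict


class Scheduler:
    """Hypothetical scheduler for change-based and timer-based queries."""

    def __init__(self):
        self.change_based = defaultdict(list)  # source name -> query ids
        self.timers = []                       # heap of (fire_time, query_id, interval)

    def add_change_query(self, source, query_id):
        self.change_based[source].append(query_id)

    def add_timer_query(self, query_id, interval, now=0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self.timers, (now + interval, query_id, interval))

    def on_change(self, source):
        """Queries to fire immediately because `source` changed."""
        return list(self.change_based.get(source, ()))

    def on_tick(self, now):
        """Queries whose timer has expired; each one is rescheduled."""
        fired = []
        while self.timers and self.timers[0][0] <= now:
            _, query_id, interval = heapq.heappop(self.timers)
            fired.append(query_id)
            heapq.heappush(self.timers, (now + interval, query_id, interval))
        return fired


sched = Scheduler()
sched.add_change_query("quotes.xml", "q1")
sched.add_timer_query("q2", interval=10)
sched.add_timer_query("q3", interval=25)
```

Keeping the timers in a heap means a tick only inspects the queries that are actually due, which speaks to the first challenge on the slide (monitoring many timer events).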


Other Techniques

• General selection predicates (e.g. range queries) may create intermediate files containing numerous duplicate tuples
  – Solution: a "virtual intermediate file" stores a value range.
• Memory caching is required
  – to handle intermediate files that do not fit in memory.

Figure: Example of a range query and its expression signature
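
The virtual intermediate file idea can be sketched as follows: result tuples are stored once, sorted on the range attribute, and each query keeps only a value range that is resolved against that shared store when read. The `RangeStore` class, the in-memory sorted list, and the inclusive bounds are assumptions for illustration; how the real system indexes its files is not shown on the slide.

```python
from bisect import bisect_left, bisect_right


class RangeStore:
    """Hypothetical shared store, sorted on the range attribute."""

    def __init__(self, rows, key):
        self.rows = sorted(rows, key=lambda r: r[key])
        self.keys = [r[key] for r in self.rows]

    def read(self, lo, hi):
        """Return the rows with lo <= key <= hi without copying them per query."""
        return self.rows[bisect_left(self.keys, lo):bisect_right(self.keys, hi)]


store = RangeStore(
    [{"Symbol": "A", "Price": 30}, {"Symbol": "B", "Price": 5},
     {"Symbol": "C", "Price": 20}, {"Symbol": "D", "Price": 10}],
    key="Price",
)

# Each "virtual file" is just a value range; overlapping ranges share rows.
virtual_files = {"dest.q1": (10, 20), "dest.q2": (15, 40)}
q1_rows = store.read(*virtual_files["dest.q1"])
q2_rows = store.read(*virtual_files["dest.q2"])
```

The row with price 20 falls in both ranges but is stored only once, which is the duplication the slide says materialized range results would otherwise cause.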


Validation Methodology - 1

• Prototyping: NiagaraCQ system architecture


Validation Methodology - 2

• Experimental Evaluation


Summary

• Paper's focus
  – A scalable continuous query system
• Ideas
  – Incremental group optimization
  – Query split for easy implementation
  – Support for change-based and timer-based queries
• Contributions
  – Achieves scalability, easy implementation, and grouping of timer-based queries.
  – Allows a very large number of users to register continuous queries in a high-level query language.
• Analytical validation
  – Experiments


Assumptions, Rewrite today

• Assumptions
  – Many queries tend to be similar in a web environment.
  – Information and intermediate results may not fit in memory.
• Rewrite today
  – Include results of dynamic re-grouping to address system deterioration.
  – More extensive experiments on optimization efficiency.