PARTIAL RESULTS AND STRUCTURAL AGGREGATION OVER XML DATA STREAMS

Page 1

PARTIAL RESULTS AND STRUCTURAL AGGREGATION OVER XML DATA STREAMS

Kristin Tufte

PhD Defense Dec 17, 2004

Page 2

Streams & XML

Nested, structured data (XML)

Streams: network traffic information, environmental sensor data, telephone call records, click streams

That was then… (Jones, Bob, 153 Fir St., Portland)

…this is now.
person
  lname: Jones
  fname: Bob
  address
    street: 153 Fir St.
    city: Portland
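To make the "then vs. now" contrast concrete, here is a small sketch using Python's standard xml.etree.ElementTree module that turns the flat tuple into the nested person element. The function name person_to_xml is illustrative only and does not come from the talk.

```python
import xml.etree.ElementTree as ET

# The flat tuple from the slide: its meaning depends on column position only.
flat = ("Jones", "Bob", "153 Fir St.", "Portland")

def person_to_xml(lname, fname, street, city):
    """Build the nested <person> element drawn on the slide (illustrative helper)."""
    person = ET.Element("person")
    ET.SubElement(person, "lname").text = lname
    ET.SubElement(person, "fname").text = fname
    address = ET.SubElement(person, "address")  # street and city nest under address
    ET.SubElement(address, "street").text = street
    ET.SubElement(address, "city").text = city
    return person

doc = person_to_xml(*flat)
print(ET.tostring(doc, encoding="unicode"))
```

The nesting carries information (street and city belong to the address) that the flat tuple only implies by position.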

Page 3

New Challenges

XML
  Data is nested
  New operators, query language

Streams
  Potentially infinite
  Produce results without waiting for the end of the stream/data
  Arrival rate not in control of the database system

XML Streams
  Stock data, data exchange, intelligent transportation systems

Page 4

Talk Preview

Incremental Query Evaluation (IQE)
Merge Operation
Merge Theory
Merge Performance

Page 5

Context for IQE

Continuous Queries: Tapestry (early 1990s)
  Monotonic queries, append-only databases

Long-running Queries
  Online aggregation (Hellerstein et al.), Nested Aggregates (Tan et al.)

Incremental Query Evaluation (IQE) (Partial Results)
  General solution for long-running queries over XML data

Stream Processing
  Potentially infinite streams of data: STREAM, Aurora (Borealis), Niagara West

Triggers (Eric Hanson, NiagaraCQ)

Page 6

Incremental Query Evaluation*

Motivation: Internet queries (long-running, data in XML)
  Get results to users before all of the data arrives
Non-monotonic (blocking) operators are problematic
Modify operators and system framework

Example query plan (bottom to top):
(Title, Subject, DateTime)
select DateTime ≥ “12/17/04:12AM”
count group by Subject

* Joint work with Jai Shanmugasundaram

Page 7

(Non-)monotonic Operators

An operator O is monotonic if $A \subseteq B \Rightarrow O(A) \subseteq O(B)$
  Examples: select, join (but join is often implemented with a blocking algorithm)
O is non-monotonic if it is not monotonic
  Examples: aggregates, nest

On new input, monotonic operators add to their output; non-monotonic operators change their output

Example query plan (bottom to top):
(Title, Subject, DateTime)
select DateTime ≥ “12/17/04:12AM”
count group by Subject
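A small Python sketch of the definition, checking $O(A) \subseteq O(B)$ for two stream prefixes $A \subseteq B$. The stream contents are made up, DateTime is simplified to an hour number, and select and count_group_by_subject are illustrative stand-ins for the plan's operators, not Niagara code.

```python
from collections import Counter

# Hypothetical (Title, Subject, DateTime) stream; DateTime simplified to an hour.
stream = [("Title1", "Ukraine", 1), ("Title2", "Google", 3), ("Title3", "Ukraine", 5)]

def select(tuples, min_hour):
    """Monotonic: more input can only add output tuples."""
    return {t for t in tuples if t[2] >= min_hour}

def count_group_by_subject(tuples):
    """Non-monotonic: more input can change an existing (Subject, Count) tuple."""
    return set(Counter(subject for _, subject, _ in tuples).items())

a, b = stream[:2], stream   # a is a prefix (and subset) of b
print(select(a, 2) <= select(b, 2))                            # True
print(count_group_by_subject(a) <= count_group_by_subject(b))  # False
```

The count result for the prefix contains ("Ukraine", 1), which is replaced by ("Ukraine", 2) once more input arrives, so the earlier output is not a subset of the later one.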

Page 8

Handling Non-monotonic Operators

Users issue partial result requests
  Re-evaluation: transmit the full result on every partial result request
  Differential: avoid retransmitting duplicate data
Operators produce and process tuple inserts, deletes, and updates; all tuples contain an “old value” and a “new value”

Example query plan (bottom to top):
(Title, Subject, DateTime)
select DateTime ≥ “12/17/04:12AM”
count group by Subject
top10(count)

Tuples entering count (Old Value: Title, Subject, D/T | New Value: Title, Subject, D/T):
(null, null, null, Title1, Ukraine, 1AM)
(null, null, null, Title2, Ukraine, 3AM)
(null, null, null, Title3, Ukraine, 5AM)

Tuples leaving count (Old Value: Subject, Count | New Value: Subject, Count):
(null, null, Ukraine, 2)
(Ukraine, 2, Ukraine, 3)
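A minimal Python sketch of the differential idea for count group by Subject, assuming an operator that remembers what it last reported and, on each partial result request, emits only changed groups as (old value, new value) tuples. The class DifferentialCount and its methods are hypothetical names, not the Niagara implementation; None stands in for null.

```python
from collections import Counter

class DifferentialCount:
    """Differential 'count group by Subject' (hypothetical sketch).

    Emits (old_subject, old_count, new_subject, new_count); None stands for null.
    """
    def __init__(self):
        self.counts = Counter()   # current count per subject
        self.sent = {}            # count last sent to the user, per subject

    def insert(self, subject):
        self.counts[subject] += 1

    def partial_result(self):
        """On a partial result request, emit only the groups that changed."""
        out = []
        for subject, count in self.counts.items():
            old = self.sent.get(subject)
            if old == count:
                continue                                    # unchanged: send nothing
            old_subject = None if old is None else subject
            out.append((old_subject, old, subject, count))  # insert or update tuple
            self.sent[subject] = count
        return out

op = DifferentialCount()
op.insert("Ukraine")
op.insert("Ukraine")
print(op.partial_result())   # [(None, None, 'Ukraine', 2)]
op.insert("Ukraine")
print(op.partial_result())   # [('Ukraine', 2, 'Ukraine', 3)]
```

With a partial result requested after Title2 and again after Title3, the sketch reproduces the two count output tuples listed above.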

Page 9

Re-evaluation vs. Differential

[Chart: time (seconds) vs. percentage of input seen (9% to 100%); series: No Partial, Re-evaluation, and Differential, each with unordered and ordered input]

Page 10

Skewed Data

[Chart: time (seconds) vs. skew (0 to 2); series: No Partial, Re-evaluation, and Differential, each with unordered and ordered input]

Page 11

Differential Nest

Input (Subject, Title): (Google, Title1), (Microsoft, Title2), (Microsoft, Title3)

Produce partial result (Old Value: Subject, Title | New Value: Subject, Title):
(null, null, Google, {Title1})
(null, null, Microsoft, {Title2, Title3})

New input (Google, Title4): (Google, {Title1}, Google, {Title1, Title4})
New input (Google, Title5): (Google, {Title1, Title4}, Google, {Title1, Title4, Title5})

But what you’d really like to send is (Google, {Title5}) and “merge” it with (Google, {Title1, Title4}):

Subject: Google, Title: Title1, Title: Title4
Merge
Subject: Google, Title: Title5
=
Subject: Google, Title: Title1, Title: Title4, Title: Title5
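A sketch of "send only the new part and merge it on the receiver", under a simplifying assumption: plain set union per subject stands in for the full Merge operation. DeltaNest and merge_delta are hypothetical names used only for illustration.

```python
from collections import defaultdict

class DeltaNest:
    """Sender side: remember only the titles not yet sent, per subject."""
    def __init__(self):
        self.pending = defaultdict(set)

    def insert(self, subject, title):
        self.pending[subject].add(title)

    def partial_result(self):
        """Return the delta (subject -> new titles) and reset the buffer."""
        delta = {subject: set(titles) for subject, titles in self.pending.items()}
        self.pending.clear()
        return delta

def merge_delta(view, delta):
    """Receiver side: union each new title set into the matching subject group."""
    for subject, titles in delta.items():
        view.setdefault(subject, set()).update(titles)
    return view

nest, view = DeltaNest(), {}
for subject, title in [("Google", "Title1"), ("Microsoft", "Title2"), ("Microsoft", "Title3")]:
    nest.insert(subject, title)
merge_delta(view, nest.partial_result())
nest.insert("Google", "Title4")
merge_delta(view, nest.partial_result())
nest.insert("Google", "Title5")
delta = nest.partial_result()
print(delta)                   # {'Google': {'Title5'}}
merge_delta(view, delta)
print(sorted(view["Google"]))  # ['Title1', 'Title4', 'Title5']
```

The last message carries only {Title5}, instead of an (old, new) pair that repeats Title1 and Title4.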

Page 12

Talk Preview

Incremental Query Evaluation
Merge Operation
Merge Theory
Merge Performance

Page 13

Merge Operation

Flexible method for combining two XML (nested) documents
  “Recursive union” over similarly structured XML documents
A Merge Template guides the process
  “Keys” are used to indicate when elements should be combined

Page 14

Merge Example

Auction Document:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
  item
    iid: 433
    desc: 1971 Martin Guitar

New Bid:
auction
  item
    iid: 501
    bid
      bidder: Sue
      amt: $1550

Merged Document:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
    bid
      bidder: Sue
      amt: $1550
  item
    iid: 433
    desc: 1971 Martin Guitar

Highlighted elements: Combined, Inserted, Used in Match

Page 15

Merge Template (MT)

A Merge Template is an XML document consisting of a tree of Element Merge Templates (EMTs)
An EMT is a triplet: (name, local key, content combine function)

Auction Merge Template:
(auction, [], NoContentNoAttrs)
  (item, [iid], NoContentNoAttrs)
    (iid, [], ExactMatch)
    (desc, [], ShallowContent - Replace)
    (bid, [bidder, amt], NoContentNoAttrs)
      (bidder, [], ExactMatch)
      (amt, [], ExactMatch)

New Bid:
auction
  item
    iid: 501
    bid
      bidder: Sue
      amt: $1550
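A simplified sketch of template-guided Merge in Python, not the thesis algorithm. Assumptions, all hypothetical: the Merge Template is encoded as a dict from element name to (local key, combine function); two elements match when their names and local key values agree; ExactMatch elements additionally match only on identical content; ShallowContent-Replace takes the new content; every other combine function keeps the existing content. Non-matching children are inserted.

```python
from dataclasses import dataclass, field

@dataclass
class Elem:
    name: str
    value: str = ""
    children: list = field(default_factory=list)

# Hypothetical encoding of the auction Merge Template: name -> (local key, combine function).
TEMPLATE = {
    "auction": ([], "NoContentNoAttrs"),
    "item": (["iid"], "NoContentNoAttrs"),
    "iid": ([], "ExactMatch"),
    "desc": ([], "ShallowContent-Replace"),
    "bid": (["bidder", "amt"], "NoContentNoAttrs"),
    "bidder": ([], "ExactMatch"),
    "amt": ([], "ExactMatch"),
}

def match_key(e):
    """Name plus local key values; ExactMatch elements also match on their content."""
    keys, combine = TEMPLATE[e.name]
    kv = tuple(next((c.value for c in e.children if c.name == k), None) for k in keys)
    if combine == "ExactMatch":
        kv += (e.value,)
    return (e.name, kv)

def merge(a, b):
    """Recursive union of b into a; returns a new tree and leaves a and b unchanged."""
    combine = TEMPLATE[a.name][1]
    value = b.value if combine == "ShallowContent-Replace" else a.value
    result = Elem(a.name, value, list(a.children))
    for bc in b.children:
        for i, rc in enumerate(result.children):
            if match_key(rc) == match_key(bc):
                result.children[i] = merge(rc, bc)   # keys match: combine
                break
        else:
            result.children.append(bc)               # no match: insert
    return result

auction_doc = Elem("auction", children=[
    Elem("item", children=[Elem("iid", "501"), Elem("desc", "Trek Madone 5.9 Bike"),
                           Elem("bid", children=[Elem("bidder", "Dave"), Elem("amt", "$1500")])]),
    Elem("item", children=[Elem("iid", "433"), Elem("desc", "1971 Martin Guitar")]),
])
new_bid = Elem("auction", children=[
    Elem("item", children=[Elem("iid", "501"),
                           Elem("bid", children=[Elem("bidder", "Sue"), Elem("amt", "$1550")])]),
])
merged = merge(auction_doc, new_bid)
print([c.children[0].value for c in merged.children[0].children if c.name == "bid"])
# ['Dave', 'Sue']
```

Because item matches on iid, Sue's bid lands under item 501, while bid matches on (bidder, amt), so it is inserted next to Dave's bid rather than combined with it.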

Page 16

Merge Template Features

Used as the basis for an Accumulate operator
  Repeated merge over a stream of XML documents to create an Accumulator
  The Accumulator is a view of the stream
  Performs structural aggregation
Keys are used to identify elements to combine
  Keys are external to the document
Content-combine functions
  Aggregate, deep replace
Attributes are handled like elements without children
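A compact sketch of Accumulate as repeated binary merge, under a simplifying assumption: each document is a nested dict whose keys already encode element keys (a stand-in for the Merge Template), and leaf content is combined by a pluggable content-combine function. The names merge, accumulate, total, and replace are illustrative only.

```python
from functools import reduce
from itertools import accumulate as running

def merge(acc, doc, combine):
    """Recursive union of two keyed trees (nested dicts keyed by element key)."""
    out = dict(acc)
    for key, sub in doc.items():
        if key not in out:
            out[key] = sub                              # insert a new element
        elif isinstance(out[key], dict) and isinstance(sub, dict):
            out[key] = merge(out[key], sub, combine)    # same key: recurse
        else:
            out[key] = combine(out[key], sub)           # leaf content: combine function
    return out

def accumulate(stream, combine):
    """Accumulate operator sketch: repeated binary merge over a stream."""
    return reduce(lambda acc, doc: merge(acc, doc, combine), stream, {})

total = lambda old, new: old + new    # an "aggregate" content-combine function
replace = lambda old, new: new        # a "replace" content-combine function

bids = [
    {("item", "501"): {"bids": 1}},
    {("item", "433"): {"bids": 1}},
    {("item", "501"): {"bids": 1}},
]
print(accumulate(bids, total))
# {('item', '501'): {'bids': 2}, ('item', '433'): {'bids': 1}}

# The accumulator is a view of the stream: every prefix yields a partial result.
views = list(running(bids, lambda acc, doc: merge(acc, doc, total), initial={}))
```

The `initial` argument of itertools.accumulate requires Python 3.8 or later.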

Page 17

Outline

Incremental Query Evaluation
  Partial results over XML data
Merge Operation
Merge Theory
Merge Performance

Page 18

Theoretical Foundations

Why a formal definition?
  Prove that Merge is deterministic (unique result)
  Unambiguous definition

Key results:
  Formal definition of Merge as the join of an upper semi-lattice
  Merge is the least upper bound of two documents (under some constraints)
  Path set representation: good for reasoning about XML documents

Page 19

View Merge as Least Upper Bound

D3 is the “smallest” document that “contains” D1 and D2

Auction Document (D1):
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
  item
    iid: 433
    desc: 1971 Martin Guitar

New Bid (D2):
auction
  item
    iid: 501
    bid
      bidder: Sue
      amt: $1550

Merged Document (D3):
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
    bid
      bidder: Sue
      amt: $1550
  item
    iid: 433
    desc: 1971 Martin Guitar

Page 20

What can go wrong?

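D1: auction(item(iid: 501))
D2: auction(item(iid: 433))
D3: auction(item(iid: 501, iid: 433)), one item holding both iids
D4: auction(item(iid: 501), item(iid: 433)), two separate items

D3 and D4 both contain D1 and D2, but neither contains the other: there is no unique result (no Least Upper Bound (LUB))

Keys in the Merge Template eliminate the ambiguity

We know D4 is the correct result if we know iid is a key for item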

Page 21

What is a lattice?

An upper semi-lattice is a partially ordered set in which every pair of elements has a least upper bound (LUB), and that LUB is unique

A set of sets closed under union forms an upper semi-lattice (order: $S_1 \sqsubseteq S_2$ if $S_1 \subseteq S_2$; the LUB is the union)

Ex 1 – Not a lattice: {1, 2}, {2, 3}
  The LUB of {1, 2} and {2, 3} does not exist

Ex 2 – Lattice. Order: $S_1 \sqsubseteq S_2$ if $S_1 \subseteq S_2$
  {1, 2}, {2, 3}, with LUB {1, 2, 3}

Ex 3 – Lattice. Order: document containment
  D1, D2, with LUB D3
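A small Python check of Ex 1 and Ex 2: it computes the least upper bound of two sets inside a given finite family ordered by inclusion, returning None when no least upper bound exists. The function name lub is illustrative.

```python
def lub(family, a, b):
    """Least upper bound of a and b in a family of sets ordered by inclusion, or None."""
    upper = [s for s in family if a <= s and b <= s]
    least = [u for u in upper if all(u <= v for v in upper)]
    return least[0] if least else None

A, B = frozenset({1, 2}), frozenset({2, 3})
ex1 = [A, B]            # Ex 1: not closed under union
ex2 = [A, B, A | B]     # Ex 2: closed under union
print(lub(ex1, A, B))           # None
print(sorted(lub(ex2, A, B)))   # [1, 2, 3]
```

In a family closed under union, the union of any two members is itself a member and is contained in every other upper bound, which is why it is always the LUB.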

Page 22

What do I need for a lattice?

Set of documents (LT), where T is a Merge Template
Order (document containment)
Show that LT satisfies the properties of a lattice.

Page 23

Document Containment Order

D1 is contained in D2 if there is a structure-preserving mapping from D1 into D2

D1:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike

D2:
auction
  item
    iid: 433
    desc: 1971 Martin Guitar
  item
    iid: 501
    desc: Trek Madone 5.9 Bike

D1 is contained in D2 (D1 ⊑ D2)
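A sketch of the containment test in Python, assuming documents are (name, value, children) tuples: it searches by backtracking for a 1-1 mapping that sends root to root, preserves names and values, and sends each child to a distinct child of the parent's image. The helper names E, contained, and assign are illustrative; the search is exponential in the worst case and is meant only to make the definition concrete.

```python
def E(name, value="", *children):
    """Tiny tree constructor: (name, value, children)."""
    return (name, value, children)

def contained(d1, d2):
    """True if a 1-1, structure-preserving mapping sends d1's root to d2's root."""
    if d1[:2] != d2[:2]:                  # same name and same value
        return False
    return assign(d1[2], d2[2], frozenset())

def assign(kids1, kids2, used):
    """Map each child of d1 to a distinct child of d2 that contains it (backtracking)."""
    if not kids1:
        return True
    first, rest = kids1[0], kids1[1:]
    return any(i not in used and contained(first, cand) and assign(rest, kids2, used | {i})
               for i, cand in enumerate(kids2))

D1 = E("auction", "", E("item", "", E("iid", "501"), E("desc", "Trek Madone 5.9 Bike")))
D2 = E("auction", "",
       E("item", "", E("iid", "433"), E("desc", "1971 Martin Guitar")),
       E("item", "", E("iid", "501"), E("desc", "Trek Madone 5.9 Bike")))
print(contained(D1, D2), contained(D2, D1))   # True False
```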

Page 24

Merge Template (T) Defines LT

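A Merge Template, T, is specific to a set of documents
  The auction MT is specific to “auction” documents
LT is the set of all documents that are “compatible” and “key-respecting” with respect to T
There is a different lattice for each Merge Template

[Figure: LT, defined by T, drawn as a subset of the set of all documents; sample documents D1, D2, D3, D4, D5, D8, and D10 are placed inside and outside LT]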

Page 25

Non-Key-Respecting Documents

D1: auction(item(iid: 501))
D2: auction(item(iid: 433))
D3: auction(item(iid: 501, iid: 433)), one item holding both iids
D4: auction(item(iid: 501), item(iid: 433)), two separate items

⊑ means “is contained in”: D ⊑ D′ if there is a structure-preserving mapping from D into D′

D3 is not key-respecting with respect to T and should not be in LT.

T:
(auction, [], NoContentNoAttrs)
  (item, [iid], NoContentNoAttrs)
    (iid, [], ExactMatch)
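A Python sketch of the key-respecting test for the T on this slide, assuming documents are (name, value, children) tuples and T is encoded as a hypothetical dict from element name to local key. An element is key-exact if each key path finds exactly one child; the document is key-respecting if it is key-exact and no two elements share a rooted key value.

```python
# Hypothetical encoding of T from the slide: element name -> local key.
KEYS = {"auction": [], "item": ["iid"], "iid": []}

def rooted_key_values(node, prefix=()):
    """Yield each element's rooted key value; raise ValueError if an element is not key-exact."""
    name, value, kids = node
    parts = []
    for k in KEYS[name]:
        vals = [v for n, v, _ in kids if n == k]
        if len(vals) != 1:                    # the key path must evaluate to a singleton
            raise ValueError(f"{name} is not key-exact for key {k!r}: {vals}")
        parts.append(f"{k}:{vals[0]}")
    rkv = prefix + (f"{name}[{','.join(parts)}]",)
    yield rkv
    for child in kids:
        yield from rooted_key_values(child, rkv)

def is_key_respecting(doc):
    """Key-exact, and no two elements share a rooted key value."""
    try:
        rkvs = list(rooted_key_values(doc))
    except ValueError:
        return False
    return len(rkvs) == len(set(rkvs))

leaf = lambda name, value: (name, value, ())
D3 = ("auction", "", (("item", "", (leaf("iid", "501"), leaf("iid", "433"))),))
D4 = ("auction", "", (("item", "", (leaf("iid", "501"),)), ("item", "", (leaf("iid", "433"),))))
print(is_key_respecting(D3), is_key_respecting(D4))   # False True
```

D3 fails because its single item has two iid values, so the key path does not evaluate to a singleton.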

Page 26

Merge-Lattice Theorem Overview

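Associate each document D with a unique path set $\rho(D)$
$\rho(D_1) \cup \rho(D_2)$ is a Least Upper Bound (LUB) for $\rho(D_1)$ and $\rho(D_2)$
  $\rho(D_1) \cup \rho(D_2)$ is the “smallest” set that contains both $\rho(D_1)$ and $\rho(D_2)$
Intuition: the Merge of D1 and D2 should be the document associated with $\rho(D_1) \cup \rho(D_2)$

[Figure: D1 and D2 in LT are mapped by $\rho_1$ and $\rho_2$ to their path sets $\rho(D_1)$ and $\rho(D_2)$; the union $\rho(D_1) \cup \rho(D_2)$ corresponds to the merged document D3]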

Page 27

Document and Path Set

Use the Merge Template + document to create a path set
  One element in the path set for each element in the document
  Each path is comprised of a rooted key value and the element content
  Path set order (subset) is identical to the document containment order

Document:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500

Path set (rooted key value : element content):
auction[]:
auction[].item[iid:501]:
auction[].item[iid:501].iid[]:501
auction[].item[iid:501].desc[]:Trek Madone 5.9 Bike
auction[].item[iid:501].bid[bidder:Dave,amt:$1500]:
auction[].item[iid:501].bid[bidder:Dave,amt:$1500].bidder[]:Dave
auction[].item[iid:501].bid[bidder:Dave,amt:$1500].amt[]:$1500
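A Python sketch that builds path sets in the "rooted key value:content" format shown above, assuming documents are (name, value, children) tuples, the auction Merge Template is encoded as a hypothetical dict from element name to local key, and every document is key-exact. It then checks the intuition from the previous slide: the union of the two input path sets equals the path set of the merged document.

```python
# Hypothetical encoding of the auction Merge Template: element name -> local key.
KEYS = {"auction": [], "item": ["iid"], "iid": [], "desc": [],
        "bid": ["bidder", "amt"], "bidder": [], "amt": []}

def E(name, value="", *children):
    return (name, value, children)

def child_value(kids, key):
    """Content of the child named `key` (assumes the document is key-exact)."""
    return [v for n, v, _ in kids if n == key][0]

def path_set(node, prefix=""):
    """One 'rooted key value:content' path per element of the document."""
    name, value, kids = node
    key = ",".join(f"{k}:{child_value(kids, k)}" for k in KEYS[name])
    rkv = f"{prefix}.{name}[{key}]" if prefix else f"{name}[{key}]"
    paths = {f"{rkv}:{value}"}
    for child in kids:
        paths |= path_set(child, rkv)
    return paths

dave = E("bid", "", E("bidder", "Dave"), E("amt", "$1500"))
sue = E("bid", "", E("bidder", "Sue"), E("amt", "$1550"))
bike = [E("iid", "501"), E("desc", "Trek Madone 5.9 Bike")]
guitar = E("item", "", E("iid", "433"), E("desc", "1971 Martin Guitar"))
D1 = E("auction", "", E("item", "", *bike, dave), guitar)   # Auction Document
D2 = E("auction", "", E("item", "", E("iid", "501"), sue))   # New Bid
D3 = E("auction", "", E("item", "", *bike, dave, sue), guitar)   # Merged Document

print(path_set(D1) | path_set(D2) == path_set(D3))   # True
```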

Page 28

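Proof that D3 is in LT

Construct D3 from $\rho(D_1) \cup \rho(D_2)$, then show that D3 is compatible and key-respecting with respect to T

[Figure: D1 and D2 map to $\rho(D_1)$ and $\rho(D_2)$ under $\rho_1$ and $\rho_2$ (with inverses $\rho_1^{-1}$ and $\rho_2^{-1}$); D3 is obtained from the union under $\sigma^{-1}$, labeled $= \rho_3$ on the slide]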
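The construction step can be sketched in Python: rebuild a document tree from a path set, rejecting path sets that are not key-respecting. This is an illustration under stated assumptions, not the proof's actual construction; here a path is a hypothetical (rooted key value, content) pair, where the rooted key value is a tuple of (element name, local key value) steps from the root.

```python
def build_document(paths):
    """Rebuild a tree from a path set of (rooted_key_value, content) pairs.

    A rooted key value is a tuple of (element name, local key value) steps from the root.
    """
    nodes = {}
    for rkv, content in sorted(paths, key=lambda p: len(p[0])):   # parents before children
        if rkv in nodes:
            raise ValueError(f"not key-respecting: two contents for {rkv}")
        nodes[rkv] = {"name": rkv[-1][0], "value": content, "children": []}
        if len(rkv) > 1:
            nodes[rkv[:-1]]["children"].append(nodes[rkv])   # KeyError if the parent path is missing
    roots = [node for rkv, node in nodes.items() if len(rkv) == 1]
    if len(roots) != 1:
        raise ValueError("a document needs exactly one root")
    return roots[0]

A = ("auction", "")
P1 = {((A,), ""), ((A, ("item", "iid:501")), ""), ((A, ("item", "iid:501"), ("iid", "")), "501")}
P2 = {((A,), ""), ((A, ("item", "iid:433")), ""), ((A, ("item", "iid:433"), ("iid", "")), "433")}
doc = build_document(P1 | P2)
print(sorted(item["children"][0]["value"] for item in doc["children"]))   # ['433', '501']
```

With iid as the key for item, the union of the path sets for auction(item(iid: 501)) and auction(item(iid: 433)) rebuilds the two-item document, i.e., D4 from the "What can go wrong?" slide.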

Page 29

Outline

Incremental Query Evaluation
  Partial results over XML data
Merge Operation
Merge Theory
Merge Performance

Page 30

Implementation Highlights

The Accumulate operator uses repeated binary Merges to combine a series of XML documents into one result document
Accumulate is implemented as a recursive walk over the input documents and the Merge Template
Implemented in Niagara v1.0 (UW-Madison)
  Lazy construction of DOM nodes: SAXDOM
  General improvements to the Niagara 1.0 code base

Page 31

Performance Environment

866 MHz Pentium III, 512 MB memory, Red Hat Linux 8.0

Sun JVM J2SE 1.4.2, maximum memory 412MB

Page 32

Input Data - XMark

Persons:
site
  people
    person* (id)
      name, email, phone?
      profile
        education

Bids:
site
  open_auctions
    open_auction* (id)
      bid
      bidder
        personref (→ person)
        time

Items:
site
  open_auctions
    open_auction* (id)
      seller (→ person)
      interval
        start, end
      reserve?

* 0 or more
? optional

Page 33

Structural Aggregation with Restructuring

Q5.1 – simple structural aggregation query

For each person, produce a list of the items they bid on and their bids on those items

Q5.1 input (Bids):
site
  open_auctions
    open_auction* (id)
      bid
      bidder
        personref (→ person)
        time

Q5.1 output:
people
  person* (id)
    itemsbid
      item* (id)
        bid* (amt, time)

Page 34

Restructuring of Input

Q5.1 Input:
site
  open_auctions
    open_auction
      iid: 8
      bid: $82
      bidder
        personref (→ person:53)
        time: 5:00

restructure ↓

Restructured Input:
people
  person
    id: 53
    itemsbid
      item
        id: 8
        bid
          amt: $82
          time: 5:00

accumulate ↓

Q5.1 Output: people, with one person element per id, each holding itemsbid > item* (id) > bid* (amt, time)
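A Python sketch of the restructure-then-accumulate pipeline for Q5.1. Assumptions: each bid has already been flattened by the unnests into a dict with iid, person, amt, and time fields (hypothetical field names); nested dicts keyed by person id and item id stand in for the Merge Template keys; and id is emitted as an XML attribute, following the slides' note that attributes are handled like childless elements. Only the first bid below comes from the slide; the other two are made up.

```python
import xml.etree.ElementTree as ET

def restructure(bid):
    """Turn one flattened bid into a person-rooted fragment: person -> item -> [bids]."""
    return {bid["person"]: {bid["iid"]: [(bid["amt"], bid["time"])]}}

def accumulate(fragments):
    """Merge fragments: equal person keys and item keys combine; bids are appended."""
    people = {}
    for fragment in fragments:
        for person, items in fragment.items():
            for iid, bids in items.items():
                people.setdefault(person, {}).setdefault(iid, []).extend(bids)
    return people

def to_xml(people):
    """Emit the Q5.1 output shape: people/person/itemsbid/item/bid(amt, time)."""
    root = ET.Element("people")
    for pid, items in people.items():
        itemsbid = ET.SubElement(ET.SubElement(root, "person", id=pid), "itemsbid")
        for iid, bids in items.items():
            item = ET.SubElement(itemsbid, "item", id=iid)
            for amt, time in bids:
                bid = ET.SubElement(item, "bid")
                ET.SubElement(bid, "amt").text = amt
                ET.SubElement(bid, "time").text = time
    return root

bids = [
    {"iid": "8", "person": "53", "amt": "$82", "time": "5:00"},   # from the slide
    {"iid": "8", "person": "53", "amt": "$90", "time": "6:00"},   # made up
    {"iid": "9", "person": "53", "amt": "$10", "time": "7:00"},   # made up
]
doc = to_xml(accumulate(restructure(b) for b in bids))
print(ET.tostring(doc, encoding="unicode"))
```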

Page 35

Q5.1 query plans

Merge Query Plan (top to bottom):
accumulate
construct (restructured document)
unnest(bidder.person_ref.person as bidderid)
unnest(site.open_auctions.open_auction)
filescan

Nest Query Plan (top to bottom):
nest(bidderid)
nest(“”)
unnest(time)
unnest(amt)
nest(itemid, bidderid)
nest(bidderid)
unnest(open_auction.id as itemid)
unnest(bidder)
unnest(person_ref.person as bidderid)
unnest(site.open_auctions.open_auction)
filescan

Page 36

Q5.1 Nest Query Plan (top to bottom):
nest(bidderid)
nest(“”)
nest(itemid, bidderid)
nest(bidderid)
unnest(open_auction, open_auction.id, bidder, person_ref.person, time, amt)
filescan

Q5.1 Output:
people
  person
    id: 53
    itemsbid
      item
        id: 8
        bid
          amt: $82
          time: 5:00

Page 37

Q5.1 Execution Time

[Chart: execution time (seconds) vs. MB of data (0 to 70); series: Merge variants (“MergeMagicMerge” in the legend) and Nest]

Page 38

Q5.2 Execution Time

[Chart: execution time (seconds) vs. MB of data (0 to 80); series: Merge variants (“MergeMagicMerge” in the legend) and Nest]

Q5.2: for every item, a list of bidders and their bids
Q5.1: for every person, a list of items bid on and the bids on those items

Q5.2 output:
items
  item* (id)
    bidder* (id)
      bid* (amt, time)

Page 39

Execution time breakdown, Q5.2

Merge Plan
Operator                                         Avg Execution Time (sec)
filescan                                         3.8
unnest: open_auction, itemid, bidder, bidderid   5.4 (0.9, 1.2, 1.7, 1.6)
construct                                        4.9
accumulate                                       6.3
Total                                            20.4
Avg query execution time: 30.7 sec
Avg GC time: 9.5 sec

Nest Plan
Operator                                         Avg Execution Time (sec)
filescan                                         3.7
unnest: open_auction, itemid, bidder, bidderid   4.7 (0.9, 1.1, 1.4, 1.3)
unnest (time)                                    1.3
unnest (amt)                                     1.6
nest(itemid, bidderid)                           5.2
nest(itemid)                                     4.9
nest(“”)                                         2.0
Total                                            23.4
Avg query execution time: 42.5 sec
Avg GC time: 17 sec

Page 40

Simplified Q5.4-A Output

For each person, provide person information, the list of items put up for auction (itemssold), and the items bid on (itemsbid)

people
  person* (id)
    name, email, phone?
    profile
      education
    itemssold
      open_auction* (id)
        seller (→ person)
        interval
          start, end
        reserve?
    itemsbid
      open_auction* (id)
        bid
        bidder*
          personref (→ person)
          time

Page 41

Simplified Q5.4-B Output

people
  person* (id)
    name, email, phone?
    profile
      education
    itemssold
      item* (id)
        interval
          start, end
        reserve?
    itemsbid
      item* (id)
        bid (amt, time)

Key (compared with Q5.4-A): renamed, open_auction* becomes item*; deleted, seller and personref (the references to person)

Page 42

Q5.4-A and Q5.4-B Results

[Chart, Query 5.4-A: execution time (seconds) vs. MB of data (0 to 60); series: Merge variants (“MergeMagicMerge” in the legend) and Nest]

[Chart, Query 5.4-B: same axes and series]

Q5.4-B is faster despite having to unnest the input more deeply
Key factor: Q5.4-B has fewer elements in the result

Page 43

Merge-Ready Structural Aggregation

No restructuring; the input is structured similarly to the output: the best case for Merge

[Chart, Q5.5 (small documents): execution time (seconds) vs. MB of data (0 to 70); series: Merge, Nest]

[Chart, Q5.6 (big documents): same axes and series]

Page 44

Sliding Structural Aggregation

Extend accumulate to handle sliding windows
For each element, maintain the range of windows it belongs to
Test vs. sliding nest

[Chart, Q6.1 (group bids by item, then person): execution time (seconds) vs. window range (0 to 1250); series: Merge variants (“MergeMagicMerge” in the legend) and Nest]
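A Python sketch of "for each element, maintain the range of windows". Assumptions: integer timestamps, windows numbered from 0 where window w covers [w·slide, w·slide + range), and a plain per-window list standing in for the per-window accumulator. The names window_range and sliding_accumulate, and the range and slide values, are illustrative; the slide does not give its window parameters.

```python
from collections import defaultdict

def window_range(t, range_, slide):
    """Ids (w_min, w_max) of the windows [w*slide, w*slide + range_) containing time t."""
    w_min = max(0, -((range_ - 1 - t) // slide))   # ceil((t - range_ + 1) / slide)
    w_max = t // slide
    return w_min, w_max

def sliding_accumulate(events, range_, slide):
    """Tag each (time, item) with its window range, then collect items per window id."""
    per_window = defaultdict(list)
    for t, item in events:
        w_min, w_max = window_range(t, range_, slide)
        for w in range(w_min, w_max + 1):
            per_window[w].append(item)
    return dict(per_window)

# Times in minutes; range 10 and slide 2 are arbitrary illustration values.
events = [(5, "bid A"), (12, "bid B")]
print(window_range(5, 10, 2), window_range(12, 10, 2))   # (0, 2) (2, 6)
print(sliding_accumulate(events, 10, 2)[2])              # ['bid A', 'bid B']
```

Storing one (w_min, w_max) pair per element, rather than one copy per window, is what keeps the state small when windows overlap heavily.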

Page 45

Conclusion

Studied processing of XML streams

IQE
  General framework for partial results over the initial portion of a stream

Merge
  Flexible operator for combining XML documents
  Formal definition in terms of lattice theory
  Outperforms nest-based alternatives

Page 46

Extras/Deletes

Page 47

Re-evaluation vs. differential

Query plan for re-evaluation vs. differential (top to bottom):
Nest on Author
Join on Author
(Author, Address)   (Author, Book)

Page 48

Partially-Ordered Set (POSet)

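Let P be a set. A partial order $\sqsubseteq$ on P is a relation such that for all $x, y, z \in P$:
(i) $x \sqsubseteq x$
(ii) $x \sqsubseteq y$ and $y \sqsubseteq x$ $\Rightarrow x = y$
(iii) $x \sqsubseteq y$ and $y \sqsubseteq z$ $\Rightarrow x \sqsubseteq z$

Example: a set of sets, where $\sqsubseteq$ is $\subseteq$ ($S_1 \sqsubseteq S_2$ if $S_1 \subseteq S_2$): {1}, {1, 2}, {2, 3}, {1, 2, 3}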
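A brute-force Python check of the three axioms on the example family, with inclusion as the order. The function name is_partial_order is illustrative. As a contrast, comparing sets by size fails antisymmetry, because {1, 2} and {2, 3} have the same size but are different sets.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry, and transitivity of leq on a finite set."""
    reflexive = all(leq(x, x) for x in elements)
    antisymmetric = all(not (leq(x, y) and leq(y, x)) or x == y
                        for x, y in product(elements, repeat=2))
    transitive = all(not (leq(x, y) and leq(y, z)) or leq(x, z)
                     for x, y, z in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive

family = [frozenset(s) for s in ({1}, {1, 2}, {2, 3}, {1, 2, 3})]
print(is_partial_order(family, lambda a, b: a <= b))              # True
print(is_partial_order(family, lambda a, b: len(a) <= len(b)))    # False
```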

Page 49

Sliding Accum query plan Q6.1

Plan (top to bottom):
sliding accumulate
bucket
construct
filescan + series of unnests

Tuples entering bucket (document, timestamp):
( D1, 12:01 PM ) t1
( D2, 12:20 PM ) t2
( *, 2:00 PM ) p1

Tuples leaving bucket (document, timestamp, window-min, window-max):
( D1, 12:01 PM, 0, 7 ) t1′
( D2, 12:20 PM, 1, 8 ) t2′
( *, 2:00 PM, 0, 0 ) p1′

Page 50

Sliding Nest Query Plan Q6.1

Plan (top to bottom):
sliding nest(bidderid, windowid)
sliding nest(windowid)
sliding nest(itemid, bidderid, windowid)
sliding nest(bidderid, windowid)
bucket
construct
filescan + series of unnests

Tuples entering bucket (document, timestamp):
( D1, 12:01 PM ) t1
( D2, 12:20 PM ) t2
( *, 2:00 PM ) p1

Tuples leaving bucket (document, timestamp, window-min, window-max):
( D1, 12:01 PM, 0, 7 ) t1′
( D2, 12:20 PM, 1, 8 ) t2′
( *, 2:00 PM, 0, 0 ) p1′

Page 51

Merge-Lattice Theorem

The Merge-Lattice Theorem states that, given a Merge Template T, the set of XML documents that are “compatible” and “key-respecting” with respect to T is an upper semi-lattice under a specific ordering based on T.

Page 52

Compatibility Mapping

Auction Status Document:
auction
  item
    id: 501
    desc: Trek Madone 5.9 Bike
    bid
      bidder: Dave
      amt: $1500
    bid
      bidder: Sue
      amt: $1550

Auction Merge Template:
(auction, [])
  (item, [id])
    (id, [])
    (desc, [])
    (quantity, [])
    (bid, [bidder, amt])
      (bidder, [])
      (amt, [])
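A Python sketch of computing a compatibility mapping, assuming documents are (name, value, children) tuples and EMTs are hypothetical (name, local key, child EMTs) triples with distinct names among siblings. The root maps to the template root, and each child element maps to the same-named child of its parent's image, which gives name(E) = name(μ(E)) and μ(parent(E)) = parent(μ(E)). Elements are identified by their child-index position, and EMTs by their name path.

```python
# Hypothetical encoding of the Auction Merge Template: (name, local key, child EMTs).
TEMPLATE = ("auction", [], [
    ("item", ["id"], [
        ("id", [], []), ("desc", [], []), ("quantity", [], []),
        ("bid", ["bidder", "amt"], [("bidder", [], []), ("amt", [], [])]),
    ]),
])

def compatibility_mapping(elem, emt, pos=(), emt_path=()):
    """Return {element position: EMT name path}, or None if the document is not compatible."""
    name, _value, kids = elem
    if name != emt[0]:
        return None
    emt_path = emt_path + (name,)
    mapping = {pos: emt_path}
    child_emts = {c[0]: c for c in emt[2]}
    for i, child in enumerate(kids):
        target = child_emts.get(child[0])
        sub = None if target is None else compatibility_mapping(child, target, pos + (i,), emt_path)
        if sub is None:
            return None
        mapping.update(sub)
    return mapping

def E(name, value="", *children):
    return (name, value, children)

status = E("auction", "", E("item", "", E("id", "501"), E("desc", "Trek Madone 5.9 Bike"),
                           E("bid", "", E("bidder", "Dave"), E("amt", "$1500")),
                           E("bid", "", E("bidder", "Sue"), E("amt", "$1550"))))
mu = compatibility_mapping(status, TEMPLATE)
print(mu[(0, 3, 0)])   # ('auction', 'item', 'bid', 'bidder')
```

The quantity EMT simply has no element mapped to it; the mapping does not need to cover every EMT, but every element must find one.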

Page 53

Identify Document Containment

D1:
auction
  item
    iid: 501
    desc: Trek Madone 5.9 Bike

ρ(D1):
auction[]:
auction[].item[iid:501]:
auction[].item[iid:501].iid[]:501
auction[].item[iid:501].desc[]:Trek Madone 5.9 Bike

D2:
auction
  item
    iid: 433
    desc: 1971 Martin Guitar
  item
    iid: 501
    desc: Trek Madone 5.9 Bike

ρ(D2):
auction[]:
auction[].item[iid:501]:
auction[].item[iid:501].iid[]:501
auction[].item[iid:501].desc[]:Trek Madone 5.9 Bike
auction[].item[iid:433]:
auction[].item[iid:433].iid[]:433
auction[].item[iid:433].desc[]:1971 Martin Guitar

ρ(D1) ⊆ ρ(D2), so D1 ⊑ D2

Page 54

Term (applies to): definition

compatible, compatibility mapping (document D, Merge Template T): D is compatible with T if there exists an operation-preserving function μ that maps elements in D to EMTs in T such that μ(D.root) = T.root and, for every element E in D, name(E) = name(μ(E)) and μ(parent(E)) = parent(μ(E)). μ is called a compatibility mapping. (Section 4.3)

key-exact (element E, EMT μ(E)): E is key-exact with respect to an EMT μ(E) if, for every path p in the local key of μ(E), patheval(E, D, p) is a singleton set. (Section 4.4)

key-exact (D, T, compatibility mapping μ): D is key-exact with respect to T if every element E in D is key-exact with respect to μ(E). (Section 4.5)

key-respecting (D, T, μ): D is key-respecting with respect to T and μ if no two elements of D have the same rooted key value. D must be key-exact with respect to T and μ. (Section 4.5)

key-respecting (path set P): P is key-respecting if there do not exist p1 and p2 in P such that p1 and p2 differ only in the value string of the terminal element. If a document is key-respecting, its path set is key-respecting. (Section 4.5)

path-containment ordering (documents D1 and D2): D1 is contained in D2 (D1 ⊑ D2) if there exists a 1-1 homomorphism φ that maps D1 into D2 such that, for every element E in D1, name(E) = name(φ(E)), value(E) = value(φ(E)), and φ(parent(E)) = parent(φ(E)), and φ(D1.root) = D2.root. (Section 4.6)

key-consistent (D1, D2, T): D1 and D2 are key-consistent with respect to T if the union of their path sets is key-respecting. (Section 4.7)

mergeable (D1, D2, T): D1 and D2 are mergeable if they are key-consistent and lkv(D1.root) = lkv(D2.root). (Section 4.7)
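The path-set definitions above (key-respecting, key-consistent, mergeable) can be checked directly. A Python sketch, assuming a path is a hypothetical (rooted key value, content) pair in which the rooted key value is a tuple of (element name, local key value) steps; under that encoding, "differ only in the value string of the terminal element" means "same rooted key value, different content".

```python
def is_key_respecting(paths):
    """No two paths share a rooted key value while differing in terminal content."""
    seen = {}
    for rkv, content in paths:
        if seen.setdefault(rkv, content) != content:
            return False
    return True

def key_consistent(p1, p2):
    """Key-consistent: the union of the two path sets is key-respecting."""
    return is_key_respecting(p1 | p2)

def root_lkv(paths):
    """Local key value of the root element (the only one-step rooted key value)."""
    roots = {rkv[0] for rkv, _ in paths if len(rkv) == 1}
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()

def mergeable(p1, p2):
    return key_consistent(p1, p2) and root_lkv(p1) == root_lkv(p2)

A = ("auction", "")
I501, I433 = ("item", "iid:501"), ("item", "iid:433")
desc = ("desc", "")
P1 = {((A,), ""), ((A, I501), ""), ((A, I501, desc), "Trek Madone 5.9 Bike")}
P2 = {((A,), ""), ((A, I501), ""), ((A, I501, desc), "Trek Madone 5.0 Bike")}
P3 = {((A,), ""), ((A, I433), ""), ((A, I433, desc), "1971 Martin Guitar")}
print(key_consistent(P1, P2), mergeable(P1, P3))   # False True
```

P1 and P2 disagree on the description of the same item 501, so their union has two contents for one rooted key value.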

Page 55

Nest Operator Example

Input (Subject, Title):
(Google, “New Google chapter…”)
(Google, “Google pens new…”)
(Microsoft, “Microsoft launches…”)
(Google, “Google speaks volumes…”)
(Microsoft, “MSN ships…”)

nest ↓

Result (Subject, Title):
(Google, {“New Google chapter…”, “Google pens new…”, “Google speaks volumes…”})
(Microsoft, {“Microsoft launches…”, “MSN ships…”})

Result in XML:
result
  Subject: Google, Title: New Google chapter…, Title: Google pens new…, Title: Google speaks volumes…
  Subject: Microsoft, Title: Microsoft launches…, Title: MSN ships…
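A minimal Python sketch of the nest operator from this example, grouping titles by subject. It keeps titles in lists so arrival order is visible; the slide writes each group as a set. The function name nest is illustrative.

```python
def nest(tuples):
    """Group (Subject, Title) tuples by Subject, keeping titles in arrival order."""
    groups = {}
    for subject, title in tuples:
        groups.setdefault(subject, []).append(title)
    return list(groups.items())

news = [
    ("Google", "New Google chapter…"),
    ("Google", "Google pens new…"),
    ("Microsoft", "Microsoft launches…"),
    ("Google", "Google speaks volumes…"),
    ("Microsoft", "MSN ships…"),
]
for subject, titles in nest(news):
    print(subject, titles)
```

Nest is blocking in its plain form: a group such as Google is only complete once the whole input has been read.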


Page 56

Is Nest Monotonic?

nest on subject

Input 1 (I1): [ (Google, Title1), (Microsoft, Title2), (Microsoft, Title3) ]
Result 1 (R1): [ (Google, {Title1}), (Microsoft, {Title2, Title3}) ]

Input 2 (I2): [ (Google, Title1), (Microsoft, Title2), (Microsoft, Title3), (Google, Title4) ]
Result 2 (R2): [ (Google, {Title1, Title4}), (Microsoft, {Title2, Title3}) ]

An operator O is monotonic if A “less than” B implies O(A) “less than” O(B)

Answer: it depends on how you define “less than”
  If “less than” is ⊆, the answer is no
  If “less than” is substructure, the answer is yes
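A Python sketch of both answers using I1 and I2. Assumption: "substructure" is simplified here to "every (subject, titles) group of R1 appears in R2 with a superset of its titles", which covers this one level of nesting but is not the full document-containment order.

```python
def nest(tuples):
    """Nest on subject: a set of (subject, frozenset of titles) pairs."""
    groups = {}
    for subject, title in tuples:
        groups.setdefault(subject, set()).add(title)
    return {(s, frozenset(t)) for s, t in groups.items()}

def subset_order(r1, r2):
    """'Less than' as plain set containment of result tuples."""
    return r1 <= r2

def substructure_order(r1, r2):
    """'Less than' as (simplified) substructure: each group grows but never disappears."""
    groups2 = dict(r2)
    return all(s in groups2 and titles <= groups2[s] for s, titles in r1)

I1 = [("Google", "Title1"), ("Microsoft", "Title2"), ("Microsoft", "Title3")]
I2 = I1 + [("Google", "Title4")]
R1, R2 = nest(I1), nest(I2)
print(subset_order(R1, R2))        # False
print(substructure_order(R1, R2))  # True
```

The tuple (Google, {Title1}) is in R1 but not in R2, so R1 is not a subset of R2, even though R1 is contained in R2 as a structure.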