San Diego Supercomputer Center, XMLDM'02, Prague: Time to Leave the Trees: From Syntactic to Conceptual Querying of XML

Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta (San Diego Supercomputer Center, U.C. San Diego)

Page 1: Time to Leave the Trees: From Syntactic to Conceptual Querying of XML

Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta

San Diego Supercomputer Center, U.C. San Diego

Page 2: Overview

• Motivating Example:
– querying XML w/o and w/ conceptual-level information
– “syntactic” vs. “conceptual” querying of XML

• Distilling conceptual-level information:
– MXS (abstract Model for XML Schema)

• XPathT:
– Incorporating conceptual-level information in XPath

Page 3: Motivating Example

• Example: “Books DB” (yes, more complex examples exist... ;)
– elements: <myDB> ... <book> .... <price> .... <author> ...

• Sample Queries:
– Q1: Which <book>s have a <price> below $80?
– Q2: What’s the count and average <price> of <book>s?

• (Nice) Try:
– Q1: myDB//book[price<80]
– Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N;

• But what about ...
– ... <book>s with multiple <price>s?
– ... <awe> (award-winning-exemplars) elements (= subtype of book having subelement <award>): we forgot those!
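The two pitfalls above can be reproduced on a tiny instance. The sketch below is only an illustration: the document content, the titles, and the variable names are made up, and because Python's `xml.etree.ElementTree` supports only a subset of XPath, the `price<80` comparison is done in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: one plain book, one book with two prices, one <awe>.
DOC = """
<myDB>
  <book><title>A</title><price>50</price></book>
  <book><title>B</title><price>90</price><price>70</price></book>
  <awe><title>C</title><price>60</price><award>Prize</award></awe>
</myDB>
"""

root = ET.fromstring(DOC)
books = root.findall(".//book")  # same node set as myDB//book

# Q1: book[price<80] is existential, so one cheap <price> is enough to qualify.
q1 = [b.findtext("title") for b in books
      if any(float(p.text) < 80 for p in b.findall("price"))]

# Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N
n = len(books)
s = sum(float(p.text) for b in books for p in b.findall("price"))
avg = s / n

print(q1)         # ['A', 'B']
print(n, s, avg)  # 2 210.0 105.0
```

Book B qualifies for Q1 although one of its prices is 90, the "average" of 105 is higher than any single price because three prices are divided by two books, and the `<awe>` element C is missed by both queries.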

Page 4: Schema Information to the Rescue!

• XML & Semistructured Data Model:
– labeled ordered trees
– “instance contains its own schema information”
– XML instances and DTDs have very little “schema info”:
  • tag names (aka element “types”) = attribute names
  • element nesting = object (“slot”) structure
  • no data types, constraints, classes, class hierarchy, ...

• Schemas are Good for You!
– link to conceptual models/DB design, query formulation,
– validation, storage layout (optimization),
– query processing (optimization), ...

XML Schema

Page 5: Motivating Example (Cont’d)

• Q1 after studying <myDB> and/or its XML Schema:
– there is a type hierarchy below type bookT
– tag names are bound to those types
– but XPath doesn’t know this => use Syntactic Queries:

//*[book OR tbook OR cbook OR...OR awe] [price<80]

• tedious and error-prone (do-it-yourself: Appendix A)
– e.g. you overlooked <publication xsi:type=“bookT”> ! (usually schema info not contained in the XML instance)

• small changes in the schema (adding a new subtype) require rewriting of your query...
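A minimal sketch of such a hand-written syntactic query, and of how easily the `xsi:type` case slips through. The instance, the tag list `BOOK_TAGS`, and the helper `has_cheap_price` are illustrative assumptions; only the tag names and the `<publication xsi:type=“bookT”>` case come from the slide.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance containing several book-like elements.
DOC = f"""
<myDB xmlns:xsi="{XSI}">
  <book><price>50</price></book>
  <tbook><price>60</price></tbook>
  <awe><price>70</price><award>Prize</award></awe>
  <publication xsi:type="bookT"><price>40</price></publication>
</myDB>
"""

# Hand-maintained list of tags bound to subtypes of bookT (must track the schema).
BOOK_TAGS = {"book", "tbook", "cbook", "awe"}

root = ET.fromstring(DOC)

def has_cheap_price(el):
    """True if some <price> child is below 80."""
    return any(float(p.text) < 80 for p in el.findall("price"))

# Syntactic query: enumerate tag names only.
by_tag = [el.tag for el in root.iter()
          if el.tag in BOOK_TAGS and has_cheap_price(el)]

# Patched query: additionally honour an explicit xsi:type="bookT" annotation.
by_tag_or_xsi = [el.tag for el in root.iter()
                 if (el.tag in BOOK_TAGS or el.get(f"{{{XSI}}}type") == "bookT")
                 and has_cheap_price(el)]

print(by_tag)         # ['book', 'tbook', 'awe']
print(by_tag_or_xsi)  # ['book', 'tbook', 'awe', 'publication']
```

Every new subtype in the schema means editing `BOOK_TAGS` by hand, which is exactly the maintenance burden the slide describes.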

Page 6: From Syntactic to Conceptual XML Queries

1. Distill conceptual information from the XML Schema
– Abstract Model of XML Schema (MXS)

2. Incorporate MXS information into the query language
– XPathT (“XPath with types/classes”)

• turn Syntactic XML Query
//*[book OR tbook OR cbook OR ... OR awe] [price<80]

• into a more adequate Conceptual XML Query:
//*[ts(bookT)][price<80] /* works for any subtype of bookT */

• more robust w.r.t. schema changes
• new opportunities for semantic query optimization

Page 7: Abstract Model of XML Schema (MXS)

• Basic Ideas:
– Formal abstract model (never mind the XML Schema syntax!), inspired by Model Schema Language (MSL) [Brown-Fuchs-Robie-Wadler-WWW10-2001]
– “Types as Classes”

• XML Schema Names:
– T: Type Names
– E: Element Names
– A: Attribute Names

• XML Instances...
– ... usually contain only element names (tags) E and attributes A (exception: “xsi:type = ...”)

Page 8: Abstract Model of XML Schema (MXS)

• MXS Names
– T: Types, E: Elements, A: Attributes

• Kinds of Types
– simple vs. complex: T_s, T_c
– abstract vs. concrete: T_a, T_na

• Type Hierarchy
– restrict ⊆ (T_s × T_s) ∪ (T_c × T_c)
  • restricts possible instances, keeping structure
– extend ⊆ (T_s ∪ T_c) × T_c
  • adds “slots” (elements and attributes)
– subtype = extend ∪ restrict
  • extend and restrict are subtyping mechanisms
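One way to make the type hierarchy concrete is to store the relations as sets of pairs. The sketch below assumes the signatures read as restrict ⊆ (T_s × T_s) ∪ (T_c × T_c) and extend ⊆ (T_s ∪ T_c) × T_c, with each pair written as (base type, derived type). The type names and the helper `respects_signature` are hypothetical.

```python
# Hypothetical type names; each pair is (base type, derived type).
T_s = {"xsd:decimal", "priceT", "expPriceT"}   # simple types
T_c = {"publicationT", "bookT", "textBookT"}   # complex types

restrict = {("xsd:decimal", "priceT"), ("priceT", "expPriceT")}
extend = {("publicationT", "bookT"), ("bookT", "textBookT")}

def respects_signature(restrict, extend):
    """Check restrict ⊆ (T_s×T_s) ∪ (T_c×T_c) and extend ⊆ (T_s ∪ T_c)×T_c."""
    restrict_ok = all(
        (base in T_s and derived in T_s) or (base in T_c and derived in T_c)
        for base, derived in restrict
    )
    extend_ok = all(
        base in (T_s | T_c) and derived in T_c
        for base, derived in extend
    )
    return restrict_ok and extend_ok

# subtype is simply the union of the two derivation mechanisms.
subtype = extend | restrict

print(respects_signature(restrict, extend))                # True
print(respects_signature({("priceT", "bookT")}, extend))   # False
```

The second call fails because a restriction may not turn a simple type into a complex one; restriction keeps the structure and only narrows the set of instances.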

Page 9: Type (Class) Hierarchy in XML Schema

• Convention: user-defined type names end with “T”
– authorT, publicationT, bookT, ...

Page 10: Inheritance in XML Schema (I)

expTextBookT ::= SUBTYPE(bookT) that RESTRICTs <price> to expPriceT and EXTENDs with <recommended_for>

[Figure labels: EXTEND, RESTRICT, SUBTYPE]

Page 11: Inheritance in XML Schema (II)

19thcenturyTextBookType ::= SUBTYPE {textBookT, c19bookT}

[Figure labels: multiple inheritance, single inheritance]

XML Schema type system does not know the two are equivalent!

Page 12: Framework for Conceptual Queries in XML

• Binding Types to Elements
– bind ⊆ (E × (T_s ∪ T_c)) ∪ (A × T_s)
  • binds element names to simple or complex types
  • binds attribute names to simple types

• Syntactic XML Instance: D
– root(NodeId), child(NodeId,Integer,NodeId), tag(NodeId,Tagname), data(NodeId,Data)

• Conceptual XML Instance: D+
– restrict(T, T), extend(T, T), subtype(T, T),

– bind(E T, T)

– ...
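The syntactic instance D can be pictured as four relations extracted from the XML tree. The following is a rough sketch under stated assumptions: node ids are assigned in document order starting at 1, the Integer in `child` is taken to be the child's position among its siblings, and attributes and mixed content are ignored. The function name `to_relations` is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_relations(xml_text):
    """Flatten an XML document into root/child/tag/data relations."""
    ids = count(1)  # node ids in document order (an assumption)
    rel = {"root": set(), "child": set(), "tag": set(), "data": set()}

    def visit(el):
        nid = next(ids)
        rel["tag"].add((nid, el.tag))
        text = (el.text or "").strip()
        if text:
            rel["data"].add((nid, text))
        # child(parent, position, child): position is 1-based sibling order.
        for pos, sub in enumerate(el, start=1):
            rel["child"].add((nid, pos, visit(sub)))
        return nid

    rel["root"].add(visit(ET.fromstring(xml_text)))
    return rel

D = to_relations("<myDB><book><price>50</price></book></myDB>")
print(sorted(D["child"]))  # [(1, 1, 2), (2, 1, 3)]
print(sorted(D["tag"]))    # [(1, 'myDB'), (2, 'book'), (3, 'price')]
print(sorted(D["data"]))   # [(3, '50')]
```

The conceptual instance D+ would add the schema-level relations (restrict, extend, subtype, bind) to these four.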

Page 13: XPathT: Incorporating Type (Class) Information in XPath

• XPath patterns p and qualifiers q: p[q] returns matches of p which qualify according to q

• New XPathT patterns:
– r(t), e(t), s(t): restrict, extend, subtype type t
– tr(t), te(t), ts(t): transitive versions

Page 14: Semantics of XPathT

• Example:

“transitive subtype”:
SEM( ts(t) ) := { t’ | subtype*(t,t’) }

from types to element names:
SEM( [T] ) := { e | bind(t,e), t ∈ T }

SEM( [ts(bookT)] ) := {book,ebook,tbook, ...}
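These two definitions translate almost directly into code. The sketch below assumes that subtype*(t, t’) is the reflexive-transitive closure with t as the supertype, and that bind is given as (type, element name) pairs as in the definition above. The concrete hierarchy and bindings are made up for illustration.

```python
# Hypothetical schema facts: subtype(super, sub) and bind(type, element name).
SUBTYPE = {
    ("publicationT", "bookT"),
    ("bookT", "textBookT"),
    ("bookT", "eBookT"),
    ("textBookT", "expTextBookT"),
}
BIND = {
    ("publicationT", "publication"),
    ("bookT", "book"),
    ("eBookT", "ebook"),
    ("textBookT", "tbook"),
}

def ts(t, subtype=SUBTYPE):
    """SEM(ts(t)): all t' with subtype*(t, t'), including t itself."""
    seen, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for sup, sub in subtype:
            if sup == current and sub not in seen:
                seen.add(sub)
                todo.append(sub)
    return seen

def elements_of(types, bind=BIND):
    """SEM([T]): element names bound to some type in T."""
    return {e for t, e in bind if t in types}

print(sorted(ts("bookT")))               # ['bookT', 'eBookT', 'expTextBookT', 'textBookT']
print(sorted(elements_of(ts("bookT"))))  # ['book', 'ebook', 'tbook']
```

Adding a new subtype only changes the `SUBTYPE` and `BIND` facts; a query written against `ts("bookT")` stays the same.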

Page 15: Conceptual(-level) XML Queries in XPathT

• Which books have price below $80?
//*[ts(bookT)][price<80]

• Semantic-aware equivalent rewriting:
//*[ts(bookT)][NOT ts(expTextBookT)][price<80]

• Logic XPathT Query Plan:
[Figure: query plan combining tree structure information and conceptual information]
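The rewriting is only equivalent if the schema guarantees that no expTextBookT instance can have a price below 80. The slides do not state the bound of expPriceT, so the sketch below simply assumes it. The element name `exptbook`, the two precomputed name sets, and the helper `run` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed to be precomputed from the schema:
BOOK_ELEMS = {"book", "tbook", "exptbook"}  # names bound to types in ts(bookT)
EXP_ELEMS = {"exptbook"}                    # names bound to types in ts(expTextBookT)

# Hypothetical instance; expPriceT is assumed to admit only prices >= 80.
DOC = """
<myDB>
  <book><price>50</price></book>
  <tbook><price>95</price></tbook>
  <exptbook><price>120</price></exptbook>
  <exptbook><price>150</price></exptbook>
</myDB>
"""
root = ET.fromstring(DOC)

def run(candidate_tags):
    """Return (number of candidates inspected, tags of the qualifying elements)."""
    candidates = [el for el in root.iter() if el.tag in candidate_tags]
    hits = [el.tag for el in candidates
            if any(float(p.text) < 80 for p in el.findall("price"))]
    return len(candidates), hits

plain = run(BOOK_ELEMS)               # //*[ts(bookT)][price<80]
pruned = run(BOOK_ELEMS - EXP_ELEMS)  # //*[ts(bookT)][NOT ts(expTextBookT)][price<80]

print(plain)   # (4, ['book'])
print(pruned)  # (2, ['book'])
```

Both plans return the same answer, but the pruned plan never looks at the price of an expensive textbook, which is the kind of saving semantic query optimization aims for.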

Page 16: Summary

• Complex domains require conceptual level modeling and querying capabilities beyond just tree structure

• Status Quo: XML Schema: simple “conceptual model” with many ad-hoc “design decisions”/restrictions

• Abstract Model of XML Schema (MXS)

• XPathT: first step towards “conceptual” or “semantic” XML query language extensions
– more concise, intuitive, flexible, and robust queries
– the system maps conceptual to syntactic queries, not the programmer/query designer!

Page 17: Next Steps & Outlook

• extend MXS to include more conceptual information
• develop formal semantics
– XPathT, extensions: XPathC, XQueryC

• research problems:
– mapping: XPathC queries => equivalent XPath queries
– formalize equivalence, always possible? Then, conventional XML query processors can be used!
– “proxy XML Schema doc”: instead of rewriting into XPath over the original instance, can one materialize some conceptual info as a “proxy XML doc” such that conceptual queries become conventional queries against the proxy...
– semantic query optimization: equivalent rewritings given the conceptual level constraints
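For the simplest case, the mapping from a conceptual query to a conventional XPath query can be sketched as string generation: once the element names bound to ts(t) are known, the type test becomes a disjunction of XPath 1.0 `self::` name tests. This is an illustration only, not the authors' translation; the function `to_xpath` is hypothetical, and it ignores `xsi:type` annotations and locally declared elements.

```python
def to_xpath(element_names, qualifier):
    """Rewrite //*[ts(t)][qualifier] into plain XPath 1.0.

    element_names: the element names bound to the types in ts(t),
    assumed to be computed from the schema beforehand.
    """
    if not element_names:
        # No element can have the requested type: match nothing.
        return f"//*[false()][{qualifier}]"
    tests = " or ".join(f"self::{name}" for name in sorted(element_names))
    return f"//*[{tests}][{qualifier}]"

print(to_xpath({"tbook", "book", "ebook"}, "price<80"))
# //*[self::book or self::ebook or self::tbook][price<80]
```

The generated string can be handed to any conventional XPath processor, so the tag enumeration is produced by the system from the schema instead of being maintained by the query author.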