Syntax-directed Transformations of XML Streams. Stefanie Scherzinger, joint work with Alfons Kemper.
1
Syntax-directed Transformations of XML Streams
Stefanie Scherzinger joint work with Alfons Kemper
2
XML Stream Processing
1. Very long XML documents.
2. Applications need to be completely main-memory based.
3. Schema information is available.
Example document:
<bib>
  <book>
    <year>1999</year>
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...
Schema (DTD):
<!ELEMENT bib (book)*>
<!ELEMENT book (year,title,author,author*)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
3
XML Query Languages
XPath
//book[year=2003]/title
XQuery
<books> {
  for $x in input()//book
  where $x/year=2003
  return <book> {$x/title} <authors> {$x/author} </authors> </book>
} </books>
XSLT
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<books>
<xsl:for-each select="bib/book">
<book>
  <xsl:copy-of select="title"/>
  <xsl:copy-of select="author"/>
</book>
</xsl:for-each>
</books>
</xsl:template>
</xsl:stylesheet>
Schema knowledge necessary to specify query!
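For comparison, the XPath query from this slide can be tried out with a small Python sketch. It uses the limited XPath subset of the standard-library ElementTree module and loads the whole document into memory, so it is not a streaming evaluation. The function name titles_of_year and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def titles_of_year(xml_text, year):
    """Return the titles of all books whose <year> child equals the given year."""
    root = ET.fromstring(xml_text)  # builds the full tree in memory (not streaming)
    # ElementTree compares element text as a string, hence the quotes around the year.
    return [t.text for t in root.findall(f".//book[year='{year}']/title")]


SAMPLE = (
    "<bib>"
    "<book><year>1999</year><title>Data on the Web</title><author>Dan Suciu</author></book>"
    "<book><year>2003</year><title>Hypothetical Title</title><author>A. Author</author></book>"
    "</bib>"
)
```

Writing the path already requires knowing that year and title are children of book, which is the schema knowledge the slide points to.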
4
TransformX Attribute Grammars
1. (Suitable) extended regular tree grammar, e.g. DTD
2. Add attribution functions (Java code)
3. Parser generator produces Java code:
   • Validates the input
   • Evaluates the attribution functions
4. Compile and execute
5
Extended Regular Tree Grammars
Grammar G = (Nt,T,P,bib)
Nonterminals Nt = {bib,pub,year,title,author}
Terminals T = {bib,book,article,year,title,author,PCDATA}
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
[Figure: a tree in L(G) with root bib, child book, and leaves year, title, author, author, author]
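To make membership in L(G) concrete, here is a minimal Python sketch. It assumes a hypothetical encoding in which a tree is a pair (label, children), PCDATA content is a plain string, and each content model is written as a regular expression over the child labels. The names CONTENT, conforms and in_language are illustrative only and are not part of TransformX.

```python
import re

# Content models of G, keyed by element label. Child labels are joined into one
# string, each label followed by a space, and matched with Python's re module.
CONTENT = {
    "bib": r"((book|article) )*",               # bib ::= bib( pub* )
    "book": r"year title author (author )*",    # pub ::= book( ... )
    "article": r"year title author (author )*", # pub ::= article( ... )
    "year": "PCDATA",
    "title": "PCDATA",
    "author": "PCDATA",
}


def conforms(tree):
    """Return True if the tree can be derived from the production for its label."""
    label, children = tree
    if label not in CONTENT:
        return False
    if isinstance(children, str):  # PCDATA content
        return CONTENT[label] == "PCDATA"
    word = "".join(child[0] + " " for child in children)
    return (CONTENT[label] != "PCDATA"
            and re.fullmatch(CONTENT[label], word) is not None
            and all(conforms(child) for child in children))


def in_language(tree):
    """Membership test for L(G): the root must carry the start symbol bib."""
    return tree[0] == "bib" and conforms(tree)


EXAMPLE = ("bib", [("book", [("year", "1999"), ("title", "Data on the Web"),
                             ("author", "Serge Abiteboul"),
                             ("author", "Peter Buneman"),
                             ("author", "Dan Suciu")])])
```

The sketch checks a fully materialized tree, so it only illustrates the grammar and says nothing about stream processing.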
6
Example: Task
1. Re-label root to “books”
2. Retrieve all books, but not articles
3. For each book, output
   • numerical identifier
   • title, year, and authors
input:
<bib>
  <book>
    <year>1999</year>
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...
output:
<books>
  <book>
    <id>1</id>
    <title>Data on the Web</title>
    <year>1999</year>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </book>
  ...
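As a baseline for this task, the following is a hand-written streaming sketch in Python on top of the standard xml.sax module. It is not TransformX syntax and not the authors' code. The class name BooksHandler and the helper transform are assumptions for illustration. The only data it buffers per book is the year, which has to be held back until the title has been written.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class BooksHandler(xml.sax.ContentHandler):
    """Streams <bib> input to <books> output, keeping books and skipping articles."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.next_id = 0      # running numerical identifier
        self.in_book = False  # True only while inside a <book>
        self.field = None     # name of the PCDATA element being read, if any
        self.text = []        # character chunks of the current field
        self.year = None      # buffered year, written after the title

    def startElement(self, name, attrs):
        if name == "bib":
            self.out.write("<books>")  # task 1: re-label the root
        elif name == "book":
            self.in_book = True
            self.next_id += 1
            self.out.write(f"<book><id>{self.next_id}</id>")
        elif self.in_book and name in ("year", "title", "author"):
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "bib":
            self.out.write("</books>")
        elif name == "book":
            self.in_book = False
            self.out.write("</book>")
        elif self.field == name:
            value = escape("".join(self.text))
            if name == "year":
                self.year = value  # hold back: the title must come first
            else:
                self.out.write(f"<{name}>{value}</{name}>")
                if name == "title":
                    self.out.write(f"<year>{self.year}</year>")
            self.field = None


def transform(xml_text):
    """Run the handler over xml_text and return the transformed document."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), BooksHandler(out))
    return out.getvalue()
```

All bookkeeping about the current position in the document (in_book, field) is done by hand here; this is the kind of context that a grammar-based specification carries implicitly.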
7
Example: TransformX Attribute Grammar
8
Example: TransformX Attribute Grammar
definition section
rules section
class-member section
attribution functions
9
10
Grammar provides context information: potential for optimization
11
Extended Regular Tree Grammars
Grammar G = (Nt,T,P,bib)
Nonterminals Nt = {bib,pub,year,title,author}
Terminals T = {bib,book,article,year,title,author,PCDATA}
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
[Figure: a tree in L(G) with root bib, child book, and leaves year, title, author, author, author]
Abbreviation: (pub*) = (book | article)*
12
TDLL(1) Grammars
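ERTG where each right-hand side (a regular expression) is one-unambiguous:
• a*.a (not one-unambiguous)
• a.a* (one-unambiguous)
• a.b* | a.c* (not one-unambiguous)
• a.(b* | c*) (one-unambiguous)
Deterministic parsing with one token of lookahead: the parse tree can be unambiguously constructed with a lookahead of one token.
DTDs are a dialect of TDLL(1) grammars
[Figure: parse tree with root bib, child book, and leaves year, title, author, author, author]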
Lee, Mani, Murata, 2000.
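One-unambiguity can be tested mechanically. The sketch below relies on the characterization by Brüggemann-Klein and Wood that an expression is one-unambiguous exactly when its position (Glushkov) automaton is deterministic. The tuple encoding of expressions and the helper names sym, cat, alt, star are assumptions for illustration. The check covers plain one-unambiguity only, not the strong variant.

```python
from itertools import count


def sym(a):
    return ("sym", a)


def star(r):
    return ("star", r)


def cat(r, *rest):
    for s in rest:
        r = ("cat", r, s)
    return r


def alt(r, *rest):
    for s in rest:
        r = ("alt", r, s)
    return r


def analyse(node, counter, labels, follow):
    """Return (nullable, first, last) of node; fill in labels and follow sets."""
    kind = node[0]
    if kind == "sym":
        p = next(counter)          # each symbol occurrence gets its own position
        labels[p] = node[1]
        follow[p] = set()
        return False, {p}, {p}
    if kind == "star":
        _, first, last = analyse(node[1], counter, labels, follow)
        for p in last:
            follow[p] |= first     # the body may repeat
        return True, first, last
    n1, f1, l1 = analyse(node[1], counter, labels, follow)
    n2, f2, l2 = analyse(node[2], counter, labels, follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                   # concatenation: the left part is followed by the right
        follow[p] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last


def is_one_unambiguous(regex):
    """True if no state of the position automaton has two successors with the same symbol."""
    labels, follow = {}, {}
    _, first, _ = analyse(regex, count(), labels, follow)

    def deterministic(positions):
        symbols = [labels[p] for p in positions]
        return len(symbols) == len(set(symbols))

    return deterministic(first) and all(deterministic(s) for s in follow.values())
```

For example, is_one_unambiguous(cat(sym("a"), star(sym("a")))) holds, whereas cat(star(sym("a")), sym("a")) fails because both occurrences of a can match the first input symbol.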
13
Strong One-Unambiguity
strongly one-unambiguous
Koch, Scherzinger, 2003.
14
Syntax in the Abstract
Attributed TDLL(1) grammar, i.e., each production
1. is of one of the four forms:
   n ::= t(ρ)
   n ::= {f$[} t(ρ)
   n ::= t(ρ) {f$]}
   n ::= {f$[} t(ρ) {f$]}
2. if ρ is an attributed regular expression, then the regular expression obtained from ρ by dropping the attribution functions must be strongly one-unambiguous
15
Example
16
Parse Tree
[Figure: parse tree with root bib, child book, and leaves year, title, author, author, author]
17
Attributed Parse Tree
[Figure: attributed parse tree with root bib, child book, and leaves year, title, author, author, author]
18
Attributed Parse Tree
[Figure: attributed parse tree with root bib, child book, and leaves year, title, author, author, author]
19
Attributed Parse Tree
[Figure: attributed parse tree with root bib, child book, and leaves year, title, author, author, author]
20
L-attributed Grammars
[Figure: attributed parse tree with root bib, child book, and leaves year, title, author, author, author]
21-25
[Figures: attributed parse tree with root bib, child book, and leaves year, title, author, author, author]
26
27
In Practice
28
In Practice
29
Class Members
accessible from within attribution functions
30
TransformX Attributes
transfer information between attribution functions
31
The TransformX Parser Generator
Translation to Java source code:
1. The validator module (generated in O(|G|^3))
   – validate input
   – output attribution functions as encountered in attributed extended parse tree
2. The evaluator module (generated in O(1))
   – evaluate attribution functions
   – store attributes on stack
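The split into validator and evaluator can be pictured with the following Python sketch. It is a hypothetical illustration, not the Java code that TransformX generates: the event format, the hand-written DFAS table for the bib grammar, and the functions validate, evaluate, number_book and print_title are all assumptions. The validator checks the stream and passes events on lazily; the evaluator runs the attached attribution functions and keeps attributes on a stack.

```python
# Content-model DFAs, written by hand for illustration:
# element -> (transitions, accepting states); state 0 is the initial state.
DFAS = {
    "bib": ({(0, "book"): 0}, {0}),
    "book": ({(0, "year"): 1, (1, "title"): 2,
              (2, "author"): 3, (3, "author"): 3}, {3}),
    "year": ({}, {0}), "title": ({}, {0}), "author": ({}, {0}),
}


def validate(events):
    """Validator: check (kind, value) events against DFAS and pass them on lazily."""
    stack = []  # one [element, dfa_state] entry per open element
    for kind, value in events:
        if kind == "start":
            if stack:
                parent = stack[-1]
                key = (parent[1], value)
                if key not in DFAS[parent[0]][0]:
                    raise ValueError(f"<{value}> not allowed here in <{parent[0]}>")
                parent[1] = DFAS[parent[0]][0][key]
            elif value != "bib":
                raise ValueError("root must be <bib>")
            stack.append([value, 0])
        elif kind == "end":
            name, state = stack.pop()
            if name != value or state not in DFAS[name][1]:
                raise ValueError(f"content of <{name}> is incomplete")
        yield kind, value


def evaluate(events, functions):
    """Evaluator: call the function attached to each start/end event, if any."""
    attrs, out = [], []  # attrs is the attribute stack, one dict per open element
    for kind, value in events:
        if kind == "start":
            attrs.append({"text": ""})
        elif kind == "text":
            attrs[-1]["text"] += value
        f = functions.get((kind, value)) if kind != "text" else None
        if f is not None:
            f(attrs, out)
        if kind == "end":
            attrs.pop()
    return out


def number_book(attrs, out):
    root = attrs[0]  # the counter is stored as an attribute of the root frame
    root["books"] = root.get("books", 0) + 1
    out.append(f"book {root['books']}")


def print_title(attrs, out):
    out.append(f"title: {attrs[-1]['text']}")


FUNCTIONS = {("start", "book"): number_book, ("end", "title"): print_title}
```

Because validate is a generator, validation and evaluation are interleaved in a single pass, and the memory used is bounded by the nesting depth of the document rather than by its length.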
32
Experiments
Prototype: C++ implementation, generates Java code
Experiments:
1. Validate the input
2. Output the input
3. Evaluate example
Data: Books and articles, datasets 31-122 MB
Memory consumption: 12 MB
33
Conclusion & Summary
• TransformX attribute grammars
   – specify many queries conveniently
   – often more convenient than SAX
   – grammar may reveal potential for optimization
• TransformX parser generator
   – little runtime overhead (validation + attributes)
• Prototype implementation
34
Selected Related Work
XML and Attribute Grammars
M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. “Capturing both Types and Constraints in Data Integration”. SIGMOD’03.
M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. “DTD-Directed Publishing with Attribute Translation Grammars”. VLDB’02.
C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.
F. Neven and J. Van den Bussche. “Expressiveness of Structured Document Query Languages Based on Attribute Grammars”. JACM, Jan. 2002.
S. Nishimura and K. Nakano. “XML Stream Transformer Generation Through Program Composition and Dependency Analysis”. Science of Computer Programming, 2005.
One-unambiguous Regular Languages
A. Brüggemann-Klein and D. Wood. “One-Unambiguous Regular Languages”. Information and Computation, 1998.
Strong One-unambiguity
C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.
TDLL(1) Grammars
D. Lee, M. Mani, and M. Murata. “Reasoning about XML Schema Languages using Formal Language Theory”. Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.
Lex&Yacc
J. R. Levine, T. Mason, D. Brown. “lex&yacc”. O’Reilly, 1992.
35
Thank you