Fundamentals, Design, and Implementation, 9/e
Text and XML databases
Instructor: Dragomir R. Radev
Winter 2005
Chapter 9/2 Copyright © 2004
Database Processing: Fundamentals, Design, and Implementation, 9/e by David M. Kroenke
Types of databases
Textual databases
Semi-structured databases
Indexing textual data
Inverted files
Boolean queries
Signature files
Signature S1 matches signature S2 if S2 & S1 = S2
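The three indexing ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the 64-bit signature width, the three bits set per term, and the use of MD5 to pick bit positions are all assumptions made for the example. Only the matching rule (S2 & S1 = S2) comes from the slide.

```python
import hashlib
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term: intersect the posting sets."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one term: union the posting sets."""
    return set().union(*(index.get(term, set()) for term in terms))


def term_signature(term, bits=64, bits_per_term=3):
    """Set a few hash-selected bits for one term (assumed scheme)."""
    signature = 0
    for i in range(bits_per_term):
        digest = hashlib.md5(f"{i}:{term.lower()}".encode()).digest()
        signature |= 1 << (int.from_bytes(digest[:4], "big") % bits)
    return signature


def text_signature(text):
    """OR together the signatures of all terms in the text."""
    signature = 0
    for term in text.split():
        signature |= term_signature(term)
    return signature


def matches(s1, s2):
    """S1 matches S2 if every bit set in S2 is also set in S1."""
    return (s2 & s1) == s2
```

A signature match can be a false positive, because unrelated terms may set the same bits, so a match is normally followed by a check against the actual text. A non-match is definitive.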
XML-QL
WHERE <BOOK> <NAME><LAST>$1</LAST></NAME> </BOOK> in "www.booklist.com/books.xml"
CONSTRUCT <RESULT> $1 </RESULT>
Two slides from Johannes Gehrke, Cornell University
<IMG SRC="xysq.gif" ALT="(x+y)^2">
<apply> <power/> <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply> <cn>2</cn> </apply>
XML-QL (continued)
WHERE <BOOK> $b </BOOK> IN "www.booklist.com/books.xml",
      <AUTHOR> $n </AUTHOR> <PUBLISHED> $p </PUBLISHED> IN $b
CONSTRUCT <RESULT>
            <PUBLISHED> $p </PUBLISHED>
            WHERE <LAST> $l </LAST> IN $n
            CONSTRUCT <LAST> $l </LAST>
          </RESULT>
<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>
XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t</title>
        <author> $a</author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t</>
        <author> $a</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t</>
        <author> $a</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a</>
            <title> $t</>
          </>
XML-QL (continued)
<bib>
<book year="1995">
<!-- A good introductory text -->
<title> An Introduction to Database Systems </title>
<author> <lastname> Date </lastname> </author>
<publisher> <name> Addison-Wesley </name > </publisher>
</book>
<book year="1998">
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
<publisher> <name> Addison-Wesley </name > </publisher>
</book>
</bib>
XML-QL (continued)
<result> <author> <lastname> Date </lastname> </author> <title> An Introduction to Database Systems </title> </result>
<result> <author> <lastname> Date </lastname> </author> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> </result>
<result> <author> <lastname> Darwen </lastname> </author> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> </result>
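The bindings behind these three results can be mimicked outside XML-QL. The sketch below is Python, not XML-QL: it uses the standard `xml.etree.ElementTree` module on a copy of the sample bib document (with the XML comment left out), and the function name `author_title_pairs` is made up for illustration. It emits one pair per binding of `$a` and `$t`, which is why the two-author book appears twice.

```python
import xml.etree.ElementTree as ET

# Copy of the sample bib document from the slides (comment omitted).
BIB = """<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>"""


def author_title_pairs(xml_text, publisher="Addison-Wesley"):
    """Return one (author lastname, title) pair per author of each matching book."""
    pairs = []
    for book in ET.fromstring(xml_text).iter("book"):
        # WHERE part: the book must have the given publisher name.
        if book.findtext("publisher/name", default="").strip() != publisher:
            continue
        title = book.findtext("title", default="").strip()
        # Each <author> child yields a separate binding, hence a separate result.
        for author in book.findall("author"):
            pairs.append((author.findtext("lastname", default="").strip(), title))
    return pairs
```

Calling `author_title_pairs(BIB)` gives three pairs, in document order, matching the three `<result>` elements.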
XML-QL (continued)
WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
XML-QL (continued)
<result>
<title> An Introduction to Database Systems </title>
<author> <lastname> Date </lastname> </author>
</result>
<result>
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
</result>
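The grouping effect of the nested query (one result per title, with all of that book's authors inside it) can be imitated with an inner loop. The following is a Python sketch using the standard `xml.etree.ElementTree` module, not XML-QL; the function name `titles_with_authors` is an assumption, and the data is a copy of the sample bib document with the comment omitted.

```python
import xml.etree.ElementTree as ET

# Copy of the sample bib document from the slides (comment omitted).
BIB = """<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>"""


def titles_with_authors(xml_text, publisher="Addison-Wesley"):
    """Return one (title, [author lastnames]) entry per matching book."""
    results = []
    for book in ET.fromstring(xml_text).iter("book"):
        # Outer WHERE: bind the book and filter on the publisher name.
        if book.findtext("publisher/name", default="").strip() != publisher:
            continue
        title = book.findtext("title", default="").strip()
        # Inner WHERE/CONSTRUCT: collect every author of this one book.
        authors = [
            author.findtext("lastname", default="").strip()
            for author in book.findall("author")
        ]
        results.append((title, authors))
    return results
```

Compared with the flat query, the author loop now runs inside the construction of a single result, so the title is emitted once per book rather than once per author.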
XML-QL (continued)
WHERE <article>
        <author>
          <firstname> $f </> // firstname $f
          <lastname> $l </>  // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </> // join on same firstname $f
          <lastname> $l </>  // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>
XML-QL (continued)
<!ATTLIST person ID ID #REQUIRED>
<!ATTLIST article author IDREFS #IMPLIED>
XML-QL (continued)
<person ID="o123">
<firstname>John</firstname>
<lastname>Smith</lastname>
</person>
<person ID="o234">
. . .
</person>
<article author="o123 o234">
<title> ... </title>
<year> 1995 </year>
</article>
XML-QL (continued)
WHERE <article><author><lastname> $n</></></> IN "abc.xml"
XML-QL (continued)
WHERE <article author=$i>
        <title> </> ELEMENT_AS $t
      </>,
      <person ID=$i>
        <lastname> </> ELEMENT_AS $l
      </>
CONSTRUCT <result> $t $l</>
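The ID/IDREFS join in this query can be pictured as a lookup from each identifier in the article's `author` attribute to the `person` element carrying that `ID`. The Python sketch below shows that reading under stated assumptions: the second person (Jane Doe), the article title, and the wrapping `<db>` root are made up, since the slide elides them, and `article_author_pairs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Person o123 is from the slides; person o234, the title text, and the
# <db> wrapper are hypothetical fill-ins so that the example can run.
DATA = """<db>
  <person ID="o123">
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </person>
  <person ID="o234">
    <firstname>Jane</firstname>
    <lastname>Doe</lastname>
  </person>
  <article author="o123 o234">
    <title>Sample Title</title>
    <year>1995</year>
  </article>
</db>"""


def article_author_pairs(xml_text):
    """Pair each article title with the lastname of every referenced person."""
    root = ET.fromstring(xml_text)
    # Index persons by their ID attribute.
    lastname_by_id = {
        person.get("ID"): person.findtext("lastname", default="")
        for person in root.iter("person")
    }
    pairs = []
    for article in root.iter("article"):
        title = article.findtext("title", default="")
        # An IDREFS value is a whitespace-separated list of IDs.
        for ref in article.get("author", "").split():
            if ref in lastname_by_id:
                pairs.append((title, lastname_by_id[ref]))
    return pairs
```

With the data above, the article is paired once with Smith and once with Doe, one pair per referenced ID.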
Scalar values
<title>A Trip to <titlepart> the Moon </titlepart></title> NOT!
<title><CDATA> A Trip to </CDATA><titlepart><CDATA> the Moon</CDATA></titlepart></title> YES
Tag variables
WHERE <$p>
        <title> $t </title>
        <year>1995</>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>
Transforming data
<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>
<!ELEMENT person (lastname, firstname, address?, phone?, publicationtitle*)>
Transforming data (cont’d)
WHERE <$>
        <author>
          <firstname> $fn </>
          <lastname> $ln </>
        </>
        <title> $t </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <person ID=PersonID($fn, $ln)>
            <firstname> $fn </>
            <lastname> $ln </>
            <publicationtitle> $t </>
          </>
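The role of `PersonID($fn, $ln)` can be illustrated with a small Python sketch. The reading assumed here is that the function yields the same identifier whenever it is given the same first and last name, so that several publications by one author end up in a single `person` element instead of one element per publication. The identifier format, the dictionary layout, and the function names are hypothetical; the author and title values are borrowed from the sample database later in the deck.

```python
def person_id(firstname, lastname):
    """Same arguments always give the same identifier (assumed format)."""
    return f"person:{firstname}:{lastname}"


def build_persons(publications):
    """Group (firstname, lastname, title) bindings into one record per person."""
    persons = {}
    for firstname, lastname, title in publications:
        pid = person_id(firstname, lastname)
        # Reuse the existing record if this ID was seen before.
        person = persons.setdefault(
            pid,
            {"firstname": firstname, "lastname": lastname, "publicationtitles": []},
        )
        person["publicationtitles"].append(title)
    return persons


# One tuple per binding of ($fn, $ln, $t).
BINDINGS = [
    ("W.", "Stevens", "TCP/IP Illustrated"),
    ("W.", "Stevens", "Advanced Programming in the Unix environment"),
    ("Serge", "Abiteboul", "Data on the Web"),
]
```

`build_persons(BINDINGS)` yields two records: the Stevens record carries both titles, which corresponds to `publicationtitle*` in the target DTD.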
Integrating data from different sources
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn</>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn</>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
Query blocks
WHERE <$e>
        <title> $t </>
        <year> 1995 </>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result ID=ResultID($p)> <title> $t </> </>
{ WHERE $e = "journal-paper", <month> $m </> IN $p
  CONSTRUCT <result ID=ResultID($p)> <month> $m </> </> }
{ WHERE $e = "book", <publisher>$q </> IN $p
  CONSTRUCT <result ID=ResultID($p)> <publisher>$q </> </> }
XQuery
Successor to XML-QL, YATL, Lorel, Quilt
Supported by the W3C
Draft only
DTD
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
Sample database
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>
Sample query
<bib>
  {
    for $b in document("http://www.bn.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>
Expected result
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>
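For readers without an XQuery processor at hand, the same selection can be reproduced with Python's standard `xml.etree.ElementTree` module. This is a rough equivalent of the sample query, not XQuery itself: the data is embedded as a string instead of being fetched from the URL in the query, and the function name `select_books` is an assumption.

```python
import xml.etree.ElementTree as ET

# The sample database from the slides, embedded as a string.
SAMPLE = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>"""


def select_books(xml_text, publisher="Addison-Wesley", after_year=1991):
    """Build a <bib> of <book year=...><title/></book> for the matching books."""
    result = ET.Element("bib")
    for book in ET.fromstring(xml_text).findall("book"):
        # where $b/publisher = ... and $b/@year > ...
        same_publisher = (book.findtext("publisher") or "").strip() == publisher
        recent_enough = int(book.get("year")) > after_year
        if same_publisher and recent_enough:
            # return <book year="{ $b/@year }">{ $b/title }</book>
            out_book = ET.SubElement(result, "book", year=book.get("year"))
            out_title = ET.SubElement(out_book, "title")
            out_title.text = book.findtext("title")
    return result
```

`ET.tostring(select_books(SAMPLE), encoding="unicode")` serializes the result. It contains the 1994 and 1992 Addison-Wesley books and leaves out the Morgan Kaufmann title, in line with the expected result.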
Pointers and Demos
http://www.w3.org/TR/xquery/
http://www.w3.org/TR/xmlquery-use-cases/
http://xml.org/
http://131.107.228.20/Default.aspx?case=XMP&example=Q1#Query
http://www.db.ucsd.edu/people/yannis/XQueryTutorial.htm
http://www.ex.ac.uk/~pellison/xml/multiple.htm
http://seacow.eecs.umich.edu:8080/timberweb/