XQuery and Hierarchical Naming
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
February 7, 2008
Today
Reminder: Homework 1 due 2/12 @ 11:59PM
XQuery and joins
Addressing vs. naming
Hierarchical names
XQuery’s Basic Form
The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
“FLWOR” statement pattern:
  for {iterators that bind variables}
  let {collections}
  where {conditions}
  order by {order-conditions}
  return {output constructor}
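As a rough analogy only, the five FLWOR clauses can be mirrored by an ordinary loop in Python. The sketch below uses the standard `xml.etree.ElementTree` module in place of an XQuery engine; the sample document and its thesis titles are made-up data in the style of the DBLP example, and `flwor` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document in the style of dblp.xml (illustrative data only).
DBLP = ET.fromstring("""
<dblp>
  <mastersthesis><author>Kurt Brown</author><title>Thesis B</title><year>1992</year></mastersthesis>
  <mastersthesis><author>A. Student</author><title>Thesis A</title><year>1992</year></mastersthesis>
  <mastersthesis><author>C. Student</author><title>Thesis C</title><year>1995</year></mastersthesis>
</dblp>
""")

def flwor(root):
    results = []
    for mt in root.findall("mastersthesis"):         # for: bind mt to each node in turn
        year = mt.findtext("year")                   # let: bind a value derived from mt
        if year == "1992":                           # where: keep only qualifying bindings
            results.append(mt)
    results.sort(key=lambda m: m.findtext("title"))  # order by: sort the surviving bindings
    return [m.findtext("title") for m in results]    # return: construct the output

titles = flwor(DBLP)
print(titles)
```

Each clause filters or reshapes the stream of variable bindings produced by the clause before it, which is the model described above.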
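Example XML Data
[Figure: tree diagram of the example document, reconstructed as an outline; attributes are shown in parentheses]
Root
  ?xml
  dblp
    mastersthesis (mdate = 2002…, key = ms/Brown92)
      author: Kurt Brown
      title: PRPL…
      year: 1992
      school: wisc
    inproceedings (mdate = 2002.., key = conf/sigm../)
      author: Paul R.
      title: On…
      year: 1997
      crossref: sigmod-97
      ee: www…
    university (key = wisc)
      name: Wisconsin
      country: USA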
XQuery and Joins
  for $i in doc("dblp.xml")/dblp/inproceedings,
      $r in $i/crossref/text(),
      $c in doc("dblp.xml")/dblp/conf,
      $n in $c/@name
  where $n = $r
  return <result>{ $i, $c }</result>
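The query above pairs each inproceedings entry with the conference whose name equals its crossref value. A minimal Python sketch of the same value-based join follows, written as a hash join (build an index on one side, probe it with the other) rather than as nested loops. The sample data, the `conf` elements with their `name` attributes, and the function name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up data: the <conf> elements and their name attributes are assumptions.
DBLP = ET.fromstring("""
<dblp>
  <inproceedings key="p1"><title>Paper One</title><crossref>sigmod-97</crossref></inproceedings>
  <inproceedings key="p2"><title>Paper Two</title><crossref>vldb-98</crossref></inproceedings>
  <conf name="sigmod-97"><title>SIGMOD 1997</title></conf>
</dblp>
""")

def join_papers_to_confs(root):
    # Build phase: index conferences by the join key (@name), assumed unique.
    conf_by_name = {c.get("name"): c for c in root.findall("conf")}
    pairs = []
    # Probe phase: look up each paper's crossref text in the index.
    for paper in root.findall("inproceedings"):
        for ref in paper.findall("crossref"):
            conf = conf_by_name.get(ref.text)
            if conf is not None:
                pairs.append((paper.get("key"), conf.findtext("title")))
    return pairs

pairs = join_papers_to_confs(DBLP)
print(pairs)
```

Papers whose crossref matches no conference (p2 here) drop out, as in an inner join.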
Some Uses for Join in XML
Translation between values
  SSN → PennID
Joining or combining information
  Amazon invoice info + UPS tracking info
Restructuring information
  <author><book>…</book><book>..</book></author>
  → <book><author>…</author><author>…</author></book>
  Here, we separate authors from books, then join them back in “upside-down” fashion
Changing Nesting of XML Content
Re-nesting XML trees is a common operation
Simply nest the query blocks and correlate them – similar to join

  for $u in doc("dblp.xml")/dblp/university
  let $n := $u/name/text(),
      $k := $u/@key
  where $u/country = "USA"
  return
    <ms-theses-92-by-univ>
      { $n }
      { for $mt in $u/../mastersthesis,
            $inst in $mt/school/text()
        where $mt/year/text() = "1992" and _______________
        return $mt/title }
    </ms-theses-92-by-univ>
Collections & Aggregation in XQuery
Given a collection, we can compute an average, count, etc. of its members:
  <article-authors>{
    for $paper in doc("dblp.xml")/dblp/inproceedings
    let $pauth := $paper/author    (: $pauth is a collection :)
    return
      <paper>
        { $paper/title }
        <count> { fn:count($pauth) } </count>
      </paper>
  }</article-authors>
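For comparison, here is a hedged Python sketch of the same aggregation: bind the authors of each paper to a collection, then emit the title together with the size of that collection. The sample papers and the `author_counts` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up input in the style of dblp.xml.
DBLP = ET.fromstring("""
<dblp>
  <inproceedings><author>A</author><author>B</author><title>Paper One</title></inproceedings>
  <inproceedings><author>C</author><title>Paper Two</title></inproceedings>
</dblp>
""")

def author_counts(root):
    out = ET.Element("article-authors")
    for paper in root.findall("inproceedings"):
        pauth = paper.findall("author")        # the collection bound by "let"
        p = ET.SubElement(out, "paper")
        title = ET.SubElement(p, "title")
        title.text = paper.findtext("title")
        count = ET.SubElement(p, "count")
        count.text = str(len(pauth))           # plays the role of fn:count
    return out

result = author_counts(DBLP)
print(ET.tostring(result, encoding="unicode"))
```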
Sorting in XQuery
We can order the sequence of “result tuples” output by the return clause:
  for $x in doc("dblp.xml")/dblp/proceedings
  order by $x/title/text()
  return $x
Querying & Defining Tags
Can get a node’s name by querying node-name():

  for $x in doc("dblp.xml")/dblp/*
  return node-name($x)

Can construct elements and attributes using computed names:

  for $x in doc("dblp.xml")/dblp/*,
      $year in $x/year,
      $title in $x/title/text()
  return
    element { node-name($x) } {
      attribute { concat("year-", $year) } { $title }
    }
XQuery Summary
Very flexible and powerful language for XML
  Focus is on database-style operations like joins
  Performs tasks that can’t be done with XPath or XSLT and that are tedious to program in Java:
    Integrating information from multiple sources
    Joins, based on correspondences of values
    Computing count, average, etc.
Today, XQuery is available:
  In RDBMSs (SQL Server, Oracle, DB2) and XML DBMS systems (MarkLogic)
  As the basis of research prototypes for “XQuery full text”
  As the basis of “XQueryP” – a Web Services/AJAX programming language based on XQuery but with programming language features
    http://2006.xmlconference.org/programme/presentations/38.html
We will discuss data integration and middleware later in the course
Hierarchical Naming Schemes
Thus far, we’ve seen XPath as a hierarchical naming scheme
  “Content-based naming”: describe the structure and values of a tree structure
  Assumption: XML tree resides in (or is being sent to) one place
But hierarchy is often used for naming and location
How Do We Find Things on the Internet?
Generally, using one of three means:
  Addresses or locations: specify where something is, assuming that we understand how to navigate
    Just like a physical address, we may still need a map!
    In the Internet, addresses are typically IP addresses – the routers know the map
  Names: are mapped into addresses via lookup services
    Best-known example on the Internet: DNS name
    Cell phone numbers, email addresses, etc. are becoming names
  Content-based addressing/naming
    The actual data value is somehow used to find its location
    The basis of publish-subscribe systems and peer-to-peer architectures
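To make the third option concrete, here is a minimal sketch of content-based addressing in Python: a hash of the data value selects the node that stores it, so the same value always leads to the same location. The node names and the simple modulo scheme are assumptions for illustration; peer-to-peer systems generally use more elaborate placement schemes.

```python
import hashlib

def node_for(content: bytes, nodes: list) -> str:
    """Pick the node responsible for `content` from a hash of the content itself."""
    digest = hashlib.sha1(content).digest()
    index = int.from_bytes(digest, "big") % len(nodes)
    return nodes[index]

NODES = ["node-a", "node-b", "node-c"]   # hypothetical storage nodes

# Storing and looking up the same value lead to the same node,
# with no directory consulted in between.
where_stored = node_for(b"some document text", NODES)
where_looked_up = node_for(b"some document text", NODES)
print(where_stored, where_looked_up)
```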
The Simplest Way of Going from Names or Content to Locations
Directory-based lookup protocols are very common
Examples:
  Napster 1.0 – peer-to-peer storage with central directory
  Inverted index – used to look up keywords in information retrieval
  DNS – distributed hierarchical directory
  LDAP – hierarchical Directory Information Tree
Napster 1.0, ca 2002
Hybrid of peer-to-peer storage with central directory showing what’s currently available
  What are the trade-offs implicit in this model?
  Why did it fail?
[Figure: Napster.com hosts the central Directory, which lists jjackson-lame and bspears-oops; Peer1, Peer2, and Peer3 store the files themselves (jjackson-lame.mp3 on two peers, bspears-oops.mp3 on one)]
Other Services with Similar Directory + Peer Architectures
FolderSync – now owned by Microsoft
Google Desktop Search with multiple machines
BitTorrent trackers are quite similar (we’ll discuss BitTorrent more later)
Inverted Indices
A “forward index”: documents to words
The “inverted index”: words to word-occurrences
The basis of most information retrieval engines, Google, etc.
  Can handle positional predicates …
  But how can we reconstruct previews?
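A small Python sketch of an inverted index that records word positions, which is what makes positional predicates (here, "word B immediately follows word A") possible. The two documents and both function names are made up for illustration, and the tokenization is just whitespace splitting.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to a list of (doc_id, position) occurrences."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def phrase_match(index, first, second):
    """Docs where `second` occurs immediately after `first` (a positional predicate)."""
    follows = set(index.get(second, []))
    return sorted({d for d, p in index.get(first, []) if (d, p + 1) in follows})

DOCS = {  # made-up documents
    1: "hierarchical naming schemes",
    2: "naming is hierarchical",
}
index = build_inverted_index(DOCS)
print(index["naming"])
print(phrase_match(index, "hierarchical", "naming"))
```

Note that this index holds only positions, not the surrounding text, which is why the preview question above is not trivial.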
Naming People and Devices: LDAP
Lightweight Directory Access Protocol
  Hierarchical naming system that can be partitioned and replicated
LDAP’s Schema
LDAP information has an XML-like schema:
  A unique name in LDAP is called a Distinguished Name, “dn”, and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
    o = organization; dc = domain component
    ou = organizational unit
    uid = user ID
    cn = common name
    c = country; st = state; l = locality
  Can also have objectClass – the type of entity
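The most-specific-to-least-specific ordering can be illustrated with a short Python sketch that splits a distinguished name into its attribute/value components and walks one level up the hierarchy. The example DN is hypothetical, and the parser is deliberately simplified: it assumes no escaped commas and no multi-valued components.

```python
def parse_dn(dn: str):
    """Split a distinguished name into (attribute, value) pairs, most specific first."""
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs

def parent_dn(dn: str) -> str:
    """Drop the most specific component to move one level up the hierarchy."""
    return ",".join(f"{a}={v}" for a, v in parse_dn(dn)[1:])

dn = "uid=zives,ou=people,o=upenn,c=us"   # hypothetical entry
print(parse_dn(dn))
print(parent_dn(dn))
```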
LDAP Hierarchy
Brad Marshall LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html
Querying LDAP
LDAP queries are mostly attribute-value predicates:
  uid=zives; o=upenn; c = usa
  (|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))
  objectclass=posixAccount
  (!(cn=Val Tannen))
How does this differ from XPath?
How might we process these queries?
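One possible way to process such queries, sketched in Python under simplifying assumptions: parse the parenthesized filter into a small tree, then test each directory entry against it with a linear scan. Only equality tests and the `&`, `|`, `!` operators are handled (no wildcards or escapes). The entries, the uid value for the second entry, and the function names are made up; a real directory server would typically use attribute indexes rather than scanning.

```python
def parse_filter(s, i=0):
    """Parse one parenthesized filter starting at s[i]; return (tree, next_index)."""
    assert s[i] == "(", "filter must start with '('"
    i += 1
    if s[i] in "&|":
        op, i, children = s[i], i + 1, []
        while i < len(s) and s[i] == "(":
            child, i = parse_filter(s, i)
            children.append(child)
        return (op, children), i + 1              # skip the closing ')'
    if s[i] == "!":
        child, i = parse_filter(s, i + 1)
        return ("!", child), i + 1                # skip the closing ')'
    end = s.index(")", i)
    attr, _, value = s[i:end].partition("=")
    return ("=", attr.strip().lower(), value.strip()), end + 1

def matches(tree, entry):
    """Evaluate a parsed filter against an entry (dict of attribute -> list of values)."""
    op = tree[0]
    if op == "&":
        return all(matches(t, entry) for t in tree[1])
    if op == "|":
        return any(matches(t, entry) for t in tree[1])
    if op == "!":
        return not matches(tree[1], entry)
    _, attr, value = tree
    return value.lower() in [v.lower() for v in entry.get(attr, [])]

def search(filter_text, entries):
    tree, _ = parse_filter(filter_text)
    return [e["cn"][0] for e in entries if matches(tree, e)]

ENTRIES = [  # made-up directory entries
    {"cn": ["Zachary Ives"], "uid": ["zives"], "objectclass": ["posixAccount"]},
    {"cn": ["Val Tannen"], "uid": ["val"], "objectclass": ["posixAccount"]},
]
print(search("(|(cn=Susan Davidson)(cn=Zachary Ives))", ENTRIES))
print(search("(&(objectclass=posixAccount)(!(cn=Val Tannen)))", ENTRIES))
```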
The Backbone of Internet Naming: Domain Name System
A simple, hierarchical name system with a distributed database – each domain controls its own names
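[Figure: DNS name tree. Below the root are the top-level domains, e.g., edu and com; below edu are columbia, upenn, and berkeley; below com is amazon; names such as www, cis, and sas appear at the next level down, with … marking omitted branches]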
Top-Level Domains (TLDs)
Mostly controlled by Network Solutions, Inc. today
  .com: commercial
  .edu: educational institution
  .gov: US government
  .mil: US military
  .net: networks and ISPs (now also a number of other things)
  .org: other organizations
  244 two-letter country suffixes, e.g., .us, .uk, .cz, .tv, …
  and a bunch of new suffixes that are not very common, e.g., .biz, .name, .pro, …
Finding the Root
13 “root servers” store entries for all top level domains (TLDs)
DNS servers have a hard-coded mapping to root servers so they can “get started”
Excerpt from DNS Root Server Entries
;   This file is made available by InterNIC registration services
;   under anonymous FTP as
;       file                /domain/named.root
;
; formerly NS.INTERNIC.NET
;
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
;
; formerly C.PSI.NET
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12

(13 servers in total, A through M)
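As an illustration of how a server might load these hard-coded entries at startup, the Python sketch below reads NS and A records from a hints excerpt and builds a table from root server name to address. The parsing is simplified (whitespace-separated fields, `;` comments, IPv4 only) and the function name is hypothetical; the two records are the B and C entries from the excerpt.

```python
HINTS = """\
; formerly NS1.ISI.EDU
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
; formerly C.PSI.NET
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
"""

def parse_root_hints(text):
    """Return {root server name: IPv4 address} from NS and A records."""
    servers, addresses = [], {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()     # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # Owner is the first field; type and data are the last two
        # (the optional class field "IN" may sit in between).
        owner, rtype, rdata = fields[0], fields[-2], fields[-1]
        if rtype == "NS" and owner == ".":
            servers.append(rdata)
        elif rtype == "A":
            addresses[owner] = rdata
    return {name: addresses[name] for name in servers if name in addresses}

roots = parse_root_hints(HINTS)
print(roots)
```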
Supposing We Were to Build DNS
How would we start?
How is a lookup performed?
(Hint: what do you need to specify when you add a client to a network that doesn’t do DHCP?)
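One way to think about these questions is a toy, in-memory simulation of an iterative lookup: the resolver is configured with a single starting server, and each server either answers or refers it to a server lower in the hierarchy. Everything in this Python sketch (server names, the referral table, the address from the documentation range) is invented for illustration, and it ignores caching, timeouts, and the real DNS wire protocol.

```python
# Each toy "server" maps a name suffix either to another server (a referral)
# or to a final address. All names and addresses here are made up.
SERVERS = {
    "root":         {"edu": ("referral", "edu-server"), "com": ("referral", "com-server")},
    "edu-server":   {"upenn.edu": ("referral", "upenn-server")},
    "upenn-server": {"www.cis.upenn.edu": ("address", "192.0.2.10")},
    "com-server":   {},
}

def resolve(name, start="root"):
    """Follow referrals from the starting server until some server returns an address."""
    server, path = start, [start]
    labels = name.split(".")
    # Suffixes of the name, longest first: the full name, then each parent domain.
    suffixes = [".".join(labels[i:]) for i in range(len(labels))]
    while True:
        table = SERVERS[server]
        hit = next((s for s in suffixes if s in table), None)
        if hit is None:
            return None, path              # this server knows nothing about the name
        kind, target = table[hit]
        if kind == "address":
            return target, path
        server = target                    # referral: ask the next server down
        path.append(server)

print(resolve("www.cis.upenn.edu"))
print(resolve("www.amazon.com"))
```

The returned path records which servers were consulted, from the top of the hierarchy down.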
Issues in DNS
We know that everyone wants to be “my-domain”.com
  How does this mesh with the assumptions inherent in our hierarchical naming system?
What happens if things move frequently?
What happens if we want to provide different behavior to different requestors (e.g., Akamai)?
Next Time…
We’ll look at alternative mechanisms for finding things:
  Publish-subscribe models
  Gossip protocols, such as in routers
  Flooding
  … and soon, peer-to-peer or content-based routing