
XML and Databases

Aug’10 – Dec ’10

Introduction

• The volume of XML used by businesses is increasing.

• Many websites use XML as a data store, which is transformed into HTML or XHTML for online display.

• XML data is supplied directly to data stores such as Microsoft Access or SQL Server from forms filled in by a variety of information workers.

• XML is being used increasingly for business-critical data, some of which is particularly confidential.

• This raises issues such as security, scalability, and reliability.

This chapter includes

❑ Use cases for XML-enabled database systems

❑ How to perform foundational tasks using eXist, an open-source native XML database

❑ How to use some of the XML functionality in Microsoft SQL Server and MySQL, two major relational databases with XML functionality

The Need for Efficient XML Data Stores

• If XML is stored as text documents, how can it be processed efficiently?

• When volumes of XML data grow, the efficiency of searching becomes important, and adding indexes to speed up searching that XML becomes increasingly necessary.

  Example: in XML web services, performance is of great importance if the user is to feel that the system is sufficiently responsive.

• If data is stored as something other than XML, how fast the data can be transformed into XML comes into play.

• Issues of reliability also come into play.

The Increasing Amount of XML

• XML has enormous flexibility in representing data.

• It can represent data structures that are difficult or inefficient to represent as relational data.

• A native XML database is a database designed primarily or only to handle XML data.

• The term “structured data” refers primarily to relational data.

• “Semi-structured data” is a term used to refer to nonrelational data, very often XML data.

• “Loosely structured data” typically refers to document-centric XML.

Comparing XML-Based Data and Relational Data

In a relational database, rows have no inherent ordering.

In XML, document order is intrinsically present.

Relational databases, as typically structured, have no hierarchy.

XML documents are intrinsically hierarchical.

Relational databases use keys to express relationships between tables.

Storing even simple data in a relational table therefore loses its ordering.

You may need to reassemble the data as XML at a later date to recapture the original structure.
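The loss of ordering can be illustrated with a minimal Python sketch using the standard `sqlite3` and `xml.etree.ElementTree` modules. The `<steps>` document, the `step` table, and its `position` column are hypothetical names chosen for illustration. The rows are deliberately inserted in reverse so that only the explicit `position` column can restore document order.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SOURCE = "<steps><step>mix</step><step>bake</step><step>cool</step></steps>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE step (position INTEGER, body TEXT)")

# Record each element's document position explicitly, because a table
# carries no inherent row order.
rows = [(pos, el.text) for pos, el in enumerate(ET.fromstring(SOURCE))]
# Insert in reverse order to show that insertion order is not relied upon.
conn.executemany("INSERT INTO step VALUES (?, ?)", reversed(rows))


def rebuild(connection):
    """Reassemble the XML, using ORDER BY to recapture document order."""
    root = ET.Element("steps")
    for (body,) in connection.execute("SELECT body FROM step ORDER BY position"):
        ET.SubElement(root, "step").text = body
    return ET.tostring(root, encoding="unicode")


rebuilt = rebuild(conn)
```

Without the `position` column and the `ORDER BY` clause, nothing in the table would record which step came first.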


Approaches to Storing XML

Storing XML on File Systems:

• The very notion of an XML “document” suggests storage on disk, just as you store any other kind of “document” on your desktop.

• Many applications continue to store XML documents on file systems.

• Why have XML databases been so slow to take off? Because storing XML documents on file systems works so well.

• The hierarchical organization of a file system is very similar to the hierarchical organization of an XML document.

Limitations of storing XML documents on file systems

• Document Size

  • An important factor is the granularity of the information you need to retrieve.

  • If you need to retrieve small pieces from big documents through DOM or XPath, you incur a huge overhead, because the full document has to be read before anything can be extracted.

• Updates

  • If you want to enable multiple users to update these documents or, even worse, if you’re writing a transactional application, you need to take extra care when performing these updates.

  • Solution: use a version control system such as Subversion (http://subversion.tigris.org/).

Limitations contd…

• Indexes

  • Queries are an issue if you store your documents on disk.

  • You need to implement some kind of indexing mechanism.

  • If you have only a few predefined fields to index, you can use the directory structure as an index.

  • With Subversion, you can easily get a list of documents for a specific version, committed by a specific user, modified between two dates, and so on.

  • For full-text search, use a search engine such as Lucene (http://lucene.apache.org/).

• Building Your Own: Although most issues can be worked around, keeping XML documents on disk with write access and indexes is a “build your own” kind of solution and exposes you to a fair amount of integration work. By contrast, XML databases give you a much more packaged approach.
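To give a feel for the “build your own” indexing work described above, here is a minimal Python sketch of a hand-rolled index over XML documents. The file paths, the sample documents, and the `build_category_index` helper are all hypothetical. The documents are held in a dictionary rather than read from disk so that the example is self-contained.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical documents keyed by the path they would have on disk.
documents = {
    "blog/1.xml": "<item id='1'><title>Working on Beginning XML</title>"
                  "<category>XML</category><category>Books</category></item>",
    "blog/2.xml": "<item id='2'><title>Holiday notes</title>"
                  "<category>Travel</category></item>",
}


def build_category_index(docs):
    """Map each <category> value to the set of document paths containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for category in ET.fromstring(text).iter("category"):
            index[category.text].add(path)
    return index


index = build_category_index(documents)
```

Every document has to be parsed in full to build the index, and the index must be rebuilt or patched whenever a document changes. That is the kind of integration work a database takes over.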


Build your own…

• XML databases may not have the features you find in a version control system or a full-text search engine.

• Most XML databases do not match search engine features.

• You can save a lot of time by using a stable XML database instead of adding a bunch of software on top of your file system storage to implement features that are natively available in these databases.

Using XML With Conventional Databases

• Relational databases are one of the most popular ways to store data.

• They are mature, well suited to storing structured data, hold a huge amount of legacy data, and are well understood by a large number of developers.

• These reasons make them good candidates to use together with XML.

Producing XML from Relational Databases

• Large numbers of HTML and XHTML websites are created, directly or indirectly, from relational data.

• Data is stored conventionally as relational tables, and the programmer writes code to create HTML or XHTML, sometimes using XML as an intermediate stage.

• It is possible to map relational data to hierarchical XML structures and return those hierarchical structures to a user.
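The mapping from relational rows to a hierarchical XML structure can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular database product does it: the `customer` and `purchase` tables, their columns, and the output element names are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one parent table and one child table linked by a key.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO purchase VALUES (10, 1, 'XML book'), (11, 1, 'SQL book'), (12, 2, 'Java book');
""")


def customers_to_xml(connection):
    """Nest each customer's purchases under its <customer> element."""
    root = ET.Element("customers")
    customers = connection.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
    for cust_id, name in customers:
        cust = ET.SubElement(root, "customer", id=str(cust_id))
        ET.SubElement(cust, "name").text = name
        # The foreign key becomes parent/child nesting in the XML.
        purchases = connection.execute(
            "SELECT product FROM purchase WHERE customer_id = ? ORDER BY id", (cust_id,)
        )
        for (product,) in purchases:
            ET.SubElement(cust, "purchase").text = product
    return root


root = customers_to_xml(conn)
xml_text = ET.tostring(root, encoding="unicode")
```

The key relationship between the two tables is what supplies the hierarchy. The explicit `ORDER BY` clauses supply the ordering that the tables themselves do not guarantee.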


Moving XML to Relational Databases

• Many relational databases allow XML to be returned to the user from data held in relational tables.

• Similarly, many relational database management systems can now accept XML data from a user, convert it into a relational form, and then store that data in relational tables.

• Shredding refers to processing XML and inserting its contents into standard database tables.

• It may be possible to reconstitute the original XML document.

Data Binding

• Data binding frameworks acknowledge the fact that several representations of the same data need to coexist in applications, and they automate the mapping between those representations.

• The representations supported are XML, SQL databases, and objects.
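Shredding, as described above, can be sketched in a few lines of Python. The table layout (`item`, `item_category`) and the `shred` helper are assumptions made for illustration. The sample document is a trimmed-down version of a blog item, and real products use their own mapping rules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed-down blog item used as hypothetical input.
ITEM_XML = """<item id="1">
  <title>Working on Beginning XML</title>
  <category>English</category>
  <category>XML</category>
  <pubDate>2006-11-13T17:32:01+01:00</pubDate>
</item>"""


def shred(connection, xml_text):
    """Split one <item> document across two relational tables."""
    item = ET.fromstring(xml_text)
    item_id = int(item.get("id"))
    connection.execute(
        "INSERT INTO item VALUES (?, ?, ?)",
        (item_id, item.findtext("title"), item.findtext("pubDate")),
    )
    # Repeating elements go to a child table; position preserves their order.
    connection.executemany(
        "INSERT INTO item_category VALUES (?, ?, ?)",
        [(item_id, pos, c.text) for pos, c in enumerate(item.findall("category"))],
    )
    return item_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, pub_date TEXT);
    CREATE TABLE item_category (item_id INTEGER, position INTEGER, name TEXT);
""")
item_id = shred(conn, ITEM_XML)
categories = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM item_category WHERE item_id = ? ORDER BY position", (item_id,)
    )
]
```

Whether the original document can be reconstituted depends on what the shredding step keeps. This sketch keeps element order for categories but discards whitespace and any markup it does not know about.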


Data binding cont…

Data binding frameworks that can directly map XML and SQL databases include

• ADO.NET (http://msdn.microsoft.com/data/ref/adonet/) in Microsoft’s world and

• Castor (http://www.castor.org/) in the Java open-source community

• Depending on the situation, the XML or XHTML is generated manually, through templates, or through another data binding library.

Native XML Databases

• A native XML database is designed to store XML.

• A native XML database might choose to implement XML using a model such as the XML Infoset, the XML DOM, or XPath.

• It is also likely to capture aspects of an XML document such as document order.

• Native XML databases are recent, do not have the same theoretical underpinning as relational databases, and are still evolving.

• A native XML database product also maps an XML document to its storage model.

• The details of this mapping differ substantially from those of shredding.

Native XML DB contd…

• Native XML databases often store XML documents in collections, and queries can be made across a collection.

• A collection may be defined by a schema or may contain documents of differing structure.

• Many native XML databases use XQuery, a W3C Recommendation since January 2007, as the query language.

• Updates to native XML databases have long lacked standardization.

• XQuery 1.0 lacks insert, delete, and update functionality; the separate W3C XQuery Update Facility adds it.

• Microsoft’s SQL Server, Oracle, Sybase Adaptive Server Enterprise, and IBM’s DB2 9 can store XML in a dedicated xml datatype without discarding their traditional strengths as relational database management systems.

Native XML DB contd…

• Whether you use a native XML database or an XML-enabled relational database product doesn’t matter!

• Three very different examples of native XML databases and XML-enabled database management systems:

❑ eXist is the most mature open-source XML database, written in Java.

❑ SQL Server is a Microsoft enterprise-capable relational database management system with some XML functionality.

❑ MySQL is the open-source database most widely used to power websites. Its XML capabilities are still well behind those of its commercial competitors.

Using Native XML Databases

Obtaining and Installing eXist:

• eXist is a ready-to-run native XML database that can be used in three different modes:

❑ You can use eXist as a Java library to embed a database server in your own Java application.

❑ You can run it as a standalone database server, as you would run a SQL database server.

❑ You can run it embedded in a web server and get the features of both a standalone database and a web interface to access the database.

Using eXist in the last two modes

The last two modes use different sets of scripts, which you can find in eXist’s bin subdirectory:

❑ server (.sh or .bat, depending on your platform) is used to run eXist as a standalone database server.

❑ startup (.sh or .bat) is used to start eXist embedded in a web server, and shutdown (.sh or .bat) is used to stop this web server.

Opening eXist home page


Using the Web Interface

Administration

Log in with a user name and password.

Only registered users are allowed.

Once logged in, you have access to commands.

Browsing Collection


Create Collection

Create a new collection, Collection1.

Upload XML documents to the newly created collection.

The documents will be stored in /db/Collection1.

Newblog.xml

<?xml version="1.0"?>
<item id="1">
  <title>Working on Beginning XML</title>
  <description>
    <p><a href="http://www.wrox.com/WileyCDA/WroxTitle/productCd-0764570773.html"><img src="http://media.wiley.com/product_data/coverImage/73/07645707/0764570773.jpg" align="left"/></a> I am currently working on the next edition of <a href="http://www.wrox.com/WileyCDA/WroxTitle/productCd-0764570773.html">WROX’s excellent “Beginning XML”.</a></p>
  </description>
  <category>English</category>
  <category>XML</category>
  <category>Books/Livres</category>
  <pubDate>2006-11-13T17:32:01+01:00</pubDate>
  <comment-count>0</comment-count>
</item>

XQuery Sandbox

Web interface for querying XML documents: http://localhost:8080/exist/sandbox/sandbox.xql

Query example: /item[@id='1']
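The same query can in principle be sent to eXist over HTTP instead of through the sandbox page. The Python sketch below only builds the request URL. It assumes eXist's REST interface is reachable under `/exist/rest` on the same host and port as the sandbox, and that the query is passed in the `_query` parameter. Both should be checked against the documentation for the installed version. The `build_rest_query_url` helper is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumed base URL of eXist's REST interface on a default local install.
REST_BASE = "http://localhost:8080/exist/rest"


def build_rest_query_url(collection, xpath, base=REST_BASE):
    """Return a GET URL that asks the server to evaluate xpath on a collection."""
    # urlencode percent-escapes characters such as [, ], @ and quotes.
    return f"{base}{collection}?{urlencode({'_query': xpath})}"


url = build_rest_query_url("/db/Collection1", "/item[@id='1']")
```

With a server running, the URL could be fetched with `urllib.request.urlopen(url)`. The sketch stops short of that so it runs without a database.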


XQuery Sandbox

Newblog.xml

To find the title, id, and links of blog entries that link to the Wrox site:

for $item in /item
where $item//a[contains(@href, 'wrox.com')]
return
  <match>
    <id>{string($item/@id)}</id>
    {$item/title}
    {$item//a[contains(@href, 'wrox.com')]}
  </match>
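For readers more familiar with procedural code, the logic of the query above can be mirrored in Python with the standard `xml.etree.ElementTree` module. This is a rough equivalent for illustration, not what eXist executes. The shortened sample document and the `wrox_matches` helper are hypothetical. Since ElementTree's XPath subset has no `contains()` function, the substring test is done in Python.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical variant of Newblog.xml with one non-Wrox link added.
NEWBLOG = """<item id="1">
  <title>Working on Beginning XML</title>
  <description><p>Working on the next edition of
    <a href="http://www.wrox.com/WileyCDA/WroxTitle/productCd-0764570773.html">Beginning XML</a>
    and reading <a href="http://example.org/other">something else</a>.</p></description>
</item>"""


def wrox_matches(items):
    """Return id, title and Wrox links for each item that links to wrox.com."""
    matches = []
    for item in items:  # for $item in /item
        links = [a for a in item.iter("a") if "wrox.com" in a.get("href", "")]
        if links:  # where $item//a[contains(@href, 'wrox.com')]
            matches.append(
                {
                    "id": item.get("id"),
                    "title": item.findtext("title"),
                    "links": [a.get("href") for a in links],
                }
            )
    return matches


result = wrox_matches([ET.fromstring(NEWBLOG)])
```

The loop, the filter, and the constructed result correspond to the `for`, `where`, and `return` clauses respectively.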


eXist client

A standalone graphical tool that can perform the same kinds of operations as the web interface.

The following operations can be performed once logged in with a user name and password:

Browse collections

Open and edit documents

Query documents using XQuery or XPath

A Trace tab in the results window shows the execution path of queries.


WebDAV

Web-based Distributed Authoring and Versioning

Defines how HTTP can be used not only to read resources but also to write them.

Properties: creation, removal, and querying of information about resources.

Collections: group resources into collections that are organized like a file system, similar to a directory or desktop folder.

Locking: use locks to prevent others from editing the same content you're working on.
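The WebDAV features listed above correspond to HTTP methods beyond GET: MKCOL creates a collection, PUT writes a resource, PROPFIND queries properties, and LOCK takes a lock. The Python sketch below constructs such requests with `urllib.request.Request` without sending them. The base URL `http://localhost:8080/exist/webdav/db` is an assumption about where a local eXist install exposes WebDAV, and `webdav_requests` is a hypothetical helper.

```python
from urllib.request import Request

# Assumed WebDAV root of a local eXist install; verify for your version.
WEBDAV_BASE = "http://localhost:8080/exist/webdav/db"

# Minimal lock request body asking for an exclusive write lock.
LOCK_BODY = (
    b'<?xml version="1.0"?>'
    b'<D:lockinfo xmlns:D="DAV:">'
    b"<D:lockscope><D:exclusive/></D:lockscope>"
    b"<D:locktype><D:write/></D:locktype>"
    b"</D:lockinfo>"
)


def webdav_requests(collection, name, xml_bytes):
    """Build (but do not send) one request per WebDAV feature."""
    coll_url = f"{WEBDAV_BASE}/{collection}"
    doc_url = f"{coll_url}/{name}"
    xml_header = {"Content-Type": "application/xml"}
    return [
        Request(coll_url, method="MKCOL"),  # collections
        Request(doc_url, data=xml_bytes, headers=xml_header, method="PUT"),  # writing
        Request(doc_url, headers={"Depth": "0"}, method="PROPFIND"),  # properties
        Request(doc_url, data=LOCK_BODY, headers=xml_header, method="LOCK"),  # locking
    ]


prepared = webdav_requests("Collection1", "Newblog.xml", b"<item id='1'/>")
methods = [request.get_method() for request in prepared]
```

Sending the requests would need a running server and, typically, authentication. Both are left out of this sketch.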


XML IDE

One feature not present in WebDAV is the capability to execute queries.

An XML IDE can access the eXist database through WebDAV.

Queries can also be executed from the IDE itself.