IBM Software Group ® Native XML Support in
Transcript of IBM Software Group ® Native XML Support in
![Page 1: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/1.jpg)
IBM Software Group
®
Native XML Support in DB2 Universal Database
Matthias Nicola, Bert van der LindenIBM Silicon Valley [email protected]
Sept 1, 2005
![Page 2: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/2.jpg)
IBM Software Group | DB2 Information Management Software
2
Agenda
Why “native XML” and what does it mean?
Native XML in the forthcoming version of DB2Native XML StorageXML Schema SupportXML IndexesXQuery, and the Integration with SQL
Summary
![Page 3: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/3.jpg)
IBM Software Group | DB2 Information Management Software
3
XML Databases
XML-enabled DatabasesThe core data model is not XML (but e.g. relational)Mapping between XML data model and DB’s data
model is required, or XML is stored as textE.g.: DB2 XML Extender (V7, V8)
Native XML DatabasesUse the hierarchical XML data model to store and
process XML internallyNo mapping, no storage as textStorage format = processing formatE.g.: Forthcoming version of DB2
![Page 4: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/4.jpg)
IBM Software Group | DB2 Information Management Software
4
Problems of XML-enabled Databases
CLOB storage:Query evaluation & sub-document level access requires
costly XML Parsing – too slow !
Shredding:Mapping from XML to relational often too complexOften requires dozens or hundreds of tablesComplex multi-way joins to reconstruct documentsXML schema changes break the mapping
• no schema flexibility !
• For example: Change element from single- to multi-occurrence requires normalization of relational schema & data
![Page 5: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/5.jpg)
IBM Software Group | DB2 Information Management Software
5
Integration of XML & Relational Capabilities in DB2
SERVER
CLIENT SQL/X
XQueryXMLInterface
RelationalInterface Relational
XML
DB2 Storage:
DB2 Client /Customer Client Application
Native XML data type • (not Varchar, not CLOB, not object-relational)
XML Capabilities in all DB2 componentsApplications combine XML & relational data
HybridDB2 Engine
![Page 6: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/6.jpg)
IBM Software Group | DB2 Information Management Software
6
Native XML Storage DB2 stores XML in parsed hierarchical format
(similar to the DOM representation of the XML infoset)
create table dept (deptID char(8),…, deptdoc xml);
Relational columnsare stored in relationalformat (tables)
XML is stored nativelyas type-annotated trees(representing theXQuery Data Model).
deptID … deptdoc
“PR27” … <dept> … <emp>…</emp></dept>
… … …
DB2 Storage
![Page 7: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/7.jpg)
IBM Software Group | DB2 Information Management Software
7
dept
name
employee
phoneid=901
John Doe
office
408-555-1212 344
name
employee
phoneid=902
Peter Pan
office
408-555-9918 216
0
1
4
25=901
John Doe
3
408-555-1212 344
1
4
24=902
Peter Pan
3
408-555-9918 216
<dept> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
Efficient Document Tree Storage
0 dept4 employee1 name5 id2 phone3 office
String tableSYSIBM.SYSXMLSTRINGS
1 String table per database Database wide dictionary for all
tags in all XML columns
Reduces storage Fast comparisons &
navigation
![Page 8: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/8.jpg)
IBM Software Group | DB2 Information Management Software
8
Information for Every Node
Tag name, encoded as unique StringID
A nodeID
Node kind (e.g. element, attribute, etc.)
Namespace / Namespace prefix
Type annotation
Pointer to parent
Array of child pointers Hints to the kind & name of child nodes
(for early-out navigation)
For text/attribute nodes: the data itself
![Page 9: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/9.jpg)
IBM Software Group | DB2 Information Management Software
9
XML Node Storage Layout
Node hierarchy of an XML document stored on DB2 pages Documents that don’t fit on 1 page: split into regions/pages Docs < 1 page: 1 region, multiple docs/regions per page
Example:
Document split into 3 regions, stored on 3 pages
Split can be at anylevel of the document
![Page 10: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/10.jpg)
IBM Software Group | DB2 Information Management Software
10
XML Storage: “Regions Index”
maps nodeIDs to regions & pages
allows to fetch required regions instead of full documents
allows intelligent prefetching
page page page
Regions index
not user defined, default component of XML storage layer
![Page 11: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/11.jpg)
IBM Software Group | DB2 Information Management Software
11
XML Schema Support & Validation in DB2
Use of XML Schemas is optional, on a per-document basis
No need for a fixed schema per XML column
Validation per document (i.e. per row)
Zero, one, or many schemas per XML column For example: different versions of the same schema,
or schemas with conflicting definitions
Mix validated & non-validated documents in 1 XML column
Schemas are registered & stored in the DB2 Schema Repository (XSR) for fast and stable access.
![Page 12: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/12.jpg)
IBM Software Group | DB2 Information Management Software
12
Validation using Schemas
Validate XML from a parameter marker using xsi:schemaLocation:
insert into dept(deptdoc) values (xmlvalidate(?))
Identify schema for a given document: select deptid, xmlxsrobjectid(deptdoc) from dept where deptid = “PR27”
Override schema location by referencing a schema ID or URI:
insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema id departments.deptschema)
insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema uri 'http://my.dept.com’)
![Page 13: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/13.jpg)
IBM Software Group | DB2 Information Management Software
13
XML Indexes for High Query Performance
Define 0, 1 or multiple XML Value Indexes per XML column
XML index maps: (pathID, value) (nodeID, rowID)
Index any elements or attributes, incl. mixed content
Index definition uses an XML pattern to specify which elements/attributes to index (and which not to)
Can index all elements/attributes, but not forced to do so
Can index repeating elements 0 , 1 or multiple index entries per document
New XML-specific join and query evaluation methods, evaluate multiple predicates concurrently with minimal index I/O
xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)
![Page 14: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/14.jpg)
IBM Software Group | DB2 Information Management Software
14
XML Indexing: Examplescreate table dept(deptID char(8) primary key, deptdoc xml);
create index idx1 on dept(deptdoc) generate keyusing xmlpattern '/dept/@bldg' as sql double;
create unique index idx2 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/@id' as sql double;
create index idx3 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/name' as sql varchar(35);
<dept bldg=101> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
…xmlpattern '//name' as sql varchar(35); (Index on ALL “name” elements)…xmlpattern '//@*' as sql double; (Index on ALL numeric attributes)…xmlpattern '//text()‘ as sql varchar(hashed); (Index on ALL text nodes, hash code)…xmlpattern ‘/dept//name' as sql varchar(35);
…xmlpattern '/dept/employee//text()' as sql varchar(128); (All text nodes under employee)
…xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name‘ as
sql varchar(45);
![Page 15: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/15.jpg)
IBM Software Group | DB2 Information Management Software
15
Querying XML Data in DB2
The following options are supported:
XQuery/XPath as a stand-alone language
SQL embedded in XQuery
XQuery/XPath embedded in SQL/XML
Plain SQL for full-document retrieval
Compiler/Optimizer Details: Beyer et al. “System RX”, SIGMOD 2005 [2]
![Page 16: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/16.jpg)
IBM Software Group | DB2 Information Management Software
16
Example: XQuery as a stand-alone Language in DB2
create table dept(deptID char(8) primary key, deptdoc xml);
for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/deptlet $emp := $d//employee/namewhere $d/@bldg = > 95 order by $d/@bldgreturn <EmpList> {$d/@bldg, $emp} </EmpList>
db2-fn:xmlcolumn returns the sequence ofall documents in the specified XML column
<dept bldg=101> <employee id=901> <name>John Doe</name>
<phone>408 555 1212</phone><office>344</office>
</employee><employee id=902>
<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
</employee></dept>
![Page 17: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/17.jpg)
IBM Software Group | DB2 Information Management Software
17
Examples: SQL embedded in XQuery create table dept(deptID char(8) primary key, deptdoc xml);
Identify XML data by a SELECT statementLeverage predicates/indexes on relational columns
for $d in db2-fn:sqlquery('select deptdoc from dept (single document) where deptID = “PR27” ')…
for $d in db2-fn:sqlquery('select deptdoc from dept (some documents) where deptID LIKE “PR%” ')…
for $d in db2-fn:sqlquery('select dept.deptdoc from dept, unit (some documents) where dept.deptID=unit.ID and unit.headcount > 200’)…..
for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/dept , $e in db2-fn:sqlquery('select xmlforest(name, desc)
from unit u’)…
(constructed documents)(join & combine XML and relational data)
![Page 18: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/18.jpg)
IBM Software Group | DB2 Information Management Software
18
Example: XQuery embedded in SQL/XML
SQL/XML Standard Functions: xmlexists, xmlquery, xmltable
create table dept(deptID char(8) primary key, deptdoc xml);
select deptID, xmlquery(‘for $i in $d/dept
let $j := $i//name return $j’ passing deptdoc as “d”) from deptwhere deptID LIKE “PR%”
and xmlexists(‘$d/dept[@bdlg = 101]’ passing deptdoc as “d“)
![Page 19: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/19.jpg)
IBM Software Group | DB2 Information Management Software
19
Other Features in DB2 native XML
XML Text Search SupportXML Import/ExportXML Type in Stored ProceduresAPI Extensions (JDBC, CLI, .NET, etc.)XML Schema RepositoryFull SQL/XML supportVisual XQuery BuilderAnnotated schema shredding…and more
![Page 20: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/20.jpg)
IBM Software Group | DB2 Information Management Software
20
Summary
CLOB and shredded XML storage restrict performance and flexibility
New native XML support in DB2:
Better Performance through Hierarchical & parsed XML representation at all layers Path-specific XML Indexing New XML join and query methods
Higher Flexibility through: Integration of SQL and XQuery Schemas are optional, per document, not per column Zero, one, or many XML schemas per XML column
![Page 22: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/22.jpg)
IBM Software Group | DB2 Information Management Software
22
BACKUP CHARTS
![Page 23: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/23.jpg)
IBM Software Group | DB2 Information Management Software
23
Shredding: A simple case
<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT> Department
DEPTID DEPTNAME15 Sales
EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750
![Page 24: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/24.jpg)
IBM Software Group | DB2 Information Management Software
24
Shredding: A schema change…“Employees are now allowed to have multiple phone numbers…”
<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <PHONE>415-010-1234</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT>
PhoneEMPNO PHONE
27 406-463-123410 415-010-1234
10 408-463-4963
Requires:• Normalization of existing data !• Modification of the mapping• Change of applications
Costly!
DepartmentDEPTID DEPTNAME
15 Sales
EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750
![Page 25: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/25.jpg)
IBM Software Group | DB2 Information Management Software
25
Native XML in DB2: Key Themes
Standards compliant + driving the standards XML, XQuery, SQL/XML, XML Schema…
Flexibility, because that is what XML is all about.. Zero, one, or many XML schemas per XML column
Native (hierarchical) Storage & Sophisticated XML Indexes New “pivot join” to evaluate many predicates concurrently
Integrated in DB2 Leveraging scalability, reliability, performance, availability…
Integrated with application APIs: JDBC, ODBC, .NET, embedded SQL, CLI,…
Integrated with SQL Access relational data and XML data in same statement
![Page 26: IBM Software Group ® Native XML Support in](https://reader036.fdocuments.net/reader036/viewer/2022062513/554fa14eb4c90586258b49bc/html5/thumbnails/26.jpg)
IBM Software Group | DB2 Information Management Software
26
XML Index DDL
create index idx1 on T(xmlcol) generate key using xmlpattern '/a/b/@c' as sql date;
Declaration & use of namespace prefix supported (not shown above)
AS SQL VARCHAR (integer)
CREATE index-name
ON table-name (xml-column-name) GENERATE KEY USING xmlpattern
UNIQUE
text()
@attribute-tag
@*
///
element-tag
*///
INDEX
DOUBLE
DATETIMESTAMP
VARCHAR (HASHED)
xmlpattern:
xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)