May 8, 2007 9:20 a.m. – 10:20 a.m.
Platform: DB2 for Linux, UNIX and Windows
DB2 9: XML Evolution and Revolution
Philip K. Gunning
Gunning Technology Solutions, LLC
Session: E05
2
Outline
• XML in DB2 LUW before DB2 9
  • Shredding
  • CLOBs
• XML-only databases
  • TIMBER, Niagara, Natix
• Followed by bliss for several years…
• XML databases: fundamental differences with relational databases
3
Outline
• Then IBM shook up the database world with the DB2 9 Hybrid Data Server
• Extensible Optimizer and DB2 9
• Why a native XML data type?
• pureXML™
4
Outline
• Pure XML Implementation
• Pure XML -- Key Enablers
  • SQL/XML
  • XPath/XDM
  • XQuery
  • Developer Workbench
  • XQuery Builder
  • Explain Facility and Visual Explain
5
Disclaimer
• DB2 9 is a registered trademark of IBM Corp.
• pureXML is a registered trademark of IBM Corp.
• DB2 9 Sample queries and programs are copyrights of IBM Corp.
• DB2 for z/OS is a registered trademark of IBM Corp.
• Developer Workbench and Visual Explain are copyrights of IBM Corp.
6
Shredding
• Early implementations of XML support in databases shredded XML documents into columns of relational tables
  • Mapping + parsing = overhead
  • Retrieval of whole document or parts
  • Entire document replaced if an update is required
  • Lack of flexibility
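
The shredding approach above can be sketched in a few lines. This is only an illustration: the `customer` document layout, the two table definitions, and the use of SQLite as the relational target are assumptions made for the example, not anything taken from DB2 or the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are illustrative only.
DOC = """<customer id="1001">
  <name>Ann Lee</name>
  <phone type="work">555-0100</phone>
  <phone type="home">555-0199</phone>
</customer>"""

def shred(conn, xml_text):
    """Parse one document and map its pieces to relational rows."""
    root = ET.fromstring(xml_text)              # parsing cost
    cust_id = int(root.get("id"))
    conn.execute("INSERT INTO customer VALUES (?, ?)",
                 (cust_id, root.findtext("name")))
    # Repeating elements need their own child table: mapping cost.
    for phone in root.findall("phone"):
        conn.execute("INSERT INTO phone VALUES (?, ?, ?)",
                     (cust_id, phone.get("type"), phone.text))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE phone (cust_id INTEGER, kind TEXT, number TEXT)")
shred(conn, DOC)

# Getting the data back requires a join for every repeating element.
rows = conn.execute(
    "SELECT c.name, p.kind, p.number FROM customer c "
    "JOIN phone p ON p.cust_id = c.id ORDER BY p.kind").fetchall()
print(rows)
```

Even in this small case the document structure is fixed into the table design, so a new element in the XML means a schema change on the relational side.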
7
CLOBs
• Stored entire XML document as text
• High cost of retrieval
• Not buffered
• Poor search performance and parsing
• Lack of flexibility
8
Key Factors in IBM Approach
• “XML and Relational data coexist and complement each other in enterprise solutions”
• “A successful XML repository requires much of the same infrastructure that already exists in a RDBMS system”
• “XML query languages have considerable conceptual and functional overlap with SQL”
DB2 goes hybrid: Integrating native XML and XQuery with relational data and SQL
IBM Systems Journal, Vol. 45, No. 2, 2006, Beyer et al.
9
Revolutionary Approach: DB2 9 pureXML Framework
• DB2 Optimizer was extensible
• XML Native data type
• Enables XML data to be treated natively
• Native XML data type enables better performance (less overhead than legacy methods) through optimization and XML indexes
• Industry schemas supported
10
Fundamental Differences
• DB2 9 native XML data type takes advantage of years of relational database research
  • 20+ years of optimization advancements
• Extensive query rewrite plus new rewrites
• Uses underlying optimization and storage components
• Same or enhanced APIs
11
PureXML Framework Implementation
• Key Enablers
  • Extensible Optimizer
  • XML and SQL Integration
  • XQuery, XDM, XPath, SQL/XML
• Development Tooling
  • Developer Workbench
  • XQuery Builder
  • Explain Support, including Visual Explain
12
Hybrid SQL/XQuery Compiler
[Diagram; labels shown: SQL/XML Parser, XQuery Parser, Semantics Checking, Optimizer Phase, Rewrite Phase, Code Generation, Query Plan, QGMX]
13
DB2 9 Hybrid Data Server Architecture
[Diagram; labels shown: DB2 Client Application (SQL/XML, XQuery), DB2 Engine (Relational Interface, XML Interface, XSR/Catalogs), DB2 Storage (XML, Relational)]
14
Tight Integration
15
XQuery Defined
• SQL is the query language for relational databases
• XQuery is the query language for XML as defined by the W3C organization
• Built-in support provided in DB2 9 by query compiler and built-in XQuery functions
16
INPUT FUNCTIONS
17
DB2 9 XML Input
• SQL INSERT Statement
• Input to the XML column must be a well-formed XML document
  • As defined in the XML specification
• Clients send XML documents in textual representation, and DB2 parses them with a Simple API for XML (SAX) parser
  • Well-formedness check
  • Validation
• If XML data type, serialization performed by DB2 implicitly
• XMLPARSE function for non-XML data type
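
The well-formedness check mentioned above can be illustrated with any SAX parser. The sketch below uses Python's standard `xml.sax` module as a stand-in; it is not DB2's parser, the helper name `is_well_formed` is made up for the example, and it checks well-formedness only, not validity against a schema.

```python
import xml.sax

def is_well_formed(xml_text: str) -> bool:
    """Return True if a SAX parse of xml_text completes without a parse error."""
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False

print(is_well_formed("<order><item>pen</item></order>"))  # properly nested
print(is_well_formed("<order><item>pen</order>"))         # <item> never closed
```

A document that fails this check would be rejected on insert, before validation is even considered.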
18
DB2 9 Annotated XML Schema Decomposition
• Data from XML documents is decomposed into relational and XML columns using annotated XML schema decomposition
  • Stores data in columns according to annotations contained in XML schema documents
• XML Schema Repository (XSR) registration
  • Schemas are registered with DB2-supplied stored procedures or via the Command Line Processor
19
DB2 9 XML Input -- IMPORT
• Import utility enhanced to support import of XML documents
• Validation optional
• Schema must be registered in DB2 XML Schema Repository (XSR) if validation performed
20
OUTPUT FUNCTIONS
21
DB2 9 XML Output Functions
• db2-fn:xmlcolumn function
  • Takes a string literal that identifies an XML column and returns an XML sequence consisting of all document nodes in the specified column
22
DB2 9 XML Output Functions
• db2-fn:sqlquery function
  • Used to restrict the input to an XQuery by conditions placed on relational columns in the same or related tables
  • Returns a single column
  • Based on an SQL fullselect
23
DB2 9 XML Output -- EXPORT
• EXPORT utility supports XML data type
• XML data stored separately from exported relational data
• Details about exported XML represented in main exported file by an XML data specifier (XDS)
24
XQuery Data Model (XDM)
• XQuery Data Model (XDM) is used to define an instance of an XDM sequence
• An instance of the XDM is a sequence
  • A sequence is an ordered collection of zero or more items
  • An item is either an atomic value or a node
• Example sequences: 48, <car/>, (6,7,8), (48,<car/>,(6,7,8))
  • Also: () (an empty sequence), an XML document, 48
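
One property worth seeing in code is that XDM sequences are flat: a sequence placed inside another is spliced in rather than kept as a nested item, and a single item is the same as a one-item sequence. The sketch below models that with Python tuples; `xdm_sequence` and `describe` are hypothetical helpers for illustration, not part of any XQuery API.

```python
import xml.etree.ElementTree as ET

def xdm_sequence(*values):
    """Build a flat sequence: nested tuples are spliced in, never kept as items."""
    items = []
    for v in values:
        if isinstance(v, tuple):
            items.extend(xdm_sequence(*v))
        else:
            items.append(v)        # an atomic value or a node
    return tuple(items)

def describe(seq):
    """Label each item as a node or an atomic value."""
    return ["node" if isinstance(i, ET.Element) else "atomic" for i in seq]

car = ET.Element("car")
seq = xdm_sequence(48, car, (6, 7, 8))   # models (48, <car/>, (6,7,8))
print(len(seq), describe(seq))
print(xdm_sequence())                     # models (), the empty sequence
print(xdm_sequence(48) == xdm_sequence((48,)))
```

Under this model (48, <car/>, (6,7,8)) has five items, not three.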
25
DATABASE DESIGN
26
Relational – XML
• Relational is highly structured
• Represented by well defined entities and relationships
• XML is hierarchical in form, unstructured, and can be very complex
  • Represented in a tree format defined by the W3C XPath standard
27
Relational vs. XML Database Design
• Relational
  • Frequency of updates
  • Design is fixed
  • Maximum performance required
  • Stays relational
  • Meaning outside hierarchy
  • Specific attributes
  • Large fact and dimension tables
  • RI required
• XML
  • Design changes
  • Flexibility desired
  • Not used relationally downstream
  • Only hierarchical
  • Many attributes, of which only a subset is applicable
  • Small dimensions in star schema
28
XML Indexes
• Value indexes
  • Path-specific value indexes on XML columns
  • Elements and attributes used in predicates and cross-document joins
• Full-text indexes
  • Indexes can be defined on any native XML column
  • Documents can be fully or partially indexed
  • Enables just certain parts of documents to be subject to full-text search
  • Text index maintained asynchronously via “lazy” update
• Regions indexes
  • Connect documents that span multiple pages
  • Created automatically by DB2
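
To make the idea of a path-specific value index concrete, here is a minimal in-memory sketch. It assumes a hypothetical set of `customer` documents and an index on the `addr/zip` path; the structure (a dictionary from value to document ids) is an illustration only and says nothing about how DB2 lays out its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents keyed by a document id.
DOCS = {
    1: "<customer><name>Ann</name><addr><zip>19103</zip></addr></customer>",
    2: "<customer><name>Bo</name><addr><zip>10001</zip></addr></customer>",
    3: "<customer><name>Cy</name><addr><zip>19103</zip></addr></customer>",
}

def build_value_index(docs, path):
    """Index only the nodes reached by `path` (relative to each root element)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for node in ET.fromstring(text).findall(path):
            index[node.text].add(doc_id)
    return index

zip_index = build_value_index(DOCS, "addr/zip")

# A predicate on zip is answered from the index, without reparsing documents.
print(sorted(zip_index["19103"]))
```

Because only one path is indexed, a predicate on `name` gets no help from this index, which is the trade-off of indexing documents partially.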
29
XML Storage
• Relational data stored in tables and columns
• XML data stored in hierarchical type-annotated tree format
• XML document stored separately outside of table
• XML Data Specifier (XDS) stored in table describes XML document
30
XML Storage
• Documents must be able to span disk pages
  • A single text node may be larger than a page
• Direct node access
  • Not feasible to traverse every node (a document could be several gigabytes)
• Must support existing isolation levels, logging and recovery mechanisms
31
XML Storage
• DB2 uses a structured, type-annotated tree
• Stored in binary representation to avoid repeated parsing and validating of the document
• Digital signatures preserved
• Each node contains its type information
• Type information on the document level enables schema evolution
  • Each document in a column can conform to a different schema or to different versions of an evolving schema
32
XML Storage
• Each node contains pointers to its parent and children
  • Supports efficient navigational queries
• Path expressions are evaluated directly on the native format in buffered pages, without copying or transforming the data
• Extra information stored with each node
  • Type annotation if validated
  • Each element node has a set of child slots for its associated attributes and ordered children
33
XML Storage
• Child slots have hints within them
  • Give an indication of what the child represents
  • Enable fast navigation across a context node’s set of children without actually visiting each child node
  • A child node may be on a different page and require I/O
• A unique identifier gives each node logical and physical addressability
  • Can be used in indexing and query evaluation
• Large document trees may not fit on one page
  • Can be split into regions via the regions index
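
The child-slot idea above can be sketched as a small in-memory tree. Everything here is hypothetical: the `Node` and `ChildSlot` classes, the use of the child's name as the hint, and the `visits` counter that stands in for a possible page read. The slides do not describe the actual on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class ChildSlot:
    hint: str          # what the child represents; here, simply its name
    child: "Node"      # in a real store the child could live on another page

@dataclass(eq=False)
class Node:
    name: str
    text: Optional[str] = None
    parent: Optional["Node"] = None
    slots: List[ChildSlot] = field(default_factory=list)
    visits: int = 0    # counts how often this node was actually read

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.slots.append(ChildSlot(hint=child.name, child=child))
        return child

def children_named(context: Node, name: str) -> List[Node]:
    """Follow only the slots whose hint matches; other children stay untouched."""
    found = []
    for slot in context.slots:
        if slot.hint == name:           # decided from the slot, not the child
            slot.child.visits += 1      # stands in for a possible page read
            found.append(slot.child)
    return found

order = Node("order")
customer = order.add_child(Node("customer", "Ann"))
order.add_child(Node("item", "pen"))
order.add_child(Node("item", "ink"))

items = children_named(order, "item")
print([i.text for i in items], customer.visits)
```

The `customer` child is skipped purely on the basis of its slot hint, so it is never read while evaluating a step that looks for `item` children.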
34
BUILDING APPLICATIONS
35
Key DB2 9 XML Enablers
• Build with Developer Workbench
• Test with Developer Workbench
• Deploy and Maintain with Developer Workbench
• Replaces former Development Center
  • Migration support for existing documents
• Eclipse Framework based tool
36
Key DB2 9 XML Enablers
• Developer Workbench
  • Separate download at http://www-306.ibm.com/software/data/db2/ad/
37
XML Sample Schema Definition
38
XML-XQuery SP
39
Visual Explain Support
40
41
XML Schema Definition
42
XPath Example
43
Summary
• pureXML™ Framework
• SQL/XML
• XQuery/XPath
• XDM and XSR
• XML Storage and XML Indexes
• Developer Workbench
  • Build, Test, Deploy and Maintain!
• Additional Features coming in DB2 9 for z/OS
44
Thanks!
Philip K. Gunning
Gunning Technology Solutions, LLC
Session: E05
DB2 9: XML Evolution and Revolution