XML – Data Model, DTD and Schema
ADVANCED DATABASES
Khawaja Mohiuddin, Assistant Professor
Department of Computer Sciences, Bahria University (Karachi Campus)
[email protected]
sites.google.com/site/khawajamcs
Content for this lecture is taken from Chapter 11 of “Database Systems: Models, Languages …”, 6th Ed., by Elmasri and Navathe (Chapter 12 of “Fundamentals of Database Systems”, 6th Ed., by Elmasri and Navathe).

Topics to Cover
• Structured, Semi-structured, and Unstructured Data
• XML Hierarchical (Tree) Data Model
• XML Documents
• XML DTDs
• XML Schema
• Storing and Extracting XML Documents from Databases

XML: Extensible Markup Language
• Data sources
  • Databases storing data for Internet applications
• Hypertext documents
  • Common method of specifying contents and formatting of Web pages
  • Static Web pages vs. dynamic Web pages
• XML data model
  • Based on tree (hierarchical) structures, as compared to the flat structures of the relational data model
  • Data extracted from relational databases can be formatted as XML documents to be exchanged over the Web

Structured, Semi-structured, and Unstructured Data
• Structured data
  • Represented in a strict format
  • Example: information stored in databases
• Semi-structured data
  • May have a certain structure, but not all information collected will have identical structure
  • No predefined schema
  • Schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance

Structured, Semi-structured, and Unstructured Data (cont’d.)
• Semi-structured data (cont’d.)
  • Also referred to as self-describing data
  • May be displayed as a directed graph
    • Labels or tags on directed edges represent schema names: names of attributes, object types (or entity types or classes), and relationships
    • Internal nodes represent individual objects or composite attributes
    • Leaf nodes represent actual data values of simple (atomic) attributes

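The directed-graph view of semi-structured data can be sketched with nested Python dictionaries. This is only an illustration: the project names, numbers, and attribute labels below are made up, and the two project objects deliberately carry different attributes to show that no fixed schema is imposed.

```python
# Hypothetical semi-structured data: the two "project" objects do not
# share the same set of attributes (the schema is mixed in with the data).
company_projects = {
    "project": [
        {"name": "Product X", "number": 1,
         "worker": [{"ssn": "123456789", "hours": 32.5}]},
        {"name": "Product Y", "location": "Sugarland"},
    ]
}


def walk(node, path=()):
    """Yield (edge-label path, atomic value) for every leaf in the graph."""
    if isinstance(node, dict):      # internal node: object or composite attribute
        for label, child in node.items():
            yield from walk(child, path + (label,))
    elif isinstance(node, list):    # several edges that carry the same label
        for child in node:
            yield from walk(child, path)
    else:                           # leaf node: an atomic data value
        yield path, node


leaves = list(walk(company_projects))
for labels, value in leaves:
    print("/".join(labels), "=", value)
```

The dictionary keys play the role of edge labels, so the attribute names travel with the data instead of living in a separate schema.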
Structured, Semi-structured, and Unstructured Data (cont’d.)
• Unstructured data
  • Very limited indication of the type of data
  • Example: a text document that contains information embedded within it
• HTML tag
  • Text that appears between angled brackets: <...>
• End tag
  • Tag with a slash: </...>

Structured, Semi-structured, and Unstructured Data (cont’d.)
• HTML uses a large number of predefined tags
• HTML documents
  • Do not include schema information about the type of data
• Static HTML page
  • All information to be displayed is explicitly spelled out as fixed text in the HTML file

XML Hierarchical (Tree) Data Model
• Elements and attributes
  • Main structuring concepts used to construct an XML document
• Simple elements
  • Contain data values
• Complex elements
  • Constructed from other elements hierarchically
• XML tag names
  • Describe the meaning of the data elements in the document

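The distinction between simple and complex elements can be seen by parsing a small document with Python's standard xml.etree.ElementTree module. The Projects document below is a made-up example, and the classify helper is a hypothetical name introduced only for this sketch.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document used only for illustration.
doc = """<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker>
      <Ssn>123456789</Ssn>
      <Hours>32.5</Hours>
    </Worker>
  </Project>
</Projects>"""

root = ET.fromstring(doc)


def classify(element):
    """Complex elements are built from other elements; simple ones hold a data value."""
    return "complex" if len(element) > 0 else "simple"


# Map every tag name in the tree to its kind.
kinds = {el.tag: classify(el) for el in root.iter()}
for tag, kind in kinds.items():
    print(tag, "->", kind)
```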
XML Hierarchical (Tree) Data Model (cont’d.)
• Called the tree model or hierarchical model
• Three main types of XML documents: data-centric, document-centric, and hybrid
• Data-centric XML documents
  • Have many small data items that follow a specific structure and hence may be extracted from a structured database
  • Are formatted as XML documents in order to exchange them over, or display them on, the Web
  • Usually follow a predefined schema that defines the tag names
• Document-centric XML documents
  • Documents with large amounts of text, such as news articles or books
  • Contain few or no structured data elements

XML Hierarchical (Tree) Data Model (cont’d.)
• Hybrid XML documents
  • May have parts that contain structured data and other parts that are predominantly textual or unstructured
  • May or may not have a predefined schema
• Schemaless XML documents
  • Do not follow a predefined schema of element names and corresponding tree structure
  • Semi-structured
  • The value of the standalone attribute in the XML declaration is yes:
    <?xml version="1.0" standalone="yes"?>

XML Hierarchical (Tree) Data Model (cont’d.)
• XML attributes
  • Describe properties and characteristics of the elements (tags) within which they appear
  • Can also hold the values of simple data elements; however, this is generally not recommended
  • May reference another element in another part of the XML document
  • It is common to use attribute values in one element as references to other elements, which resembles the concept of foreign keys in relational databases

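A minimal sketch of attribute-based references, assuming a made-up Company document in which an Employee element points at a Department through a dept attribute. The attribute names id and dept and the helper department_name are illustrative assumptions, not part of the lecture's example.

```python
import xml.etree.ElementTree as ET

# Made-up document: the dept attribute of Employee refers to the id
# attribute of a Department, much like a foreign key refers to a key.
doc = """<Company>
  <Department id="d5"><Name>Research</Name></Department>
  <Employee ssn="123456789" dept="d5"><Name>John Smith</Name></Employee>
</Company>"""

root = ET.fromstring(doc)

# Index the referenced elements by their identifying attribute.
departments = {d.get("id"): d for d in root.findall("Department")}


def department_name(employee):
    """Follow the reference held in the employee's dept attribute."""
    target = departments[employee.get("dept")]
    return target.findtext("Name")


employee = root.find("Employee")
print(department_name(employee))
```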
XML Documents, DTD, and XML Schema
• Well-formed XML documents
  • Have an XML declaration
    • Indicates the version of XML being used as well as any other relevant attributes
  • Must follow the syntactic guidelines of the tree data model
    • Must have a single root element
    • Every element must include a matching pair of start and end tags, nested within the start and end tags of its parent element
  • Can be processed by generic processors that traverse the document and create an internal tree representation
  • Can be schemaless

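Well-formedness can be checked with any generic parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError when the tree-model rules are violated; is_well_formed is a hypothetical helper name, and the parser does not check a document against any schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a generic XML parser can build a tree from the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<?xml version="1.0" standalone="yes"?><Projects><Project/></Projects>'
bad_nesting = "<a><b></a></b>"      # end tags do not match their start tags
two_roots = "<a/><b/>"              # more than one root element

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots))
```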
XML Documents, DTD, and XML Schema (cont’d.)
• DOM (Document Object Model)
  • A standard model with an associated set of API functions
  • Allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document
  • The whole document must be parsed beforehand to convert it to the standard DOM internal data structure representation

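As a rough illustration of DOM-style processing, the sketch below uses Python's standard xml.dom.minidom module. The document is parsed completely into an in-memory tree first, and only then queried and modified; the Projects content is made up for the example.

```python
from xml.dom import minidom

# The whole (made-up) document is parsed into a DOM tree before any access.
dom = minidom.parseString(
    "<Projects>"
    "<Project><Name>ProductX</Name></Project>"
    "<Project><Name>ProductY</Name></Project>"
    "</Projects>"
)

# Query the tree: collect the text of every Name element.
names = [node.firstChild.data for node in dom.getElementsByTagName("Name")]

# Manipulate the tree: append a third Project element.
new_project = dom.createElement("Project")
new_name = dom.createElement("Name")
new_name.appendChild(dom.createTextNode("ProductZ"))
new_project.appendChild(new_name)
dom.documentElement.appendChild(new_project)

project_count = len(dom.getElementsByTagName("Project"))
print(names, project_count)
```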
XML Documents, DTD, and XML Schema (cont’d.)
• SAX (Simple API for XML)
  • Another API, for processing XML documents on the fly
  • Notifies the processing program through callbacks whenever a start or end tag is encountered
  • Makes it easier to process large documents
  • Allows for processing of streaming XML documents
  • Processes the tags as they are encountered; also known as event-based processing

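Event-based processing can be sketched with Python's standard xml.sax module. The handler class TagRecorder is a hypothetical name; it simply records each callback so the order of start and end events is visible, and no tree is ever built.

```python
import xml.sax


class TagRecorder(xml.sax.ContentHandler):
    """Record start/end tag events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser as soon as a start tag is read.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser as soon as an end tag is read.
        self.events.append(("end", name))


handler = TagRecorder()
xml.sax.parseString(
    b"<Projects><Project><Name>ProductX</Name></Project></Projects>", handler
)
for event in handler.events:
    print(event)
```

Because the handler sees one tag at a time, memory use does not grow with document size, which is why this style suits large or streaming documents.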
XML Documents, DTD, and XML Schema (cont’d.)
• Valid XML documents
  • The document must be well formed and must follow a particular schema
  • Start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file

XML Documents, DTD, and XML Schema (cont’d.)
• XML DTD
  • Data types in DTD are not very general
  • Special syntax
    • Requires specialized processors
  • All DTD elements are always forced to follow the specified ordering of the document
    • Unordered elements are not permitted

XML DTD
• First, the name of the root tag
• Then, the elements and their nested structure
  • * after an element name means the element can be repeated zero or more times
  • + after an element name means the element can be repeated one or more times
  • ? after an element name means the element is optional: it appears zero times or one time
  • No symbol after an element name means the element must appear exactly once

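The occurrence symbols behave like regular-expression quantifiers, which the following sketch exploits. It is not a DTD processor: it handles only a flat, comma-separated list of child names with an optional *, +, or ? suffix, and the helper names model_to_regex and children_match, as well as the Project content model, are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate a flat DTD-style sequence such as 'Name, Worker+' into a regex."""
    parts = []
    for item in model.split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "*+?" else ""
        name = item[:-1] if suffix else item
        # Each child tag is written as "<tag> " so names cannot run together.
        parts.append("(?:" + re.escape(name) + " )" + suffix)
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the order and multiplicity of an element's direct children."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None


MODEL = "Name, Number, Location?, Worker+"   # hypothetical content model

ok = ET.fromstring(
    "<Project><Name>X</Name><Number>1</Number><Worker/><Worker/></Project>"
)
no_worker = ET.fromstring("<Project><Name>X</Name><Number>1</Number></Project>")
wrong_order = ET.fromstring(
    "<Project><Number>1</Number><Name>X</Name><Worker/></Project>"
)

print(children_match(ok, MODEL),
      children_match(no_worker, MODEL),
      children_match(wrong_order, MODEL))
```

The wrong_order case also shows the ordering restriction of DTDs: the same children in a different order are rejected.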
XML DTD (cont’d.)
• The type of an element is specified via parentheses following the element name
• The parentheses may include the names of the children of the element
• #PCDATA or another data type in the parentheses means the element is a leaf node
• PCDATA (Parsed Character Data) is similar to a string data type
• The list of attributes can be specified via the keyword !ATTLIST
• An attribute of type ID can be referenced from another attribute of type IDREF within another element
• Attributes can also be used to hold the values of simple data elements of type #PCDATA
• Parentheses can be nested when specifying elements
• A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the document

XML DTD (cont’d.)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
• standalone="no" means the document needs to be checked against a separate DTD document or XML schema document
• The separate DTD document named "proj.dtd" should be stored in the same file system as the XML document
• Alternatively, the DTD text could be included at the beginning of the XML document itself
• XML DTD has several limitations:
  • Data types are not very general
  • It has its own special syntax and thus requires specialized processors
  • All DTD elements are always forced to follow the specified ordering of the document
• These drawbacks led to the development of XML schema

XML Schema
• XML schema language
  • Standard for specifying the structure of XML documents
  • Uses the same syntax rules as regular XML documents
    • The same processors can be used on both
• As with XML DTD, XML schema is based on the tree data model, with elements and attributes as the main structuring concepts
• Borrows additional concepts from database and object models, such as keys, references, and identifiers

XML Schema (cont’d.)
• XML schema concepts: schema descriptions and XML namespaces
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  • Identifies the specific set of XML schema language elements (tags) being used by specifying a URI, written in the form of a file stored at a Web site location; processors use it as an identifier and do not need to fetch it
  • http://www.w3.org/2001/XMLSchema is a commonly used standard for XML schema commands
  • Each such definition is called an XML namespace, because it defines the set of commands (names) that can be used
  • The name is bound to the prefix xsd (XML schema description) using the attribute xmlns (XML namespace), and this prefix is placed before all XML schema commands (tag names)
  • xsd:element or xsd:sequence, used later, refer to the definitions of the element and sequence tags in the namespace http://www.w3.org/2001/XMLSchema

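How a parser interprets the xsd prefix can be sketched with Python's standard xml.etree.ElementTree, which expands every prefixed tag to the form {namespace}localname. The two one-element schemas below are made up; the second uses the prefix xs instead of xsd to show that the prefix itself is only a local abbreviation.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

with_xsd = ET.fromstring(
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:element name="company"/>'
    "</xsd:schema>"
)
with_xs = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="company"/>'
    "</xs:schema>"
)

# Both roots expand to the same qualified name, whatever prefix was used.
print(with_xsd.tag)
print(with_xsd.tag == with_xs.tag)

# Searching needs a prefix-to-namespace mapping supplied by the caller.
element = with_xsd.find("xsd:element", {"xsd": XSD_NS})
print(element.get("name"))
```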
XML Schema (cont’d.)
• Annotations, documentation, and language used
  • xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML document
  • The attribute xml:lang of the xsd:documentation element specifies the language being used, where en stands for the English language
• Elements and types
  • The name attribute of the xsd:element tag specifies the element name, which is company for the root element in our example
  • The structure of the company root element is specified in our example as an xsd:complexType
  • The xsd:sequence structure of XML schema is used to further specify a sequence of departments, employees, and projects

XML Schema (cont’d.)
• First-level elements
  • Elements named employee, department, and project are first-level elements, and each is specified in an xsd:element tag
  • If a tag has only attributes and no further subelements or data within it, it can be ended with the slash symbol (/>) directly instead of having a separate matching end tag. Such an element is called an empty element.
• Element types, minOccurs, and maxOccurs
  • Specify the type and multiplicity of each element in any document that conforms to the schema specifications
  • When a type attribute is specified in an xsd:element, the structure of the element must be described separately, typically using the xsd:complexType element of XML schema. Examples: employee, department, and project elements

XML Schema (cont’d.)
  • If no type attribute is specified, the element structure can be defined directly following the tag. Example: the company root element
  • The minOccurs and maxOccurs attributes specify lower and upper bounds on the number of occurrences, similar to the *, +, and ? symbols of XML DTD
  • If they are not specified, the default is exactly one occurrence
• Keys
  • xsd:unique for specifying unique attributes
    • xsd:selector identifies the element type that contains the unique element
    • xsd:field identifies the element name within it that is unique
    • Examples: departmentNameUnique and projectNameUnique
  • xsd:key for specifying primary keys. Examples: projectNumberKey, departmentNumberKey
  • xsd:keyref for specifying foreign keys. Example: departmentManagerSSNKeyRef

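The intent of xsd:key and xsd:keyref can be sketched by checking the same constraints by hand. This is not a schema validator: the helpers values, is_key, and keyref_holds are hypothetical, the selector and field arguments are plain element paths, and the company document and its element names are assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up instance document; element names are assumed for the example.
doc = """<company>
  <department>
    <departmentNumber>5</departmentNumber>
    <departmentManagerSSN>333445555</departmentManagerSSN>
  </department>
  <department>
    <departmentNumber>4</departmentNumber>
    <departmentManagerSSN>987654321</departmentManagerSSN>
  </department>
  <employee><employeeSSN>333445555</employeeSSN></employee>
  <employee><employeeSSN>987654321</employeeSSN></employee>
</company>"""

root = ET.fromstring(doc)


def values(tree, selector, field):
    """Collect the field value from every element matched by the selector."""
    return [el.findtext(field) for el in tree.findall(selector)]


def is_key(tree, selector, field):
    """Key: the field is present in every selected element and never repeats."""
    found = values(tree, selector, field)
    return None not in found and len(found) == len(set(found))


def keyref_holds(tree, selector, field, key_selector, key_field):
    """Keyref: every referencing value must equal some existing key value."""
    keys = set(values(tree, key_selector, key_field))
    return all(v in keys for v in values(tree, selector, field))


number_is_key = is_key(root, "department", "departmentNumber")
manager_ref_ok = keyref_holds(
    root, "department", "departmentManagerSSN", "employee", "employeeSSN"
)
print(number_is_key, manager_ref_ok)
```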
XML Schema (cont’d.)
• Structures of complex elements
  • xsd:complexType specifies the structures of the complex elements. Examples: Department, Employee, Project, and Dependent
  • If there are no key constraints, subelements can be embedded within the parent element definition
• Composite attributes
  • Also specified as complex types
  • Examples: Address, Name, Worker, and WorksOn
  • These could have been directly embedded within their parent elements

Storing and Extracting XML Documents from Databases
Most common approaches for storing and extracting:
1. Using a DBMS to store the documents as text
  • A relational or object DBMS can be used to store whole XML documents as text fields within the DBMS records or objects
  • Can be used if the DBMS has a special module for document processing
  • Would work for storing schemaless and document-centric XML documents

Storing and Extracting XML Documents from Databases
2. Using a DBMS to store document contents as data elements
  • Would work for storing a collection of documents that follow a specific XML DTD or XML schema
  • Since the documents all have the same structure, a relational (or object) database can be designed to store the leaf-level data elements within the XML documents
  • Would require mapping algorithms to design a database schema that is compatible with the XML document structure as specified in the XML schema or DTD, and to recreate the XML documents from the stored data
  • These algorithms can be implemented either as an internal DBMS module or as separate middleware that is not part of the DBMS

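A toy version of the second approach, using Python's standard sqlite3 and xml.etree.ElementTree modules: the leaf-level values of a fixed-structure document are stored in a relational table, and the document is later recreated from the stored rows. The table layout, the Projects document, and the rebuild helper are assumptions for this sketch; real mapping algorithms must handle nesting, repetition, and optional elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up data-centric document with a fixed structure.
doc = (
    "<Projects>"
    "<Project><Name>ProductX</Name><Number>1</Number></Project>"
    "<Project><Name>ProductY</Name><Number>2</Number></Project>"
    "</Projects>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Mapping step 1: store the leaf-level data elements as column values.
for project in ET.fromstring(doc).findall("Project"):
    conn.execute(
        "INSERT INTO project (number, name) VALUES (?, ?)",
        (int(project.findtext("Number")), project.findtext("Name")),
    )


def rebuild(connection):
    """Mapping step 2: recreate the XML document from the stored rows."""
    root = ET.Element("Projects")
    rows = connection.execute("SELECT number, name FROM project ORDER BY number")
    for number, name in rows:
        project = ET.SubElement(root, "Project")
        ET.SubElement(project, "Name").text = name
        ET.SubElement(project, "Number").text = str(number)
    return ET.tostring(root, encoding="unicode")


rebuilt = rebuild(conn)
print(rebuilt == doc)
```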
Storing and Extracting XML Documents from Databases (cont’d.)
3. Designing a specialized system for storing native XML data
  • Based on the hierarchical (tree) model
  • Such systems are called native XML DBMSs
  • Would include specialized indexing and querying techniques, and would work for all types of XML documents
  • Could also include data compression techniques to reduce the size of the documents for storage
  • Examples of popular products offering native XML DBMS capability:
    • Tamino by Software AG
    • Dynamic Application Platform of eXcelon
    • Oracle also offers a native XML storage option

Storing and Extracting XML Documents from Databases (cont’d.)
4. Creating or publishing customized XML documents from preexisting relational databases
  • Since enormous amounts of data are already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the Web
  • This approach would use a separate middleware software layer to handle the conversions needed between the XML documents and the relational database

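The fourth approach can be sketched as a small piece of middleware that reads flat relational rows and publishes them as a nested XML document. The department and employee tables, their contents, and the publish_departments helper are all made up for this illustration; it uses only Python's standard sqlite3 and xml.etree.ElementTree modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up preexisting relational database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY,
        lname TEXT,
        dno INTEGER REFERENCES department(dnumber)
    );
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee VALUES ('123456789', 'Smith', 5), ('333445555', 'Wong', 5);
    """
)


def publish_departments(connection):
    """Turn flat rows into a tree: each department nests its employees."""
    root = ET.Element("Departments")
    departments = connection.execute(
        "SELECT dnumber, dname FROM department ORDER BY dnumber"
    ).fetchall()
    for dnumber, dname in departments:
        dept = ET.SubElement(root, "Department", number=str(dnumber))
        ET.SubElement(dept, "Name").text = dname
        employees = connection.execute(
            "SELECT ssn, lname FROM employee WHERE dno = ? ORDER BY ssn", (dnumber,)
        )
        for ssn, lname in employees:
            ET.SubElement(dept, "Employee", ssn=ssn).text = lname
    return root


published = publish_departments(conn)
print(ET.tostring(published, encoding="unicode"))
```

The foreign key from employee to department becomes parent-child nesting in the published document, which is one of several hierarchies the same relational data could be viewed through.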
Conclusion
• Three main types of data: structured, semi-structured, and unstructured
• XML standard
  • Tree-structured (hierarchical) data model
  • XML documents and the languages for specifying the structure of these documents
• There are several options for storing and extracting XML documents from databases