XML – Data Model, DTD and Schema ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor...


XML – Data Model, DTD and Schema

ADVANCED DATABASES

Khawaja Mohiuddin
Assistant Professor

Department of Computer Sciences
Bahria University (Karachi Campus)

[email protected]
https://sites.google.com/site/khawajamcs

Content for this lecture is taken from Chapter 11 of “Database Systems: Models, Languages …”, 6th Ed., by Elmasri and Navathe (Chapter 12 of “Fundamentals of Database Systems”, 6th Ed., by Elmasri and Navathe)


Topics to Cover

  Structured, Semi-structured, and Unstructured Data
  XML Hierarchical (Tree) Data Model
  XML Documents
  XML DTDs
  XML Schema
  Storing and Extracting XML Documents from Databases


XML: Extensible Markup Language

Data sources
  Databases storing data for Internet applications
Hypertext documents
  Common method of specifying contents and formatting of Web pages
  Static Web pages vs. dynamic Web pages
XML data model
  Based on tree (hierarchical) structures, as compared to the flat relational data model structures
  Data extracted from relational databases can be formatted as XML documents to be exchanged over the Web


Structured, Semi-structured, and Unstructured Data

Structured data
  Represented in a strict format
  Example: information stored in databases
Semi-structured data
  May have a certain structure, but not all information collected will have identical structure
  No predefined schema
  Schema information is mixed in with data values, since each data object can have different attributes that are not known in advance


Structured, Semi-structured, and Unstructured Data (cont’d.)

Semi-structured data (cont’d.)
  Also referred to as self-describing data
  May be displayed as a directed graph
    Labels or tags on directed edges represent:
      Schema names
      Names of attributes
      Object types (or entity types or classes)
      Relationships
    Internal nodes represent individual objects or composite attributes
    Leaf nodes represent actual data values of simple (atomic) attributes
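The directed-graph view can be sketched with nested Python dictionaries and lists. This is only an illustration: the company data, the labels, and the helper name leaf_values are assumptions made for this sketch, not taken from the lecture. Dictionary keys play the role of edge labels, dictionaries are internal nodes, and plain values are leaf nodes.

```python
# Hypothetical data: the labels ("project", "name", ...) travel with the values,
# which is why semi-structured data is called self-describing.
company = {
    "project": [
        {"name": "ProductX", "number": 1, "location": "Bellaire"},
        {"name": "ProductY", "number": 2},  # no location: objects need not share a structure
    ]
}


def leaf_values(node, path=()):
    """Yield (edge-label path, atomic value) for every leaf node in the graph."""
    if isinstance(node, dict):        # internal node: one labeled edge per key
        for label, child in node.items():
            yield from leaf_values(child, path + (label,))
    elif isinstance(node, list):      # several edges that carry the same label
        for child in node:
            yield from leaf_values(child, path)
    else:                             # leaf node: atomic data value
        yield path, node


leaves = list(leaf_values(company))
```

Each entry of leaves pairs a path of labels with one atomic value, so the schema information (the labels) can be read off the data itself.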



Structured, Semi-structured, and Unstructured Data (cont’d.)

Unstructured data
  Very limited indication of the type of data
  Example: text document that contains information embedded within it
HTML tag
  Text that appears between angled brackets: <...>
End tag
  Tag with a slash: </...>


Structured, Semi-structured, and Unstructured Data (cont’d.)

HTML uses a large number of predefined tags
HTML documents
  Do not include schema information about the type of data
Static HTML page
  All information to be displayed is explicitly spelled out as fixed text in the HTML file


XML Hierarchical (Tree) Data Model

Elements and attributes
  Main structuring concepts used to construct an XML document
Simple elements
  Contain data values
Complex elements
  Constructed from other elements hierarchically
XML tag names
  Describe the meaning of the data elements in the document
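A minimal sketch of the simple/complex distinction, using Python's standard xml.etree.ElementTree module. The sample document, its tag names, and the helper classify are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the tag names are illustrative only.
DOC = """<project>
  <name>ProductX</name>
  <number>1</number>
  <worker>
    <ssn>123456789</ssn>
    <hours>32.5</hours>
  </worker>
</project>"""


def classify(root):
    """Map each tag name to 'complex' (has child elements) or 'simple' (holds a data value)."""
    kinds = {}
    for elem in root.iter():
        # len(elem) is the number of child elements of elem
        kinds[elem.tag] = "complex" if len(elem) else "simple"
    return kinds


kinds = classify(ET.fromstring(DOC))
```

Here project and worker come out as complex elements, while name, number, ssn, and hours are simple elements that carry the data values.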


XML Hierarchical (Tree) Data Model (cont’d.)

Called the tree model or hierarchical model
Three main types of XML documents
Data-centric XML documents
  Have many small data items that follow a specific structure, and hence may be extracted from a structured database
  Formatted as XML documents in order to exchange them over, or display them on, the Web
  Usually follow a predefined schema that defines the tag names
Document-centric XML documents
  Documents with large amounts of text, such as news articles or books
  Contain few or no structured data elements


XML Hierarchical (Tree) Data Model (cont’d.)

Hybrid XML documents
  May have parts that contain structured data and other parts that are predominantly textual or unstructured
  May or may not have a predefined schema
Schemaless XML documents
  Do not follow a predefined schema of element names and corresponding tree structure
  Semi-structured
  The value of the standalone attribute in the XML declaration is yes:
  <?xml version="1.0" standalone="yes"?>


XML Hierarchical (Tree) Data Model (cont’d.)

XML attributes
  Describe properties and characteristics of the elements (tags) within which they appear
  Can be used to hold the values of simple data elements; however, this is generally not recommended
  May reference another element in another part of the XML document
  It is common to use attribute values in one element as the references; this resembles the concept of foreign keys in relational databases
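The foreign-key analogy can be sketched as follows. The document, the attribute names id and dept, and the helper department_name are all hypothetical; the point is only that an attribute value in one element is used to look up another element elsewhere in the same document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: employee/@dept refers to department/@id.
DOC = """<company>
  <department id="d5"><name>Research</name></department>
  <department id="d4"><name>Administration</name></department>
  <employee dept="d5"><name>Smith</name></employee>
</company>"""

root = ET.fromstring(DOC)

# Index departments by their id attribute (the role a primary key would play).
departments = {d.get("id"): d for d in root.findall("department")}


def department_name(employee):
    """Follow the employee's dept attribute to the referenced department element."""
    target = departments[employee.get("dept")]
    return target.findtext("name")


smith = root.find("employee")
smith_department = department_name(smith)
```

The lookup through the departments index is the in-memory counterpart of a join on a foreign key.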


XML Documents, DTD, and XML Schema

Well-formed XML documents
  Have an XML declaration
    • Indicates the version of XML being used as well as any other relevant attributes
  Must follow the syntactic guidelines of the tree data model
    Should have a single root element
    Every element must include a matching pair of start and end tags within the start and end tags of its parent element
  Can be processed by generic processors that traverse the document and create an internal tree representation
  Well-formed XML documents can be schemaless
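A generic processor can test well-formedness without knowing any schema. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the three sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a generic XML parser can build a tree from the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Single root, properly nested tags, no schema needed.
good = '<?xml version="1.0" standalone="yes"?><projects><project/></projects>'
# End tags do not nest inside the parent's start and end tags.
crossed = "<a><b></a></b>"
# Two top-level elements, so there is no single root.
two_roots = "<a/><b/>"
```

Only the first string parses; the other two break the nesting rule and the single-root rule respectively.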


XML Documents, DTD, and XML Schema (cont’d.)

DOM (Document Object Model)
  A standard model with an associated set of API functions
  Allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document
  The whole document must be parsed beforehand to convert it to the standard DOM internal data structure representation
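A short sketch of DOM-style processing with Python's standard xml.dom.minidom module, which is one DOM implementation among many. The sample document and variable names are assumptions for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document used only for this sketch.
DOC = ("<projects>"
       "<project><name>ProductX</name></project>"
       "<project><name>ProductY</name></project>"
       "</projects>")

# The whole document is parsed into an in-memory tree before any access.
dom = parseString(DOC)

# Navigate the tree through DOM API calls.
names = [node.firstChild.data for node in dom.getElementsByTagName("name")]

# The tree can also be modified in memory and serialized again.
new_project = dom.createElement("project")
new_name = dom.createElement("name")
new_name.appendChild(dom.createTextNode("ProductZ"))
new_project.appendChild(new_name)
dom.documentElement.appendChild(new_project)
updated = dom.documentElement.toxml()
```

Because the full tree is held in memory, random access and updates are easy, at the cost of memory for large documents.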


XML Documents, DTD, and XML Schema (cont’d.)

SAX (Simple API for XML)
  Another API for processing XML documents on the fly
  Notifies the processing program through callbacks whenever a start or end tag is encountered
  Makes it easier to process large documents
  Allows for processing of streaming XML documents
    Tags are processed as they are encountered
    Also known as event-based processing
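Event-based processing can be sketched with Python's standard xml.sax module. The handler class TagCounter and the sample document are assumptions for illustration; the parser invokes the callbacks as each tag is read and never builds a tree.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts start tags and tracks nesting depth using only callbacks."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        # Called by the parser whenever a start tag is encountered.
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        # Called by the parser whenever an end tag is encountered.
        self.depth -= 1


# Hypothetical document used only for this sketch.
DOC = (b"<projects>"
       b"<project><name>ProductX</name></project>"
       b"<project><name>ProductY</name></project>"
       b"</projects>")

handler = TagCounter()
xml.sax.parseString(DOC, handler)
```

The handler keeps only a few counters, so memory use does not grow with document size, which is what makes this style suitable for large or streaming documents.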


XML Documents, DTD, and XML Schema (cont’d.)

Valid XML documents
  Document must be well formed and must follow a particular schema
  Start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file


XML Documents, DTD, and XML Schema (cont’d.)

XML DTD
  Data types in DTD are not very general
  Special syntax
    Requires specialized processors
  All DTD elements are always forced to follow the specified ordering of the document
    Unordered elements are not permitted


XML DTD

  First, the name of the root tag
  Then, the elements and their nested structure
  * after an element name means the element can be repeated zero or more times
  + after an element name means the element can be repeated one or more times
  ? after an element name means the element is optional (appears zero or one time)
  No symbol after an element name means the element must appear exactly once


XML DTD (cont’d.)

  The type of an element is specified via parentheses following the element name
  Parentheses may include the names of the children of the element
  #PCDATA or other data types in parentheses means a leaf node
  PCDATA (Parsed Character Data) is similar to a string data type
  The list of attributes can be specified via the keyword !ATTLIST
  The ID type of an attribute means it can be referenced from another attribute whose type is IDREF within another element
  Attributes can also be used to hold the values of simple data elements of type #PCDATA
  Parentheses can be nested when specifying elements
  A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the document
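The occurrence symbols, nested parentheses, and the bar behave much like regular-expression operators applied to the sequence of child tags. The sketch below makes that analogy concrete; it is not a DTD validator. It assumes content models that contain only element names (no #PCDATA), and the function names and the sample model are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element content model into a regex over 'tag ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "name " in the child list.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    # A comma means "followed by", which in a regex is plain concatenation.
    # The symbols *, +, ?, | and parentheses keep the same meaning.
    return re.compile(pattern.replace(",", ""))


def children_match(model, child_tags):
    """Check whether a list of child tag names fits the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None


# Hypothetical content model: Name and Number exactly once, then an optional
# Phone or Email, then zero or more Worker elements.
MODEL = "(Name, Number, (Phone | Email)?, Worker*)"
```

For example, children_match(MODEL, ["Name", "Number", "Email", "Worker"]) holds, while a child list that swaps Name and Number does not, which also illustrates that a DTD fixes the order of the children.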


XML DTD (cont’d.)

<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">

standalone="no" means the document needs to be checked against a separate DTD document or XML schema document
The separate DTD document named "proj.dtd" should be stored in the same file system as the XML document
Alternatively, we could include the DTD document text at the beginning of the XML document itself
XML DTD has several limitations:
  Data types are not very general
  It has its own special syntax and thus requires specialized processors
  All DTD elements are always forced to follow the specified ordering of the document
These drawbacks led to the development of XML schema


XML Schema

XML schema language
  Standard for specifying the structure of XML documents
  Uses the same syntax rules as regular XML documents
    The same processors can be used on both
  As with XML DTD, XML schema is based on the tree data model, with elements and attributes as the main structuring concepts
  Borrows additional concepts from database and object models, such as keys, references, and identifiers


XML Schema (cont’d.)

XML schema concepts: schema descriptions and XML namespaces
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  Identifies the specific set of XML schema language elements (tags) being used by specifying a file stored at a Web site location
  A commonly used standard for XML schema commands
  Each such definition is called an XML namespace, because it defines the set of commands (names) that can be used
  The file name is assigned to the variable xsd (XML schema description) using the attribute xmlns (XML namespace), and this variable is used as a prefix to all XML schema commands (tag names)
  xsd:element or xsd:sequence, used later, refer to the definitions of the element and sequence tags as defined in the file http://www.w3.org/2001/XMLSchema
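Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses Python's standard xml.etree.ElementTree on a small hypothetical schema fragment (it is not the lecture's full company schema). Note that this parser treats the namespace URI purely as an identifying string and expands each xsd: prefix into it; it does not download anything from that address.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, heavily simplified schema fragment.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="Department"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# The prefix xsd is replaced by the namespace it was bound to.
root_tag = root.tag

# Find every xsd:element declaration, in document order.
declared = [e.get("name") for e in root.iter("{%s}element" % XSD_NS)]
```

After parsing, root_tag is the namespace-qualified name of xsd:schema, which shows that the prefix is only shorthand for the namespace.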


XML Schema (cont’d.)

Annotations, documentation, and language used
  xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML document
  The attribute xml:lang of the xsd:documentation element specifies the language being used, where en stands for the English language
Elements and types
  The name attribute of the xsd:element tag specifies the element name, which is called company for the root element in our example
  The structure of the company root element is specified in our example as xsd:complexType
  The xsd:sequence structure of XML schema is used to further specify a sequence of departments, employees, and projects


XML Schema (cont’d.)

First-level elements
  Elements named employee, department, and project are first-level elements, and each is specified in an xsd:element tag
  If a tag has only attributes and no further subelements or data within it, it can be ended with the slash symbol (/>) directly instead of having a separate matching end tag; it is then called an empty element
Element types, minOccurs, and maxOccurs
  Specify the type and multiplicity of each element in any document that conforms to the schema specifications
  When specified as a type attribute in an xsd:element, the structure of the element must be described separately, typically using the xsd:complexType element of XML schema. Examples: employee, department, and project elements
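A sketch of how multiplicity is read from a schema. In XML Schema both minOccurs and maxOccurs default to 1 when omitted, so the helper below fills in that default and reports the closest DTD occurrence symbol. The schema fragment, the element names headquarters and motto, and the helper dtd_symbol are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; all elements here are empty elements ending in "/>".
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="Department" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="employee" type="Employee" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="headquarters" type="Address"/>
        <xsd:element name="motto" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def dtd_symbol(element):
    """Return the DTD occurrence symbol closest to the element's bounds."""
    low = element.get("minOccurs", "1")    # default lower bound is 1
    high = element.get("maxOccurs", "1")   # default upper bound is 1
    table = {("0", "unbounded"): "*", ("1", "unbounded"): "+",
             ("0", "1"): "?", ("1", "1"): ""}
    # Bounds such as 2..5 have no DTD symbol; show them explicitly instead.
    return table.get((low, high), "{%s,%s}" % (low, high))


sequence = ET.fromstring(SCHEMA).find(".//" + XSD + "sequence")
symbols = {e.get("name"): dtd_symbol(e) for e in sequence}
```

Unlike the three DTD symbols, the two attributes can express arbitrary bounds, which is why the fallback branch exists.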


XML Schema (cont’d.)

  If no type attribute is specified, the element structure can be defined directly following the tag. Example: company root element
  The minOccurs and maxOccurs attributes are used for specifying lower and upper bounds, similar to the *, +, and ? symbols of XML DTD
  If they are not specified, the default is exactly one occurrence
Keys
  xsd:unique for specifying unique attributes
    xsd:selector to identify the element type that contains the unique element
    xsd:field to identify the element name within it that is unique
    Examples: departmentNameUnique and projectNameUnique
  xsd:key for specifying primary keys. Examples: projectNumberKey, departmentNumberKey
  xsd:keyref for specifying foreign keys. Example: departmentManagerSSNKeyRef
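What a key and a keyref constraint demand of a document can be sketched by hand. The code below is not an XML Schema validator; it only mimics the selector/field idea with simple element paths. The sample document, its element names, and the helpers key_values and dangling_refs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document.
DOC = """<company>
  <department>
    <departmentNumber>5</departmentNumber>
    <departmentManagerSSN>333445555</departmentManagerSSN>
  </department>
  <employee><employeeSSN>333445555</employeeSSN></employee>
  <employee><employeeSSN>123456789</employeeSSN></employee>
</company>"""


def key_values(root, selector, field):
    """Collect the field of every selected element; reject missing or duplicate values."""
    values = [elem.findtext(field) for elem in root.findall(selector)]
    if None in values or len(set(values)) != len(values):
        raise ValueError("key violated for %s/%s" % (selector, field))
    return set(values)


def dangling_refs(root, selector, field, keys):
    """Return referencing values that do not match any key value."""
    found = (elem.findtext(field) for elem in root.findall(selector))
    return [value for value in found if value not in keys]


root = ET.fromstring(DOC)
employee_keys = key_values(root, "employee", "employeeSSN")
dangling = dangling_refs(root, "department", "departmentManagerSSN", employee_keys)
```

An empty dangling list means every manager SSN refers to an existing employee, which is the referential-integrity property that a keyref expresses.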


XML Schema (cont’d.)

Structures of complex elements
  xsd:complexType specifies the structures of the complex elements. Examples: Department, Employee, Project, and Dependent
  If there are no key constraints, subelements can be embedded within the parent element definition
Composite attributes
  Also specified as complex types
  Examples: Address, Name, Worker, and WorksOn
  These could have been directly embedded within their parent elements


Storing and Extracting XML Documents from Databases

Most common approaches for storing and extracting:

1. Using a DBMS to store the documents as text
  A relational or object DBMS can be used to store whole XML documents as text fields within the DBMS records or objects
  Can be used if the DBMS has a special module for document processing
  Would work for storing schemaless and document-centric XML documents
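A minimal sketch of the first approach, using SQLite from Python's standard library as a stand-in for the DBMS. The table xml_docs, its columns, and the sample article are assumptions for illustration. The database sees the document only as a text value; any XML-aware processing happens in the application after retrieval.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_docs (doc_id INTEGER PRIMARY KEY, title TEXT, body TEXT)")

# Hypothetical document-centric XML document, stored whole in a text column.
article = "<article><title>XML storage</title><para>Mostly free text.</para></article>"
conn.execute("INSERT INTO xml_docs (title, body) VALUES (?, ?)",
             ("XML storage", article))

# Retrieval returns the document unchanged.
stored = conn.execute("SELECT body FROM xml_docs WHERE title = ?",
                      ("XML storage",)).fetchone()[0]

# XML-aware processing happens outside the DBMS in this sketch.
first_para = ET.fromstring(stored).findtext("para")
```

Without a document-processing module in the DBMS, queries on the inside of the document are limited to plain text matching, which is why this approach suits schemaless and document-centric data.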


Storing and Extracting XML Documents from Databases

2. Using a DBMS to store document contents as data elements

  Would work for storing a collection of documents that follow a specific XML DTD or XML schema
  Since the documents' structure is the same, a relational (or object) database can be designed to store the leaf-level data elements within the XML documents
  Would require mapping algorithms to design a database schema that is compatible with the XML document structure as specified in the XML schema or DTD, and to recreate the XML documents from the stored data
  These algorithms can be implemented either as an internal DBMS module or as separate middleware that is not part of the DBMS
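The mapping step of the second approach can be sketched as follows, again with SQLite standing in for the DBMS. The document, the project table, and the one-column-per-leaf mapping are assumptions chosen by hand for this sketch; a real mapping algorithm would derive the table design from the DTD or XML schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-centric document in which every project has the same structure.
DOC = """<projects>
  <project><name>ProductX</name><number>1</number><location>Bellaire</location></project>
  <project><name>ProductY</name><number>2</number><location>Sugarland</location></project>
</projects>"""

conn = sqlite3.connect(":memory:")
# One column per leaf-level data element of <project>.
conn.execute(
    "CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT)")

# Shred each <project> element into one row.
rows = [(int(p.findtext("number")), p.findtext("name"), p.findtext("location"))
        for p in ET.fromstring(DOC).findall("project")]
conn.executemany("INSERT INTO project VALUES (?, ?, ?)", rows)

# The leaf values can now be queried with ordinary SQL.
locations = dict(conn.execute("SELECT name, location FROM project ORDER BY number"))
```

Once the leaf values are in columns, indexing and querying come for free from the relational engine, at the price of needing the reverse mapping to rebuild the documents.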


Storing and Extracting XML Documents from Databases (cont’d.)

3. Designing a specialized system for storing native XML data
  Based on the hierarchical (tree) model
  Such systems are called native XML DBMSs
  Would include specialized indexing and querying techniques, and would work for all types of XML documents
  Could also include data compression techniques to reduce the size of the documents for storage
  Examples of popular products offering native XML DBMS capability:
    Tamino by Software AG
    Dynamic Application Platform of eXcelon
    Oracle also offers a native XML storage option


Storing and Extracting XML Documents from Databases (cont’d.)

4. Creating or publishing customized XML documents from preexisting relational databases
  Since enormous amounts of data are already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the Web
  This approach would use a separate middleware software layer to handle the conversions needed between the XML documents and the relational database
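A tiny stand-in for such a middleware layer, sketched with SQLite and xml.etree.ElementTree from Python's standard library. The project table, its rows, the chosen tag names, and the function publish_projects are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical preexisting relational data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT)")
conn.executemany("INSERT INTO project VALUES (?, ?, ?)",
                 [(1, "ProductX", "Bellaire"), (2, "ProductY", "Sugarland")])


def publish_projects(connection):
    """Format the rows of the project table as one XML document."""
    root = ET.Element("projects")
    query = "SELECT number, name, location FROM project ORDER BY number"
    for number, name, location in connection.execute(query):
        project = ET.SubElement(root, "project")
        # Each column value becomes a simple element under <project>.
        ET.SubElement(project, "number").text = str(number)
        ET.SubElement(project, "name").text = name
        ET.SubElement(project, "location").text = location
    return ET.tostring(root, encoding="unicode")


xml_text = publish_projects(conn)
```

The query decides which part of the database is published, and the element-building code decides the document's hierarchy, so different XML views can be produced from the same tables.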


Conclusion

  Three main types of data: structured, semi-structured, and unstructured
  XML standard
    Tree-structured (hierarchical) data model
    XML documents and the languages for specifying the structure of these documents
  There are several options for storing and extracting XML documents from databases
