Applied XML Programming for Microsoft .NET
PART 3
XML Data Validation
The correctness of XML documents can be measured using two distinct and complementary
metrics: the well-formedness of the document and its validity.
Well-formedness refers to the overall syntax of the document. Validation applies at a deeper level and involves the semantics of the document, which must comply with a user-defined layout.
The XmlTextReader class ensures only that the document being processed is
syntactically correct. By design, the XmlTextReader class deliberately avoids making a
more advanced analysis of the nodes in the document and checking their internal
dependencies. A more specialized class is available in the Microsoft .NET Framework
for accomplishing this more complex task—the XmlValidatingReader class. This
chapter will focus on techniques and classes available in the .NET Framework to
perform validation on XML data.
Although validation is a key aspect in projects that involve critical document exchange
across heterogeneous platforms, it does come at a price. Validating a document means
taking a while to analyze the constituent nodes; the number, type, and values of their
attributes; and the node-to-node dependencies. When applications handle a fully
validated document, they can be certain not only about the overall syntax but even
about the contents. In a normal XML document, a node simply represents itself—a
rather generic repository of hierarchical information. In a validated XML document, on
the other hand, the same node to the application's eye represents a strongly typed and
strongly defined piece of information. Basically, in a validated document, a node
<invoice_number> ceases to be a node and becomes what it was intended to be—the
number of the invoice.
Clearly, a nonvalidating reader (and, more generally, a nonvalidating XML parser) will
run faster than a validating reader, and that's why XML parsers usually provide XML
validation as an option that can be programmatically toggled on and off. In .NET
applications, you use XmlTextReader if you simply need well-formedness; you resort to
XmlValidatingReader if you need to validate the document against a DTD or a schema.
The XmlValidatingReader Class
The XmlValidatingReader class is an implementation of the XmlReader class that
provides support for several types of XML validation: document type definitions (DTDs),
XML-Data Reduced (XDR) schemas, and XML Schemas. The XML Schema language
is also referred to as XML Schema Definition (XSD). DTD and XSD are official
recommendations issued by the W3C, whereas XDR is simply the Microsoft
implementation of an early working draft of XML Schemas that will be superseded by
XSD as time goes by.
You can use the XmlValidatingReader class to validate entire XML documents as well
as XML fragments. An XML fragment is a string of XML code that does not have a root
node. For example, the following XML string turns out to be a valid XML fragment but
not a valid XML document. XML documents must have a root node.
<firstname>Dino</firstname>
<lastname>Esposito</lastname>
The XmlValidatingReader class works on top of an XML reader—typically an instance
of the XmlTextReader class. The text reader is used to walk through the nodes of the
document, and then the validating reader gets into the game, validating each piece of
XML based on the requested validation type.
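To make the layering concrete, here is a minimal sketch that reads the same text twice: once through a plain XmlTextReader and once through an XmlValidatingReader wrapped around a text reader. The sample document, the class name ReaderContrast, and the OnError handler are illustrative assumptions, not code from the chapter. The document is well-formed but deliberately omits an attribute that its own internal DTD marks as required.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ReaderContrast
{
    // Hypothetical sample: well-formed, but the required "id"
    // attribute declared in the internal DTD is missing.
    const string Xml =
        "<!DOCTYPE day [" +
        "<!ELEMENT day (#PCDATA)>" +
        "<!ATTLIST day id CDATA #REQUIRED>" +
        "]>" +
        "<day>XML Core Classes</day>";

    static int errors;

    static void Main()
    {
        // Pass 1: the text reader checks syntax only.
        XmlTextReader plain = new XmlTextReader(new StringReader(Xml));
        while (plain.Read()) { }
        plain.Close();
        Console.WriteLine("Text reader reached the end of the document.");

        // Pass 2: the validating reader sits on top of a text reader and
        // reports validity problems through ValidationEventHandler.
        XmlTextReader inner = new XmlTextReader(new StringReader(Xml));
        XmlValidatingReader validator = new XmlValidatingReader(inner);
        validator.ValidationType = ValidationType.DTD;
        validator.ValidationEventHandler += new ValidationEventHandler(OnError);
        while (validator.Read()) { }
        validator.Close();
        Console.WriteLine("Validation errors reported: " + errors);
    }

    static void OnError(object sender, ValidationEventArgs e)
    {
        errors++;
        Console.WriteLine(e.Severity + ": " + e.Message);
    }
}
```

Note that if no handler is registered, the validating reader throws an exception on the first validation error instead of raising the event, so registering a handler is the usual way to collect every problem in one pass. In later versions of the .NET Framework, XmlValidatingReader is marked obsolete, so expect compiler warnings there.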
Supported Validation Types
What are the key differences between the validation mechanisms (DTD, XDR, and
XSD) supported by the XmlValidatingReader class?
DTD A DTD is a text file whose syntax stems directly from the Standard Generalized Markup Language (SGML)—the ancestor of XML as we know it today.
XDR XDR is a schema language based on a proposal submitted by Microsoft to the W3C back in
1998. (For more information, see http://www.w3.org/TR/1998/NOTE-XML-data-0105.) XDRs are
flexible and overcome some of the limitations of DTDs.
XSD XSD defines the elements and attributes that form an XML
document. Each element is strongly typed. Based on a W3C
recommendation, XSD describes the structure of XML documents using
another XML document.
DTD was considered the cross-platform standard until a couple of years ago. Then the
W3C ratified a newer standard, XSD, which is, technically speaking, far superior
to DTD.
The XmlValidatingReader Programming Interface
The XmlValidatingReader class inherits from the base class XmlReader but implements
internally only a small set of all the functionalities that an XML reader exposes. The
class always works on top of an existing XML reader, and many methods and
properties are simply mirrored from the underlying reader.
The dependency of validating readers on an existing text reader is particularly evident if
you look at the class constructors. An XML validating reader, in fact, can't be directly
initialized from a file or a URL. The list of available constructors comprises the following
overloads:
public XmlValidatingReader(XmlReader);
public XmlValidatingReader(Stream, XmlNodeType, XmlParserContext);
public XmlValidatingReader(string, XmlNodeType, XmlParserContext);
Different Treatments for XSD and XDR
Although you can store both XSD and XDR schemas in the schema collection, there
are some differences in the way in which the XmlSchemaCollection object handles
them internally. For example, the Add method returns an XmlSchema object if you add
an XSD schema but returns null if the added schema is an XDR. In general, any
method or property that manipulates the input or output of an XmlSchema object
supports XSD schemas only.
Another difference concerns the behavior of the Item property in the
XmlSchemaCollection class. The Item property takes a string representing the
schema's namespace URI and returns the corresponding XmlSchema object. This
happens only for XSDs, however. If you call the Item property on a namespace URI that
corresponds to an XDR schema, null is returned.
The reason behind the different treatments for XDR and XSD schemas is that XDR
schemas have no object model available in the .NET Framework, so when you need to
handle them through objects, the system gracefully ignores the requests.
XDR schemas are there only to preserve backward compatibility; you will not find them
supported outside the Microsoft Win32 platform. It is important to pay attention to the
methods and the properties you use to manage XDR in your code. The overall
programming interface makes the effort to unify the methods and the properties to work
on both XDRs and XSDs. But in some circumstances, those same methods and
properties might lead to unpleasant surprises.
In a nutshell, you can cache an XDR schema for further and repeated use by the
XmlValidatingReader class, but that's all that you can do. You can't check for the
existence of XDR schemas, nor can a reference to an XDR schema be returned. But
you can do this, and more, for XSDs.
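The XSD side of this behavior can be sketched as follows. The inline schema, the urn:sample-address namespace, and the class name SchemaCacheSketch are made up for illustration; only the XmlSchemaCollection calls come from the .NET Framework. The XDR case is described in a comment rather than executed, because it follows the text's description and is not demonstrated here.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class SchemaCacheSketch
{
    // Hypothetical one-element XSD schema with a made-up target namespace.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:sample-address'>" +
        "<xs:element name='zip' type='xs:string' />" +
        "</xs:schema>";

    static void Main()
    {
        XmlSchemaCollection cache = new XmlSchemaCollection();

        // Adding an XSD schema returns the corresponding XmlSchema object.
        XmlSchema added = cache.Add("urn:sample-address",
            new XmlTextReader(new StringReader(Xsd)));
        Console.WriteLine("Add returned an object: " + (added != null));

        // The indexer (the Item property) looks up an XSD by namespace URI.
        XmlSchema found = cache["urn:sample-address"];
        Console.WriteLine("Indexer returned an object: " + (found != null));

        // According to the text, both calls yield null when the schema
        // is an XDR, so check for null before using either result.
    }
}
```

The practical consequence is a defensive null check around anything that hands back an XmlSchema whenever XDR schemas might be in the cache.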
Validating XML Fragments
The XmlValidatingReader class can parse and validate entire documents as well as XML fragments.
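A fragment is handed to the reader through the constructor overload that takes a string, a node type, and a parser context. The following is a minimal sketch under a few assumptions: the class name FragmentSketch is invented, the parser context is built with default settings, and validation is switched off so that the example shows only how a rootless fragment is accepted. Validating the fragment would additionally require a DTD or a schema.

```csharp
using System;
using System.Xml;

class FragmentSketch
{
    static void Main()
    {
        // The rootless fragment used earlier in the chapter.
        string fragment =
            "<firstname>Dino</firstname><lastname>Esposito</lastname>";

        // The parser context supplies what a full document would normally
        // provide, such as the name table and the namespace scope.
        NameTable names = new NameTable();
        XmlParserContext context = new XmlParserContext(
            names, new XmlNamespaceManager(names), null, XmlSpace.None);

        // XmlNodeType.Element tells the reader to expect element content
        // rather than a complete document with a single root node.
        XmlValidatingReader reader = new XmlValidatingReader(
            fragment, XmlNodeType.Element, context);
        reader.ValidationType = ValidationType.None;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                Console.WriteLine(reader.Name);
        }
        reader.Close();
    }
}
```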
Using DTDs
The DTD validation guarantees that the source document complies with the validity
constraints defined in a separate file—the DTD. A DTD file uses a formal grammar to
describe both the structure and the syntax of XML documents. XML authors use DTDs
to narrow the set of tags and attributes allowed in their documents. Validating against a
DTD ensures that processed documents conform to the specified structure. From a
language perspective, a DTD defines a newer and stricter XML-based syntax and a
new tagged language tailor-made for a related group of documents.
Developing a DTD Grammar
Let's look more closely at a DTD file. To build a DTD, you normally start writing the file
according to its syntax. In this case, however, we'll start from an XML file named
data_dtd.xml that will actually be validated through the DTD, as shown here:
<?xml version="1.0" ?>
<!DOCTYPE class SYSTEM "class.dtd">
<!-- Sample XML document (data_dtd.xml) using a DTD -->
<class title="Applied XML Programming for .NET"
       company="DinoEsposito's Own Company"
       author="Dino Esposito">
  <days total="5" expandable="true">
    <day id="1">XML Core Classes</day>
    <day id="2">Related Technologies</day>
    <day id="3">XML and ADO.NET</day>
    <day id="4" optional="true">XML and Applications</day>
    <day id="5" optional="true">XML Interoperability</day>
  </days>
</class>
The document describes a class as a list of daily modules. General information about
the class (title, author, training company) is written using attributes. Each module
spans a full day, and its description is plain text.
Any XML document that must be validated against a given DTD file includes a
DOCTYPE tag through which it simply links to the DTD of choice, as shown here:
<!DOCTYPE class SYSTEM "class.dtd">
The following listing demonstrates a DTD that is tailor-made for the preceding XML document:
<!ELEMENT class (days)>
<!ATTLIST class title CDATA #REQUIRED
                author CDATA #IMPLIED
                company CDATA #IMPLIED>
<!ENTITY % Boolean "true | false">
<!ELEMENT days (day*)>
<!ATTLIST days total CDATA #REQUIRED
               expandable (%Boolean;) #REQUIRED>
<!ELEMENT day (#PCDATA)>
<!ATTLIST day id CDATA #REQUIRED
              optional (%Boolean;) #IMPLIED>
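With the document and its DTD in place, validation might be driven as in the sketch below. It assumes that data_dtd.xml and class.dtd both sit in the application's working directory; the class name DtdValidationSketch and the OnError handler are illustrative. The sketch separates the two kinds of failure: well-formedness errors arrive as an XmlException, whereas validity errors arrive through the validation event.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class DtdValidationSketch
{
    static bool valid = true;

    static void Main()
    {
        // Assumption: data_dtd.xml and class.dtd are in the working directory.
        XmlTextReader inner = new XmlTextReader("data_dtd.xml");
        XmlValidatingReader validator = new XmlValidatingReader(inner);
        validator.ValidationType = ValidationType.DTD;
        validator.ValidationEventHandler += new ValidationEventHandler(OnError);

        try
        {
            // Reading to the end is what triggers validation of each node.
            while (validator.Read()) { }
        }
        catch (XmlException e)
        {
            // Raised for syntax (well-formedness) errors, not validity errors.
            valid = false;
            Console.WriteLine("Not well-formed: " + e.Message);
        }
        finally
        {
            validator.Close();
        }

        Console.WriteLine(valid ? "Document is valid." : "Document is not valid.");
    }

    static void OnError(object sender, ValidationEventArgs e)
    {
        valid = false;
        Console.WriteLine("Line " + e.Exception.LineNumber + ": " + e.Message);
    }
}
```

A quick way to see the handler fire is to remove the title attribute from the class element, since the DTD declares it as #REQUIRED.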
Certainly XSDs provide you with more functions than DTDs can. For one thing,
schemas are all written in XML and don't require you to learn a new language. If you
look at our basic DTD example in this context, you might not be scared by its unusual
format. As you move from textbook examples and enter the tough real world, the
complexity of an inflexible language like DTD becomes more apparent.
XSDs provide you with a finer level of control over the cardinality of the tags and the
attribute types. In addition, XSDs can be used to set up a system of schema inheritance
in which more complex types are built atop existing ones.
Using XDR Schemas
As mentioned, XML-Data Reduced (XDR) schema validation is the result of a Microsoft
implementation of an early draft of what is today XSD. XDR was implemented for the
first time in the version of MSXML that shipped with Microsoft Internet Explorer 5.0,
back in the spring of 1999.
In the XDR schema specification, you'll find almost all of the ideas that characterize
XSDs today. The main reason for XDR support in the .NET Framework is backward
compatibility with existing MSXML-based applications. To enable these applications to
upgrade properly to the .NET Framework, XDR support has been retained intact. You
will not find XDR support anywhere else outside the Microsoft Windows platform,
however.
If you have used Microsoft ActiveX Data Objects (ADO), and in particular the library's
ability to persist the contents of a Recordset object to XML, you are probably a veteran
of XDR. In fact, the XML schema used to persist ADO 2.x Recordset objects to XML is
simply XDR.
What Is a Schema
A schema is an XML file (with typical extension .xsd) that describes the syntax and
semantics of XML documents using a standard XML syntax. An XML schema specifies
the content constraints and the vocabulary that compliant documents must
accommodate. For example, compliant documents must fulfill any dependencies
between nodes, assign attributes the correct type, and give child nodes the exact
cardinality.
The XML Schema specification is articulated into two distinct parts. Part I contains the
definition of a grammar for complex types—that is, composite XML elements. Part II
describes a set of primitive types—the XML type system—plus a grammar for creating
new primitive types, said to be simple types. New types are defined in terms of existing
types.
An XML schema also supports rather advanced and object-oriented concepts such as
type inheritance. In the .NET Framework, the Schema Object Model (SOM) provides a suite of classes held in
the System.Xml.Schema namespace to read a schema from an XSD file. These
classes also enable you to programmatically create a schema that can be either
compiled in memory or written to a disk file.
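As a rough illustration of the programmatic route, the sketch below builds a one-element schema in memory, compiles it, and writes it out as XSD text. The element name zip and the class name SomSketch are arbitrary choices. XmlSchema.Compile is the API of the Framework version the chapter targets; later versions flag it as obsolete in favor of XmlSchemaSet.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class SomSketch
{
    static void Main()
    {
        const string XsdNs = "http://www.w3.org/2001/XMLSchema";

        // Build the equivalent of <xs:element name="zip" type="xs:string" />.
        XmlSchemaElement zip = new XmlSchemaElement();
        zip.Name = "zip";
        zip.SchemaTypeName = new XmlQualifiedName("string", XsdNs);

        XmlSchema schema = new XmlSchema();
        schema.Items.Add(zip);

        // Compile in memory; any problem surfaces through the handler.
        schema.Compile(new ValidationEventHandler(OnError));

        // Serialize the schema as XSD text. A StreamWriter opened on a
        // disk file could be passed instead of the console writer.
        schema.Write(Console.Out);
    }

    static void OnError(object sender, ValidationEventArgs e)
    {
        Console.WriteLine(e.Severity + ": " + e.Message);
    }
}
```

Going the other way, the static XmlSchema.Read method takes a reader and a validation handler and returns an XmlSchema object for an existing XSD file.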
Simple and Complex Types
XML simple types consist of plain text and don't contain any other elements. Examples
of simple types are string, date, and various flavors of numbers (long, double, and
integer). XML complex types can include child elements and attributes. In practice, a
complex type is always rendered as an XML subtree. A complex type can be
associated only with an XML element node, whereas a simple type applies to both
elements and attributes.
[Figure: structure of the XSD type system]
Defining an XSD Schema
You have three options when creating an XSD schema. You can write it manually by
combining the various tags defined by the XML Schema specification. A more effective
option is represented by Visual Studio .NET, which provides a visual editor for XSD files
with full IntelliSense support. The third option is based on the XML Schema definition
tool (xsd.exe) mentioned in the previous section, which can infer the underlying schema
from any well-formed XML document.
Setting Up a Sample Schema
Let's start by creating a simple schema to describe an address. Like many real-world
objects, an address is rendered using a complex type, a kind of XML data
structure. The following code shows the schema for an address. It's a fairly simple
schema consisting of a sequence of five elements (street, number, city, state, and zip)
plus an attribute named country:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address" type="AddressType" />
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="number" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="state" type="xs:string" />
      <xs:element name="zip" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="country" type="xs:string" />
  </xs:complexType>
</xs:schema>
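To see the address schema at work, the sketch below validates a small instance document against it. The instance values are made-up sample data, and both the schema and the document are inlined as strings only to keep the example self-contained; in a real application they would more likely come from files. The class name AddressValidationSketch and the OnError handler are likewise illustrative.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class AddressValidationSketch
{
    // The address schema from the text, inlined as a string.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='address' type='AddressType' />
  <xs:complexType name='AddressType'>
    <xs:sequence>
      <xs:element name='street' type='xs:string' />
      <xs:element name='number' type='xs:string' />
      <xs:element name='city' type='xs:string' />
      <xs:element name='state' type='xs:string' />
      <xs:element name='zip' type='xs:string' />
    </xs:sequence>
    <xs:attribute name='country' type='xs:string' />
  </xs:complexType>
</xs:schema>";

    // Hypothetical instance document with made-up values.
    const string Doc = @"<address country='Italy'>
  <street>Via Roma</street>
  <number>1</number>
  <city>Rome</city>
  <state>RM</state>
  <zip>00100</zip>
</address>";

    static void Main()
    {
        XmlValidatingReader validator = new XmlValidatingReader(
            new XmlTextReader(new StringReader(Doc)));
        validator.ValidationType = ValidationType.Schema;

        // Passing null lets the collection take the namespace from the
        // schema itself, which here declares no target namespace.
        validator.Schemas.Add(null, new XmlTextReader(new StringReader(Xsd)));
        validator.ValidationEventHandler += new ValidationEventHandler(OnError);

        while (validator.Read()) { }
        validator.Close();
    }

    static void OnError(object sender, ValidationEventArgs e)
    {
        Console.WriteLine(e.Severity + ": " + e.Message);
    }
}
```

Because the schema uses xs:sequence, swapping two child elements in the instance (for example, city before number) is an easy way to provoke a validation message.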
Linking Documents and Schemas
You might want to know how an XML document can link to the schema. An XML
schema can be associated with document files in two ways: as in-line code or through
external references. The second option decouples the document instance and the
schema. The first option, on the other hand, simplifies deployment and data
transportation because all information resides in a single place.
XML validation is the parser's ability to verify that a given XML source document is
conformant to a specified layout. The intrinsic importance of validation, and related
technologies, can't be denied, but a few considerations must be kept in mind.
For one thing, XML documents and schema information must be distinct elements. This
improves performance when the document is transferred over the wire and keeps the
memory footprint as lean as possible. In addition, validating a document to make sure it
has the requested layout is not always necessary if the correctness of the data two
applications exchange can be ensured by design. If the documents sent and received
are generated programmatically and there is no (reasonable) way to hack them,
validation can be an unneeded burden. In this case, you can rate the schema
information as similar to debug information in Win32 executables: useful to speed up
the development cycle, but useless in a production environment.
The real big thing behind XML validation is XSD—a W3C specification to define the
structure, contents, and semantics of XML documents. XSD is another key element that
enriches the collection of official and de facto current standards for interoperable
software. It joins the group formed by HTTP for network transportation, XML for data
description, SOAP for method invocation, XSL for data transformation, and XPath for
queries.
With XSD, we have a standard but extremely rigorous way to describe the layout of the
document that leaves nothing to the user's imagination. XSD is the constituent
grammar for the XML type system, and thanks to the broad acceptance gained by XML,
it is a candidate to become a universal and cross-platform type system.
Further Reading
XML sprang to life in the late 1990s as a metalanguage scientifically designed to definitively push aside SGML. If you want to learn more
about this ancestor of XML, still in use in some legacy e-commerce applications, have a look at the tutorial available at
http://www.w3.org/TR/WD-html40-970708/intro/sgmltut.html.
In this chapter and in this book, you won't find detailed references to the syntax and structure of XML technologies. If you need to know all about DTD attributes and XSD components, you'll need to look elsewhere. One resource that I've found extremely valuable is Essential XML Quick Reference, written by Aaron Skonnard and Martin
Gudgin (Addison Wesley, 2001). This book is an annotated review of all the markup code around XML, including XSD, XSL, XPath, and SOAP—not coincidentally, the same XML standards fully supported by the .NET Framework. Another resource I would recommend is XML Pocket Consultant, written by William R. Stanek (Microsoft Press,
2002). For online resources, check out in particular http://www.xml.com.