ISO 19757 – Document Schema Definition Languages (DSDL)
ISO 19757 - DSDL
ISO 19757 – Document Schema Definition Languages (DSDL)
Martin Bryan
Convenor, JTC1/SC34 WG1
Parts of DSDL
1. Overview
2. Regular-grammar-based validation (RELAX NG)
3. Rule-based validation (Schematron)
4. Namespace-based validation dispatching language (NVDL)
5. Datatypes
6. Path-based integrity constraints
7. Character repertoire validation
8. Declarative document architectures
9. Datatype- and namespace-aware DTDs
10. Validation management
Regular-grammar-based validation (RELAX NG)
• XML description of a data model
– Compact syntax is even simpler than DTDs
• Provides way of defining short-cuts
– More functional than parameter entities
• Provides context-dependent models
– Models can be amended when imported
• Supports namespaces and datatypes
– Any datatype, including W3C Schema datatypes
• Can import modules from multiple namespaces
– Can build multi-source schemas
Main components of RELAX NG
pattern ::= <element name="QName"> pattern+ </element>
  | <element> nameClass pattern+ </element>
  | <attribute name="QName"> [pattern] </attribute>
  | <attribute> nameClass [pattern] </attribute>
  | <group> pattern+ </group>
  | <interleave> pattern+ </interleave>
  | <choice> pattern+ </choice>
  | <optional> pattern+ </optional>
  | <zeroOrMore> pattern+ </zeroOrMore>
  | <oneOrMore> pattern+ </oneOrMore>
  | <list> pattern+ </list>
  | <mixed> pattern+ </mixed>
  | <ref name="NCName"/>
  | <parentRef name="NCName"/>
  | <empty/>
  | <text/>
  | <value [type="NCName"]> string </value>
  | <data type="NCName"> param* [exceptPattern] </data>
  | <notAllowed/>
  | <externalRef href="anyURI"/>
  | <grammar> grammarContent* </grammar>
Using the full syntax
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="document"/>
  </start>
  <define name="document">
    <element name="document">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="head">
      <interleave>
        <element name="organization">
          <choice>
            <value>ISO</value>
            <value>ISO/IEC</value>
          </choice>
        </element>
        <element name="document-type">
          <choice>
            <value>International Standard</value>
            <value>Technical Report</value>
            <value>Guide</value>
            <value>Publicly Available Specification</value>
            <value>Technical Specification</value>
            <value>International Standardized Profile</value>
          </choice>
        </element>
Alternative compact syntax
• Can produce a whole ISO standard using just:
namespace p = "http://relaxng.org/ns/proofsystem"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
formal = element p:* { attribute * { text }*, (formal|text)* }
inline &= formal*
block |= formal
block |= element grammarref|rngref { attribute src { xsd:anyURI } }
include "is.rnc"
• Can replace existing definitions with new ones
• Can extend definitions
– |= means “add this option to an existing OR group”
– &= means “add this option to an existing AND group”
• Can merge grammars
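The effect of the two combining operators can be pictured with a small model. The sketch below is not a RELAX NG implementation: the `Grammar` class and the starting patterns `para` and `emph` are hypothetical, chosen only to show how `|=` and `&=` extend a definition that already exists in an included schema.

```python
class Grammar:
    """Toy model of RELAX NG named definitions (hypothetical, for illustration)."""

    def __init__(self):
        # name -> (combinator, [patterns]); combinator is "choice", "interleave" or None
        self.defs = {}

    def define(self, name, pattern):
        """Model `name = pattern`: replaces any earlier definition."""
        self.defs[name] = (None, [pattern])

    def combine(self, name, pattern, how):
        """Model `name |= pattern` (how="choice") or `name &= pattern` (how="interleave")."""
        old_how, patterns = self.defs.get(name, (None, []))
        if old_how not in (None, how):
            raise ValueError(f"{name}: cannot mix {old_how} and {how}")
        self.defs[name] = (how, patterns + [pattern])

    def show(self, name):
        """Render the combined definition as a single pattern string."""
        how, patterns = self.defs[name]
        separator = {"choice": " | ", "interleave": " & ", None: ""}[how]
        return separator.join(patterns)


g = Grammar()
g.define("block", "para")                      # assumed to come from the included schema
g.combine("block", "formal", "choice")         # block |= formal
g.define("inline", "emph")                     # assumed to come from the included schema
g.combine("inline", "formal*", "interleave")   # inline &= formal*
print(g.show("block"))
print(g.show("inline"))
```

In this model a plain `define` discards what was there before, which corresponds to replacing a definition, while `combine` keeps the existing alternatives and adds one more.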
Rule-based validation (Schematron)
• “A Schematron schema contains natural-language assertions concerning a set of documents, marked up with various elements and attributes for testing these natural-language assertions, and for simplifying and grouping the assertions.”
• “A Schematron schema reduces to a non-chaining rule system whose terms are boolean functions invoking an external query language on the instance and other visible XML documents, with syntactic features to reduce specification size and to allow efficient implementation.”
Schematron example
<sch:rule context="failed-assert | successful-report">
  <sch:extends rule="second-level" />
  <sch:assert test="count(diagnostic-reference) + count(text) = count(*)">
    The <sch:name/> element should only contain a text element and diagnostic reference elements.
  </sch:assert>
  <sch:assert test="count(text) = 1">
    The <sch:name/> element should only contain a text element.
  </sch:assert>
  <sch:assert test="preceding-sibling::fired-rule | preceding-sibling::failed-assert | preceding-sibling::successful-report">
    A <sch:name/> comes after a fired-rule, a failed-assert or a successful-report.
  </sch:assert>
</sch:rule>
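To make the three assertions concrete, the sketch below hand-codes them in Python against a small sample document. It is not a Schematron processor: the sample document and the `check` function are assumptions, only direct children of the root are examined, and the inherited `second-level` rule is not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical validation report used as the instance being checked.
SAMPLE = """
<schematron-output>
  <fired-rule/>
  <failed-assert>
    <diagnostic-reference/>
    <text>Something went wrong.</text>
  </failed-assert>
  <successful-report>
    <text>First.</text>
    <text>Second.</text>
  </successful-report>
</schematron-output>
"""

ALLOWED_PREDECESSORS = {"fired-rule", "failed-assert", "successful-report"}


def check(root):
    """Return one message per failed assertion, in document order."""
    failures = []
    children = list(root)
    for index, element in enumerate(children):
        if element.tag not in ("failed-assert", "successful-report"):
            continue  # the rule context does not match this element
        texts = len(element.findall("text"))
        references = len(element.findall("diagnostic-reference"))
        # count(diagnostic-reference) + count(text) = count(*)
        if texts + references != len(list(element)):
            failures.append(f"{element.tag}: unexpected child elements")
        # count(text) = 1
        if texts != 1:
            failures.append(f"{element.tag}: expected exactly one text element")
        # at least one matching preceding sibling
        if not any(prev.tag in ALLOWED_PREDECESSORS for prev in children[:index]):
            failures.append(f"{element.tag}: no preceding fired-rule or result")
    return failures


print(check(ET.fromstring(SAMPLE)))
```

Each failed test yields a natural-language message, which mirrors the way a Schematron assertion pairs a boolean test with human-readable text.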
Schematron core elements
• active • assert • extends • include • let • name • ns • param • pattern • phase • report • rule • schema • value-of
Ancillary elements and attributes
• diagnostics element
• diagnostic element
• dir element
• emph element
• p element
• span element
• title element
• flag attribute
• fpi attribute
• icon attribute
• role attribute
• see attribute
• subject attribute
Namespace-based Validation Dispatching Language (NVDL)
• Allows data from different namespaces to be validated by different processes
– Can validate one namespace using RELAX NG, another using a DTD and a third using a W3C Schema
• Simple and full syntaxes
– Full syntax simplified to simple syntax before use
• All validation is done in context
– Slots are created to identify where data from alternative namespaces has been removed
• Allows attributes from different namespaces to be validated
• Elements and attributes in different namespaces are separated into separate “sections”
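The idea of splitting a document into sections can be sketched as follows. This is a simplified reading, not the NVDL algorithm: the `sections` function and the sample document are assumptions, a new section is taken to start wherever an element's namespace differs from its parent's, and attribute sections are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical compound document mixing the two namespaces used in the slides.
DOC = """<html xmlns="http://www.w3.org/2002/06/xhtml2"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <body>
    <xf:input><xf:label><em>Name</em></xf:label></xf:input>
  </body>
</html>"""


def namespace_of(element):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    return element.tag[1:].split("}")[0] if element.tag.startswith("{") else ""


def sections(root):
    """List (namespace, local name of the section root) in document order."""
    found = []

    def walk(element, parent_namespace):
        namespace = namespace_of(element)
        if namespace != parent_namespace:
            # The namespace changed, so this element starts a new section.
            found.append((namespace, element.tag.split("}")[-1]))
        for child in element:
            walk(child, namespace)

    walk(root, None)
    return found


for namespace, name in sections(ET.fromstring(DOC)):
    print(name, namespace)
```

Each section found this way could then be handed to whichever validator is registered for its namespace.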
NVDL example – HTML + XForms (1)
<rules xmlns="purl://dsdl.org/nvdl/ns/structure/1.0" xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <namespace ns="http://www.w3.org/2002/06/xhtml2">
    <validate schema="xhtml2.rng">
      <mode>
        <namespace ns="http://www.w3.org/2002/xforms">
          <validate schema="xforms.rng">
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <attach message="Skipped descendant XForms sections."/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <unwrap message="Skipped descendant XHTML2 sections."/>
              </namespace>
            </mode>
          </validate>
          …
NVDL example (2)
          <unwrap>
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <unwrap message="Skipped descendant XForms"/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <attach message="Any descendant XHTML2 sections"/>
              </namespace>
            </mode>
          </unwrap>
        </namespace>
      </mode>
    </validate>
  </namespace>
</rules>
Datatypes
• Allows multiple datatype sets to be defined
– W3C datatypes can be used as the base
• Will allow user-defined datatype primitives to be added
– Needed for extended date/period formats, etc.
• Will provide mechanism for defining complex patterns
– Patterns based on supertypes will be allowed
• Normalization of values, comparing results after normalization
– Convert local date formats to ISO 8601 then compare
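The normalize-then-compare step might look like the sketch below. The list of local formats and both function names are assumptions made for illustration; Part 5 itself would determine how a datatype declares its lexical forms.

```python
from datetime import datetime

# Hypothetical local formats; a real datatype library would declare its own.
LOCAL_FORMATS = ["%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d"]


def to_iso8601(value):
    """Normalize a date written in any known local format to YYYY-MM-DD."""
    for fmt in LOCAL_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date: {value!r}")


def same_date(first, second):
    """Compare two dates only after both have been normalized."""
    return to_iso8601(first) == to_iso8601(second)


print(to_iso8601("01/03/2005"))
print(same_date("01/03/2005", "1.3.2005"))
```

Comparing the normalized forms rather than the raw strings is what lets two differently written values count as equal.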
Possible form for Part 5
<datatype name="price">
  <supertype name="decimal">
    <cast>
      <if test="not(sign='-')">
        <copy-of select="whole-part"/>
        <text>.</text>
        <my:fraction-part>
          <value-of select="substring(concat(fraction-part, '00'), 1, 2)"/>
        </my:fraction-part>
      </if>
    </cast>
  </supertype>
</datatype>
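The cast above pads or truncates the fraction to two digits. A rough Python equivalent of that logic is sketched here; `cast_price` and its three string arguments are hypothetical stand-ins for the `sign`, `whole-part` and `fraction-part` components the draft syntax refers to.

```python
def cast_price(sign, whole_part, fraction_part):
    """Mirror the <cast> sketch: a non-negative value becomes 'whole.ff'."""
    if sign == "-":
        return None  # the sketch defines no output for negative values
    # Same effect as substring(concat(fraction-part, '00'), 1, 2):
    # append two zeros, then keep only the first two characters.
    two_digits = (fraction_part + "00")[:2]
    return f"{whole_part}.{two_digits}"


print(cast_price("", "12", "5"))
print(cast_price("", "3", ""))
print(cast_price("", "7", "999"))
```

Note that a longer fraction is truncated rather than rounded, which follows directly from the substring call in the draft.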
Path-based integrity constraints
• Non-hierarchical links between information items in a structured resource can be identified by addressing items within the document tree and then expressing the relationship between them.
• Provides a method for identifying information items dependent on ancestry or the use of keys
• And a method for describing the role of relationships that are not hierarchical
• Allows selection of fragments to be validated
• Will include an extensible basis for supporting mechanisms not currently available
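One familiar kind of non-hierarchical constraint is a reference that must match a key elsewhere in the tree. The sketch below shows such a check; since Part 6 had not been drafted, the function, the element names `clause` and `xref`, and the attribute names are all assumptions, and the paths use the limited XPath subset that Python's ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: xref elements point at clause identifiers.
DOC = """<doc>
  <clause id="c1"/>
  <clause id="c2"/>
  <xref target="c2"/>
  <xref target="c9"/>
</doc>"""


def dangling_refs(root, key_path, key_attr, ref_path, ref_attr):
    """Return reference values that match no key addressed by key_path."""
    keys = {element.get(key_attr) for element in root.findall(key_path)}
    return [
        element.get(ref_attr)
        for element in root.findall(ref_path)
        if element.get(ref_attr) not in keys
    ]


print(dangling_refs(ET.fromstring(DOC), ".//clause", "id", ".//xref", "target"))
```

Both ends of the relationship are located by path, and the constraint is expressed on the pair rather than on either element's position in the hierarchy.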
Character repertoire validation
• User-defined character sets that can be used to validate the contents of elements or attributes
– Will be able to check that only characters relevant for a particular language are used, not all those in a particular Unicode character block
• Schematron-like rules for associating character repertoires with a particular element or attribute
<sch:rule context="*[/*[@xml:lang='nl']]">
  <sch:assert test="\p{IsBasicLatin}\p{IsLatin-1Supplement} IJij\p{IsGeneralPunctuation}\p{IsCurrencySymbols}">
    If this document is a Dutch document, it should have only characters used in typical Dutch publishing.
  </sch:assert>
</sch:rule>
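A repertoire check of this kind reduces to testing each character against a set of permitted ranges. The sketch below uses the Unicode block ranges named in the rule. It assumes that "IJij" in the transcript stands for the two Dutch ligature characters U+0132 and U+0133, and `outside_repertoire` is a hypothetical helper, not part of any draft of Part 7.

```python
# Unicode ranges for the blocks named in the rule above.
DUTCH_REPERTOIRE = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0132, 0x0133),  # IJ and ij ligatures (assumed reading of "IJij")
    (0x2000, 0x206F),  # General Punctuation
    (0x20A0, 0x20CF),  # Currency Symbols
]


def outside_repertoire(text, repertoire=DUTCH_REPERTOIRE):
    """Return the characters of text that fall outside every permitted range."""
    return [
        char
        for char in text
        if not any(low <= ord(char) <= high for low, high in repertoire)
    ]


print(outside_repertoire("IJsselmeer \u20ac 5"))
print(outside_repertoire("\u0141\u00f3d\u017a"))
```

An empty result means the text stays within the repertoire; any returned characters are the ones a validator would report.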
Declarative document architectures
• Allows locally meaningful names to be assigned to schema components
– 80/20 rule allows many functions of abstract classes
• Allows predefined fragments to be defined within schema
– Reintroduces entity definitions in a more controllable form
– May contain optional components
• Can even re-define entity names
– No longer restricted to English-based prompts to reference standard entity references such as
• Allows elements/attributes to be removed in defined contexts
Datatype/Namespace-aware DTDs
• Shows how the ISO 8879/XML Document Type Definition (DTD) syntax can be extended to validate documents that make full use of XML Namespaces and Part 5 Datatypes
• May be extended to add character repertoire validation
• Will allow DTDs to be used to validate any XML document, including those defined using Part 2
• Will allow SGML documents to be treated as input to ISO 19757 validation processes
Validation management
• Includes a mechanism to invoke parsers which read non-XML sources (and XML sources that can't be identified by a single URI) to create XML Infosets that can be used for subsequent processing
• Allows pre-validation transformations to be used to normalize and/or subset documents before validation
• Multiple validations and transformations may be applied
• Transformations will be able to split a document into multiple resulting documents
• Includes facilities to generate customized validation reports which can be output as XML document instances that can be processed by other applications
Possible format for Part 10
<framework>
  <rule>
    <instance>
      <transform transformation="normalize.xslt"/>
    </instance>
    <assert>
      <isValid schema="my-schema.rng"/>
      <isValid schema="my-schema.sch"/>
    </assert>
  </rule>
</framework>
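The flow this framework describes (transform first, then run several validations, then emit an XML report) can be sketched in Python. Everything here is an assumption made for illustration: `run_framework`, the report vocabulary, and the two toy validators, which merely stand in for real RELAX NG and Schematron processors.

```python
import xml.etree.ElementTree as ET


def run_framework(source, transforms, validators):
    """Apply each transform in order, then run every validator.

    `validators` maps a schema name to a function that returns a list of
    error messages (an empty list means valid). Returns a report element.
    """
    document = ET.fromstring(source)
    for transform in transforms:
        document = transform(document)
    report = ET.Element("report")
    for schema, validate in validators.items():
        errors = validate(document)
        result = ET.SubElement(report, "result", schema=schema,
                               valid=str(not errors).lower())
        for message in errors:
            ET.SubElement(result, "error").text = message
    return report


def lowercase_tags(document):
    """Toy normalization step standing in for normalize.xslt."""
    for element in document.iter():
        element.tag = element.tag.lower()
    return document


def root_is_document(document):
    """Toy stand-in for a grammar-based check."""
    return [] if document.tag == "document" else [f"unexpected root {document.tag}"]


def has_title(document):
    """Toy stand-in for a rule-based check."""
    return [] if document.find("title") is not None else ["missing title"]


VALIDATORS = {"my-schema.rng": root_is_document, "my-schema.sch": has_title}
report = run_framework("<Document><TITLE>DSDL</TITLE></Document>",
                       [lowercase_tags], VALIDATORS)
print(ET.tostring(report, encoding="unicode"))
```

Because the report is itself an XML element, it can be serialized and passed on to another application, which is the point of the last bullet on the previous slide.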
Current status
• Published
– Part 2, RELAX NG
• At Committee Draft stage
– Part 3, Schematron
– Part 4, NVDL
• Working Draft under consideration
– Part 1, Overview
– Part 7, Character repertoire validation
– Part 8, Declarative document architectures
– Part 10, Validation management
• Parts 5, 6 & 9 not yet drafted
Tracking progress
• Via your national standards body
– IST/41 at BSI
• Via XML UK or any ISUG chapter
– Martin Bryan is XML UK representative on IST/41 and ISUG representative for SC34/WG1
• Via the DSDL public website
– http://www.dsdl.org