Schema Strategy



Transcript of Schema Strategy

OCLC - XML Schema Strategy DRAFT

Purpose of this document:

We use XML schemas to define the message payloads that can be exchanged between clients and our services. This document describes how these schemas shall be handled, for example where they should be stored and how they can be imported into development projects where needed.

Why is this necessary?

Currently there is no OCLC-wide agreement on how XML schemas are maintained. A centralized schema repository offers several advantages:

1. It helps eliminate redundant schemas, for example identical copies of a schema controlled as part of the source in multiple software projects (rather than being treated as a dependency).
2. It helps eliminate redundant data type definitions in new schemas by promoting (requiring) reuse of data types already defined in schemas in the central repository.
3. It helps eliminate redundant element definitions. Element types such as language, currency and country should be defined only once and reused wherever they are needed in other XML schemas.

Schemas as archetypes and dependencies

Ideally, all new OCLC XML schemas should be controlled in their own sub-project of the XMLSchemas Subversion project. Each sub-project creates its own artifact; this artifact can then be used as a dependency by software projects and other schema projects. Having sub-projects that produce versioned release artifacts (schema versions with unique URLs) allows each schema to be versioned and released independently. Each schema version can then be defined as a Maven dependency, i.e.:

    <dependency>
      <groupId>org.oclc.schemas</groupId>
      <artifactId>MySchema</artifactId>
      <version>1.1</version>
    </dependency>

As schema releases are deployed as Maven artifacts, their official public URIs follow the pattern:

    http://worldcat.org/xmlschemas/SchemaName/version/SchemaName-version.xsd
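As a worked instance of the pattern, a schema named MySchema released as version 1.1 would resolve to http://worldcat.org/xmlschemas/MySchema/1.1/MySchema-1.1.xsd. The sketch below shows an instance document pointing a validator at that location. The root element name and the namespace URI are assumptions made up for illustration, not taken from a real OCLC schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document: element name and namespace are illustrative only -->
<myRoot xmlns="http://worldcat.org/xmlschemas/MySchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://worldcat.org/xmlschemas/MySchema
                            http://worldcat.org/xmlschemas/MySchema/1.1/MySchema-1.1.xsd"/>
```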

Note: The URIs for pre-release or SNAPSHOT versions of XML schemas will reference internal OCLC Maven repositories. Worldcat.org is a public-facing, read-only, partial mirror of the OCLC enterprise Archiva repository (http://svn.dev.oclc.org:10000/archiva/index.action), so when an XMLSchemas sub-project is released and the schema artifact is deployed to that repository, it becomes available via http://worldcat.org/xmlschemas/ (it gets pulled over on the first external access request).

Creating a schema project

Note: The archetype needs to be adjusted to use the schema root.

1. Request the new schema sub-project under XMLSchemas (e.g. XMLSchemas/foo) via ServiceNow.
2. After the project has been created, check out the XML schema project to directory foo:

    svn co http://svn.dev.oclc.org/svn/XMLSchemas/foo foo

3. Create a Maven project from the XMLSchema archetype:

    mvn archetype:generate \
      -DarchetypeRepository=http://artifactory.dev.oclc.org/artifactory/development-internal \
      -DarchetypeGroupId=org.oclc.maven.archetypes \
      -DarchetypeArtifactId=standalone-schema-archetype \
      -DarchetypeVersion=1.0-SNAPSHOT \
      -Dversion=1.0-SNAPSHOT \
      -DgroupId=org.oclc.schemas \
      -DartifactId=foo

4. Put the (valid) foo.xsd into foo/trunk/src/main/schemas/foo.xsd.
5. Create a (valid) foo/trunk/src/main/resources/fooExample.xml instance document.
6. Tweak trunk/pom.xml.
7. Run mvn test to verify that the project builds and things are valid.
8. Run svn add pom.xml src, then svn propset svn:ignore -F /tmp/foo . (where /tmp/foo contains a list of glob patterns to ignore, such as .idea and *.iml), and then svn commit -m whatever.
9. Run the mvn deploy goal and verify the artifact in Artifactory.
10. After significant testing, publish the 1.0 release to Archiva by using Self Service (see the schema release process later in this document).

The XMLSchemas project structure

If a sub-project is created within the XMLSchemas repository, there are two structures to be aware of: the Maven project structure and the file directory structure.

Note: Sub-projects can be created via a Maven archetype: http://artifactory.dev.oclc.org/artifactory/development-internal/org/oclc/maven/archetypes/standalone-schema-archetype/1.0-SNAPSHOT/standalone-schema-archetype-1.0-SNAPSHOT.pom

POM file structure

All schema sub-projects should define the schema root project as their parent. The root project contains all overall Maven project settings, such as plugins and versions, that are common to all the schema projects.

    <parent>
      <groupId>org.oclc.schemas</groupId>
      <artifactId>SchemaRoot</artifactId>
      <version>1.1</version>
    </parent>

Comment by Jon Fausey: This needs to be discussed in more depth (outside the text of this document by the authors).

The Maven project structure as a tree looks like this:

    - Schema Root Project
      - Schema Project
        - Any optional child projects, e.g. when nested schemas are used

File directory structure

The file directory structure is the same as in any other SVN directory. The trunk contains the current SNAPSHOT version, and the releases can be found within branches and tags.

    - schema project
      - trunk
        - src
          - main
            - resources (for test data)
            - schemas (for the schema files)

Working with namespaces

Every service schema should define its own unique namespace identifier following the pattern:

    http://worldcat.org/xmlschemas/MyNamespaceID

where MyNamespaceID is a meaningful association to the service.

The namespace definition of a schema ready for release should look like this:
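A minimal sketch of what such a declaration might look like. MyNamespaceID is the placeholder from the pattern above. Where exactly the Maven placeholder belongs is not shown in this text, so carrying it in the `version` attribute of `xs:schema` is an assumption, as is the use of `elementFormDefault`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the position of ${pom.version} is assumed, not taken from the source -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/MyNamespaceID"
           targetNamespace="http://worldcat.org/xmlschemas/MyNamespaceID"
           elementFormDefault="qualified"
           version="${pom.version}">
  <!-- element and type definitions go here -->
</xs:schema>
```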

The pom.version placeholder will be replaced by Maven during the build process. It is also important that the schema's file name and the namespace are consistent, meaning they should be the same.

Working with imports

When working with nested schemas it becomes necessary to define import, include or redefine statements within the schemas. If the schema that is imported has not been released yet, it needs to be imported from its snapshot repository location rather than the release repository location (http://worldcat.org/xmlschemas/...). This is done using a schemaLocation attribute value. Note that the namespace attribute value URI should always follow the http://worldcat.org/xmlschemas/... pattern, even during the pre-release (snapshot) phase of the schema's development.
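A sketch of a pre-release import under these rules. The imported schema name (CommonTypes), its version, and the snapshot host and path are all hypothetical placeholders; the real internal repository location is not given in this text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:common="http://worldcat.org/xmlschemas/CommonTypes"
           targetNamespace="http://worldcat.org/xmlschemas/MyNamespaceID">
  <!-- The namespace keeps the worldcat.org pattern; only schemaLocation points
       at the snapshot copy. Host and path below are placeholders. -->
  <xs:import namespace="http://worldcat.org/xmlschemas/CommonTypes"
             schemaLocation="http://snapshot-repo.example/org/oclc/schemas/CommonTypes/1.1-SNAPSHOT/CommonTypes-1.1-SNAPSHOT.xsd"/>
</xs:schema>
```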

Depending on the schema, the import must be switched to the worldcat.org schema location either before or after the imported schema has been released.
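After the switch, an import might look like the following sketch, which applies the public URI pattern for released schemas. The schema name CommonTypes and version 1.1 are hypothetical and used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:common="http://worldcat.org/xmlschemas/CommonTypes"
           targetNamespace="http://worldcat.org/xmlschemas/MyNamespaceID">
  <!-- Released location: SchemaName/version/SchemaName-version.xsd on worldcat.org -->
  <xs:import namespace="http://worldcat.org/xmlschemas/CommonTypes"
             schemaLocation="http://worldcat.org/xmlschemas/CommonTypes/1.1/CommonTypes-1.1.xsd"/>
</xs:schema>
```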

Schema versioning:

The basic schema version numbering should be in the form:

    A.B[-SNAPSHOT]

where A is the major version number, which changes only when a significant change to the service is made, and B is the minor version number, which changes when a minor enhancement to the service requires a minor change to the schema. SNAPSHOT is used during the development phase and marks a schema as a work in progress.

The Common Schema Projects

Within XMLSchemas there are so-called Common projects. These projects contain base schemas or base element type definitions to be reused within other schemas. (These are not yet fully mature, widely vetted, or comprehensive, but they are a good start.) This eliminates the need to maintain redundant information (type and element definitions) throughout other schema projects and allows enterprise-level changes to be made in a minimal set of schemas.

Common Schemas

Common schemas are like abstract Java classes: they can be used as templates from which other schemas inherit.

Common Types

Common types can be reused and extended in multiple schemas, and have the same meaning all over OCLC. Types such as Currency, Language and Countries are good examples.

Schema guidelines

Some guidelines for writing good schema definitions:

- To have only one root element in your schema, avoid using ref="" to reference other elements.
- Work with types where you can. (Avoid inlining a type definition anonymously within an element definition.)
- Document your elements with annotation/documentation.
- Do NOT control copies of XML schemas in other projects!!! (This allows them to drift out of sync without people realizing it.) Include them as proper dependencies and use their URLs.

The schema release process

To release schema projects you need them set up in the Middleware Self Service tool. This is a straightforward process:

1. Log in to the Middleware Self-Service System.
2. Select the schema you want to release.

3. Create a branch and build the project.

4. In the branch, change all the import statements (if there are any) from using the Archiva location to using worldcat.org.

5. Tag the branch.

After these steps the schema is available under worldcat.org/xmlschemas.

Importing schemas to service projects

There are multiple ways to import schemas into service projects. The simplest is to define a Maven dependency, but the schema will almost certainly be used by tools like JAXB to generate binding classes. This can be done easily with Maven and the maven-jaxb2-plugin. Basically there are three ways to do this. The first two work well for simple, non-nested schemas (compare http://confluence.highsource.org/display/MJIIP/User+Guide).

Compiling the schema from a URL

    true http://worldcat.org/xmlschemas/MySchemaName/${version}/MySchemaName-${version}.xsd
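The configuration values above appear to have lost their element markup. One plausible reading, sketched against the maven-jaxb2-plugin's `schemas`/`schema`/`url` configuration elements, is shown below. Treating the leading `true` as `forceRegenerate` is an assumption, and the surrounding plugin declaration is added only to make the fragment complete.

```xml
<plugin>
  <groupId>org.jvnet.jaxb2.maven2</groupId>
  <artifactId>maven-jaxb2-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>generate</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- Assumption: the bare "true" belonged to forceRegenerate -->
    <forceRegenerate>true</forceRegenerate>
    <schemas>
      <schema>
        <!-- Schema is fetched from its released worldcat.org location -->
        <url>http://worldcat.org/xmlschemas/MySchemaName/${version}/MySchemaName-${version}.xsd</url>
      </schema>
    </schemas>
  </configuration>
</plugin>
```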

Compiling (generating code from) the schema from a Maven artifact

    true org.oclc.schemas MySchema ${project.version} MySchema.xsd
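These values also seem to have lost their markup. A sketch of how they might map onto the maven-jaxb2-plugin's `dependencyResource` element follows. As before, reading the leading `true` as `forceRegenerate` is an assumption; the fragment is meant to sit inside the plugin's `configuration` section.

```xml
<configuration>
  <!-- Assumption: the bare "true" belonged to forceRegenerate -->
  <forceRegenerate>true</forceRegenerate>
  <schemas>
    <schema>
      <!-- Resolve MySchema.xsd from inside the schema's Maven artifact -->
      <dependencyResource>
        <groupId>org.oclc.schemas</groupId>
        <artifactId>MySchema</artifactId>
        <version>${project.version}</version>
        <resource>MySchema.xsd</resource>
      </dependencyResource>
    </schema>
  </schemas>
</configuration>
```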

If nested schemas are used, things get a bit more complicated, as they need to be compiled in the reverse order of import: the base schema that contains no imports is compiled first, and the top-level schema is compiled last. This can be done with a two-step process:

1. Import the artifact with the Maven remote resources plugin:

    org.apache.maven.plugins maven-remote-resources-plugin process generate-sources process org.oclc.schemas.MySchema:MySchema:${version}

2. Create an execution step for every schema and pipe the created episode file to the next execution step:

    org.jvnet.jaxb2.maven2 maven-jaxb2-plugin Common generate-sources generate 2.2 true ${project.build.directory}/maven-shared-archive-resources/schemas MySchema.xsd ${project.basedir}/src/main/xsd MyBindingFile.xjb true true -Xannotate -XhashCode -Xequals -XtoString ${project.build.directory}/xjc/my.episode org.jvnet.jaxb2_commons jaxb2-basics-annotate ${jaxb2BasicsVersion} org.jvnet.jaxb2_commons jaxb2-basics ${jaxb2BasicsVersion} -Xannotate -XhashCode -Xequals -XtoString -b ${project.build.directory}/xjc/my.episode .
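The two steps above can be sketched as a single `build` section. The values are the ones that survive in the text; the element names are a best-effort mapping onto parameters that the two plugins provide, not a confirmed copy of the original POM. In particular, reading `2.2 true` as `specVersion` plus `extension`, and the two later bare `true` values as `forceRegenerate` and `episode`, is an assumption.

```xml
<build>
  <plugins>
    <!-- Step 1: unpack the schema artifact under
         ${project.build.directory}/maven-shared-archive-resources -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-remote-resources-plugin</artifactId>
      <executions>
        <execution>
          <id>process</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>process</goal>
          </goals>
          <configuration>
            <resourceBundles>
              <resourceBundle>org.oclc.schemas.MySchema:MySchema:${version}</resourceBundle>
            </resourceBundles>
          </configuration>
        </execution>
      </executions>
    </plugin>

    <!-- Step 2: one execution per schema, base schema first -->
    <plugin>
      <groupId>org.jvnet.jaxb2.maven2</groupId>
      <artifactId>maven-jaxb2-plugin</artifactId>
      <executions>
        <execution>
          <id>Common</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>generate</goal>
          </goals>
          <configuration>
            <!-- Assumed mapping of "2.2 true" -->
            <specVersion>2.2</specVersion>
            <extension>true</extension>
            <schemaDirectory>${project.build.directory}/maven-shared-archive-resources/schemas</schemaDirectory>
            <schemaIncludes>
              <include>MySchema.xsd</include>
            </schemaIncludes>
            <bindingDirectory>${project.basedir}/src/main/xsd</bindingDirectory>
            <bindingIncludes>
              <include>MyBindingFile.xjb</include>
            </bindingIncludes>
            <!-- Assumed mapping of the two bare "true" values -->
            <forceRegenerate>true</forceRegenerate>
            <episode>true</episode>
            <args>
              <arg>-Xannotate</arg>
              <arg>-XhashCode</arg>
              <arg>-Xequals</arg>
              <arg>-XtoString</arg>
            </args>
            <!-- Episode file written here and handed to the next execution -->
            <episodeFile>${project.build.directory}/xjc/my.episode</episodeFile>
            <plugins>
              <plugin>
                <groupId>org.jvnet.jaxb2_commons</groupId>
                <artifactId>jaxb2-basics-annotate</artifactId>
                <version>${jaxb2BasicsVersion}</version>
              </plugin>
              <plugin>
                <groupId>org.jvnet.jaxb2_commons</groupId>
                <artifactId>jaxb2-basics</artifactId>
                <version>${jaxb2BasicsVersion}</version>
              </plugin>
            </plugins>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The trailing argument list in the text suggests that a later execution, for the schema that imports this one, repeats the same pattern and additionally passes `-b` followed by `${project.build.directory}/xjc/my.episode` as two separate `arg` entries, so that XJC reuses the classes already generated for the base schema.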