define.xml: Dataset-Level (transformed by XSL) · 2013. 9. 25. · XSL Xpath XML Mapper validation...


define.xml: A Crash Course Frank DiIorio, CodeCrafters, Inc.


define.xml: A Crash Course

Frank DiIorio, CodeCrafters, Inc., Philadelphia PA

[Title-slide graphic: word cloud of related terms: define.xml, XSL, XPath, XML Mapper, validation, define.pdf, metadata tables, define version x, schema/XSD, XMLPad, metadata interface, metadata storage, SAS Clinical Standards Toolkit, XSL-FO, iText, JavaScript, CSS, HTML, ODM, ODM extensions, XML4Pharma, CDISC standard version x, sponsor requests, Oracle/database, (the other) define.pdf, old school brute force]

Remember define.pdf?

• Purpose: document deliverables
  - Datasets: description, structure, sort order
  - Variables: attributes, codes, derivation, et al.

• Created using:
  - Metadata, SAS macros

• Contents validated by:
  - Visual inspection
  - Programmatic checks of the metadata

• FDA now requests define.xml, aka CDISC’s “Case Report Tabulation Data Definition Specification”

• And conceptually it resembles define.pdf …

define.xml: Dataset-Level (transformed by XSL)


define.xml: Variable-Level (transformed by XSL)

define.xml: Similar, but …

• define.xml differs from define “classic”:
  - Unlike a PDF, it is easily machine-readable
  - It follows a strictly defined format (schema)
  - It’s “meatier” than define.pdf, requiring much richer metadata
  - Requires validation of:
    - syntax
    - compliance with schema

• Clearly, we’re dealing with something new and complex

This Presentation …

• Briefly reviews XML basics
• Describes metadata needed to support construction of define.xml
• Presents one way to build the XML file
• Shows how to validate the file
• Discusses define.pdf (no, not that define.pdf!)
• Focuses on define Version 1 but identifies issues relevant to Version 2
• Is simply an overview of the file creation and validation process


XML Basics

• Extensible Markup Language: plain text with markup (“tags”) similar in look & feel to HTML

• Content is user-defined: schemas specify which elements and attributes are allowed
• Files are collections of elements (aka “nodes”), each of which can have one or more attributes. Elements can be arranged in a hierarchy.

• Unlike HTML, emphasis is on data content, not its display

• XML is part of a “family” of specifications:
  - XSL: transforms XML into another format
  - XPath: navigates within the document; used by XSL
  - XSD/Schema: defines rules for the content and structure of an XML file
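The element, attribute, and hierarchy vocabulary can be made concrete with Python's standard xml.etree.ElementTree, which supports a limited subset of XPath. This is a minimal sketch on a hypothetical, heavily trimmed ODM-style fragment; the namespace URI is assumed to be the ODM 1.2 one, and the fragment is not a complete or schema-valid define.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ODM-style fragment (not schema-valid define.xml)
DOC = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2">
  <Study OID="STUDY01">
    <GlobalVariables>
      <StudyName>STUDY01</StudyName>
    </GlobalVariables>
    <MetaDataVersion OID="MDV.1">
      <ItemDef OID="IT.AGE" Name="AGE" DataType="integer" Length="3"/>
      <ItemDef OID="IT.SEX" Name="SEX" DataType="text" Length="1"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

# Bind a prefix to the (assumed) ODM namespace URI so paths stay readable
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.2"}

root = ET.fromstring(DOC)

# Element and attribute: the Study element and its OID attribute
study = root.find("odm:Study", NS)
study_oid = study.get("OID")

# Hierarchy: GlobalVariables is a child of Study
study_name = study.find("odm:GlobalVariables/odm:StudyName", NS).text

# XPath-style selection: every ItemDef whose DataType attribute is "integer"
integer_items = [
    item.get("Name")
    for item in root.findall(".//odm:ItemDef[@DataType='integer']", NS)
]

print(study_oid, study_name, integer_items)  # STUDY01 STUDY01 ['AGE']
```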

XML Basics, Illustrated

[Annotated screenshot of a define.xml fragment. Callouts: “Study” element; “OID” attribute of the “Study” element; element hierarchy (“GlobalVariables” is a child of “Study”); schema specifies which elements can repeat; schema specifies valid attribute values.]
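The schema's job (which elements may repeat, which attribute values are legal) can be illustrated with a toy checker. This sketch hard-codes two hypothetical rules in Python; it is not XSD validation, the namespace URI is assumed, and the DataType list is abbreviated for illustration. Real validation should run against the published schema with a schema-aware validator.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.2}"  # assumed ODM 1.2 namespace

# Toy rules standing in for what a real XSD declares (illustrative only)
OCCURS = {"GlobalVariables": (1, 1), "MetaDataVersion": (1, None)}  # (min, max) per Study
ALLOWED_DATATYPES = {"text", "integer", "float", "date", "time", "datetime"}

def toy_check(xml_text):
    """Return rule violations; a sketch of the idea, not XSD validation."""
    problems = []
    root = ET.fromstring(xml_text)
    for study in root.iter(ODM + "Study"):
        for child, (lo, hi) in OCCURS.items():
            n = len(study.findall(ODM + child))
            if n < lo or (hi is not None and n > hi):
                problems.append(f"Study/{child} occurs {n} time(s)")
    for item in root.iter(ODM + "ItemDef"):
        datatype = item.get("DataType")
        if datatype not in ALLOWED_DATATYPES:
            problems.append(f"ItemDef {item.get('OID')}: invalid DataType {datatype!r}")
    return problems

sample = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"><Study OID="S1">
  <GlobalVariables/><GlobalVariables/>
  <MetaDataVersion OID="MDV.1">
    <ItemDef OID="IT.AGE" Name="AGE" DataType="Integer"/>
  </MetaDataVersion>
</Study></ODM>"""
print(toy_check(sample))
```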


define.xml Basics

• define.xml must be valid from two perspectives:
  - Syntax
  - Content (compliance with schema)

• define schema/content:
  - An extension of the CDISC Operational Data Model (ODM)
  - Schema controls content, not display
    - Rules for names, attributes, number of occurrences, order of nodes, etc.
    - A value can conform to the schema but still be wrong! (e.g., type is Integer but really should be Float)
  - Available at the CDISC and OpenCDISC web sites

• Determining what goes where is, arguably, the hardest part of the file creation process.
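Syntax and content are separate checks, and even a schema-valid value can be wrong. A minimal sketch, assuming the metadata and data are available as plain Python dictionaries (hypothetical shapes): the first function only tests well-formedness; the second flags variables declared integer whose values are not whole numbers, the kind of error a schema cannot catch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Syntax check only: does the text parse as XML at all?"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def integer_type_mismatches(declared_types, data):
    """Variables declared 'integer' whose observed values have a fractional part.

    declared_types: {variable: DataType from the metadata}     (hypothetical shape)
    data:           {variable: list of observed numeric values} (hypothetical shape)
    """
    return [
        name
        for name, datatype in declared_types.items()
        if datatype == "integer"
        and any(value != int(value) for value in data.get(name, []))
    ]

print(is_well_formed('<ItemDef OID="IT.AGE"/>'))  # True
print(is_well_formed('<ItemDef OID="IT.AGE">'))   # False (element never closed)
print(integer_type_mismatches(
    {"AGE": "integer", "WEIGHT": "integer"},
    {"AGE": [34, 51], "WEIGHT": [70.5, 82.0]},
))  # ['WEIGHT']
```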

Node Order

[Figure: start of an OpenCDISC XML file showing node order]
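Order matters to the schema: child elements must appear in the sequence it declares. This sketch checks a parent's children against an expected order. The three-element order used here is a simplified assumption for illustration; take the authoritative sequence from the define/ODM schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ordering for MetaDataVersion children;
# the authoritative order is whatever the define/ODM schema declares.
EXPECTED_ORDER = ["ItemGroupDef", "ItemDef", "CodeList"]

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def order_violations(parent, expected=EXPECTED_ORDER):
    """Report children that appear after a node that should follow them."""
    rank = {name: i for i, name in enumerate(expected)}
    violations = []
    highest = -1
    for child in parent:
        name = local_name(child.tag)
        if name not in rank:
            continue  # ignore elements this toy list does not cover
        if rank[name] < highest:
            violations.append(name)
        highest = max(highest, rank[name])
    return violations

mdv = ET.fromstring(
    "<MetaDataVersion>"
    "<ItemGroupDef OID='IG.DM'/><ItemDef OID='IT.AGE'/>"
    "<CodeList OID='CL.SEX'/><ItemDef OID='IT.SEX'/>"
    "</MetaDataVersion>"
)
print(order_violations(mdv))  # ['ItemDef']
```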

What You’ll Need

• An XML viewer/editor (to display the ODM schema, define.xml, XSL), such as:
  - XMLPad
  - SAS XML Mapper

• Validator:
  - OpenCDISC
  - SAS Clinical Standards Toolkit
  - XML4Pharma
  - Can be supplemented with home-grown tools

• Knowledge and patience:
  - W3Schools.com, other sites/books

Between the Tags: Metadata

• Metadata:
  - Drives the creation of the XML
  - Can also be used for various tasks throughout the project life cycle (next slide)

• Metadata tables can include:
  - Study-level: protocol name, standard name/version
  - Datasets: name, structure, key fields
  - Variables: attributes, controlled terminology usage, derivation/CRF source
  - Value: detail of variable values (test codes, etc.)
  - Computational algorithms: extended and/or repeated derivations
  - Controlled terms: descriptions and values of coded/enumerated variables
  - Results: description of TFLs (name, content, source(s), etc.); new in define v2
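One way to picture these tables is as typed records with cross-table checks, since much of the metadata's value comes from its internal consistency. The classes, column names, and checks below are hypothetical and cover only the dataset- and variable-level tables.

```python
from dataclasses import dataclass

@dataclass
class DatasetMeta:       # dataset-level table (hypothetical columns)
    name: str
    structure: str
    keys: list

@dataclass
class VariableMeta:      # variable-level table (hypothetical columns)
    domain: str
    variable: str
    type: str
    length: int
    label: str
    codelist: str = ""   # controlled terminology reference, if any

def cross_check(datasets, variables, codelists):
    """Consistency checks that span the metadata tables."""
    issues = []
    known = {d.name for d in datasets}
    by_domain = {}
    for v in variables:
        by_domain.setdefault(v.domain, set()).add(v.variable)
        if v.domain not in known:
            issues.append(f"{v.domain}.{v.variable}: domain missing from dataset table")
        if v.codelist and v.codelist not in codelists:
            issues.append(f"{v.domain}.{v.variable}: unknown codelist {v.codelist}")
    for d in datasets:
        for key in d.keys:
            if key not in by_domain.get(d.name, set()):
                issues.append(f"{d.name}: key field {key} not in variable table")
    return issues

datasets = [DatasetMeta("DM", "One record per subject", ["STUDYID", "USUBJID"])]
variables = [
    VariableMeta("DM", "STUDYID", "text", 12, "Study Identifier"),
    VariableMeta("DM", "SEX", "text", 1, "Sex", codelist="SEX"),
    VariableMeta("AE", "AETERM", "text", 200, "Reported Term for the Adverse Event"),
]
issues = cross_check(datasets, variables, codelists={"SEX"})
for issue in issues:
    print(issue)
```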


Metadata: Usage Throughout Study Life Cycle

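[Diagram: a single Variables metadata table (columns: domain, variable, type, length, label, order, definitionProg, definitionSub, use, crflocation, core) feeds macros used across the study life cycle: %cre8Spec, %attrib, %domSplit, %domChk, %crXFDF, %xpt, %defXML, %defPDF. Stages shown: study setup, EDC/raw data, program/validate domain, XPT, define.xml/pdf. Other elements: the \study\data\prog directory; metadata, config, SDTM and ADaM folders; blankcrf.pdf; export, define, other, dataset and spec outputs.]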
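The diagram's point is that one Variables table feeds many tools. As a hypothetical illustration (the actual %attrib macro is not shown in the slides), this Python sketch turns variables-table rows into the text of a SAS ATTRIB statement, ordered by the order column. The row keys and the treatment of numeric lengths are assumptions.

```python
def attrib_statement(rows):
    """Build SAS ATTRIB statement text from variables-table rows.

    Each row is a dict with hypothetical keys: variable, type, length, label, order.
    Character ("text") variables get $<length>; everything else gets 8 bytes,
    a common choice for SAS numerics (an assumption, not a rule from the slides).
    """
    lines = []
    for row in sorted(rows, key=lambda r: r["order"]):
        length = f"${row['length']}" if row["type"] == "text" else "8"
        label = row["label"].replace("'", "''")  # double embedded quotes for SAS
        lines.append(f"  {row['variable']} length={length} label='{label}'")
    return "attrib\n" + "\n".join(lines) + ";"

rows = [
    {"variable": "AGE", "type": "integer", "length": 3, "label": "Age", "order": 2},
    {"variable": "USUBJID", "type": "text", "length": 20,
     "label": "Unique Subject Identifier", "order": 1},
]
print(attrib_statement(rows))
```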

Metadata Issues

• Design:
  - Ideally, maps (directly or through views) to XML elements and attributes with a minimum of transformation
  - Should be sensitive to changes in standards:
    - define.xml
    - data (SDTM, ADaM)

• Storage:
  - The metadata should be regarded as a valuable corporate asset.
  - So don’t store it in Excel! Oracle or a similar enterprise-level database is a far better choice (though more resource-intensive).
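When the metadata maps almost one-to-one onto XML attributes, generating the XML is mostly renaming. This minimal sketch maps a hypothetical variables-table row to a define 1.0 style ItemDef. The namespace URIs and the choice of attributes (Origin, def:Label) reflect our understanding of define.xml 1.0 and should be checked against the specification; the OID convention is made up. Using an XML library rather than writing strings also handles escaping of characters such as &.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for ODM 1.2 and define.xml 1.0
ODM = "http://www.cdisc.org/ns/odm/v1.2"
DEF = "http://www.cdisc.org/ns/def/v1.0"
ET.register_namespace("", ODM)      # serialize ODM as the default namespace
ET.register_namespace("def", DEF)   # serialize define extensions with a def: prefix

def item_def(row):
    """Map one variables-table row to one ItemDef, one column per attribute."""
    return ET.Element(f"{{{ODM}}}ItemDef", {
        "OID": f"IT.{row['domain']}.{row['variable']}",  # hypothetical OID convention
        "Name": row["variable"],
        "DataType": row["type"],
        "Length": str(row["length"]),
        "Origin": row["origin"],
        f"{{{DEF}}}Label": row["label"],
    })

row = {"domain": "DM", "variable": "AGE", "type": "integer", "length": 3,
       "origin": "CRF Page 2", "label": "Age & Units"}
xml_text = ET.tostring(item_def(row), encoding="unicode")
print(xml_text)  # ElementTree escapes the "&" in the label automatically
```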


Metadata Issues: Entry (Dataset-Level)

Metadata Issues: Entry (Variable-Level)


Building the XML

• Many ways to do this, among them:
  - SAS Clinical Standards Toolkit
  - Brute force: macros, DATA steps

• Benefits: extreme flexibility with respect to order of dataset display, control of Comments content, selection of XSL, etc. The tool (macros) can also perform XML validation and create a ZIP file of deliverables.

• Drawbacks: lots of code; has to be responsive to changes in the standards
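The ZIP step mentioned above can be sketched in Python: refuse to package unless define.xml parses, then bundle it with its companion files. File names and the flat archive layout are assumptions; a real tool would also run schema validation before packaging.

```python
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

def package_deliverables(xml_path, other_files, zip_path):
    """Refuse to package unless define.xml is well-formed, then ZIP everything."""
    ET.parse(xml_path)  # raises ET.ParseError if the XML is malformed
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in [xml_path, *other_files]:
            zf.write(path, arcname=Path(path).name)  # flat layout inside the ZIP
    return zip_path

# Usage with throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp, "define.xml")
    xml_file.write_text("<ODM/>")
    xsl_file = Path(tmp, "define.xsl")
    xsl_file.write_text("<stylesheet/>")
    archive = package_deliverables(xml_file, [xsl_file], Path(tmp, "deliverables.zip"))
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
print(names)  # ['define.xml', 'define.xsl']
```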

Building (or not) the XSL

• XSL transforms XML into other formats (HTML is the most common) and makes the XML reader-friendly.

• Since define.xml follows a predictable format, the file for any study can be transformed with a standard XSL file (the “XML Promise”)

• The XSL is identified by a reference in the XML:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <?xml-stylesheet type="text/xsl" href="define.xsl"?>

• Your choice:
  - Use the XSL found in the CDISC pilots
  - Write your own (as with define.xml: flexibility, at the cost of writing a lot of code)
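Because the browser finds the XSL only through the xml-stylesheet processing instruction, it is worth checking that the reference is present and spelled correctly (a curly quote around the href value, for example, would break it). This sketch uses Python's standard xml.dom.minidom, which keeps processing instructions that precede the root element; the simple whitespace split assumes an href without spaces.

```python
from xml.dom import minidom

def stylesheet_href(xml_bytes):
    """Return the href of the xml-stylesheet instruction, or None."""
    doc = minidom.parseString(xml_bytes)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # PI data holds pseudo-attributes: type="text/xsl" href="define.xsl"
            for token in node.data.split():
                if token.startswith("href="):
                    return token[len("href="):].strip("\"'")
    return None

sample = (b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
          b'<?xml-stylesheet type="text/xsl" href="define.xsl"?>\n'
          b'<ODM/>')
print(stylesheet_href(sample))  # define.xsl
```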

A Word About XSL

• Before writing your own XSL, consider …
  - It’s a different type of language, with a badly shaped learning curve (for most of us)
  - What functionality to provide over and above CDISC-supplied files:
    - Table sorting, printing
    - Additional navigation (next/previous table, etc.)
  - Whether the sponsor will accept the XSL (ActiveX, JavaScript, security considerations)


Sample XSL from Early CDISC Pilot

    <!-- ***************************************** -->
    <!-- Code List Items -->
    <!-- ***************************************** -->
    <xsl:if test="/odm:ODM/odm:Study/odm:MetaDataVersion/odm:CodeList[odm:CodeListItem]">
      <div id="decodelist">
        <xsl:for-each select="/odm:ODM/odm:Study/odm:MetaDataVersion/odm:CodeList[odm:CodeListItem]">
          <fieldset>
            <xsl:attribute name="id">CL.<xsl:value-of select="@OID"/></xsl:attribute>
            <legend>Code List - <xsl:value-of select="@Name"/>, Reference Name (<xsl:value-of select="@OID"/>)</legend>
            <table>

Syntax resembles XML

Inclusion of “pure” HTML

The XSL can build HTML statements

Element selection requires knowledge of XPath

Coding of XSL can dramatically affect transformation and readability of anXML file, as shown in next slides …
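To see what the XSL excerpt does without an XSLT processor, here is a rough Python analogue of its first steps: select each CodeList that has CodeListItem children and emit a fieldset whose id is "CL." plus the OID, with a legend. This is not XSLT and not the pilot's code; it stops before the item table, and the namespace URI is assumed.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.2"}  # assumed ODM 1.2 namespace

def codelist_fieldsets(odm_root):
    """Python analogue of the XSL excerpt: one <fieldset> per CodeList that has
    CodeListItem children (the odm:CodeList[odm:CodeListItem] predicate)."""
    fragments = []
    for cl in odm_root.findall("odm:Study/odm:MetaDataVersion/odm:CodeList", NS):
        if cl.find("odm:CodeListItem", NS) is None:
            continue  # the XSL predicate skips code lists without items
        oid, name = escape(cl.get("OID")), escape(cl.get("Name"))
        fragments.append(
            f'<fieldset id="CL.{oid}">'
            f"<legend>Code List - {name}, Reference Name ({oid})</legend>"
            "</fieldset>"  # the real XSL goes on to build a <table> of items
        )
    return fragments

DOC = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"><Study OID="S1">
  <MetaDataVersion OID="MDV.1">
    <CodeList OID="SEX" Name="Sex" DataType="text">
      <CodeListItem CodedValue="F"><Decode><TranslatedText>Female</TranslatedText></Decode></CodeListItem>
    </CodeList>
    <CodeList OID="AEDICT" Name="Adverse Event Dictionary" DataType="text">
      <ExternalCodeList Dictionary="MedDRA"/>
    </CodeList>
  </MetaDataVersion>
</Study></ODM>"""
for fragment in codelist_fieldsets(ET.fromstring(DOC)):
    print(fragment)
```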

define.xml: Style Sheet 1

The difference is in the HTML created by the XSL, not in the XML itself!


define.xml: Style Sheet 2

The difference is in the HTML created by the XSL, not in the XML itself!

Did We Get It Right? Validating the XML

• Recall the define.pdf vs. define.xml discussion: different, more stringent, and definable validation requirements

• Validation ensures that names/values, attributes, occurrences, and order of nodes conform to the schema.

• But we can’t validate that the data makes sense! A variable length of 20 may be valid according to the schema, but if the length in the dataset is greater than 20, the problem lies elsewhere

• Tools:
  - OpenCDISC
  - SAS Clinical Standards Toolkit
  - XML4Pharma CDISC Define.xml Checker
  - Home-grown (specialized, client-requested checks)
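A cross-check between define.xml and the data covers the gap described above. A minimal sketch, assuming the declared lengths and the dataset's character values have already been extracted into dictionaries (hypothetical shapes); note that len() counts characters, whereas SAS lengths count bytes, so non-ASCII data needs extra care.

```python
def length_conflicts(declared, observed):
    """Compare define.xml Length with the longest value actually in the data.

    declared: {variable: Length attribute from define.xml}
    observed: {variable: list of character values from the dataset}
    """
    conflicts = {}
    for var, length in declared.items():
        longest = max((len(v) for v in observed.get(var, [])), default=0)
        if longest > length:
            conflicts[var] = (length, longest)
    return conflicts

declared = {"AETERM": 20, "AESEV": 8}
observed = {"AETERM": ["HEADACHE", "UPPER RESPIRATORY TRACT INFECTION"],
            "AESEV": ["MILD", "MODERATE"]}
print(length_conflicts(declared, observed))  # {'AETERM': (20, 33)}
```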


Validation: OpenCDISC V1.3 Rules
http://www.opencdisc.org/projects/validator/cdisc-define.xml-1.0-validation-rules

Level of severity is arguable!

Validation: OpenCDISC Results (Summary)

The validation report has become part of our deliverables to the client. Inclusion of any item flagged as an Error or Warning must be explained.
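Since every Error or Warning in the report must be explained, a small script can list what is still unexplained. The report is assumed to have been exported as (rule ID, severity) pairs; the rule IDs and the Error/Warning/Notice labels here are hypothetical placeholders, not actual OpenCDISC output.

```python
from collections import Counter

def unexplained(findings, explanations):
    """Rule IDs flagged as Error or Warning that have no written explanation.

    findings:     list of (rule_id, severity) pairs, a hypothetical export of
                  the validator's report
    explanations: {rule_id: explanation text}
    """
    return sorted({
        rule for rule, severity in findings
        if severity in ("Error", "Warning") and not explanations.get(rule)
    })

# Hypothetical rule IDs and findings, for illustration only
findings = [("R001", "Error"), ("R010", "Warning"),
            ("R010", "Warning"), ("R024", "Notice")]
explanations = {"R001": "Sponsor-approved deviation; see reviewer's guide."}

summary = Counter(severity for _, severity in findings)
print(summary["Error"], summary["Warning"], summary["Notice"])  # 1 2 1
print(unexplained(findings, explanations))  # ['R010']
```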


Validation: OpenCDISC Results (Detail)

You’re Not Done Yet: define.pdf

• You mean define.xml?
• No, define.pdf: a PDF rendering of the XML
• Why (oh why, oh why, …?)
• How:
  - Read the XML with SAS XML maps, then use PROC REPORT for the various pieces (Jansen paper)
  - iText open source library (Java)
  - XSL-FO (Formatting Objects) document description language
  - Our old friend, Brute Force (next slide)
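Whatever the rendering route, the first step is usually to flatten the XML into rows. This sketch does roughly what an XML map hands to a reporting procedure: one row per dataset variable, joining ItemGroupDef/ItemRef to ItemDef. It is Python rather than SAS, and the namespace URIs are assumed.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.2",
      "def": "http://www.cdisc.org/ns/def/v1.0"}  # assumed define 1.0 URIs

def variable_rows(odm_root):
    """Flatten ItemGroupDef/ItemRef/ItemDef into one row per dataset variable."""
    mdv = odm_root.find("odm:Study/odm:MetaDataVersion", NS)
    items = {i.get("OID"): i for i in mdv.findall("odm:ItemDef", NS)}
    rows = []
    for group in mdv.findall("odm:ItemGroupDef", NS):
        for ref in group.findall("odm:ItemRef", NS):
            item = items[ref.get("ItemOID")]  # join ItemRef to its ItemDef
            rows.append({
                "dataset": group.get("Name"),
                "variable": item.get("Name"),
                "type": item.get("DataType"),
                "length": item.get("Length"),
                "label": item.get(f"{{{NS['def']}}}Label"),
            })
    return rows

DOC = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:def="http://www.cdisc.org/ns/def/v1.0">
  <Study OID="S1"><MetaDataVersion OID="MDV.1">
    <ItemGroupDef OID="IG.DM" Name="DM">
      <ItemRef ItemOID="IT.USUBJID" Mandatory="Yes"/>
      <ItemRef ItemOID="IT.AGE" Mandatory="No"/>
    </ItemGroupDef>
    <ItemDef OID="IT.USUBJID" Name="USUBJID" DataType="text" Length="20"
             def:Label="Unique Subject Identifier"/>
    <ItemDef OID="IT.AGE" Name="AGE" DataType="integer" Length="3" def:Label="Age"/>
  </MetaDataVersion></Study>
</ODM>"""
rows = variable_rows(ET.fromstring(DOC))
for r in rows:
    print(f"{r['dataset']:<4}{r['variable']:<10}{r['type']:<9}{r['length']:>4}  {r['label']}")
```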

define.pdf: Brute Force, No Finesse

defineXML.sas

    data work.defpdf_value;
      set work.value;
      … write value-level XML …

defineXMLPDF.sas

    … ODS PROCLABEL, other …
    proc report data=work.defpdf_value;

Calling Program

    %setup(project=study)
    %defineXML(…parameters…)
    %defineXMLPDF(…parameters…)


define.pdf: define.xml Transformed

Closing Comments

• The process to create define.xml is more complex than for define.pdf:
  - New technologies
  - More “moving parts”: metadata, XML, XSL, …
  - Stringent validation

• Keys:
  - Organizational commitment
  - Transparent access to robust metadata
  - Tools that facilitate flexible display (especially important to CROs)

Thank You!

Your comments are valued and encouraged.