define.xml: A Crash Course Frank DiIorio, CodeCrafters, Inc.
define.xml: A Crash Course
Frank DiIorio
CodeCrafters, Inc.
Philadelphia PA
define.xml
XSL
Xpath
XML Mapper
validation
define.pdf
metadata tables
define version x
schema/XSD
XMLPad
metadata interface
metadata storage
SAS Clinical Standards Toolkit
XSL-FO
iText
JavaScript
CSS
HTML
ODM
ODM extensions
XML4Pharma
CDISC standard version x
sponsor requests
Oracle/database
(the other) define.pdf
old school brute force
Remember define.pdf?
• Purpose: document deliverables
  • Datasets: description, structure, sort order
  • Variables: attributes, codes, derivation, et al.
• Created using: metadata, SAS macros
• Contents validated by:
  • Visual inspection
  • Programmatic checks of the metadata
• FDA now requests define.xml, aka CDISC’s “Case Report Tabulation Data Definition Specification”
• And conceptually it resembles define.pdf …
define.xml: Dataset-Level (transformed by XSL)
define.xml: Variable-Level (transformed by XSL)
define.xml: Similar, but …
• define.xml differs from define “classic”:
  • Unlike a PDF, it is easily machine-readable
  • It follows a strictly defined format (schema)
  • It’s “meatier” than define.pdf, requiring much richer metadata
  • It requires validation of:
    • syntax
    • compliance with schema
• Clearly, we’re dealing with something new and complex
This Presentation …
• Briefly reviews XML basics
• Describes metadata needed to support construction of define.xml
• Presents one way to build the XML file
• Shows how to validate the file
• Discusses define.pdf (no, not that define.pdf!)
• Focuses on define Version 1 but identifies issues relevant to Version 2
• Is simply an overview of the file creation and validation process
XML Basics
• Extensible Markup Language: plain text with mark-up (“tags”) similar in look & feel to HTML
• Content is user-defined, via schemas
• Files are collections of elements (aka “nodes”), each of which can have one or more attributes. Elements can be arranged in a hierarchy.
• Unlike HTML, emphasis is on data content, not its display
• XML is part of a “family” of specifications:
  • XSL – transforms XML into another format
  • XPath – navigates within the document. Used by XSL.
  • XSD/Schema – defines rules for content and structure of an XML file
XML Basics, Illustrated
“Study” element
“OID” attribute of “Study” element
Element hierarchy: “GlobalVariables” is child of “Study”
Schema specifies which elements can repeat
Schema specifies valid attribute values
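The callouts above (an element, its attribute, and a parent/child hierarchy) can be illustrated with Python's standard-library `xml.etree.ElementTree`. This is a minimal sketch, not part of the presentation's toolset: the fragment and its values are hypothetical, though the element names follow ODM and the namespace URI is the ODM 1.3 one.

```python
import xml.etree.ElementTree as ET

# Namespace of CDISC ODM 1.3, on which define.xml is built.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
NS = {"odm": ODM_NS}

# Hypothetical fragment: element names follow ODM, values are invented.
SAMPLE = f"""<ODM xmlns="{ODM_NS}">
  <Study OID="STUDY.XYZ-001">
    <GlobalVariables>
      <StudyName>XYZ-001</StudyName>
      <StudyDescription>Hypothetical example study</StudyDescription>
      <ProtocolName>XYZ-001</ProtocolName>
    </GlobalVariables>
  </Study>
</ODM>"""

def describe_study(xml_text: str) -> dict:
    """Return the Study OID (an attribute) and the names of GlobalVariables' children."""
    root = ET.fromstring(xml_text)
    study = root.find("odm:Study", NS)                   # child element of ODM
    global_vars = study.find("odm:GlobalVariables", NS)  # child element of Study
    return {
        "study_oid": study.get("OID"),                   # attribute lookup
        # Tags come back as "{namespace}LocalName"; keep only the local name.
        "global_children": [child.tag.split("}")[1] for child in global_vars],
    }

print(describe_study(SAMPLE))
# {'study_oid': 'STUDY.XYZ-001', 'global_children': ['StudyName', 'StudyDescription', 'ProtocolName']}
```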
define.xml Basics
• define.xml must be valid from two perspectives:
  • Syntax
  • Content (compliance with schema)
• define schema/content:
  • An extension of the CDISC Operational Data Model (ODM)
  • Schema controls content, not display
    • Rules for names, attributes, number of occurrences, order of nodes, etc.
    • A value can conform to the schema but still be wrong! (e.g., type is Integer but really should be Float)
  • Available at CDISC, OpenCDISC web sites
• Determining what goes where is, arguably, the hardest part of the file creation process.
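The two perspectives above can be separated in code. This minimal Python sketch, assuming only the standard library, performs the syntax (well-formedness) check; schema compliance needs an XSD-aware validator such as those listed in this deck. The third fragment parses fine and uses an allowed DataType, yet it is wrong if the values are really decimals. The fragments and the `is_well_formed` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Syntax check only: does the text parse as XML at all?"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical ItemDef fragments (namespace omitted for brevity).
GOOD = '<ItemDef OID="IT.VSSTRESN" Name="VSSTRESN" DataType="float" Length="8"/>'
UNCLOSED = '<ItemDef OID="IT.VSSTRESN" Name="VSSTRESN" DataType="float" Length="8">'
# Well-formed, allowed type value, but wrong if the results have decimals:
WRONG_TYPE = '<ItemDef OID="IT.VSSTRESN" Name="VSSTRESN" DataType="integer" Length="8"/>'

for label, text in [("good", GOOD), ("unclosed", UNCLOSED), ("wrong type", WRONG_TYPE)]:
    print(label, is_well_formed(text))
# good True
# unclosed False
# wrong type True
```

Only a check against the data itself can catch the third case.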
Node Order
Start of OpenCDISC XML file showing node order
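Because the schema fixes the order of child nodes, a home-grown check can compare the sequence actually written against the expected one. A Python sketch under a stated assumption: `EXPECTED_ORDER` is a simplified three-element subset (ItemGroupDef, then ItemDef, then CodeList, as within ODM 1.3's MetaDataVersion); the full sequence, including the define.xml extensions, should be taken from the schema itself. The helper and fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.cdisc.org/ns/odm/v1.3}"

# Simplified subset of the child order assumed for MetaDataVersion.
EXPECTED_ORDER = ["ItemGroupDef", "ItemDef", "CodeList"]

def order_violations(mdv: ET.Element) -> list:
    """Return (previous, current) local-name pairs where a child appears out of order."""
    rank = {name: i for i, name in enumerate(EXPECTED_ORDER)}
    problems, last_rank, last_name = [], -1, None
    for child in mdv:
        name = child.tag.replace(NS, "")
        if name not in rank:          # ignore elements this sketch does not rank
            continue
        if rank[name] < last_rank:
            problems.append((last_name, name))
        else:
            last_rank, last_name = rank[name], name
    return problems

xml_text = f"""<MetaDataVersion xmlns="{NS[1:-1]}">
  <ItemGroupDef OID="IG.DM"/>
  <CodeList OID="CL.SEX"/>
  <ItemDef OID="IT.USUBJID"/>
</MetaDataVersion>"""
print(order_violations(ET.fromstring(xml_text)))   # [('CodeList', 'ItemDef')]
```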
What You’ll Need
• An XML viewer/editor (to display the ODM schema, define.xml, XSL), such as:
  • XMLpad
  • SAS XML Mapper
• Validator:
  • OpenCDISC
  • SAS Clinical Standards Toolkit
  • XML4Pharma
  • Can be supplemented with home-grown tools
• Knowledge and patience:
  • W3Schools.com, other sites/books
Between the Tags: Metadata
• Metadata:
  • Drives the creation of the XML
  • And can also be used for various tasks throughout the project life cycle (next slide)
• Metadata tables can include:
  • Study-level: protocol name, standard name/version
  • Datasets: name, structure, key fields
  • Variables: attributes, controlled terminology usage, derivation/CRF source
  • Value: detail of variable values (test codes, etc.)
  • Comp. algorithms: extended and/or repeated derivations
  • Controlled terms: descriptions and values of coded/enumerated variables
  • Results: description of TFLs – name, content, source(s), etc. (new in define v2)
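One way to picture the dataset- and variable-level tables is as typed records that can be cross-checked before any XML is written. A minimal Python sketch; the field names, example rows, and the `check_metadata` helper are hypothetical, not a prescribed layout.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMeta:
    name: str
    label: str
    structure: str
    keys: list = field(default_factory=list)   # key fields, in sort order

@dataclass
class VariableMeta:
    dataset: str
    name: str
    type: str        # e.g. "text", "integer", "float"
    length: int
    label: str
    origin: str      # e.g. "CRF", "Derived", "Assigned"

def check_metadata(datasets, variables):
    """Cross-table consistency checks: unknown datasets, undeclared key fields."""
    issues = []
    known = {d.name for d in datasets}
    declared = {(v.dataset, v.name) for v in variables}
    for v in variables:
        if v.dataset not in known:
            issues.append(f"{v.dataset}.{v.name}: dataset not in dataset table")
    for d in datasets:
        for key in d.keys:
            if (d.name, key) not in declared:
                issues.append(f"{d.name}: key field {key} has no variable row")
    return issues

dm = DatasetMeta("DM", "Demographics", "One record per subject", ["STUDYID", "USUBJID"])
vars_ = [
    VariableMeta("DM", "STUDYID", "text", 12, "Study Identifier", "Assigned"),
    VariableMeta("AE", "AETERM", "text", 200, "Reported Term", "CRF"),
]
print(check_metadata([dm], vars_))
# ['AE.AETERM: dataset not in dataset table', 'DM: key field USUBJID has no variable row']
```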
Metadata: Usage Throughout Study Life Cycle
Figure: a single Variables metadata table (columns: domain, variable, type, length, label, order, definitionProg, definitionSub, use, crflocation, core) feeds the SAS macros %cre8Spec, %attrib, %domSplit, %domChk, %crXFDF, %xpt, %defXML, and %defPDF, used from study setup through EDC/raw data, domain programming and validation, XPT creation, and define.xml/pdf. Also shown: the \study\data\prog folder structure (metadata, config, sdtm, adam) and blankcrf.pdf.
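The figure's point is that one Variables table drives code at several stages. As a hypothetical illustration (not the author's `%attrib` macro), this Python sketch turns variable-level rows into the text of a SAS ATTRIB statement; the column names and the `C`/`N` type codes are assumptions.

```python
# Hypothetical rows from a Variables metadata table; values are invented.
VARIABLES = [
    {"domain": "DM", "variable": "USUBJID", "type": "C", "length": 20,
     "label": "Unique Subject Identifier"},
    {"domain": "DM", "variable": "AGE", "type": "N", "length": 8, "label": "Age"},
    {"domain": "AE", "variable": "AETERM", "type": "C", "length": 200,
     "label": "Reported Term for the Adverse Event"},
]

def attrib_statement(rows, domain):
    """Return SAS ATTRIB text for one domain, or "" if the domain has no rows."""
    lines = []
    for r in rows:
        if r["domain"] != domain:
            continue
        # Character lengths take a leading "$" in SAS; numeric ones do not.
        length = f"${r['length']}" if r["type"] == "C" else str(r["length"])
        label = r["label"].replace("'", "''")    # double any embedded quotes
        lines.append(f"   {r['variable']} length={length} label='{label}'")
    if not lines:
        return ""
    return "attrib\n" + "\n".join(lines) + ";"

print(attrib_statement(VARIABLES, "DM"))
```

Because every stage reads the same rows, a change to a label or length propagates to the datasets and to define.xml alike.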
Metadata Issues
• Design:
  • Ideally, maps (directly/views) to XML elements and attributes with a minimum of transformation
  • Should be sensitive to changes in standards:
    • define.xml
    • data (SDTM, ADaM)
• Storage:
  • The metadata should be regarded as a valuable corporate asset.
  • So don’t store it in Excel! Oracle or a similar enterprise-level database is a far better choice (though more resource intensive).
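To illustrate mapping metadata to XML "directly/views", here is a sketch that uses Python's built-in `sqlite3` as a lightweight stand-in for an enterprise database. The table, view, and column names are hypothetical; the view simply renames columns to match ItemDef attribute names (`Label` would become `def:Label`) so the XML writer needs little transformation.

```python
import sqlite3

# SQLite stands in for Oracle or similar; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (
    domain   TEXT NOT NULL,
    variable TEXT NOT NULL,
    type     TEXT NOT NULL,
    length   INTEGER,
    label    TEXT,
    PRIMARY KEY (domain, variable)      -- no duplicate variable rows
);
-- Column names match ItemDef attributes, so the XML writer can use them as-is.
CREATE VIEW itemdef AS
SELECT 'IT.' || domain || '.' || variable AS OID,
       variable AS Name,
       type     AS DataType,
       length   AS Length,
       label    AS Label
FROM variables;
""")
conn.executemany(
    "INSERT INTO variables VALUES (?, ?, ?, ?, ?)",
    [("DM", "USUBJID", "text", 20, "Unique Subject Identifier"),
     ("DM", "AGE", "integer", 8, "Age")],
)
for row in conn.execute("SELECT OID, Name, DataType, Length FROM itemdef ORDER BY OID"):
    print(row)
```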
Metadata Issues: Entry (Dataset-Level)
Metadata Issues: Entry (Variable-Level)
Building the XML
• Many ways to do this, among them:
  • SAS Clinical Standards Toolkit
  • Brute force: macros, DATA steps
• Benefits: extreme flexibility with respect to order of dataset display, control of Comments content, selection of XSL, etc. Also, the tool (macros) can perform XML validation and create a ZIP file of deliverables
• Drawbacks: lots of code; has to be responsive to changes in the standards
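A brute-force build can be sketched in Python with `xml.etree.ElementTree` (the presentation's own approach uses SAS macros and DATA steps). This sketch writes one ItemGroupDef with its ItemRefs, followed by the ItemDefs, from hypothetical metadata rows. It is simplified and not schema-complete; the define v1 namespace URI, the attribute subset, and the rule that key fields are mandatory are assumptions to check against the schema.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v1.0"   # assumed define.xml v1 extension namespace
ET.register_namespace("", ODM)             # serialize ODM as the default namespace
ET.register_namespace("def", DEF)

# Hypothetical rows: (dataset, variable, type, length, label, key sequence or None).
VARIABLES = [
    ("DM", "STUDYID", "text", 12, "Study Identifier", 1),
    ("DM", "USUBJID", "text", 20, "Unique Subject Identifier", 2),
    ("DM", "AGE", "integer", 8, "Age", None),
]

def build_metadataversion(dataset, label, rows):
    """One ItemGroupDef with ItemRefs, then its ItemDefs (simplified sketch)."""
    mdv = ET.Element(f"{{{ODM}}}MetaDataVersion", OID="MDV.1", Name="Sketch")
    group = ET.SubElement(mdv, f"{{{ODM}}}ItemGroupDef",
                          {"OID": f"IG.{dataset}", "Name": dataset, f"{{{DEF}}}Label": label})
    for order, (ds, var, _type, _len, _label, key) in enumerate(rows, start=1):
        ref = ET.SubElement(group, f"{{{ODM}}}ItemRef",
                            ItemOID=f"IT.{ds}.{var}", OrderNumber=str(order),
                            Mandatory="Yes" if key is not None else "No")  # assumption
        if key is not None:
            ref.set("KeySequence", str(key))
    for ds, var, dtype, length, var_label, _key in rows:   # ItemDefs follow the group
        ET.SubElement(mdv, f"{{{ODM}}}ItemDef",
                      {"OID": f"IT.{ds}.{var}", "Name": var, "DataType": dtype,
                       "Length": str(length), f"{{{DEF}}}Label": var_label})
    return mdv

print(ET.tostring(build_metadataversion("DM", "Demographics", VARIABLES), encoding="unicode"))
```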
Building (or not) the XSL
• XSL transforms XML into other formats (HTML is the most common) and makes the XML reader-friendly.
• Since define.xml is in a predictable format, any file for any study can be transformed with a standard XSL file (the “XML Promise”)
• The XSL is identified by a reference in the XML:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="define.xsl"?>
• Your choice:
  • Use XSL found in the CDISC pilots
  • Write your own (as with define.xml: flexibility, at the cost of writing a lot of code)
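If the XML writer does not emit the stylesheet reference itself, it can be added afterwards. A minimal Python sketch using the standard library's `xml.dom.minidom`; the `add_stylesheet_pi` helper is hypothetical, and `define.xsl` is assumed to sit alongside the XML file. The output encoding matches the ISO-8859-1 declaration shown above.

```python
from xml.dom import minidom

def add_stylesheet_pi(xml_text: str, href: str = "define.xsl") -> bytes:
    """Insert an xml-stylesheet processing instruction before the root element."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction("xml-stylesheet", f'type="text/xsl" href="{href}"')
    doc.insertBefore(pi, doc.documentElement)      # PI goes ahead of <ODM>
    return doc.toxml(encoding="ISO-8859-1")        # bytes, with an XML declaration

out = add_stylesheet_pi('<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"/>')
print(out.decode("ISO-8859-1"))
```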
A Word About XSL
• Before writing your own XSL, consider …
• Different type of language: badly shaped learning curve (for most of us)
• Think about functionality to provide over and above CDISC-supplied files:
  • Table sorting, printing
  • Additional navigation (next/previous table, etc.)
• Consider whether the sponsor will accept the XSL (ActiveX, JavaScript, security considerations)
Sample XSL from Early CDISC Pilot
<!-- ***************************************** -->
<!-- Code List Items -->
<!-- ***************************************** -->
<xsl:if test="/odm:ODM/odm:Study/odm:MetaDataVersion/odm:CodeList[odm:CodeListItem]">
  <div id="decodelist">
    <xsl:for-each select="/odm:ODM/odm:Study/odm:MetaDataVersion/odm:CodeList[odm:CodeListItem]">
      <fieldset>
        <xsl:attribute name="id">CL.<xsl:value-of select="@OID"/></xsl:attribute>
        <legend>Code List - <xsl:value-of select="@Name"/>, Reference Name (<xsl:value-of select="@OID"/>)</legend>
        <table>
          <!-- … rows for each CodeListItem … -->
        </table>
      </fieldset>
    </xsl:for-each>
  </div>
</xsl:if>
Syntax resembles XML
Inclusion of “pure” HTML
The XSL can build HTML statements
Element selection requires knowledge of XPath
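The selection logic in the XSL above can be mirrored in Python to see what it picks out: code lists that contain at least one CodeListItem, each yielding a fieldset id of `CL.` plus the OID and a legend. A sketch only; ElementTree supports a limited subset of XPath (including `[tag]` predicates), so the path is written relative to the ODM root, and the define.xml fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical fragment with one populated and one empty code list.
DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1">
    <MetaDataVersion OID="MDV.1" Name="Sketch">
      <CodeList OID="SEX" Name="Sex" DataType="text">
        <CodeListItem CodedValue="F"><Decode><TranslatedText>Female</TranslatedText></Decode></CodeListItem>
        <CodeListItem CodedValue="M"><Decode><TranslatedText>Male</TranslatedText></Decode></CodeListItem>
      </CodeList>
      <CodeList OID="EMPTY" Name="Empty" DataType="text"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

def code_list_legends(xml_text):
    """Mimic the XSL: (fieldset id, legend) for each CodeList that has a CodeListItem."""
    root = ET.fromstring(xml_text)
    path = "odm:Study/odm:MetaDataVersion/odm:CodeList[odm:CodeListItem]"
    return [
        (f"CL.{cl.get('OID')}", f"Code List - {cl.get('Name')}, Reference Name ({cl.get('OID')})")
        for cl in root.findall(path, NS)
    ]

for fieldset_id, legend in code_list_legends(DEFINE):
    print(fieldset_id, "|", legend)
# CL.SEX | Code List - Sex, Reference Name (SEX)
```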
Coding of XSL can dramatically affect transformation and readability of an XML file, as shown in next slides …
define.xml: Style Sheet 1
The difference is in the HTML created by the XSL, not in the XML itself!
define.xml: Style Sheet 2
The difference is in the HTML created by the XSL, not in the XML itself!
Did We Get It Right? Validating the XML
• Recall the define.pdf v. define.xml discussion: different, more stringent and definable validation requirements
• Validation ensures that names/values, attributes, occurrences, and order of nodes conform to the schema.
• But we can’t validate that the data makes sense!
  • A variable length of 20 may be valid according to the schema, but if the length in the dataset was >20, the problem lies elsewhere
• Tools:
  • OpenCDISC
  • SAS Clinical Standards Toolkit
  • XML4Pharma CDISC Define.xml Checker
  • Home-grown (specialized, client-requested checks)
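Because the schema cannot see the data, a home-grown check can compare each declared Length with the longest value actually delivered, the case described above. A minimal Python sketch: the `observed` dictionary is a hypothetical stand-in for values read from the datasets, and the fragment omits the enclosing Study/MetaDataVersion elements for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

def length_mismatches(define_xml, data):
    """Flag variables whose observed values are longer than the declared Length.

    `data` maps variable name -> list of observed character values.
    """
    root = ET.fromstring(define_xml)
    problems = []
    for item in root.iter(f"{{{NS['odm']}}}ItemDef"):
        name, declared = item.get("Name"), item.get("Length")
        values = data.get(name)
        if declared is None or not values:
            continue
        longest = max(len(v) for v in values)
        if longest > int(declared):
            problems.append((name, int(declared), longest))
    return problems

DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ItemDef OID="IT.AETERM" Name="AETERM" DataType="text" Length="20"/>
  <ItemDef OID="IT.AESEV" Name="AESEV" DataType="text" Length="8"/>
</ODM>"""
observed = {"AETERM": ["HEADACHE", "UPPER RESPIRATORY TRACT INFECTION"],
            "AESEV": ["MILD", "MODERATE"]}
print(length_mismatches(DEFINE, observed))   # [('AETERM', 20, 33)]
```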
Validation: OpenCDISC V1.3 Rules
http://www.opencdisc.org/projects/validator/cdisc-define.xml-1.0-validation-rules
Level of severity is arguable!
Validation: OpenCDISC Results (Summary)
Validation report has become part of our deliverables to the client. Inclusion of any item flagged as an Error or Warning must be explained.
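Tracking explanations for flagged items lends itself to a small helper. This Python sketch assumes hypothetical issue records (rule, severity, explanation); a real OpenCDISC report has its own layout, which would need to be read first.

```python
from collections import Counter

# Hypothetical issue records; rule IDs and explanations are invented.
ISSUES = [
    {"rule": "R1", "severity": "Error", "explanation": "Sponsor-approved deviation"},
    {"rule": "R2", "severity": "Warning", "explanation": ""},
    {"rule": "R3", "severity": "Notice", "explanation": ""},
]

def summarize(issues):
    """Count issues by severity and list Errors/Warnings that still lack an explanation."""
    counts = Counter(i["severity"] for i in issues)
    unexplained = [i["rule"] for i in issues
                   if i["severity"] in ("Error", "Warning") and not i["explanation"].strip()]
    return dict(counts), unexplained

print(summarize(ISSUES))   # ({'Error': 1, 'Warning': 1, 'Notice': 1}, ['R2'])
```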
Validation: OpenCDISC Results (Detail)
You’re Not Done Yet: define.pdf
• You mean define.xml?
• No, define.pdf – a PDF rendering of the XML
• Why (oh why, oh why, …?)
• How:
  • Read the XML with SAS XML maps, then use PROC REPORT for the various pieces (Jansen paper)
  • iText open source library (Java)
  • XSL-FO (Formatting Objects) document description language
  • Our old friend, Brute Force (next slide)
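The "read the XML, then report" route can be sketched in Python: flatten the ItemDef elements into rows (the role an XML map plays for SAS), then lay them out as a table. This is an analogy, not the SAS XML Mapper or PROC REPORT; it renders plain text, where a real define.pdf step would use a PDF writer such as ODS PDF or iText. The fragment and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"
DEF = "{http://www.cdisc.org/ns/def/v1.0}"

def itemdef_rows(xml_text):
    """Flatten ItemDef elements into (Name, DataType, Length, def:Label) rows."""
    root = ET.fromstring(xml_text)
    return [(i.get("Name"), i.get("DataType"), i.get("Length"), i.get(f"{DEF}Label"))
            for i in root.iter(f"{ODM}ItemDef")]

def text_report(rows, headers=("Variable", "Type", "Length", "Label")):
    """Render rows as a fixed-width table; a PDF writer would replace this step."""
    table = [headers] + [tuple("" if v is None else v for v in r) for r in rows]
    widths = [max(len(r[c]) for r in table) for c in range(len(headers))]
    return "\n".join("  ".join(v.ljust(w) for v, w in zip(r, widths)).rstrip() for r in table)

DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" xmlns:def="http://www.cdisc.org/ns/def/v1.0">
  <ItemDef OID="IT.USUBJID" Name="USUBJID" DataType="text" Length="20" def:Label="Unique Subject Identifier"/>
  <ItemDef OID="IT.AGE" Name="AGE" DataType="integer" Length="8" def:Label="Age"/>
</ODM>"""
print(text_report(itemdef_rows(DEFINE)))
```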
define.pdf: Brute Force, No Finesse

defineXML.sas
data work.defpdf_value;
   set work.value;
   /* … write value-level XML … */
run;

defineXMLPDF.sas
/* … ODS PROCLABEL, other … */
proc report data=work.defpdf_value;
run;

Calling Program
%setup(project=study)
%defineXML(…parameters…)
%defineXMLPDF(…parameters…)
define.pdf: define.xml Transformed
Closing Comments
• The process to create define.xml is more complex than define.pdf:
  • New technologies
  • More “moving parts” – metadata, XML, XSL, …
  • Stringent validation
• Keys:
  • Organizational commitment
  • Transparent access to robust metadata
  • Tools that facilitate flexible display (especially important to CROs)
Thank You!
Your comments are valued and encouraged.