GRDDL: The Why, What, How, and Where
Chimezie Ogbuji, Cleveland Clinic Foundation

GRDDL: The Acronym
Gleaning Resource Descriptions (from) Dialects (of) Languages
Rather long and intimidating

GRDDL: By Deconstruction
WordNet definition of "glean":
◦ (gather, as of natural products)
◦ Synonyms: reap, harvest
Resource Description Framework (RDF)
◦ Logical assertions
Dialects of Language
◦ XML document families (XHTML, for instance)

GRDDL: By Analogy
GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.

The Why
Vast amount of latent semantics in markup
Web content today is primarily built for human consumption
Text indexing will only get you so far for document retrieval
If machines are meant to harvest RDF from documents, reproducible protocols are needed
<span>Chimezie Ogbuji</span>

The Why (Cont.)
Microformats, eRDF, and RDFa
◦ Specific to a particular family of documents
◦ XHTML and HTML
If the goal is machine consumption, the bar needs to be raised beyond XHTML

The Why (Cont.)
It seems easy to forget that XHTML is indeed an XML dialect
◦ You would think the (X) would make that obvious
What was needed was a standard way to harvest RDF that is applicable to all XML dialects

The What
Faithful rendition
Transformations
GRDDL result
Source documents
GRDDL-aware agents

Faithful Rendition
“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
Licenses an author-certified interpretation of an XML document
A powerful paradigm for messaging
◦ See David Booth's “RDF and SOA”: http://www.w3.org/2007/01/wos-papers/booth

GRDDL Transformations
Functions that take an XML document and return an RDF graph
Transformations can be written in any particular language
The “reference” transformation language is XSLT
◦ “[XSLT1] is the format most widely supported by GRDDL-aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”

Other Transformation Languages
“.. technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL”
However, these transformations need to be deterministic in order to ensure the result is a faithful rendition
Hence, they must be functions
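
To make the "transformations must be functions" point concrete, here is a minimal sketch in Python rather than XSLT. Everything in it is an assumption made for illustration: the `people` dialect, the `http://example.org/vocab#` vocabulary, and the choice to model an RDF graph as a set of (subject, predicate, object) tuples. The only point it tries to show is that the output depends on nothing but the input document and its URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for this sketch only.
EX = "http://example.org/vocab#"


def transform(xml_text, base_uri):
    """Map an XML document to a set of triples.

    No clock, no randomness, no network access: the same input
    always produces the same graph, which is what lets the result
    count as a faithful rendition.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        subject = base_uri + "#" + person.get("id")
        triples.add((subject, EX + "name", person.findtext("name")))
    return frozenset(triples)


# Hypothetical source document in the made-up "people" dialect.
SOURCE = """<people>
  <person id="co"><name>Chimezie Ogbuji</name></person>
</people>"""

first = transform(SOURCE, "http://example.org/people.xml")
second = transform(SOURCE, "http://example.org/people.xml")
print(first == second)
```

A transformation that consulted the current time or a random source would break this equality, and the author could no longer vouch for what an agent extracts.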

GRDDL Result
The result of applying the transformation is an RDF serialization
The RDF graph that corresponds to the serialization is a GRDDL result of the original document
The “reference” result format is RDF/XML
◦ Other formats can be used (Turtle, N3, etc.)

GRDDL Source Documents
The class of documents for which GRDDL defines a way to extract a result graph:
◦ XML Documents
◦ XML Namespace Documents
◦ Valid XHTML
◦ XHTML Profiles

GRDDL: XML Documents
GRDDL Namespace (grddl prefix)
◦ http://www.w3.org/2003/g/data-view#
transformation attribute
<?xml version="1.0" encoding="UTF-8"?>
<root
  xmlns:grddl="http://www.w3.org/2003/g/data-view#"
  grddl:transformation=".. path to transform ..">
  … XML content ..
</root>
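
As a rough sketch of how an agent might discover the transformations named on a root element, the following Python uses only the standard library. The document content, the file name `to-rdf.xsl`, and the `example.org` URIs are placeholders invented here. The attribute value is treated as a whitespace-separated list of URI references, each resolved against the document's own URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri):
    """Return absolute URIs listed in grddl:transformation on the root."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}local".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical source document with one relative and one absolute reference.
DOC = """<root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="to-rdf.xsl http://example.org/other.xsl">
  <item/>
</root>"""

for uri in root_transformations(DOC, "http://example.org/docs/data.xml"):
    print(uri)
```

A document without the attribute simply yields an empty list, so the caller can fall through to the namespace-based lookup.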

Namespace Documents
“Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”
A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)
The GRDDL result of the namespace document has a statement of the form:
?nsDoc grddl:namespaceTransformation ?txDoc
• txDoc is the location of a transformation applicable to such XML documents
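
The lookup described above can be sketched in two small steps: read the namespace URI off the root element, then search the namespace document's GRDDL result for `grddl:namespaceTransformation` statements. In this sketch the namespace document's result is supplied by hand as a set of tuples; the `po` dialect and all `example.org` URIs are hypothetical, and retrieving and gleaning the namespace document is left out.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
NS_TRANSFORMATION = GRDDL_NS + "namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_uri, ns_doc_result):
    """Pick the ?txDoc values from the namespace document's result graph."""
    return sorted(
        o for (s, p, o) in ns_doc_result
        if s == ns_uri and p == NS_TRANSFORMATION
    )


# Hypothetical source document and hand-written namespace document result.
DOC = '<po:order xmlns:po="http://example.org/ns/po"><po:item/></po:order>'
NS_DOC_RESULT = {
    ("http://example.org/ns/po", NS_TRANSFORMATION,
     "http://example.org/ns/po-to-rdf.xsl"),
    ("http://example.org/ns/po", "http://purl.org/dc/elements/1.1/title",
     "Purchase order dialect"),
}

ns = root_namespace(DOC)
print(ns, namespace_transformations(ns, NS_DOC_RESULT))
```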

Valid XHTML Documents
<html xmlns="http://www.w3.org/1999/xhtml">
  <head
    profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
      href=".. path to transformation .." />
    ...
  </head>
  …
</html>
The profile attribute refers to the GRDDL XHTML profile
◦ Licenses the interpretation of rel="transformation" links
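
A hedged sketch of the XHTML case follows: the links are only honored when the `head` element's `profile` attribute includes the GRDDL profile URI. The page content and the file name `glean.xsl` are invented for the example, both `link` and `a` elements are checked, and the parser used here assumes well-formed XHTML without named entities such as `&nbsp;`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text, base_uri):
    """Return transformation URIs licensed by the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    # Without the profile, rel="transformation" carries no GRDDL meaning.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for el in root.iter():
        if el.tag not in (XHTML + "link", XHTML + "a"):
            continue
        if "transformation" in el.get("rel", "").split() and el.get("href"):
            found.append(urljoin(base_uri, el.get("href")))
    return found


# Hypothetical page; only the first link is a GRDDL transformation.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="glean.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body><p>Hello</p></body>
</html>"""

print(xhtml_transformations(PAGE, "http://example.org/page.html"))
```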

XHTML Profiles
“Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document”
A GRDDL source document lives at the location of the profile URI (an XHTML document)
The GRDDL result of the profile document has a statement of the form:
?profileDoc grddl:profileTransformation ?txDoc
• txDoc is the location of a transformation applicable to such XML documents

The How
GRDDL builds on existing XML & RDF standards
An implementation mostly needs to orchestrate:
◦ Parsing of data representations
◦ Resolving representations from web locations
◦ The necessary XML processing to peek into and harvest RDF from the various sources
◦ The highly recursive nature of GRDDL
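
The recursion can be illustrated with a toy orchestrator. This is a sketch under heavy assumptions and not how any particular implementation works: the "web" is an in-memory dictionary (`WEB`), transformations are Python functions registered by URI (`TRANSFORMS`) instead of XSLT stylesheets, graphs are sets of tuples, and only the root-attribute and namespace-document mechanisms are covered. All `example.org` URIs and element names are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
NS_TX = GRDDL_NS + "namespaceTransformation"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
BOOK_NS = "http://example.org/ns/book"  # hypothetical dialect

# Stand-in for HTTP retrieval: URI -> document text.
WEB = {
    BOOK_NS: '<nsdoc xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
             'grddl:transformation="http://example.org/tx/nsdoc">'
             '<transform href="http://example.org/tx/book"/></nsdoc>',
    "http://example.org/doc.xml":
        '<b:book xmlns:b="http://example.org/ns/book">'
        '<b:title>GRDDL Primer</b:title></b:book>',
}


def nsdoc_tx(root, uri):
    # Gleans namespaceTransformation statements from the namespace document.
    return {(uri, NS_TX, el.get("href")) for el in root.iter("transform")}


def book_tx(root, uri):
    # Gleans titles from documents in the hypothetical book dialect.
    return {(uri, DC_TITLE, el.text)
            for el in root.iter("{%s}title" % BOOK_NS)}


# Stand-in for XSLT stylesheets: transformation URI -> Python function.
TRANSFORMS = {"http://example.org/tx/nsdoc": nsdoc_tx,
              "http://example.org/tx/book": book_tx}


def glean(uri, seen=None):
    """Return the triples gleaned from the document at uri."""
    seen = set() if seen is None else seen
    if uri in seen or uri not in WEB:
        return set()  # already visited, or nothing to retrieve
    seen.add(uri)
    root = ET.fromstring(WEB[uri])
    refs = root.get("{%s}transformation" % GRDDL_NS, "").split()
    tx_uris = [urljoin(uri, ref) for ref in refs]
    if root.tag.startswith("{"):
        # Recursive step: the namespace document is itself a source document.
        ns_uri = root.tag[1:].split("}", 1)[0]
        ns_result = glean(ns_uri, seen)
        tx_uris += sorted(o for (s, p, o) in ns_result
                          if s == ns_uri and p == NS_TX)
    result = set()
    for tx in tx_uris:
        if tx in TRANSFORMS:
            result |= TRANSFORMS[tx](root, uri)
    return result


print(glean("http://example.org/doc.xml"))
```

Here `doc.xml` names no transformation itself; the one that applies is found only by first gleaning the namespace document. The `seen` set keeps documents that reference each other from causing unbounded recursion.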

Technological Overlap

Anatomy of a GRDDL Implementation: GRDDL.py
A reference implementation from scratch
◦ 650 LOC
RDFLib, 4Suite-XML, and Python control logic
A layered approach
◦ Core module that handles transformations
◦ One module per source type stacked on top of the core
◦ A top layer that orchestrates the recursion and identification of which ‘class’ a source document belongs to

GRDDL.py Core

Component Stack

The Where
GRDDL services online:
◦ http://triplr.org/ (Stuff in, triples out)
◦ http://www.w3.org/2007/08/grddl/ (W3C GRDDL Service)
Primary GRDDL implementations:
◦ Redland
◦ GRDDL.py
◦ Virtuoso
◦ GRDDL Reader for Jena
RDFa is the most common GRDDL source content format in the wild

Hidden Value Proposition
Supports separation of concerns:
◦ XML for messaging, data collection, structural validation
◦ RDF for expressive assertions, inference, etc.
A way to invest in data richness and accessibility

GRDDL Use Cases
Embedding scheduling assertions on personal pages
Using GRDDL for extracting RDF from XML medical record documents
◦ Cleveland Clinic use case (clinical research)
Aggregating web-based product reviews
Embedding web service descriptions
Adding semantic assertions to XML schemas
Embedding semantic assertions in wikis