RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful...

47
RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ACS RDF Symposium, Boston, August 2010 Ideaconsult Ltd., 4 A. Kanchev str. Sofia 1000, Bulgaria

Transcript of RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful...

Page 1: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

RESTful RDF Web Services for

Predictive Toxicology

Dr Nina Jeliazkova

ACS RDF Symposium Boston August 2010

Ideaconsult Ltd 4 A Kanchev str Sofia 1000 Bulgaria

Develops and maintains open source software applications

bull Toxtree 210 ndash toxic hazard

estimation 12 modules

bull Toxmatch 106 ndash A chemical

similarity evaluation tool

bull Online QMRF repository

httpqsardbjrcit

bull Ambit Ambit XT

bull httpambitsourceforgenet

bull Partner in OpenTox FP7 project

bull Partner in CADASTER FP7

project

Ideaconsult Ltd2

bull Objective to develop a framework that provides an unified access tondash toxicity data

ndash predictive models

ndash procedures supporting validation and additional information that helps with the interpretation of predicted results

bull European CommisionFramework Programme 7 HEALTH-2007-133

bull 11 partners

OpenTox project

httpwwwopentoxorg

Ideaconsult Ltd3

Why integration framework for predictive

toxicology

August 22 2010 Ideaconsult Ltd4

bull What we would like to do

ndash Build use validate and compare multiple

models

ndash Reliable reproduce models from the literature

ndash Merge data from different sources (files

databases)

ndash Find all models available for certain endpoint

ndash More hellip

bull Challengesndash Chemical structures

bull Might be ambiguous

bull Might be error prone or time consuming to reproduce from publications

ndash Data bull Multiple formats

bull Implicit semantics often buried in human readable documentation only

ndash Modelsbull Tens of thousands available in software or in publications

bull Multiple software solutions mostly incompatible

bull Predictions reproducibility is time consuming and often hard to achieve

bull Automatic comparison of prediction results difficult

Why integration framework for predictive

toxicology

Ideaconsult Ltd5

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 2: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Develops and maintains open source software applications

bull Toxtree 210 ndash toxic hazard

estimation 12 modules

bull Toxmatch 106 ndash A chemical

similarity evaluation tool

bull Online QMRF repository

httpqsardbjrcit

bull Ambit Ambit XT

bull httpambitsourceforgenet

bull Partner in OpenTox FP7 project

bull Partner in CADASTER FP7

project

Ideaconsult Ltd2

bull Objective to develop a framework that provides an unified access tondash toxicity data

ndash predictive models

ndash procedures supporting validation and additional information that helps with the interpretation of predicted results

bull European CommisionFramework Programme 7 HEALTH-2007-133

bull 11 partners

OpenTox project

httpwwwopentoxorg

Ideaconsult Ltd3

Why integration framework for predictive

toxicology

August 22 2010 Ideaconsult Ltd4

bull What we would like to do

ndash Build use validate and compare multiple

models

ndash Reliable reproduce models from the literature

ndash Merge data from different sources (files

databases)

ndash Find all models available for certain endpoint

ndash More hellip

bull Challengesndash Chemical structures

bull Might be ambiguous

bull Might be error prone or time consuming to reproduce from publications

ndash Data bull Multiple formats

bull Implicit semantics often buried in human readable documentation only

ndash Modelsbull Tens of thousands available in software or in publications

bull Multiple software solutions mostly incompatible

bull Predictions reproducibility is time consuming and often hard to achieve

bull Automatic comparison of prediction results difficult

Why integration framework for predictive

toxicology

Ideaconsult Ltd5

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 3: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

bull Objective to develop a framework that provides an unified access tondash toxicity data

ndash predictive models

ndash procedures supporting validation and additional information that helps with the interpretation of predicted results

bull European CommisionFramework Programme 7 HEALTH-2007-133

bull 11 partners

OpenTox project

httpwwwopentoxorg

Ideaconsult Ltd3

Why integration framework for predictive

toxicology

August 22 2010 Ideaconsult Ltd4

bull What we would like to do

ndash Build use validate and compare multiple

models

ndash Reliable reproduce models from the literature

ndash Merge data from different sources (files

databases)

ndash Find all models available for certain endpoint

ndash More hellip

bull Challengesndash Chemical structures

bull Might be ambiguous

bull Might be error prone or time consuming to reproduce from publications

ndash Data bull Multiple formats

bull Implicit semantics often buried in human readable documentation only

ndash Modelsbull Tens of thousands available in software or in publications

bull Multiple software solutions mostly incompatible

bull Predictions reproducibility is time consuming and often hard to achieve

bull Automatic comparison of prediction results difficult

Why integration framework for predictive

toxicology

Ideaconsult Ltd5

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 4: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Why integration framework for predictive

toxicology

August 22 2010 Ideaconsult Ltd4

bull What we would like to do

ndash Build use validate and compare multiple

models

ndash Reliable reproduce models from the literature

ndash Merge data from different sources (files

databases)

ndash Find all models available for certain endpoint

ndash More hellip

bull Challengesndash Chemical structures

bull Might be ambiguous

bull Might be error prone or time consuming to reproduce from publications

ndash Data bull Multiple formats

bull Implicit semantics often buried in human readable documentation only

ndash Modelsbull Tens of thousands available in software or in publications

bull Multiple software solutions mostly incompatible

bull Predictions reproducibility is time consuming and often hard to achieve

bull Automatic comparison of prediction results difficult

Why integration framework for predictive

toxicology

Ideaconsult Ltd5

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 5: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

bull Challengesndash Chemical structures

bull Might be ambiguous

bull Might be error prone or time consuming to reproduce from publications

ndash Data bull Multiple formats

bull Implicit semantics often buried in human readable documentation only

ndash Modelsbull Tens of thousands available in software or in publications

bull Multiple software solutions mostly incompatible

bull Predictions reproducibility is time consuming and often hard to achieve

bull Automatic comparison of prediction results difficult

Why integration framework for predictive

toxicology

Ideaconsult Ltd5

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 6: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Framework design rationales

Ideaconsult Ltd6

User Requirements Software Requirements

Umambiguous data formal way of representing information about data

Unambiguous access well-defined interfaces

Transparency of

computational tools

formal way of representing information about

methods well-defined interfaces

Variety of user groups simplicity and modularity of design

Need to integrate various

resources (eg databases

prediction methods

models hellip) to make

meaningful predictions

distributed architecture interoperability

Need to integrate

biological information

again modularity of design extensibility

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 7: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

The framework

bull OpenTox API

ndash The way applications talk to each other

ndash The way developers talk to applications

ndash httpopentoxorgdevapisapi-11

bull The basic building blocks

ndash data chemical structures algorithms

and models

bull Functionality offered

ndash build models

ndash apply models

ndash validate models

ndash access and query data in various ways

bull Technologies

ndash REST style web services

ndash RDF for description of resources

ndash Links to existing and newly developed

ontologies (mainly to describe

metadata) about resources

Ideaconsult Ltd7

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Ontology

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

AppDomain

GET

POST

PUT

DELETEValidation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETE

Data

service

Models

service

Validatio

n service

Reporting

service

Ontology

Clients (User Interface) Standalone Browsers etc

Data

service

Data

service

all kind of

data

Registrati

on

Queries

Models

service

Reporting

service

Validation

service

Web servicesWeb services

Web services

Web services

WS

WS

Web services

Security

WSWS

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 8: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Design principles

bull Resource oriented

ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0

ndash RESTfull API design starts by identifying most important objects and groups of objects supported by

the software system and proceeds by defining URL patterns

bull Transport protocol

ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well

bull Operations

ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP

(GET POST PUT DELETE) operations are the common choice when the transport protocol is

HTTP

bull Hypermedia as the Engine of Application State

ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful

applications Thus a representation of a resource should return hypermedia links to related resources

bull Error codes (for each resourceoperation pair )

ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used

Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST

API There are (currently) no standards for RESTful applications but merely design guides

Ideaconsult LtdAugust 22 20108

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 9: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

OpenTox resources (1)

OpenTox considers the following set of entities as essential building blocks

bull Structures of chemical compounds

bull Properties and identifiers of chemical compounds

bull Datasets of chemical compounds and various properties (measured or calculated)

bull Algorithms

ndash Data processing algorithms

bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)

bull Data preprocessing (eg Principal component analysis feature selection)

bull Structure processing (eg structure optimization)

bull Algorithms relating set of structures to another set of structures (eg similarity search or

metabolite generation)

ndash Machine learning algorithms

bull Supervised (eg Regression Classification)

bull Unsupervised (eg Clustering )

ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human

experts not derived by learning algorithms)

Ideaconsult LtdAugust 22 20109

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 10: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

OpenTox resources (2)

bull Models are generated by respective algorithms given specific parameters

bull Statistical models are generated by applying statisticalmachine learning algorithms

to specific dataset and parameters

bull Models can be other than statistical eg expert defined rules quantum mechanical

calculations metabolite generation etc The intention of the framework is to be

generic enough to accommodate varieties of predictive models

bull Validation provides procedures independent of model building facilities (eg

crossvalidation) and generates relevant statistics

bull Reports

ndash Various types of reports might be generated using building blocks above (eg validation

report can be generated using validation object a model and a dataset)

bull In addition the following components are introduced

ndash Task (asynchronous processing of computationally intensive tasks)

ndash Authentication and authorization (Ensuring secure access to sensitive resources)

ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources

registration)

Ideaconsult LtdAugust 22 201010

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 11: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources identificationAll resources are identified via unique web address assigned according to the URL templates

Ideaconsult Ltd11

Component Description URL Template (example)

Compound Representations of chemical compounds httphostportcompoundcompoundid

Feature Properties and identifiers httphostportfeaturefeatureid

Dataset Encapsulates set of chemical compounds and their property

values

httphostportdatasetdatasetid

Model OpenTox model services httphostportmodelmodeld

Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid

Validation

Report

A validation corresponds to the validation of a model on a

test dataset

httphostportvalidationvalidationid

httphostportreportreportid

Task Asynchronous jobs are handled via an intermediate Task

resource A resource submitting an asynchronous job

should return the URI of the task

httphostporttasktaskid

Ontology service Provides storage and SPARQL search functionality for

objects defined in OpenTox services and relevant

ontologies

httphostportontology

Authentication and

authorisation

Granting access to protected resources for authorised users httphostportopensso

httphostportopensso-pol

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 12: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

OpenTox REST operations

Ideaconsult Ltd12

Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg

httphostportmodelmodel_id or httphostportdatasetdataset_id

bull GET ndash retrieve representation of the resource

bull PUT ndash update representation of the resource

bull POST

ndash replace representation of the resource with a new one (eg replace the dataset with new

content)

ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a

model URI as a result)

bull DELETE ndash delete the resource

Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)

bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)

bull PUT - NA

bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset

content to the dataset service)

bull DELETE ndash NA

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 13: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Build a predictive model

Create a model

Run calculations with

dataset

httphost1datasetid

Structures

descriptors

endpoints

Dataset service

Returns the model URL

httphost1modelid

HTTP POST

Build a predictive model

Regression

Classification

Quantum Chemistry

Descriptors etc

validationid

Algorithm service

Validation service

modelid

Published models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 14: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Read data from a web address ndash process ndash write to a web address

Uniform approach to models creation

14Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmneuralnetwork

httpmyhostcommodelpredictivemodel1

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 15: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Use an algorithm to build a model

Ideaconsult Ltd15

bullAn algorithm is applied by submitting

HTTP POST to the algorithm URI and

providing required parameters

bullA common required parameter is

dataset_uri=httphostportdatasetda

tasetid which specifies the data set to be

operated on

bullHTTP POST in REST style services

returns URI of the result and not the

content of the result

bullThe algorithm services are designed to

store the results into a dataset service and

return the URL of the resulted dataset

bullIn case of slow calculations a Task URI

instead of the dataset URI is returned

$ curl -H Accepttexturi-list -X POST -d

dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d

prediction_feature=httpappsideaconsultnet8080ambit2feature26

701 -d

dataset_service=httpappsideaconsultnet8080ambit2dataset

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmJ48 -iv

Connected to opentoxinformatiktu-muenchende (1311592816) port

8080 (0)

POST OpenTox-devalgorithmJ48 HTTP11

gt Host opentoxinformatiktu-muenchende8080

Accept

gt Content-Type applicationx-www-form-urlencoded

lt HTTP11 202 Accepted

lt Date Sat 31 Jul 2010 144638 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 99

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 16: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources The model

Ideaconsult Ltd16

$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-

muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c

About to connect() to opentoxinformatiktu-muenchende port 8080 (0)

Trying 1311592816 connected

Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)

gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11

gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g

zlib1233 libidn18 libssh2018

gt Host opentoxinformatiktu-muenchende8080

gt Accepttexturi-list

gt

lt HTTP11 200 OK

lt Date Sat 31 Jul 2010 144722 GMT

Date Sat 31 Jul 2010 144722 GMT

lt Location httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

lt Vary Accept-Charset Accept-Encoding Accept-Language Accept

lt Accept-Ranges bytes

lt Server Noelios-Restlet-Engine11snapshot

lt Content-Type texturi-listcharset=ISO-8859-1

lt Content-Length 86

lt

Connection 0 to host opentoxinformatiktu-muenchende left intact

Closing connection 0

httpopentoxinformatiktu-muenchende8080OpenTox-

devmodelTUMOpenToxModel_j48_48

bullWhen task URI is returned the

returned status code is HTTP 202

Accepted instead of HTTP 200

OK

bullThis tells the client the processing

is not completed and the client need

to poll the task URI until OK code

is returned

bullThe final result returned by

Example 25 is the URI of the new

model httpopentoxinformatiktu-

muenchende8080OpenTox-

devmodelTUMOpenToxModel_j4

8_48

bullTo obtain prediction results POST

a dataset to the model URI

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 17: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Read data from a web address ndash process ndash write to a web address

Uniform approach to data processing (eg

Descriptors calculation)

17Ideaconsult Ltd

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

Algorithm

GET

POST

PUT

DELETE

+ =

httpmyhostcomdatasettrainingset1

httpmyhostcomalgorithmdescriptorX

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 18: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Read data from a web address ndash process ndash write to a web address

Uniform approach to models validation and

report generation

18Ideaconsult Ltd

Dataset

GET

POST

PUT

DELETE

Model

GET

POST

PUT

DELETE

+

=Validation

GET

POST

PUT

DELETE

Report

GET

POST

PUT

DELETEModel generating

predictions

Validation report

httpmyhostcomreport1

httpmyhostcomdatasettrainingset1

httpmyhostcomdatasetpredictedresults1

httpmyhostcommodelpredictivemodel1

httpmyhostcomvalidation

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 19: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Apply predictive models

Returns the results

dataset URL

httphostdatasetid

Retrieve available

endpoints and model

URLs eg

httphost1modelid

Published models

Algorithms

Ontologies

metadata

Ontology

service

HTTP POST

SPARQL

HTTP POST

modelidApply the model

httphost1modelid

to dataset

httphost2datasetid

Model service

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 20: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Read data from a web address ndash process ndash write to a web address

Uniform approach to model prediction

20Ideaconsult LtdAugust 22 2010

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

+=

httpmyhostcomdatasetid1

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

=

httpmyhostcomdatasetresults1

Model

GET

POST

PUT

DELETE

httpmyhostcommodelpredictivemodel1

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 21: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Uniform access to data

Upload data receive

dataset URL

httphost2datasetid

Annotation

Find chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

Publish your data retrieve linked data

Published models

Algorithms

Ontologies

metadata

Ontology service

Feature

GET

POST

PUT

DELETE

Compound

GET

POST

PUT

DELETE

Dataset

GET

POST

PUT

DELETE

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 22: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

RDF - Resources representation

bull The opentoxowl ontology

ndash A common OWL data model of all

OpenTox resources

ndash Describes OpenTox resources

ndash Describes relationships between them

ndash Generates objects RDF representations

bull RDFXML representation is mandatory for

OpenTox resources

bull Uniform approach to data representation

ndash Calculated and measured properties of

chemical compounds are represented in an

uniform way

ndash Linked to the resource used for data

generation

ndash Annotated via ontology entries

ndash Model representations link to algorithms

and data used

bull Ideaconsult Ltd

22

All OpenTox components are defined by

OWL ontology

httpopentoxorgapi11opentoxowl

All resources are subclasses of

otOpenToxResource

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 23: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Chemical compound

Ideaconsult Ltd23

$ curl -H Acceptchemicalx-daylight-smiles

httpappsideaconsultnet8080ambit2compound1

O=C

$ curl -H Acceptchemicalx-mdl-molfile

httpappsideaconsultnet8080ambit2compound1

CH2O

APtclcactv09040902283D 0 000000 000000

4 3 0 0 0 0 0 0 0 0999 V2000

-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0

06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0

11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0

1 2 2 0 0 0 0

2 3 1 0 0 0 0

2 4 1 0 0 0 0

CompoundProvides different representations for

chemical compounds with a unique and

defined chemical structurecompoundid

Conformercompoundidconformerid

Documentationhttpopentoxorgdevapisapi-11structure

RepresentationA subclass of otOpenToxResource

Supports different Chemical MIME

formats

RDF representation only for specifying

owlsameAs links to external resources

Example 1 Retrieve compound as MOL

Example 2 Retrieve compound as SMILES

Example 3 Query compounds

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword

$ curl ndashH Acceptchemical-mime ldquo

httpappsideaconsultnet8080ambit2querysmartssearch=smarts

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 24: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Feature

Ideaconsult Ltd24

FeatureA Feature is a resource representing any

kind of a property or identifier assigned to

a Compound

The feature types are determined via their

links to ontologies (Feature ontologies

Decsriptor ontologies Endpoints

ontologies)featureid

Documentationhttpopentoxorgdevapisapi-11feature

RepresentationotFeature a subclass of

otOpenToxResource

Mandatory RDFXML format

Properties

bullName defined by dctitle (Dublin Core namespace )

bullUnits defined by otunits annotation property (OpenTox

namespace)

bullCreator defined by dccreator annotation property

(Dublin Core namespace)

bullThe origin of the Feature is defined by othasSource

object property (OpenTox namespace) element and can be

otAlgorithm otModel or otDataset

bullRelations to other resources which represent the same

entity could be established via owlsameAs property

This approach can be used for example to link the

otFeature resource to a resource from another ontology

(an example follows)

bullThere are subclasses of otFeature (namely) which are

used otNumericFeature otStringFeature

otNominalFeature denote if a feature holds numeric

nominal or string values

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 25: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Feature (an example)

Ideaconsult Ltd25

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature22114

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlnsotee=httpwwwopentoxorgechaEndpointsowl

xmlnsdc=httppurlorgdcelements11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsowl=httpwwww3org200207owl

xmlnsxsd=httpwwww3org2001XMLSchema

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt

ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt

ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt

ltrdfssubClassOf

rdfresource=httpwwwopentoxorgapi11Featuregt

ltowlClassgt

ltotNumericFeature rdfabout=feature22114gt

ltothasSourcegt

ltotAlgorithm

rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo

gPDescriptorgt

ltothasSourcegt

ltowlsameAs rdfresource=ldquooteeOctanol-

water_partition_coefficient_Kowgt

ltdctitlegtXLogPltdctitlegt

ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt

ltotNumericFeaturegt

ltrdfRDFgt

The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114

bullLinked to an entry of a simplified ontology of toxicological endpoints

The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor

bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 26: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Feature (an example)

Ideaconsult Ltd26

$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184

prefix ot lthttpwwwopentoxorgapi11gt

prefix dc lthttppurlorgdcelements11gt

prefix lthttpappsideaconsultnet8080ambit2gt

prefix rdfs lthttpwwww3org200001rdf-schemagt

prefix owl lthttpwwww3org200207owlgt

prefix xsd lthttpwwww3org2001XMLSchemagt

prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

othasSource

a owlObjectProperty

otunits

a owlDatatypeProperty

otFeature

a owlClass

otNumericFeature

a owlClass

rdfssubClassOf otFeature

af26184

a otFeature otNumericFeature

dccreator httpambituni-plovdivbg8080ambit2

dctitle TUM_CDK_XLogP

othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptorgt

otunits

= oteeOctanol-water_partition_coefficient_Kow

An example (N3) of another otFeature

resource XLogP descriptor (again)

bullGenerated by different implementation

bullIn this case its name is

ldquoTUM_CDK_XLogPrdquo and the algorithm

resource used to generate resides at Technical

University of Munich (TUM) premises

httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithmCDKPhysChemXLogPDescriptor

This algorithm URL could also be used to

initiate descriptor calculations

bullThe representation of Algorithm resources

refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-

algorithmsxlogP

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 27: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Toxicity endpoints ontologies

Ideaconsult Ltd27

bullDerived from ECHA classification of

endpoints published in REACH guidance

documents

bullPhysicochemical properties and various

toxicological endpoints

bullThe hierarchy doesnrsquot represent the

complexity of toxicological assays but

can be used as a first approximation to

assign meaning to the data entries and

generate REACH report

bullMore specific description of toxicological

assays can be used as well

bullOntologies for specific toxicity assays are

developed by OpenTox partners

bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 28: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources a feature

Ideaconsult Ltd28

curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2feature3

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

xmlns=httpappsideaconsultnet8080ambit2

xmlnsaf=httpappsideaconsultnet8080ambit2feature

hellip

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotFeature rdfabout=feature3gt

dccreatorgt

ltothasSourcegtECHAhellipltothasSourcegt

ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt

ltotunitsgtltotunitsgt

ltdctitlegtECltdctitlegt

ltotFeaturegt

ltrdfRDFgt

An illustration of otFeature

imported from a file and not

calculated

The example shows a feature

representing EINECS

number imported from the

ECHA preregistration list

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 29: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Feature summary

bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs

bull These URIs are dereferencable

bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values

bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc

Ideaconsult Ltd29

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 30: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Dataset

OperationsPOST ndash Upload a dataset

PUT ndash Update the dataset content

DELETE ndash Remove the dataset

Representation

RDFXML (mandatory)

bullThe dataset consists of data entries

bullEach entry is associated with exactly one

chemical compound identified by its

URI and available via OpenTox

Compound service API

bullOne and the same compound can be

associated with multiple dataset entries

bullEvery ldquocolumnrdquo is associated with a

Feature its representation should be

available via OpenTox Feature API

Ideaconsult Ltd30

DatasetProvides access to chemical compounds and their

features (eg structural physical-chemical

biological toxicological properties)

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 31: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Dataset

Representation

Ideaconsult Ltd31

The dataset services optionally

support formats other than RDF

bulltextcsv (Comma delimited)

bulltextx-arff (Weka ARFF)

bullapplicationpdf

bullchemicalx-mdl-sdfile

bullother Chemical MIME formats

This allows retrieving the same data

in convenient format but the URL

links to compound and feature

resources are being lost

prefix ad lthttpappsideaconsultnet8080ambit2datasetgt

prefix af lthttpappsideaconsultnet8080ambit2featuregt

prefix ot lthttpwwwopentoxorgapi11gt

hellip

ad9 a otDataset

otdataEntry

[ a otDataEntry

otcompound

lthttpappsideaconsultnet8080ambit2compound413conformer409421gt

otvalues

[ a otFeatureValue

otfeature af21576

otvalue 3309999942779541^^xsddouble

]

otvalues

[ a otFeatureValue

otfeature af21573

otvalue 30^^xsddouble

]

]

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 32: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Dataset metadata and features

Ideaconsult Ltd32

Description URI Template

Retrieve entire dataset content If uri-list

retrieve only compound URIs

httphostportdatasetid

Retrieve representation of features (columns)

of the dataset

httphostportdatasetidfeature

Retrieves dataset metadata (name etc) httphostportdatasetidmetadata

$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata

ltrdfRDF

xmlnsot=httpwwwopentoxorgapi11

helliphellip

xmlnsrdfs=httpwwww3org200001rdf-schema

xmlbase=httpappsideaconsultnet8080ambit2gt

ltotDataset rdfabout=dataset9gt

ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt

ltdcpublishergtsomebodyltdcpublishergt

ltrdfsseeAlsogt

ltbxEntry rdfabout=reference20117gt

ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt

ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt

ltbxEntrygt

ltrdfsseeAlsogt

ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt

ltotDatasetgt

ltrdfRDFgt

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 33: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Data publishing

Upload data receive

dataset URL

httphost2datasetid

AnnotationFind chemical compounds

return dataset URL

httphost2compoundid

HTTP GET POST

Dataset service

Structures

endpoints

amppredictions

1)POST a file with chemical structures and properties to

OpenTox dataset servicebullThe structures and data are assigned a dataset URL and

become available by multiple formats (RDF Chemical

MIME CSV Weka ARFF)

2)Assign metadata

bullPUT datasetidmetadata

3)Annotate any of dataset features

datasetidfeature by assigning links to relevant

ontologies

bullPUT featureid

Toxiciology related

ontologies

Algorithms

ontologies

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 34: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Algorithm

AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by

different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at

httphostportalgorithm

Ideaconsult Ltd34

curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-

devalgorithm

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2

httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 35: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Algorithm

Representation

bull Multiple type of algorithms

bull descriptor calculation algorithms

bull machine learning procedures

bull data preprocessing

bull The representation of algorithms is again defined by

Opentox ontology where all algorithms are subclass of

otAlgorithm

Algorithm types ontology

httpopentoxorgdatadocumentsdevelopmentRDF2

0filesAlgorithmTypes

bull provides a classification of algorithm types

bull Algorithm type in RDF representation is set by direct

subclassing (rdftype) of a class from the algorithm types

ontology (otahttpwwwopentoxorgalgorithmsowl )

eg ltmyalgorithmgt rdftype otaClassification

Ideaconsult Ltd35

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 36: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Algorithm

Ideaconsult Ltd36

$ curl -H Acceptapplicationrdf+xml

httpappsideaconsultnet8080ambit2algorithmJ48

ltrdfRDF

xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns

xmlnsot=httpwwwopentoxorgapi11

hellip

xmlnsota=httpwwwopentoxorgalgorithmTypesowl

ltotaSupervised

rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt

ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring

gtClassification Decision tree J48ltdctitlegt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt

ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring

gtltdcdescriptiongt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt

ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI

gtSomebodyltdcpublishergt

ltrdftype

rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt

ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt

ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime

gtSat Jul 31 171126 EEST 2010ltdcdategt

ltotaSupervisedgt

ltrdfRDFgt

RepresentationbullAlgorithm name is defined by dctitle

Parameters supported by the algorithm are

specified via object property otparameters

and should be of class otParameter (as

defined in opentoxowl)

These entries serve as a information what

parameters are required in order to run the

algorithm the values itself should be

provided by the client when initiating the

calculations via POST

bullAlgorithm types are distinguished by

means of Algorithm types ontology

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 37: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Resources Model

bull Representations of predictive models

bull A Model is created by HTTP POST to an

otAlgorithm with specific parameters

andor input otDataset

Ideaconsult Ltd37

Representation

bull Model Name is defined by dctitle property

bull Model creator might be defined by dccreator

property

bull The date of Model creation is defined by dcdate

property

bull The Algorithm defined by otalgorithm object

property

bull The independent variables are instances of

otFeature defined by otindependentVariables

property (can be multiple)

bull The dependent variables are are instances of

otFeature and are defined by

otdependentVariables property (can be multiple)

bull The variables where prediction results will be

stored are are instances of otFeature and are

defined by otpredictedVariables (can be multiple)

bull Parameters are defined by otparameters

bull The training Dataset is an instance of otDataset and

defined by ottrainingDataset

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 38: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Algorithm

service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd38

Model

resourceDataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Feature

service

Feature

service

Compoun

d service

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 39: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Linked resources Compound Algorithm Model Dataset Features

Ideaconsult Ltd39

Dataset

Resource

Descriptor

resource

Assay

resource

Chemical

compound

Regression

Classification

Quantum

Chemistry

Descriptorsetc

Blue Obelisk

algorithms

ontology

OpenTox

algorithm types

ontology

Toxiciology related

ontologies

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 40: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Make the model available

40Ideaconsult Ltd

Register at OpenTox ontology servicendash RDF tripple storage

ndash Accepts HTTP POST

ndash SPARQL endpoint

Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology

Becomes visible for applications

modelidPublished models

Algorithms

Ontologies

metadata

Ontology service

HTTP POST

Model service

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 41: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Services implementation by partner and

service type

Ideaconsult Ltd41

Part

ner

No

Serv

ice t

ype

Com

pound

Data

set

Featu

re

Alg

ori

thm

(pro

cess

ing)

Alg

ori

thm

(model)

Model

Task

Report

Validati

on

Auth

ern

ticati

on

and A

uth

ori

sati

on

serv

ice

Onto

logy s

erv

ice

2 Y Y Y Y Y Y

3

(IDEA)

Y Y Y Y Y Y Y Y

5 Y Y Y

6 Y Y Y Y

7 Y Y Y

10 Y

All components are implemented as REST web services

There could be multiple implementations of same type of

components

(Subset of) services could be hosted by the same provider or

by multiple providers on separate locations

httpappsideaconsultnet8080ambit2

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 42: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

OpenTox Is A Framework

Framework

Unified Access

Open Source

bull Toxicity data

bull Predictive models

bull Validation support

bull Interpretation aids

bull Toxicologists

bull Modelers

bull API for new algorithmsdevelopment amp integration

bull To optimise impact

bull To allow inspection review

bull To attract external contributors

OpenTox services can be used to develop specific applications or embedded in

workflow systems

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 43: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

bull Two end user oriented demo applications making use of OpenTox

webservices have been developed deployed and are available for

testing ndash httptoxcreateorg and httptoxpredictorg

bull ToxCreate creates models from user supplied datasets

bull ToxPredict uses existing OpenTox models to estimate chemical

compound properties

Demo applications

43Ideaconsult LtdAugust 22

2010

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 44: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

RDF Lessons learned

bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project

bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)

resource representation described by triples

bull Steep learning curve

bull Some hard topicsndash Data model vs format

ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure

ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF

Ideaconsult Ltd44

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 45: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

RDF Performance

RDF representation is

verbose hellip

hellip and in-memory RDF

libraries are slow hellip

Ideaconsult Ltd45

A dataset with 320 chemicals 60 columns

A dataset with 6500 chemicals 12 columns

hellip lack of streaming parserswriters hellip

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 46: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

RDF Wish list

bull Convenient explanation of subject-predicate-

object concept for beginners

bull A (high performant) triple storage should not be

a mandatory requirement to publish RDF data

bull Fast streaming parsers and writers

bull (terse) JSON serialisation

bull Security

bull Synchronisation of distributed RDF content

Ideaconsult Ltd46

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet

Page 47: RESTful RDF Web Services for Predictive Toxicologybulletin.acscinf.org/PDFs/240nm02.pdf · RESTful RDF Web Services for Predictive Toxicology Dr. Nina Jeliazkova ... Framework design

Thank you

Ideaconsult Ltd47

Build an application with OpenTox REST Web Services API

httpopentoxorgdevapisapi-11

Download AMBIT Implementation of OpenTox API

and launch your OpenTox service

httpambitsourceforgenet