ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

6
ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network P. Sopasakis 1 and H. Sarimveis 2,* 1 IMT Institute for Advances Studies Lucca, Piazza S. Ponziano, 6, 55100, Lucca, Italy, Email: [email protected], 2 Unit of Process Control and Informatics, School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou, 9, 15780, Zografou Campus, Athens, Greece, *Corresponding author: E-mail: [email protected], Tel +30 2107723237, Fax: +30 2107723138 Abstract The EU research project OpenTox has delivered a distributed computational network especially designed for the purposes of Predictive Toxicology by means of (Q)SAR Models and Read-Across methodologies. OpenTox has introduced new standards in the field in regard to how Internet resources can be integrated with in-house databases and algorithms to predict the impact of the release of a chemical to the environment on the local flora and fauna and on public health. The ToxOtis suite serves a double purpose in the quest for painless integration: First off, it is a Java interface to any OpenTox compliant web service and facilitates access control (Authentication and Authorization), the parsing of RDF (Resource Description Framework) documents that are exchanged with the web services, and the consumption of Model Building, Toxicity Prediction and other ancillary web services (e.g. computation of molecular similarity). Second, it facilitates the database management, the serialization of resources in RDF and provides all that is necessary to a web service provider to join the OpenTox network and offer predictive toxicology web services. Keywords: predictive toxicology; environmental toxicology; web ontologies; REST architecture; open- source software. 1. INTRODUCTION OpenTox [1] is a distributed computational network designed based on the principles of the REpresentational State Transfer (REST) [2]. ToxOtis offers elaborate search facilities to both online and local databases of chemical compounds and properties. It allows users to download and upload data to OpenTox web services, train predictive models (both regression and classification), analyze data (e.g. select the most descriptive molecular descriptors for a given target endpoint), evaluate the Domain of Applicability (DoA) of a (Q)SAR model, derive toxicological predictions and, last but not least, specify access restrictions on the resources they create on the network. Asynchronous computational tasks initiated on the OpenTox network are mapped to lightweight Java threads client-side which monitor the progress of the remote execution. 2. THE CORE MODULE ToxOtis implements a modular architecture where the OpenTox components are mapped to Java classes. Each one of these components has a corresponding representation in RDF, i.e., a standard representation of a data model that describes its meta-data. Examples of such components are Algorithms, Models, Tasks and Datasets. As far as their ontological nature and the corresponding RESTful API are concerned, you can find detailed documentation at the OpenTox web site [3].

description

The ToxOtis suite serves a double purpose in the quest for painless integration: First off, it is a Java interface to any OpenTox compliant web service and facilitates access control (Authentication and Authorization), the parsing of RDF (Resource Description Framework) documents that are exchanged with the web services, and the consumption of Model Building, Toxicity Prediction and other ancillary web services (e.g. computation of molecular similarity). Second, it facilitates the database management, the serialization of resources in RDF and provides all that is necessary to a web service provider to join the OpenTox network and offer predictive toxicology web services.

Transcript of ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

Page 1: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

P. Sopasakis1 and H. Sarimveis2,*

1 IMT Institute for Advances Studies Lucca, Piazza S. Ponziano, 6, 55100, Lucca, Italy, Email: [email protected],

2 Unit of Process Control and Informatics, School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou, 9, 15780, Zografou Campus, Athens, Greece,

*Corresponding author: E-mail: [email protected], Tel +30 2107723237, Fax: +30 2107723138

Abstract The EU research project OpenTox has delivered a distributed computational network especially designed for the purposes of Predictive Toxicology by means of (Q)SAR Models and Read-Across methodologies. OpenTox has introduced new standards in the field in regard to how Internet resources can be integrated with in-house databases and algorithms to predict the impact of the release of a chemical to the environment on the local flora and fauna and on public health. The ToxOtis suite serves a double purpose in the quest for painless integration: First off, it is a Java interface to any OpenTox compliant web service and facilitates access control (Authentication and Authorization), the parsing of RDF (Resource Description Framework) documents that are exchanged with the web services, and the consumption of Model Building, Toxicity Prediction and other ancillary web services (e.g. computation of molecular similarity). Second, it facilitates the database management, the serialization of resources in RDF and provides all that is necessary to a web service provider to join the OpenTox network and offer predictive toxicology web services. Keywords: predictive toxicology; environmental toxicology; web ontologies; REST architecture; open-source software. 1. INTRODUCTION OpenTox [1] is a distributed computational network designed based on the principles of the REpresentational State Transfer (REST) [2]. ToxOtis offers elaborate search facilities to both online and local databases of chemical compounds and properties. It allows users to download and upload data to OpenTox web services, train predictive models (both regression and classification), analyze data (e.g. select the most descriptive molecular descriptors for a given target endpoint), evaluate the Domain of Applicability (DoA) of a (Q)SAR model, derive toxicological predictions and, last but not least, specify access restrictions on the resources they create on the network. Asynchronous computational tasks initiated on the OpenTox network are mapped to lightweight Java threads client-side which monitor the progress of the remote execution. 2. THE CORE MODULE ToxOtis implements a modular architecture where the OpenTox components are mapped to Java classes. Each one of these components has a corresponding representation in RDF, i.e., a standard representation of a data model that describes its meta-data. Examples of such components are Algorithms, Models, Tasks and Datasets. As far as their ontological nature and the corresponding RESTful API are concerned, you can find detailed documentation at the OpenTox web site [3].

Page 2: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

The ToxOtis library implements an intuitive Java API facilitating the serialisation of objects to RDF and vise versa, that is parsing of RDF documents. All components are characterized by their meta information which consist of a subset of the Dublin Core properties, some RDFS and OWL properties and some OpenTox-specific properties (like ot:hasSource). In what follows we present these components and their functionality.

ToxOtis can cooperate with any web service that is compliant to the specifications of the OpenTox API v1.2 [4]. ToxOtis was designed using Apache Maven and is hosted as a Maven artifact on the Nexus repository at http://opentox.ntua.gr:8081/nexus from where one may download it. Thorough documentation is available on our wiki page at http://opentox.ntua.gr/wiki/. 2.1 Authentication and Authorisation with ToxOtis A lot of OpenTox web services require the clients to authenticate themselves against the central Signle Sign-On (SSO) system so as to grant them access to their functionalities. ToxOtis facilitates this procedure by providing a mechanism for acquiring and managing SSO authentication tokens. Authentication tokens are provided by the SSO service upon successful authentication. The simplest way to authenticate a user using ToxOtis is by the following one-line piece of code:

AuthenticationToken at = new AuthenticationToken("john_smith","pass");

The user can now pass the object at to any method that required it. In certain cases authentication tokens are of course optional and this step can be skipped. It is considered to be safer to use key files instead of hard-coding the users’ credentials in the source code. For this purpose, ToxOtis offers the following constructor:

File passwordFile = new File("/path/to/my_sercret.key"); AuthenticationToken at = new AuthenticationToken passwordFile);

Applications that support multiple users may make use of the class TokenPool which guarantees that every user is assigned a singe authentication token and that on shutdown all tokens are invalidated. Here is an example of use:

TokenPool tokenPool = TokenPool.getInstance();

tokenPool.login("/path/to/jsmith.key");// token for user ‘john_smith’ tokenPool.login("/path/to/jsmith.key");// does nothing tokenPool.login("/path/to/alice.key");// token for user ‘alice’ AuthenticationToken at = tokenPool.getToken("john_smith"); tokenPool.logoutAll();// invalidates all tokens

2.2 Compounds and Conformers A chemical conformer (class Conformer) is an identifier of a unique chemical substance up to its 3D characteristics. The class Compound is used in ToxOtis to describe all sorts of chemical compounds and map them to their properties. Users can easily create a compound or a conformer and assign an identifier to it that is a URI. If this URI corresponds to a valid URL of an OpenTox service, then one can download meta-information about the compound or a representation thereof in some desired format as in the following example where the compound with identifier http://apps.ideaconsult.net:8080/ambit2/compound/10 (CAS-RN: 50-11-3) is stored to a file in MOL format:

// Download Compound from a remote OpenTox web service using ToxOtis: Compound comp = new Compound(new VRI(Services.IDEACONSULT).augment("compound","10")); File destination = new File("/path/to/file.mol"); comp.download(destination, Media.CHEMICAL_MDLMOL, at);

Page 3: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

One may request other file formats such as CML, SDF, SMILES, InChI, InChI Key, Image formats such as PNG, meta-data representations like RDF, N3 and more. 2.3 Algorithms An OpenTox Algorithm is characterized by a set of Ontological Classes that classify it according to the OpenTox Algorithms Ontology [5]. As algorithms can be used for a wide variety of purposes (e.g. (Q)SAR/(Q)SPR model building, feature calculation, feature selection, similarity calculation, substructure matching), required and optional input parameters and algorithm results have to be specified in the algorithm representation along with a definition/description of the algorithm. A set of parameters along with their scope (optional/mandatory) and default values are also available for every algorithm. JAQPOT3, an OpenTox web service, implements the OPTIONS method providing guidelines to the client templated machine-readable documentation and directives for its consumption [6].

A predefined collection of Algorithms is available in the class OpenToxAlgorithms. Users may download RDF representations of Algorithms and acquire access to their meta-data using the method loadFromRemote of the class Algorithm. Here is an illustrative example:

// Download an Alorithm’s meta-data: Algorithm myAlg = new Algorithm(OpenToxAlgorithms.TUM_KNN_CLASSIFICATION.getServiceVri()); myAlg.loadFromRemote(); System.out.println(myAlg.getMeta());

Figure 1. An illustrative representation of the Algorithm ontology of OpenTox.

This will print the following message to the system’s standard output stream:

identifier : http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification^^string title : kNNclassification^^string description : OpenTox REST interface to the WEKA k-Nearest Neighbor learning algorithm. Can select appropriate value of K based on cross-validation. Can also do distance weighting.^^string date : Mon Sep 13 20:19:24 EEST 2010^^dateTime

2.4 Features In OpenTox, a Feature is an object representing any kind of property assigned to a Compound. The feature types are determined by links to an appropriate ontological class (Feature and Descriptor Ontologies, Endpoint Ontologies). According to the OpenTox API, clients may have access to the

Page 4: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

metadata of a Feature in RDF or other relevant format and can create their own Features and assign them desired ontological properties. Using ToxOtis, one may lookup on an OpenTox web service for features of some given ontological class; here is an example of use:

// Find Features: Set<VRI> features = FeatureFactory.lookupSameAs(OTEchaEndpoints.DissociationConstantPKa(), (AuthenticationToken)null); for (VRI f : features) { /* Do something */ }

This factory allows also for new features to be easily created and POSTed to a feature service for publication. Authentication and authorisation are required most of the times for such an operation. The invocation of the corresponding method is especially useful when developing model training web services where a prediction feature needs to be created for the model. Here is an example:

// Create and Publish a new Feature Model m = ...; Feature predictedFeature = FeatureFactory.createAndPublishFeature("Feature created as prediction feature for the RBF NN model "+m.getUri(), new ResourceValue(m.getUri(), OTClasses.Model()), featureService, token);

There is a great variety of methods that enable us to create Features with custom ontological properties. For instance:

// Create and customise a feature: Feature f = new Feature(); f.setUnits("m^4*mA*s^2*kg^-2"); f.getMeta().setTitle("Toxicity of my city"); f.getMeta().setHasSource("http://server.net:8283/model/15451"); f.getMeta().setSameAs("http://some.server.net:8080/feature/B10");

2.5 Models A Model object provides representations of (Q)SAR/(Q)SPR toxicological models. Models are the immutable results of learning algorithms described in section 2.3 and are characterised by their training set, parametrisation used for the training algorithm, dependent feature (e.g., some toxicological endpoint), predicted feature (an estimate of the dependent feature), independent features (i.e., a set of molecular descriptors and/or measured properties) and other meta-information. All the above are identified by URIs in according to the principles of REST. Models can be downloaded like any other ToxOtis object:

// Download and parse a Model: VRI vri = new VRI(Services.NTUA.augment("model","1")); Model model = new Model(vri); model.loadFromRemote(at);

2.6 Asynchronous Execution of Tasks In OpenTox various computational procedures such as model training, data filtering or even data upload operations can be time consuming. While such operations take place, the socket that binds the client to the server has to be released. Such a connection is prone to network errors and additionally it would be quite burdensome for both peers to keep lots of connections open until each operation completes. In order to tackle this problem three structures were adopted by the services: background jobs, queues and execution pools. A client request initializes a background job which is put in an execution queue (In the meanwhile other jobs may run in the background). A task is then created and returned to the user in a supported MIME such as application/rdf+xml. A task is a resource that provides information about the progress and the status of a background job. While the

Page 5: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

background job is running, an estimation of the time left for the completion of its execution (percentage completed) is also included in the representation of the Task. Clients can access the task’s URI to inspect the progress of their task and be informed about its (successful or unsuccessful) completion. We could say that a task is the loading bar of RESTful web services. Tasks can be naturally mapped to Java Threads with ToxOtis. In the following example a Task object is created to monitor the progress of a model training procedure; the Task is then converted to a Callable Java object1:

// Train a model using an algorithm: VRI algUri = new VRI("http://server.org/alorithm/mlr"); VRI dsUri = new VRI("http://server.org/dataset/100"); VRI featUri = new VRI("http://server.org/feature/5"); Algorithm alg = new Algorithm(algUri); Dataset ds = new Dataset(dsUri); Feature f = new Feature(featUri); Trainer trainer = new Trainer(alg, ds, f); Task task = f.train(algUri); Callable<Task> runner = new TaskRunner(task);

2.7 Datasets Datasets are bundles of Conformers along with some of their their Features and values assigned to Conformer-Feature pairs. The elementary blocks of a Dataset are its entries; instances of the class DataEntry. Every DataEntry object points to a chemical conformer and associates it with a set of Feature-Value pairs. Datasets can be downloaded from remote locations in a straightforward way:

// Downloade a Dataset: VRI vri = new VRI(Services.IDEACONSULT.augment("dataset","5")); // Require that the dataset will contain no more than 10 compounds final int size = 10; vri.addUrlParameter("max", size); Dataset ds = new Dataset(vri); ds.loadFromRemote(at);

It is also possible to serialise Dataset objects into RDF and to publish datasets to some OpenTox-compliant web service. In the following example a dataset is downloaded from a remote OpenTox web service and is published to a different location:

// Publish a dataset: VRI vri = new VRI(Services.IDEACONSULT.augment("dataset", "54")); Dataset ds = new Dataset(vri).loadFromRemote(); Task t = ds.publishOnline(Services.AMBIT_UNI_PLOVDIV.augment("dataset"), null); while (Task.Status.RUNNING.equals(t.getHasStatus())) { t.loadFromRemote(); }

Taking into account the popularity of Weka [7] in Machine Learning and its widespread use in predictive toxicology, we implemented converters from and to the Weka Instances objects and ARFF (Attribute-Relation File Format) files to Dataset objects which can are well integrated in the OpenTox framework. We provide the following examples:

// Convert weka.core.Instances to Dataset objects: Instance myInstances = ...; // The Weka Object Dataset myDataset = DatasetFactory.createFromArff(myInstances); // or from an ARFF File: File myFile = new File("/path/to/my.arff"); Dataset ds = DatasetFactory.createFromArff(myFile);

1 For details check out the Javadocs provided at http://opentox.ntua.gr/toxotis/org/opentox/toxotis/util/TaskRunner.html.

Page 6: ToxOtis: A Java Interface to the OpenTox Predictive Toxicology Network

3. THE DATABASE MODULE ToxOtis-DB is a module that can be used for the management of an SQL relational database for OpenTox models, datasets, features, tasks, users, models and BibTeX objects [8]. The following example illustrates how easily one can perform advanced searches in the database:

// Find a user with ID 'john': FindUser finder = new FindUser(); finder.setWhere("uid='john'"); User user = null; IdbIterator<User> iterator = finder.list(); if (iterator.hasNext()) { user = iterator.next(); } else { /* No such user in the database */ }

The database engine and the database scheme of ToxOtis-DB are highly optimised to offer high performance and ability to perform complex queries. The registration of objects into the database is rather straightforward as one can judge by the following snippet:

// Find a user with ID 'john': Task t = new Task(Services.ntua().augment("task", UUID.randomUUID())); DbWriter writer = new AddTask(t); writer.write(); writer.close();

The structure of the relational database of ToxOtis-DB is presented in detail in [9]. References 1. Hardy, B. et al. 2010. Collaborative development of predictive toxicology applications. Journal

of Cheminformatics, 2, 1 – 29. 2. Hernández, A. G., García M. N. M., 2010. Distributed Computing Using RESTful Semand Web

Services. Trends in Practical Applications of Agents and Multiagent Systems, Advances in Intelligent and Soft Computing, 71, 295 – 302.

3. The Official OpenTox web site: http://opentox.org (accessed on 25 April, 2013). 4. The OpenTox API v1.2 – Technical Specifications: http://opentox.org/dev/apis/api-1.2

(accessed on 23 April, 2013). 5. Tcheremenskaia, O., Benigni, R., Nikolova, I., Jeliazkova, N., Escher, S. E., Batke, M., Baier,

T., Poroikov, V., Lagunin, A., Rautenberg, M., Hardy, B. 2012. OpenTox predictive toxicology framework: toxicological ontology and semantic media wiki-based OpenToxipedia, Journal of Biomedical Semantics, 3.

6. Sopasakis, P. 2011, The OPTIONS HTTP Method in OpenTox, http://opentox.ntua.gr/wiki/OPTIONS_in_JAQPOT (accessed on 25 April, 2013).

7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann P., Witten, I. H., 2009, The Weka Data Mining Software: An update, SIGKDD Explorations, 11(1).

8. Sopasakis, P., Chomenides H. 2011, http://opentox.ntua.gr/wiki/ToxOtis-db (accessed on 19 April, 2013).

9. Sopasakis, P., Chomenides H. 2011, The Structure of the ToxOtis-DB Database http://opentox.ntua.gr/wiki/ToxOtis_DB_structure (accessed on 20 April, 2013).