Chemical Informatics and Cyber-infrastructure Building Blocks

Chemical Informatics Resources
Deluge of experimental data
  > 100,000 compounds screened by 10 publicly funded high-throughput screening centers using various assay techniques (molecular to cellular)
  Molecular Libraries Screening Center Network
Chemical databases maintained by various groups
  NIH PubChem, NIH DTP
Chemical informatics and computational chemistry
  Data clustering, data mining, descriptor calculations, toxicity prediction, docking, molecular modeling, and quantum chemistry
Visualization tools
Web resources: journal articles, etc.
A Chemical Informatics Grid will need to integrate these into a common, loosely coupled, open, distributed computing environment.

Our Solution Stack
Domain-specific Web Services
  VOTables, CDK services
Grid services, Cyber-infrastructure for computationally intensive applications
  Clustering, quantum chemistry
Workflow and service management
  We work with Taverna
  Many solutions: Kepler, BPEL engines, etc.
Portlets and other user interfaces
  Rich desktop apps
  Ubiquitous clients
[Diagram: layered stack labeled Portals and Other User Interfaces; Workflow and Service Management; Web and Grid Services]
Each level is a subject for research and development, as is their integration.

Wrapping Science Applications as Services
Science Grid services typically must wrap legacy applications written in C or Fortran.
You must handle such problems as:
  Specifying several input and output files
    These may need to be staged in
  Launching executables and monitoring their progress
  Specifying environment variables
Often these applications also have shell scripts that perform miscellaneous tasks.
How do you convert this to WSDL?
  Or (equivalently) how do you automatically generate the XML job description for WS-GRAM?
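
To make the last question concrete, here is a minimal sketch of generating an XML job description from a small in-memory description of a legacy executable. The element names (job, executable, argument, environment, stdout, fileStageIn, transfer) are loosely modeled on WS-GRAM job descriptions, but the output is not validated against any schema, and the executable path, URLs, and environment variable are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_job_description(executable, arguments=(), environment=None,
                          stage_in=(), stdout="stdout.txt"):
    """Return an XML job description string for one legacy executable.

    stage_in is a sequence of (source_url, destination_url) pairs.
    Element names are illustrative, not checked against a real schema.
    """
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        ET.SubElement(job, "argument").text = str(arg)
    # One <environment> element per variable, holding a name/value pair.
    for name, value in (environment or {}).items():
        env = ET.SubElement(job, "environment")
        ET.SubElement(env, "name").text = name
        ET.SubElement(env, "value").text = value
    ET.SubElement(job, "stdout").text = stdout
    # Input files that must be copied to the execution host before launch.
    if stage_in:
        staging = ET.SubElement(job, "fileStageIn")
        for source, destination in stage_in:
            transfer = ET.SubElement(staging, "transfer")
            ET.SubElement(transfer, "sourceUrl").text = source
            ET.SubElement(transfer, "destinationUrl").text = destination
    return ET.tostring(job, encoding="unicode")


# Hypothetical example: a clustering binary with one staged-in input file.
example = build_job_description(
    "/opt/legacy/bin/cluster",
    arguments=["input.scn"],
    environment={"LEGACY_HOME": "/opt/legacy"},
    stage_in=[("gsiftp://portal.example.org/tmp/input.scn",
               "file:///scratch/input.scn")],
)
```

Keeping the description as plain Python data means the same structure could in principle drive either a WSDL wrapper or a job submission document.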

Flow Chart of SMILES to Cluster Partitioned of BCI Web Service
[Flow chart. Data and process nodes: SMILE String; Makebits; Dictionary (Default); Fingerprint (*.scn); DivKmeans; Cluster Hierarchy (*.dkm); Optclus; best level; RNNclus; One Column Process; Merge Process; Extracted Cluster Hierarchy (*.clu); New SMILE String. Stage labels: Generating Fingerprints; Clustering Fingerprints; Generating the best levels; Extracting individual cluster partitions; SMILES to DKM.]

BCI Clustering Service Methods
Service Method | Description | Input | Output
makebitsGenerate | Generate fingerprints from a SMILES structure | SMIstring | Fingerprint string
divkmGenerate | Cluster fingerprints with Divkmeans | SCNstring | Clustered hierarchy
smile2dkm | Makebits + divkm | SMIstring | Clustered hierarchy
optclusGenerate | Generate the best levels in a hierarchy | DKMstring | Best partition cluster level
rnnclusGenerate | Extract individual cluster partitions | DKMstring | Individual cluster partitions
smile2ClusterPartitioned | Generate a new SMILES structure with an extra column | SMIstring | New SMILES structure
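
The composite smile2dkm method is described as Makebits followed by divkm. A rough sketch of how a client might chain the primitive methods is given below. The method names come from the table; everything else is an assumption: the services are passed in as plain callables, and the stand-in functions only tag their input so the data flow is visible. Real calls would go through the Web Service interface.

```python
from typing import Callable, Dict

# A service method is modeled as a function from one string to another.
ServiceCall = Callable[[str], str]


def smile2dkm(smiles: str, services: Dict[str, ServiceCall]) -> str:
    """Chain fingerprint generation and clustering (Makebits + divkm)."""
    scn = services["makebitsGenerate"](smiles)   # SMILES -> fingerprints
    return services["divkmGenerate"](scn)        # fingerprints -> hierarchy


def best_levels(smiles: str, services: Dict[str, ServiceCall]) -> str:
    """Cluster a SMILES input, then ask for the best levels."""
    dkm = smile2dkm(smiles, services)
    return services["optclusGenerate"](dkm)


# Stand-in services (hypothetical): each wraps its input in a tag.
stand_ins: Dict[str, ServiceCall] = {
    "makebitsGenerate": lambda s: "SCN(" + s + ")",
    "divkmGenerate": lambda s: "DKM(" + s + ")",
    "optclusGenerate": lambda s: "BEST(" + s + ")",
}

result = best_levels("CCO", stand_ins)  # "BEST(DKM(SCN(CCO)))"
```

Passing the services in as a dictionary keeps the chaining logic independent of how each method is actually invoked.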

Submitting Applications with Condor
We are working to use Condor-G as a simple bridge to the NSF’s TeraGrid for job submission.
Condor has a Web Service interface (called BirdBath) that we are using to construct Java portlets.
We are investigating how to construct Condor ClassAds using GPIR.
  ClassAds are required for Condor matchmaking
  But the TeraGrid has no built-in facility for generating them
[Diagram, two configurations. Condor Only: (Portal) Client, Condor Master, four Condor nodes. Condor-G and Globus: (Portal) Client, Condor-G, two TeraGrid Globus sites, LSF, PBS.]
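
As an illustration of what constructing a ClassAd from resource information could involve, the sketch below renders a dictionary as "Attribute = value" lines in the older ClassAd text style. The resource record stands in for data that might come from GPIR; its field names and values are hypothetical, and no GPIR or Condor API is used here.

```python
def to_classad(attributes):
    """Render a dict as ClassAd-style 'Name = value' lines.

    Strings are double-quoted, booleans become TRUE/FALSE, and numbers
    are written bare. This is a formatting sketch only.
    """
    lines = []
    for name, value in attributes.items():
        # bool must be tested before int, since bool is a subclass of int.
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append(name + " = " + rendered)
    return "\n".join(lines)


# Hypothetical resource record, as it might be assembled from GPIR data.
resource = {
    "Machine": "login.example.org",
    "OpSys": "LINUX",
    "Cpus": 128,
    "HasGlobus": True,
}

classad_text = to_classad(resource)
```

The matchmaker compares attributes like these against job requirements, which is why the ads have to be generated when the resource does not advertise them itself.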

VOTables: Handling Tabular Data
Developed by the Virtual Observatory community for encoding astronomy data.
The VOTable format is an XML representation of tabular data (data coming from BCI, NIH DTP databases, and so on).
VOTable-compatible tools have already been built
  We just inherit them.
SAVOT and JAVOT Java parser APIs for VOTable allow us to easily build VOTable-based applications
  Web Services
  Spreadsheet and plotting applications
    VOPlot and TopCat are two examples.
[Figure: VOPlot application showing the generated votable.xml file, with Mass on the X-axis and PSA on the Y-axis]
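
For readers unfamiliar with the format, here is a minimal sketch of writing tabular chemistry data as VOTable-style XML. It uses the standard VOTABLE / RESOURCE / TABLE / FIELD / DATA / TABLEDATA / TR / TD nesting, but omits the XML namespace and most optional metadata, so the output is a simplified illustration rather than a validated VOTable. The column names follow the Mass and PSA axes mentioned above; the row values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_votable(columns, rows):
    """Return VOTable-style XML for the given columns and rows.

    columns is a list of (name, datatype) pairs; rows is a list of
    sequences with one cell per column.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name, datatype in columns:
        field = ET.SubElement(table, "FIELD", name=name, datatype=datatype)
        if datatype == "char":
            # Variable-length strings are declared with arraysize="*".
            field.set("arraysize", "*")
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for cell in row:
            ET.SubElement(tr, "TD").text = str(cell)
    return ET.tostring(votable, encoding="unicode")


# Placeholder rows for illustration only.
columns = [("SMILES", "char"), ("Mass", "double"), ("PSA", "double")]
rows = [("CCO", 46.07, 20.23), ("C", 16.04, 0.0)]
votable_xml = build_votable(columns, rows)
```

A file written this way is the kind of input the figure shows VOPlot reading, with two numeric columns chosen as the plot axes.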

More Services: WWMM Services
Service | Description | Input | Output
InChIGoogle | Search an InChI structure through Google | inchiBasic, type | Search result in HTML format
InChIServer | Generate InChI | version, format | An InChI structure
OpenBabelServer | Transform a chemical format to another using Open Babel | format, inputData, outputData, options | Converted chemical structure string
CMLRSSServer | Generate CMLRSS feed from CML data | mol, title, description, link, source | Converted CMLRSS feed of CML data

CDK-Based Services
Service | Description
Common Substructure | Calculates the common substructure between two molecules.
CDKsim | Takes two SMILES strings and evaluates the Tanimoto coefficient (ratio of the intersection to the union of their fingerprints).
CDKdesc | Calculates a variety of molecular and atomic descriptors for QSAR modeling.
CDKws | Fingerprint generation.
CDKsdg | Creates a JPEG of the compound’s 2D structure.
CDKStruct3D | Generates 3D coordinates of a molecule from its SMILES string.
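
The Tanimoto coefficient used by CDKsim can be sketched in a few lines. This is not the CDK implementation: fingerprints are represented here simply as collections of set-bit positions, and the convention of returning 0.0 for two empty fingerprints is an assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: |A and B| / |A or B| over set-bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Two toy fingerprints sharing bits 3 and 4 out of five distinct bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5})  # 2 / 5 = 0.4
```

A value of 1.0 means the fingerprints are identical, and 0.0 means they share no bits.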

ToxTree Service
The Threshold of Toxicological Concern (TTC) establishes a level of exposure for all chemicals below which there would be no appreciable risk to human health.
ToxTree implements the Cramer Decision Tree approach to estimate TTC.
We have converted this into a service.
  Uses SMILES as input.
  Note that the GUI must be separated from the library for it to be a service.
http://ecb.jrc.it/QSAR/home.php?CONTENU=/QSAR/qsar_tools/qsar_tools_toxtree.php

OSCAR3 Service
Oscar3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (e.g., journal articles).
It identifies (or attempts to identify):
  Chemical names: singular nouns, plurals, verbs, etc., as well as formulae and acronyms.
  Chemical data: spectra, melting/boiling point, yield, etc. in experimental sections.
  Other entities: things like N(5)-C(3) and so on.
Results are exported as an XML file.
There is a larger effort, SciBorg, in this area:
  http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html
It also has potentially very interesting workflows.
http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3