Wrapping third-party analytical services for caBIG
![Page 1: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/1.jpg)
Wrapping third-party analytical
services for caBIG
Taverna-caBIG project
Stian Soiland-Reyes, Alexandra Nenadic
University of Manchester, UK
http://www.mygrid.org.uk/dev/wiki/display/caGrid
September 2009
![Page 2: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/2.jpg)
Agenda
• Project overview
• Primary goals
• Service selection
• Why these services?
• Why wrapping?
• Wrapping benefits?
• How we did it
• How does it work
• Architecture
• UML models
• Example client and outputs
• Project info
![Page 3: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/3.jpg)
Project overview
• Taverna-caBIG cooperation on several levels:
1. caGrid-enabling third party analytical services
2. Taverna Workbench enhancements for:
• Semantic search of caBIG services
• Invocation of caBIG services from Taverna workflows
• Support for secure caBIG services (interacting with GAARDS infrastructure prior to service invocation)
• This presentation addresses caGrid-enablement of third party analytical services (wrapping + achieving silver level of compatibility)
![Page 4: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/4.jpg)
Primary goals
• Identify two publicly available analytical services currently accessible through Taverna
• Wrap, i.e. caGrid-enable, the services:
• Design the wrapper services in UML and semantically describe/annotate them using caBIG’s tooling (EA + SIW)
• Wrap/implement and deploy them as standard caBIG services on caGrid (Introduce)
![Page 5: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/5.jpg)
Analytical service selection
• Services were selected in collaboration with the caBIG Workflow Working Group, led by Juli Klemm
• Winners:
• NCBI BLAST service hosted by EBI (European Bioinformatics Institute)
• Protein and nucleotide sequence similarity search service
• InterProScan service hosted by EBI
• Scans a protein sequence against the range of protein signatures in the InterPro warehouse
![Page 6: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/6.jpg)
Why these services?
• Freely available
• Highly reliable, hosted by EBI
• Widely used by the scientific community
• Can be combined with existing caBIG tools in biologically meaningful workflows
• caBIO, GridPIR, etc.
![Page 7: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/7.jpg)
NCBI BLAST service
• A popular sequence similarity search tool using local sequence alignment
• Supports sequences of proteins, DNA, RNA
• Searches sequences in a whole range of databases:
• UNIPROT, NCBI, EMBL, etc.
• SOAP web service hosted by EMBL-EBI
![Page 8: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/8.jpg)
InterProScan service
• InterPro warehouse integrates various databases of protein domains and functional sites
• Searches the InterPro warehouse using protein signature recognition methods, e.g. blastprodom, gene3d, hmmpfam, hmmsmart, scanregexp, profilescan, etc.
• SOAP web service hosted by EMBL-EBI
![Page 9: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/9.jpg)
Why wrapping the services?
• The original services use various data formats for inputs/outputs (although all are XML)
• These formats do not conform to the caBIG compatibility rules
• The output format was not even compatible with the input format
• The requirement for the wrapped service:
• Translate the input data from caBIG-compatible XML to the XML format understood by the analytical services
• Convert the received results back to a format understood by caBIG clients
![Page 10: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/10.jpg)
NCBI BLAST Output (Untranslated)
<?xml version="1.0"?>
<EBIApplicationResult xmlns="http://www.ebi.ac.uk/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.ebi.ac.uk/schema/ApplicationResult.xsd">
  <Header>...</Header>
  <SequenceSimilaritySearchResult>
    <hits total="1">
      <hit number="1" database="uniprot" id="WAP_RAT" ac="P01174" length="137" description="Whey acidic protein OS=Rattus norvegicus GN=Wap PE=1 SV=2">
        <alignments total="1">
          <alignment number="1">
            <score>763</score>
            <bits>298</bits>
            <expectation>8e-80</expectation>
            <identity>100</identity>
            <positives>100</positives>
            <querySeq start="1" end="137">MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ</querySeq>
            <pattern>MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ</pattern>
            <matchSeq start="1" end="137">MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ</matchSeq>
          </alignment>
        </alignments>
      </hit>
    </hits>
  </SequenceSimilaritySearchResult>
</EBIApplicationResult>
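As an illustration of the first step a wrapper has to perform on this output, the sketch below reads hits and alignments from an EBI-style result using the JDK's DOM parser. It is a minimal sketch, not the project's actual translation code: the class name EbiBlastHitReader, the summarize method and the shortened SAMPLE document are assumptions made for this example; only the element names, attribute names and namespace are taken from the output shown on this slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the Taverna-caGrid code base.
public class EbiBlastHitReader {
    // Namespace used by the untranslated EBI output.
    static final String EBI_NS = "http://www.ebi.ac.uk/schema";

    // Shortened copy of the slide's example (header and sequences omitted).
    static final String SAMPLE =
        "<EBIApplicationResult xmlns=\"http://www.ebi.ac.uk/schema\">"
        + "<SequenceSimilaritySearchResult><hits total=\"1\">"
        + "<hit number=\"1\" database=\"uniprot\" id=\"WAP_RAT\" ac=\"P01174\" length=\"137\">"
        + "<alignments total=\"1\"><alignment number=\"1\">"
        + "<score>763</score><bits>298</bits><expectation>8e-80</expectation>"
        + "</alignment></alignments></hit></hits>"
        + "</SequenceSimilaritySearchResult></EBIApplicationResult>";

    /** Returns one summary line per alignment found in the EBI result. */
    static List<String> summarize(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<>();
            NodeList hits = doc.getElementsByTagNameNS(EBI_NS, "hit");
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                NodeList alignments = hit.getElementsByTagNameNS(EBI_NS, "alignment");
                for (int j = 0; j < alignments.getLength(); j++) {
                    Element alignment = (Element) alignments.item(j);
                    lines.add(hit.getAttribute("database") + ":" + hit.getAttribute("id")
                            + " score=" + childText(alignment, "score")
                            + " e=" + childText(alignment, "expectation"));
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse EBI result", e);
        }
    }

    /** Text of the first descendant element with the given local name, or "". */
    static String childText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS(EBI_NS, localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        for (String line : summarize(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

A real wrapper would map these values onto the caBIG domain objects rather than onto strings; the string form is used here only to keep the sketch self-contained.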
![Page 11: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/11.jpg)
InterProScan Output (Untranslated)
<EBIInterProScanResults xmlns="http://www.ebi.ac.uk/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.ebi.ac.uk/schema/InterProScanResult.xsd">
  <Header>...</Header>
  <interpro_matches>
    <protein id="uniprot|P01174|WAP_RAT" length="137" crc64="1C2E8ADA9FD97949" >
      <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core" type="Domain" parent_id="IPR015874">
        <child_list><rel_ref ipr_ref="IPR008198"/></child_list>
        <contains><rel_ref ipr_ref="IPR002098"/></contains>
        <classification id="GO:0030414" class_type="GO">
          <category>Molecular Function</category>
          <description>protease inhibitor activity</description>
        </classification>
        <match id="G3DSA:4.10.75.10" name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
          <location start="77" end="128" score="9.899996308397199E-5" status="T" evidence="Gene3D" />
        </match>
        <match id="PF00095" name="WAP" dbname="PFAM">
          <location start="30" end="72" score="6.30000254573025E-5" status="T" evidence="HMMPfam" />
          <location start="79" end="126" score="1.59999889349247E-14" status="T" evidence="HMMPfam" />
        </match>
      </interpro>
      <interpro id="IPR008198" name="Proteinase inhibitor I17" type="Domain" parent_id="IPR008197"> ...</interpro>
    </protein>
  </interpro_matches>
</EBIInterProScanResults>
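The match locations in this output are the data a wrapper would need to carry over into a domain-location class. The following sketch pulls them out with the JDK's XPath API, using local-name() so that no namespace context has to be registered. The class name InterProMatchReader, the locations method and the trimmed SAMPLE document are hypothetical and exist only for this example; the element and attribute names come from the output above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the Taverna-caGrid code base.
public class InterProMatchReader {
    // Trimmed copy of the slide's example (header, classification and scores omitted).
    static final String SAMPLE =
        "<EBIInterProScanResults xmlns=\"http://www.ebi.ac.uk/schema\"><interpro_matches>"
        + "<protein id=\"uniprot|P01174|WAP_RAT\" length=\"137\">"
        + "<interpro id=\"IPR008197\" type=\"Domain\">"
        + "<match id=\"G3DSA:4.10.75.10\" dbname=\"GENE3D\">"
        + "<location start=\"77\" end=\"128\"/></match>"
        + "<match id=\"PF00095\" name=\"WAP\" dbname=\"PFAM\">"
        + "<location start=\"30\" end=\"72\"/><location start=\"79\" end=\"126\"/></match>"
        + "</interpro></protein></interpro_matches></EBIInterProScanResults>";

    /** Returns "DBNAME matchId start-end" for every location of every match. */
    static List<String> locations(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() matches elements regardless of their namespace.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*[local-name()='match']/*[local-name()='location']",
                    doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element location = (Element) nodes.item(i);
                Element match = (Element) location.getParentNode();
                result.add(match.getAttribute("dbname") + " " + match.getAttribute("id")
                        + " " + location.getAttribute("start")
                        + "-" + location.getAttribute("end"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read InterProScan result", e);
        }
    }

    public static void main(String[] args) {
        for (String line : locations(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```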
![Page 12: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/12.jpg)
Motivational workflow
This Taverna workflow uses both BLAST and InterProScan, which can be replaced with the wrapped versions of the services
Nested workflow that internally invokes InterProScan and checks job status before fetching results
Nested workflow that internally invokes NCBI BLAST and checks job status before fetching results
Web Service that looks up protein sequences in a database. Will be replaced with the caBIG service caBIO.
Shim that splits a string into a list of Fasta strings
http://www.myexperiment.org/workflows/230
![Page 13: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/13.jpg)
Benefits of wrapped services
• Making analytical services from other service providers available to caBIG users
• Wrapped services are caBIG Silver Level compatible:
• Ensures shared meaning and interoperability between these and other caBIG services
• Data can be exchanged and understood between services
![Page 14: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/14.jpg)
How we wrapped the services (1)
• Making the services ‘silver’ encompassed:
1. Modelled data in UML using Enterprise Architect (EA)
2. Exported model to XMI from EA
3. Using the SIW tool, the XMI file has been semantically annotated using caBIG’s vocabularies/ontologies
4. Common Data Elements (CDEs) have been generated for the services' inputs/outputs, reviewed by the curation team and loaded into the caDSR production database
5. Annotated XMI loaded back to the EA to update UML
![Page 15: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/15.jpg)
How we wrapped the services (2)
6. From the EA, the UML model was exported to a set of xsd files
7. The xsd files have been imported into the Introduce tool, which was used to generate the skeleton APIs of the wrapped services
8. Axis 2 was used to invoke the original InterProScan and NCBI BLAST services from the wrapper services
9. The wrapped services are asynchronous; job status and results are available as WSRF resource properties and can be subscribed to using WS-Notifications. There is also a synchronous version where polling is done from the client side.
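The synchronous variant mentioned in step 9 amounts to a client-side polling loop with a timeout. The sketch below shows that pattern in plain Java. It does not use the project's client library: the class SyncPollingSketch, the Status enum and the pollUntilFinished method are hypothetical names, and the status check is passed in as a Supplier so the example runs without any grid service.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical illustration of client-side polling; not the project's client code.
public class SyncPollingSketch {
    // Assumed job states; the real service may expose different ones.
    enum Status { RUNNING, DONE, ERROR }

    /**
     * Calls checkStatus repeatedly until the job is no longer RUNNING.
     * Throws IllegalStateException if the next wait would pass the deadline.
     */
    static Status pollUntilFinished(Supplier<Status> checkStatus,
                                    long timeoutMillis, long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Status status = checkStatus.get();
            if (status != Status.RUNNING) {
                return status; // DONE or ERROR: stop polling
            }
            if (System.currentTimeMillis() + intervalMillis > deadline) {
                throw new IllegalStateException(
                        "Job still running after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Scripted status sequence standing in for calls to a remote service.
        Iterator<Status> scripted =
                Arrays.asList(Status.RUNNING, Status.RUNNING, Status.DONE).iterator();
        System.out.println(pollUntilFinished(scripted::next, 1000, 10));
    }
}
```

Polling keeps the client simple and avoids the need for a notification consumer on the client side, at the cost of repeated status calls while the job runs.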
![Page 16: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/16.jpg)
How it works
• Client: using client library, calls wrapped WSRF web service
• Service: convert input to original format, submit converted input to original service, return a Job Resource that references the jobID
• Client: Subscribe to notifications from job resource
• Job Monitor (server): For all jobs, check status using jobID, notify client on completion
• Client library: Request output data
• Job Resource: Convert data from original format, return converted data to client
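The interaction between job resource, job monitor and subscribing client can be sketched in memory as follows. This is a simplified model under stated assumptions, not the WSRF/WS-Notification implementation: JobFlowSketch, JobResource, JobMonitor and their methods are hypothetical names, the status check against the original service is replaced by a Predicate, and notifications are plain callbacks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory model of the flow; no grid or SOAP calls are made.
public class JobFlowSketch {
    /** Stand-in for the job resource that references the original service's jobID. */
    static class JobResource {
        final String jobId;
        boolean done;
        final List<Consumer<JobResource>> subscribers = new ArrayList<>();

        JobResource(String jobId) {
            this.jobId = jobId;
        }

        /** Client side: register interest in this job's completion. */
        void subscribe(Consumer<JobResource> listener) {
            subscribers.add(listener);
        }
    }

    /** Server side: checks all unfinished jobs and notifies subscribers once. */
    static class JobMonitor {
        final Map<String, JobResource> jobs = new LinkedHashMap<>();
        final Predicate<String> isFinished; // stands in for the remote status check

        JobMonitor(Predicate<String> isFinished) {
            this.isFinished = isFinished;
        }

        JobResource register(String jobId) {
            JobResource job = new JobResource(jobId);
            jobs.put(jobId, job);
            return job;
        }

        void checkAll() {
            for (JobResource job : jobs.values()) {
                if (!job.done && isFinished.test(job.jobId)) {
                    job.done = true; // mark first so each job notifies only once
                    for (Consumer<JobResource> subscriber : job.subscribers) {
                        subscriber.accept(job);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Set<String> finished = new HashSet<>();
        JobMonitor monitor = new JobMonitor(finished::contains);
        JobResource job = monitor.register("job-1");
        job.subscribe(j -> System.out.println("Job finished: " + j.jobId));
        monitor.checkAll();      // job still running: nothing is printed
        finished.add("job-1");   // simulate completion at the original service
        monitor.checkAll();      // subscriber is notified
    }
}
```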
![Page 17: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/17.jpg)
Architecture of wrapped services
![Page 18: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/18.jpg)
UML model of wrapped NCBI BLAST
![Page 19: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/19.jpg)
UML model of wrapped InterProScan
![Page 20: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/20.jpg)
Reused several data elements
• Green classes in diagram reused from IRWG
• Sequence, NucleicAcidSequence
• DatabaseCrossReference
• GeneGenomicIdentifier et al.
• Red UML classes in diagram reused from PIR
• ProteinSequence
• Partial reuse of attributes in ProteinDomainLocation
![Page 21: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/21.jpg)
Example client NCBI Blast
NCBIBlastClient client = new NCBIBlastClient(url);
NCBIBlastInput input = new NCBIBlastInput();
ProteinSequenceRepresentation sequenceRepresentation = new ProteinSequenceRepresentation();
ProteinGenomicIdentifier proteinId = new ProteinGenomicIdentifier();
proteinId.setDataSourceName("uniprot");
proteinId.setCrossReferenceId("wap_rat");
sequenceRepresentation.setProteinId(proteinId);
input.setSequenceRepresentation(sequenceRepresentation);
NCBIBlastInputParameters params = new NCBIBlastInputParameters();
params.setEmail("[email protected]");
params.setQueryDatabase(new MolecularSequenceDatabase("", "uniprot"));
params.setBlastProgram(BLASTProgram.BLASTP);
input.setNcbiBLASTInputParameters(params);
NCBIBlastClientUtils clientUtils = new NCBIBlastClientUtils(client);
NCBIBlastOutput ncbiBlastOut = clientUtils.ncbiBlastSync(input, TIMEOUT_SECONDS * 1000);
SequenceSimilarity[] similarities = ncbiBlastOut.getSequenceSimilarities();
for (SequenceSimilarity similarity : similarities) {
    for (Alignment align : similarity.getAlignments()) {
        SequenceFragment querySequenceFragment = align.getQuerySequenceFragment();
        System.out.print("Q: " + querySequenceFragment.getSequence().getValue());
        (..)
![Page 22: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/22.jpg)
Example SOAP input NCBI Blast
<service:NcbiBlastRequest
    xmlns:service="http://www.mygrid.org.uk/2009/cagrid/servicewrapper/service/NCBIBlast"
    xmlns="gme://Taverna-caGrid.caBIG/1.0/uk.org.mygrid.cagrid.domain.ncbiblast"
    xmlns:irwg="http://www.mygrid.org.uk/2009/cagrid/servicewrapper/imported/IRWG"
    xmlns:common="gme://Taverna-caGrid.caBIG/1.0/uk.org.mygrid.cagrid.domain.common"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <service:nCBIBlastInput>
    <NCBIBlastInput>
      <ncbiBLASTInputParameters>
        <blastProgram>BLASTP</blastProgram>
        <email>[email protected]</email>
        <queryDatabase>
          <common:name>uniprot</common:name>
          <common:description />
        </queryDatabase>
      </ncbiBLASTInputParameters>
      <sequenceRepresentation xsi:type="irwg:ProteinSequenceRepresentation">
        <irwg:proteinId>
          <irwg:crossReferenceId>wap_rat</irwg:crossReferenceId>
          <irwg:dataSourceName>uniprot</irwg:dataSourceName>
        </irwg:proteinId>
      </sequenceRepresentation>
    </NCBIBlastInput>
  </service:nCBIBlastInput>
</service:NcbiBlastRequest>
![Page 23: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/23.jpg)
Example client output NCBI Blast
Running NCBI Blast client
uk.org.mygrid.cagrid.servicewrapper.service.ncbiblast.example.ExampleNCBIBlastClient -url <service url>
-- Using default service at http://cagrid.taverna.org.uk:8080/wsrf/services/cagrid/NCBIBlast
Calling NCBI Blast synchronously (Set -DGLOBUS_LOCATION=/Users/bob/cagrid/ws-core-4.0.3 to do asynchronous client calls)
Found 50 similarities
Similarity in uniprot:WAP_RAT (sequence length:137)
  1 alignments
  Alignment score=763.0 bits=298.0 eValue=1.0E-79
    Q: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIA AGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ 1-137
    P: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIA AGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ
    M: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIA AGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ 1-137
Similarity in uniprot:Q3UQ94_MOUSE (sequence length:140)
  1 alignments
  Alignment score=465.0 bits=183.0 eValue=4.0E-45
    Q: MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIA A-GPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVI 1-134
    P: MRC ISLVLGLLALEVALA+NL+E VFNSVQSM S E TECI CQTNEECAQN MCCP SCGR+ KTPVNI V KAG CPWN +QMI+ + GPCP CS D +CSG MKCC C+M+C P P+ ++I
    M: MRCLISLVLGLLALEVALAQNLEEQVFNSVQSMFPKASPIEGTECIICQTNEECAQNAMCCPGSCGRTRKTPVNIGVPKAGFCPWNLLQMIS STGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPPVPEVWSII 1-134
![Page 24: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/24.jpg)
Example SOAP output NCBI Blast
<NCBIBlastOutput xmlns:xsd="http://www.w3.org/2001/XMLSchema" ...>
  <sequenceSimilarities>
    <accessionNumber>P01174</accessionNumber>
    <description>Whey acidic protein OS=Rattus norvegicus GN=Wap PE=1 SV=2</description>
    <sequenceLength>137</sequenceLength>
    <sequenceId>
      <irwg:crossReferenceId>WAP_RAT</irwg:crossReferenceId>
      <irwg:dataSourceName>uniprot</irwg:dataSourceName>
    </sequenceId>
    <alignments>
      <bits>298.0</bits>
      <eValue>1.0E-79</eValue>
      <identity>100</identity>
      <positives>100</positives>
      <score>763.0</score>
      <sequenceSimilarityPattern>MRCSISLVLGLLALEVAL..ISFQ</sequenceSimilarityPattern>
      <matchSequenceFragment>
        <end>137</end>
        <start>1</start>
        <sequence>
          <irwg:value>MRCSISLVLGLLALEVAL..ISFQ</irwg:value>
          <irwg:valueInFastaFormat xsi:nil="true" />
        </sequence>
      </matchSequenceFragment>
      <querySequenceFragment>
        <end>137</end>
        <start>1</start>
        <sequence>
          <irwg:value>MRCSISLVLGLLALEVAL..ISFQ</irwg:value>
          <irwg:valueInFastaFormat xsi:nil="true" />
        </sequence>
      </querySequenceFragment>
    </alignments>
  </sequenceSimilarities>
![Page 25: Wrapping third-party analytical services for caBIG](https://reader036.fdocuments.net/reader036/viewer/2022062517/568133d2550346895d9ac8cd/html5/thumbnails/25.jpg)
Project info
• On gForge: https://gforge.nci.nih.gov/projects/taverna-cagrid/
• On myGrid wiki: http://www.mygrid.org.uk/dev/wiki/display/caGrid/Home
• Source and documentation available via Subversion: https://gforge.nci.nih.gov/svnroot/taverna-cagrid/