Taverna workflows in caGrid
caGrid Architecture Face-to-face meeting
Stian Soiland-Reyes & Aleksandra Nenadic, myGrid, University of Manchester, UK
Boston, 2009-05-11
http://www.mygrid.org.uk/dev/wiki/display/caGrid
Agenda
– What is a Taverna workflow?
– Abstract caGrid workflow example
– Actual Taverna workflow
– caGrid plugin for Taverna
– Current work
– Where do we go next?
What is a Taverna workflow?
• A set of services (web services, RESTful services, local scripts, other workflows, etc.)
• A set of data links between services: “put output X from service A as input Y to service B”
– If needed: list handling, control links
• This is called a data-oriented workflow (dataflow)
– Say where you want the data to flow instead of what you want to do
– Compare with more procedural workflow languages such as BPEL
• A beneficial way of thinking for much data-driven scientific research
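The dataflow idea can be illustrated with a minimal Python sketch. It is not Taverna's engine or API: processors are plain functions returning named outputs, a data link is a hypothetical (source, output port, sink, input port) tuple, and a processor runs as soon as every linked input has a value. Nothing in the code states an execution order; it follows from the links alone.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical model: a processor takes named inputs and returns named outputs.
Processor = Callable[..., Dict[str, Any]]
# A data link: (source processor, output port, sink processor, input port).
Link = Tuple[str, str, str, str]


def run_dataflow(processors: Dict[str, Processor],
                 links: List[Link],
                 workflow_inputs: Dict[Tuple[str, str], Any]) -> Dict[str, Dict[str, Any]]:
    """Run each processor as soon as all of its linked inputs have values."""
    # Values waiting on each processor's input ports, seeded with workflow inputs.
    pending: Dict[str, Dict[str, Any]] = {name: {} for name in processors}
    for (proc, port), value in workflow_inputs.items():
        pending[proc][port] = value

    # Ports each processor needs: the seeded ones plus every port fed by a link.
    needed = {name: set(pending[name]) for name in processors}
    for _, _, sink, in_port in links:
        needed[sink].add(in_port)

    results: Dict[str, Dict[str, Any]] = {}
    while len(results) < len(processors):
        ready = [n for n in processors
                 if n not in results and needed[n] <= set(pending[n])]
        if not ready:
            raise ValueError("cycle or missing input in dataflow")
        for name in ready:
            outputs = processors[name](**pending[name])
            results[name] = outputs
            # Push each output along its data links to downstream input ports.
            for src, out_port, sink, in_port in links:
                if src == name:
                    pending[sink][in_port] = outputs[out_port]
    return results


# Toy processors: "put output X from service A as input Y to services B and C".
procs = {
    "A": lambda text: {"X": text.upper()},
    "B": lambda Y: {"length": len(Y)},
    "C": lambda Y: {"reversed": Y[::-1]},
}
links = [("A", "X", "B", "Y"), ("A", "X", "C", "Y")]
out = run_dataflow(procs, links, {("A", "text"): "protein"})
print(out["B"]["length"], out["C"]["reversed"])  # 7 NIETORP
```

Here B and C depend only on A, so they become ready in the same round; a real engine could run them in parallel, which is part of what makes the dataflow style attractive.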
Abstract caGrid workflow
• Query the CPAS data service to find a protein sequence
• Use (parts of) the result to query the GridPIR and caBIO data services for matching sequences
Actual Taverna workflow
• Looks very similar to the abstract workflow
• Introduces shim services to build and parse data elements
Orange: Local scripts to parse the description string and build CQL queries
Purple: Build/parse complex types for web service input/output
Blue: Constant CQL query
Green: caGrid WSDL services
http://www.myexperiment.org/workflows/752
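The orange shim scripts are only described, not shown. The Python sketch below guesses at their general shape under loudly stated assumptions: the description string is assumed to start with a UniProt-style `db|accession|name` header, and the XML follows the caGrid CQL 1.0 pattern of a `CQLQuery` root holding a `Target` with an `Attribute` restriction. The namespace, element names, the `EQUAL_TO` predicate, and the target class and attribute in the demo should all be checked against the CQL schema and domain model actually in use; `parse_accession` and `build_cql_query` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Namespace modeled on caGrid CQL 1.0 (assumption: verify against the schema in use).
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"
# Serialize the CQL namespace as the default (unprefixed) namespace.
ET.register_namespace("", CQL_NS)


def parse_accession(description: str) -> str:
    """Extract the accession from a 'db|accession|name ...' header (assumed format)."""
    header = description.split()[0]
    parts = header.split("|")
    if len(parts) < 3:
        raise ValueError(f"unexpected description format: {description!r}")
    return parts[1]


def build_cql_query(target_class: str, attribute: str, value: str,
                    predicate: str = "EQUAL_TO") -> str:
    """Build a single-attribute CQL query as an XML string."""
    query = ET.Element(f"{{{CQL_NS}}}CQLQuery")
    target = ET.SubElement(query, f"{{{CQL_NS}}}Target", name=target_class)
    ET.SubElement(target, f"{{{CQL_NS}}}Attribute",
                  name=attribute, predicate=predicate, value=value)
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    acc = parse_accession("sp|P69905|HBA_HUMAN Hemoglobin subunit alpha")
    # Illustrative target class and attribute, not checked against the caBIO model.
    print(build_cql_query("gov.nih.nci.cabio.domain.Protein", "primaryAccession", acc))
```

Keeping the query constant apart from the parsing step mirrors the slide's split between orange (scripted) and blue (constant) CQL shims.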
caGrid plugin for Taverna (1)
• Discover/browse services registered in the caGrid Index Service
• Easy to install into Taverna:
• Listing all services:
caGrid plugin for Taverna (2)
• …or by semantic search:
Current work by myGrid & caGrid
• Develop Taverna support for GAARDS-secured caGrid services
• Wrap existing third-party services (used by existing Taverna users) for caGrid and annotate them to match the Silver-level compatibility guidelines
• Taverna workflow as a caGrid service
• Service discovery improvements
• Documentation, building example workflows
Real example: Lymphoma type prediction
• Scientific value
– Use gene-expression patterns associated with DLBCL (diffuse large B-cell lymphoma) and FL (follicular lymphoma) to predict the lymphoma type of an unknown sample
– Use an SVM (support vector machine) to classify the data and predict the tumor types of unknown samples
• Main steps
– Query training data from experiments stored in caArray
– Preprocess (normalize) the microarray data
– Feed the training and test data into an SVM service to get classification results
*Fig. from M. A. Shipp et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 2002
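The real workflow delegates classification to a remote SVM service. To show roughly what the preprocess and classify steps compute, here is a stdlib-only Python sketch with several assumptions: "normalize" is taken to mean per-gene z-scoring with training-set statistics, the SVM is linear and trained by plain subgradient descent on the regularized hinge loss (a simple stand-in, not the service's algorithm), labels are +1 for DLBCL and -1 for FL, and the two-gene data are made-up toy values, not caArray data.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]


def fit_zscore(train: Matrix) -> List[Tuple[float, float]]:
    """Per-gene (column) mean and standard deviation from the training samples."""
    stats = []
    for g in range(len(train[0])):
        column = [sample[g] for sample in train]
        stats.append((mean(column), pstdev(column) or 1.0))  # avoid divide-by-zero
    return stats


def apply_zscore(samples: Matrix, stats: List[Tuple[float, float]]) -> Matrix:
    """Scale samples with the training statistics so test data matches training data."""
    return [[(x - mu) / sd for x, (mu, sd) in zip(s, stats)] for s in samples]


def train_linear_svm(X: Matrix, y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Minimize lam/2*|w|^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss active: this sample pulls the boundary
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, X: Matrix) -> List[int]:
    """Sign of the decision function: +1 (DLBCL) or -1 (FL)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


# Made-up two-gene toy data (not real microarray values): +1 = DLBCL, -1 = FL.
train_raw = [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5], [1.0, 4.0], [1.5, 5.0], [0.5, 4.5]]
labels = [1, 1, 1, -1, -1, -1]
test_raw = [[5.2, 1.1], [1.2, 4.4]]

stats = fit_zscore(train_raw)
w, b = train_linear_svm(apply_zscore(train_raw, stats), labels)
print(predict(w, b, apply_zscore(test_raw, stats)))  # expected: [1, -1]
```

Scaling the test samples with the training statistics matters: normalizing them on their own would shift them by their own mean and could move them across the decision boundary.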
Lymphoma type prediction workflow
Workflow stages: Query, Preprocess, Classify & predict
Wei Tan, http://www.myexperiment.org/workflows/746
Lymphoma type prediction results
The (few) classification errors are highlighted
Acknowledgements:
Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI), Jared Nedzel (MIT), Wei Tan
Where do we go next?
• Just some ideas…
– Tighter integration with caDSR
– Partial rerun of workflows
– Improve Taverna’s support for complex XML types
– Workflow sharing
– Workflows in caGrid portal
– Guided workflow building using caGrid metadata
– Easily build CQL queries from Taverna
• Google Summer of Code 2009
Any questions?