Post on 14-Jan-2016
Taverna Workbench
Stuart Owen, University of Manchester, UK
stuart.owen@manchester.ac.uk
What is a workflow
• Data workflows
– A task is invoked once its expected data has been received and, when complete, passes any resulting data downstream.
– B starts when it receives data from A.
– C and D run in parallel when they receive data from B.
– E starts once it has received data from both C and D.
• Control workflows
– A task is invoked once the tasks it depends on have completed.
– B starts when A has completed.
– C and D run in parallel once B has completed.
– E starts once both C and D have completed.
[Diagram: example workflow graph with tasks A, B, C, D, E and F]
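As a rough illustration of the data-workflow rule (a task fires once all of its expected data has arrived), the sketch below runs the A to E example in Python. The `run_dataflow` helper, the `label` task body and the dependency table are invented for this illustration and are not part of Taverna. The graph is assumed to be acyclic, and C and D are run one after the other here rather than in parallel.

```python
from collections import deque

# Hypothetical example graph: each task lists the upstream tasks it needs data from.
DEPENDS_ON = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C", "D"]}

def run_dataflow(depends_on, task_fn):
    """Invoke each task once data from all of its upstream tasks is available."""
    results = {}   # task name -> data it produced
    order = []     # order in which tasks were invoked
    pending = deque(depends_on)
    while pending:
        task = pending.popleft()
        inputs = depends_on[task]
        if all(up in results for up in inputs):
            # All expected data has been received: invoke the task.
            results[task] = task_fn(task, [results[up] for up in inputs])
            order.append(task)
        else:
            # Still waiting for upstream data: try again later.
            pending.append(task)
    return order, results

def label(task, inputs):
    # Toy task body: record which upstream outputs were consumed.
    return task + "(" + ",".join(inputs) + ")"

order, results = run_dataflow(DEPENDS_ON, label)
print(order)
print(results["E"])
```

A control workflow could reuse the same loop by ignoring the data and waiting only for completion of the upstream tasks.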
Advantages of workflows
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Advantages of workflows
• High-level abstraction
– Easier to understand and modify.
– Easier to describe and discuss with others.
– Describes what you want to do, not how to do it.
• Automation
• Systematic
• Sharing and re-use
– Either on its own, or within other workflows!
Workflows within Taverna
• Predominantly based around the flow of data, but does allow control constraints as well.
• Service-oriented workflows. Services may or may not be grid enabled.
• High-level GUI approach separated from lower-level coding: you don’t have to be a coder to build a workflow.
• Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.
Taverna 1.4 Workbench
• Integral part of the myGrid project
• Java based, runs on Windows, Mac OS, Linux, Solaris
• Open source and user driven development
• Taverna in OMII-UK
– Dedicated team of developers focused on design, implementation, testing and support, leading to production-quality software.
– Development of Taverna 2.0
Taverna 1.4 workbench
[Architecture diagram: Taverna Workbench; Scufl + Workflow Object Model; Freefluo workflow enactor; processor types WebService, Soaplab, LocalApp and BioMOBY, plus further processor types shown as “?”]
SCUFL (Simple Conceptual Unified Flow Language)
Workflow execution
• Application data flow layer: Scufl graph + service introspection
• Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Processor invocation layer
Nested workflows
• A processor can be a workflow itself.
• Encourages the reuse of workflows within a more complex scenario.
• Greater abstraction of an overall process, making it more manageable.
Iterations
• Scufl handles iterations implicitly.
• That is, Taverna handles it automagically; there's no need for the user to indicate that an iteration is required.
• Taverna recognises the data mismatch (a list supplied where a single item is expected) and repeatedly runs the task over each data element in the list.
• The iteration strategy for multiple inputs can be configured.
• “Cross product” – all against all
• “Dot product” – first against first, second against second, and so on
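The two iteration strategies can be sketched in Python as follows. This is only an illustration of the pairing rules, not Taverna's implementation: the function names, the `join` task and the sample lists are made up, and truncating to the shorter list in the dot product is an assumption that comes from Python's `zip`.

```python
from itertools import product

def cross_product(xs, ys):
    """All against all: every element of xs paired with every element of ys."""
    return list(product(xs, ys))

def dot_product(xs, ys):
    """First against first, second against second, and so on."""
    return list(zip(xs, ys))

def iterate(task, xs, ys, strategy):
    """Run a two-input task once per pair chosen by the iteration strategy."""
    return [task(x, y) for x, y in strategy(xs, ys)]

def join(gene, species):
    # Toy two-input task standing in for a service call.
    return gene + ":" + species

genes = ["g1", "g2"]
species = ["human", "mouse"]

cross = iterate(join, genes, species, cross_product)
dot = iterate(join, genes, species, dot_product)
print(cross)  # four invocations
print(dot)    # two invocations
```

With two lists of length n, the cross product invokes the task n × n times and the dot product n times, which is why the choice matters for long-running services.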
What about when a service fails?
• Most services are owned by other people
• No control over service failure
• Some are research level
• Workflows are only as good as the services they connect!
• To help, Taverna can:
– Notify failures
– Instigate retries
– Set criticality
– Substitute alternative services
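A minimal sketch of how retries, alternates and criticality might fit together is given below. It is not Taverna's fault-handling code: `invoke_with_fallback`, its parameters and the two toy services are hypothetical names chosen for the example.

```python
def invoke_with_fallback(services, data, retries=2, critical=True):
    """Try each service in turn, retrying failures before moving to an alternate.

    services: primary service first, then any alternates (plain callables).
    Returns (result, failures); failures is the list of notifications collected.
    """
    failures = []
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(data), failures
            except Exception as exc:
                # Notify: record which service failed, on which attempt, and why.
                failures.append((service.__name__, attempt + 1, str(exc)))
    if critical:
        # A critical step failing should stop the whole workflow.
        raise RuntimeError(f"all services failed: {failures}")
    return None, failures

def primary(data):
    raise ConnectionError("service unavailable")

def mirror(data):
    return data.upper()

result, failures = invoke_with_fallback([primary, mirror], "acgt", retries=1)
print(result)
print(failures)
```

Marking a step as non-critical (`critical=False`) lets the rest of the workflow continue with no result from that step.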
Provenance Data?
• Supports scientific method and best practice
• Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and the process by which the resource was generated.
• The Who?, What?, When?, Where? and Why? about resources.
• Stored as RDF triples
• Also available as OWL, opening it up to complex reasoning
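To make the triple idea concrete, here is a small Python sketch that stores provenance statements as (subject, predicate, object) tuples and answers a "who?" question. The `ex:` identifiers are placeholders rather than real LSIDs, and the direction assumed for each property is a guess made for the example; a real store would use an RDF library.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are placeholders, not real LSIDs.
triples = [
    ("ex:workflowRun1", "runs", "ex:workflow1"),
    ("ex:workflowRun1", "launchedBy", "ex:person1"),
    ("ex:person1", "belongsTo", "ex:org1"),
]

def objects(triples, subject, predicate):
    """Return every object linked to subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def who_launched(triples, run):
    """Answer 'who?' for a workflow run, together with their organisation."""
    return [(person, objects(triples, person, "belongsTo"))
            for person in objects(triples, run, "launchedBy")]

answer = who_launched(triples, "ex:workflowRun1")
print(answer)
```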
Provenance Record
[Diagram: provenance graph for a typed workflow run. Provenance ontology classes: Workflow, WorkflowRun, ProcessRun, Experimenter, Organization. Properties: runs, launchedBy, belongsTo, executed. Example instances: urn:lsid:..:wfInstance:8 (typed workflow run), urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51, together with one input and several results.]
Provenance Browser
New plans for Taverna 2.0
Evolving challenges
• Long running data intensive workflows
• Manipulation of confidential or otherwise protected information
• Use with classical grid systems
• Publishing and sharing of workflows
• Better use of provenance
Runtime Service Binding
• Service definition consists of an abstract description
• Resolved at workflow runtime to one or more concrete resources by a broker
• Allows load balancing or economic model based service selection over grid environments
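The binding step can be pictured with the following Python sketch, in which a broker resolves an abstract service name to the least-loaded concrete endpoint at run time. The `Broker` class, the service name, the example.org URLs and the load figures are all assumptions made for illustration; they do not describe the Taverna 2.0 design.

```python
class Broker:
    """Maps an abstract service description to concrete endpoints at run time."""

    def __init__(self):
        self._registry = {}   # abstract name -> list of (endpoint, current load)

    def register(self, abstract_name, endpoint, load):
        self._registry.setdefault(abstract_name, []).append((endpoint, load))

    def resolve(self, abstract_name):
        """Pick the least-loaded concrete endpoint for an abstract service."""
        candidates = self._registry.get(abstract_name, [])
        if not candidates:
            raise LookupError(f"no concrete service for {abstract_name!r}")
        return min(candidates, key=lambda c: c[1])[0]

broker = Broker()
broker.register("sequence-alignment", "http://node-a.example.org/align", load=0.8)
broker.register("sequence-alignment", "http://node-b.example.org/align", load=0.2)

# The workflow only names "sequence-alignment"; the endpoint is chosen when it runs.
chosen = broker.resolve("sequence-alignment")
print(chosen)
```

Swapping the `key` function for a cost estimate would give the economic-model selection mentioned above instead of load balancing.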
Processor Dispatch Stack
3rd party data transfers
• Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data
• Automatic de-reference when a reference type is linked to a value type within a workflow.
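One way to picture automatic de-referencing is the Python sketch below: a reference is passed along untouched to consumers that accept references, and fetched only when a consumer needs the value. The `Ref` class, the `link` helper and the `data://big` location are hypothetical and stand in for whatever reference types the engine supports.

```python
class Ref:
    """A pointer to data held by a provider (hypothetical reference type)."""

    def __init__(self, location, fetch):
        self.location = location
        self._fetch = fetch   # callable that retrieves the data for a location

    def dereference(self):
        return self._fetch(self.location)

def link(output, accepts_reference):
    """Pass a reference through untouched, or fetch its value if the consumer needs one."""
    if isinstance(output, Ref) and not accepts_reference:
        return output.dereference()
    return output

# Toy data provider: a dictionary standing in for remote storage.
store = {"data://big": "acgt"}
ref = Ref("data://big", store.__getitem__)

passed_on = link(ref, accepts_reference=True)    # data stays with the provider
fetched = link(ref, accepts_reference=False)     # value type: fetched on demand
print(passed_on is ref, fetched)
```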
Streaming Data
• Allow execution of downstream workflow stages on partially complete results from upstream.
[Diagram: Service 1, Service 2 and Service 3 in sequence]
• Non-streaming (Taverna 1): the entire iteration must complete at each stage.
• Streamed data: Service 2 starts operating on partial results from Service 1.
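The difference between the two modes can be shown with Python generators. This is a sketch of the idea only: `service1` and `service2` are toy stages, and the shared `log` list exists just to record the order in which the stages touch each item.

```python
log = []

def service1(items):
    for x in items:
        log.append(f"s1:{x}")
        yield x * 2

def service2(items):
    for x in items:
        log.append(f"s2:{x}")
        yield x + 1

# Non-streaming: stage 1 finishes the whole list before stage 2 starts.
log.clear()
batch = list(service2(list(service1([1, 2, 3]))))
batch_log = list(log)

# Streaming: stage 2 consumes each item as soon as stage 1 yields it.
log.clear()
streamed = list(service2(service1([1, 2, 3])))
stream_log = list(log)

print(batch_log)
print(stream_log)
```

Both runs produce the same results; only the interleaving changes, which is what lets downstream stages start early on long-running data.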
Conclusions
• Taverna and its source code are free to download.
– http://taverna.sourceforge.net
• Taverna is being adopted by a number of different disciplines outside its bio-science origins, including chemoinformatics, social science, astronomy.
• Open architecture and support for plugins to cope with an open world, allowing expansion into other areas
• User-driven development
– Taverna users mailing list
– Taverna hackers mailing list
• Production quality software within OMII-UK
Acknowledgements
• The myGrid group, past and present.
• OMII-UK
• All our users
• Carole Goble
• Katy Wolstencroft
• Daniele Turi
• Matthew Gamble
• Tom Oinn
• Paul Fisher