Taverna 2 in Pictures
Tom Oinn, [email protected], BOSC2007, 19th July
Reference Scheme Plugins (extension point):
• LSID
• URL
• File
• …?
DataManager instance
Has unique namespace within peer group
Locational Context Configuration:
• LSID – no context required?
• URL – local network name, subnet mask
• File – file system name and mount point
• …? – whatever you need here
Access through DataManager interface locally, DataPeer remotely
Data Document
• Identifier with namespace
• Zero or more reference scheme instances (LSID reference, File reference, …) pointing to identical immutable data
List (depth)
• Identifier with namespace
• Depth, list of child IDs
Error
• Identifier with namespace
• Depth, detail
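The three entity kinds share an identifier-with-namespace plus a notion of depth. A minimal sketch of that model (all class, field and method names here are invented for illustration; they are not the actual t2core API):

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three T2 data entities; names are
// illustrative, not the actual t2core types.
interface DataEntity {
    String getNamespace();   // namespace of the owning DataManager
    String getLocalId();     // unique within that namespace
    int getDepth();          // 0 = single item, 1 = list, 2 = list of lists…
}

// A data document: depth 0, plus zero or more reference scheme
// instances all pointing at the same immutable data.
record DataDocument(String namespace, String localId,
                    Set<String> referenceSchemeUris) implements DataEntity {
    public String getNamespace() { return namespace; }
    public String getLocalId() { return localId; }
    public int getDepth() { return 0; }
}

// A list: identifier plus depth and ordered child identifiers.
record DataList(String namespace, String localId, int depth,
                List<String> childIds) implements DataEntity {
    public String getNamespace() { return namespace; }
    public String getLocalId() { return localId; }
    public int getDepth() { return depth; }
}

// An error document: identifier, depth and human-readable detail.
record ErrorDocument(String namespace, String localId, int depth,
                     String detail) implements DataEntity {
    public String getNamespace() { return namespace; }
    public String getLocalId() { return localId; }
    public int getDepth() { return depth; }
}
```

Because documents, lists and errors all carry an identifier and a depth, the framework can route any of them through the same collection machinery.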
Example nested list structure (identifier[index path]):
List1[] contains List2[0] and List3[1]; List2[0] contains Leaf1[0,0] and Leaf2[0,1]; List3[1] contains Leaf3[1,0]
Appears on data link as:
Leaf3[1,0] List3[1] Leaf2[0,1] Leaf1[0,0] List2[0] List1[]
• Downstream process filters on the event depth it needs:
• If the minimum depth is too high it iterates, discarding all but the finest-grained events
• If the maximum depth is too low it wraps in a new single-element collection, discarding all but the root event
• Identifiers in the boxes are those from the previous slide
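The iterate-or-wrap rule can be sketched as follows; the tree type and method names are invented for illustration and do not reflect the real t2core event model:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the depth-coercion rule: a port declares the depth it
// wants; incoming data deeper than that is iterated over, incoming
// data shallower than that is wrapped in a single-element collection.
final class DepthCoercion {
    // Minimal data tree: a leaf (depth 0) or a list of subtrees.
    interface Node {}
    record Leaf(String id) implements Node {}
    record Branch(List<Node> children) implements Node {}

    static int depthOf(Node n) {
        if (n instanceof Leaf) return 0;
        Branch b = (Branch) n;
        // An empty list is treated as depth 1 for this sketch.
        return b.children().isEmpty() ? 1 : 1 + depthOf(b.children().get(0));
    }

    // Coerce 'data' to exactly 'wantedDepth', returning the events the
    // downstream process would actually consume.
    static List<Node> coerce(Node data, int wantedDepth) {
        int d = depthOf(data);
        if (d == wantedDepth) return List.of(data);
        if (d > wantedDepth) {
            // Too deep: iterate, recursing into each child and keeping
            // only the finer-grained events.
            List<Node> out = new ArrayList<>();
            for (Node child : ((Branch) data).children())
                out.addAll(coerce(child, wantedDepth));
            return out;
        }
        // Too shallow: wrap in a new single-element collection.
        return coerce(new Branch(List.of(data)), wantedDepth);
    }
}
```

Feeding a depth-1 list to a depth-0 port yields one event per leaf; feeding a leaf to a depth-1 port yields one single-element list.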
Processors (or, more accurately, service proxies) can now emit results piece by piece
Sensor proxy that can emit a temperature reading / cell count / image every ten seconds
Database query that returns rows one at a time from the data server
Management of collection events is handled by the framework
DispatchLayer implementation:
• Job specification messages from layer above (job queue & service list, or single job & service list)
• Data and error messages from layer below (result message, fault message)
Taverna 2 opens up the per-processor dispatch logic. Dispatch layers can ignore, pass unmodified, block, modify or act on any message and can communicate with adjacent layers. Each processor contains a single stack of arbitrarily many dispatch layers.
Dispatch layer composition allows for complex control flow within a given processor.
DispatchLayer is an extensibility point. Use it to implement dynamic binding, caching, recursive behaviour…?
Parallelize
• Ensures that at least ‘n’ jobs are pulled from the queue and sent to the layer below
• Reacts to faults and results by pulling more jobs off the queue and sending them down, passing the fault or result message back up to the stack manager
Failover
• Responds to job events from above by storing the job, removing all but one service from the service list and passing the job down
• Responds to faults by fetching the corresponding job, rewriting the original service set to include only the next service and resending the job down; if no more services are available, propagates the fault upwards
• Responds to results by discarding any failover state for that job
Retry
• Responds to jobs by storing the job along with an initial retry count of zero
• Responds to faults by checking the retry count, and either incrementing it and resending the job or propagating the fault message if the count is exceeded
Invoke
• Responds to jobs by invoking the first concrete service in the service list with the specified input data
• Sends fault and result messages to the layer above
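A toy sketch of the retry behaviour in such a stack, ignoring queues and service lists for brevity (the interface and message shapes here are assumptions; the real t2core DispatchLayer API differs):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal dispatch-layer contract; names are
// illustrative, not the real t2core API.
interface Layer {
    void receiveJob(String jobId, Layer below);      // from layer above
    void receiveFault(String jobId);                 // from layer below
    void receiveResult(String jobId, String result); // from layer below
}

// Retry: remembers a per-job retry count, resends on fault until the
// configured maximum is exceeded, then propagates the fault upwards.
final class RetryLayer implements Layer {
    private final int maxRetries;
    private final Layer above;
    private Layer below;
    private final Map<String, Integer> retries = new HashMap<>();

    RetryLayer(int maxRetries, Layer above) {
        this.maxRetries = maxRetries;
        this.above = above;
    }

    public void receiveJob(String jobId, Layer below) {
        this.below = below;
        retries.put(jobId, 0);          // initial retry count of zero
        below.receiveJob(jobId, null);  // pass the job down
    }

    public void receiveFault(String jobId) {
        int count = retries.get(jobId);
        if (count < maxRetries) {
            retries.put(jobId, count + 1);
            below.receiveJob(jobId, null);  // resend the job
        } else {
            above.receiveFault(jobId);      // count exceeded: propagate
        }
    }

    public void receiveResult(String jobId, String result) {
        retries.remove(jobId);              // discard retry state
        above.receiveResult(jobId, result);
    }
}
```

Because every layer speaks the same job/fault/result protocol, layers compose freely: a parallelize, failover or invoke layer would implement the same interface and be stacked above or below this one.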
This dispatch stack configuration replicates the current Taverna 1 processor logic, in that retry is within failover and both are within the parallelize layer. Layers can occur multiple times; you could easily have retry both above and below the failover layer, for example.
‘Service’ in this case means ‘Taverna 2 proxy to something we can invoke’ – the name might change!
Service invocation is asynchronous by default – all AsynchronousService implementations should return control immediately and, ideally, use thread pooling amongst instances of that type.
Results and failure messages are pushed to an AsynchronousServiceCallback object, which also provides the necessary context to the invocation:
DataManager
• Resolve input data references
• Register result data to get an identifier to return
SecurityManager
• Provides a set of security agents available to manage authentication against protected resources
Provenance Connector
• Allows explicit push of actor state P-assertions to a connected provenance store for invocation specific metadata capture
Message Push
• Used to push fault and result messages back to the invocation layer of the dispatch stack
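The asynchronous contract might look roughly like this; the interface shapes and method names are guesses based on the description above, not the published t2core API:

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough sketch of the asynchronous invocation contract; names are
// assumptions, not the published t2core API.
interface AsynchronousServiceCallback {
    String resolve(String reference);        // DataManager: dereference input
    String register(String resultData);      // DataManager: get result identifier
    void pushResult(String resultReference); // message push to the Invoke layer
    void pushFault(String detail);
}

interface AsynchronousService {
    // Must return control immediately; real work happens on a pool.
    void invoke(Map<String, String> inputReferences,
                AsynchronousServiceCallback callback);
}

// Example service sharing one pool amongst all instances of its type.
final class EchoService implements AsynchronousService {
    private static final ExecutorService POOL =
            Executors.newFixedThreadPool(4, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // let the JVM exit when work is done
                return t;
            });

    public void invoke(Map<String, String> inputs,
                       AsynchronousServiceCallback cb) {
        POOL.submit(() -> {     // invoke() itself returns immediately
            try {
                String value = cb.resolve(inputs.get("in"));
                cb.pushResult(cb.register("echo:" + value));
            } catch (Exception e) {
                cb.pushFault(e.getMessage());
            }
        });
    }
}
```

The callback bundles everything an invocation needs (data resolution, result registration, message push), so the service proxy itself stays stateless and thread-pool friendly.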
Security Agent: a set of credentials, a policy engine and a policy, mediating between client and service.
In this scenario the agent is discovered based on the service; a message is passed to the agent to be signed, and that signed message is relayed. Credentials never leave the agent!
Taverna 2 combines data managers, workflow enactors and security agents into transient
collaborative virtual experiments within a peer group. These groups can be shared and
membership managed over time and can persist beyond a single workflow run.
Peer group (i.e. JXTA group) – Virtual Experiment Session: User 1 and User 2 each contribute a data manager (DM), a set of credentials, a policy engine and a policy; an enactor and the DMs within the group connect to external services and external data stores, i.e. SRB.
Define a workflow as nested boundaries of control. Each boundary pushes its identifier onto an ID stack on data entering it and pops it when exiting. When a new ID is created the controlling entity registers with a singleton monitor tree, attaching to the parent identified by the path defined by the previous value of the ID stack on that data.
Workflow definition: WF1 contains P1, P2 and P3; P3 invokes the nested workflow WF2, which contains Q1.
Monitor tree: WF1_1 contains P1, P2 and P3; under P3, WF2_1 contains Q1 and WF2_2 contains Q1 – iteration over the nested workflow here…
Each node defines a set of properties. If a property is mutable it can be used to steer the enactment. Properties could include parallelism setting, service binding criteria, current job queue length, queue consumption, number of failures in the last minute…
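A toy version of the ID-stack and monitor-tree registration described above, including a mutable property that a steering agent could rewrite (all class and method names are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy monitor tree keyed by ID-stack paths; names are illustrative,
// not the real t2core monitor API.
final class MonitorTree {
    // One node per boundary of control, holding its properties.
    // Mutable properties (e.g. a parallelism setting) can be
    // rewritten by a steering agent; immutable ones only observed.
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
    }

    private final Map<String, Node> nodes = new HashMap<>();

    // Register a node whose parent is identified by the path defined
    // by the current value of the ID stack (outermost boundary first,
    // so we push with addLast and pop with removeLast).
    Node register(Deque<String> idStack, String newId) {
        String parentPath = String.join("/", idStack);
        String path = parentPath.isEmpty() ? newId : parentPath + "/" + newId;
        Node n = new Node();
        nodes.put(path, n);
        return n;
    }

    Node lookup(String path) { return nodes.get(path); }
}
```

A steering agent would then poll or subscribe to a node's properties and write back to the mutable ones to adjust, say, the parallelism of a running processor.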
Due December 2007 in ‘visible to end user’ form.
This release will probably not include everything, especially steering agents and virtual experiment management.
Early tech preview real soon now [tm]
Complete code rewrite, current status is around 90% complete on enactor and data manager core.
Java code in CVS on sourceforge, project name is ‘taverna’, CVS module is ‘t2core’
Licensed under LGPL at present
Hands on session later if anyone’s interested?
Investigators
•Matthew Addis, Andy Brass, Alvaro Fernandes, Rob Gaizauskas, Carole Goble, Chris Greenhalgh, Luc Moreau, Norman Paton, Peter Rice, Alan Robinson, Robert Stevens, Paul Watson, Anil Wipat
Postgraduates
•Tracy Craddock, Keith Flanagan, Antoon Goderis, Alastair Hampshire, Duncan Hull, Martin Szomszor, Kaixuan Wang, Qiuwei Yu, Jun Zhao
Pioneers
•Hannah Tipney, May Tassabehji, Medical Genetics team at St Mary's Hospital, Manchester, UK; Simon Pearce, Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Doug Kell, Peter Li, Manchester Centre for Integrative Systems Biology, UoM, UK; Andy Brass, Paul Fisher, Bio-Health Informatics Group, UoM, UK; Simon Hubbard, Faculty of Life Sciences, UoM, UK
Funding and Industrial
•EPSRC
•Wellcome Trust
•OMII-UK
•Dennis Quan, Sean Martin, Michael Niemi (IBM), Mark Wilkinson (BioMOBY)
Core Research and Development
•Nedim Alpdemir, Pinar Alper, Khalid Belhajjame, Tim Carver, Rich Cawley, Justin Ferris, Matthew Gamble, Kevin Glover, Mark Greenwood, Ananth Krishna, Matt Lee, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Arijit Mukherjee, Tom Oinn, Stuart Owen, Juri Papay, Savas Parastatidis, Matthew Pocock, Stefan Rennick-Egglestone, Ian Roberts, Martin Senger, Nick Sharman, Stian Soiland, Victor Tan, Franck Tanoh, Daniele Turi, Alan R. Williams, David Withers, Katy Wolstencroft and Chris Wroe
Please see http://www.mygrid.org.uk/wiki/Mygrid/Acknowledgements for most up to date list
Additional T2 thanks to Matthew Pocock, Thomas Down & David DeRoure amongst others!