Taverna 2 in Pictures

Tom Oinn, [email protected], BOSC2007, 19th July


Reference Scheme Plugins (extension point): LSID, URL, File, …?

DataManager instance: has a unique namespace within the peer group, a Locational Context and a Configuration.

Locational context required by each reference scheme:

• LSID – no context required?
• URL – local network name, subnet mask
• File – file system name and mount point
• …? – whatever you need here

Access is through the DataManager interface locally and the DataPeer interface remotely.
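As a rough illustration, a reference scheme plugin might look something like the following Java sketch; the ReferenceScheme and LocationalContext interfaces and all method names here are assumptions for illustration, not the actual t2core API.

    import java.io.FileInputStream;
    import java.io.InputStream;

    /** Hypothetical extension point: a pluggable way of pointing at immutable data. */
    interface ReferenceScheme {
        /** Can this reference be dereferenced from the given locational context? */
        boolean isResolvableIn(LocationalContext context);
        /** Fetch the referenced bytes. */
        InputStream dereference(LocationalContext context) throws Exception;
    }

    /** Hypothetical description of where a DataManager lives: subnet, mounts, etc. */
    interface LocationalContext {
        String getProperty(String key); // e.g. "filesystem.name", "network.subnet"
    }

    /** Example plugin: a file reference, valid only where that file system is mounted. */
    class FileReferenceScheme implements ReferenceScheme {
        private final String fileSystemName;
        private final String path;

        FileReferenceScheme(String fileSystemName, String path) {
            this.fileSystemName = fileSystemName;
            this.path = path;
        }

        public boolean isResolvableIn(LocationalContext context) {
            // Only resolvable on peers that have the same file system mounted.
            return fileSystemName.equals(context.getProperty("filesystem.name"));
        }

        public InputStream dereference(LocationalContext context) throws Exception {
            String mountPoint = context.getProperty("filesystem.mount");
            return new FileInputStream(mountPoint + "/" + path);
        }
    }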

A DataManager stores three kinds of entity:

• Data Document – identifier with namespace; zero or more reference scheme instances (LSID reference, file reference, …) pointing to identical immutable data
• List (depth) – identifier with namespace; depth and a list of child IDs
• Error – identifier with namespace; depth and detail
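For concreteness, here is a sketch of how such a store might be driven; the DataManagerFacade and its method names are hypothetical, and FileReferenceScheme is the sketch from above. The three lists built here mirror the nested list example that follows.

    /** Hypothetical DataManager facade (not the real t2core interface). */
    interface DataManagerFacade {
        String registerDocument(ReferenceScheme... references); // returns a namespaced identifier
        String registerList(int depth, String... childIds);
        String registerError(int depth, String detail);
    }

    class DataManagerExample {
        static void example(DataManagerFacade dm) {
            String leaf1 = dm.registerDocument(new FileReferenceScheme("homes", "leaf1.txt"));
            String leaf2 = dm.registerDocument(new FileReferenceScheme("homes", "leaf2.txt"));
            String leaf3 = dm.registerDocument(new FileReferenceScheme("homes", "leaf3.txt"));

            // Lists record their depth and the identifiers of their children.
            String list2 = dm.registerList(1, leaf1, leaf2); // depth 1
            String list3 = dm.registerList(1, leaf3);        // depth 1
            String list1 = dm.registerList(2, list2, list3); // depth 2

            // Errors are first class, with a depth and a detail string.
            String error = dm.registerError(0, "upstream service failed");
        }
    }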

Example nested list structure: List1 contains List2 and List3; List2 contains Leaf1 and Leaf2; List3 contains Leaf3.

Appears on the data link as: Leaf3[1,0] List3[1] Leaf2[0,1] Leaf1[0,0] List2[0] List1[]

• The downstream process filters on the event depth it needs
• If the minimum depth is too high it iterates, discarding all but the finest grained events
• If the maximum depth is too low it wraps the event in a new single element collection, discarding all but the root event
• Identifiers in the boxes are those from the previous slide
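That filtering logic might be sketched as follows; the Event type and deliver method are hypothetical stand-ins for what the framework does internally.

    import java.util.List;
    import java.util.function.Consumer;

    class DepthFilter {
        /** Hypothetical data link event: an identifier, a depth and any child events. */
        record Event(String id, int depth, List<Event> children) {}

        /** Adapt an incoming event to a port accepting depths in [minDepth, maxDepth]. */
        static void deliver(Event event, int minDepth, int maxDepth, Consumer<Event> port) {
            if (event.depth() > maxDepth) {
                // Event is too coarse: iterate, keeping only the finer grained children.
                for (Event child : event.children()) {
                    deliver(child, minDepth, maxDepth, port);
                }
            } else if (event.depth() < minDepth) {
                // Event is too fine: wrap it in a new single element collection.
                Event wrapped = new Event(event.id() + "-wrapped", event.depth() + 1, List.of(event));
                deliver(wrapped, minDepth, maxDepth, port);
            } else {
                port.accept(event); // depth acceptable, deliver as-is
            }
        }
    }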

Processors (or, more accurately, service proxies) can now emit results piece by piece:

• a sensor proxy can emit a temperature reading / cell count / image every ten seconds
• a database query can return rows one at a time from the data server

Management of collection events is handled by the framework.

(Diagram: a DispatchLayer implementation sits in a stack. Job specification messages – a job queue & service list, or a single job & service list – arrive from the layer above; fault messages and result messages, carrying data and errors, arrive from the layer below.)

Taverna 2 opens up the per-processor dispatch logic. Dispatch layers can ignore, pass unmodified, block, modify or act on any message, and can communicate with adjacent layers. Each processor contains a single stack of arbitrarily many dispatch layers.

A single dispatch layer is simple, but dispatch layer composition allows for complex control flow within a given processor. DispatchLayer is an extensibility point: use it to implement dynamic binding, caching, recursive behaviour…? The standard layers are listed below, followed by a sketch of a custom layer.

Parallelize
• Ensures that at least ‘n’ jobs are pulled from the queue and sent to the layer below
• Reacts to faults and results by pulling more jobs off the queue and sending them down, passing the fault or result message back up to the stack manager

Failover
• Responds to job events from above by storing the job, removing all but one service from the service list and passing the job down
• Responds to faults by fetching the corresponding job, rewriting the original service set to include only the next service and resending the job down; if no more services are available, propagates the fault upwards
• Responds to results by discarding any failover state for that job

Retry
• Responds to jobs by storing the job along with an initial retry count of zero
• Responds to faults by checking the retry count, and either incrementing it and resending the job or propagating the fault message if the count is exceeded

Invoke
• Responds to jobs by invoking the first concrete service in the service list with the specified input data
• Sends fault and result messages to the layer above
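As promised, a minimal sketch of a custom layer, using the retry behaviour as the example; the DispatchLayer interface shown here, with its receiveJob/receiveResult/receiveFault callbacks, is an assumption for illustration rather than the real t2core contract.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    record Job(String id) {}
    record Result(String jobId) {}
    record Fault(String jobId, Job job) {}

    /** Hypothetical dispatch layer contract. */
    interface DispatchLayer {
        void receiveJob(Job job);     // job specification from the layer above
        void receiveResult(Result r); // result message from the layer below
        void receiveFault(Fault f);   // fault message from the layer below
    }

    class RetryLayer implements DispatchLayer {
        private final int maxRetries;
        DispatchLayer above, below; // wired up by the stack manager
        private final Map<String, Integer> retryCounts = new ConcurrentHashMap<>();

        RetryLayer(int maxRetries) { this.maxRetries = maxRetries; }

        public void receiveJob(Job job) {
            retryCounts.put(job.id(), 0); // store job with an initial retry count of zero
            below.receiveJob(job);
        }

        public void receiveFault(Fault fault) {
            int count = retryCounts.merge(fault.jobId(), 1, Integer::sum);
            if (count <= maxRetries) {
                below.receiveJob(fault.job()); // increment and resend the job
            } else {
                retryCounts.remove(fault.jobId());
                above.receiveFault(fault);     // count exceeded: propagate the fault
            }
        }

        public void receiveResult(Result result) {
            retryCounts.remove(result.jobId());
            above.receiveResult(result);       // pass results straight up
        }
    }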

This dispatch stack configuration replicates the current Taverna 1 processor logic, in that retry is within failover and both are within the parallelize layer. Layers can occur multiple times; you could easily have retry both above and below the failover layer, for example.
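Composing that Taverna 1 equivalent stack might look something like this, assuming layer classes along the lines of the RetryLayer sketch above and a hypothetical builder that wires each layer to its neighbours:

    // Hypothetical composition, top of the stack first.
    DispatchStack stack = DispatchStack.of(
        new ParallelizeLayer(5), // keep at least n = 5 jobs in flight
        new FailoverLayer(),     // switch to the next service on a fault
        new RetryLayer(3),       // retry each service up to 3 times
        new InvokeLayer()        // actually invoke the service proxy
    );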

‘Service’ in this case means ‘Taverna 2 proxy to something we can invoke’ – the name might change!

Service invocation is asynchronous by default – all AsynchronousService implementations should return control immediately and, ideally, use thread pooling amongst instances of that type. Results and failure messages are pushed to an AsynchronousServiceCallback object, which also provides the necessary context to the invocation:

DataManager

• Resolve input data references

• Register result data to get an identifier to return

SecurityManager

• Provides a set of security agents available to manage authentication against protected resources

Provenance Connector

• Allows explicit push of actor state P-assertions to a connected provenance store for invocation-specific metadata capture

Message Push

• Used to push fault and result messages back to the invocation layer of the dispatch stack
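A sketch of how a service implementation might use that callback; the callback's method names here are assumptions standing in for the facilities listed above, not the real t2core signatures.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Hypothetical callback contract combining the facilities listed above. */
    interface AsynchronousServiceCallback {
        Object resolve(String dataId);    // DataManager: resolve input data references
        String register(Object result);   // DataManager: register result data, get an identifier
        void pushResult(String resultId); // message push back to the dispatch stack
        void pushFault(Exception cause);  // fault push back to the dispatch stack
    }

    class UppercaseService {
        // Ideally pooled amongst all instances of this service type.
        private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

        /** Returns control immediately; the work happens on the shared pool. */
        void invoke(String inputId, AsynchronousServiceCallback callback) {
            POOL.submit(() -> {
                try {
                    String input = (String) callback.resolve(inputId);
                    String resultId = callback.register(input.toUpperCase());
                    callback.pushResult(resultId);
                } catch (Exception e) {
                    callback.pushFault(e);
                }
            });
        }
    }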

Security Agent (diagram: a client and a service, with a security agent holding a set of credentials, a policy engine and a policy).

In this scenario the agent is discovered based on the service, a message is passed to the agent to be signed, and that signed message is relayed. Credentials never leave the agent!
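The shape of that interaction, sketched with hypothetical interfaces; real agent discovery and the actual signing machinery are not shown.

    import java.util.List;

    /** Hypothetical security agent: credentials stay inside. */
    interface SecurityAgent {
        boolean canHandle(String serviceUri); // policy engine decides applicability
        byte[] sign(byte[] message);          // signing happens entirely within the agent
    }

    class SigningClient {
        private final List<SecurityAgent> agents;

        SigningClient(List<SecurityAgent> agents) { this.agents = agents; }

        /** Discover an agent for the target service, have it sign, relay the result. */
        byte[] prepare(String serviceUri, byte[] message) {
            for (SecurityAgent agent : agents) {
                if (agent.canHandle(serviceUri)) {
                    return agent.sign(message); // the client never sees the credentials
                }
            }
            throw new IllegalStateException("No agent available for " + serviceUri);
        }
    }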

Taverna 2 combines data managers, workflow enactors and security agents into transient collaborative virtual experiments within a peer group. These groups can be shared, their membership can be managed over time, and they can persist beyond a single workflow run.

(Diagram: a peer group, i.e. a JXTA group, forming a Virtual Experiment Session. User 1 and User 2 each bring a set of credentials, a policy engine, a policy and data managers; the group shares an enactor and connects to external services and external data stores such as SRB.)

Define a workflow as nested boundaries of control. Each boundary pushes its identifier onto an ID stack on data entering it and pops it when exiting. When a new ID is created, the controlling entity registers with a singleton monitor tree, attaching to the parent identified by the path defined by the previous value of the ID stack on that data. (A sketch follows the diagram below.)

(Diagram: workflow WF1 contains processors P1, P2 and P3, where P3 invokes nested workflow WF2 containing Q1. The monitor tree for run WF1_1 shows P1, P2 and P3, with nested workflow runs WF2_1 and WF2_2 under P3, each containing Q1: iteration over the nested workflow happens here.)
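A rough sketch of the ID stack and monitor registration, with every type here hypothetical:

    import java.util.Deque;

    /** Hypothetical singleton monitor tree. */
    class MonitorTree {
        private static final MonitorTree INSTANCE = new MonitorTree();
        static MonitorTree getInstance() { return INSTANCE; }

        /** Attach a new node under the parent identified by the owning path. */
        void register(Deque<String> owningPath, String newId) {
            System.out.println("attach " + newId + " under " + owningPath);
        }
    }

    /** Hypothetical boundary of control: a workflow, processor or nested run. */
    class BoundaryOfControl {
        private final String id; // e.g. "WF1_1", "P3", "WF2_1"

        BoundaryOfControl(String id) { this.id = id; }

        /** Called as data enters this boundary. */
        void enter(Deque<String> idStack) {
            // Register against the parent defined by the previous stack value,
            // then push our own identifier for anything nested inside us.
            MonitorTree.getInstance().register(idStack, id);
            idStack.push(id);
        }

        /** Called as data exits this boundary. */
        void exit(Deque<String> idStack) {
            idStack.pop();
        }
    }

Data flowing into nested run WF2_1 would then carry a stack like [WF2_1, P3, WF1_1], which is exactly the path Q1's monitor node attaches under.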

Each node defines a set of properties. If a property is mutable it can be used to steer the enactment. Properties could include parallelism setting, service binding criteria, current job queue length, queue consumption, number of failures in the last minute…
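Such a property might be modelled like this; a hypothetical interface only:

    /** Hypothetical steerable monitor property. */
    interface MonitorableProperty<T> {
        String getName();          // e.g. "dispatch.parallelize.jobLimit"
        T getValue();
        boolean isMutable();       // mutable properties can steer the enactment
        void setValue(T newValue); // only meaningful when isMutable() is true
    }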

Due December 2007 in ‘visible to end user’ form. This release will probably not include everything, especially steering agents and virtual experiment management.

Early tech preview real soon now [tm].

Complete code rewrite; current status is around 90% complete on the enactor and data manager core.

Java code is in CVS on SourceForge: project name is ‘taverna’, CVS module is ‘t2core’.

Licensed under LGPL at present.

Hands-on session later if anyone’s interested?

Investigators

•Matthew Addis, Andy Brass, Alvaro Fernandes, Rob Gaizauskas, Carole Goble, Chris Greenhalgh, Luc Moreau, Norman Paton, Peter Rice, Alan Robinson, Robert Stevens, Paul Watson, Anil Wipat

Postgraduates

•Tracy Craddock, Keith Flanagan, Antoon Goderis, Alastair Hampshire, Duncan Hull, Martin Szomszor, Kaixuan Wang, Qiuwei Yu, Jun Zhao

Pioneers

•Hannah Tipney, May Tassabehji, Medical Genetics team at St Mary’s Hospital, Manchester, UK; Simon Pearce, Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Doug Kell, Peter Li, Manchester Centre for Integrative Systems Biology, UoM, UK; Andy Brass, Paul Fisher, Bio-Health Informatics Group, UoM, UK; Simon Hubbard, Faculty of Life Sciences, UoM, UK

Funding and Industrial

•EPSRC

•Wellcome Trust

•OMII-UK

•Dennis Quan, Sean Martin, Michael Niemi (IBM), Mark Wilkinson (BioMOBY)

Core Research and Development

•Nedim Alpdemir, Pinar Alper, Khalid Belhajjame, Tim Carver, Rich Cawley, Justin Ferris, Matthew Gamble, Kevin Glover, Mark Greenwood, Ananth Krishna, Matt Lee, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Arijit Mukherjee, Tom Oinn, Stuart Owen, Juri Papay, Savas Parastatidis, Matthew Pocock, Stefan Rennick-Egglestone, Ian Roberts, Martin Senger, Nick Sharman, Stian Soiland, Victor Tan, Franck Tanoh, Daniele Turi, Alan R. Williams, David Withers, Katy Wolstencroft and Chris Wroe

Please see http://www.mygrid.org.uk/wiki/Mygrid/Acknowledgements for the most up-to-date list

Additional T2 thanks to Matthew Pocock, Thomas Down & David DeRoure amongst others!