Page 1: United Kingdom:

David Meredith1, Stephen Crouch2, Peter Turner3, Gerson Galang4, Ming Jiang5, Hung Nguyen6

1 NGS, Science and Technology Facilities Council, Daresbury Labs, UK, [email protected]
2 OMII-UK, School of Electronics and Comp Sci, University of Southampton, UK, [email protected]
3 University of Sydney, Sydney, Australia, [email protected]
4 Victorian eResearch Strategic Initiative (VeRSI), Victoria, Australia, [email protected]
5 NGS, Science and Technology Facilities Council, Daresbury Labs, UK, [email protected]
6 University of Sydney, Sydney, Australia, [email protected]

United Kingdom: Australia (DataMINX)

Towards a loosely coupled and scalable component set for scheduling bulk data copying across different storage resources as fault tolerant batch jobs.

http://code.google.com/p/dtsproject/

Page 2: United Kingdom:

Overview / Aims

• An open-source project developing a set of loosely coupled components for efficiently brokering data copies between a wide range of (potentially incompatible) storage resources as schedulable, fault-tolerant batch jobs (ftp, gridftp, srb, irods, sftp, file, webdav, srm?).

• To scale from small embedded deployments to large distributed deployments through an expandable 'worker-node pool' controlled through message orientated middleware (MOM, JMS).

• To maximize data access and transfer efficiency through the strategic placement and subscription of worker-nodes at or between particular data sources/sinks.

• To be inherently asynchronous and side-step the bandwidth, concurrency and scalability concerns for clients in networks with limited capability relative to the direct connectivity between the source and sink.

• To address geographical-topological deployment concerns by allowing service hosting to be either centralized (as part of a shared service), or confined to a single institution or domain.

• To adopt established design patterns and open source components, coupled with a proposal for an open-standards-based messaging protocol.

• To employ a single port-type, document-centric model, with service semantics defined solely by the message model.

Page 3: United Kingdom:

DTS Features / Intentions 1

1. Encourage a common messaging model. We are engaging with OGF in the definition of an open standard describing a bulk data copy activity with subsequent control and event messages. The aim is to provide a key foundation in addressing the challenges of data management. Ideally standards based: OGF engagement (DMI, JSDL), plus communications with Globus, Unicore and GridSAM developers (a longer term perspective).

2. Platform independence. Includes the worker agent that manages a bulk data copy activity, the message broker, the message channel adapters that enable the different transports and protocols, and Commons VFS.

3. Adopts well recognized Enterprise Integration Patterns described in Hohpe and Woolf (2003): Competing Consumers, Service Activator, Selective Consumer, Polling Consumer, Message Driven Consumer, Transport Channel Adapter, Header Based Router.

http://www.enterpriseintegrationpatterns.com

Page 4: United Kingdom:

DTS Features / Intentions 2

4. Value in the correct framework choice: use out-of-the-box features for remoting, scaling and batching.

• Spring Batch: one of the few open source batch processing frameworks currently available (purportedly the only one?). It provides many functions that are essential in batch processing.

• Spring Integration: supports the EAI patterns identified by Hohpe and Woolf. Importantly, it provides a set of inbound and outbound message-channel adapters for different integration options, both polling and message driven (e.g. JMS subscription, file/directory polling, RMI, WS, email).

• Message broker (e.g. Apache ActiveMQ or any JMS 1.1 message-channel MOM broker).

Page 5: United Kingdom:

Buffering data via an intermediary when copying between incompatible resources / protocols

[Diagram: a client (e.g. Portal/Hermes) sits between SRB/FTP and SFTP/GSIFTP resources. Labels: file operations (list, upload, download, delete, rename); get and put, or memory buffer / bit pipe; authentication tokens (un/pw, x509?).]

Client provides a single interface to different (potentially incompatible) storage resources, e.g. Srb, GsiFtp, Ftp, Sftp, iRODS, file, Webdav.

Client brokers between storage resources when third-party transfer is not available.

Page 6: United Kingdom:

Client-Side Intermediary

Benefits

1. Auth tokens only in memory on one computer.

2. Self contained and interactive.

3. Extensible for new and emerging resources/protocols.

Challenges

1. Software is required that is capable of enacting a data copying activity between a variety of sources and sinks (bit pipe via byte streams or combined get/put).

2. The client must be constantly available throughout the duration of the transfer.

3. Buffering of large quantities of data introduces bandwidth and concurrency concerns for clients residing on networks with limited capability (e.g. wireless connectivity) relative to the direct connectivity between the source and sink.
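The "bit pipe via byte streams" option named under Challenges can be illustrated with a minimal sketch. It assumes the source and sink are exposed as file-like byte streams; the function name `bit_pipe`, the chunk size and the in-memory stand-ins are illustrative choices, not part of DTS.

```python
import io


def bit_pipe(source, sink, chunk_size=64 * 1024):
    """Stream bytes from source to sink in fixed-size chunks.

    Only one chunk is held in memory at a time, so the intermediary
    never buffers the whole file. Returns the number of bytes copied.
    """
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied


# In-memory stand-ins for a remote source and a remote sink.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
total = bit_pipe(source, sink)
print(total)
```

Even with chunked streaming, every byte still crosses the client's own network link twice, which is the bandwidth concern raised in challenge 3.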

Page 7: United Kingdom:

DTS – Remotely Placed Worker Agents

Aim: Strategically place intermediary software agent(s) (e.g. at different institutions, within a network, at a local source/sink) and remotely invoke an appropriate agent using a message router with a ‘Bulk Data Copy Activity’ executed as a fault tolerant batch process. Best practice: process data as close to where it resides as possible.

3 Core DTS Components:

• Batch/Worker Agent. Software that will manage a bulk data copy activity. This is a batch operation: automated processing of large volumes of information that is most efficiently processed without user interaction (fire + forget).

• Common Message format that describes a data copying activity with subsequent control and event messages. It:
  • Lists data sources and sinks.
  • States transfer requirements.
  • Carries user credentials, so that the recipient worker can access the data on behalf of the user.

• Message Broker/Router for routing of messages to appropriate workers and scaling via the Competing Consumer pattern.

Page 8: United Kingdom:

DTS Architecture (Simplified)

Broker between remote sources and sinks

[Diagram: clients send a data copy activity message to a queue channel that is consumed by DTS workers; the workers copy data between source and sink (get/put or bit pipe). Authentication tokens (un/pw, myproxy details) travel with the message.]

Clients obtain the list of data URLs and credentials from a meta-data system or data catalogue (ICAT), or use lightweight file operations that interact directly with the source/sink (list, delete, rename).

Page 9: United Kingdom:

DTS Architecture (Simplified)

Broker between local source and remote sink (and vice-versa)

[Diagram labels: Clients; Home Lab; Facility / Department X; Facility / Department Y; Facility Queue; Source/Sink; Source/Sink.]

Message Bus is a combination of a messaging infrastructure, a common data model and command set to allow different systems to communicate through a shared set of interfaces (our message channels).

http://www.enterpriseintegrationpatterns.com

Page 10: United Kingdom:

Deployment Strategies

Small: Local or embedded worker agent

Med: Single worker pool

Large: Multiple worker pools and message router

Page 11: United Kingdom:

1) Lightweight local worker deployment. The worker agent is invoked by a script or is integrated into an existing application.

2) Distributed deployment with a single worker pool.

Legend: s = Submit message (bulk copy activity document), c = Control message, e = Event message.

[Diagram labels: DMQ, ControlQ, ReplyQ, JobQb; WN; Client (Service Activator); P; C; Source; Sink; Worker pool.]

Page 12: United Kingdom:

3) Distributed deployment with multiple worker pools.

[Diagram labels: DMQ, ReplyQ, JobQa, JobQb, AJobQ, BJobQ, JobQ Router; ControlQ, AControlQ, BControlQ, ControlQ Router; HTTPS; P; C; Worker pool A; Worker pool B; message types s, c, e.]

Page 13: United Kingdom:

Core Component Message Router / Broker

Schedule and route messages to strategically placed worker agents.

Scale with multiple agents using competing consumer pattern.

Page 14: United Kingdom:

Scaling

How can the architecture scale for increasing loads?

• Scale Out: Competing Consumer Pattern. To scale horizontally (or scale out) means to add more nodes to a system.

• Scale Up: Multi-process Service Activator. To scale vertically (or scale up) means to add resources and/or processes to a single node in a system.

Page 15: United Kingdom:

Scale Out – Competing Consumer Pattern

[Diagram: JMS client (Producer) → Broker (Queue) → Worker (Consumer); label: "Queue depth ok".]

• The only requirement is that the JMS client and consumer must be able to access the broker.
• This provides location independence, which enables scaling and clustering of services since multiple workers can be configured to pull messages from the same queue.
• If the service becomes overburdened and falls behind in its processing, all that is needed is to start a few more worker instances to listen to the queue.
• Consumers do not have to coordinate with each other, which improves resilience since workers can be added and removed without affecting each other.

Basic architecture is repeatable: use multiple brokers and queues as required (e.g. broker clusters, master-slave brokers etc.).
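The Competing Consumer behaviour described on this slide can be sketched in a few lines. This is an illustration only: an in-process `queue.Queue` stands in for a JMS queue on a broker, threads stand in for worker nodes, and the names `run_worker_pool` and `copy-N` are hypothetical.

```python
import queue
import threading


def run_worker_pool(jobs, worker_count):
    """Let worker_count consumers compete for jobs on one shared queue.

    Returns a dict mapping each job to the worker that handled it.
    """
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    handled = {}
    lock = threading.Lock()

    def worker(name):
        # Workers never talk to each other; they only pull from the queue.
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker stops
            with lock:
                handled[job] = name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


result = run_worker_pool([f"copy-{n}" for n in range(20)], worker_count=4)
print(len(result))  # every job is handled exactly once
```

Changing `worker_count` requires no change to the producer or to the other workers, which is the location independence the slide relies on for scaling out.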

Page 16: United Kingdom:

Message Routing

How can the appropriate remote worker(s) be invoked?
• How to invoke a worker(s) that resides at the data source and/or sink?
• How to invoke a worker(s) that is installed at my institution or within a specific network?
• How to target a specific worker?

1. Multiple Destinations
2. Message Selectors
3. Hybrid Approach

Page 17: United Kingdom:

Message Routing: Multiple Destinations

[Diagram: JMS clients send to Request Qa, Request Qb and Request Qc on one broker; the queues are consumed by the worker groups Group A (Facility A), Group B (Project B) and Group C (Institution C).]

Multiple static/administered queues can be configured on one broker in order to partition workers into different groupings.

Main Advantages: Queue depth is directly related to load. Therefore load balancing can be performed effectively since queues are not polluted with … DTS should add new queues for different groupings (e.g. project queues, separate queues for different facilities).

Main Disadvantages: Changes are required on the broker to cater for new worker groupings (configuration of new administered queues). This does not provide a high level of decoupling between message producer and consumer.

In DTS, multiple destinations are used to partition static queue consumer cluster groups, e.g. a Request Q per facility, beam-line, project, institution etc.

Page 18: United Kingdom:

Message Routing: Message Selectors

Workers can be 'Selective Consumers' and clients can be 'Specifying Producers'. A message selector is an expression based on SQL92 conditional syntax, e.g.

Facility='FacilityX' AND BeamLine='ProteinMX' AND WorkerAccessKey='abcdefadsf_guuid'

• Filtering is performed by the broker: it delivers only those messages that match the selective consumer's criteria.
• Importantly, workers can therefore decide which messages to process depending on their own selector statements.
• Main benefit is that this approach is extensible: it provides a higher level of decoupling between message producer and receiver, since clients and workers can be easily added without change to the broker.
• Selectors are optional. This pattern can also be combined with the multiple destination approach to route messages as required (hybrid approach).

Selectors can be used to perform fine-grained routing, e.g.
• Route to the first available worker in a particular group that specifies a common/shared selector value, e.g. a common 'groupID' AND/OR 'networkID' AND/OR 'facilityGroup' AND/OR 'domain' AND/OR 'GB limit' etc. (SQL).
• Route to a specific worker using a unique and opaque client identifier/access key, e.g. a GUUID (this is OK since the broker performs the filtering, so different workers do not see each other's selectors). The specifying producer would need to persist this value between server re-starts/different sessions.

[Diagram: Specifying Producers send messages with selection values to a single Request Q; Selective Consumers receive only the messages that match their selectors.]
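To make the selector idea concrete, the sketch below evaluates a deliberately tiny subset of selector syntax (string equality terms joined by AND) against message headers, the way a broker would before delivery. It is not a JMS implementation: the helper names `parse_selector`, `matches` and `deliver`, the worker names and the header values are all assumptions made for illustration.

```python
import re

# One term of the supported subset: Name='value'
_TERM = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_selector(selector):
    """Parse "A='x' AND B='y'" into {"A": "x", "B": "y"}.

    Only equality and AND are supported; values must not contain " AND ".
    """
    conditions = {}
    for term in re.split(r"\s+AND\s+", selector.strip()):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported selector term: {term!r}")
        conditions[match.group(1)] = match.group(2)
    return conditions


def matches(selector, headers):
    """True if every condition in the selector equals the message header."""
    return all(headers.get(k) == v for k, v in parse_selector(selector).items())


def deliver(headers, consumers):
    """Broker-side filtering: names of the consumers eligible for a message."""
    return [name for name, selector in consumers.items() if matches(selector, headers)]


consumers = {
    "workerX": "Facility='FacilityX' AND BeamLine='ProteinMX'",
    "workerY": "Facility='FacilityY'",
}
message_headers = {"Facility": "FacilityX", "BeamLine": "ProteinMX"}
eligible = deliver(message_headers, consumers)
print(eligible)
```

Adding a third worker means adding one more entry to `consumers`; nothing on the producer side changes, which mirrors the decoupling benefit listed on the slide.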

Page 19: United Kingdom:

Message Routing: Hybrid Approach

The best approach is to combine the message filtering approach and the multi-destination approach to suit your service instance requirements.

The two approaches are not mutually exclusive and can be used together, provided both patterns are catered for in your system.

Request Qa

Request Qb

Page 20: United Kingdom:

Request Response (Client Worker Conversation)

1. ReplyTo header
2. Application ID exchange with message filtering
3. Temporary queues

Page 21: United Kingdom:

Request Response (Conversation)

[Diagram: Specifying Producers (Clients) send requests on the Request Channel to Selective Consumers (Workers); replies return on Reply Channel 1 and Reply Channel 2.]

The request message contains a Return Address that indicates where to send the reply.
1. The Return Address is added to the message header.
2. The consumer does not need to know in advance where to send the reply; it reads the address from the request.

Variations of this pattern, depending on client requirements:
a) Extend the message filtering approach to exchange client and worker Application IDs. The client can also selectively consume response messages by adding its own client ID to the request header.
b) Use a temporary queue created by the client (lasts only for the duration of the client session).

Page 22: United Kingdom:

Request Response (Conversation) using Filtering

(Exchange of client and worker Application IDs so that recipient worker and client can converse)

Queues: JobSubmitQ, InvokeClientQ, InvokeWorkerQ.

DTS Clients: DTS Client1; NGS Portal (an application bound to facilityA); GridSAM (an application bound to facilityB).

DTS Workers: Q Consumer Cluster 'facilityA'; Q Consumer Cluster 'facilityB'.

1) MDP Producer Pool connected to JobSubmitQ; MDP Selective Consumer Pool on WorkerGroupID = facilityA.
JMS Message Headers: MessageID = guuidA, WorkerGroupID = facilityA, ClientID = DTSClient1

2) MDP Producer Pool connected to InvokeClientQ; MDP Selective Consumer Pool on ClientID = DTSClient1.
JMS Message Headers: CorrelationID = guuidA, WorkerID = workerA, ClientID = DTSClient1

3) MDP Producer Pool connected to InvokeWorkerQ; MDP Selective Consumer Pool on WorkerID = workerA.
JMS Message Headers: CorrelationID = guuidA, WorkerID = workerA, ClientID = DTSClient1

Page 23: United Kingdom:

Request Response (Conversation) using Filtering

• Each JMS client (worker and client) has a unique instance/application ID (clientID, workerID).

1. A client sends a job request and adds its own ClientID to the headers (in conjunction with the other headers used in message selection, e.g. MessageID and WorkerGroupID).

2. A worker picks up the message and responds to an administered response queue (not a dynamic queue) named in the ReplyTo header, adding its own WorkerID and forwarding the given ClientID in the message headers.

3. The client receives messages from the response queue and filters on ClientID.

4. The client can now converse with the recipient worker, since both the client and worker have their respective IDs and can correlate messages on the original message ID using CorrelationID.

• This approach requires only a limited number of administered queues, e.g. JobSubmitQ, InvokeClientQ, InvokeWorkerQ.

• Main benefit is that this approach is extensible: it provides a higher level of decoupling between message producer and receiver, since clients and workers can be easily added without change to the broker.

• Can also combine this approach with multiple channels as required (hybrid approach).
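The four-step conversation can be traced with a small simulation. Plain Python lists stand in for the three administered queues and dictionaries stand in for JMS message headers; the helper `take` and the `Command` header are hypothetical additions used only to show a follow-up control message.

```python
import uuid

# Stand-ins for the three administered queues.
job_submit_q, invoke_client_q, invoke_worker_q = [], [], []


def take(q, **selector):
    """Remove and return the first message whose headers match, else None."""
    for index, message in enumerate(q):
        if all(message.get(k) == v for k, v in selector.items()):
            return q.pop(index)
    return None


# 1. Client submits a job request carrying its own ClientID.
message_id = str(uuid.uuid4())
job_submit_q.append(
    {"MessageID": message_id, "WorkerGroupID": "facilityA", "ClientID": "DTSClient1"}
)

# 2. A worker in group facilityA picks it up and replies with its WorkerID.
request = take(job_submit_q, WorkerGroupID="facilityA")
invoke_client_q.append(
    {
        "CorrelationID": request["MessageID"],
        "WorkerID": "workerA",
        "ClientID": request["ClientID"],
    }
)

# 3. The client filters the response queue on its own ClientID.
reply = take(invoke_client_q, ClientID="DTSClient1")

# 4. The client can now address that specific worker, correlated on the
#    original message ID ("Command" is an illustrative header).
invoke_worker_q.append(
    {
        "CorrelationID": reply["CorrelationID"],
        "WorkerID": reply["WorkerID"],
        "ClientID": "DTSClient1",
        "Command": "stop",
    }
)
control = take(invoke_worker_q, WorkerID="workerA")
print(control["CorrelationID"] == message_id)
```

Only three queues are involved however many clients and workers take part, because the IDs in the headers do the addressing.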

Page 24: United Kingdom:

Core Component Batch / Worker Agent

Enacts the Bulk Data Copy Activity as a fault tolerant batch job for copying between sources and sinks. Scopes, checkpoints and restarts.

Page 25: United Kingdom:

Batch / Worker Agent

• Role is to enact the data copy activity according to the activity document, report status events and respond to control messages.

• Copy activity is a batch processing task (automated processing of large volumes of information that is most efficiently processed without user interaction).

• DTS worker is based on Spring Batch and Commons VFS (a contract driven approach facilitates different implementations, e.g. scripts / shelling out to a command line client).

• Spring Batch provides a framework for functions that are essential in batch processing, e.g. split/monitor/merge, logging/tracing, tx management, processing statistics, job pause and restart, skip, retry, check-pointing.

A Spring Batch implementation deals with breaking apart the business logic and sharing it efficiently between parallel processes or processors as step-jobs.

http://static.springsource.org/spring-batch/index.html
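The skip, retry and check-pointing functions listed above can be illustrated independently of any framework. The sketch below is plain Python, not the Spring Batch API; `run_copy_job`, `flaky_copy` and the file names are hypothetical, and the checkpoint is a simple in-memory set where a real worker would use a persistent store.

```python
def run_copy_job(files, copy_one, checkpoint, max_retries=2):
    """Copy each file once, with retry, skip and checkpointing.

    files       -- ordered file names in the bulk copy activity
    copy_one    -- callable that copies one file, raising IOError on failure
    checkpoint  -- set of files already completed (updated in place)
    Returns the list of files skipped after exhausting all retries.
    """
    skipped = []
    for name in files:
        if name in checkpoint:
            continue  # completed in an earlier run; a restart resumes here
        for attempt in range(max_retries + 1):
            try:
                copy_one(name)
                checkpoint.add(name)  # record progress after each success
                break
            except IOError:
                if attempt == max_retries:
                    skipped.append(name)  # give up on this file, carry on
    return skipped


attempts = {}


def flaky_copy(name):
    """Stand-in copy: b.dat fails once, c.dat always fails."""
    attempts[name] = attempts.get(name, 0) + 1
    if name == "b.dat" and attempts[name] == 1:
        raise IOError("transient failure")
    if name == "c.dat":
        raise IOError("permanent failure")


done = {"a.dat"}  # checkpoint left by a previous, interrupted run
skipped = run_copy_job(["a.dat", "b.dat", "c.dat", "d.dat"], flaky_copy, done)
print(skipped, sorted(done))
```

Because progress is recorded per file, rerunning the job with the same checkpoint repeats only the work that has not yet succeeded.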

Page 26: United Kingdom:

Core Component Message Model

Bulk Data Copy Activity Document.

Control Messages (stop, start, cancel)

Event Messages (faults, status, instance attributes)

Page 27: United Kingdom:

Message Model Requirements

Document Message
• Bulk Data Copy Activity description.
• Captures all information required to connect to each source and sink URI and subsequently enact the activity.
• Transfer requirements, e.g. URI properties, file selectors (reg-expression), scheduling (batch-window), retry count, source/sink alternatives, checksums?, sequential ordering?, DAG?
• Serialized user credentials.
• Probably adopt/extend the Data End Point Reference (DEPR) construct from DMI: a specialized form of WS-Address element which does not mandate any particular URL/transport scheme and allows multiple <DataLocations/>.

Control Messages
• Interact with a state/lifecycle model (e.g. stop, resume, cancel).

Event Messages
• Standard fault types and status updates.

Information Model
• To advertise the service capabilities / properties / supported protocols.
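As a rough illustration of what a document message has to carry, the sketch below models a bulk copy activity as nested data classes and serializes it. Everything here is an assumption: the class and field names are invented, the URLs and `<credential>` values are placeholders, and JSON is used only for brevity where the presentation proposes an XML document building on DMI.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataLocation:
    uri: str
    credential: str  # placeholder for a serialized user credential


@dataclass
class Transfer:
    sources: List[DataLocation]  # alternative sources, in order of preference
    sinks: List[DataLocation]    # alternative sinks, in order of preference
    file_selector: str = ".*"    # regular expression over file names


@dataclass
class BulkDataCopyActivity:
    activity_id: str
    transfers: List[Transfer] = field(default_factory=list)
    retry_count: int = 3


activity = BulkDataCopyActivity(
    activity_id="activity-001",
    transfers=[
        Transfer(
            sources=[
                DataLocation("gsiftp://example.org/name/of/the/dir/", "<credential>"),
                DataLocation("srm://example.org/name/of/the/dir/", "<credential>"),
            ],
            sinks=[DataLocation("sftp://sink.example.org/dir/", "<credential>")],
        )
    ],
)

# One self-contained document that wraps every transfer in the activity.
document = json.dumps(asdict(activity), indent=2)
decoded = json.loads(document)
print(decoded["activity_id"], len(decoded["transfers"]))
```

Keeping all transfers, requirements and credentials in one document is what lets a single message be routed to a worker and enacted without further calls back to the client.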

Page 28: United Kingdom:

Existing/In-Scope Specifications

Related Specifications

1. Job Submission Description Language (JSDL)
• An activity description language for generic compute applications.

2. OGSA Data Movement Interface (DMI)
• Low level schema for defining the transfer of bytes between a single source and sink.

3. JSDL HPC File Staging Profile (HPCFS)
• Designed to address file staging, not bulk copying.

4. OGSA Basic Execution Service (BES)
• Defines a basic framework for defining and interacting with generic compute activities: JSDL + extensible state and information models.

• None of these fully captures our requirements (this is not a criticism of these specs; they are designed to address their existing use-cases, which only partially overlap with the requirements for a bulk data copy activity).

Proprietary
• Condor Stork - based on Condor Class-Ads
• Glite JDL (again based on Class-Ads)
• Not sure if Globus has/intends a similar definition in its new developments (e.g. SaaS). Anyone?

Page 29: United Kingdom:

JSDL Data Staging 1 and the HPC File Staging Profile

<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Source>
  <jsdl:Target>
    <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Target>
  <Credentials> … </Credentials>
</jsdl:DataStaging>

This example defines both the source and target within the same <DataStaging/> element, which is permitted in JSDL. However, the HPC File Staging Profile (Wasson et al. 2008), which is an extension to JSDL, limits the use of credentials to a single credential definition within a data staging element. Often, different credentials will be required for the source and the target.

Page 30: United Kingdom:

JSDL Data Staging 2

<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:FilesystemName>DL_HOME</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Source>
  <Credentials> … </Credentials>
</jsdl:DataStaging>

<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:FilesystemName>NGS_HOME</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:Target>
    <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Target>
  <Credentials> … </Credentials>
</jsdl:DataStaging>

Coupled staging elements: a source data staging element for fileA and a corresponding target element for staging out of the same file. By specifying that the input file is deleted after the job has executed, this example simulates the effect of a data copy from one location to another through the staging host.

Limitations:
• No multiple data locations (alternative sources and sinks).
• More elements required (e.g. transfer requirements, file selectors, URI properties).
• Intended for compute and data staging, not really bulk data copying.

Page 31: United Kingdom:

OGSA DMI

The OGSA Data Movement Interface (DMI) (Antonioletti et al. 2008) defines a number of XML constructs for describing and interacting with a data transfer activity.

The data source and destination are each described separately with a Data End Point Reference (DEPR), which is a specialized form of WS-Address element (Box et al. 2004).

In contrast to the JSDL data staging model, a DEPR facilitates the definition of one or more <Data/> elements within a <DataLocations/> element. This is used to define alternative locations for the data source and/or sink. In doing this, an implementation is then free to select between its supported protocols and retry different source/sink combinations from the available list. This improves resilience and the likelihood of performing a successful data transfer by matching protocols supported by the service.

Page 32: United Kingdom:

DEPR Example

<dmi:SourceDataEPR>
  <wsa:Address>http://www.ogf.org/ogsa/2007/08/addressing/none</wsa:Address>
  <wsa:Metadata>
    <dmi:DataLocations>
      <dmi:Data ProtocolUri="http://www.ogf.org/ogsadmi/2006/03/im/protocol/gridftp-v20"
                DataUrl="gsiftp://example.org/name/of/the/dir/">
        <dmi:Credentials><wsse:UsernameToken/></dmi:Credentials>
        <other stuff/>
      </dmi:Data>
      <dmi:Data ProtocolUri="urn:my-project:srm"
                DataUrl="srm://example.org/name/of/the/dir/">
        <dmi:Credentials><wsse:UsernameToken/></dmi:Credentials>
        <other stuff/>
      </dmi:Data>
    </dmi:DataLocations>
  </wsa:Metadata>
</dmi:SourceDataEPR>

<dmi:SinkDataEPR> . . . Similar to above but for the sink . . . </dmi:SinkDataEPR>

Defines alternative locations for the data source and/or sink.
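How an implementation might use the alternative locations can be sketched as follows. This is an assumption about one reasonable selection strategy, not DMI-defined behaviour: the function names are invented, protocols are identified by URL scheme only, and the sink URLs are hypothetical.

```python
from urllib.parse import urlsplit


def usable(locations, supported_schemes):
    """Keep the locations whose URL scheme the worker supports, in listed order."""
    return [url for url in locations if urlsplit(url).scheme in supported_schemes]


def candidate_pairs(sources, sinks, supported_schemes):
    """All (source, sink) combinations the worker could attempt, in order.

    A worker can try the first pair and fall back to the next on failure.
    """
    return [
        (source, sink)
        for source in usable(sources, supported_schemes)
        for sink in usable(sinks, supported_schemes)
    ]


sources = [
    "gsiftp://example.org/name/of/the/dir/",
    "srm://example.org/name/of/the/dir/",
]
sinks = [
    "sftp://sink.example.org/dir/",  # hypothetical sink locations
    "srm://sink.example.org/dir/",
]
pairs = candidate_pairs(sources, sinks, supported_schemes={"srm", "sftp"})
for source, sink in pairs:
    print(source, "->", sink)
```

A worker without GridFTP support still finds two workable combinations here, which is the resilience gain attributed to multiple data locations.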

Page 33: United Kingdom:

DMI cont..

There are some limitations:

DMI is intended to describe only a single data transfer operation between one source and one sink. To do several transfers, multiple invocations of a DMI service factory would be required to create multiple DMI service instances. We require a single (atomic) message packet that wraps multiple transfers and whose delivery can be transacted, e.g. through a message router.

Some of the existing constructs require extension / slight modification.

Therefore: DMI v2 strawman proposal at OGF to canvass some new extensions and to propose a new bulk-copy doc that builds on DMI.

Page 34: United Kingdom:

Bulk Data Copy Doc and JSDL Integration?

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:JobIdentification ... />
    <jsdl:Application>
      <!-- Option a) Embed BulkDataCopy document -->
      <other:BulkDataCopy ... />
      <!-- If Basic Profile compliance is important -->
      <jsdl-hpcpa:HPCProfileApplication>
        <jsdl-hpcpa:Executable>/usr/bin/datacopyagent.sh</jsdl-hpcpa:Executable>
        <jsdl-hpcpa:Argument>'myBulkDataCopyDoc.xml'</jsdl-hpcpa:Argument>
        ...
      </jsdl-hpcpa:HPCProfileApplication>
    </jsdl:Application>
    <jsdl:Resources>
      <!-- Option b) Stage-in BulkDataCopy document -->
      <jsdl:DataStaging>
        <jsdl:FileName>myBulkDataCopyDoc.xml</jsdl:FileName>
        ...
      </jsdl:DataStaging>
    </jsdl:Resources>
  </jsdl:JobDescription>
</jsdl:JobDefinition>

Possible options for integrating the proposed <BulkDataCopy/> document within JSDL: a) nesting within the <jsdl:Application/> element, or b) staging-in of a <BulkDataCopy/> document as input for the named executable. Why not?