Virtual Batch Queues

A Service Oriented View of “The Fabric”

Rich Baker

Brookhaven National Laboratory

April 4, 2002

4 April, 2002    R. Baker    US ATLAS Grid Testbed Workshop

Fabrics Session of LCG Launch Workshop

Strict Uniformity is Impossible
    Multiple Implementations Will Exist Even Within a Single Site
    Different Economics Drive Different Choices at Different Sites

Expose Services, Not Facilities
    Users Should Expect Uniform Interfaces to Services
    Define Boundaries
    Site Can’t be a Black Box
    Internal View May Vary From Site to Site

LHC/iVDGL Facilities Workshop

Prototype Batch Queues to be Implemented
    BNL, FNAL, UCSD, JHU
    ATLAS, CMS, SDSS

First (Trivial) Implementation – Fully Preconfigured
    Queue is Described only by Name – Advertise via MDS
    Requires User Pre-Knowledge of Queue Details

Evolve Towards More Abstract Implementation
    Advertise Enough Information to Fully Describe Queue
    Requires No User Pre-Knowledge
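The difference between the two stages can be sketched in Python. This is only an illustration: the record type, its field names (`os`, `scratch_gb`, `storage_access`) and the queue name are hypothetical, since the slides do not define a schema, and publishing the record through MDS is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueueAd:
    """Hypothetical advertisement for one batch queue (not an MDS schema)."""
    name: str
    os: Optional[str] = None
    scratch_gb: Optional[float] = None
    storage_access: list = field(default_factory=list)

    def is_self_describing(self) -> bool:
        """True when the record alone tells a user what the queue offers."""
        return (
            self.os is not None
            and self.scratch_gb is not None
            and bool(self.storage_access)
        )


# Trivial implementation: only the name is advertised, so the user
# must already know what the queue provides.
trivial = QueueAd(name="example-queue")

# More abstract implementation: the attributes travel with the name.
full = QueueAd(name="example-queue", os="linux",
               scratch_gb=2.0, storage_access=["nfs"])

print(trivial.is_self_describing(), full.is_self_describing())
```

Making the extra fields optional lets the same record type serve both stages, so a site could begin with name-only advertisements and fill in attributes later.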

Job Manager’s View of Computing Element

… Computing Elements, distributed in possible different administrative domains, can be very different and can rely on different mechanisms, policies, implementations: they can be different in hardware, they can run different operating systems, they can be managed by different local resource management systems, they can use different authentication and authorization mechanisms, etc…

These issues will be addressed relying on standard protocols: “forcing” the Computing Elements to use standard protocols ...

(From EDG JSS Architecture and APIs document, July 2001)

Various Views of a Compute Element

Pre-Grid Paradigm:
    User Aware of All Local Resources
    Jobs Can (Must) Use Local Configuration/Resources

Condor Standard Universe
    Just a CPU – Local Resources Irrelevant
    Jobs Can Not Use Local Resources – Inefficiency

Virtual Batch Queue
    Advertise CPU Plus Local Resources
    Jobs Can Take Advantage – Improved Efficiency

Some Thoughts

Local Administration of Hardware
    Remote Job Manager Can Not Reinstall OS
    Local Monitoring and Security Must Be Respected

Must Advertise Enough Information for Job Manager to Determine Suitability
    Unchangeable Configuration (OS, etc.)
    Licensed Products
    Minimum Scratch Space Available
    Access Methods for Local Storage
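A suitability test over the four advertised items might look like the following Python sketch. The class names, field names, units and example values are all assumptions made for illustration; the slides name the categories of information but not how they are encoded or compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueInfo:
    """What a queue advertises (hypothetical fields)."""
    os: str                       # unchangeable configuration
    licensed_products: frozenset  # products licensed at the site
    min_scratch_gb: float         # minimum scratch space available
    storage_access: frozenset     # access methods for local storage


@dataclass(frozen=True)
class JobNeeds:
    """What a job requires (hypothetical fields)."""
    os: str
    licensed_products: frozenset
    scratch_gb: float
    storage_access: frozenset     # any one of these methods is acceptable


def is_suitable(queue: QueueInfo, job: JobNeeds) -> bool:
    """Return True if the advertised queue can run the job."""
    if queue.os != job.os:
        return False              # the job manager cannot reinstall the OS
    if not job.licensed_products <= queue.licensed_products:
        return False              # a required licence is missing
    if job.scratch_gb > queue.min_scratch_gb:
        return False              # not enough guaranteed scratch space
    if job.storage_access and not (job.storage_access & queue.storage_access):
        return False              # no common storage access method
    return True


queue = QueueInfo("linux", frozenset({"compilerA"}), 5.0,
                  frozenset({"nfs", "local-disk"}))
small_job = JobNeeds("linux", frozenset(), 2.0, frozenset({"nfs"}))
large_job = JobNeeds("linux", frozenset(), 10.0, frozenset({"nfs"}))

print(is_suitable(queue, small_job), is_suitable(queue, large_job))
```

Each check corresponds to one advertised item, so a job manager can reject a queue without knowing anything about the site beyond its advertisement.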

Additional Considerations

Typical Job Sets Dozens of Environment Variables
    All of these Must be Abstracted and Discoverable
    Some Can be Discovered At Job Initiation
    Input and Output “Sandboxes” Are Local Directories
    Setting “PATH” Requires Information

What Defines a Single “VBQ”?
    Same Unchangeable Environment
    Same View of Local (Non-WAN) Storage

APIs for Interactions Between Job and Remote Manager

APIs for Interactions Between Job and Local Manager
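One way to read the “What Defines a Single VBQ?” criteria is as a grouping key: worker nodes that share the same unchangeable environment and the same view of local storage belong to one VBQ. The Python sketch below assumes each node is described by a dictionary with hypothetical keys `host`, `environment` and `storage_view`; the host names and values are invented for illustration.

```python
from collections import defaultdict


def group_into_vbqs(nodes):
    """Group node host names by (environment, storage view)."""
    groups = defaultdict(list)
    for node in nodes:
        key = (node["environment"], node["storage_view"])
        groups[key].append(node["host"])
    return dict(groups)


nodes = [
    {"host": "node01", "environment": "os-a", "storage_view": "/data-1"},
    {"host": "node02", "environment": "os-a", "storage_view": "/data-1"},
    {"host": "node03", "environment": "os-a", "storage_view": "/data-2"},
]

vbqs = group_into_vbqs(nodes)
for key, hosts in vbqs.items():
    print(key, hosts)
```

Here `node03` runs the same environment as the other two but sees different local storage, so under this reading it forms a separate VBQ.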

For Example

Site May Have Two Different libC Versions

Virtual Queue 1 Advertises libC-x
    Set Path to Use /usr/libC-x directory

Virtual Queue 2 Advertises libC-y
    Set Path to Use /usr/libC-y directory

Job Manager “Knows” Which Version User Needs
    If x or y, Use Local Installation
    If libC-z, no problem! Bring it with you and set path
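The decision on this slide can be written as a small Python function. The version labels and directories come from the slide; the function name, the queue names and the shape of the returned plan are assumptions made for this sketch.

```python
def plan_libc(needed_version, queues):
    """Decide how a job obtains the libC version it needs.

    queues maps a queue name to (advertised version, local directory).
    """
    for queue_name, (version, directory) in queues.items():
        if version == needed_version:
            # A queue advertises the version: use the local installation.
            return {"queue": queue_name, "bring_library": False,
                    "lib_dir": directory}
    # No queue advertises it: the job brings the library with it and
    # sets its own path after unpacking, so any queue will do.
    return {"queue": None, "bring_library": True, "lib_dir": None}


queues = {
    "virtual-queue-1": ("x", "/usr/libC-x"),
    "virtual-queue-2": ("y", "/usr/libC-y"),
}

print(plan_libc("y", queues))
print(plan_libc("z", queues))
```

A request for version y is routed to the second queue with its local directory, while a request for version z returns a plan with `bring_library` set and no fixed queue.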

“The Big Picture”

Fully Integrated Compile Through Results
    User Builds Application – Dependencies Tracked (CMT)
    Simple User I/F With Portal (Grappa)
    Job Manager Learns Job Dependencies
    Available VBQs Discovered – “Best” Match Selected
    User Environment Deployed (PacMan)
    Abstract Job Parameters Mapped to Local Reality
    Job Interactions with Local and Remote Managers
    Error Handling
    Local Clean Up
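The step “Abstract Job Parameters Mapped to Local Reality” could be as simple as placeholder substitution. The sketch below uses Python's standard `string.Template`; the placeholder names (`INPUT_SANDBOX`, `SCRATCH`), the command line and the local paths are hypothetical, and a real job manager would presumably obtain the local values from the selected VBQ's advertisement.

```python
from string import Template


def map_to_local(abstract_command, local_values):
    """Replace $NAME placeholders with site-specific values.

    Raises KeyError if the site does not supply a required value,
    so an incomplete mapping fails before the job starts.
    """
    return Template(abstract_command).substitute(local_values)


abstract = "run --input $INPUT_SANDBOX/data --tmp $SCRATCH"
site_values = {"INPUT_SANDBOX": "/local/in", "SCRATCH": "/tmp/job"}

print(map_to_local(abstract, site_values))
# prints: run --input /local/in/data --tmp /tmp/job
```

Using `substitute` rather than `safe_substitute` is deliberate in this sketch: a missing value is reported as an error instead of leaving an unresolved placeholder in the command.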

Immediate Work for US ATLAS

Develop AFS Free Run-Time Environment

Deploy and Test at US ATLAS Test Bed Sites

Use Trivial Implementation of Queue Description

Use FNAL, UCSD and JHU for Proof of Portability

Start to Define/Enumerate Details
    What Info is Needed to “Fully” Describe a Queue?
    How to Take Advantage of Local Resources?