Virtual Batch Queues
A Service Oriented View of “The Fabric”
Rich Baker
Brookhaven National Laboratory
April 4, 2002
R. Baker, US ATLAS Grid Testbed Workshop, 4 April, 2002
Fabrics Session of LCG Launch Workshop
  Strict Uniformity is Impossible
    Multiple Implementations Will Exist Even Within a Single Site
    Different Economics Drive Different Choices at Different Sites
  Expose Services, Not Facilities
    Users Should Expect Uniform Interfaces to Services
    Define Boundaries
    Site Can’t be a Black Box
    Internal View May Vary From Site to Site
LHC/iVDGL Facilities Workshop
  Prototype Batch Queues to be Implemented
    BNL, FNAL, UCSD, JHU
    ATLAS, CMS, SDSS
  First (Trivial) Implementation – Fully Preconfigured
    Queue is Described only by Name – Advertise via MDS
    Requires User Pre-Knowledge of Queue Details
  Evolve Towards More Abstract Implementation
    Advertise Enough Information to Fully Describe Queue
    Requires No User Pre-Knowledge
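The difference between the two implementations can be sketched in a few lines of Python. This is only an illustration: the class names, field names, and the `USER_PRE_KNOWLEDGE` table are hypothetical and do not represent an MDS schema or any interface defined in the talk.

```python
from dataclasses import dataclass

# Hypothetical record types; the fields are illustrative, not an MDS schema.

@dataclass(frozen=True)
class TrivialQueueAd:
    """Trivial implementation: the queue is advertised by name only."""
    name: str

@dataclass(frozen=True)
class DescriptiveQueueAd:
    """More abstract implementation: the advertisement describes the queue."""
    name: str
    operating_system: str
    scratch_space_gb: int

# With the trivial form, the details live outside the advertisement,
# in whatever the user already knows about the queue (assumed values).
USER_PRE_KNOWLEDGE = {
    "site-a-queue": {"operating_system": "linux", "scratch_space_gb": 10},
}

def queue_details(ad):
    """Return the queue details a job manager would need."""
    if isinstance(ad, DescriptiveQueueAd):
        # Everything needed is carried by the advertisement itself.
        return {
            "operating_system": ad.operating_system,
            "scratch_space_gb": ad.scratch_space_gb,
        }
    # Name-only advertisement: raises KeyError if the user has no prior knowledge.
    return USER_PRE_KNOWLEDGE[ad.name]
```

With the name-only form, `queue_details` fails for any queue the user has not already been told about, which is the limitation the more abstract implementation is meant to remove.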
Job Manager’s View of Computing Element
  … Computing Elements, distributed in possible different administrative domains, can be very different and can rely on different mechanisms, policies, implementations: they can be different in hardware, they can run different operating systems, they can be managed by different local resource management systems, they can use different authentication and authorization mechanisms, etc… These issues will be addressed relying on standard protocols: “forcing” the Computing Elements to use standard protocols ...
  (From EDG JSS Architecture and APIs document, July 2001)
Various Views of a Compute Element
  Pre-Grid Paradigm:
    User Aware of All Local Resources
    Jobs Can (Must) Use Local Configuration/Resources
  Condor Standard Universe
    Just a CPU – Local Resources Irrelevant
    Jobs Can Not Use Local Resources – Inefficiency
  Virtual Batch Queue
    Advertise CPU Plus Local Resources
    Jobs Can Take Advantage – Improved Efficiency
Some Thoughts
  Local Administration of Hardware
    Remote Job Manager Can Not Reinstall OS
    Local Monitoring and Security Must Be Respected
  Must Advertise Enough Information for Job Manager to Determine Suitability
    Unchangeable Configuration (OS, etc.)
    Licensed Products
    Minimum Scratch Space Available
    Access Methods for Local Storage
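A suitability check over the four advertised items listed above might look like the following sketch. The record layout, the field names, and the matching rules (exact OS match, licence subset, at least one shared storage access method) are assumptions made for illustration; the talk does not define them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueAd:
    """Hypothetical advertisement published by a site for one queue."""
    name: str
    operating_system: str              # unchangeable configuration
    licensed_products: frozenset       # products licensed at the site
    min_scratch_gb: int                # minimum scratch space available
    storage_access_methods: frozenset  # access methods for local storage

@dataclass(frozen=True)
class JobRequirements:
    """Hypothetical requirements held by the remote job manager."""
    operating_system: str
    licensed_products: frozenset
    scratch_gb: int
    storage_access_methods: frozenset  # any one of these is acceptable

def is_suitable(ad, job):
    """Return True if the advertised queue can run the job as configured."""
    return (
        ad.operating_system == job.operating_system
        and job.licensed_products <= ad.licensed_products   # all licences present
        and job.scratch_gb <= ad.min_scratch_gb             # enough scratch space
        and bool(job.storage_access_methods & ad.storage_access_methods)
    )
```

Because the remote job manager cannot change the local configuration, the check only compares requirements against what is advertised; it never tries to alter the queue.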
Additional Considerations
  Typical Job Sets Dozens of Environment Variables
    All of these Must be Abstracted and Discoverable
    Some Can be Discovered At Job Initiation
    Input and Output “Sandboxes” Are Local Directories
    Setting “PATH” Requires Information
  What Defines a Single “VBQ”?
    Same Unchangeable Environment
    Same View of Local (Non-WAN) Storage
  APIs for Interactions Between Job and Remote Manager
  APIs for Interactions Between Job and Local Manager
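One way to read the definition of a single VBQ above is as a grouping rule: worker nodes that share the same unchangeable environment and the same view of local storage belong to the same queue. The sketch below assumes each node is described by a small dictionary with hypothetical keys `host`, `environment`, and `storage_view`; none of these names come from the talk.

```python
from collections import defaultdict

def group_into_vbqs(nodes):
    """Group nodes by (environment, storage view); each group is one VBQ.

    `nodes` is an iterable of dicts with the assumed keys
    'host', 'environment', and 'storage_view'.
    """
    groups = defaultdict(list)
    for node in nodes:
        key = (node["environment"], node["storage_view"])
        groups[key].append(node["host"])
    return dict(groups)

# Example with made-up hosts: two nodes share both properties, one differs.
example_nodes = [
    {"host": "node01", "environment": "os-a", "storage_view": "scratch-1"},
    {"host": "node02", "environment": "os-a", "storage_view": "scratch-1"},
    {"host": "node03", "environment": "os-b", "storage_view": "scratch-1"},
]
vbqs = group_into_vbqs(example_nodes)
```

Under this reading a single site can expose several VBQs, one per distinct combination, which matches the point that multiple implementations will exist even within a single site.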
For Example
  Site May Have Two Different libC Versions
  Virtual Queue 1 Advertises libC-x
    Set Path to Use /usr/libC-x directory
  Virtual Queue 2 Advertises libC-y
    Set Path to Use /usr/libC-y directory
  Job Manager “Knows” Which Version User Needs
    If x or y, Use Local Installation
    If libC-z, no problem! Bring it with you and set path
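The decision in this example can be written down as a short function. The queue names and the `/usr/libC-x` and `/usr/libC-y` directories are taken from the slide; the function name, the return format, and the sandbox-relative directory used when the job brings its own library are assumptions for illustration.

```python
def plan_libc(needed_version, advertised):
    """Pick a queue and a path entry for the libC version the user needs.

    `advertised` maps queue name -> (libC version, local install directory).
    """
    for queue, (version, directory) in advertised.items():
        if version == needed_version:
            # A queue advertises the needed version: use the local installation.
            return {"queue": queue, "bring_own": False, "path_entry": directory}
    # No queue advertises it: the job brings the library with it and points
    # the path at its own copy (hypothetical sandbox-relative location).
    any_queue = next(iter(advertised))
    return {
        "queue": any_queue,
        "bring_own": True,
        "path_entry": "./libC-" + needed_version,
    }

ADVERTISED = {
    "Virtual Queue 1": ("x", "/usr/libC-x"),
    "Virtual Queue 2": ("y", "/usr/libC-y"),
}
```

In the bring-your-own case the sketch simply takes the first advertised queue; a real job manager would presumably apply its other suitability criteria before choosing.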
“The Big Picture”
  Fully Integrated Compile Through Results
    User Builds Application – Dependencies Tracked (CMT)
    Simple User I/F With Portal (Grappa)
    Job Manager Learns Job Dependencies
    Available VBQs Discovered – “Best” Match Selected
    User Environment Deployed (PacMan)
    Abstract Job Parameters Mapped to Local Reality
    Job Interactions with Local and Remote Managers
    Error Handling
    Local Clean Up
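The step "Abstract Job Parameters Mapped to Local Reality" could, under simple assumptions, amount to a lookup from abstract names to the values a chosen queue advertises. The sketch below is hypothetical throughout: the function name, the abstract parameter names, and the failure behaviour are illustrative and do not describe CMT, Grappa, or PacMan.

```python
def map_to_local(abstract_env, local_values):
    """Resolve a job's abstract parameters against one queue's local values.

    `abstract_env` maps environment variable -> abstract parameter name.
    `local_values` maps abstract parameter name -> concrete local value.
    Raises LookupError listing every parameter the queue does not provide.
    """
    resolved = {}
    missing = []
    for variable, abstract_name in abstract_env.items():
        if abstract_name in local_values:
            resolved[variable] = local_values[abstract_name]
        else:
            missing.append(abstract_name)
    if missing:
        # Report all gaps at once so error handling can act on the full list.
        raise LookupError("not provided by queue: " + ", ".join(sorted(missing)))
    return resolved
```

Collecting every missing parameter before failing gives the error-handling step a complete picture instead of one failure per attempt.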
Immediate Work for US ATLAS
  Develop AFS Free Run-Time Environment
  Deploy and Test at US ATLAS Test Bed Sites
  Use Trivial Implementation of Queue Description
  Use FNAL, UCSD and JHU for Proof of Portability
  Start to Define/Enumerate Details
    What Info is Needed to “Fully” Describe a Queue?
    How to Take Advantage of Local Resources?