How Shibboleth can work with job schedulers to create grids to support everyone
Exposing Computational Resources Across Administrative Domains
H. David Lambert, Stephen Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata
Large amounts of computing power go untapped, yet researchers typically cannot find the computing power they need.
Resource owners must set policies for the use of their equipment.
Users must find and leverage resources that fit their needs.
The Paradox of Grid Computing
Secure grid-like installations are not growing beyond small groups of known players.
But... why? The only method currently available for ensuring the security of a resource involves personal interaction between resource owners and resource consumers.
Enabling a user or resource to access a resource requires manually adding that user to a local map file.
Various methods of grouping users and resources to share certificates have sprung up.
On the other hand
Grids that encourage resource owners to connect their machines to a central portal that only allows specific efforts to run have exploded:
SETI@home
United Devices' Grid.org
IBM's World Community Grid
What does this mean?
Historically, getting massive quantities of resources on the grid has been a challenge.
However, in situations where the potential resource owners are relieved of heavy administrative burdens, resource owners flock to the grid.
When massive numbers of resources are made available to researchers, real work gets accomplished.
How are jobs executed?
Modern job scheduling software includes:
Condor
Sun Grid Engine (N1)
PBS (Pro and Open)
Platform LSF
Job scheduling software is unsurpassed in environments where there is only one administrative domain:
Beowulf clusters
High-performance n-way devices
Unfortunately, as soon as you begin to cross any sort of administrative line, these products become less robust:
Intra-campus grids
Inter-campus grids
Attempts to leverage existing grid tools to handle this have resulted in compromises:
Groups of users sharing one certificate.
User management issues.
Accounting issues.
In general, job scheduling software accepts a job description file that describes the work to be done.
The job file is free-form text containing name-value pairs.
We can therefore add anything we want to these files, as long as we teach the execution machines to understand the additions.
Example Submission File (Condor)
# Example condor_submit input file
# (Lines beginning with # are comments)
Universe = vanilla
Executable = /home/arnie/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Arguments = -arg1 -arg2
InitialDir = /home/arnie/condor/run_1
Queue
Condor in the Beowulf, supercomputer, or campus Grid world.
Universe = vanilla
Executable = /home/arnie/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Arguments = -arg1 -arg2
InitialDir = /home/arnie/condor/run_1
Queue
User has an account on the cluster or HP device; all nodes are in a closely controlled administrative domain.
Condor Grid with Flocking
[Diagram: a Submit Machine's Schedd talks to the Collector and Negotiator daemons on its own Central Manager (CONDOR_HOST), and can also flock to the Central Managers of other pools, Pool-Foo and Pool-Bar, each with its own Collector and Negotiator.]
“Flocks” are introduced to each other by hostname or IP address.
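As a minimal sketch, this introduction is done in each pool's Condor configuration with the FLOCK_TO and FLOCK_FROM macros; the hostnames below are hypothetical:

```
# On the submitting pool: remote central managers this pool may flock to
FLOCK_TO = cm.pool-foo.example.edu, cm.pool-bar.example.edu

# On each remote central manager: pools permitted to flock in
FLOCK_FROM = cm.home-pool.example.edu
```

Note that this is exactly the manual, pairwise introduction the talk is criticizing: every flocking relationship must be configured by hand on both sides.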
Job Scheduling with Conventional “Grid” Products: Globus and Condor-G
The user submits a job via a Globus-enabled version of Condor.
Jobs are accepted by the Globus Gatekeeper and handed off to Globus JobManagers, which distribute them to any number of resources “on the grid.”
Each resource must physically map a Globus X.509 certificate to a local user account.
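This mapping lives in the Globus grid-mapfile, where each line pairs a certificate's distinguished name with a local account name. The DN and username below are made up for illustration:

```
"/O=Grid/OU=Example/CN=Jane Researcher" jresearch
```

Every resource owner must maintain one such line per authorized user, which is precisely the per-identity administrative burden described above.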
User and Resource Management Problems
Summary of Limitations from Previous Examples
How does the owner of a grid resource grant access to large numbers of individuals?
How does the owner of a grid resource know when a user granted access by membership in an organization leaves that organization?
How does a user easily get added to a resource?
How does a user find available resources?
SAML-based solutions give a resource secure access to attributes about a user, making them a powerful partner to existing batch job schedulers.
While Condor was already able to leverage user attributes from a local LDAP store, this project demonstrates for the first time that Condor can consume user attributes from a remote store.
[Diagram: a numbered sequence in which a User at Site 'A' authenticates through a Shib/Condor Portal, which locates the user's IdP via a WAYF and obtains attributes backed by LDAP and DB stores; the portal's Condor Schedd builds a Job ClassAd that is matched against the Resource ClassAd of a Condor Startd at Site 'B', where the Running Job executes.]
What we are doing now with Shibboleth, LDAP, and Condor
User at Site 'A' is aware of a Resource at Site 'B', and the Owner of Resource 'B' has granted access to Site 'A'.
We leverage the free-text job submission files to add attributes from SAML to our jobs.
Now, Resource owners can grant access to users based upon their attributes instead of their identities.
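As a sketch of how such attributes might ride along in the free-form submit file, Condor lets a submit file define custom job ClassAd attributes with a leading '+', which the resource's own policy expression can then test. The attribute names and values below are hypothetical, not the project's actual schema:

```
# Hypothetical SAML-derived attributes added to the job ClassAd
+ShibHomeOrg     = "georgetown.edu"
+ShibAffiliation = "faculty"
```

On the execute machine, a policy expression such as
START = (TARGET.ShibHomeOrg == "georgetown.edu" && TARGET.ShibAffiliation == "faculty")
admits jobs by attribute rather than by enumerating individual identities.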
Management of users is again the responsibility of the local administration, as it should be.
When Resource Owners can easily set policies without worrying about user management and group memberships, they will become willing to attach their resources to this new computational Grid.
Intelligent Resource Management
Users have their own policy decisions to make: processor type, operating system type, executable location, data location, memory requirements, etc.
In the perfect world, Users will have multiple Resources to choose from. These Resources will have different configurations that can match the User's policy requirements.
These varied Resources will also have an ever-changing availability!
An Intelligent Resource Management System will allow users to launch jobs from their portal and trust that the work will be sent to the Resource that not only correctly matches the User's job policy but also carries the least load.
This will be done without the User being aware of where the work will be executed.
This solution will be scheduler agnostic.
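Condor's existing matchmaking language already hints at how a user's policy and a least-load preference can be stated declaratively in the submit file; the values below are illustrative only:

```
# Match only machines satisfying the user's policy requirements
Requirements = (Arch == "X86_64" && OpSys == "LINUX" && Memory >= 2048)
# Among matches, prefer the machine with the lightest load
Rank = -LoadAvg
Queue
```

A scheduler-agnostic system would express the same policy once and translate it into the native idiom of whichever scheduler (Condor, SGE, PBS, LSF) owns the chosen resource.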
Example of Intelligent Agent
[Diagram: an Identity Provider and a User Job File feed a Job Submission Client, which submits to a Resource Discovery Network; network nodes at sites such as Company “A” and University “B” hand work to local Schedulers, each of which executes the Running Job.]
Acknowledgments
Georgetown University: Charlie Leonhardt, Steve Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata
University of Wisconsin: Miron Livny, Todd Tannenbaum, Ian Alderman
Internet2: Ken Klingenstein, Mike McGill