Globus: Where Are We Going For the Next 10 Years?
Jennifer M. Schopf
Argonne National Laboratory
UK National eScience Centre
Globus: Where Are We Going Next?
A Pragmatic Viewpoint
Jennifer M. Schopf
Argonne National Laboratory
UK National eScience Centre
Grids
You’ve all just spent the last 4 days learning about hands-on approaches to Grids
By now you should know a little about several different ways to use Grid technology
But we all know what we have isn’t enough, and if you talk to current users you’ll hear this as well
Globus Futures
Users
Virtual Environments
Higher level services
  Security: GridShib, Handle
  Scheduling: GridWay, WorkFlow with Chimera/Pegasus
  Data
Open everything
I’m Going To Talk About High Level Concepts
If you want to know what is planned for each individual component – check out their roadmaps!
bugzilla.globus.org – search by project name, and component “roadmap”
Users and Applications
Growing and growing
Not every application is well suited to Grids; those that are:
  Collaborate
  Share data or resources
Today’s applications may not be as willing to be cutting edge as the old standbys
User Communities
Traditionally, we started with the physicists
  Hard core users (heroic users)
  Large computational problems
  Already had strong national and international collaborations
This is growing and changing as it becomes better understood how the resources can be used to further science and research
The eScience Centres
CeSC (Cambridge)
The High Energy Physicists
EGEE, ATLAS, CMS, D0, Star, QCD Lattice Grid, GridPP
Bio-Medical Community
CancerGrid, eDiamond, myGrid, Integrative Biology, Mouse Atlas
Support Services
e-Science Institute
Grid Operations Support Centre
Digital Curation Centre
National eScience Centre
National Grid Service
New Application-Focused Centres
Arts and Humanities e-Science Support Centre
National Centre for e-Social Science
National Institute for Environmental e-Science
We Need the Users
Too many tools have been built without regard to what a user needs and wants
If you’re building tools
  Talk to users, early and often
If you’re a user
  Tell the toolmakers what you like and don’t like
  Be constructive
  Offer to alpha test
Virtualization
Vision of the Grid: Plug in and get the services you need
  Just like electricity
  Doesn’t matter what resource is supplying it, or where it is, just use the “juice”
Concrete example, a user might ask...
  “Run my job, finish by lunch”
  “Get a data set that has these attributes”
  “Tell me when that simulation will finish”
Where are we today?
“Run my job, finish by lunch” becomes
  Run my job on this exact machine
  With these data files transferred in using this protocol
  I think it will take 2 hours, the queues have been slow lately, so I should make sure I send this off by 9am, or earlier if I want to be safe in having results for 2
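The back-of-the-envelope planning a user does by hand today can be sketched in a few lines. This is only an illustration: the function name, the calendar date, and the one-hour queue wait are assumptions chosen so that the numbers line up with the 2 hour run time and 9am submission mentioned on the slide.

```python
from datetime import datetime, timedelta


def latest_submit_time(deadline, run_time, queue_wait, transfer_time=timedelta(0)):
    """Return the latest moment a job can be submitted and still finish by `deadline`.

    All three durations are the user's own estimates (hypothetical inputs).
    """
    return deadline - (run_time + queue_wait + transfer_time)


# Arbitrary illustrative date; "lunch" is taken to mean noon.
lunch = datetime(2006, 1, 1, 12, 0)
submit_by = latest_submit_time(
    lunch,
    run_time=timedelta(hours=2),    # "I think it will take 2 hours"
    queue_wait=timedelta(hours=1),  # assumed allowance for slow queues
)
print(submit_by.strftime("%H:%M"))  # 09:00
```

A virtualized service would take the deadline as the request and work out the rest itself.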
Where are we today?
“Get a data set that has these attributes” becomes
  Given a set of attributes, give me a set of logical file names
  Given those, map them to physical file names
  Given physical placements of the file, figure out which one is easiest to access
  Copy the file to my machine
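The steps above can be sketched as a small pipeline. Everything here is hypothetical: the two in-memory catalogs, the file names, the example.org URLs, and the numeric access cost stand in for whatever metadata and replica services a real deployment would query. The final copy step is left out; the sketch stops at deciding which physical file to fetch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    url: str
    cost: float  # hypothetical access cost; lower means easier to reach


# Hypothetical metadata catalog: logical file name -> attributes.
METADATA_CATALOG = {
    "run42.dat": {"experiment": "demo", "year": 2005},
    "run43.dat": {"experiment": "demo", "year": 2006},
}

# Hypothetical replica catalog: logical file name -> physical copies.
REPLICA_CATALOG = {
    "run42.dat": [
        Replica("gsiftp://site-a.example.org/data/run42.dat", 5.0),
        Replica("gsiftp://site-b.example.org/data/run42.dat", 1.5),
    ],
    "run43.dat": [Replica("gsiftp://site-a.example.org/data/run43.dat", 2.0)],
}


def find_logical_names(attributes):
    """Step 1: attributes -> logical file names."""
    return sorted(
        name
        for name, meta in METADATA_CATALOG.items()
        if all(meta.get(key) == value for key, value in attributes.items())
    )


def locate_replicas(logical_name):
    """Step 2: logical file name -> physical file names."""
    return REPLICA_CATALOG.get(logical_name, [])


def choose_replica(replicas):
    """Step 3: pick the copy that is easiest to access."""
    return min(replicas, key=lambda replica: replica.cost)


def plan_transfers(attributes):
    """Return {logical name: chosen physical URL} for every matching file."""
    plan = {}
    for name in find_logical_names(attributes):
        replicas = locate_replicas(name)
        if replicas:
            plan[name] = choose_replica(replicas).url
    return plan


print(plan_transfers({"year": 2005}))
```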
Where are we today (cont)
General agreement we have basic functionality
  Tell me what this set of resources looks like
  Run this job on that resource
  Transfer this file
  Globus (among others) does give these basic building blocks (mostly)
General agreement that general functionality isn’t enough by far
Virtualization is happening
  General concept of service-oriented grids is being accepted
  Service Level Agreements (SLAs) are coming into place; this is a first step
  Higher-level tools to separate user from basic resources are more common
We need to be careful to virtualize what a user is comfortable with
  Hide security hassles
  Tell users where their job is running in case of failures
Virtual Workspace Project
Kate Keahey, ANL
Virtual Workspace
  Abstraction of an execution environment
  Dynamically available to authorized clients
Abstraction captures
  Resource quota for execution environment on deployment (CPU or memory share)
  Software configuration aspects of the environment (OS installation or provided services)
Workspace Service allows a Grid client to dynamically deploy and manage workspaces
Workspaces
Workspaces can be implemented and deployed in many ways
  Boot images
  Virtual machines
  Simply dynamically provide access to already deployed workspaces by creating Unix accounts on the fly
Our infrastructure focuses on the deployment and management of virtual machines
  Describe an environment using the workspace meta-data
  Deploy it on a specified resource quota
  Environments that can be deployed in this way range from atomic workspaces to clusters and more complex constructs
Incubator Project: Virtual Workspaces
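To make the "describe, then deploy on a quota" idea concrete, here is a minimal sketch. It is not the Workspace Service interface: the class names, field names, and the `deploy` function are all hypothetical, and a real workspace description would carry far more detail than a CPU share, a memory figure, an OS image name, and a list of services.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceQuota:
    cpu_share: float  # fraction of one CPU (hypothetical unit)
    memory_mb: int


@dataclass
class Workspace:
    name: str
    os_image: str                                  # software configuration: OS installation
    quota: ResourceQuota                           # resource quota on deployment
    services: list = field(default_factory=list)   # software configuration: provided services


@dataclass
class Host:
    name: str
    free_cpu: float
    free_memory_mb: int


def deploy(workspace, host):
    """Reserve the workspace's quota on `host`, or refuse if it does not fit."""
    quota = workspace.quota
    if quota.cpu_share > host.free_cpu or quota.memory_mb > host.free_memory_mb:
        raise ValueError(f"{host.name} cannot satisfy the quota of {workspace.name}")
    host.free_cpu -= quota.cpu_share
    host.free_memory_mb -= quota.memory_mb
    return f"{workspace.name}@{host.name}"


node = Host("node1", free_cpu=1.0, free_memory_mb=2048)
analysis = Workspace("analysis", "linux-image", ResourceQuota(0.5, 1024), ["gridftp"])
print(deploy(analysis, node))  # analysis@node1
```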
Dynamic Accounts
Additional work has been done to provide basic services for creating dynamic accounts
Service to allow a Grid client to dynamically assign Unix accounts on a remote resource
Based on PKI credentials and the authorization information they carry
Incubator project: Dynamic Accounts
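A rough sketch of the account-leasing idea follows. It assumes the certificate has already been validated elsewhere and that the caller hands in the subject name as a plain string; the `AccountPool` class, the account names, and the subject names are hypothetical and do not reflect the Dynamic Accounts service itself.

```python
class AccountPool:
    """Lease local Unix account names to authorized Grid identities."""

    def __init__(self, accounts, authorized_subjects):
        self._free = list(accounts)
        self._leases = {}
        self._authorized = set(authorized_subjects)

    def assign(self, subject):
        """Return the account leased to `subject`, creating the lease if needed."""
        if subject not in self._authorized:
            raise PermissionError(f"{subject} is not authorized")
        if subject in self._leases:
            return self._leases[subject]  # same identity keeps the same account
        if not self._free:
            raise RuntimeError("no free accounts in the pool")
        account = self._free.pop(0)
        self._leases[subject] = account
        return account

    def release(self, subject):
        """Return the subject's account to the pool."""
        self._free.append(self._leases.pop(subject))


pool = AccountPool(
    accounts=["grid001", "grid002"],
    authorized_subjects=["/O=Example/CN=Alice", "/O=Example/CN=Bob"],
)
print(pool.assign("/O=Example/CN=Alice"))  # grid001
```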
Higher Level Services
Security: GridShib, Handle
Scheduling: GridWay, WorkFlow with Chimera/Pegasus
Data
GridShib
Von Welch, NCSA
Allows the use of Shibboleth-transported attributes for authorization in GT4 deployments
  And, more generally, SAML support
2 year project started December 1, 2004
  Beta software released September 16, 2005
Now an Incubator Project as well
Globus Toolkit Handle System Integration
Frank Siebenlist, ANL
The Handle System
  CNRI (http://www.handle.net)
  General-purpose global name service
  Secure name resolution over the internet
The Handle System-GT Integration Project
  Leverage the Handle System for identifier and resolution services
  Tight integration with GT4’s Web services protocols
Incubator project: gt-hs
GridWay MetaScheduler
Ignacio M. Llorente, Universidad Complutense Madrid
Automatically perform job scheduling steps
Provide the runtime mechanisms needed for dynamically adapting execution
Runs over pre-WS GRAM, WS-GRAM, MDS2, MDS4
Basic scheduling, flexible policies
Incubator Project
Chimera “Virtual Data”
Captures both logical and physical steps in a data analysis process
  Transformations (logical)
  Derivations (physical)
Builds a catalog
Results can be used to “replay” analysis
  Generation of DAG (via Pegasus)
  Execution on Grid
Catalog allows introspection of analysis process
[Figure: Galaxy cluster size distribution, Sloan Survey Data; Number of Clusters versus Number of Galaxies, logarithmic axes]
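The catalog idea on the Chimera slide can be illustrated with a toy model. This is a sketch under stated assumptions, not Chimera's data model or language: the `Transformation`, `Derivation`, and `Catalog` classes and the file names are hypothetical. A transformation names a logical step, a derivation records one concrete application of it to specific files, and the catalog answers "how was this file produced?" in an order that could be replayed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    name: str  # a logical step in the analysis


@dataclass(frozen=True)
class Derivation:
    transformation: Transformation
    inputs: tuple  # names of the files consumed
    output: str    # name of the file produced


class Catalog:
    def __init__(self):
        self._by_output = {}

    def record(self, derivation):
        self._by_output[derivation.output] = derivation

    def lineage(self, filename):
        """Return the derivations needed to regenerate `filename`, dependencies first."""
        ordered, seen = [], set()

        def visit(name):
            derivation = self._by_output.get(name)
            if derivation is None or name in seen:
                return  # raw input, or already handled
            seen.add(name)
            for needed in derivation.inputs:
                visit(needed)
            ordered.append(derivation)

        visit(filename)
        return ordered


catalog = Catalog()
catalog.record(Derivation(Transformation("cluster"), ("fields.dat",), "clusters.dat"))
catalog.record(Derivation(Transformation("extract"), ("raw.dat",), "fields.dat"))
print([d.transformation.name for d in catalog.lineage("clusters.dat")])
# ['extract', 'cluster']
```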
Pegasus Workflow Transformation
Converts Abstract Workflow (AW) into Concrete Workflow (CW)
  Uses Metadata to convert user request to logical data sources
  Obtains AW from Chimera
  Uses replication data to locate physical files
  Delivers CW to DAGMan
  Executes using Condor
  Publishes new replication and derivation data in RLS and Chimera (optional)
[Figure: Chimera Virtual Data Catalog, Replica Location Service, Metadata Catalog, Storage Systems, Compute Servers, DAGMan, Condor]
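The abstract-to-concrete conversion can be sketched as follows. This is an illustration only and not Pegasus code: the `concretize` function, the dictionary layout of the abstract workflow, and the example.org URL are hypothetical. The sketch orders jobs by their data dependencies and replaces logical names of pre-existing inputs with physical locations from a replica table. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def concretize(abstract_jobs, replicas):
    """Turn an abstract workflow into an ordered list of concrete jobs.

    abstract_jobs: {job name: {"inputs": [logical names], "output": logical name}}
    replicas:      {logical name: physical URL} for files that already exist
    """
    producers = {spec["output"]: job for job, spec in abstract_jobs.items()}
    # A job depends on whichever jobs produce its inputs.
    depends_on = {
        job: {producers[name] for name in spec["inputs"] if name in producers}
        for job, spec in abstract_jobs.items()
    }
    concrete = []
    for job in TopologicalSorter(depends_on).static_order():
        spec = abstract_jobs[job]
        inputs = []
        for name in spec["inputs"]:
            if name in producers:
                inputs.append(name)  # produced by an earlier job in this workflow
            elif name in replicas:
                inputs.append(replicas[name])  # existing file: use its physical location
            else:
                raise KeyError(f"no replica known for {name}")
        concrete.append((job, inputs, spec["output"]))
    return concrete


abstract = {
    "cluster": {"inputs": ["fields.dat"], "output": "clusters.dat"},
    "extract": {"inputs": ["raw.dat"], "output": "fields.dat"},
}
for step in concretize(abstract, {"raw.dat": "gsiftp://storage.example.org/raw.dat"}):
    print(step)
```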
Data Services Being Developed
Scheduling GridFTP
  Allowing GridFTP to control how many requests come to a server, and how they’re handled
Data Replica Management
  One tool to locate a replica and transfer it
  Tools to verify that replicas are the same
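One plausible way to check that replicas are the same is to compare content digests. The sketch below assumes the replicas are reachable as local file paths and uses SHA-256; both are choices made for illustration, and the slide does not say how the planned tools will do the comparison.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicas_match(paths):
    """True if every file in `paths` has identical content."""
    return len({file_digest(path) for path in paths}) <= 1
```

Comparing digests rather than bytes means each site can compute its own digest locally, so only the short hex strings need to travel.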
Globus Futures
Users
Virtual Environments
Higher level services
  Security
  Scheduling
  Data
Open everything
  Open Source
  Open Contribution
Why is Globus Software Open Source?
To allow for inspection
  For consideration in standardization processes
To encourage adoption
  In pursuit of ubiquity and interoperability
To encourage contributions
  Harness the expertise of the community
Open Contribution
But distributing code under an open source license does not guarantee open development!
Open development requires open processes
So we have created dev.globus to facilitate contributions
  http://dev.globus.org/
Governance Model
Based on Apache Jakarta
  Individual development efforts organized as projects
  Consensus-based decision making
Control over each project in the hands of its most active and respected contributors (committers)
Globus Management Committee (GMC) providing overall guidance and conflict resolution
Common Infrastructure
Code repositories (CVS, SVN)
Mailing lists
  *-dev, *-user, *-announce, *-commit for every project
Issue tracking (bugzilla)
  Including roadmap info for future development
Wikis
  Known interactions for people accessing your project
Incubator
Process in which an outside project becomes part of Globus
Overseen by the Incubator Management Project (IMP)
  Outside project proposes itself as a candidate
  IMP meets, discusses, and accepts the project as a ProtoProject
ProtoProject now part of the Incubator framework
  Gets assigned a Mentor to help
  Opportunity to get up to speed on the Globus development process
  Quarterly reviews
Current Incubator Projects
Incubator Management
Dynamic Accounts
GridShib
GridWay
gt-hs
Metrics
OGCE
Virtual Data System
Virtual Workspaces
Futures Summary
Users
Virtual Environments
Higher level services
Open everything
For More Information
Jennifer M. Schopf
[email protected]
www.mcs.anl.gov/~jms
Support from NeSC/JISC, NSF, DOE
This talk
www.mcs.anl.gov/~jms/Talks (not there yet)