First Steps in the Clouds
Kate Keahey
University of Chicago
Argonne National Laboratory
Virtual Workspaces: http://workspace.globus.org
Why Clouds?
Resource consumers
  - Individual users or Virtual Organizations
  - Requirements
      - Customized environments for their services/applications
      - Services/applications can be short-lived
      - New environments/services deployed quickly and often
Resource providers
  - Own and operate physical resources
  - Requirements
      - Ability to monitor and control their resources
      - Provide resources at reasonable operational cost
      - Protection from activities performed by resource consumers
Consumers need to be able to lease platforms, potentially for the short term, that they can customize and control
Cloud Computing for Grid Communities:
The STAR Application Use Case
The STAR Application
Complex experimental application codes
  - Developed over more than 10 years by more than 100 scientists; comprises ~2 M lines of C++ and Fortran code
  - www.star.bnl.gov
Require complex, customized environments
  - Rely heavily on the right combination of compiler versions and available libraries
  - Dynamically load external libraries depending on the task to be performed
Environment validation
  - To ensure reproducibility and result uniformity across environments
Why do we need a cloud?
  - Resources with the right configuration are hard to find
  - A VM-based cloud gives us the required control
Running STAR in a Cloud
First challenge: finding VM-enabled resources
  - Amazon Elastic Compute Cloud (EC2)
More challenges:
  - Can we use X.509 certs to submit to a cloud?
  - Can we use Grid access protocols?
  - How much manual configuration do we need to do for a cluster that we need for 4 hours?
  - How do we integrate the cluster into the Grid infrastructure?
Workspace Service
  - X.509 certificates are mapped to a project account
  - Grid access protocols
Creating a virtual cluster dynamically
  - Contextualization (cluster context): the cluster node VMs find out about each other and integrate that information at boot time
Integrating the cluster into the Grid
  - Contextualization (grid context): the cluster is configured with appropriate host certs, gridmapfiles, etc.
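The cluster-context step above can be illustrated with a toy sketch. It is not the Workspace Service's actual mechanism: the ContextBroker class, its methods, the host names, and the IP addresses are all hypothetical, and the "broker" is just an in-memory object. The idea shown is only that each node reports itself at boot and, once all expected nodes have reported, every node can build the same view of the cluster (here, hosts-file text).

```python
from dataclasses import dataclass, field


@dataclass
class ContextBroker:
    """Hypothetical in-memory stand-in for whatever the nodes report to at boot."""
    expected_nodes: int
    nodes: dict = field(default_factory=dict)  # hostname -> IP address

    def register(self, hostname: str, ip: str) -> None:
        # Called once by each VM when it boots.
        self.nodes[hostname] = ip

    def is_complete(self) -> bool:
        # The cluster context is usable only when every node has reported.
        return len(self.nodes) >= self.expected_nodes

    def hosts_file(self) -> str:
        # Render the shared view that each node would integrate locally.
        if not self.is_complete():
            raise RuntimeError("cluster context is not complete yet")
        lines = [f"{ip}\t{name}" for name, ip in sorted(self.nodes.items())]
        return "\n".join(lines) + "\n"


broker = ContextBroker(expected_nodes=3)
broker.register("headnode", "10.0.0.1")  # illustrative addresses only
broker.register("worker1", "10.0.0.2")
broker.register("worker2", "10.0.0.3")
print(broker.hosts_file())
```

A grid-context step would follow the same pattern, with host certificates and gridmapfile entries distributed instead of host names.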
[Figure: animation of running-job counts at each site (PDSF, Fermi, VWS/EC2, BNL, WSU), with job completion and file recovery indicators]
with thanks to Jerome Lauret and Doug Olson of the STAR project, presented at CHEP’07
[Figure: accelerated display of workflow job state at NERSC PDSF, EC2 (via Workspace Service), and WSU; Y = job number, X = job state]
with thanks to Jerome Lauret and Doug Olson of the STAR project, presented at CHEP’07
What Did We Learn?
Performance was not an issue
The real comparison is having a resource to run on vs not having a resource to run on
Contextualization is key for dynamic virtual cluster deployment
Next steps: a more challenging application
Cloud Computing for Grid Providers:
Building the Science Cloud at the University of Chicago
Challenges
Virtualization adoption has been relatively slow among Grid providers
Challenge: integrating VMs into current provisioning models
  - Integrate into a site without disrupting the current operation of resources
      - I.e., be able to run jobs as well as VMs
  - Non-invasive from the perspective of currently used tools
      - E.g., no modification to the currently used schedulers and resource managers
  - Can be used alongside the current mode of operation
      - Batch jobs
  - Represent as small a change as possible
      - Operate within familiar metaphors
      - Avoid error-generating complexity
Roll Your Own Cloud
The Workspace Pilot
  - Operates on resources that can support jobs as well as VMs
      - E.g., have been booted into Xen domain 0
  - Non-invasive extension to batch schedulers (e.g., PBS)
      - Wrappers for submission operation, scheduler signals to operate on VMs
  - Glidein approach: submits a “pilot program” that prepares a resource slot for VM deployment
      - E.g., adjusts Xen domain 0 memory
  - Comes with administrator tools
      - E.g., kill-all
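The memory adjustment mentioned above comes down to simple arithmetic: domain 0 gives up enough memory for the guest, but should keep enough to stay healthy. The sketch below only computes the target size; the function name, the megabyte units, and the 512 MB floor are assumptions for illustration, not values taken from the Workspace Pilot.

```python
def dom0_target_mb(dom0_current_mb: int, vm_request_mb: int,
                   dom0_floor_mb: int = 512) -> int:
    """Return the memory (MB) domain 0 should shrink to so a guest can fit.

    dom0_floor_mb is a hypothetical safety margin, not a documented default.
    """
    if vm_request_mb <= 0:
        raise ValueError("VM memory request must be positive")
    target = dom0_current_mb - vm_request_mb
    if target < dom0_floor_mb:
        raise ValueError(
            f"request of {vm_request_mb} MB would leave domain 0 with "
            f"{target} MB, below the {dom0_floor_mb} MB floor"
        )
    return target


# A node whose domain 0 holds 4096 MB, asked to host a 3072 MB guest.
print(dom0_target_mb(4096, 3072))
```

When the pilot exits, the inverse step would restore domain 0 to its original size.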
Workspace Pilot in Action
[Figure: diagram components: Workspace Service, LRM/PBS, three nodes running Xen dom0, and the VMs deployed on them]
  - Level 1: provision raw resources
  - Level 2: provision VMs
  - VMs are decommissioned
  - Raw resources are decommissioned
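The two-level provisioning shown in the figure can be read as a linear lifecycle for each resource slot. The sketch below encodes that ordering as a small state table; the SlotState names and the lifecycle function are hypothetical and only mirror the four labels in the figure, plus an assumed idle starting state.

```python
from enum import Enum


class SlotState(Enum):
    IDLE = "idle"  # assumed starting state, not shown in the figure
    RAW_PROVISIONED = "level 1: raw resources provisioned"
    VM_PROVISIONED = "level 2: VM provisioned"
    VM_DECOMMISSIONED = "VM decommissioned"
    RAW_DECOMMISSIONED = "raw resources decommissioned"


# Each state has exactly one successor; the final state has none.
NEXT_STATE = {
    SlotState.IDLE: SlotState.RAW_PROVISIONED,
    SlotState.RAW_PROVISIONED: SlotState.VM_PROVISIONED,
    SlotState.VM_PROVISIONED: SlotState.VM_DECOMMISSIONED,
    SlotState.VM_DECOMMISSIONED: SlotState.RAW_DECOMMISSIONED,
}


def lifecycle(start: SlotState = SlotState.IDLE) -> list:
    """Return the ordered states visited from start to the final state."""
    path = [start]
    while path[-1] in NEXT_STATE:
        path.append(NEXT_STATE[path[-1]])
    return path


for state in lifecycle():
    print(state.value)
```

The point of the ordering is that VMs are always torn down before the underlying slot is handed back to the batch scheduler.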
The Pilot Program
Uses the Xen balloon driver to reduce/restore domain 0 memory so that guest domains (VMs) can be deployed
Secure VM deployment
  - The pilot requires sudo privilege and thus can be used only with the site administrator’s approval
  - The workspace service provides fine-grained authorization for all requests
Signal handling
  - SIGTERM: pilot exceeded its allotted time
      - Notifies VWS, allows it to clean up
      - After a configurable time period, takes matters into its own hands
Default policy: one VM per physical node
Available for download
  - Workspace Release 1.3.1: http://workspace.globus.org/downloads/index.html
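The signal-handling policy above (notify first, then force cleanup after a grace period) can be sketched with Python's standard signal and threading modules. This is an illustration of the pattern only, not the pilot's code: the PilotSignals class, its callbacks, and the grace period value are hypothetical.

```python
import signal
import threading
import time


class PilotSignals:
    """Hypothetical sketch: notify on SIGTERM, force cleanup after a grace period."""

    def __init__(self, notify_service, force_cleanup, grace_seconds):
        self.notify_service = notify_service  # e.g., tell the service time is up
        self.force_cleanup = force_cleanup    # e.g., tear down the VM locally
        self.grace_seconds = grace_seconds    # the configurable time period
        self.service_cleaned_up = threading.Event()

    def install(self):
        # Python only allows signal handlers to be set from the main thread.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum, frame):
        # Step 1: give the service a chance to clean up on its own.
        self.notify_service()
        # Step 2: start the countdown after which the pilot acts by itself.
        timer = threading.Timer(self.grace_seconds, self._grace_expired)
        timer.daemon = True
        timer.start()

    def _grace_expired(self):
        if not self.service_cleaned_up.is_set():
            self.force_cleanup()


if __name__ == "__main__":
    events = []
    pilot = PilotSignals(lambda: events.append("notified"),
                         lambda: events.append("forced cleanup"),
                         grace_seconds=0.1)
    pilot.on_sigterm(signal.SIGTERM, None)  # simulate delivery of SIGTERM
    time.sleep(0.3)
    print(events)
```

In a real deployment install() would register the handler and the scheduler would deliver the signal; the demo calls the handler directly so it behaves the same on any platform.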
Nimbus @ UC
What is it?
  - The Science Cloud at the University of Chicago
  - UC TeraPort cluster configured with the workspace pilot
  - Currently 16 nodes
What can it do for me?
  - Allow you to “lease out” a cluster of VMs
Who can use it?
  - Members of the scientific community
  - Inasmuch as usage policies will allow
What do I need to do if I want to use it?
  - Contact us: [email protected]
  - You will need a VM image (we can help and know others who can), a certificate, and a simple client
Cloud Interoperability
Moving an app from a hardware platform to a cloud is relatively hard
  - Need to develop a VM image, learn about cloud computing, figure out logistics
Moving between clouds is very easy
  - E.g., STAR app EC2 -> Science Cloud and vice versa
Rough consensus on the interfaces needed to provision resources in the cloud
OGF gridvirt-wg
  - Chairs: Erol Bozak, Wolfgang Reichert
  - Define the requirements for integration of Grid architecture with system virtualization platforms
  - Exploring the impact of virtualization on Grid use cases
  - Exploring the relationship with standards (DMTF, etc.)