Grid Computing and Middleware
Shawn Malhotra
Monday, February 5th, 2007
Overview
Background and definition
Importance of middleware
Globus Toolkit
Sample applications
What is Grid Computing?
Computing model that leverages the power of many networked resources
  Not just CPUs: storage devices and special equipment (e.g. telescopes)
Shares resources across administrative domains
  Requires security features
  Different from traditional cluster computing
Programmer sees a single ‘virtual computer’
  Web ↔ Information as Grid ↔ Computing Power
Why is Grid Computing Important?
Helps solve computationally expensive problems
  Flexible enough to handle many small problems
Share costly resources amongst institutions
  Federally funded research labs / academic institutions
Make resources available to anybody
  Cost barrier is lowered
  ‘Pay as you go’ type service
  Increases overall bandwidth
Motivation for Middleware
Need robust, efficient ways to pool resources
Previous ‘ad-hoc’ methods not sufficient
Need for standardization!
Distributed Computing System (DCS)
Developed at the University of California at Irvine
  Early 1970s
Focus on CPU management
Poor security solution
Abandoned in the 1980s
Globus Toolkit
Broader scope, more complete solution
  CPU management
  Storage management
  Monitoring services
  More details to come …
Most popular grid computing framework
Implements several standards
  OGSA, WSRF, SOAP, WSDL
Globus Toolkit - Overview
Facilitates grid application development
  Open, extensible, flexible, high abstraction
Job Submission
GRAM interface
  Grid Resource Allocation and Management
  Specify resource requirements and flow
Uniform way to submit remote jobs
  Translates requests for local resources
Offers a variety of features
  Retrieve job status
  Send job signals (kill, start, restart)
Uses Web services interface
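The submission flow on this slide can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the Globus GRAM API: `JobRequest`, `ToyJobManager`, the state names, and the `local-run` command line are all hypothetical and exist only to show a generic request being translated for a local system, queried for status, and signalled.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    KILLED = "killed"


@dataclass
class JobRequest:
    """Resource requirements stated independently of any local system."""
    executable: str
    arguments: list = field(default_factory=list)
    cpu_count: int = 1
    max_memory_mb: int = 512


class ToyJobManager:
    """Hypothetical stand-in for a uniform remote job interface."""

    def __init__(self):
        self._ids = count(1)
        self._jobs = {}  # job id -> [request, state]

    def submit(self, request):
        """Accept a request and return a handle the caller can query later."""
        job_id = next(self._ids)
        self._jobs[job_id] = [request, JobState.PENDING]
        return job_id

    def translate(self, job_id):
        """Turn the generic request into a made-up local command line."""
        request, _ = self._jobs[job_id]
        return ["local-run", f"--cpus={request.cpu_count}",
                f"--mem-mb={request.max_memory_mb}",
                request.executable, *request.arguments]

    def status(self, job_id):
        return self._jobs[job_id][1]

    def signal(self, job_id, name):
        """Apply a job signal; unknown signal names raise ValueError."""
        targets = {"start": JobState.ACTIVE,
                   "kill": JobState.KILLED,
                   "restart": JobState.PENDING}
        if name not in targets:
            raise ValueError(f"unknown signal: {name}")
        self._jobs[job_id][1] = targets[name]
        return self._jobs[job_id][1]
```

A caller would only ever hold the job id returned by `submit`; the translation step is where a real system would hide the differences between local resource managers.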
Job Scheduling
What happens after the job is submitted?
Submitted to a scheduler
  Queues jobs and decides where/when to run
  Requirement matching, priority systems, etc.
Abstracts resources from user
  Pool heterogeneous resources together
Can have multiple layers of scheduling
  Local schedulers vs. metaschedulers
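To make the two layers concrete, here is a minimal sketch assuming one simple policy: the metascheduler sends each job to the matching site with the shortest queue, and each local scheduler runs its highest-priority job first. The class names, attributes, and policy are illustrative assumptions, not taken from any particular scheduler.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int
    memory_gb: int
    priority: int = 0  # larger number runs earlier


@dataclass
class LocalScheduler:
    """One site's queue; hides that site's hardware from the user."""
    site: str
    cpus: int
    memory_gb: int
    _queue: list = field(default_factory=list)
    _seq: int = 0

    def can_run(self, job):
        """Requirement matching: does this site have enough capacity?"""
        return job.cpus <= self.cpus and job.memory_gb <= self.memory_gb

    def enqueue(self, job):
        # heapq is a min-heap, so negate priority; _seq keeps FIFO order on ties.
        heapq.heappush(self._queue, (-job.priority, self._seq, job.name))
        self._seq += 1

    def next_job(self):
        """Name of the next job to run, or None if the queue is empty."""
        return heapq.heappop(self._queue)[2] if self._queue else None

    def queue_length(self):
        return len(self._queue)


class MetaScheduler:
    """Upper layer: picks a site, then leaves ordering to that site."""

    def __init__(self, sites):
        self.sites = sites

    def submit(self, job):
        candidates = [s for s in self.sites if s.can_run(job)]
        if not candidates:
            return None  # no site satisfies the requirements
        target = min(candidates, key=lambda s: s.queue_length())
        target.enqueue(job)
        return target.site
```

The user only talks to `MetaScheduler.submit`, which is the sense in which heterogeneous resources are abstracted away.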
Security
Access to resources must be controlled
Grid Security Infrastructure (GSI)
  Provides basic security constructs
  Certificate-based PKI system
  Supports single sign-on over the grid
  Supports delegation
Access control left to individual services
  Infrastructure provides necessary info and control
Uses Web services interface
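Delegation and single sign-on are easier to picture as a chain of short-lived credentials leading back to the user. The sketch below models only the structure of such a chain; it deliberately contains no cryptography, so nothing here is signed or verified the way a real PKI would do it. `Credential`, `delegate`, `chain_is_valid`, `end_entity`, and the integer timestamps are hypothetical names and simplifications, not part of GSI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str
    issuer: str
    not_after: int  # expiry as a plain integer timestamp
    parent: Optional["Credential"] = None


def delegate(parent, suffix, lifetime, now):
    """Issue a short-lived proxy whose issuer is the parent credential."""
    expiry = min(now + lifetime, parent.not_after)  # never outlive the parent
    return Credential(f"{parent.subject}/{suffix}", parent.subject, expiry, parent)


def chain_is_valid(cred, trusted_issuer, now):
    """Walk from the proxy back to the root, checking each link."""
    while cred.parent is not None:
        if cred.issuer != cred.parent.subject or cred.not_after < now:
            return False
        cred = cred.parent
    # The root must be unexpired and issued by an authority we trust.
    return cred.issuer == trusted_issuer and cred.not_after >= now


def end_entity(cred):
    """The identity a service would use for its own access-control decision."""
    while cred.parent is not None:
        cred = cred.parent
    return cred.subject
```

In this model the user signs on once by creating a proxy, and any service holding a delegated proxy can act for the user until the proxy expires. Each service still decides for itself what `end_entity(cred)` is allowed to do, matching the slide's point that access control is left to individual services.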
Other Provided Modules
Data management
  Facilitates file transfer, access to data stores
Monitoring and discovery
  APIs to get status, subscribe to content
  Important since ‘grid’ is never down, only components
Collaboration tools
  Facilitates person-to-person collaboration
  Build web portals for chat, e-mail, etc.
Example Applications
What can you build with such a toolkit?
Applications range from the depths of the sea to the stars above!
  LOOKING: deep sea research
  Condor: batch computing infrastructure
  BIRN: medical resource pooling
  LEAD: meteorological data
  NVO: virtual observatory
http://www.cs.wisc.edu/condor
Workload management system
  Queuing, scheduling, prioritization, monitoring
Pool desktops into batch system
  Use when idle, auto-detect when busy again
ClassAd mechanism
  Novel way to match resources with requests
Flocking
  Seamless combination of multiple networks
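The idea behind ClassAd-style matchmaking is that both sides advertise: a job states what it needs from a machine, and a machine states which jobs it is willing to run. The sketch below imitates that two-way check with plain Python dictionaries and lambdas. It is not Condor's ClassAd language, and the machine names, attributes, and ranking rule are made up for illustration.

```python
def matches(job_ad, machine_ad):
    """A match needs both sides' requirements to accept the other ad."""
    return (job_ad["requirements"](machine_ad)
            and machine_ad["requirements"](job_ad))


def best_match(job_ad, machine_ads):
    """Among acceptable machines, pick the one the job ranks highest."""
    acceptable = [m for m in machine_ads if matches(job_ad, m)]
    if not acceptable:
        return None
    return max(acceptable, key=job_ad["rank"])


# Hypothetical desktop machines, each with its own policy about jobs.
machines = [
    {"name": "desk-1", "memory_mb": 2048, "idle": True,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "desk-2", "memory_mb": 8192, "idle": False,
     "requirements": lambda job: True},
    {"name": "desk-3", "memory_mb": 4096, "idle": True,
     "requirements": lambda job: job["image_mb"] <= 1024},
]

# Hypothetical job: wants an idle machine with enough memory, prefers more.
job = {
    "owner": "alice", "image_mb": 512,
    "requirements": lambda m: m["idle"] and m["memory_mb"] >= 1024,
    "rank": lambda m: m["memory_mb"],
}

chosen = best_match(job, machines)
print(chosen["name"])  # prints "desk-3"
```

`desk-2` has the most memory but is busy, so it is skipped. This is the "use when idle" behaviour expressed as one more attribute in the machine's advertisement.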
http://lookingtosea.ucsd.edu/
Make tools / data related to oceanography available to all researchers
‘20,000 Terabits Beneath the Sea’
  Presented at iGrid2005
  Real-time high definition deep sea video
  Monitor active underwater volcanoes
http://www.nbirn.net/
Resource pooling
  Tools for research and diagnoses
Collaboration
  Common user interface
Better hypothesis testing
  Use a distributed patient population
https://portal.leadproject.org/gridsphere/gridsphere
Sharing meteorological resources
Algorithm Development and Mining (ADaM)
  Works on observational data
  Provides analysis tools
ARPS Data Assimilation System (ADAS)
  Provides visualization tools
Earth Science Markup Language (ESML)
  Uniform way of expressing data
Data Access Systems
  Allow uniform access to distributed data
http://www.us-vo.org/
Expose the vast amount of astronomical data for all to use
  Telescopes will produce 7 petabytes per year by 2012
Standardized way of expressing data
  VOTable
Creation of tools to produce required data
  ConeSearch
Make accessing data like using real tools
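A cone search asks for every catalogue object within a given angular radius of a sky position. The sketch below computes that selection locally so the geometry is visible. It does not call any NVO service, and the three catalogue rows are made-up positions, not real objects. The separation uses the standard haversine formula.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, all in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, hav))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the rows that lie inside the cone around (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


catalog = [  # illustrative positions only
    {"id": "obj-a", "ra": 10.0, "dec": 20.0},
    {"id": "obj-b", "ra": 10.3, "dec": 20.1},
    {"id": "obj-c", "ra": 190.0, "dec": -20.0},
]

hits = cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.5)
print([row["id"] for row in hits])  # prints ['obj-a', 'obj-b']
```

A shared query of this kind, together with a common result format such as VOTable, is what lets separate archives be searched in the same way.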
http://grid.globalwatchonline.com/epicentric_portal/site/GRID/
The WISDOM Project
Analyze potential anti-malaria drugs
  Focus lab tests on promising compounds
Uses up to 5000 computers in 27 countries
  Simulate drug interaction with malaria protein
  Test 80,000 drugs per hour, 140 million in total
Shows the power of collaboration
  Many computers borrowed from GridPP, a particle physics grid in the UK
  Shared spare capacity
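The figures on this slide imply a rough running time. The calculation below assumes that the 80,000 per hour rate was sustained for the whole run and was reached with all 5000 computers working. Neither assumption is stated on the slide, so the results are back-of-envelope estimates only.

```python
TOTAL_COMPOUNDS = 140_000_000  # from the slide
RATE_PER_HOUR = 80_000         # from the slide, assumed sustained
MAX_COMPUTERS = 5_000          # from the slide, assumed all in use at that rate

hours = TOTAL_COMPOUNDS / RATE_PER_HOUR
days = hours / 24
per_computer_per_hour = RATE_PER_HOUR / MAX_COMPUTERS
single_machine_years = TOTAL_COMPOUNDS / per_computer_per_hour / (24 * 365)

print(f"{hours:.0f} hours, about {days:.0f} days on the grid")      # 1750 hours, about 73 days
print(f"{per_computer_per_hour:.0f} compounds per computer per hour")  # 16
print(f"about {single_machine_years:.0f} years on one such computer")  # about 999
```

Under these assumptions, a job of roughly ten weeks on the grid would take on the order of a thousand years on a single comparable machine.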
Grid Computing – The Future
Currently the domain of ‘Big Science’
  Make it more mainstream for ‘Little Science’
Technology is not the barrier
Evolution of the standards
Continued enhancement of the toolkit
Better front-end design
Promote peer-to-peer collaboration
Security is still a challenge
Summary
Grid computing is a powerful collaborative computing model
Grid computing requires efficient, fully featured middleware to thrive
Grid computing enables research and development that is not possible in isolation
References
Globus site http://www.globus.org/
Wikipedia http://en.wikipedia.org/wiki/Grid_computing
Grid Café http://gridcafe.web.cern.ch/gridcafe/
The Need for Grid Solutions
Grids are essential to sustain Moore’s Law as physical limitations will eventually limit what individual computing stations can achieve
Grids will become less necessary as individual resources become more powerful, since technology grows faster than the complexity of our research
The Corporate Barrier
True grid computing will never be embraced by corporations due to security issues and sensitivity of data. This will limit the scope and power of the technology
Much like Web 2.0 has caused a shift in corporate presence on the internet, a ‘Grid 2.0’ will eventually force corporations to embrace this technology
Grid Middleware
Middleware designed to manage a grid will eventually merge with software designed to handle multiple CPUs on one motherboard to form a common solution.
Grid computing is far too different from multi-CPU processing to ever offer a common solution.
Expanding User Base
Development of a good middleware solution that abstracts most details of the grid will bring grid computing to ‘Little Science’ and eventually individual users.
The complexity of grid computing and lack of demand will prevent grid computing from ever becoming part of the mainstream.