Distributed Grid Computing at ISIS using the Grid MP System
Tom Griffin, ISIS Facility & University of Manchester / UMIST
What do I mean by ‘Distributed Grid’?
• A way of speeding up large, compute-intensive tasks
• Break large jobs into smaller chunks
• Send these chunks out to (distributed) machines
• Distributed machines do the work
• Collate and merge the results
Spare Cycles Concept
• Typical PC CPU usage is about 10%
• Most PCs are not used at all after 5pm
• Even with ‘heavily used’ PCs (Outlook, Word, IE), the CPU is still grossly underutilised
• Everyone wants a fast PC!
• Can we use (“steal?”) their unused CPU cycles?
• Examples: SETI@home, World Community Grid (www.worldcommunitygrid.org)
Possible Software Implementations
• Toolkit, e.g. COSM
  – Low-level toolkit: source-code-level integration
  – So time-consuming work for each application
• Entropia DC Grid
  – Trial run at ISIS two years ago; some success
  – Company bought out and in limbo (?)
• United Devices Grid MP
  – What we’re currently using
  – Quite expensive
• Condor
  – Free (academic research project)
  – In our experience two years ago, not reliable with Windows
The United Devices System
• Server hardware
  – We use two dual-Xeon servers + 280 client licenses
  – Could (will) easily cope with more clients
• Software
  – Servers run Red Hat Linux Advanced Server / DB2
  – Clients available for Windows, Linux, SPARCs and Macs
• Programming
  – MGSI: Web Services interface (XML, SOAP)
  – Accessed with C++ and Java classes etc.
• Management Console
  – Web-browser based
  – Can manage services, jobs, devices etc.
Visual Introduction to the Grid
Installing and Deploying the System
• Servers
  – Complete set-up in under 3 hours
  – Virtually self-maintaining
• Clients
  – Windows only so far
  – MSI installer (approx. 20 seconds)
  – SMS
  – MP Agent User
  – Installation on other OSs looks straightforward
Suitable / Unsuitable Applications
• CPU intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?
Objects within the Grid
• Program
• Job
• Jobstep
• Data Set
• Data
• Workunit
• Client
How to Write Grid Programs
1) Think about how to split your data and merge results
2) Wrap and upload your executable
3) Write the application service
  – Pre- and post-processing
  – Fairly easy to write
  – Interfaces to the grid via Web Services
  – So far used: C++, Java, Perl, C# (any .NET language)
4) Use the Grid
Wrapping Your Executable
• Executable + any DLLs etc.
• Standard data files
• Compression
• Encryption
• Capture screen output
• Set environment variables
• Command line
Application Service
• Pre-processing
  1) Partition data
  2) Package data partitions
  3) Log in to the Grid server
  4) Create a Job and Job Step
  5) Create a Data Set
  6) Create Datas and upload data packages
  7) Create Workunits
  8) Set the Job running
• Post-processing
  1) Retrieve results
  2) Merge results
Example Application: HMC
• Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data
• Parametric problem
  – e.g. vary parameters such as the acceptance ratio to scan a 3D grid
  – Each run completely independent of any other
  – Send one run to each machine on the grid
Running HMC on the Grid
• Unchanged exe
• User edits or creates an appropriate settings file
• User runs “my” HMC submit program
  – Splits the bat file into one line per machine
  – Uploads chunks to the Grid server
• Grid server distributes Workunits to clients
• User monitors the job with their web browser
• Clients return results to the Grid server
• User runs HMC retrieve program
  – Downloads results
More on HMC Submit…
• Split the batch file into lines
• Create a dataset (to hold our data)
• Package data (command line and zmatrix files etc.)
• Associate data with dataset
• Upload data packages to Grid server
• Create Workunits from the dataset
• Create a Job to hold the Workunits
Yet more…
• Program written in C++
• Uses C++ classes to ‘hide’ SOAP calls

```cpp
// Caller: create the Data Set on the Grid server and keep the ID it returns
dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

// Client stub wrapping the SOAP "createDataSet" call
ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
{
    SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
    request.AddParameter("authkey") << authkey;    // session authentication key
    request.AddParameter("data_set") << data_set;  // the Data Set to create
    const SOAPResponse &response =
        call(request, const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));

    ud::uuid retval;
    response.GetReturnValue() >> retval;           // unpack the returned ID
    return retval;
}
```

• Auto generated by ‘Axis C++’ from WSDL file
• Also a C++ HTTPS file transfer program
Performance
• Linear: 50 devices ≈ 50 times faster
• Affected by size of Workunit
  – Overhead for distribution is ≈ 1 minute
  – Risk of device being switched off
Example 2: MD Manager
• Molecular Dynamics simulation(s)
• Program written in C#
  – C# classes generated from WSDL (and then modified) to hide SOAP
  – Wrote generic C# HTTP file transfer classes
  – ‘Interactive’ program
• Typical runtime ~10 hours per simulation
• Need to investigate ‘grids’ of simulations
[Figure: 3 × 3 grid of simulations labelled A to I, with Temperature and Pressure as the axes]
• But in three dimensions
• and with ‘ordering restrictions’
• plus a post-processing stage
Who Else Does This?
• Johnson & Johnson
• Novartis
• GSK
• National Physical Laboratory
• Accelrys
• IBM
• World Community Grid
  – http://www.worldcommunitygrid.org/
  – Currently the Human Proteome Folding project
Problems Encountered & Support
• Technical problems
  – Mercifully few!
  – Main issue has been RAM thresholding (now resolved)
  – Encryption of certain files causes a problem
• Support
  – So far been very good
  – Responses to queries always next day (time difference) and always insightful
• Ease of setup / maintenance
  – Installed and fully running in ~3 hours
  – Next to no maintenance required, other than backup
‘Social’ Issues
• Easiest thing to blame
• Too abstract for some users (no big box)
• Stealing my cycles
• Expansion leads to political problems
Future Developments - Expansion
• Expansion
  – Proposal accepted for an additional 400 licenses, giving us a total of 480
• Change in licensing model
[Figure: expansion roadmap with the steps Upgrade to 280 licences, Upgrade to unlimited licences for 1 year, MP Insight, Unlimited licences forever and 480 permanent licences, each marked Completed, Funded or Seeking funding; costs shown are $50k, $45k, $50k and $83k]
• Bottom line: costs
  – Setup, server licenses, 80 client licenses + support: $18k – CMSD
  – Total ≈ $250k
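As a quick consistency check on the figures above, assuming the 480 total is the initial 80 client licences plus the additional 400, and that the ≈ $250k total is the $18k initial cost plus the four roadmap amounts:

```latex
\begin{aligned}
80 + 400 &= 480 \ \text{client licences} \\
\$18\text{k} + \$50\text{k} + \$45\text{k} + \$50\text{k} + \$83\text{k} &= \$246\text{k} \approx \$250\text{k}
\end{aligned}
```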
Summary
• Grid is here and running smoothly
• Easy to use
• Excellent performance
• Vast amount of compute power available
• Future looks good