The Inferno Grid (and the Reading Campus Grid)
Jon Blower
Reading e-Science Centre
Many others,
School of Systems Engineering, IT Services
http://www.resc.rdg.ac.uk
Introduction
Reading are in early stages of Campus Grid construction
Currently consists of two flocked Condor pools
– More of which later
Also experimenting with the Inferno Grid
– Condor-like system for pooling ordinary desktops
– Although (like Condor) it could be used for more than this
The Inferno Grid is commercial software but free to UK e-Science community
Secure, low maintenance, firewall-friendly
Perhaps not (yet) as feature-rich as Condor
The Inferno operating system
The Inferno Grid is based upon the Inferno OS
Inferno OS is built from the ground up for distributed computing
– Mature technology, good pedigree (Bell Labs, Pike & Ritchie)
Extremely lightweight (~1 MB RAM), so can run as an emulated application identically on multiple platforms (Linux, Windows, etc.)
Hence it is a powerful base for Grid middleware
Everything in Inferno is represented as a file or set of files
– cf. /dev/mouse in Unix
So to create a distributed system, just have to know how to share “files”
– Uses a protocol called Styx for this
Inferno OS is released under Liberal Licence (free and open source) for non-commercial use
Can run applications in the host OS (Linux, Windows, etc.)
Secure: certificate-based authentication, plus strong encryption built in at OS level
The Inferno Grid
Built as an application in the Inferno OS
– Hence uses OS’s built-in security and ease of distribution
– Can run under all platforms that Inferno OS runs on
Essentially high-throughput computing, cf. Condor
Created by Vita Nuova (http://www.vitanuova.com)
Free academic licence, but also used “for real”:
– Evotec OAI (speeds up drug discovery)
• 90% utilisation of machines
– “Major government department” modelling disease spread in mammals
– Other major company (can’t say more!)
University installations at Reading and York (AHM2004 – created Inferno Grid from scratch easily)
Host OS (Windows, Linux, MacOSX, Solaris, FreeBSD)
Inferno OS (Virtual OS)
Inferno Grid software (a Limbo program)
(Can also run Inferno native on bare hardware)
Could write all applications in Limbo (Inferno’s own language) and run on all platforms, guaranteed!
Inferno Grid system overview
Matches jobs submitted to abilities of “worker nodes”
– The whole show is run by a scheduler machine
Jobs are ordinary Windows/Linux/Mac executables
Process is different from that of Condor
– Unless Condor has changed/is changing…
In Condor, workers run daemon processes that wait for jobs to be sent to them
– i.e. “scheduler-push”
– Requires incoming ports to be open on each worker node
In the Inferno Grid, workers “dial into” the scheduler and ask “have you got any work for me?”
– i.e. “worker-pull” or “labour exchange”
– No incoming ports need to be open
– Doesn’t poll – uses persistent connections
– Studies have shown this to be more efficient (not sure which ones… ;-)
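The “worker-pull” idea above can be illustrated with a small in-process sketch. This is not Inferno Grid code: the class names, the `request_work` method and the use of a platform string as the only “ability” are all assumptions made for illustration, and the sketch does not model the persistent connections the real system is said to use.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    platform: str  # platform the executable was built for, e.g. "linux"


@dataclass
class Scheduler:
    """Holds pending tasks and hands one out only when a worker asks."""
    pending: list = field(default_factory=list)

    def submit(self, task):
        self.pending.append(task)

    def request_work(self, worker_platform):
        """Called BY a worker: return the first task it can run, or None."""
        for index, task in enumerate(self.pending):
            if task.platform == worker_platform:
                return self.pending.pop(index)
        return None


@dataclass
class Worker:
    name: str
    platform: str
    completed: list = field(default_factory=list)

    def pull_once(self, scheduler):
        """Ask the scheduler for work; the scheduler never contacts us."""
        task = scheduler.request_work(self.platform)
        if task is None:
            return False
        self.completed.append(task.task_id)  # stand-in for running the job
        return True


def run_demo():
    scheduler = Scheduler()
    for task_id, platform in enumerate(["linux", "windows", "linux"]):
        scheduler.submit(Task(task_id, platform))
    workers = [Worker("w1", "windows"), Worker("w2", "linux")]
    for worker in workers:
        # Keep asking until the scheduler has nothing suitable left.
        while worker.pull_once(scheduler):
            pass
    return {worker.name: worker.completed for worker in workers}
```

Because every exchange is initiated by the worker, only the scheduler needs to accept incoming connections, which is the firewall property the slide highlights.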
Architecture
Scheduler – listens for job submissions and workers “reporting for duty”
Job submission is via supplied GUI. Could create other apps (command-line, Web interface)
Firewall: single incoming port open
Workers can be in different admin. domains
Worker firewalls: no incoming ports open. Single outgoing port open (to fixed, known server)
Workers can connect and disconnect at will.
Job Administration
Control: Create, start, stop, delete and change job priority with immediate effect
Information: Job description and parameters
Status: Detailed progress report for currently selected job
Display: All scheduled jobs with current status and priority
Node Administration
Node status: Connected, Not Connected, Dead, Blacklisted
Control: Include, exclude and delete nodes
Job Group: Assign nodes to individual job groups
Information: See node operating system & list of installed packages
Power Bar: See how much of the grid is being utilised (% available, % in use)
At a Glance Viewing: Quickly see the current state of the grid with colour-coded job ids (job id, task id)
Display: All known nodes and current status
Pros and Cons
Pros:
– Easy to install and maintain
– Good security:
• See next slide
– “Industry quality”
Cons:
– Small user base and not-great documentation
• Hence learning curve
– Doesn’t have all Condor’s features
• E.g. migration, MPI universe, reducing impact on primary users
– No Globus integration yet
• But probably not hard to do – JobManager for Inferno?
– Security mechanism is Inferno’s own
• But might see other mechanisms in Inferno in future
– Question over scalability (100s of machines, fine: 1000s… not sure)
• Inferno Grids don’t “flock” yet
Security and impact on primary users
Only one incoming port on the scheduler needs to be open through the firewall
Nothing runs as root
All connections in the Inferno Grid can be authenticated and encrypted
– Public-key certificates for auth, variety of encryption algs
– Cert. usage is transparent, user is not aware it’s there
– Similar to SSL in principle
Can set up worker nodes to only run certain jobs
– So can prevent arbitrary code from being run
Doesn’t have all of Condor’s options for pausing jobs on keyboard press, etc.
– Runs jobs under low priority
But could set up so that workers don’t ask for work if they are loaded
– But what happens to a job that has already started?
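The two worker-side controls mentioned above (only running certain jobs, and not asking for work when loaded) could look roughly like the following. This is a hypothetical sketch, not the Inferno Grid configuration mechanism: the job names, the 50% load threshold and all function names are invented for illustration.

```python
# Hypothetical worker-side policy; job names and threshold are illustrative.
ALLOWED_JOBS = frozenset({"climate_model", "sequence_align"})
MAX_LOAD_FRACTION = 0.5  # assumed: do not ask for work above 50% CPU load


def should_ask_for_work(load_fraction, max_load=MAX_LOAD_FRACTION):
    """Return True if the machine is idle enough to request a task."""
    return load_fraction < max_load


def may_run(job_name, allowed=ALLOWED_JOBS):
    """Return True only for jobs the node's administrator has approved."""
    return job_name in allowed


def decide(load_fraction, offered_job):
    """Combine both checks into the action the worker would take."""
    if not should_ask_for_work(load_fraction):
        return "stay idle"
    if not may_run(offered_job):
        return "refuse"
    return "run at low priority"
```

The load check only happens before a task is requested, so the sketch leaves the slide's open question (what happens to a job that has already started) unanswered.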
Other points
Slow-running tasks are reallocated until whole job is finished.
Could fairly easily write different front-ends for Inferno Grid for job submission and monitoring
– Don’t have to use supplied GUI
– ReSC’s JStyx library could be used to write Java GUI or JSP
In fact, code base is small enough to make significant customisation realistic
– Customise worker node behaviour
“Flocking” probably not hard to do
– Schedulers could exchange jobs
– Or workers could know about more than one scheduler
Inferno OS can be used to very easily create a distributed data store
– This data store can link directly with the Inferno Grid
Caveat: We haven’t really used this in anger yet!
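The slides do not say how slow-running tasks are reallocated. One plausible reading, sketched below under that assumption, is that once no unassigned tasks remain, an idle worker is given a duplicate of a task that is still running elsewhere, and the first result to arrive wins. The `Job` class and its method names are hypothetical.

```python
class Job:
    """Tracks the tasks of one job and re-issues those still running."""

    def __init__(self, task_ids):
        self.unassigned = list(task_ids)
        self.running = {}   # task_id -> set of worker names working on it
        self.results = {}   # task_id -> (worker name, result)

    def next_task(self, worker):
        """Give the worker a fresh task, else a duplicate of a running one."""
        if self.unassigned:
            task_id = self.unassigned.pop(0)
        else:
            candidates = [t for t, workers in self.running.items()
                          if worker not in workers]
            if not candidates:
                return None
            # Prefer the task with the fewest workers currently on it.
            task_id = min(candidates, key=lambda t: len(self.running[t]))
        self.running.setdefault(task_id, set()).add(worker)
        return task_id

    def report(self, worker, task_id, result):
        """Record a result; return False if another worker got there first."""
        if task_id in self.results:
            return False
        self.results[task_id] = (worker, result)
        self.running.pop(task_id, None)
        return True

    def finished(self):
        return not self.unassigned and not self.running
```

A short usage example: a fast worker finishes its own task, picks up the slow worker's task as a duplicate, and the slow worker's late result is discarded.

```python
job = Job([1, 2])
job.next_task("slow")            # slow worker takes task 1
job.next_task("fast")            # fast worker takes task 2
job.report("fast", 2, "r2")
job.next_task("fast")            # task 1 is re-issued to the fast worker
job.report("fast", 1, "r1")
job.report("slow", 1, "late")    # ignored: task 1 already has a result
```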
Building an Inferno Grid in this room…
These are conservative estimates (I think)
Install scheduler (Linux machine) – 10 minutes
Install worker node software (Windows) – 2 minutes each
Run toy job and monitor it within 15 minutes of start
Set up Inferno Certificate Authority – 1 minute
Provide Inferno certificates to all worker nodes – 2 minutes per node
Provide Inferno cert to users + admins – 2 minutes each
Fully-secured (small) Inferno Grid up and ready in an hour or two.
If you know what you’re doing!! (remember that docs aren’t so good…)
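As a rough check on the “hour or two” figure, the per-step estimates above can be added up for a given grid size. The function below simply sums the slide's numbers, assuming the steps are done one after another by a single person; the example of 20 workers and 5 users is an assumed grid size, not one from the talk.

```python
def setup_minutes(workers, users):
    """Total setup time in minutes using the per-step estimates above."""
    scheduler_install = 10            # install scheduler (Linux machine)
    worker_install = 2 * workers      # 2 minutes per worker node
    certificate_authority = 1         # set up Inferno Certificate Authority
    worker_certs = 2 * workers        # 2 minutes per node
    user_certs = 2 * users            # 2 minutes per user or admin
    return (scheduler_install + worker_install + certificate_authority
            + worker_certs + user_certs)


# Assumed example: a room with 20 worker machines and 5 users/admins.
total = setup_minutes(workers=20, users=5)  # 10 + 40 + 1 + 40 + 10 = 101
```

At 101 minutes, a grid of that size falls inside the quoted one to two hours; the worker-dependent steps (4 minutes per node in total) dominate as the grid grows.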
Reading Campus Grid so far
Collaboration between School of Systems Engineering, IT Services and e-Science Centre
Haven’t had as much time as we’d like to investigate Inferno Grid
But have an embryonic “Campus Grid” of two flocked Condor pools
– Although both at Reading, come under different admin domains
– Getting them to share data space was challenging, and firewalls caused initial problems
– (Incidentally, the Inferno Grid had no problems at all crossing the domains)
Small number of users running MPI and batch jobs
– Animal and Microbial Sciences, Environmental Systems Science Centre
Ran demo project for University
A “heroic effort” at the moment, but we are trying to secure funding
Novel features of RCG
Problem: Most machines are Windows but most people want *nix environment for scientific programs
“Diskless Condor”:
– Windows machines reboot into Linux overnight
– Loads Linux from a network-shared disk image
– Uses networked resources only (zero impact on hard drive)
– In morning, reboots back into Windows
Looking into CoLinux (www.colinux.org):
– Free VM technology for running Linux under Windows
– Early days, but initial look is promising.
Future work
Try to get funding!
Intention is to make CG key part of campus infrastructure
– IT Services are supportive
Installation of SRB for distributed data store
Add clusters/HPC resources to Campus Grid
Working towards NGS compatibility
Conclusions
Inferno Grid has lots of good points, especially in terms of security, ease of installation and maintenance
– Should be attractive to IT Services…
We haven’t used it “in anger” yet but it is used successfully by others (in academia, industry and govt)
– Caveat: these people tend to run a single app (or small number of apps) rather than general code
Doesn’t have all of Condor’s features
We don’t want to fragment effort or become marginalised
– Would be great to see good features of Inferno appear in Condor, esp. “worker pull” mechanism