www.eu-eela.eu
E-science grid facility for Europe and Latin America
OurGrid and the co-existence with gLite
Alexandre Duarte
Universidade Federal de Campina Grande (Brazil)
Joint EELA-2/EGEE-III Tutorial for Trainers
30/06 to 04/07/2008
Part of these slides were created by Francisco Brasileiro (UFCG-Brazil) and Diego Scardaci (INFN-Italy)
Agenda
Catania, Joint EELA-2/EGEE-III Tutorial for Trainers, 30/06 to 04/07/2008 2
• Introduction
• The OurGrid Approach
– Architecture
– Scheduling
– Avoiding Free-Riders
– Security Concerns
– Application Models
• OurGrid and gLite Co-Existence
• Conclusions
Introduction
EELA-2 Objectives
• Build a powerful, functional and well supported Grid Facility
• Address a large community of users
• Assert the financial and management schemes needed to operate and support the e-Infrastructure over the long term
• Anticipate the handover of the e-Infrastructure operation and support
Powerful Grid Facility
• Dream
– An active community of potential grid users
– Highly skilled support team to deploy and manage resource centres
– A lot of resource centres with large amounts of computational resources to put in the grid
• Reality
– An active community of potential grid users
– Lack of skilled personnel
– A few resource centres with good amounts of computational resources
– A lot of resource centres with small amounts of computational resources
Making the Dream a Reality
• User Community
– Continue with the good work
– Keep finding more interesting applications to support
• Skilled personnel
– That’s why we (you) are here (there)
– Training, training and training
• Computational Resources
– Buy more computers? $$$
– Buy clusters? $$$$$
– Buy Supercomputers? $$$$$$$$$$$$
– Share idle resources? FREE
Example: UFCG with ±3000 PCs * 16 idle hours / day
• ≈ 2000 idle PCs!!! Totally free!
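The UFCG figure follows from simple arithmetic: if roughly 3000 PCs each sit idle for about 16 of every 24 hours, the pooled idle time is equivalent to about 2000 machines running around the clock. A minimal sketch of that calculation (the function name is illustrative and not part of OurGrid):

```python
def idle_pc_equivalents(total_pcs: int, idle_hours_per_day: float) -> float:
    """Number of always-on machines equivalent to the pooled idle time."""
    return total_pcs * idle_hours_per_day / 24.0


# Figures taken from the slide: about 3000 PCs, each idle 16 hours a day.
print(idle_pc_equivalents(3000, 16))  # 2000.0
```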
Sharing Idle Resources
• Voluntary Computing (e.g. LCG@home)
– Organisations donate their resources to a given project
– Donors normally do not use the available resources for their own purposes
– Entrance barrier is high, because one must
  invest a good deal of effort in “advertising”
  have a very high visibility project
  be in a prestigious institution
– May be useful when the organisation has access to a large number of desktops
• Peer2Peer Grid (e.g. OurGrid)
– Peers donate their resources to other Peers in the grid
– Donors normally use the available resources for their own purposes
– Entrance barrier is low: just deploy a new Peer in the grid
An Important Note
• These are not competing technologies!
• Each one is more appropriate to a particular subset of the user base
• Each one has its virtues and drawbacks
• It is very likely that they will be able not only to co-exist, but also to interoperate
The OurGrid Approach
• Labs can freely join the system without any human intervention
– No need for negotiation; no paperwork
• Clear incentive to join the system
– One can’t be worse off by joining the system
– Noticeably improved response time
– Free-riding resistant
• Basic dependability properties
– Configurable level of security
– Resilience to faults
– Scalability
• Easy to install, configure, manage and program
– No need for a specialized support team
But there is no free lunch
• To simplify the problem, OurGrid is focused on Bag-of-Tasks (BoT) applications
– No need for communication among tasks
  Facilitates scheduling and security enforcement
– Simple fail-over/retry mechanisms to tolerate faults
– No need for QoS guarantees
– Script-based programming is natural
  Facilitates use
• Fortunately, many important applications are BoT!
– Data mining, massive search, bio computing, parameter sweep, Monte Carlo simulations, fractal calculations, image processing and many others
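Because BoT tasks never talk to each other, a failed task can simply be run again without affecting the rest. The sketch below illustrates that idea in plain Python. It is not OurGrid code: `run_bag_of_tasks`, `flaky_square` and the `max_attempts` parameter are hypothetical names chosen for the example, and the "failure" is simulated.

```python
from typing import Callable, Dict, List, Set


def run_bag_of_tasks(tasks: List[int],
                     worker: Callable[[int], int],
                     max_attempts: int = 3) -> Dict[int, int]:
    """Run independent tasks one by one, retrying each on failure."""
    results: Dict[int, int] = {}
    for index, task in enumerate(tasks):
        for attempt in range(1, max_attempts + 1):
            try:
                results[index] = worker(task)
                break  # task finished, move on to the next one
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
    return results


_seen: Set[int] = set()


def flaky_square(x: int) -> int:
    """Simulated worker: fails the first time it sees each input."""
    if x not in _seen:
        _seen.add(x)
        raise RuntimeError("simulated machine failure")
    return x * x


print(run_bag_of_tasks([1, 2, 3], flaky_square))  # {0: 1, 1: 4, 2: 9}
```

No task needs to know whether another task failed or was retried, which is what keeps the fail-over logic this small.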
OurGrid Architecture
Finding Resources
• OurGrid GIS (NodeWiz) allows the execution of rich queries that encompass not only multiple attributes, but also range operators
• A scheduler might want to locate suitable resources
– OS=linux && RAM ≥ 1G && clock > 4GHz && load < 0.5
• A user may want to locate a dataset that contains particular data items
– rain_fall && -37°52’ < lat < -37°46’ && 144°54’ < long < 145°03’ && date ≥ 01/01/2007
• …
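To make the idea of a multi-attribute range query concrete, here is a small Python sketch that evaluates the scheduler query from the slide against a list of machine descriptions. It is a plain linear scan, not NodeWiz's distributed lookup, and the attribute names (`ram_gb`, `clock_ghz`), the machine list and the helper functions are all assumptions made for the example.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# A constraint is (attribute, operator symbol, value).
Constraint = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(resource: Dict[str, Any], constraints: List[Constraint]) -> bool:
    """True if the resource satisfies every constraint (logical AND)."""
    return all(
        attr in resource and OPS[op](resource[attr], value)
        for attr, op, value in constraints
    )


def find_resources(resources: List[Dict[str, Any]],
                   constraints: List[Constraint]) -> List[str]:
    """Names of all resources that satisfy the query."""
    return [r["name"] for r in resources if matches(r, constraints)]


# Hypothetical machine descriptions.
machines = [
    {"name": "lab-01", "os": "linux", "ram_gb": 2, "clock_ghz": 4.2, "load": 0.1},
    {"name": "lab-02", "os": "linux", "ram_gb": 0.5, "clock_ghz": 4.5, "load": 0.2},
    {"name": "lab-03", "os": "windows", "ram_gb": 4, "clock_ghz": 4.8, "load": 0.0},
    {"name": "lab-04", "os": "linux", "ram_gb": 8, "clock_ghz": 4.4, "load": 0.9},
]

# OS=linux && RAM >= 1G && clock > 4GHz && load < 0.5
query = [("os", "==", "linux"), ("ram_gb", ">=", 1),
         ("clock_ghz", ">", 4.0), ("load", "<", 0.5)]

print(find_resources(machines, query))  # ['lab-01']
```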
Scheduling with no Information
• Grid scheduling typically depends on information about the grid (e.g. machine speed and load) and the application (e.g. task size)
• However, getting accurate information about all applications and resources is hard in a large scale peer-to-peer grid
• Can we efficiently schedule tasks without requiring access to information?
• This would make the system much easier to deploy and simpler to use
Workqueue with Replication
• Tasks are sent to idle processors
• When there are no more tasks, running tasks are replicated on idle processors
• The first replica to finish is the official execution
• Other replicas are cancelled
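The four rules above can be sketched as a small simulation. This is an illustration under stated assumptions, not the OurGrid scheduler: the function name `wqr_makespan`, the `max_replicas` limit, and the choice to replicate the running task with the fewest replicas are all assumptions. The simulator uses task sizes and machine speeds only to compute when replicas finish; the scheduling decisions themselves never look at them.

```python
from typing import Dict, List, Set, Tuple


def wqr_makespan(task_sizes: List[float],
                 machine_speeds: List[float],
                 max_replicas: int = 2) -> float:
    """Simulate Workqueue with Replication and return the finish time."""
    pending = list(range(len(task_sizes)))        # tasks not yet started
    running: Dict[int, Tuple[int, float]] = {}    # machine -> (task, finish time)
    done: Set[int] = set()
    now = 0.0
    while len(done) < len(task_sizes):
        # Hand work to every idle machine.
        for machine, speed in enumerate(machine_speeds):
            if machine in running:
                continue
            if pending:
                task = pending.pop(0)
            else:
                # No tasks left: replicate a running task, if the limit allows.
                counts: Dict[int, int] = {}
                for t, _ in running.values():
                    counts[t] = counts.get(t, 0) + 1
                candidates = [t for t, c in counts.items() if c < max_replicas]
                if not candidates:
                    continue
                task = min(candidates, key=lambda t: (counts[t], t))
            running[machine] = (task, now + task_sizes[task] / speed)
        # Advance time to the first replica that finishes.
        finished = min(running, key=lambda m: running[m][1])
        task, now = running[finished]
        done.add(task)
        # Cancel the other replicas of the finished task.
        running = {m: tf for m, tf in running.items() if tf[0] != task}
    return now


# One fast machine (speed 4) and one slow machine (speed 1), three equal tasks.
print(wqr_makespan([8, 8, 8], [4.0, 1.0]))                  # 6.0
print(wqr_makespan([8, 8, 8], [4.0, 1.0], max_replicas=1))  # 8.0
```

In this example the slow machine is still working on its task when the fast machine runs out of work. With replication the fast machine re-runs that task and finishes it first, cutting the makespan from 8 to 6 time units.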
Preventing Free-riders
• It is important to encourage collaboration
– In file-sharing, most users free-ride
• OurGrid uses a reciprocation-based incentive mechanism
– Tit-for-tat
• The Network of Favors
– All peers maintain a local balance for all known peers
– Peers with greater balances have priority when there is contention for local resources
– Under contention, the more one donates, the more one gets back
– No additional infrastructure is needed
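The bookkeeping behind the Network of Favors can be sketched in a few lines of Python. This is a simplified illustration, not the OurGrid implementation: the `Peer` class and its method names are hypothetical, favors are counted in arbitrary units, each requester gets at most one machine, and clamping balances at zero is an assumption of the sketch (the slide does not specify it).

```python
from collections import defaultdict
from typing import DefaultDict, List


class Peer:
    """Keeps a purely local balance of favors for every known peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.balance: DefaultDict[str, float] = defaultdict(float)

    def record_received(self, donor: str, amount: float) -> None:
        """Another peer donated resources to us: its balance goes up."""
        self.balance[donor] += amount

    def record_donated(self, consumer: str, amount: float) -> None:
        """We donated resources: the consumer's balance goes down.

        Assumption: the balance never drops below zero.
        """
        self.balance[consumer] = max(0.0, self.balance[consumer] - amount)

    def allocate(self, requesters: List[str], free_machines: int) -> List[str]:
        """Under contention, serve the requesters with the highest balance."""
        ranked = sorted(requesters, key=lambda r: (-self.balance[r], r))
        return ranked[:free_machines]


a = Peer("A")
a.record_received("B", 5.0)   # B has donated a lot to A
a.record_received("C", 2.0)   # C has donated a little
# D is unknown to A, so its balance is 0.

print(a.allocate(["D", "C", "B"], free_machines=2))  # ['B', 'C']
```

Each peer only consults its own table, which is why no shared ledger or other infrastructure is needed.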
OurGrid Security
• How to protect resources from applications?
– Leverages the fact that BoT applications only communicate to receive input and return output
– Input/output is done by the OurGrid Worker Manager, which runs within a Java virtual machine
– The remote task runs inside a virtual machine, with no network access, and disk access only to a designated partition
  Other configurations are possible
– A new virtual machine is instantiated before a new task is run
• How to protect applications from resources?
– The script language was extended to accommodate an optional check phase
  Applications may introduce task-dependent watermarks
Application Models
• Script-based
– Stage-in/out files
• Embedded
– Direct access to MyGrid’s API
• Portal-based
– Web interface to MyGrid’s API
• Framework-based
– MyGrid inside frameworks
OurGrid-enabling an Application
• Write a script using a very simple language
– Simple abstractions
  File transfer (put, store, get)
  Hide heterogeneity ($PLAYPEN, $STORAGE)
– Define constraints (job requirements and grid machine attributes)
• Write a program that embeds the business logic and may make use of more complex features available through a Java API
• Deploy a Portal that embeds the application
An Example: Factoring a Number
job:
  label: my_factorial_useless_example
  requirements: (OS=linux && RAM ≥ 1G && clock > 4GHz && load < 0.5)
task:
  init: store factoring $PLAYPEN
  remote: factoring 3 18655 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results
task:
  init: store factoring $PLAYPEN
  remote: factoring 18656 37307 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results
task:
  init: store factoring $PLAYPEN
  remote: factoring 37308 55968 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results
…
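Writing one task block per divisor range by hand gets tedious, so a job description like this is a natural thing to generate. The Python sketch below splits the candidate divisors into contiguous ranges and prints text in the same shape as the slide. It assumes the ranges cover 3 up to the integer square root of the number, which is what the slide's values suggest but not something the slide states. The helper names and the default label are hypothetical, the generated boundaries will not match the slide's exactly, and `math.isqrt` requires Python 3.8 or later.

```python
import math
from typing import List, Tuple


def split_range(first: int, last: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [first, last] into contiguous chunks."""
    total = last - first + 1
    if parts < 1 or parts > total:
        raise ValueError("parts must be between 1 and the range size")
    base, extra = divmod(total, parts)
    ranges = []
    start = first
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def render_job(number: int, parts: int, label: str = "factoring_example") -> str:
    """Render one task per divisor range, mirroring the slide's layout."""
    last = math.isqrt(number)  # assumption: test divisors up to sqrt(number)
    lines = ["job:", f"label: {label}"]
    for lo, hi in split_range(3, last, parts):
        lines += [
            "task:",
            "init: store factoring $PLAYPEN",
            f"remote: factoring {lo} {hi} {number} output-$JOB-$TASK",
            "final: get $PLAYPEN/output-$JOB-$TASK results",
        ]
    return "\n".join(lines)


print(render_job(34789789799, parts=10))
```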
Some Applications
• Script-based
– Risk assessment for agriculture loans (EMBRAPA)
– Our own research on computer science
  Simulations
• API-based
– SmartPumping (PETROBRAS)
  Parallel execution of genetic algorithms for optimizing oil pipeline operation
– EPANET-Grid (R&D project)
  Grid-enabled version of the EPANET system for simulation of water supply systems
– GridVida (R&D project)
  Image processing to support diagnosis by identifying similar cases in the archival database
Some Applications
• Portal-based: SegHidro (R&D project)
– Several uses related to management of water resources in a Brazilian semi-arid area
– Academic and industrial users
– Allows the configuration of different workflows of simulation models and their execution in ensembles
– Sharing of computing resources, data and complementary expertise
• Framework-based: GridUnit (R&D project)
– An extension of JUnit
– Features
  Transparent and automatic distribution
  Test case contamination avoidance
  Environmental coverage
  Graphical user interface
OurGrid and gLite Co-Existence
EELA-2 Joint Research Activity
• Help in fostering the sustainability of the e-Infrastructure
– Making the e-Infrastructure more interesting and widespread by increasing its reach and its usability
• Promote a continued and increased interaction between research groups in Europe and Latin America
Increase the Reach by
• Allowing the scavenging of idle resources
– Create the necessary mechanisms to allow resource centres that run the OurGrid middleware to co-exist with resource centres running gLite within the EELA platform
– Provide some level of interoperation between these different kinds of resource centres and their associated applications
• Allowing the execution of the grid middleware on top of Microsoft Windows platforms
– Port the gLite middleware to the Windows platform
– Leverage the multi-platform characteristics of OurGrid
Increase the Usability by
• Developing new application-oriented grid services
– Ease the creation of digital archives and data grid frameworks
– Secure storage to solve the insider abuse problem
– Support for cooperative workflows
– Other selected services required by NA3 applications
• Leveraging the grid services provided by the OurGrid middleware to execute bag-of-tasks jobs
• Facilitating the management of resource centres
OurGrid–gLite Co-Existence
• The first step is to allow EELA-2 OurGrid Resource Centres to be created
– Provide support for the use of the gLite PKI by OurGrid resource centres
• The second step is to allow idle resources in an EELA-2 gLite resource centre to be exposed as OurGrid resources
• The final step is to allow resources of an OurGrid resource centre to be exposed as gLite resources
– This will be achieved in two sub-steps
  Firstly, allow clusters to be exposed as a single resource in an OurGrid resource centre
  Secondly, make these resources available to the gLite grid
JRA1 Milestones
• MJRA1.1: Sep/2008
– PKI-enabled OurGrid middleware
• MJRA1.2: Jan/2009
– Prototypes of the proposed services
• MJRA1.3: Jan/2009
– Common information system between gLite and OurGrid up & running
• MJRA1.4: Jul/2009
– Prototype of the gateway to transfer jobs from gLite to OurGrid and vice-versa
• MJRA1.5: Jul/2009
– Stable version of the proposed services
• MJRA1.6: Jan/2010
– Stable version of the gateway to transfer jobs from gLite to OurGrid and vice-versa
References
• About OurGrid
– Download the middleware and documentation from http://www.ourgrid.org
– For more details, read W. Cirne, F. Brasileiro, N. Andrade, L. Costa, A. Andrade, R. Novaes, M. Mowbray, “Labs of the world, unite!!!”, Journal of Grid Computing 4 (3) (2006) 225-246
• About JRA1
– Requests/Comments/Suggestions/Criticisms
  Send email to either Francisco Brasileiro ([email protected]) or Diego Scardaci ([email protected])
– Contact the developers at [email protected]
– Download new software distributions from http://eela-forge.eu