EGEE – A Large-Scale Production Grid Infrastructure
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE – A Large-Scale Production Grid Infrastructure
Erwin Laure
EGEE Technical Director
CERN, Switzerland
GGF 16 - February 2006
Contents
• The EGEE Project
• Operational Aspects
• Application Usage
• Interoperability, Standardization
• Outlook
The EGEE project
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
    leveraging national and regional grid activities worldwide
    consistent, robust and secure
  – Improving and maintaining the middleware
  – Attracting new resources and users from industry as well as science
• EGEE: 1 April 2004 – 31 March 2006
  – 71 leading institutions in 27 countries, federated in regional Grids
• EGEE-II
  – Proposed start 1 April 2006 (for 2 years)
  – Expanded consortium
    > 90 partners in 32 countries (also non-European partners)
    Related projects (such as BalticGrid, SEE-GRID, EUMedGrid, EUChinaGrid, EELA, …)
EGEE Infrastructure
Scale
> 170 sites in 39 countries
> 17 000 CPUs
> 5 PB storage
> 10 000 concurrent jobs per day
> 60 Virtual Organisations
Applications
> 20 supported applications from 7 domains
– High Energy Physics
– Biomedicine
– Earth Sciences
– Computational Chemistry
– Astronomy
– Geo-Physics
– Financial Simulation
Another 8 applications from 4 domains are in evaluation stage
10,000 jobs /day
From accounting data:
~3 million jobs in 2005 so far
Sustained daily rates (per month, Jan – Nov 2005):
[2185, 2796, 7617, 10312, 11151, 9247, 9218, 11445, 10079, 11124, 9491]
~8.2 M kSI2K.cpu.hours, > 1000 CPU years
Real usage is higher, as accounting data was not published from all sites until recently
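As a rough cross-check of the accounting figures above, the monthly sustained daily rates can be multiplied by the number of days in each month to estimate the total job count. This is only a sketch: it assumes each quoted rate is the average number of jobs per day for that calendar month of 2005, and the function names are illustrative, not part of any EGEE accounting tool.

```python
import calendar

# Sustained daily job rates per month, Jan - Nov 2005 (values from the slide).
DAILY_RATES = [2185, 2796, 7617, 10312, 11151, 9247, 9218, 11445, 10079, 11124, 9491]


def estimated_total_jobs(rates, year=2005):
    """Sum rate * days-in-month, assuming each rate is a monthly average."""
    total = 0
    for month, rate in enumerate(rates, start=1):
        days = calendar.monthrange(year, month)[1]
        total += rate * days
    return total


def ksi2k_hours_to_years(hours):
    """Convert kSI2K CPU hours to kSI2K CPU years (assumes a 365-day year)."""
    return hours / (24 * 365)


if __name__ == "__main__":
    print(estimated_total_jobs(DAILY_RATES))   # 2887098
    print(round(ksi2k_hours_to_years(8.2e6)))  # 936
```

The estimate of about 2.9 million jobs agrees with the "~3 million jobs in 2005 so far" quoted above. The 8.2 M kSI2K hours correspond to roughly 936 kSI2K-years; this matches "> 1000 CPU years" only if the average CPU is rated somewhat below 1 kSI2K, which is an assumption here rather than something the slide states.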
Some example uses
Domain distribution of FlexX run jobs (pie chart; domain; number of jobs):
es; 5122, fr; 7580, gr; 2004, il; 263, it; 3687, nl; 3356, tw; 827, uk; 8106, bg; 597, com; 1072, de; 715, cy; 383, pl; 1877, ru; 218, ro; 337
LCG sustained data transfers using FTS; avg. 500 MB/s, peak 1000 MB/s
Zeus collaboration at DESY
WISDOM data challenge
BIOMED jobs distribution (chart, 2005-01 to 2005-08): nb of jobs (axis 0 to 120000) for registered, successful, cancelled and aborted jobs, plus run duration estimate in years (axis 0 to 60)
EGEE Operations Structure
• Operations Management Centre (OMC)
  – At CERN: coordination etc.
• Core Infrastructure Centres (CIC)
  – Manage daily grid operations: oversight, troubleshooting
    “Operator on Duty”
  – Run infrastructure services
  – Provide 2nd level support to ROCs
  – UK/I, Fr, It, CERN, Russia, Taipei
• Regional Operations Centres (ROC)
  – Front-line support for user and operations issues
  – Provide local knowledge and adaptations
  – One in each region, many distributed
• User Support Centre (GGUS)
  – In FZK: provide single point of contact (service desk), portal
Operations Process
• CIC-on-duty (grid operator on duty)
  – 6 teams working in weekly rotation
    CERN, IN2P3, INFN, UK/I, Ru, Taipei
  – Crucial in improving site stability and management
• Operations coordination
  – Weekly operations meetings
  – Regular ROC, CIC managers meetings
  – Series of EGEE Operations Workshops
    Last one was a joint workshop with Open Science Grid
    Bring in related infrastructure projects: coordination point
• Geographically distributed responsibility for operations:
  – There is no “central” operation
  – Tools are developed/hosted at different sites:
    GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon)
• Procedures described in Operations Manual
• Infrastructure planning guide (cookbook):
  – Contains information on operational procedures, middleware, certification, user support, etc.
Improvement in site stability and reliability is due to:
  CIC on duty oversight and strong follow-up
  Site Functional Tests, Information System monitor
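The weekly CIC-on-duty rotation described above can be pictured as a simple round-robin over the six teams. The sketch below is illustrative only: the team names come from the slide, but the rotation order, the start date, and the function names are assumptions, and the real shift plan may have been organised differently.

```python
from datetime import date, timedelta

# Teams named on the slide; the order of rotation is an assumption.
TEAMS = ["CERN", "IN2P3", "INFN", "UK/I", "Ru", "Taipei"]

# Hypothetical Monday on which the first team starts its shift.
ROTATION_START = date(2006, 1, 2)


def on_duty(day, start=ROTATION_START, teams=TEAMS):
    """Return the team on duty during the week that contains `day`."""
    weeks_elapsed = (day - start).days // 7
    return teams[weeks_elapsed % len(teams)]


def schedule(weeks, start=ROTATION_START, teams=TEAMS):
    """List (week start, team) pairs for `weeks` consecutive weeks."""
    result = []
    for i in range(weeks):
        week_start = start + timedelta(weeks=i)
        result.append((week_start, on_duty(week_start, start, teams)))
    return result


if __name__ == "__main__":
    for week_start, team in schedule(7):
        print(week_start.isoformat(), team)
```

With six teams, each team is on duty once every six weeks, so the seventh week in the printed schedule returns to the first team.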
Applications Example: LHC
• Fundamental activity in preparation of LHC start up
  – Physics
  – Computing systems
• Examples:
  – LHCb: ~700 CPU/years in 2005 on the EGEE infrastructure
  – ATLAS: over 10,000 jobs per day
CPU used: 6,389,638 h
Data Output: 77 TB
Site                   Share
DIRAC.Barcelona.es     0.214%
DIRAC.Bologna-T2.it    0.696%
DIRAC.CERN.ch          0.571%
DIRAC.Cambridge.uk     0.001%
DIRAC.CracowAgu.pl     0.001%
DIRAC.IF-UFRJ.br       0.175%
DIRAC.LHCBONLINE.ch    0.779%
DIRAC.Lyon.fr          2.552%
DIRAC.PNPI.ru          0.000%
DIRAC.Santiago.es      0.148%
DIRAC.ScotGrid.uk      3.068%
DIRAC.Zurich-spz.ch    0.003%
DIRAC.Zurich.ch        0.756%
LCG.ACAD.bg            0.106%
LCG.BHAM-HEP.uk        0.705%
LCG.Barcelona.es       0.281%
LCG.Bari.it            1.357%
LCG.Bologna.it         0.032%
LCG.CERN.ch            10.960%
LCG.CESGA.es           0.528%
LCG.CGG.fr             0.676%
LCG.CNAF-GRIDIT.it     0.012%
LCG.CNAF.it            13.196%
LCG.CNB.es             0.385%
LCG.CPPM.fr            0.242%
LCG.CSCS.ch            0.282%
LCG.CY01.cy            0.103%
LCG.Cagliari.it        0.515%
LCG.Cambridge.uk       0.010%
LCG.Catania.it         0.551%
LCG.Durham.uk          0.476%
LCG.Edinburgh.uk       0.031%
LCG.FZK.de             1.708%
LCG.Ferrara.it         0.073%
LCG.Firenze.it         1.047%
LCG.GR-01.gr           0.349%
LCG.GR-02.gr           0.226%
LCG.GR-03.gr           0.171%
LCG.GR-04.gr           0.056%
LCG.GRNET.gr           1.170%
LCG.HPC2N.se           0.001%
LCG.ICI.ro             0.088%
LCG.IFCA.es            0.022%
LCG.IHEP.su            1.245%
LCG.IN2P3.fr           4.143%
LCG.INTA.es            0.076%
LCG.IPP.bg             0.033%
LCG.ITEP.ru            0.792%
LCG.Imperial.uk        0.891%
LCG.Iowa.us            0.287%
LCG.JINR.ru            0.472%
LCG.KFKI.hu            1.436%
LCG.Lancashire.uk      6.796%
LCG.Legnaro.it         1.569%
LCG.Manchester.uk      0.285%
LCG.Milano.it          0.770%
LCG.Montreal.ca        0.069%
LCG.NIKHEF.nl          5.140%
LCG.NSC.se             0.465%
LCG.Napoli.it          0.175%
LCG.Oxford.uk          1.214%
LCG.PIC.es             2.366%
LCG.PNPI.ru            0.278%
LCG.Padova.it          2.041%
LCG.Pisa.it            0.121%
LCG.QMUL.uk            6.407%
LCG.RAL-HEP.uk         0.938%
LCG.RAL.uk             9.518%
LCG.RHUL.uk            2.168%
LCG.SARA.nl            0.675%
LCG.Sheffield.uk       0.094%
LCG.Torino.it          1.455%
LCG.Toronto.ca         0.343%
LCG.Triumf.ca          0.105%
LCG.UCL-CCC.uk         1.455%
LCG.USC.es             1.853%
20-6-2005, P. Jenni, LCG POB: ATLAS on the LCG/EGEE Grid
This is the first successful use of the grid by a large user community, which has however also revealed several shortcomings which now need to be fixed, as LHC turn-on is only two years ahead!
Very instructive comments from the user feedback have been presented at the Workshop (obviously this was one of the main themes and purposes of the meeting)
All this is available on the Web
Figure labels: ATLAS, LHCb
Note the usage of 3 Grid infrastructures
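The CPU and data figures on this slide can be related to each other with a little arithmetic. The sketch below assumes that the "CPU used" total and the per-site shares belong to the same (LHCb) production, that a CPU year is 365 days, and that 1 TB is 10^6 MB; none of this is stated explicitly on the slide. Only the five largest entries of the site list are copied in, and the names are illustrative.

```python
CPU_HOURS_USED = 6_389_638   # "CPU used" from the slide
DATA_OUTPUT_TB = 77          # "Data Output" from the slide

# Excerpt: the five largest shares in the site list (percent).
TOP_SITE_SHARES = {
    "LCG.CNAF.it": 13.196,
    "LCG.CERN.ch": 10.960,
    "LCG.RAL.uk": 9.518,
    "LCG.Lancashire.uk": 6.796,
    "LCG.QMUL.uk": 6.407,
}


def cpu_years(cpu_hours):
    """Convert CPU hours to CPU years (assumes a 365-day year)."""
    return cpu_hours / (24 * 365)


def output_mb_per_cpu_hour(data_tb, cpu_hours):
    """Average output per CPU hour in MB (assumes 1 TB = 10**6 MB)."""
    return data_tb * 1_000_000 / cpu_hours


if __name__ == "__main__":
    print(round(cpu_years(CPU_HOURS_USED), 1))
    print(round(output_mb_per_cpu_hour(DATA_OUTPUT_TB, CPU_HOURS_USED), 2))
    print(round(sum(TOP_SITE_SHARES.values()), 3))
```

Under these assumptions the CPU total comes to about 729 CPU years, in line with the "~700 CPU/years" quoted for LHCb, and the five largest sites together account for a little under half of the total.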
Applications Example: WISDOM
• Grid-enabled drug discovery process for neglected diseases
  – In silico docking
    compute probability that potential drugs dock with target protein
  – To speed up and reduce cost required to develop new drugs
• WISDOM (World-wide In Silico Docking On Malaria)
  – First biomedical data challenge
  – 46 million ligands docked in 6 weeks
    Target proteins from malaria parasite
    Molecular docking applications: Autodock and FlexX
    ~1 million virtual ligands selected
  – 1 TB of data produced
  – 1000 computers in 15 countries
    Equivalent to 80 CPU years
Never done for a neglected disease
Never done on a large scale production infrastructure
Domain distribution of FlexX run jobs (pie chart; domain; number of jobs):
es; 5122, fr; 7580, gr; 2004, il; 263, it; 3687, nl; 3356, tw; 827, uk; 8106, bg; 597, com; 1072, de; 715, cy; 383, pl; 1877, ru; 218, ro; 337
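The WISDOM figures can be turned into a few derived quantities: the average CPU time per docking, the average number of CPUs kept busy over the six weeks, and each domain's share of the FlexX jobs. This is a back-of-the-envelope sketch that assumes the 80 CPU years cover all 46 million dockings and that a CPU year is 365 days; the function names are illustrative.

```python
LIGANDS_DOCKED = 46_000_000
WEEKS = 6
CPU_YEARS = 80
SECONDS_PER_DAY = 24 * 3600

# FlexX run jobs per domain (values from the slide).
FLEXX_JOBS = {
    "es": 5122, "fr": 7580, "gr": 2004, "il": 263, "it": 3687,
    "nl": 3356, "tw": 827, "uk": 8106, "bg": 597, "com": 1072,
    "de": 715, "cy": 383, "pl": 1877, "ru": 218, "ro": 337,
}


def cpu_seconds_per_docking(cpu_years=CPU_YEARS, dockings=LIGANDS_DOCKED):
    """Average CPU seconds per docking (assumes a 365-day CPU year)."""
    return cpu_years * 365 * SECONDS_PER_DAY / dockings


def average_busy_cpus(cpu_years=CPU_YEARS, weeks=WEEKS):
    """CPUs that must run continuously for `weeks` to deliver `cpu_years`."""
    return cpu_years * 365 / (weeks * 7)


def domain_share(domain, jobs=FLEXX_JOBS):
    """Percentage of FlexX jobs that ran in `domain`."""
    return 100 * jobs[domain] / sum(jobs.values())


if __name__ == "__main__":
    print(round(cpu_seconds_per_docking()))   # 55
    print(round(average_busy_cpus()))         # 695
    print(sum(FLEXX_JOBS.values()))           # 36144
    print(round(domain_share("uk"), 1))       # 22.4
```

Under these assumptions a single docking takes a little under a minute of CPU time, and roughly 700 of the 1000 computers would have to be busy at any moment to deliver 80 CPU years in six weeks.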
Grid Interoperability
• We currently see different flavors of Grids deployed worldwide
  – Because of application needs, legacy constraints, funding, etc.
  – Diversity is positive!
  – Competition to find the best solutions
• Many applications need to operate on more than one Grid infrastructure
  – Pragmatic approach to interoperability is key
  – Applications need interoperable Grid infrastructures now
  – A production infrastructure cannot be an early adopter of quickly changing standards
• EGEE is an active contributor to interoperability and standardization efforts
  – Works with OSG, NAREGI, ARC, and the multi-grids interoperability effort
  – Provides valuable input on practical experiences to the standardization process
  – Contributes to over 15 GGF WG/RG
  – Currently supplies two GGF area directors
Grid Interoperation
• Leaders from TeraGrid, OSG, EGEE, APAC, NAREGI, DEISA, Pragma, UK NGS, KISTI will lead an interoperation initiative in 2006.
• Six international teams will meet for the first time at GGF-16 in February 2006
  – Application Use Cases (Bair/TeraGrid, Alessandrini/DEISA)
  – Authentication/Identity Mgmt (Skow/TeraGrid)
  – Job Description Language (Newhouse/UK-NGS)
  – Data Location/Movement (Pordes/OSG)
  – Information Schemas (Matsuoka/NAREGI)
  – Testbeds (Arzberger/Pragma)
Leaders from nine Grid initiatives met at SC05 to plan an application-driven “Interop Challenge” in 2006.
Interoperability workshop Monday and Tuesday afternoon
“AuthZ interoperability here and now” Thursday morning
Future
• EGEE-II is expected to start on 1 April
  – Smooth transition from current phase
  – Additional support for more application domains
  – Increased number of partners, also from US and AP
  – Unified EGEE middleware distribution gLite 3.0
• Continue and reinforce interoperability work
• Standardization in GGF, other bodies, and industry is important for long-term sustainability of the Grid
  Need for commonly accepted standards
• Towards a long-term sustainable Grid infrastructure
Upcoming EGEE Events
• Series of regional long-term sustainability workshops in Europe during March
  – For more information get in contact with
• 1st EGEE User Forum, CERN, 1-3 March, 2006
  – Bring together the EGEE community
  – Exchange experience and needs on EGEE
  – http://cern.ch/egee-intranet/User-Forum
• EGEE-II conference, Geneva, 25-29 September, 2006
  – Include sessions with related projects
  – Theme will be long-term sustainability