The LHC Computing Grid
Gergely Debreczeni (CERN IT/Grid Deployment Group)
The data factory of LHC
• 40 million collisions every second
• After on-line triggers and selections, only 100 are kept
• 3-4 MB/event requires a recording speed greater than 1 GB/sec
• More than 10 billion collisions in a year yield a data flow of 10 PB/year
• Additional Monte Carlo simulations
To compare:
o 1 TB corresponds approximately to all the books produced around the world in one year
o 1 EB is the amount of information generated around the world in one year
Requires ~100,000 of today’s fastest PCs
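As a rough consistency check on the figures quoted above, the sketch below relates the recording speed to the yearly data volume. It takes 1 GB/sec and 10 PB/year from the slide, assumes decimal units (1 PB = 10^15 bytes), and derives the recording time these two numbers imply. The function name is illustrative only.

```python
# Rough consistency check of the quoted figures (decimal units are an assumption).
GB = 10**9
PB = 10**15


def implied_recording_seconds(yearly_volume_bytes, rate_bytes_per_s):
    """Seconds of continuous recording needed to accumulate the yearly volume."""
    return yearly_volume_bytes / rate_bytes_per_s


seconds = implied_recording_seconds(10 * PB, 1 * GB)
days = seconds / 86_400  # 86,400 seconds in a day
print(f"{seconds:.0e} s of recording, about {days:.0f} days")
```

Under these assumptions, 10 PB/year at 1 GB/sec corresponds to roughly 10^7 seconds of data taking, which is well short of a full calendar year of about 3.15 × 10^7 seconds.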
Why do we need it? The new collider and its detectors will generate an enormous amount of data.
No single supercomputer will be able to handle the data! A reliable, permanent, fault-tolerant, flexible and distributed computing environment is needed to meet the requirements of the new experiments and of the geographically highly distributed collaborations.
The LHC Computing Grid is meant to be the solution!
A computing system free of any SPF (Single Point of Failure)!
EGEE - the framework
EGEE – Enabling Grids for e-Sciences in Europe
The aim of the EGEE and EGEE-2 projects:
• to develop a service Grid infrastructure which is available to scientists 24 hours a day
The projects concentrate on:
• building a consistent, robust and secure Grid network that will attract additional computing resources
• continuously maintaining and improving the middleware in order to deliver a reliable service to users
• attracting new users from industry as well as science, and ensuring they receive the high standard of training and support they need
EGEE facts
• largest Grid infrastructure project in Europe
• 27 participating countries
• ~70 leading institutions
• ~30 additional contributors
• over 180 sites
• over 30 M Euros of funding per 2 years
http://www.eu-egee.org
Basic elements of LCG
Virtual Organisations
• A grouping of individuals, often not bound to a single institution or enterprise, who, by reason of their common membership of the VO, and in sharing a common goal, are granted rights to use a set of resources on the Grid
Certificates
• Authentication and authorisation are based on X.509 type digital certificates: digital ‘identity cards’ with extensions containing information about the user’s VO membership. They are issued by the Certificate Authorities.
BDII – Information Index
• Collects information from the CEs and publishes it using a special schema (GLUE)
Resource Brokers
• Matchmaking of job requirements with available resources, based on the BDII information
Computing elements
• Master/head node of a local batch system. Interface to the Grid. Publishes resource availability and job status to the Grid’s Information Index
Storage elements
• Disk and/or tape, with a common interface
Catalogs
• Different file location catalogs; logical file pointers independent of physical media and location
Proxy servers
• Extend certificate lifetimes for long running jobs
Working Nodes
• Jobs run here
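The matchmaking role of the Resource Broker described above can be illustrated with a minimal sketch. This is not the actual Resource Broker algorithm or the real GLUE schema: the class names, fields, host names and the ranking rule (most free CPUs first) are all hypothetical, chosen only to show how job requirements might be compared against published resource information.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # Hypothetical subset of what an information index might publish per CE.
    name: str
    free_cpus: int
    max_wall_hours: int
    supported_vos: frozenset


@dataclass
class JobRequirements:
    vo: str
    cpus: int
    wall_hours: int


def match(job, information_index):
    """Return the CEs that satisfy the job, those with most free CPUs first."""
    candidates = [
        ce for ce in information_index
        if job.vo in ce.supported_vos
        and ce.free_cpus >= job.cpus
        and ce.max_wall_hours >= job.wall_hours
    ]
    return sorted(candidates, key=lambda ce: ce.free_cpus, reverse=True)


# Hypothetical snapshot of published resource information.
index = [
    ComputingElement("ce-a.example.org", 12, 48, frozenset({"cms", "hungrid"})),
    ComputingElement("ce-b.example.org", 40, 24, frozenset({"atlas", "cms"})),
    ComputingElement("ce-c.example.org", 3, 72, frozenset({"hungrid"})),
]

job = JobRequirements(vo="cms", cpus=8, wall_hours=36)
matches = match(job, index)
print([ce.name for ce in matches])
```

Here `ce-b.example.org` supports the VO and has enough free CPUs but is rejected because its wall-time limit is too short, which shows why all requirements have to be checked together before ranking.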
Map of LCG sites
• EGEE, OSG, NorduGrid
• ~32,000 processors
• ~10 PB storage
• ~20K jobs running at any time
• ~185 sites
Grid Monitoring - SAM
SAM – Service Availability Monitor
Test jobs are submitted every 3 hours to each site in production. SAM examines the state of the site, publishes the result to a central page, and sends notifications to site admins if necessary.
http://lcg-sam.cern.ch:8443/sam/sam.py
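The monitoring cycle described above (test each site, publish the result, notify on failure) can be sketched as follows. This is not the SAM implementation: `run_round`, the test and notification callbacks, and the site names other than BUDAPEST are hypothetical stand-ins. Only the 3 hour interval comes from the slide.

```python
ROUND_INTERVAL_HOURS = 3  # from the slide: test jobs every 3 hours
rounds_per_day = 24 // ROUND_INTERVAL_HOURS


def run_round(sites, run_test, notify):
    """Run one test round: test every site, collect results, notify on failure."""
    results = {}
    for site in sites:
        ok = run_test(site)
        results[site] = ok  # in a real system this would be published centrally
        if not ok:
            notify(site)    # stand-in for alerting the site admins
    return results


# Hypothetical stand-ins for the test job and the notification channel.
failing_sites = {"SITE-B"}
notified = []

results = run_round(
    ["SITE-A", "SITE-B", "BUDAPEST"],
    run_test=lambda site: site not in failing_sites,
    notify=notified.append,
)
print(rounds_per_day, results, notified)
```

With a 3 hour interval each production site is tested 8 times a day, so a failing site is noticed within a few hours at most.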
Joining the LCG I
The ‘BUDAPEST’ site of the Central Research Institute for Particle and Nuclear Physics (KFKI) was the 6th to join LCG, in June 2003.
Building on our previous Condor cluster experience, at that time we had 25 processors, 1.5 TB of disk storage, and used the Condor batch system.
Now KFKI has ~110 processors, 6.5 TB storage, and supports the following Virtual Organisations:
Alice, Atlas, LHCb, CMS, dteam, ops, HunGrid, Voce, BioMed
Joining the LCG/EGEE II
Past and current activities:
• gLite certification testbed: installing and certifying new versions of the EGEE middleware before they are released
• LHCb data challenge: participation in LHCb’s data challenge (DC04)
• CMS service challenge: BUDAPEST is now recognized as a Tier-2 CMS center
• Alice AliEn grid: dedicated gateway node (VO-box) to run AliEn jobs on the LCG cluster
• BioMed service challenge
• GSVG activities: participation in the Grid Security Working Group. Vulnerability testing, risk estimation.
• User support: providing technical support, mainly for HunGrid users
• Joint EGEE – SEEGRID2 summer school organised by SZTAKI
• Demo cluster and courses at BME
• Presentations, demos, tutorials organised by ELTE
• EGEE ’07 conference will be held in BUDAPEST
The HunGrid Virtual Organisation
http://www.grid.kfki.hu/
The HunGrid Virtual Organisation was created to serve as a general purpose scientific and educational national VO, by
KFKI RMKI
Central Research Institute for Particle and Nuclear Physics
ELTE
Eötvös Loránd University, Faculty of Sciences
Additional partners:
BME, Budapest University of Technology and Economics
NIIF, National Information Infrastructure Development Program
VEIN, University of Pannonia, Faculty of Information Technology
The HunGrid Virtual Organisation
http://pki.kfki.hu
http://www.ca.niif.hu/
KFKI RMKI set up the first EUGridPMA-recognized Certification Authority in Hungary.
EUGridPMA is the European Policy Management Authority for Grid Authentication.
Now the RMKI CA operates as an RA (Registration Authority) and issues certificates for the members of the Institute, while the tasks of the top-level Certificate Authority have been delegated to NIIF.
The HunGrid P-Grade portal
The P-Grade portal, developed at SZTAKI, serves as a graphical user interface to the Grid.
• Built-in graphical workflow editor
• Multi-Grid management
• Resource management
• Quota management
• Workflow-level fault tolerance
• Certificate management
• On-line workflow and parallel job monitoring
• Built-in MDS and BDII based information system management
• Local and remote file handling
• Personalisation
A convenient tool to access and work on the Grid!
http://n42.hpcc.sztaki.hu
ClusterGrid and the LCG
http://www.clustergrid.iif.hu/
The ClusterGrid project is a general purpose Grid project which targets users from the academic and educational sectors.
Put simply, it is in practice a huge collection of Condor pools operating in night-only mode.
• ~1000 computers
• several tens of TB of storage
Setting up an LCG – ClusterGrid gateway is under consideration. Several difficulties remain to be solved, in the hope of a significant improvement of resources and services!
Grid Competence Center
Members of GCC play an outstanding and determining role in the Hungarian Grid R&D projects; they are leaders or participants in the vast majority of such projects, including:
• VISSZKI
• DemoGrid
• SuperGrid
• JGrid
• Chemistry Grid
• Super-Cluster Grid
• HunGrid
• NKFP Grid
http://www.mgkk.hu
Together it is easier to submit successful applications and get more funding.
The formal framework has been created and the first joint applications have been submitted, but much closer collaboration is needed to reach our aims.
HunGrid todos and problems
To do:
• Significant extension of both the participating institutes and the available resources is necessary. A critical ‘Grid mass’ has to be reached for the machinery to work as planned.
• Attracting research groups and, in the longer term, maybe industrial applications
• Demonstrating its advantages and usability
Problems:
• Fundamental financial problems (5 applications out of 6 fail)
• Hard to convince people to change and to use the Grid
• With no users, site admins have no motivation to maintain their sites
How to join HunGrid, contact info:
• HunGrid is OPEN to everybody belonging to the academic community
• Contact e-mail: [email protected], [email protected]
• Web site: http://www.grid.kfki.hu/