The LHC Computing Grid - KFKI · The data factory of LHC • 40 million collisions in each second...

The LHC Computing Grid Gergely Debreczeni (CERN IT/Grid Deployment Group)



The data factory of LHC

• 40 million collisions every second
• After on-line triggers and selections, only 100 per second remain
• 3-4 MB/event requires a recording speed greater than 1 GB/s
• More than 10 billion collisions a year yield a data flow of 10 PB/year
• Additional Monte Carlo simulations

To compare:

o 1 TB corresponds approximately to all the books produced around the world in one year
o 1 EB is the amount of information generated around the world in one year

Requires ~100,000 of today’s fastest PCs
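The data-rate figures on this slide can be sanity-checked with a short back-of-envelope calculation. The sketch below assumes the simplest possible model (volume = number of events × event size, in decimal units); the function names are illustrative only. The totals quoted on the slide may include other factors, such as several experiments or derived data, so they need not match this simple product exactly.

```python
# Back-of-envelope data-volume estimate.
# Assumption: volume = events x event size, nothing else (no compression,
# no derived data, a single experiment). Decimal units throughout.

MB = 10**6   # bytes
GB = 10**9
PB = 10**15


def recording_rate_bytes_per_s(events_per_s, event_size_mb):
    """Sustained recording rate for a given trigger output rate and event size."""
    return events_per_s * event_size_mb * MB


def yearly_volume_bytes(events_per_year, event_size_mb):
    """Raw data volume accumulated over one year."""
    return events_per_year * event_size_mb * MB


if __name__ == "__main__":
    # 100 events/s after triggers and 10 billion events/year are the slide's
    # figures; the 1 MB row is added only for comparison.
    for size_mb in (1, 3, 4):
        rate = recording_rate_bytes_per_s(100, size_mb) / GB
        volume = yearly_volume_bytes(10**10, size_mb) / PB
        print(f"{size_mb} MB/event: {rate:.1f} GB/s, {volume:.0f} PB/year")
```

Keeping the event size as a parameter makes it easy to see how strongly the yearly volume depends on it.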


Why do we need it?

The new collider and its detectors will generate an enormous amount of data.

No single supercomputer will be able to handle the data! A reliable, permanent, fault-tolerant, flexible and distributed computing environment is needed to meet the requirements of the new experiments and of their geographically highly distributed collaborations.

The LHC Computing Grid is meant to be the solution!

A computing system free of any Single Point of Failure (SPF)!


EGEE - the framework

EGEE – Enabling Grids for e-Sciences in Europe

The aim of the EGEE and EGEE-2 projects:

• to develop a service Grid infrastructure which is available to scientists 24 hours a day

The projects concentrate on:

• building a consistent, robust and secure Grid network that will attract additional computing resources
• continuously maintaining and improving the middleware in order to deliver a reliable service to users
• attracting new users from industry as well as science, and ensuring they receive the high standard of training and support they need

EGEE facts

• largest Grid infrastructure project in Europe
• 27 participating countries
• ~70 leading institutions
• ~30 additional contributors
• over 180 sites
• over 30 M euros of funding per 2 years

http://www.eu-egee.org


Basic elements of LCG

Virtual Organisations
• A grouping of individuals, often not bound to a single institution or enterprise, who, by reason of their common membership of the VO and in sharing a common goal, are granted rights to use a set of resources on the Grid

Certificates
• Authentication and authorisation are based on X.509 digital certificates: digital ‘identity cards’ with extensions containing information about the user’s VO membership, issued by the Certificate Authorities.

• BDII – Information Index: collects information from the CEs (Computing Elements) and publishes it using a special schema (GLUE)
• Resource Brokers: matchmaking of job requirements with available resources, based on the BDII information
• Computing elements: master/head node of a local batch system and its interface to the Grid; publish resource availability and job status to the Grid’s II (Information Index)
• Storage elements: disk and/or tape behind a common interface
• Catalogs: different file location catalogs; logical file pointers independent of physical media and location
• Proxy servers: extend certificate lifetimes for long-running jobs
• Worker nodes: jobs run here
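The Resource Broker's matchmaking step described on this slide can be illustrated with a minimal sketch. This is not the LCG middleware or its API: the record fields, the class names, and the ranking rule (prefer the CE with the most free CPUs) are all hypothetical choices made for illustration, and the real information index publishes far richer GLUE attributes.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ComputingElement:
    """One CE record as an information index might publish it (hypothetical fields)."""
    name: str
    supported_vos: Set[str]
    free_cpus: int
    max_wall_time_min: int


@dataclass
class JobRequirements:
    """What a job asks for (hypothetical fields)."""
    vo: str
    cpus: int
    wall_time_min: int


def matches(ce: ComputingElement, job: JobRequirements) -> bool:
    """A CE is eligible if it supports the job's VO and has enough capacity."""
    return (
        job.vo in ce.supported_vos
        and ce.free_cpus >= job.cpus
        and ce.max_wall_time_min >= job.wall_time_min
    )


def select_ce(index: List[ComputingElement],
              job: JobRequirements) -> Optional[ComputingElement]:
    """Pick one eligible CE; ranking by free CPUs is an assumed policy."""
    candidates = [ce for ce in index if matches(ce, job)]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


if __name__ == "__main__":
    # Made-up CE names and numbers, for illustration only.
    index = [
        ComputingElement("ce-a.example.org", {"cms", "hungrid"}, 4, 2880),
        ComputingElement("ce-b.example.org", {"hungrid"}, 20, 720),
    ]
    job = JobRequirements(vo="hungrid", cpus=8, wall_time_min=600)
    chosen = select_ce(index, job)
    print(chosen.name if chosen else "no matching resource")
```

Separating the eligibility test (`matches`) from the ranking (`select_ce`) mirrors the split between requirements and preferences, so either can be changed without touching the other.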


Map of LCG sites

• EGEE, OSG, NorduGrid
• ~32,000 processors
• ~10 PB storage
• ~20K running jobs at any time
• ~185 sites


Grid Monitoring - SAM

SAM – Service Availability Monitor

Test jobs are submitted every 3 hours to each site in production. SAM examines the state of the site, publishes the results to a central page, and sends notifications to the site admins if necessary.

http://lcg-sam.cern.ch:8443/sam/sam.py
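The monitoring cycle described above (periodic test jobs, then notifications where needed) can be sketched as follows. This is not SAM's actual code or interface: the site names, test names, and the policy of notifying a site whenever any single test fails are assumptions made for illustration. Only the 3-hour interval comes from the slide.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Interval between test-job submissions, as stated on the slide.
TEST_INTERVAL = timedelta(hours=3)


def next_submission(last_run: datetime) -> datetime:
    """Time at which the next round of test jobs is due."""
    return last_run + TEST_INTERVAL


def sites_to_notify(results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the sites with at least one failed test, in sorted order.

    `results` maps a site name to {test name: passed?}. Notifying on any
    single failure is an assumed policy, not SAM's documented behaviour.
    """
    return sorted(
        site for site, tests in results.items() if not all(tests.values())
    )


if __name__ == "__main__":
    # Hypothetical site and test names.
    results = {
        "SITE-A": {"job-submission": True, "replica-management": True},
        "SITE-B": {"job-submission": False, "replica-management": True},
    }
    print("notify:", sites_to_notify(results))
    print("next run:", next_submission(datetime(2007, 1, 1, 0, 0)))
```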


Joining the LCG I

The ‘BUDAPEST’ site of the Central Research Institute for Particle and Nuclear Physics (KFKI) was the 6th to join LCG, in June 2003.

Building on our previous Condor cluster experience, at that time we had 25 processors and 1.5 TB of disk storage, and used the Condor batch system.

Now KFKI has ~110 processors, 6.5 TB storage, and supports the following Virtual Organisations:

Alice, Atlas, LHCb, CMS, dteam, ops, HunGrid, Voce, BioMed


Joining the LCG/EGEE II

Past and current activities:

• gLite certification testbed: installing and certifying new versions of the EGEE middleware before they are released
• LHCb data challenge: participation in LHCb’s data challenge (DC04)
• CMS service challenge: BUDAPEST is now recognized as a Tier-2 CMS center
• Alice ALIEN grid: dedicated gateway node (VO-box) to run Alien jobs on the LCG cluster
• BioMed service challenge
• GSVG activities: participation in the Grid Security Working Group; vulnerability testing, risk estimation
• User support: providing technical support, mainly for HunGrid users

• Joint EGEE – SEEGRID2 summer school organized by SZTAKI

• Demo cluster and courses at BME

• Presentations, demos, tutorials organised by ELTE

• The EGEE ’07 conference will be held in Budapest


The HunGrid Virtual Organisation

http://www.grid.kfki.hu/

The HunGrid Virtual Organisation was created to serve as a general-purpose scientific and educational national VO by:

KFKI RMKI, Central Research Institute for Particle and Nuclear Physics

ELTE, Eötvös Loránd University, Faculty of Sciences


The HunGrid Virtual Organisation

Additional partners:

BME, Budapest University of Technology and Economics

NIIF, National Information Infrastructure Development Program

VEIN, University of Pannonia, Faculty of Information Technology


The HunGrid Virtual Organisation

KFKI RMKI set up the first EUGridPMA-recognized Certification Authority in Hungary.

EUGridPMA is the European Policy Management Authority for Grid Authentication.

Now the RMKI CA operates as an RA (Registration Authority) and issues certificates for the members of the Institute...

http://pki.kfki.hu

...while the tasks of the top-level Certificate Authority have been delegated to NIIF.

http://www.ca.niif.hu/


The HunGrid P-Grade portal

The P-Grade portal, developed at SZTAKI, serves as a graphical user interface to the Grid.

• Built-in graphical workflow editor
• Multi-Grid management
• Resource management
• Quota management
• Workflow-level fault tolerance
• Certificate management
• On-line workflow and parallel job monitoring
• Built-in MDS and BDII based information system management
• Local and remote file handling
• Personalisation

A convenient tool to access and work on the Grid!

http://n42.hpcc.sztaki.hu


ClusterGrid and the LCG

http://www.clustergrid.iif.hu/

The ClusterGrid project is a general-purpose Grid project which targets users from the academic and educational communities.

Put simply, it is in practice a huge collection of Condor pools operating in night-only mode.

• ~1000 computers
• several tens of TB of storage

Setting up an LCG – ClusterGrid gateway is under consideration. Several difficulties remain to be solved, in the hope of a significant improvement in resources and services!


Grid Competence Center

Members of GCC play an outstanding and determining role in the Hungarian Grid R&D projects; they are leaders or participants in the vast majority of such projects, including:

• VISSZKI
• DemoGrid
• SuperGrid
• JGrid
• Chemistry Grid
• Super-Cluster Grid
• HunGrid
• NKFP Grid

http://www.mgkk.hu

Together it is easier to submit successful applications and get more funding.

The formal framework has been created and the first joint applications have been submitted, but much closer collaboration is needed to reach our aims.


HunGrid todos and problems

To do:

• A significant increase in both participating institutes and available resources is necessary. A critical ‘Grid mass’ must be reached for the machinery to work as planned/expected.
• Attracting research groups and maybe industrial applications (in the longer term)
• Demonstrating its advantages and usability

Problems:

• Fundamental financial problems (5 applications out of 6 fail)
• Hard to convince people to change and to use the Grid
• With no users, site admins have no motivation to maintain their sites

How to join HunGrid, contact info:

• The HunGrid is OPEN for everybody belonging to the academic community
• Contact e-mail: [email protected], [email protected]
• Web site: http://www.grid.kfki.hu/