Transcript of "The ALICE GRID operation"

Source: grid2010.jinr.ru/files/pdf/GRID2010_shabratova_plenary_mody.pdf

Page 1:

The ALICE GRID operation

Galina Shabratova

Joint Institute for Nuclear Research

(on behalf of ALICE)

28 June 2010, GRID 2010, Dubna, Russia

Page 2:

Outline

ALICE job submission

ALICE services required at the sites

Storage discovery in AliEn

New items

Interactions between sites and interoperability

Alice Analysis Facility

Summary and Conclusions


Page 3:

ALICE workload management

Job agent infrastructure: agents are submitted through the Grid infrastructure available at the site (gLite, ARC, OSG)

Real jobs are held in a central queue handling priorities and quotas

Agents are submitted to provide a standard environment (job wrapper) across the different systems and to check the local environment

Real jobs are pulled by the sites (a minimal sketch of this pull model is given below)

Automated operations

Extensive monitoring
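To picture the pull model described above, here is a hypothetical Python sketch of a job agent: it checks the local environment, then repeatedly asks the central task queue for a matching real job. The endpoint and the JSON fields are invented for illustration and do not reflect the actual AliEn interfaces.

```python
# Hypothetical sketch of the job-agent pull model (not the actual AliEn code).
import json
import platform
import time
import urllib.request

TASK_QUEUE_URL = "https://central-queue.example.org/next-job"  # invented endpoint

def local_environment():
    """Collect a few facts about the worker node the agent landed on."""
    return {"os": platform.platform(), "arch": platform.machine()}

def request_job(env):
    """Ask the central queue for a real job matching this environment."""
    req = urllib.request.Request(
        TASK_QUEUE_URL,
        data=json.dumps(env).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # e.g. {"id": ..., "command": ...} or {}

def run_agent():
    env = local_environment()      # the agent checks the local environment
    while True:
        job = request_job(env)     # real jobs are *pulled* by the site
        if not job:
            break                  # nothing to run: the agent exits
        print("running job", job.get("id"))
        # ... run the payload inside the standard wrapper environment ...
        time.sleep(1)

if __name__ == "__main__":
    run_agent()
```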


Page 4:

WLCG Services of ALICE sites


VOBOX

With direct access to the experiment software area

This area must also be mounted on all available WNs

CE front-end submitting directly to the local batch system

gLite sites have to provide the CREAM-CE (Computing Resource Execution And Management) in version 1.6

A specific ALICE queue behind the CREAM-CE

The resource BDII must publish information about this service

Xrootd-based storage system

CPU requirements

2 GB of memory per job for MC and reconstruction jobs

Optimization needed for some user analysis jobs

All services must run at least SL5 (64 bit)

Page 5:

The gLite3.2-VOBOX layer

The gLite-VOBOX is a gLite-UI with two added features:

Automatic renewal of the user proxy (long-lived proxy stored in myproxy.cern.ch)

Direct access to the experiment software area (access restricted to the VOMS role lcgadmin)

SINGLE local account

A unique service that does not accept pool accounts

Access to the machine is based on a valid VOMS-aware user proxy

gsisshd service on port 1975


Page 6:

Example: the gLite-VOBOX

70% of the ALICE VOBOXes are running the gLite middleware

Operating system (SL5/64 bit) and generic service (gLite3.2 VOBOX): site admin responsibility

• Root privileges
• Installation, maintenance, upgrades and follow-up of the OS and the generic service

ALICE services: ALICE responsibility

• User privileges (gsissh)
• Installation, maintenance, upgrades and follow-up of the ALICE services

[Diagram: ALICE services on the VOBOX with their ports and clients: ClusterMonitor (8084), CE (talks through the CM), MonALISA (8884), PackMan (9991), gsisshd (1975), vobox-renewd, gridftp (obsolete); the clients include the outside world, the WNs, lxplus@CERN and the SAM UIs, a specific CERN IP range, and the myproxy and VOMS servers]


Page 7:

gLite-VOBOX: the proxy renewal mechanism

Service area: the renewal DAEMON

1. Connects to myproxy@CERN
2. Connects to the ALICE VOMS server
3. Builds up a delegated proxy from the registered user proxies

User area

• A $HOME area is assigned
• The user must manually register his proxy with the VOBOX (in principle only once…)

[Diagram: the user logs in via gsissh to the VOBOX from a UI with a valid proxy; the daemon requires a delegated plain proxy from myproxy@CERN (the user must have uploaded his proxy beforehand) and a valid VOMS extension from the VOMS server]
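A minimal Python sketch of the renewal loop the daemon performs, following the three steps above. The external tools are only named in comments (myproxy-logon to fetch a fresh proxy from myproxy.cern.ch, voms-proxy-init to add the ALICE VOMS extension); their exact options are not shown, and the proxy path and thresholds are invented for illustration.

```python
# Hypothetical sketch of the proxy renewal daemon; not the real vobox-renewd.
import subprocess
import time

RENEW_BELOW_HOURS = 24      # invented threshold: renew when less than a day is left
CHECK_EVERY_SECONDS = 3600

def remaining_lifetime_hours(proxy_path):
    """Return the remaining lifetime of the delegated proxy, in hours."""
    # grid-proxy-info -file <proxy> -timeleft prints the remaining seconds
    out = subprocess.run(
        ["grid-proxy-info", "-file", proxy_path, "-timeleft"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip()) / 3600.0

def renew(proxy_path):
    """Steps 1-3 of the slide, expressed as external tool invocations."""
    # 1. fetch a fresh short-lived proxy from the long-lived one in myproxy@CERN
    #    (e.g. with myproxy-logon against myproxy.cern.ch)
    # 2. contact the ALICE VOMS server to add the VOMS extension
    #    (e.g. with voms-proxy-init for the alice VO)
    # 3. the result is the delegated proxy used by the ALICE services
    raise NotImplementedError("site/VO specific commands go here")

def daemon(proxy_path="/var/lib/vobox/alice/proxy"):   # invented path
    while True:
        if remaining_lifetime_hours(proxy_path) < RENEW_BELOW_HOURS:
            renew(proxy_path)
        time.sleep(CHECK_EVERY_SECONDS)
```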


Page 8:

gLite-VOBOX operations

Site side

Minimal!

One of the most stable services

Easy installation and configuration

Experiment support side

Sites are supported at any moment by the CERN team

Remember: the gLite3.2 VOBOX was created by the ALICE support team in collaboration with the WLCG

All VOBOXes were monitored via SAM every hour; today there is a migration to Nagios

T0 VOBOXes are monitored via Lemon, with an alarm system behind

T1 sites can be notified via GGUS alarm tickets

T2 sites will be notified on a best-effort basis


Page 9:

ALICE Workload Management system

The ALICE tendency in the last two years has been to minimize the number of services used to submit job agents

Already ensured with ARC, OSG and AliEn native sites

The same approach is followed on gLite sites, through the CREAM-CE system

ALICE has been asking for the deployment of this service at all sites since 2008

ALICE has reduced the use of the gLite3.X WMS with the deployment of AliEn v2.18

The WMS submission module is still available in AliEn v2.18

31 May is the deadline established by ALICE to deprecate the usage of the WMS at all sites

All ALICE sites are encouraged to migrate to CREAM 1.6 as soon as possible


Page 10:

ALICE and the CREAM-CE

ALICE was the first to put the CREAM-CE in production and provided the developers with important feedback

2008 approach: testing

Testing phase (performance and stability) at FZK

2009 approach: AliEn implementation and distribution

System available at the T0, at all T1s (but NIKHEF) and at several T2 sites

Dual submission (LCG-CE and CREAM-CE) at all sites providing CREAM

A second VOBOX was required at the sites providing CREAM to ensure the dual-submission approach asked for by the WLCG-GDB

2010 approach: deprecation of the WMS

The 2nd VOBOX provided in 2009 has been recovered through the failover mechanism included in AliEn v2.18

Both VOBOXes submit in parallel to the same local CREAM-CE

ALICE is involved in the operation of the service at all sites, together with the site admins and the CREAM-CE developers


Page 11:

ALICE experiences with the CREAM-CE

Sites' experience

Most of the site admins confirmed that a negligible effort is needed by the system

With the increase of the ALICE load in the last months, the stability of the system has been confirmed

[Plots: CREAM-CE versus LCG-CE operation at CNAF, JINR and Troitsk]

Four months of ALICE data taking have shown more efficient operation of the CREAM-CE in comparison with the LCG-CE services

Page 12:

ALICE Data Management system

Data access and file transfer: via xrootd

Data access

T0 and T1 sites: interfaced with the MSS

T2 sites: xrootd (strongly encouraged) or DPM with the xrootd plugin

File transfer

T0-T1: raw data transfer for custodial and reconstruction purposes

Data volumes proportional to the T1's ALICE share

T1-T2: reconstructed data and analysis

Default behavior: jobs are sent where the data are physically stored

External files can be copied locally or accessed remotely, based on the user's decision

Outputs of the jobs are stored locally and replicated to (by default) two external sites

The choice of storage is based on availability and on a proximity matrix (dynamic)


Page 13:

Storage discovery in AliEn


The save directive determines only the number of replicas and the QoS type (disk, tape)

How to efficiently write N replicas of a file?

From jobs running inside the sites

By a user running his software on a laptop while at home (the same case as for a worker running somewhere in the clouds)

Then, how to efficiently read the data when N replicas are available?

In the end this is just a variation of the data-locality problem

[Diagram: site structure]

Page 14:

Storage status


SEs failing the functionality tests are removed from the storage matrix

Functionality tests are executed every 2 hours: add, get and remove of a test file from a remote location

MonALISA (ML) monitoring of the storage elements
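The add/get/remove cycle could look roughly like the following Python sketch, which shells out to xrdcp for the add and get steps; the SE endpoint and test path are invented, and the removal step is left as a comment because the exact command used by the real test suite is not shown in the slides.

```python
# Hypothetical sketch of a storage-element functionality test (add / get / remove).
import os
import subprocess
import tempfile
import uuid

def test_storage_element(se_url):
    """Write, read back and delete a small test file on the given xrootd SE.

    se_url is e.g. "root://some-se.example.org:1094" (invented endpoint).
    Returns True when the add and get steps succeed and the data matches.
    """
    token = uuid.uuid4().hex
    remote = f"{se_url}//alice/test/{token}"          # invented test path
    with tempfile.TemporaryDirectory() as tmp:
        local = os.path.join(tmp, "probe.dat")
        back = os.path.join(tmp, "probe_back.dat")
        with open(local, "wb") as f:
            f.write(os.urandom(1024))
        try:
            subprocess.run(["xrdcp", local, remote], check=True)   # add
            subprocess.run(["xrdcp", remote, back], check=True)    # get
            with open(local, "rb") as a, open(back, "rb") as b:
                ok = a.read() == b.read()
            # remove: the deletion command depends on the xrootd version
            # deployed at the site, so it is left out of this sketch.
            return ok
        except subprocess.CalledProcessError:
            return False
```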

Page 15:


Discovered and derived network topology

Each SE is associated with a set of IP addresses: the VOBox IP and the IPs of the xrootd redirector and nodes

MonALISA performs tracepath/traceroute between all VOBoxes, recording all routers and the RTT (Round Trip Time) of each link, the SE nodes' status and the bandwidth between sites, and grouping the routers in their respective Autonomous Systems (AS)

Distance(IP, IP): a hierarchy of criteria: same C-class network, common domain, same AS, same country, use of the known distance between the ASes, same continent, etc.

Distance(IP, Set<IP>): the client's public IP against all known IPs of the storage

[Examples]
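The distance hierarchy can be pictured with the following Python sketch: a pair of addresses is scored by the most specific property it shares, and the distance from a client to a storage element is the best score over all IPs known for that SE. The scoring values and the helper predicates are invented placeholders; the real MonALISA implementation also folds in the measured RTTs and AS-level distances.

```python
# Hypothetical sketch of the Distance(IP, IP) / Distance(IP, Set<IP>) hierarchy.
# The helper predicates are placeholders for lookups MonALISA derives from
# traceroutes, DNS and AS databases.

def same_c_class(a, b):        # same /24 network
    return a.split(".")[:3] == b.split(".")[:3]

def same_domain(a, b):         # placeholder: reverse-DNS domain comparison
    return False

def same_as(a, b):             # placeholder: same Autonomous System
    return False

def same_country(a, b):        # placeholder: GeoIP country comparison
    return False

def same_continent(a, b):      # placeholder
    return False

def distance(a, b):
    """Smaller is closer; the ordering mirrors the hierarchy on the slide."""
    if same_c_class(a, b):
        return 0
    if same_domain(a, b):
        return 1
    if same_as(a, b):
        return 2
    if same_country(a, b):
        return 3
    # here the known distance between the two ASes would be used, when available
    if same_continent(a, b):
        return 5
    return 10                  # different continents / nothing known

def distance_to_storage(client_ip, storage_ips):
    """Distance(IP, Set<IP>): client's public IP against all IPs known for the SE."""
    return min(distance(client_ip, ip) for ip in storage_ips)
```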

Page 16:

Storage discovery in AliEn: results

Flexible storage configuration

QoS (Quality of Service) tags are all that users should know about the system

We can store N replicas at once

Maintenance-free system

Monitoring feedback on the known elements, automatic discovery and configuration of new resources

Reliable and efficient file access

No more failed jobs, thanks to auto-discovery and failover in case of temporary problems

The closest working storage element(s) to where the application runs are used
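Putting the monitoring status and the distances together, the replica selection and the failover behaviour could be sketched as follows. This is a hedged illustration: the function names and data shapes are invented, and it reuses distance_to_storage from the sketch on the previous page.

```python
# Hypothetical sketch of storage selection with failover (not the AliEn code).

def rank_storage_elements(client_ip, storage_elements):
    """Order the SEs that currently pass the functionality tests by distance.

    storage_elements: mapping of SE name -> {"ips": [...], "working": bool}.
    """
    working = {name: se for name, se in storage_elements.items() if se["working"]}
    return sorted(
        working,
        key=lambda name: distance_to_storage(client_ip, working[name]["ips"]),
    )

def read_with_failover(client_ip, replicas, storage_elements, open_replica):
    """Try the replicas starting from the closest working SE; fall back on errors.

    replicas: mapping of SE name -> replica URL; open_replica: callable doing the I/O.
    """
    for name in rank_storage_elements(client_ip, storage_elements):
        if name not in replicas:
            continue
        try:
            return open_replica(replicas[name])   # closest working copy first
        except OSError:
            continue                              # temporary problem: try the next SE
    raise RuntimeError("no working replica could be read")
```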

Page 17:

Sites connection and interoperability

Sites connection

Only at the data management level

In terms of job submission and services, sites are independent of each other

The ALICE computing model foresees the same service infrastructure at all sites

ALICE assumes the T1-T2 connection established by the ROC infrastructure for site control and issue escalation

Interoperability between sites

Defined at the AliEn level

Definition of plug-ins for each middleware stack

Same AliEn setup for all sites


Page 18:

New items: Nagios Migration

All experiments have to migrate their current SAM infrastructure for WLCG services monitoring and tracking to Nagios

The infrastructure is available at CERN and is responsible for the whole WLCG service infrastructure

ALICE has defined specific tests for the VOBOX only

Tests for the CE and the CREAM-CE sensors also run with ALICE credentials

The test suite belongs to the ops VO

The site availability calculation is performed on the basis of the CE sensor results ONLY

The important point was to migrate the VOBOX-specific tests into Nagios

• DONE!


Page 19:

Nagios:: Steps performed

Migration of the test suite

Upload of the ALICE software to the specific Nagios repository

Build of the software in the development and testing Nagios infrastructure

RPM installed on the Nagios ALICE UI

Population of the configuration files with the ALICE ROCs: CERN, Central Europe, Italy, NGI_FRANCE, UKI, Russia

Start-up of the service

ALICE has been publishing its results on the Nagios web interface for the last 48 h


Page 20:

Nagios:: example

https://sam-alice.cern.ch/nagios/

(Left-hand menu) Service Groups: Summary

Service: VO-box

Page 21:

Nagios:: Next Steps

ALL SITES MUST PUBLISH THEIR VOBOXES IN THE GOCDB

Monitoring flag set to YES

The ops test suites for the CE and the CREAM-CE will be monitored from Nagios but executed with ALICE credentials


Page 22:

New items: gLExec

Motivation

Sites in the Grid run jobs on behalf of ALICE users. Currently, all jobs run as one system user at each site.

Sites want to see the jobs running as the users the jobs belong to, for accountability and traceability reasons, which let sites ban or restrain certain users. Currently, they could only affect the whole VO.

How to run a job as the user who owns it? A site knows the ALICE VO, but not all its users. How to ensure a job runs as the right user?

gLExec is a program that can switch users, just like su or sudo

It takes the command to run (a job) and a proxy certificate.

The proxy certificate is mapped to a local user, based on rules.

The command is executed as the mapped user

nothing more!
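As a hedged sketch of how a pilot could hand a payload to gLExec from Python: the proxy of the real job owner is exposed to gLExec through environment variables and the payload command is passed as arguments. The environment variable names and the gLExec path follow the commonly documented setup, but they are assumptions here and should be checked against the gLExec deployment at the site.

```python
# Hypothetical sketch of invoking gLExec from a pilot job. Verify the variable
# names and path against the gLExec documentation deployed at your site.
import os
import subprocess

GLEXEC = "/usr/sbin/glexec"                  # typical location; site dependent

def run_as_owner(user_proxy, payload_cmd):
    """Run payload_cmd as the local account the user's proxy is mapped to."""
    env = dict(os.environ)
    # proxy used by gLExec to identify and map the real job owner
    env["GLEXEC_CLIENT_CERT"] = user_proxy
    # proxy that gLExec copies into the target account for the payload to use
    env["GLEXEC_SOURCE_PROXY"] = user_proxy
    # gLExec maps the proxy to a local user and executes the command as that
    # user -- nothing more, much like su/sudo with certificate-based rules
    return subprocess.run([GLEXEC] + payload_cmd, env=env, check=True)

# e.g. run_as_owner("/tmp/payload_proxy.pem", ["/bin/sh", "job-wrapper.sh"])
```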


Page 23:

AliEn job execution

[Diagrams: present status and job execution after the application of gLExec]

Page 24:

Alice Analysis Facility - AAF


The idea of a common cluster of ALICE PROOF facilities is based on a unique protocol and on storage management on the Grid

Data is proxied locally, from the 91 AliEn sites, to adequately feed PROOF

[Diagram: the ALICE Global Redirector and the ALICE Super Master serving the PROOF facilities SKAF, CAF, LAF and JRAF]

Page 25:

PROOF cluster storage

Take a PROOF cluster with xrootd storage

Use the xrootd data management plug-in

Transform your AF into a proxy of the ALICE globalized storage, through the ALICE GR

Also support sites not seen by the GR, through internal dynamic prioritization of the AliEn sites

Data management: how does the data appear?

Manual (or supervised) pre-staging requests, or:

Suppose that a user always runs the same analysis several times (which is almost always true)

The first round will be as fast as the remote data staging; the subsequent queries will be fast (the data is already local)
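The "first round slow, later rounds fast" behaviour amounts to a staging cache in front of the global redirector. A minimal Python sketch of that idea follows; the cache path and the fetch callable are invented for illustration.

```python
# Hypothetical sketch of the AF acting as a local cache/proxy for remote data.
import os

CACHE_DIR = "/pool/aaf-cache"                # invented local pool path

def stage(path, fetch_remote):
    """Return a local copy of `path`, fetching it through the global
    redirector only the first time it is requested."""
    local = os.path.join(CACHE_DIR, path.lstrip("/"))
    if not os.path.exists(local):            # first query: as slow as remote staging
        os.makedirs(os.path.dirname(local), exist_ok=True)
        fetch_remote(path, local)            # e.g. a copy from the ALICE GR
    return local                             # later queries: data is already local
```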


Page 26:

Alice Analysis Facility Example


Page 27:

Summary and conclusions (1/2)

For the data taking, the ALICE approach is to lighten the site requirements in terms of services

This approach is quite realistic and has been demonstrated to work quite well

It scales to the level of manpower available in the core team at CERN and at the sites

High performance of the VOBOXes, the CREAM-CE and xrootd is needed

This is critical: if any of these services fails, your site is down

With the exception of the T0 at CERN, there is built-in redundancy

Site malfunctions will not stop the ALICE processing

But they will slow it down considerably!

Sites are supported by a team of experts at any time

This approach has demonstrated its scalability

The same team for years, with a clear increase in the number of sites entering production


Page 28:


Summary and conclusions (2/2)

During the first data taking, the ALICE computing model and its infrastructure have demonstrated a high level of stability and performance.

It works!

Page 29:

Extra slides

Page 30:

The VOBOX

1. Primary service required at every site
- Service scope:
  * Run (local) specific ALICE services
  * File system access to the experiment software area
- Mandatory requirement to enter production
  * Independently of the middleware stack available
  * Same concept valid for gLite, OSG, ARC and generic AliEn sites

2. Same setup requirements for all sites
- Same behavior for T1 and T2 sites in terms of production
- Differences between T1 and T2 sites are a QoS matter only
  * Applicable to ALL generic services provided at the sites

3. Service-related problems should be managed by the site administrators

4. Support ensured at CERN for any of the middleware stacks


Page 31:

Failover mechanism: the use of the 2nd VOBOX

This approach has been included in AliEn v2.18 to take advantage of the 2nd VOBOX deployed at several ALICE sites

All sites which provided a CREAM-CE in 2009 had to provide a 2nd VOBOX for ALICE

It was used to submit to the LCG-CE and the CREAM-CE back-ends in parallel

Message to the sites: do not retire the 2nd VOBOX; we will keep using it in failover mode together with the 1st VOBOX
