The ALICE GRID operation
Galina Shabratova
Joint Institute for Nuclear Research
(on behalf of ALICE)
28 June 2010, GRID 2010, Dubna, Russia
Outline
ALICE job submission
ALICE services required at the sites
Storage discovery in AliEn
New items
Interactions between sites and interoperability
Alice Analysis Facility
Summary and Conclusions
ALICE workload management
Job agent infrastructure
Agents are submitted through the Grid infrastructure available at the site (gLite, ARC, OSG)
Real jobs are held in a central queue handling priorities and quotas
Agents are submitted to provide a standard environment (job wrapper) across different systems and to check the local environment
Real jobs are pulled by the sites
Automated operations
Extensive monitoring
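The pull model above can be sketched in a few lines of Python. This is a hypothetical illustration, not AliEn code: the queue class, the job fields and the resource check are invented for the example; it only shows how agents pull the highest-priority job their local environment can satisfy.

```python
# Illustrative sketch of the AliEn pull model: job agents carry no specific
# job; they check the local environment and then pull a matching real job
# from the central priority queue (lower number = higher priority).
import heapq

class CentralQueue:
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker to keep FIFO order within a priority

    def push(self, priority, job):
        heapq.heappush(self._heap, (priority, self._counter, job))
        self._counter += 1

    def pull(self, site_resources):
        """Hand out the highest-priority job whose requirements the site meets."""
        skipped, match = [], None
        while self._heap:
            prio, seq, job = heapq.heappop(self._heap)
            if job["memory_gb"] <= site_resources["memory_gb"]:
                match = job
                break
            skipped.append((prio, seq, job))
        for entry in skipped:          # put back the jobs we could not run
            heapq.heappush(self._heap, entry)
        return match

queue = CentralQueue()
queue.push(1, {"name": "user-analysis", "memory_gb": 4})  # high priority, heavy
queue.push(2, {"name": "MC-production", "memory_gb": 2})

# An agent on a WN offering 2 GB/job skips the heavy job and pulls the MC one:
job = queue.pull({"memory_gb": 2})
print(job["name"])  # MC-production
```

The point of the sketch is the inversion of control: the site never receives a job push; the agent advertises what it can run and the central queue decides.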
WLCG services at ALICE sites
VOBOX
With direct access to the experiment software area
This area has to be mounted on all available WNs as well
CE front-end submitting directly to the local batch system
gLite sites have to provide the CREAM-CE (Computing Resource Execution And Management), version 1.6
A specific ALICE queue behind the CREAM-CE
The resource BDII must provide information about this service
Xrootd-based storage system
CPU requirements
2 GB of memory per job for MC and reconstruction jobs
Optimization needed for some user analysis jobs
All services must run at least SL5 (64-bit)
The gLite3.2-VOBOX layer
The gLite-VOBOX is a gLite-UI with 2 added features:
Automatic renewal of the user proxy
Long-lived proxy in myproxy.cern.ch
Direct access to the experiment software area
Access restricted to the voms role: lcgadmin
SINGLE local account
Unique service not accepting pool accounts
Access to the machine based on a valid voms-aware user proxy
gsisshd service on port 1975
Example: the gLite-VOBOX
70% of the ALICE VOBOXES are running the gLite middleware
ALICE Services
Operating system (SL5/64b)
GENERIC SERVICE: gLite3.2 VOBOX
Site admin responsibility: root privileges; installation, maintenance, upgrades and follow-up of the OS and the generic service
ALICE responsibility: user privileges (gsissh); installation, maintenance, upgrades and follow-up of the ALICE services
[Diagram: services on the VOBOX and their connectivity — ClusterMonitor (port 8084; the CE talks through it), MonALISA (8884), PackMan (9991), gsisshd (1975, reachable from lxplus@CERN and the SAM UIs), vobox-renewd (connecting to the myproxy and voms servers), and the now obsolete GridFTP; exposure ranges from world and WN access to a specific CERN IP range]
gLite-VOBOX: the proxy renewal mechanism
Service area: the renewal daemon
1. Connects to myproxy@CERN
2. Connects to the ALICE voms server
3. Builds a delegated proxy from the registered user proxies
User area
A $HOME area is assigned
The user must manually register his proxy with the VOBOX (in principle only once…)
Login via gsissh to the VOBOX from a UI with a valid proxy
The daemon requires a delegated plain proxy (the user must have uploaded his proxy beforehand) and a valid voms extension
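The renewal cycle above can be summarized as a small decision function. This is a hedged sketch of the logic only (not the real vobox-renewd code); the 6-hour threshold and the return strings are assumptions made for the example.

```python
# Illustrative sketch of the proxy renewal decision: while the user's
# long-lived proxy in myproxy is still valid, the daemon keeps rebuilding
# a short delegated proxy with voms extensions before it expires.
THRESHOLD_H = 6  # assumed: renew when fewer hours than this remain

def renew(myproxy_hours_left, delegated_hours_left):
    """Return the action the daemon would take in one check cycle."""
    if myproxy_hours_left <= 0:
        # The long-lived proxy expired: automatic renewal is impossible,
        # the user must register a fresh proxy with the VOBOX.
        return "ask user to re-register proxy"
    if delegated_hours_left < THRESHOLD_H:
        # 1) contact myproxy@CERN, 2) contact the ALICE voms server,
        # 3) build a fresh delegated proxy from the registered user proxy.
        return "renewed delegated proxy"
    return "nothing to do"

print(renew(500, 2))   # renewed delegated proxy
print(renew(500, 20))  # nothing to do
print(renew(0, 2))     # ask user to re-register proxy
```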
gLite-VOBOX operations
Site side
Minimal!
One of the most stable services
Easy installation and configuration
Experiment support side
Sites are supported at any moment by the CERN team
Remember: the gLite3.2 VOBOX was created by the ALICE support team in collaboration with the WLCG
All VOBOXes have been monitored via SAM every hour; a migration to Nagios is now under way
T0 VOBOXes are monitored via Lemon with an alarm system behind it
T1 sites can be notified via GGUS alarm tickets
T2 sites will be notified on a best-effort basis
ALICE Workload Management system
The ALICE tendency over the last two years has been to minimize the number of services used to submit job agents
Already ensured with ARC, OSG and AliEn native sites
The same approach is followed at gLite sites, through the CREAM-CE system
ALICE has been asking for the deployment of this service at all sites since 2008
ALICE has reduced the use of the gLite3.X WMS with the deployment of AliEn v2.18
The WMS submission module is still available in AliEn v2.18
31 May is the deadline established by ALICE to deprecate the usage of the WMS at all sites
All ALICE sites are encouraged to migrate to CREAM 1.6 as soon as possible
ALICE and the CREAM-CE
ALICE was the first to put the CREAM-CE in production, providing the developers with important feedback
2008 approach: testing
Testing phase (performance and stability) at FZK
2009 approach: AliEn implementation and distribution
System available at T0, all T1s (but NIKHEF) and several T2 sites
Dual submission (LCG-CE and CREAM-CE) at all sites providing CREAM
A second VOBOX was required at the sites providing CREAM to ensure the duality approach asked for by the WLCG-GDB
2010 approach: deprecation of the WMS
The 2nd VOBOX provided in 2009 has been reused through the failover mechanism included in AliEn v2.18
Both VOBOXes submit in parallel to the same local CREAM-CE
ALICE is involved in the operation of the service at all sites together with the site admins and the CREAM-CE developers
ALICE experiences with the CREAM-CE
Sites' experience
Most site admins confirmed that the system needs negligible effort
With the increase of the ALICE load in the last months, the stability of the system has been confirmed
[Plots: running-job profiles for the CREAM-CE and the LCG-CE at CNAF, JINR and Troitsk]
Four months of ALICE data taking have shown more efficient operation of the CREAM-CE in comparison with the LCG-CE services
ALICE Data Management system
Data access and file transfer: via xrootd
Data access
T0 and T1 sites: interfaced with the MSS
T2 sites: xrootd (strongly encouraged) or DPM with the xrootd plugin
File transfer
T0-T1: raw data transfer for custodial and reconstruction purposes
Data volumes proportional to the T1's ALICE share
T1-T2: reconstructed data and analysis
Default behavior: jobs are sent where the data are physically stored
External files can be locally copied or remotely accessed, based on the user's decision
Outputs of the jobs are locally stored and replicated to (by default) two external sites
Choice of storage based on availability and a proximity matrix (dynamic)
Storage discovery in AliEn
The save directive determines only the number of replicas and the QoS type (disk, tape)
How to efficiently write N replicas of a file?
From jobs running inside the sites
By a user running his software on a laptop while at home
The same case as a worker running somewhere in the clouds
Then, how to efficiently read the data when N replicas are available?
In the end this is just a variation of the data-locality problem
[Diagram: site structure and storage status]
SEs failing the functionality tests are removed from the storage matrix
Functionality tests are executed every 2 hours: add, get and remove of a test file from a remote location
ML monitoring of the storage elements
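The test cycle above can be sketched as follows. This is an illustrative stand-in, not the real MonALISA test code: the `FakeSE` class and the file operations are invented to show how an SE failing any of add/get/remove drops out of the storage matrix.

```python
# Sketch of the 2-hourly SE functional test: add, get and remove a test
# file; an SE failing any step is removed from the storage matrix.
def run_functional_test(se):
    try:
        se.add("alice-se-test.dat", b"probe")
        assert se.get("alice-se-test.dat") == b"probe"
        se.remove("alice-se-test.dat")
        return True
    except Exception:           # any failure counts: the SE is unhealthy
        return False

def update_storage_matrix(storage_matrix, ses):
    """Keep only the SEs that are in the matrix and pass all three steps."""
    return {name: se for name, se in ses.items()
            if name in storage_matrix and run_functional_test(se)}

class FakeSE:  # minimal stand-in for a storage element
    def __init__(self, broken=False):
        self.files, self.broken = {}, broken
    def add(self, name, data):
        if self.broken:
            raise IOError("SE down")
        self.files[name] = data
    def get(self, name):
        return self.files[name]
    def remove(self, name):
        del self.files[name]

ses = {"CERN::EOS": FakeSE(), "Site::Broken": FakeSE(broken=True)}
matrix = update_storage_matrix({"CERN::EOS", "Site::Broken"}, ses)
print(sorted(matrix))  # only the healthy SE remains in the matrix
```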
Discovery of the network topology
Each SE is associated with a set of IP addresses: the VOBox IP and the IPs of the xrootd redirector and nodes
MonALISA performs tracepath/traceroute between all VOBoxes, recording all routers and the RTT (Round Trip Time) of each link
SE node status and the bandwidth between sites are monitored
Routers are grouped in their respective Autonomous Systems (AS)
Distance(IP, IP) hierarchy: same class-C network, common domain, same AS, same country, known distance between the ASes, same continent, etc.
Distance(IP, Set&lt;IP&gt;): the client's public IP against all known IPs of the storage
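The hierarchical distance metric can be sketched directly from the tier list above. The tier ordering follows the slide; the numeric weights, the host dictionaries and the omission of the "known AS distance" table are simplifying assumptions made for this illustration.

```python
# Sketch of the hierarchical Distance(IP, IP) metric: the earlier a tier
# matches, the closer two hosts are considered (lower value = closer).
def distance(a, b):
    """a, b: dicts describing a host's network location."""
    if a["classC"] == b["classC"]:
        return 0   # same class-C network
    if a["domain"] == b["domain"]:
        return 1   # common domain
    if a["asn"] == b["asn"]:
        return 2   # same Autonomous System
    if a["country"] == b["country"]:
        return 3
    if a["continent"] == b["continent"]:
        return 4   # (a known AS-to-AS distance table would slot in here)
    return 5

def distance_to_storage(client, se_hosts):
    """Distance(IP, Set<IP>): client's public IP against all known SE IPs."""
    return min(distance(client, h) for h in se_hosts)

client = {"classC": "137.138.1", "domain": "cern.ch",
          "asn": 513, "country": "CH", "continent": "EU"}
se_hosts = [
    {"classC": "192.108.45", "domain": "gridka.de",
     "asn": 34878, "country": "DE", "continent": "EU"},
    {"classC": "137.138.2", "domain": "cern.ch",
     "asn": 513, "country": "CH", "continent": "EU"},
]
print(distance_to_storage(client, se_hosts))  # 1: a same-domain host exists
```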
Examples
Storage discovery in AliEn - Results
Flexible storage configuration
QoS (Quality of Service) tags are all that users should know about the system
We can store N replicas at once
Maintenance-free system
Monitoring feedback on known elements; automatic discovery and configuration of new resources
Reliable and efficient file access
No more failed jobs, thanks to auto-discovery and failover in case of temporary problems
The closest working storage element(s) to where the application runs are used
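The selection policy in the last two bullets can be sketched as follows. This is an illustrative model, not AliEn's implementation: the candidate list, the `working` set (fed by the functional tests) and the fetch callback are invented names.

```python
# Sketch of replica selection with failover: candidates are ordered by
# the distance metric, SEs failing the functional tests are skipped,
# and a transient read error falls through to the next replica.
def ordered_candidates(ses, working):
    """ses: list of (name, distance); keep working SEs, closest first."""
    return [name for name, dist in sorted(ses, key=lambda x: x[1])
            if name in working]

def read_with_failover(ses, working, fetch):
    for name in ordered_candidates(ses, working):
        try:
            return fetch(name)
        except IOError:
            continue  # temporary problem: fail over to the next replica
    raise IOError("no working replica")

ses = [("Far::SE", 4), ("Near::SE", 1), ("Flaky::SE", 0)]
working = {"Far::SE", "Near::SE", "Flaky::SE"}

def fetch(name):
    if name == "Flaky::SE":
        raise IOError("timeout")     # closest SE has a transient problem
    return "data from " + name

print(read_with_failover(ses, working, fetch))  # data from Near::SE
```

This is why jobs stop failing on temporary storage problems: a bad replica costs one retry, not the job.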
Site connections and interoperability
Site connections
Only at the data-management level
In terms of job submission and services, sites are independent of each other
The ALICE computing model foresees the same service infrastructure at all sites
ALICE assumes the T1-T2 connection established by the ROC infrastructure for site control and issue escalation
Interoperability between sites
Defined at the AliEn level
Definition of plug-ins for each middleware stack
Same AliEn setup for all sites
New items: Nagios migration
All experiments have to migrate their current SAM infrastructure for monitoring and tracking the WLCG services to Nagios
The infrastructure is available at CERN and responsible for the whole WLCG service infrastructure
ALICE has defined specific tests for the VOBOX only
Tests for the CE and CREAM-CE sensors also run with ALICE credentials
The test suite belongs to the ops VO
The site-availability calculation is performed ONLY on the basis of the CE sensor results
The important point was to migrate the VOBOX-specific tests into Nagios
DONE!
Nagios:: steps performed
Migration of the test suite
Upload of the ALICE software to the specific Nagios repository
Build of the software in the development and testing Nagios infrastructure
RPM installed in the Nagios ALICE UI
Population of the configuration files with the ALICE ROCs: CERN, Central Europe, Italy, NGI_FRANCE, UKI, Russia
Start-up of the service
ALICE has been publishing the results on the Nagios web interface for the past 48 hours
Nagios:: example
https://sam-alice.cern.ch/nagios/
(Left-hand menu) Service Groups: Summary, service VO-box
Nagios:: next steps
ALL SITES MUST PUBLISH THEIR VOBOXES IN THE GOCDB
Monitoring flag set to YES
The ops test suites for the CE and the CREAM-CE will be monitored from Nagios but executed with ALICE credentials
New items: gLExec
Motivation
Sites in the Grid run jobs on behalf of ALICE users. Currently, all jobs run as one system user at each site.
Sites want to see the jobs running as the users the jobs belong to, for accountability and traceability: this lets sites ban or restrain individual users. Currently, they could only affect the whole VO.
How to run a job as the user who owns it? A site knows the ALICE VO, but not all of its users. How to ensure a job runs as the right user?
gLExec is a program that can switch users, just like su or sudo
It takes the command to run (a job) and a proxy certificate
The proxy certificate is mapped to a local user, based on rules
The command is executed as the mapped user
Nothing more!
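The mapping step can be illustrated with a toy model. This is not gLExec itself (the real tool applies site policies such as LCMAPS and actually switches uid): the rule table, the DNs and the ban list below are invented to show the idea of per-user mapping and per-user banning.

```python
# Toy model of what gLExec does: map the job owner's proxy DN to a local
# account according to site rules, then run the command as that user.
MAPPING_RULES = {  # hypothetical DN -> local-account table
    "/DC=ch/DC=cern/OU=Users/CN=alice_user1": "alice001",
    "/DC=ch/DC=cern/OU=Users/CN=alice_user2": "alice002",
}
# A site can now ban a single user instead of the whole VO:
BANNED = {"/DC=ch/DC=cern/OU=Users/CN=alice_user2"}

def glexec(proxy_dn, command):
    if proxy_dn in BANNED:
        raise PermissionError("user banned at this site")
    local_user = MAPPING_RULES.get(proxy_dn)
    if local_user is None:
        raise PermissionError("no mapping for this proxy")
    # The real tool would switch uid (like su/sudo) and exec the command;
    # here we just report the decision.
    return f"run '{command}' as {local_user}"

print(glexec("/DC=ch/DC=cern/OU=Users/CN=alice_user1", "job.sh"))
```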
AliEn job execution
After application of gLExec
Present status
Alice Analysis Facility - AAF
The idea of a common cluster of ALICE PROOF facilities is based on a unique protocol and storage management on the Grid
Data is proxied locally to adequately feed PROOF, from the 91 AliEn sites
[Diagram: the ALICE Global Redirector and the ALICE Super Master federating the PROOF facilities SKAF, CAF, LAF and JRAF]
PROOF cluster storage
Take a PROOF cluster with XROOTD storage
Use the xrootd data-management plug-in
Transform your AF into a proxy of the ALICE globalized storage, through the ALICE Global Redirector (GR)
Also support sites not seen by the GR, through internal dynamic prioritization of the AliEn sites
Data management: how does the data appear?
Manual (or supervised) pre-staging requests, or:
Suppose that a user always runs the same analysis several times (which is almost always true)
The first round will be as fast as the remote data staging; the subsequent queries will be fast (the data is already local)
Alice Analysis Facility Example
Summary and conclusions (1/2)
For the data taking, the ALICE approach is to lighten the site requirements in terms of services
This approach is quite realistic and has been demonstrated to work quite well
It scales to the level of the manpower available in the core team at CERN and at the sites
High performance of the VOBOXes, the CREAM-CE and xrootd is needed
This is critical: if any of these services fails, your site is down
With the exception of the T0 at CERN, there is built-in redundancy
A site malfunction will not stop the ALICE processing
But it will slow it down considerably!
Sites are supported by a team of experts at any time
This approach has demonstrated its scalability
The same team for years, with a clear increase in the number of sites entering production
Summary and conclusions (2/2)
During the 1st data taking, the ALICE computing model and its infrastructure have demonstrated a high level of stability and performance.
It works!
Extra slides
The VOBOX
1. Primary service required at every site
Service scope: run (local) specific ALICE services; file-system access to the experiment software area
Mandatory requirement to enter production, independently of the middleware stack available
The same concept is valid for gLite, OSG, ARC and generic AliEn sites
2. Same setup requirements for all sites
Same behavior for T1 and T2 sites in terms of production
Differences between T1 and T2 sites are a QoS matter only, applicable to ALL generic services provided at the sites
3. Service-related problems should be managed by the site administrators
4. Support ensured at CERN for any of the middleware stacks
Failover mechanism: the use of the 2nd VOBOX
This approach has been included in AliEn v2.18 to take advantage of the 2nd VOBOX deployed at several ALICE sites
All sites which provided a CREAM-CE in 2009 had to provide a 2nd VOBOX for ALICE
It was used to submit to the LCG-CE and the CREAM-CE back-ends in parallel
Message to the sites: do not retire the 2nd VOBOX; we will keep on using it in failover mode with the 1st VOBOX
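The failover mode can be sketched as a simple ordered retry over the two boxes. This is a hypothetical illustration of the idea only; the VOBOX names and the submit callback are invented, and the real AliEn mechanism may differ in detail.

```python
# Sketch of VOBOX failover: try the primary VOBOX first; if its
# submission fails, fall over to the 2nd VOBOX, so neither box is a
# single point of failure for agent submission.
def submit_agents(voboxes, submit):
    """Try each VOBOX in order until one submission succeeds."""
    for box in voboxes:
        try:
            return submit(box)
        except RuntimeError:
            continue  # this VOBOX is unavailable: fail over to the next
    raise RuntimeError("all VOBOXes failed")

def submit(box):  # stand-in for a CREAM-CE submission through a VOBOX
    if box == "vobox1":
        raise RuntimeError("vobox1 down")
    return f"agent submitted via {box}"

print(submit_agents(["vobox1", "vobox2"], submit))  # agent submitted via vobox2
```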