Site Validation Session Report

Page 1: Site Validation Session Report

Site Validation Session Report

Co-Chairs:

Piotr Nyczyk, CERN IT/GD

Leigh Grundhoefer, IU / OSG

Notes from Judy Novak

WLCG-OSG-EGEE Workshop

CERN, June 19-20th 2006

Page 2: Site Validation Session Report

Service Availability Monitoring (SAM) - “extension” of SFT:

• generalized framework to monitor all LCG/EGEE services and not only CE: BDII, RB, LFC, FTS, etc.

• most of the sensors run remotely (from a central machine)

• no installation needed on service machines

• moved from MySQL to Oracle, optimized data schema

Available at: https://lcg-sam.cern.ch:8443/sam/sam.cgi
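The remote-sensor idea on this slide can be illustrated with a minimal Python sketch. This is not SAM's sensor code: the names `probe_tcp` and `run_sensors`, the result fields, and the use of a plain TCP connect as the "test" are all assumptions made for illustration. The point is only that the check runs from a central machine and needs nothing installed on the service host.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try a TCP connection from the central machine.

    Hypothetical stand-in for a real service test; nothing runs on the
    service host itself.
    """
    started = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status = "ok"
    except OSError:
        status = "error"
    return {"host": host, "port": port, "status": status,
            "elapsed_s": time.time() - started}


def run_sensors(targets, probe=probe_tcp, timeout=5.0):
    """Probe every (service_type, host, port) target and tag each result.

    `probe` is injectable so that other service-specific checks can be
    plugged in without changing the loop.
    """
    results = []
    for service_type, host, port in targets:
        result = dict(probe(host, port, timeout))
        result["service"] = service_type
        results.append(result)
    return results
```

A call such as `run_sensors([("BDII", "bdii.example.org", 2170)])` would return one result dictionary per target; the hostname and port here are placeholders, not real endpoints.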

Page 3: Site Validation Session Report

• SAM sensors:
– currently: BDII (Taiwan), RB (RAL), CE, SRM, LFC, FTS, SE (CERN)

• release updates + SAM (SFT)
– certifying current tests with each new release
– create update tests as necessary
– CA cert. releases are special

• Availability views
– current, daily, weekly, monthly
– for CE, SE, SRM, site BDII
– displayed with GridView

http://glite.cvs.cern.ch/cgi-bin/glite.cgi/sft2/tests/
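As a rough illustration of the availability views listed above, the following Python sketch computes current, daily, weekly and monthly figures from a list of test results. The slides do not define the metric, so everything here is an assumption: availability is taken as the plain fraction of passed tests in a time window, a month is treated as 30 days, and the function names are hypothetical. The real SAM/GridView calculation may well differ.

```python
from datetime import datetime, timedelta

# Assumed window lengths; "monthly" is approximated as 30 days.
WINDOWS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def availability(results, now, window):
    """Fraction of passed tests among results within `window` before `now`.

    `results` is an iterable of (timestamp, passed) pairs. Returns None
    when no test ran in the window.
    """
    recent = [passed for ts, passed in results if now - window <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def availability_views(results, now):
    """Compute current, daily, weekly and monthly views for one service."""
    ordered = sorted(results)
    # "current" is taken to be the outcome of the most recent test.
    views = {"current": ordered[-1][1] if ordered else None}
    for name, window in WINDOWS.items():
        views[name] = availability(ordered, now, window)
    return views
```

Counting tests rather than weighting by elapsed time keeps the sketch short; a time-weighted figure would be a natural refinement if tests run at irregular intervals.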

Page 4: Site Validation Session Report

OSG Validation services

• CE/SE Validation aggregation: VORS - site scanner, BDII info
– http://vors.grid.iu.edu/

• OSG VO’s VOMS validation
– http://voms-monitor.grid.iu.edu/

• GridEX - application validation (pilot job submissions)
– http://www.cs.wisc.edu/condor/tools/exerciser/

• Site Policy template and publication
– http://vors.grid.iu.edu/site_policies.html

• GIP Validation
– http://grow.its.uiowa.edu/osg-gip/Production.shtml

• Monitoring validation: MonALISA Client status (VO Jobs I/O)
– http://grid02.uits.indiana.edu:8080/stats?page=summary

• GridCat and the MIS-CI client
– http://osg-cat.grid.iu.edu/ - Production instance
– Client software: http://software.grid.iu.edu/pacman/tarballs/misci-0.4.1.tar.gz

Page 5: Site Validation Session Report

Summary

• It seems to be impossible to avoid cross-monitoring (OSG monitoring doesn't include LCG-specific services, and the other way around)

• We should synchronize on VO level, but LCG/EGEE is also using regional structuring

Page 6: Site Validation Session Report

OSG and EGEE Validation Interoperability

• Site discovery - using sites discovered via BDII
– Ops VO - supported only on OSG sites which are interoperable (fully deployed in July)
– How can we determine if an EGEE site is interoperable? Review certain BDII information

• Cross installation of necessary tools and libraries for site validation
– LCG tools - added as optionally installed package for OSG sites
– OSG environment variables - ? (GIP)
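One way to picture the "review certain BDII information" step is a filter over site records already fetched from a BDII. The slide leaves open which information should be reviewed, so the criterion below is purely an assumption: a site is treated as a candidate if it publishes `VO:ops` in the GLUE attribute `GlueCEAccessControlBaseRule`. The record layout and the function names are hypothetical.

```python
def supports_vo(site_record, vo="ops"):
    """Return True if the record advertises access for the given VO.

    Assumes the record carries the published access rules as a list of
    strings under the key "GlueCEAccessControlBaseRule".
    """
    rules = site_record.get("GlueCEAccessControlBaseRule", [])
    return "VO:" + vo in rules


def interoperable_sites(records, vo="ops"):
    """Names of sites whose records pass the assumed criterion, sorted."""
    return sorted(r["site"] for r in records if supports_vo(r, vo))
```

With two hypothetical records, one listing `VO:ops` and one not, `interoperable_sites` would return only the first site's name. Any real check would presumably look at more than a single attribute.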

Page 7: Site Validation Session Report

OSG and EGEE Validation Interoperability (cont)

• Use of existing GGUS - OSG GOC ticket exchange for error reporting
– SAM database to use contact information for OSG GOC

• Issue of coordinating scheduled downtime
– OSG GOC will maintain a web page with downtimes

• Propose review of effort to add OSG specific validations to SAM framework.

• Testing and iterative development will be accomplished using Pre-Production sites and OSG ITB

Page 8: Site Validation Session Report

DB monitoring in SAM for Tier 1’s (Dirk Duellmann)

• Jobs are connecting to the DB with either http (VO lib) or direct Oracle (instant client)

• Should be completed by October when experiments will start using DBs

• CMS + ALICE don't need them, but only 'squid'

• existing DB monitoring is too detailed for SAM/SFT, but SAM could provide high-level monitoring of DB service

• some DB services (like LFC) are already tested by SAM, BUT only the functionality is tested, not the DB! The test could be:
– threshold for connection between T0 -> T1
– user access (squid)
– client latency (?)

• Oracle client will be installed on the Worker Nodes
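The proposed connection and latency tests could take a shape like the following Python sketch. It is only an illustration under stated assumptions: `check_connection_latency` is a hypothetical name, `connect` is any caller-supplied function that opens a connection (no real Oracle or squid client is used here), and the "ok" / "warn" / "error" statuses and the threshold value are invented for the example.

```python
import time


def check_connection_latency(connect, threshold_s, clock=time.perf_counter):
    """Time one connection attempt and compare it with a threshold.

    `connect` is a zero-argument callable that opens a connection and
    returns it; `clock` is injectable so the check can be tested
    without real delays.
    """
    started = clock()
    try:
        conn = connect()
    except Exception as exc:
        # The connection could not be opened at all.
        return {"status": "error", "latency_s": None, "detail": str(exc)}
    latency = clock() - started
    # Close the connection if the object offers a close() method.
    close = getattr(conn, "close", None)
    if callable(close):
        close()
    status = "ok" if latency <= threshold_s else "warn"
    return {"status": status, "latency_s": latency, "detail": ""}
```

Passing the connection function in from outside keeps the check independent of the access path, so the same test could in principle wrap either the http route or the direct Oracle route mentioned above.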

Page 9: Site Validation Session Report

Comments/Discussion