Executive Director Report
Ruth Pordes
OSG Council Meeting, August 5th 2008
Summary
We have met the performance and functionality needs for initial LHC data taking.
We have released the baseline production version of OSG software.
We have an energetic, solid team: committed, open and collaborative.
LIGO, D0, Engage production usage/efficiency/usability have made progress.
The Structural Biology Grid (SBGrid) is a potentially significant Campus and User entrant (we need to understand expectations).
August 5th, 2008
Outline
Accomplishments – each one generating a question.
Interactions with the Joint Oversight Team
Security Report
Deliverables to CCRC’08, ATLAS Full Dress Rehearsal (FDR-2), and CMS Computing & Analysis Challenge (CSA08) met. Currently “hard” to separate out OSG-specific contributions?
Transfers of up to 120 TB/day over a month on >100 links, many including a Tier-2 end-point.
Robust job management and execution on Tier-2s - job throughputs of >40,000/day.
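For scale, the daily totals quoted above can be turned into sustained per-second rates. The sketch below is only a unit conversion, assuming decimal units (1 TB = 1000 GB) and an even spread over 24 hours; the function names are hypothetical and not part of any OSG tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Convert a daily transfer volume in TB to a sustained rate in GB/s.

    Assumes decimal units (1 TB = 1000 GB) and a uniform rate over the day.
    """
    return tb_per_day * 1000 / SECONDS_PER_DAY


def per_day_to_per_second(count_per_day: float) -> float:
    """Convert a daily count (e.g. jobs/day) to an average per-second rate."""
    return count_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Peak transfer volume and job throughput as quoted on the slide.
    print(f"{tb_per_day_to_gb_per_s(120):.2f} GB/s sustained")
    print(f"{per_day_to_per_second(40_000):.2f} jobs/s on average")
```

Under these assumptions, 120 TB/day corresponds to a sustained rate of roughly 1.4 GB/s, and 40,000 jobs/day to just under one job every two seconds.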
End-to-End Data Transfer Throughput (ATLAS)
1.0 GByte/Sec
Scaled to ~450 users across ~30 Tier-2s.
Simultaneous with end-to-end Cosmic Ray running.
Simultaneous simulation production of >10 Mevents/week.
Mix of Production and Analysis Jobs (CMS)
80,000 / day
LIGO usage increasing following focus on WS-Gram deployment/testing for Einstein@home
Note: Migration of the majority of use from Nebraska to BNL to Purdue. Why?
Chemistry - Andrew Shultz, University of Buffalo.
Application to model virial coefficients of water.
Anticipate research highlight/publication this summer.
78,000 jobs consuming an average of about 100 CPU days/day over 6 months.
User registered to the NYSGrid VO: So, how do we know it is Chemistry? Would running as part of the Engage VO help?
Computational Biology: Protein Folding/Structure
Assistant professor and 2 students running fairly steadily.
~620,000 CPU wallclock hours in 2008 (average 120 cpudays/day for ~210 days).
Expect research highlight in the next few months.
When does a set of individuals become a community/VO?
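The usage figures on this slide can be cross-checked with a line of arithmetic. The sketch below is a minimal illustration, with hypothetical function names, of how an average of CPU days per day relates to total CPU wallclock hours; the input numbers are the approximate ones quoted above.

```python
HOURS_PER_DAY = 24


def cpu_hours(cpu_days_per_day: float, days: float) -> float:
    """Total CPU wallclock hours for a steady average load over a period."""
    return cpu_days_per_day * days * HOURS_PER_DAY


def average_cpu_days_per_day(total_cpu_hours: float, days: float) -> float:
    """Average load (CPU days per calendar day) implied by a total in hours."""
    return total_cpu_hours / HOURS_PER_DAY / days


if __name__ == "__main__":
    # 120 CPU days/day sustained for ~210 days, as quoted on the slide.
    print(cpu_hours(120, 210))                            # total hours
    # Average load implied by the quoted ~620,000 hours total.
    print(round(average_cpu_days_per_day(620_000, 210)))  # CPU days/day
```

Since 120 CPU days/day over 210 days gives 604,800 hours, and 620,000 hours implies about 123 CPU days/day, the two quoted figures agree to within the rounding implied by the "~" signs.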
Engage use is growing. Use continues to be cyclic. More sites used in 2008 than in 2007.
| Site name | CPU Days |
| --- | --- |
| MIT_CMS | 8,344 |
| USCMS-FNAL-WC1-CE | 4,201 |
| UCSDT2 | 1,975 |
| UCR-HEP | 1,749 |
| FNAL_CDFOSG_2 | 1,745 |
| BNL_ATLAS_1 | 1,625 |
| FNAL_GPFARM | 1,437 |
| FNAL_CDFOSG_3 | 1,286 |
| CIT_CMS_T2 | 1,218 |
| Purdue-RCAC | 1,210 |
| FNAL_DZEROOSG_2 | 1,209 |
| FNAL_DZEROOSG_1 | 1,109 |
| FNAL_GPGRID_1 | 1,047 |
| NERSC-Jacquard | 958 |
| TTU-ANTAEUS | 658 |
| FNAL_CDFOSG_4 | 588 |
| UCLA_Saxon_Tier3 | 441 |
| Nebraska | 380 |
| GLOW | 280 |
| SBGrid-Harvard-East | 220 |
| OCI-NSF | 185 |
| UFlorida-HPC | 154 |
| Clemson-IT | 121 |
| UWMilwaukee | 105 |
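To see how Engage usage is distributed across sites, the per-site figures can be totalled and expressed as shares. The sketch below is a minimal illustration: the numbers are copied from the site list above, while the dictionary and function names are hypothetical and do not correspond to any OSG accounting tool.

```python
# CPU days per site, copied from the site list on this slide.
ENGAGE_CPU_DAYS = {
    "MIT_CMS": 8344, "USCMS-FNAL-WC1-CE": 4201, "UCSDT2": 1975,
    "UCR-HEP": 1749, "FNAL_CDFOSG_2": 1745, "BNL_ATLAS_1": 1625,
    "FNAL_GPFARM": 1437, "FNAL_CDFOSG_3": 1286, "CIT_CMS_T2": 1218,
    "Purdue-RCAC": 1210, "FNAL_DZEROOSG_2": 1209, "FNAL_DZEROOSG_1": 1109,
    "FNAL_GPGRID_1": 1047, "NERSC-Jacquard": 958, "TTU-ANTAEUS": 658,
    "FNAL_CDFOSG_4": 588, "UCLA_Saxon_Tier3": 441, "Nebraska": 380,
    "GLOW": 280, "SBGrid-Harvard-East": 220, "OCI-NSF": 185,
    "UFlorida-HPC": 154, "Clemson-IT": 121, "UWMilwaukee": 105,
}


def total_cpu_days(usage: dict[str, int]) -> int:
    """Sum CPU days over all sites."""
    return sum(usage.values())


def site_shares(usage: dict[str, int]) -> dict[str, float]:
    """Fraction of the total CPU days contributed by each site."""
    total = total_cpu_days(usage)
    return {site: days / total for site, days in usage.items()}


if __name__ == "__main__":
    shares = site_shares(ENGAGE_CPU_DAYS)
    top_site = max(shares, key=shares.get)
    print(f"{len(ENGAGE_CPU_DAYS)} sites, {total_cpu_days(ENGAGE_CPU_DAYS)} CPU days")
    print(f"largest contributor: {top_site} ({shares[top_site]:.1%})")
```

On these figures the 24 listed sites total 32,245 CPU days, with MIT_CMS alone contributing about a quarter.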
Request from D0 for access to local Storage Resources to improve efficiency
• Council asked ET to help.
With OSG 1.0, opportunistic & reserved storage is more widely supported. US ATLAS and US CMS Tier-2s offered to let D0 use storage up to 1 TB/site over 3 ATLAS and 3 CMS sites.
• With help from the OSG site admins, users group, and storage, D0 throughput increased from 3.6 to 5 and 4 M events/week over the past 2 weeks. Efficiency is >50% some days and then (last weekend) can go down to ~28%. D0 and OSG continue to track down specific problems.
• D0 gathering efficiency plots over time http://www-d0.fnal.gov/~snow/jobscan/effs.html
• How should OSG organize and prioritize ongoing support/help for robust, effective use of “grown-up” VOs?
Recorded use ~15,000 CPUweeks/week.
Lack of availability, ability, or need to use cycles freed by locally reduced use by LHC?
Major Software Release in June 2008
OSG 1.0: expect to maintain this as the main baseline software version.
Many sites upgraded quickly – confidence or timeliness?
Testing included configuration & simple tests of Opportunistic Storage (dCache, Bestman).
Enabling US LHC Tier-2 site availability reporting to the WLCG with first official reports for July.
Infrastructure Accomplishments
• Significantly improved local site monitoring tools (RSV).
• Well received set of storage tools released for administrators.
• Initial use by VOs of opportunistic storage.
• Improved administrative information capabilities (OIM).
• Support for additional LHC Tier-3s:
  US CMS: UCLA, UMD, FlTech, UIC
  US ATLAS: UIUC, UWisc-Madison, Iowa State (earlier?)
Production Information
• OIM
• RSV
Number of production CEs: 86
Number of production SRMv2 SEs: 17 (4 Bestman)
Jump in number of support tickets for OSG 1.0 configurations: ensure all sites are configured correctly for Information/Monitoring/Accounting.
Interactions with the Joint Oversight Team
Summary of Interactions with JOT
• New program managers:
  Don Petravick – DOE HEP
  Susan Turnbull – DOE ASCR
• Visit by core Executive (Miron, Chander, Ruth) to DOE and NSF in June: Fred Johnson (ASCR) and Susan in the morning; Moishe, Marv, and Susan in the afternoon.
• Action item: institute regular JOT, OSG mgmt, US ATLAS & US CMS S&C mgmt phone meetings.
First EJOT phone meeting
Don: “Goal to understand the reliance of the LHC experiments on the OSG, and to understand the status of that reliance.”
Discussed the kinds of items that belong in a future work agenda:
Experience at Run II shows that more effort is required when experiments start up.
Another meeting planned in 2-3 months.
Security Report
Mine Altunay, FNAL
OSG Security Officer
For the OSG Security Team:
Doug Olson, Deputy Security Officer, LBNL,
Jim Basney, NCSA, Ron Cudzewicz, FNAL,
Change in OSG – JSPG relationship
• No mandatory acceptance of JSPG policies in OSG. We contribute in the working groups to make policies as uniform as reasonable & give feedback. Agreed to by OSG, EGEE, WLCG.
• We work with US LHC S&C and WLCG on OSG policies & to communicate (& agree on) differences from those recommended by JSPG. Contact is: Dave Kelsey, WLCG Security Coordinator
List of OSG Policies

| Title | Comments |
| --- | --- |
| VO AUP Template | OSG has a template AUP policy that member VOs are required to fill out. |
| VO User Registration and Management Template | OSG has a template policy that member VOs can fill out. |
| OSG Security Incident Handling and Response Plan | The JSPG’s incident handling and response policy is based on an earlier OSG policy, so the two policies are compatible. Work is needed to address cross-grid coordination. Also, EGEE has a separate policy/procedure on software vulnerabilities, while OSG doesn't. |
| Grid Acceptable Use Policy | The OSG and JSPG policies are identical. |
| Approval of Certificate Authorities | The OSG-specific policy complies with the JSPG policy. |
| Service Agreement | OSG-specific policy with no equivalent JSPG policy. |
| Policy on Grid Pilot Jobs | The JSPG policy is based on a Fermilab policy, which will provide the basis for the OSG policy. OSG has not yet produced a specific document. |
| Privacy Policy | This is an OSG-only policy. It has been sent to the OSG EB and received feedback. |
| VO Operations Policy | OSG has sent comments to the JSPG’s final call. There is no OSG-specific version of this policy. |
| Site Operations Policy | OSG approved the JSPG policy. OSG has not produced an OSG-specific policy yet. |
| Traceability and Logging Policy | OSG has sent comments on JSPG’s policy. OSG does not have any formally approved auditing requirements beyond those in AUPs and accounting. |
| VO Registration Policy and Site Registration Policy | These two policies have not yet been reviewed by OSG. JSPG will start working on these documents soon and OSG will send comments later. |

Working with VOs on appropriate detail and contents.
Recent Iranian CA discussions and questions
• OSG EB (Kent & Bill) raised the issue of EAR with respect to the Iranian CA. FNAL security helped determine that site size determines the threshold. Did a rough survey of OSG sites: no university sites are near this threshold.
• Such policies were, are, and remain the responsibility of the Site. We will develop enough understanding for communication purposes, but we have no responsibility here.
• For clarity we may add a sentence to our AUP saying we do not collect citizenship information in OSG?
Recent Incidents/Alerts
• A root-level compromise at one site.
  No grid incident detected.
  Take-home messages: a good test for VO security officers; very important for VOs to identify Security Officers.
• Alert at 2nd site was a mis-communication due to system admin mis-configuration.
• EGEE security challenge resulted in USCMS having a poor score.
  Confusion over which policies to follow is being addressed.
  Now agreed with EGEE security officer - OSG will be involved in the next challenge.
  Issue of EGEE or WLCG challenge still to be clarified.
Standing Issue:
• Partner grids with multiple VOs and sub-VOs:
• Identifying which VO/Application is the job submitter
  We report the partner grid as the VO.
  How to provide the finer granularity?
  How to provide VO usage policy?