GGUS summary (2 weeks)
VO       User  Team  Alarm  Total
ALICE       1     0      1      2
ATLAS      14   116      6    136
CMS         4     1      1      6
LHCb        1    20      1     22
Totals     20   137      9    166
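The row and column totals in the table can be cross-checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and variable names are illustrative assumptions and not part of GGUS; the counts are copied from the table.

```python
# Ticket counts per VO, copied from the summary table: (user, team, alarm).
tickets = {
    "ALICE": (1, 0, 1),
    "ATLAS": (14, 116, 6),
    "CMS": (4, 1, 1),
    "LHCb": (1, 20, 1),
}

# Row totals: all tickets submitted by each VO.
row_totals = {vo: sum(counts) for vo, counts in tickets.items()}

# Column totals: all tickets of each type across the four VOs.
user_total, team_total, alarm_total = (sum(col) for col in zip(*tickets.values()))

# Grand total over the whole two-week period.
grand_total = sum(row_totals.values())

print(row_totals)
print(user_total, team_total, alarm_total, grand_total)
```

The computed values agree with the 'Total' column and the 'Totals' row of the table.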
04/21/23 WLCG MB Report WLCG Service Report 2
Support-related events since last MB
• We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting can better show the weak areas in support.
• There were 9 ALARM tickets since the last MB (2 weeks), 5 of which were real, all submitted by ATLAS. Details follow…
ATLAS ALARM -> CERN-CNAF TRANSFERS
• https://gus.fzk.de/ws/ticket_info.php?ticket=62761
What time UTC What happened
2010/10/05 9:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_Italy.
2010/10/05 10:23 Site acknowledges ticket and finds a StoRM backend problem.
2010/10/05 12:03 Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.
2010/10/11 Submitter ‘verifies’ ticket GGUS:62745. Not sure how ‘symptomatic’ the solution was…
ATLAS ALARM -> TRANSFERS TO .FR CLOUD
• https://gus.fzk.de/ws/ticket_info.php?ticket=62871
What time UTC What happened
2010/10/08 5:56 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to NGI_France.
2010/10/08 6:31 Site acknowledges ticket and finds a network problem preventing all DB server access.
2010/10/08 7:29 Service restored.
2010/10/08 10:41 Site puts ticket to status ‘solved’.
ATLAS ALARM -> CERN SLOW LSF
• https://gus.fzk.de/ws/ticket_info.php?ticket=62467
What time UTC What happened
2010/09/27 15:34 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/09/27 16:01 Operator acknowledges ticket and contacts the expert.
2010/09/27 16:37 Expert’s 1st diagnosis: too many queries.
2010/09/27 20:10 Service manager kills a home-made robot, run by another experiment, that was launching a very large number of bjob queries, and puts the ticket to status ‘solved’.
ATLAS ALARM -> CERN SLOW AFS
• https://gus.fzk.de/ws/ticket_info.php?ticket=62662
What time UTC What happened
2010/10/01 7:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/10/01 7:33 Operator acknowledges ticket and contacts the expert.
2010/10/01 9:37 IT Service manager re-classifies in CERN Remedy PRMS.
2010/10/11 15:33 Still ‘in progress’. Reminder sent during this drill.
ATLAS ALARM -> CERN CASTOR
• https://gus.fzk.de/ws/ticket_info.php?ticket=62688
What time UTC What happened
2010/10/01 16:24 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/10/01 16:41 Operator acknowledges ticket and contacts the expert.
2010/10/01 16:42 Expert starts investigation.
2010/10/01 17:23 Solved. PutDONE in SRM not propagated to CASTOR. Done by hand.
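To compare the five ALARM timelines, the time from ticket opening to acknowledgement and to restoration can be computed from the timestamps. The sketch below is illustrative only: the data structure and the helper name minutes_between are assumptions, the timestamps are copied from the timelines, and the third timestamp is ‘Service restored’ where the timeline states it and the ‘solved’ entry otherwise.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"  # timestamp layout used in the timelines (UTC)

# (opened, acknowledged, restored-or-solved) per GGUS ticket number;
# None marks the ticket that was still 'in progress' at reporting time.
timelines = {
    62761: ("2010/10/05 9:13", "2010/10/05 10:23", "2010/10/05 12:03"),
    62871: ("2010/10/08 5:56", "2010/10/08 6:31", "2010/10/08 7:29"),
    62467: ("2010/09/27 15:34", "2010/09/27 16:01", "2010/09/27 20:10"),
    62662: ("2010/10/01 7:13", "2010/10/01 7:33", None),
    62688: ("2010/10/01 16:24", "2010/10/01 16:41", "2010/10/01 17:23"),
}


def minutes_between(start, end):
    """Whole minutes from start to end, or None if end is missing."""
    if end is None:
        return None
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# Minutes from opening to acknowledgement, and from opening to restoration.
ack_minutes = {t: minutes_between(o, a) for t, (o, a, _) in timelines.items()}
fix_minutes = {t: minutes_between(o, r) for t, (o, _, r) in timelines.items()}

for ticket in timelines:
    print(ticket, ack_minutes[ticket], fix_minutes[ticket])
```

Under these assumptions every ticket was acknowledged within 70 minutes of opening, and the four resolved tickets were restored within about 1 to 4.6 hours.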