what were the issues in 2011, minimizing impact and improving recovery


Page 1

Technical Stops: what were the issues in 2011, minimizing impact and improving recovery

M.Solfaroli

Acknowledgments: S.Claudet, K.Dahlerup, F.Duval, K.Foraz, M.Nonis, M.Poyer, H.Thiesen, J.Uythoven, W.Venturini

Evian 2011

Page 2

M.Solfaroli - Technical Stops - Evian - 12/12/11

Outline

General view

Methodology

The 5 TSs one by one

Pulling the numbers together

Conclusions

Page 3

General Overview

       Start date   End date   Length   Time before next TS (days)
TS#1   28/03        31/03      4 days   38 (5.4 wk)
TS#2   09/05        12/05      4 days   52 (7.4 wk)
TS#3   04/07        08/07      5 days   51 (7.3 wk)
TS#4   29/08        02/09      5 days   65 (9.3 wk)
TS#5   07/11        11/11      5 days   26 (3.7 wk)

Time allocated for recovery: 1 day

23% of global time = TUNNEL INTERVENTIONS (including Xmas break)

Page 4

Methodology

HOW IT HAS BEEN DONE:
• All TS interventions screened and all recovery issues examined
• Some beam checks need to be done repeatedly (e.g. loss maps)... should they be considered as part of recovery time? YES!
• What represents the limit between recovery and operation (back in SB, back on program, first beam in, ...)?

GOALS:
• Spot possible correlations between recovery time and TS interventions
• Identify major problems
• Investigate strategies for improvement

                     BEGINNING                        END
TS                   Mon @ 7am                        First HW re-commissioning test
MACHINE RECOVERY     First HW re-commissioning test   First beam commissioning test
BEAM COMMISSIONING   First beam commissioning test    GOAL of the week is accomplished
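As a sketch of how these boundaries turn logbook timestamps into phase durations, the Python snippet below uses the TS#5 times from the TS#5 recovery slide. Which timestamp marks which boundary (Fri 11th 4.58pm for the first HW re-commissioning test, Sat 12th 00.13am for the first beam commissioning test, Sat 12th 06.40am for the goal of the week) is our assumption from the slide layout, and the constants and helper names are hypothetical.

```python
from datetime import datetime

# Hypothetical phase boundaries for TS#5 (November 2011). Mapping the slide
# timestamps to the boundaries is an assumption, not taken from a logbook.
TS_START = datetime(2011, 11, 7, 7, 0)           # Mon 7th, 7am: TS starts
FIRST_HW_TEST = datetime(2011, 11, 11, 16, 58)   # Fri 11th, 4.58pm: start of HWC
FIRST_BEAM_TEST = datetime(2011, 11, 12, 0, 13)  # Sat 12th, 00.13am: assumed first pilot
GOAL_REACHED = datetime(2011, 11, 12, 6, 40)     # Sat 12th, 06.40am: physics conditions


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def phase_durations() -> dict[str, float]:
    """Split the stop into the three phases defined in the methodology."""
    return {
        "TS": hours_between(TS_START, FIRST_HW_TEST),
        "Machine recovery": hours_between(FIRST_HW_TEST, FIRST_BEAM_TEST),
        "Beam commissioning": hours_between(FIRST_BEAM_TEST, GOAL_REACHED),
    }


if __name__ == "__main__":
    phases = phase_durations()
    for name, h in phases.items():
        print(f"{name:20s} {h:6.1f} h")
    print(f"{'Total':20s} {sum(phases.values()):6.1f} h")
```

Under these assumptions the sketch gives about 106 h of TS, 7.25 h of machine recovery and 6.45 h of beam commissioning (about 120 h in total), close to the 107 h, 7 h, 6 h and 120 h quoted for TS#5.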

Page 5

TS#1: 28 - 31 March, 4 days + 1 recovery day

WHERE WERE WE?

Slot of 1.38 TeV operation
Last 3.5 TeV physics fill (1645):
• 200 b (24 bpinj) - (ready for 296 b)
• ~1.22E11 p per bunch
• Peak lumi: 2.5E32 cm-2 s-1

Interventions (3243 keys given): 60% maintenance, 30% improving, 10% problem fixing

GOAL of the week: Recovery from TS and start preparation for high intensity

Page 6

TS#1 - Recovery

[Timeline figure, bands: TS, CRYO, Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, Inj region aperture measurements for higher intensity. Timestamps: Mon 28th 7am (TS start), Thu 31st 6.25pm, Fri 1st 10.29am, Fri 1st 9.59pm, Sat 2nd 01.23am (MKB.B2 dump @ 450 GeV), Sat 2nd ~2pm]

Activity                 Duration [h]
Tunnel activities (TS)   84
Recovery                 31
Beam commissioning       12
TOT                      127

Share of total time: tunnel activities 66%, recovery 24%, beam commissioning 10%

Time lost: 14.5 h (NOT related: 48%; HW: 62%, SW: 38%)
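The percentage split on each recovery slide can be recomputed from the activity durations. The sketch below assumes the percentages are plain shares of the summed durations, rounded to integers with the largest-remainder method so that they add up to 100; the slides do not say how they were rounded, and `activity_shares` is a hypothetical helper. With that rounding it reproduces the splits quoted for TS#1, TS#2, TS#4 and TS#5.

```python
from math import floor


def activity_shares(hours: list[float]) -> list[int]:
    """Integer percentages of the total time that always sum to 100.

    Largest-remainder method: floor every share, then give the missing
    points to the entries with the largest fractional parts.
    """
    total = sum(hours)
    raw = [100 * h / total for h in hours]
    shares = [floor(r) for r in raw]
    missing = 100 - sum(shares)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in by_remainder[:missing]:
        shares[i] += 1
    return shares


# (tunnel activities, recovery, beam commissioning) in hours, from the recovery slides
STOPS = {
    "TS#1": [84, 31, 12],
    "TS#2": [79, 55, 12],
    "TS#4": [106, 9, 9],
    "TS#5": [107, 7, 6],
}

if __name__ == "__main__":
    for name, durations in STOPS.items():
        print(name, activity_shares(durations))
```

Plain rounding would give 66 + 24 + 9 = 99% for TS#1; handing the missing point to the largest remainder is one common way to keep the total at 100%.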

Page 7

TS#2: 09 - 12 May, 4 days + 1 recovery day

WHERE WERE WE?

Last 3.5 TeV physics fill (1755):
• 768 b (72 bpinj)
• ~1.25E11 p per bunch
• Peak lumi: 9E32 cm-2 s-1

o Quench propagation test on RB.A56
o FPA on RQD/F.A67

Interventions (2831 keys given): 60% maintenance, 24% improving, 16% problem fixing

GOAL of the week: TS, re-establish physics conditions, alignment of RPs, VdMs

Page 8

TS#2 - Recovery

[Timeline figure, bands: TS, CRYO (shown twice; ~27 h stop), Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, Preparation for VdM scan; also marked: Lost P8 (communication), Pilots cycle. Timestamps: Mon 9th 7am (TS start), Thu 12th 2.10pm, Thu 12th 6.31pm, Fri 13th 08.42am, Fri 13th 11am, Sat 14th 2.10pm, Sat 14th 9pm, Sun 15th 9am]

Activity                 Duration [h]
Tunnel activities (TS)   79
Recovery                 55 (27 h cryo stop not counted)
Beam commissioning       12
TOT                      146

Share of total time: tunnel activities 54%, recovery 38%, beam commissioning 8%

Time lost: 15.5 h (NOT related: 94%; HW: 90%, SW: 10%)

Page 9

TS#3 - The longest one: 04 - 08 July, 5 days + 1 recovery day

WHERE WERE WE?

Last 3.5 TeV physics fill (1901):
• 1380 b (144 bpinj)
• ~1.16E11 p per bunch
• Peak lumi: 1.26E33 cm-2 s-1

o Quench propagation test on RB.A56
o Test of DQQBS boards in S81

Interventions (3062 keys given): 65% maintenance, 26% improving, 9% problem fixing

GOAL of the week: Continue with the satellite collision scheme (264 b and higher)

Page 10

TS#3 - Recovery

[Timeline figure, bands: TS, CRYO, Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, End of loss maps; also marked: Injection, CRYO, Power cut. Timestamps: Mon 04th 7am (TS start), Fri 8th 7.19pm, Sat 9th 8.05am, Sat 9th 5.27pm, Sun 10th 11.11am, Mon 11th 7.22pm, Tue 12th 3.15pm, Thu 14th 7am]

When the power cut arrived, beam commissioning was almost completed (a TCDQ-TCSG cross-check was ongoing).

Activity                 Duration [h]
Tunnel activities (TS)   108
Recovery                 22
Beam commissioning       22 (18 + 4)
TOT                      152

Share of total time: tunnel activities 71%, recovery 14.5%, beam commissioning 14.5%

Time lost: 19.5 h (NOT related: 46%; HW: 85%, SW: 15%)

Page 11

TS#4: 29 August - 02 September, 5 days + 1 recovery day

WHERE WERE WE?

Last 3.5 TeV physics fill (2040):
• 1380 b (144 bpinj)
• ~1.29E11 p per bunch
• Peak lumi: 2.1E33 cm-2 s-1

o Quench propagation test on RQD/F.A56

Interventions (3645 keys given): 69% maintenance, 24% improving, 7% problem fixing

GOAL of the week: TS, recovery from TS, 1 m beta*, Alice polarity change

Page 12

TS#4 - Recovery

[Timeline figure, bands: TS, CRYO, Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, Start of fill that reached beta* 1m; also marked: ~3 h without beam from injectors. Timestamps: Mon 29th 7am (TS start), Fri 2nd 2.10pm, Fri 2nd 5.14pm, Fri 2nd 10.33pm, Sat 3rd 1.41am (first pilots ramp), Sat 3rd 10.59am]

Activity                 Duration [h]
Tunnel activities (TS)   106
Recovery                 9
Beam commissioning       9
TOT                      124

Share of total time: tunnel activities 86%, recovery 7%, beam commissioning 7%

Time lost: 6.5 h (NOT related: 15%; HW: 8%, SW: 92%)

Page 13

TS#5: 07 - 11 November, 5 days + 1 recovery day

WHERE WERE WE?

Last 3.5 TeV physics fill (2267):
• 1380 b (144 bpinj)
• ~1.5E11 p per bunch
• Peak lumi: 3.5E33 cm-2 s-1

o Quench propagation test on RQD/F.A56

Interventions (3404 keys given): 70% maintenance, 24% improving, 6% problem fixing

GOAL of the week: TS, recovery, ions (stable beams) over the weekend

Page 14

TS#5 - Recovery

[Timeline figure, bands: TS, CRYO, Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, Physics conditions; also marked: Asynch BD @ 450 GeV, SB. Timestamps: Mon 7th 7am (TS start), Fri 11th 4.58pm, Fri 11th 5.48pm, Sat 12th 00.13am, Sat 12th 00.32am, Sat 12th 02.31am, Sat 12th 06.40am]

Activity                 Duration [h]
Tunnel activities (TS)   107
Recovery                 7
Beam commissioning       6 (Pb commissioning included)
TOT                      120

Share of total time: tunnel activities 89%, recovery 6%, beam commissioning 5%

Time lost: 4 h (NOT related: 50%; HW: 62%, SW: 38%)

Page 15

Some numbers…

Tunnel activities vs TSs

       Keys   Maintenance   Improving   Problem fixing
TS#1   3243   60%           30%         10%
TS#2   2831   60%           24%         16%
TS#3   3062   65%           26%         9%
TS#4   3645   69%           24%         7%
TS#5   3404   70%           24%         6%

Page 16

…more…

       Time lost [h]   NOT related   HW    SW
TS#1   14.5            48%           62%   38%
TS#2   15.5            94%           90%   10%
TS#3   19.5            46%           85%   15%
TS#4   6.5             15%           8%    92%
TS#5   4               50%           62%   38%

Pretty low statistics to draw meaningful conclusions... in general, HW issues require more time to be fixed

Page 17

…and more! Recovery time vs TSs: Recovery coefficient

       Recovery + beam   TOT TS time            Recovery coefficient   Recovery coefficient
       commissioning     (x-1)*24 + 12 + 24     (theoretical)          (real)
TS#1   43 h              108 h                  0.22                   0.4
TS#2   40 h (1)          108 h                  0.22                   0.37
TS#3   44 h (2)          132 h                  0.18                   0.33
TS#4   18 h              132 h                  0.18                   0.13
TS#5   13 h              132 h                  0.18                   0.09

(1) 67 h including cryo stop
(2) 130 h considering the power cut

x = number of days allocated; allocated time for recovery = 24 h
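The recovery coefficients can be recomputed from the formula in the table. In the sketch below, TOT TS time is (x-1)*24 + 12 + 24 h for x allocated days, the theoretical coefficient is the allocated 24 h of recovery divided by that total, and the real coefficient is the measured recovery + beam commissioning time divided by the same total. These definitions are our reading of the table (they reproduce its 108 h / 132 h totals and its 0.22 / 0.18 theoretical values), and the function names are hypothetical.

```python
ALLOCATED_RECOVERY_H = 24  # "Allocated time for recovery = 24 h"


def total_ts_time(days_allocated: int) -> int:
    """TOT TS time in hours: (x-1)*24 + 12 + 24, with x = days allocated."""
    return (days_allocated - 1) * 24 + 12 + ALLOCATED_RECOVERY_H


def recovery_coefficients(days_allocated: int, recovery_plus_beam_h: float) -> tuple[float, float]:
    """Return (theoretical, real) recovery coefficients for one stop."""
    total = total_ts_time(days_allocated)
    return ALLOCATED_RECOVERY_H / total, recovery_plus_beam_h / total


# x (days allocated) and measured recovery + beam commissioning time [h] per stop
STOPS = {"TS#1": (4, 43), "TS#2": (4, 40), "TS#3": (5, 44), "TS#4": (5, 18), "TS#5": (5, 13)}

if __name__ == "__main__":
    for name, (x, measured_h) in STOPS.items():
        theoretical, real = recovery_coefficients(x, measured_h)
        print(f"{name}: TOT {total_ts_time(x)} h, theoretical {theoretical:.3f}, real {real:.3f}")
```

At three decimals the real coefficients come out as 0.398, 0.370, 0.333, 0.136 and 0.098; the table's 0.13 and 0.09 for TS#4 and TS#5 match the last two truncated rather than rounded.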

Page 18

Client requests

• Cryo: no systematic maintenance anymore; it would be OK with 4 (or even 3) TSs, provided each lasts a minimum of 5 days (this year was good!)

• QPS: minimum of 3 stops (ideally in May, August and October). Minimum duration 4½ days, as in 2011.

• EPC: no particular systematic intervention needed, but at least 1-2 days every 9-10 weeks

• EL: TSs needed to repair/reset/maintain some systems that are not in good shape. Nevertheless, as long as the machines work…

• CV: global time already at the limit. It would be difficult to go for more than 8-9 weeks without a stop

• Experiments? (CMS: "we would be ready to go to 4/5 d every 10 weeks if needed")

• Injectors?

Page 19

Conclusions

• Need to improve fault details recording
• Most of the activities are maintenance: can they be reduced?
• No systematic source of trouble over the 5 TSs!!
• It seems clear that we are improving in recovery… ("After TS, an increment in faults was observed. Effect is decreasing along the run", Walter @ Chamonix 2011)
• Need to apply a control for SW changes (through a meeting to coordinate and create a list?), which could:
  o Improve changes, by coordinating them
  o Increase operational efficiency, by making it easier to identify the source of problems
  o Reduce the impact of changes on other systems

2012
• 4 TSs foreseen for 2012... can we push forward some maintenance and have 3 TSs of 5 days?

Page 20

Back-up slides

Page 21

TS#1 - Issues 1/2

Issue | Duration | Reason | Connected?
BLM sanity check failure | ~12 h, but temporary solution adopted | Update of configuration DB | Y
RB.A12 multiple trip | 3 h | Card to be changed | Likely (active filter manipulation)
Vacuum valve control | 4 h, not continuously | RBAC issue | Likely (RBAC modification?)
RCS.A78B2 trip | 2 h | Power module to be changed | N
QPS (B16.R1 opening quench loop) | 1.5 h | Noise coming from SPS quad | N
Beam process regeneration failure | 0.5 h | Timeout in DB connection to be increased | N
Loading setting on PCs | 0.5 h | Ref mismatch | Likely (connected to DB configuration update?)
...on COLL | | Position error |
...on PCs | | Java exception |
RCBXH3.L1 trip | 1 h | Manipulation on PC | N

Page 22

TS#1 - Issues 2/2

Issue | Duration | Reason | Connected?
Dump kicker fault | 1 h | Local intervention | Likely
QPS on RQD/F.A81 | 0.5 h | Expert reset | N
RQTD.A45B1 trip | 1.5 h | Short circuit on the current lead | Y
ROD./F.A34B1 trip | 0.5 h | Fault reset by EPC | N
XPOC | 1 h | Expert check | N
Orbit instability (not blocking) | ~2 h | Not fully understood, but solved by taking out RPLA.30L4.RCBV30.L4B1 | N
RF cavity 6B2 trip | 0.5 h | Spurious interlock | N
XPOC/IPOC failure on dump | 0.5 h | Expert check | N
COLL dump | 0.5 h | TCDQ position interlock enabled during TS | Y

Page 23

TS#2 - Issues 1/2

Issue | Duration | Reason | Connected?
RB.A23 quench loop cannot be closed | 3 h | Broken relay to be changed | N
RD3.R4 quench | 3.5 h (was OK, but further investigation done the day after) | Reset solved the problem | N
RD34 | Few min | Acc value was incorrect | Possibly
DIP communication glitch | Few min | Unknown | N
HS failure | Masked in SIS, but solution the day after | Parameter manipulation on SIS | Y
RF desynchronized | 0.5 h | "We do not know why (aging, temp, humidity...?)" | N
CRYO loss | 26.5 h | Communication | N
VAC valves cannot be opened | 0.5 h | Unknown | N

Page 24

TS#2 - Issues 2/2

Issue | Duration | Reason | Connected?
XPOC latch and IQC failure | 0.5 h | DB manipulation done during TS | Y
Interlock on TCLIB.6R2B1 | 2 h | Broken sensor | N
S12 & S23 trip | 4 h | AUG TI2 | N
DFB level manipulation in S78 | 1.5 h | Level unstable | N
LBDS self-triggering | 3.5 h | Power supply of MKD generator M broke | N
XPOC analysis repetitive faults | n.a. | Missing VAC data and wrong dump pattern | Y

Page 25

TS#3 - Issues 1/2

Issue | Duration | Reason | Connected?
Cannot drive settings on TDI in P8 | 1.5 h | Left in expert mode | Y
RQD/F.A78 switches cannot be closed | 5 h | Fuse to be changed | N
Beam dump faulty for both beams | 0.5 h | Not clear | N
MCS checks on MKD2 and SMP E limit failure | 2.5 h | HW and DB inconsistent | Y
RCO.A56B2 QPS not OK | 0.5 h + 0.5 h | Expert reset needed | N
MKI1 fails during softstart | 2 h | | Possibly
LBDS cannot be armed | 0.5 h | | Possibly
XPOC expert signature needed | 0.5 h | Server to be rebooted | Probably
RQ10.L1B2 trip | 3 h | Water fault | Y

Page 26

TS#3 - Issues 2/2

Issue | Duration | Reason | Connected?
RQ6.L8B1/B2 trip | 0 h | Water fault | Y
Patrol lost | 2 h | PAD problem | N
Problem with actual trim (ADT setting cannot be loaded; actual trim not working for lumi-knob) | 0.5 h | Re-deployment needed | Likely
XPOC manual reset needed during test | Test was not needed for safe beam... initially delayed, then postponed to next TS | Trying to solve a bad timing problem in inject&dump. Not solved, rolled back! | Y, but problem was already present
RQ4.R6B2 trip | 0.5 h | FGC fault | N

Page 27

TS#4 - Issues

Issue | Duration | Reason | Connected?
COLL error | 2 h | CO modification to be rolled back | Y
DC BCT calibration failure | Can live with | | Likely
BPM calibration failure | 2 h | Crate to be restarted | Likely
HS not working properly | Still working | Not clear | Possibly
XPOC EIC reset does not work | 1 h | Had to be rolled back | Y
Problem with FIDEL corrections | | |
QFB switching OFF + chroma measurement problem | 0.5 h | Tune fitter settings re-deployed | Likely
4 RQDFs trip | 0.5 h | QFB correction too high | N
XPOC reset failure | 0.5 h | Not clear | N
OP vistar not working | Not blocking | Not clear | Possibly

Page 28

TS#5 - Issues

Issue | Duration | Reason | Connected?
Some settings don't match DB | | | Y
Some VAC valves don't open | 0.5 h | VAC still not good | Y
DIP and vistar problem | 0 h | |
RF interlock cannot be reset | 2 h | |
ITL1 wire position not OK | Not blocking | SIS masked | Y
XPOC latched | 1 h | TSU failure | Y
BSRA publishing invalid results | 0 h | "check mode" left (masked in the XPOC) | Y
XPOC failure | | BLM values out of threshold |

Page 29

TS#3 - Recovery

[Timeline figure, bands: TS, CRYO, Recovery, Beam comm. Milestones: Start of HWC, Global CRYO start, First pilot, End of loss maps; also marked: Injection, CRYO, Power cut. Timestamps: Mon 04th (TS start), Fri 8th 7.19pm, Sat 9th 8.05am, Sat 9th 5.27pm, Sun 10th 11.11am, Mon 11th 7.22pm, Tue 12th 3.15pm, Thu 14th 7am]

2 h SB with 2 bunches done meanwhile.

When the power cut arrived, beam commissioning was almost completed (a TCDQ-TCSG cross-check was ongoing).

Activity                 Duration [h]
Tunnel activities (TS)   108
Recovery                 92 (32 h cryo stop)
Beam commissioning       38 (2 h SB not considered)
TOT                      238

Share of total time: tunnel activities 45%, recovery 39%, beam commissioning 16%

Page 30

General Overview

[Figure: 2011 run timeline with four blocks labelled "MD, technical stop" (one also marked "SQUEEZE", one "scrubbing") and run milestones: 75 ns, 50 ns, smaller emittance, increased n. bunches, increased bunch intensity, beta* = 1m]

28% of global time = TUNNEL INTERVENTIONS