Operation in AB-CO 2005 & Beyond
Scope
How to ensure support to operation with the right quality of services
Domains are:
- PS Complex: Linac2, Linac3, PSB, PS, AD, Isolde (+ REX), LEIR
- SPS & transfer lines
- Experimental area
- CTF3
- LHC hardware commissioning
  - Cryogenic systems
  - Beam interlock & powering interlock systems
  - QPS
  - Vacuum, PO
- LHC
Objectives
- Homogenize principles across the different domains
- Include the new requirements
  - Hardware commissioning
  - LHC commissioning & operation
- Identify and agree with partners on responsibility limits
- Issue recommendations on organization, tools, procedures
AB CO working group
Sections: AP, DM, FC, HT, IN, IS
Members:
- Eugenia Hatziangeli
- Ronny Billen
- Nicolas de Metz-Noblat
- Jean-Claude Bau
- Alastair Bland
- Philippe Gayet
- Franck Di Maio
- Pierre Charrue
- Frank Locci
Planning
- 15 October: first meeting
- End of December: proposals for 2005
- End of April: proposals for 2006-2010, keeping in mind:
  - 2006-2007: hardware commissioning
  - 2006: LEIR run
  - 2007-2008: LHC commissioning
  - 2008-2009: first phase of LHC operation
Recommendations for 2005
- LINAC, BOOSTER, ISOLDE, LINAC3
  - As it is now, with CO internal adjustments
- LEIR
  - During commissioning, the PL will organize support
  - After acceptance, same as above, with reinforced support for new technology
- SPS
  - No piquet support, only infrastructure support during working time
- LHC hardware commissioning
  - Each PL organizes the support for his project (PIC, QPS, CRYO, ...)
  - Infrastructure support for servers, FIP, PVSS, FESA, CMW, LASER, logging
[Figure: CO software (applications, components). Labels in the diagram: FESA, CMW, UNICOS (Cryo), UNICOS (PIC, QPS, CRYO), PIC, BIC, QPS, Cryo Ring, LASER, Logging, DIAG, CM, JAPC, Timing]
Tools for Hardware Installation & Operation
- Naming convention
- Layout DB: two layers of description
  - System (PLC, VME, gateway, FIP segment, server, ...)
  - Functional component (slot) of systems (board, power supply, CPU, ...)
  - Connections to functional slots (timing, PIC, power, Ethernet)
- ABCAM
  - Asset management tool describing all physical equipment associated with a functional slot
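
The two-layer description above (systems, their functional slots, and the connections and assets attached to each slot) can be pictured with a minimal sketch. This is not the Layout DB or ABCAM schema: the class names, field names, and the sample crate are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    """Functional component (slot) of a system, e.g. a board or a power supply."""
    name: str
    kind: str                                   # e.g. "CPU", "timing board"
    connections: Dict[str, str] = field(default_factory=dict)  # link type -> target
    asset_id: Optional[str] = None              # physical equipment in the slot (hypothetical id)


@dataclass
class System:
    """Top layer: a PLC, VME crate, gateway, FIP segment, server, ..."""
    name: str
    kind: str
    slots: List[Slot] = field(default_factory=list)

    def slots_connected_to(self, link_type: str, target: str) -> List[Slot]:
        """Return the slots whose connection of the given type points at target."""
        return [s for s in self.slots if s.connections.get(link_type) == target]


# Hypothetical example data, not taken from any real installation.
crate = System("vme-example-01", "VME", [
    Slot("slot1", "CPU", {"ethernet": "switch-A", "power": "psu-1"}, asset_id="ASSET-0001"),
    Slot("slot2", "timing board", {"timing": "gmt-fanout-3", "power": "psu-1"}),
])
```

With such a structure, a question like "which slots hang on power supply psu-1?" becomes `crate.slots_connected_to("power", "psu-1")`.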
VME-VXI
Failure types
- Power/network failure
- Rack top: power supply, timing fan-out, RF repeater (local diagnostic)
  - Intervention by trained team with procedure
- CPU (monitored by Xcluc)
  - Intervention by trained team with procedure
- CO board (not all CO boards have a remote monitoring mechanism, and where one exists it is not homogeneous)
  - Intervention by trained team with procedure
- 1553 fieldbus and serial link (not always monitored)
  - Intervention by trained team with procedure
- Application in FE (seen in Xcluc)
  - Repair, reboot, or do nothing, by operators
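
The failure types above amount to a routing table from failure class to the party that intervenes. A minimal sketch of such a table follows; the key names and the `who_intervenes` helper are hypothetical, and the "unassigned" fallback is an assumption covering cases (such as power/network failure) for which the slide names no intervening party.

```python
# Failure class -> (how it is diagnosed, who intervenes), transcribed from the list above.
# Key names are hypothetical labels chosen for this sketch.
VME_FAILURE_ROUTING = {
    "rack_top": ("local diagnostic", "trained team with procedure"),
    "cpu": ("Xcluc", "trained team with procedure"),
    "co_board": ("no homogeneous remote monitoring", "trained team with procedure"),
    "fieldbus_or_serial_link": ("not always monitored", "trained team with procedure"),
    "fe_application": ("Xcluc", "operators"),
}


def who_intervenes(failure_class: str) -> str:
    """Return the intervening party, or "unassigned" when the class has no entry."""
    entry = VME_FAILURE_ROUTING.get(failure_class)
    return entry[1] if entry is not None else "unassigned"
```

Keeping the table as data rather than code makes it straightforward to maintain one such table per platform (VME, FIP, PLC, timing, servers).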
VME-VXI
Problems to address
- 450 units, several backplane types
  - BDI and RF types cannot be maintained by CO
  - PS complex equipment to be transferred from the configuration DB to the hardware maintenance tools
- Different monitoring & remote action methods
  - Huge investment (money & manpower) needed to homogenize
  - Some equipment has no monitoring capabilities (racks)
- Cohabitation of CO and non-CO managed boards
  - Problem of differential diagnostics
  - Who does the intervention?
FIP
Failure types
- Power (power supplies distributed along the network)
- Ethernet, only for gateways
- Gateway (150) component failures (diagnostic on Xcluc)
  - Gateway replacement by trained team with procedure, software reloading by operators
  - Motherboard, power supply, FIP board, timing cards
- Segment (585) component failures (diagnostic via FIP diagnostic tool)
  - Component replacement by trained team with procedure
  - Copper/fiber coupler, Cu/Cu repeater, FIP DIAG
- Agent failures (diagnostic via FIP diagnostic tool or supervision/expert application)
  - Equipment group responsibility
- Application in FE (seen in Xcluc)
  - Repair, reboot, or do nothing, by operators
FIP
Problems to address
- CO declares all components/architectures/layout in the maintenance/operation tools
- Provide homogeneous tools for diagnostic & remote action
  - Remote reset, restart gateway
  - Distinguish between agent (equipment) and FIP (CO) problems
  - Agent diagnostics
PLC
Failure types
- Power/network
- Backplane power supply, PLC Ethernet board, CPU board (no remote differential diagnostic possible)
  - Intervention by trained team with procedure
- I/O board or fieldbus board failure (monitored by PLC console software)
  - Intervention by trained team with procedure
- Instrument or electronics failure (PIC) (monitored through PLC/PVSS)
  - Intervention by specialist
- Application failure (seen in supervision system)
  - Action via PLC console software by specialist
PLC
Problems to address
- PLCs owned by CO (Cryo (125), PIC/WIC (44), RR (??))
  - Different projects with different constraints and principles
  - For PIC, CO is also responsible for electronic equipment monitored via PLC/PVSS
- PLCs owned by equipment groups (BT, PO, VAC, RF (20)); some PLCs in between (30)
  - We have to determine the limits of CO responsibilities & services
- Centralize all PLC-related information in tools accepted by the community
  - ABCAM, Layout DB
- Common diagnostic principles to be established
  - Generalize and complete the IEPLC diagnostics methodology to all PLCs
  - Remote reset/action is not always a good strategy (disastrous for a Cryo PLC with an Ethernet problem)
  - Action possible only after a local diagnostic
- Intervention procedures need to be established by CO and followed by a team trained on PLCs
  - After a CPU replacement, an application reload is needed in some cases
  - The support team needs to know how to use the PLC console program
  - Identify who can perform these tasks and train them
TIMING
Failure types
- GMT distribution
  - Power failure
  - Failure of a timing component (coupler, repeater, timing board): trained team
  - Cable or fiber disconnection/cut: trained team
  - Timing board failure on client unit (VME, gateway): trained team
- Timing distribution
  - Connections/repeaters: trained team
  - Event timing disabled by user: should be handled by operators
- MTG sequencer
  - Hardware failure: specialists
  - Error in programming the operation timing: specialist
- Timing reception via Ethernet in workstations (video)
TIMING
Problems to address
- Introduce the GMT layout & timing distribution in the Layout DB
  - Backlog of the "PS complex"
- Difficult for the normal operation crew to tell software/user errors from hardware failures
- Several tools for timing diagnostics, for different problems
  - CTRtest, TG8test: timing board reception check
  - Video: telegram reception (in FE and WS)
  - TestTGM: availability of services
- Need for a real timing competence always available in OP
  - First diagnostic and resolution of software & user errors
  - Timing-related work is part of normal operator work, but it is not tracked as it should be by OP
Servers
Failure types
- Power/network (all systems grouped in a restricted area)
- Loss of a system resource (CPU, power supplies, disk)
  - Repair (operator)
  - Hardware intervention (specialist)
- Configuration loss: repair/reboot does not solve the problem
  - Restore from a backup (specialist)
- Application
  - Diagnostic in the application itself
  - Repair from Xcluc (operator)
Problems to address
- OS configuration homogenization
  - Still some PS/SL way of life to migrate toward AB
- Procedure & training for operator intervention
  - What is the task of the operator?
  - How to do it in a proper way?
Power Dependence
Identify a power failure on all process control devices
- All systems must be entered in the Layout DB
  - Connection to power supply known
- All power units must be monitored
  - What does that mean?
  - Is the granularity achieved by TS-EL compatible with our needs?
- How to make the link between the TS-EL monitoring system and CO equipment
  - GTPM (data collection needs to be organized)
  - ANOTHER TOOL...
- Intervention should be done by OP/TS-EL
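
Once every device's power connection is recorded, finding the devices hit by a power failure is a reverse lookup from power unit to devices. The sketch below illustrates that lookup only; the record format, device names, and power unit names are hypothetical, and it assumes the monitoring side reports failures at the same granularity as the recorded connections, which is exactly the open question raised above.

```python
from collections import defaultdict

# Hypothetical extract of layout records: (device, power unit feeding it).
POWER_LINKS = [
    ("plc-cryo-01", "ups-A"),
    ("fip-gateway-07", "ups-A"),
    ("vme-crate-12", "feeder-B"),
]


def devices_by_power_unit(links):
    """Build the reverse index: power unit -> list of devices it feeds."""
    index = defaultdict(list)
    for device, unit in links:
        index[unit].append(device)
    return index


def affected_devices(links, failed_units):
    """Return the sorted names of devices fed by any of the failed power units."""
    index = devices_by_power_unit(links)
    return sorted(d for unit in failed_units for d in index.get(unit, []))
```

The same reverse lookup would apply to network dependence, with network components in place of power units.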
Network Dependence
Identify a network failure on all process control devices
- All systems must be entered in the Layout DB
  - Link to be established to NETOPS
- All network components are monitored
  - How to link the NETOPS/Spectrum information to the CO diagnostic tools
Java Applications: Situation
Legacy software
- Known by CO: one member can maintain them
- Orphan applications: ???
- Both cases: phasing out in the medium term
New application or new component (library)
- Developed by CO or a CO/OP team; this team develops according to common rules
- Diagnostic tools available in the CCC to distinguish between an application failure and an external problem
- List of software components necessary for the application
- Hardware dependence list
- Technical contact list
Failure types
- Controlled process (application): process expert
- Control system (application, Xcluc, ...): control specialist
  - Front-end communication, application server, CMW server, ...
- Application (Xcluc): repair and, if that is not effective, application specialist
- Configuration error for data-driven application: process expert
No effective intervention on application software can be done by a non-expert
Java Applications
Problems to address
- For legacy software
  - Identify and plan the upgrade of all legacy and orphan applications
  - If there is no upgrade (not possible or not useful), or before the upgrade, identify an expert or a support team per application (the team can be a mix of OP/CO/... staff)
- For new software
  - Identify the expert team per application (OP/CO/...)
  - Include in the application documentation or online:
    - List of dependencies on other applications
    - List of hardware dependencies
DM Applications
Failure types
- Oracle server: IT
- Application server: see the Servers page
- Logging application: a monitoring tool for logging exists on a web-based access page; can be seen & corrected by the CCC operator
- Config DB: ???
Problems to address
- Ensure a 24/365 service guarantee by IT for the Oracle server
- Prepare a procedure for CCC operators on reference server web-based intervention
PVSS Applications
No automatic control actions are performed in PVSS applications:
- Monitoring, operator command request, interface to LASER/logging
All applications are based on JUNICOS frameworks
- Same monitoring principles across all applications
- Failure types are not application dependent
Failure types
- Controlled process (via application & SMS): process expert
- Control system (via PVSS monitoring tool): PVSS specialist
  - Front-end communication, data server CPU and disk usage, archive monitoring, logging exchange monitoring, ...
- PVSS manager (auto repair in case of failure, Xcluc): PVSS specialist
Problems to address
- Backup/restore policy to be established
- Integration with existing tools
Operation Responsibilities
- AP: Java applications framework; high-level applications for LEAR and LHC HC; LASER
- HT: Timing/Sequencing; remote reset; FE
- FC: CMW; FE
- IN: Servers; FE (via Xcluc); PIC/WIC
- IS: PVSS; IEPLC; CRYO; FIP; test bench
- DM: Logging; Configuration DB; ABCAM; Layout DB
All sections will have activities related to operation in 2006
Present Piquet Know-How
- AP: Java applications framework; legacy applications; high-level applications for LEAR and LHC HC; LASER
- HT: Timing/Sequencing; remote reset; FE
- FC: CMW; FE
- IN: Servers; FE (via Xcluc); PIC/WIC
- IS: PVSS; IEPLC; CRYO; FIP; test bench
- DM: Logging; Configuration DB; ABCAM; Layout DB
Some Remarks
- We have a large diversity of systems and only a small part is integrated today
- The present piquet team is not tailored to take over the entire operation duty of the CO group
  - 1 team leader, 4 experts, 2 newcomers
  - "New" technologies not mastered by the existing team
  - Geographical dispersion of equipment
- In 2006/2007, operation activity will have to coexist with installation/commissioning activities
First Proposals
- For hardware systems, systematically use the Layout DB and ABCAM tools
- Together with OP, clean up the power/network issues
- Transfer the timing software management to OP
- Clarify responsibilities with equipment groups in all grey areas
- Prepare & execute the legacy software upgrade
- Integrate all existing diagnostic tools
  - LASER (AP), GTPM (OP), XCLUC (IN), Spectrum (IT-CS), TIM (OP), PVSS UNICOS integrated diagnostics (IS/IN), application integrated diagnostics (AP), DiagCMW (FC), timing tools (HT), PLC console tools (IS), FIP diagnostic tool (IS), logging monitoring (DM)
Tracks
- All sections must organize (alone, in synergy with others, via a reorganization, ...) the operation support of the systems or applications they deploy
  - No systematic organization (piquet or list)
  - Intervention teams can be grouped, e.g.:
    - Hardware for VME, gateway, FIP, PLC
    - PVSS/PLC & PVSS/FEC applications support
- Create an operation coordination (a person or a team)
  - Provides the interface toward OP
  - Coordinates the control system integration
    - Requesting procedures/documentation from system teams
    - Coordinating the development of diagnostic tools
    - Requesting from the different teams the functionalities necessary for operation
- Create a real operation-oriented policy within the entire group
Possible Operation Team Duties/Limits for 2006
Limits
- No installation
- No configuration
- No application modifications
- No application bug fixing
- No timing user error fixing
- No intervention on systems under commissioning
- No intervention on power/network problems
For systems in operation
- Hardware
  - Remote diagnostic
  - Local diagnostic
  - Reboot, or reinitialize communication
  - Hardware intervention (with limitations)
  - Application reloading (with limitations)
  - Call equipment specialists
- Software
  - Refine diagnostic
  - Reboot application (operators)
  - Call specialists
- Management
  - Track problems
  - Request & obtain improvements