ALMA Integrated Computing Team
ICT Coordination and Planning Meeting #2, Santiago, 28-29 January 2014
Alarm system
A. Caproni
ICT-CPM2 28-29 January 2014
Alarm system status
- According to operators, the alarm panel is useless:
  - Too many alarms
  - Stale alarms
  - False alarms
- Result of a 4h profiling session by Patricio (mid Nov 2013): ~31k alarms
  - By state: ACTIVE 16103, TERMINATE 15407
  - By priority: Pri 0: 41, Pri 1: 1820, Pri 2: 500, Pri 3: 29149
- Insufficient coverage: scripts and tools not provided by ALMA Computing are not covered
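As a quick consistency check of the profiling figures, both breakdowns (by state and by priority) count the same events and must add up to the same total. A minimal Python sketch, assuming the profile is available as (alarm_id, state, priority) records; the record layout, the example ID and the function name tally are hypothetical and not part of ACS.

```python
from collections import Counter


def tally(events):
    """Count profiled alarm events by state and by priority.

    Hypothetical record layout: (alarm_id, state, priority) tuples,
    e.g. ("DGCK:DV01:1", "ACTIVE", 3).
    """
    by_state = Counter(state for _, state, _ in events)
    by_priority = Counter(priority for _, _, priority in events)
    return by_state, by_priority


# Totals reported for the 4h profiling session (mid Nov 2013).
by_state = Counter({"ACTIVE": 16103, "TERMINATE": 15407})
by_priority = Counter({0: 41, 1: 1820, 2: 500, 3: 29149})

total = sum(by_state.values())
# Each event has exactly one state and one priority, so both sums must match.
assert total == sum(by_priority.values())
print(total)                                      # 31510
print(f"{by_priority[3] / total:.1%} are Pri 3")  # 92.5% are Pri 3
```

The check confirms the ~31k figure (31510 events) and shows that more than nine in ten alarm events were priority 3.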
Snapshot - 1
Snapshot - 2
Snapshot - 3
AS improvement plan (proposal)
- Show only “real alarms” and remove the others (to regain operators' trust)
- Useful documentation in the panel (twiki?)
- Fix most chattering alarms:
  - DGCK:*:1, DGCK:*:4
  - FLOOG,*,7
- Fix stale alarms:
  - Manager,*,1
  - LO2BBpX:*:1, LO2BBpX:*:10, LO2BBpX:*:11
  - WCA:*:1
- Improve system startup and device initialization
- Profile during operations like array creation/destruction, total power…
- TMCDB configuration (input from System Engineering for BACI properties)
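Chattering alarms (alarms that flip repeatedly between ACTIVE and TERMINATE) can be picked out of a profile by counting state changes per alarm within a time window, and the wildcard IDs listed above can then be used to filter the result. A minimal Python sketch, assuming events are (timestamp_s, alarm_id, state) tuples sorted by time, and assuming that the comma and colon forms used on the slide (FF,FM,FC and FF:FM:FC) name the same fault family, member and code triplet. The window, threshold and helper names are hypothetical; the wildcard matching uses the standard fnmatch module.

```python
from collections import defaultdict, deque
from fnmatch import fnmatchcase


def normalize(alarm_id):
    """Assumption: "FF,FM,FC" and "FF:FM:FC" name the same alarm triplet."""
    return alarm_id.replace(",", ":")


def matches(alarm_id, patterns):
    """True if alarm_id matches any wildcard pattern such as "DGCK:*:1"."""
    alarm_id = normalize(alarm_id)
    return any(fnmatchcase(alarm_id, normalize(p)) for p in patterns)


def chattering(events, window_s=60.0, max_changes=4):
    """Return IDs of alarms with more than max_changes state changes
    inside any window of window_s seconds.

    events: iterable of (timestamp_s, alarm_id, state), sorted by time.
    """
    last_state = {}
    changes = defaultdict(deque)  # alarm_id -> timestamps of recent changes
    flagged = set()
    for ts, alarm_id, state in events:
        alarm_id = normalize(alarm_id)
        if last_state.get(alarm_id) == state:
            continue  # same state as before: not a change
        # The first event seen for an alarm also counts as a change.
        last_state[alarm_id] = state
        times = changes[alarm_id]
        times.append(ts)
        while ts - times[0] > window_s:
            times.popleft()  # forget changes older than the window
        if len(times) > max_changes:
            flagged.add(alarm_id)
    return flagged


# Usage: keep only the chattering alarms in the families named on the slide.
# chatty = {a for a in chattering(events)
#           if matches(a, ["DGCK:*:1", "DGCK:*:4", "FLOOG,*,7"])}
```

A sliding window is used instead of a global count so that an alarm which legitimately changes state a few times over a long session is not reported as chattering.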
AS improvement plan (proposal)
- ACS next improvements:
  - Alarm server to dump alarms to files (ICT-1908)
    - Offline profiling
    - Correlate alarms and logs while debugging (?)
    - After-the-fact GUIs and tools
  - Alarm panel to group alarms belonging to the same array (ICT-1760)
- Nominate an “Alarm System Manager”:
  - Regularly profile the AS
  - Check and update the documentation
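Dumping alarms to files (ICT-1908) is what enables offline profiling: once alarm events are on disk they can be replayed after the fact and correlated with logs. A minimal Python sketch, assuming a hypothetical JSON Lines dump with one event per line (ts, id, state) and log entries given as (timestamp_s, message) pairs sorted by time. Neither the file format nor the function names come from ACS, and the log messages are placeholders.

```python
import io
import json
from bisect import bisect_left, bisect_right


def dump_alarm(fh, ts, alarm_id, state):
    """Append one alarm event to fh as a JSON line (hypothetical format)."""
    fh.write(json.dumps({"ts": ts, "id": alarm_id, "state": state}) + "\n")


def load_alarms(fh):
    """Read back the events written by dump_alarm(), skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


def correlate(alarms, logs, window_s=5.0):
    """Pair each alarm with the log messages within +/- window_s seconds.

    logs: list of (timestamp_s, message) tuples sorted by timestamp.
    """
    times = [ts for ts, _ in logs]
    pairs = []
    for alarm in alarms:
        lo = bisect_left(times, alarm["ts"] - window_s)
        hi = bisect_right(times, alarm["ts"] + window_s)
        pairs.append((alarm, [msg for _, msg in logs[lo:hi]]))
    return pairs


# Usage with an in-memory "file"; a real dump would be a file on disk.
buf = io.StringIO()
dump_alarm(buf, 100.0, "WCA:DA41:1", "ACTIVE")
buf.seek(0)
logs = [(90.0, "msg-a"), (98.5, "msg-b"), (120.0, "msg-c")]
for alarm, related in correlate(load_alarms(buf), logs):
    print(alarm["id"], related)  # WCA:DA41:1 ['msg-b']
```

Binary search over the sorted log timestamps keeps each lookup logarithmic, which matters when a few hours of operation already produce tens of thousands of alarm events.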
- ACS handed over to OSF after fixing persistence and NCs
- RTI/DDS tested with 48 antennas
- The number of alarms is expected to grow as more antennas are added
- Alarm system performance:
  - The AS persists alarms in memory
  - Already decoupled from the source NC
  - ACS “new” AlarmSource API:
    - Avoids resending an alarm if its state did not change
    - Enable/disable alarm sending
    - Queuing of alarms
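The three AlarmSource features listed above can be illustrated with a small Python model. This is not the real ACS AlarmSource interface: the class and method names are hypothetical and only mirror the described behaviour, namely suppressing an alarm whose state did not change, disabling and re-enabling alarm sending, and queuing alarms so that only the latest state of each alarm is sent when the queue is flushed.

```python
class AlarmSourceModel:
    """Hypothetical model of the AlarmSource behaviour described above."""

    def __init__(self, send):
        self._send = send        # callable(alarm_id, active), e.g. publisher
        self._last_sent = {}     # alarm_id -> last state actually sent
        self._enabled = True
        self._queuing = False
        self._queue = {}         # alarm_id -> latest state while queuing

    def set_alarm(self, alarm_id, active):
        if self._queuing:
            self._queue[alarm_id] = active  # keep only the latest state
        else:
            self._publish(alarm_id, active)

    def _publish(self, alarm_id, active):
        if not self._enabled:
            return  # sending disabled: the change is dropped
        if self._last_sent.get(alarm_id) == active:
            return  # state unchanged: do not resend
        self._last_sent[alarm_id] = active
        self._send(alarm_id, active)

    def disable(self):
        self._enabled = False

    def enable(self):
        self._enabled = True

    def start_queuing(self):
        self._queuing = True

    def flush(self):
        """Stop queuing and send the final state of each queued alarm."""
        self._queuing = False
        queued, self._queue = self._queue, {}
        for alarm_id, active in queued.items():
            self._publish(alarm_id, active)


# Usage: record what would reach the alarm service.
sent = []
src = AlarmSourceModel(lambda alarm_id, active: sent.append((alarm_id, active)))
src.set_alarm("WCA:DA41:1", True)
src.set_alarm("WCA:DA41:1", True)    # unchanged: not resent
src.start_queuing()
src.set_alarm("DGCK:DV01:1", True)
src.set_alarm("DGCK:DV01:1", False)  # collapsed to the final state
src.flush()
print(sent)  # [('WCA:DA41:1', True), ('DGCK:DV01:1', False)]
```

Queuing collapses a burst of changes into one message per alarm, which is one way to reduce both the chattering seen by operators and the load on the alarm server as the number of antennas grows. Dropping changes while sending is disabled is a choice made for this sketch; a real implementation might instead resend the current state when sending is re-enabled.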
Scalability