Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and...

28
Zabbix in safety critical environments Zabbix conference / Riga Günther Sommer Senior IT Architect / „Zabbix Evangelist“ Business Unit Integration Projects

Transcript of Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and...

Page 1: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

Zabbix in safety critical environments

Zabbix conference / Riga

Günther SommerSenior IT Architect / „Zabbix Evangelist“Business Unit Integration Projects

Page 2: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 2

Overview

� The why and how of using ZABBIX in safety critical environment

� Marketing ☺ – Who are we

� Part I – Why Zabbix / safety critical usage

� Part II – Zabbix usage

� Part III – Zabbix toolbox we used

Page 3: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 3

Marketing ☺

Marketing ☺

Page 4: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 4

About FREQUENTIS & ZABBIX

� FREQUENTIS is a premium partner of ZABBIX

� Using it as a monitoring part for some of our systems

� Certified for an ED109 – AL3 environment

Page 5: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 5

We develop and market high reliable communication and information systems for mission critical applications

in the fields of Air Traffic Management and Public Safety & Transport.

Worldwide Control Centres develop towards the same standards.

More than 60 years of innovation & expertise in mission critical applications

ATMAir Traffic Management

PSTPublic Safety & Transport

ATM Civil Defence Public Safety Public Transport Mariti me

Page 6: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 6

Company Overview

� Established in 1947

� 154 Mio. EUR Turnover 2010

� Corporate headquarters in Vienna• Subsidiaries and regional offices in over 50 countries

� about 980 Employees

� Outstanding Engineering Capacity

• more than 600 highly-qualified engineers (HW/SW/PM) at FREQUENTIS headquarter and subsidiaries

� Export Quota > 90%

� R&D Quota > 12%

Global Market Leader in ATC Voice Communication Sys tems

First Air Traffic Control System in Austria,Vienna / Schwechat, 1955

Breakthrough in the US: FAA Command Centre / Herndon, Virginia, 2003

Company Headquarters on Wienerberg,relocation in 2006

Frequentis Group 2010

Page 7: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 7

Worldwide References

[Excerpt 05/2011]

Page 8: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 8

Customers using Zabbix

� Major air traffic control center in Europe

� Major UK police force

� Several airports worldwide

� And getting more

� Used in some of the most complex projects

Page 9: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 9

Part I –safety critical / Why Zabbix

Safety critical / Why Zabbix

Page 10: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 10

What means safety critical?

� Organisations

– Air traffic managment

– Public safety control centers – Police, Ambulance, Fire brigade

� Technical

– Redundancy and resiliance

– Severity levels (multiple for same problem) and priorities

– Time to alarm

– Configureable screens (Text and Graphs!) for critical situations

– Minimization of false alarms

Page 11: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 11

Ensuring that the system is safe

� A lot of design! A lot of paperwork! A lot of testing!

� Requirements – Writing and tracking / testing

� Configuration managment / Deployment – this is an essential part of it - not a bit of change without tracking

� Testing

� Test procedures (up to several hundreds of testcases for every release)

� System tests – complete failovers, powerplug, ...

� And more testing – two years of it and still doing ...

Page 12: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 12

Why Zabbix?

� Evaluated several competitors:

– Nagios

– GWOS (Nagios++)

– Zenoss

– Hyperic

– OpenNMS

– Commercial ones like WhatsUp, SNMPc

� Focus on different aspects

– Some on Java / Middleware

– SNMP support

Page 13: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 13

Advantages of Zabbix / technical

� Templates – makes mass-administration easy

� Stability

� Fully GUI configureable!

� Trigger „language“

– Filtering out of CPU spikes

– Severity escalations with dependencies possible

� Agent command line

– Try to query NTP sync via SNMP ☺

� Nagios „plugin compatible“

– You can use outputs from Nagios checks (ie „check_hpasm“)

Page 14: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 14

Advantages of Zabbix / technical II

� Queue – gives you an idea, if it doesn‘t work as expected

� Seperation of App and Config – RPM & Config only in DB

� Proxy & distributed monitoring concept for nationwide systems including a full standalone remote node

Page 15: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 15

Advantages of Zabbix / business

� Matching to our SI business

� Commercial support options

� Trainings

� Engineers „love“ open source – „Patch hack“ support

� Release policy

Page 16: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 16

Disadvantages of Zabbix

� Templates ☺ – high learning curve

� Configuration GUI (but improved a lot since 1.8.1...)

� No full config export possible via XML (only on DB)

� No templateable multi-node status – „Service view“

Page 17: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 17

Part II – Zabbix usage

Part II – Zabbix usage

Page 18: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 18

HA setup of Zabbix

� Shared nothing architecture

Page 19: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 19

Cluster stack

� Monitoring - Zabbix �

� Database - MYSQL ��

– Sometime quite long startup time (Cluster timeout)

� Replication - DRBD �

� Cluster - LINUX-HA based on Pacemaker �

– Lots of issues until failover & failback works

– Very bad documentation at 2009 for Pacemaker and espical XML, Corosync/OpenAIS was not ready for production use

� Fencing - HP ILO ��

– productive network has dual redundancy, ILO just one

Page 20: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 20

Screen layouts

Page 21: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 21

Features developed out of our work

� Map & Screen Import/Export

� Internal Zabbix monitoring – had some issues with pollers getting frozen and missing poller resources

� Configurable severity naming and coloring

� Hostgroup filter on status of triggers in screen

� Improved SNMP trap support (in future)

� Event export (in future)

� Script confirmation (in future)

� and more coming ...

� Lots of minor stuff, bug fixes, ...

Page 22: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 22

If you need something...

� Then ask ZABBIX

� Pay for it, it‘s a good price for value

� We got a lot for 1.8 and for 2.0, and 2.x+

� And if you are completly new, there is consulting available

Page 23: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 23

Part III – Zabbix toolbox

Part III – Zabbix toolbox

Page 24: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 24

Zabbix „Parrot“

� Requirments:

– Sound alarm indiciation for Zabbix

– Browser is not really 100% trustworthy

– Shall be independent of Zabbix server HW & SW

� Solution:

– External PC or embedded device

– „Parrot“ gets triggerd by Zabbix

– Loops the sound until EOL or stop event

– Stop can be done by server or local contact

Page 25: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 25

Zabbix Backup

� Problem

– Normal backup can take very long at DB sizes > 100GB

– No maintenance window existing

– Can only be fast done on storage level – SAN / LVM?

– Can block DB access in example for MYSQL

– Historical data is „always outdated“

� Solution

– Fully backup configuration tables only

– Create only DB structure for historic tables

– Backup needs some seconds and is about 1M size

Page 26: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 26

Zabbix API uses

� Command line tool for mass actions

– Use existing groups and items for command line display

– Can do the same things as on GUI

� Umbrella monitoring

– Possible to extend it to interface to another system via API

Page 27: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.2File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 27

Q & A

� Any questions ?

Page 28: Zabbix conference / Riga Günther Sommer Senior IT ......in the fields of Air Traffic Management and Public Safety & Transport. ... – Zenoss – Hyperic – OpenNMS – Commercial

© FREQUENTIS 2011 Date: 2011-09-30 Rev.1File: Zabix – Safety Critical.PPTX Author: G. Sommer Page: 28

Thank you