Jeremy Nowell, EPCC, University of Edinburgh, jeremy@epcc.ed.ac.uk: A Standards Based Alarms Service...

Jeremy Nowell, EPCC, University of Edinburgh, jeremy@epcc.ed.ac.uk, http://www.npm-alarms.org/. A Standards Based Alarms Service for Monitoring Federated Networks. Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew. ICNS 2009, Valencia, 24 April 2009.

Page 1: Jeremy Nowell EPCC, University of Edinburgh jeremy@epcc.ed.ac.uk  A Standards Based Alarms Service for Monitoring Federated Networks.

Jeremy Nowell

EPCC, University of Edinburgh

jeremy@epcc.ed.ac.uk

http://www.npm-alarms.org/

A Standards Based Alarms Service for Monitoring Federated Networks

Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew

ICNS 2009

Valencia

24 April 2009

24 April 2009 Jeremy Nowell - A Standards Based Alarms Service 2

Project Background

• EPCC is supercomputing centre at University of Edinburgh

– Host UK national academic HPC service

– Academic and industrial consultancy

– http://www.epcc.ed.ac.uk/

• EPCC has been working in area of network monitoring for Grids for 5 years

– First within EGEE project, now more widely

Overview

• Challenges of monitoring federated networks

• Standards-based network monitoring

• Why an Alarms Service

• Architecture

• Examples

• Future Work

Federated Networks

Network Monitoring Challenges

[Diagram: Network Monitoring spans several dimensions. Types: backbone, end-to-end. Tools: iperf, ping, netflow, perfSONAR. Data Formats: RRD, SQL, flat file. User Groups and Administrative Domains: NOC, GOC, end user, project, NREN, MAN.]

Federated Networks for Grids

[Diagram: network hierarchy with GÉANT2 at the top, then two NRENs, two MANs and two Campus networks]

• For Grids need

– unified view

– end-to-end performance

• real achievable application performance

Federated Network Monitoring Strategy

• Use existing tools and data

– Do not try and force adoption of single tool across large multi-administrative domains

– Instead provide framework for accessing distributed data

• Use standards-based solutions where possible

– Access wide range of data

– Allow interoperability between grids, projects and networks

Standards-Based Network Monitoring

• Data federation through use of schema provided by Open Grid Forum (OGF) Network Measurements Working Group (NM-WG)

[Diagram: three layers, top to bottom]

End Users of Network Data: resource-brokering middleware, NOC/GOC, user

NM-WG Clients and Services

Monitoring Frameworks: NREN using perfSONAR, end-site using perfSONAR, end-site using e2emonit, home-grown framework

NM-WG Schema allows interoperability between clients and measurement frameworks

Standards Based Network Monitoring

• EPCC has developed tools for accessing historical network performance data from multiple measurement frameworks

• e2emonit

– End-to-end metrics (TCP/UDP achievable bandwidth, RTT, packet loss, OWDV)

– Active measurement tools (iperf, ping, udpmon)

• perfSONAR

– Developed by collaboration including GÉANT2, ESnet, Internet2

– Passive data for router interfaces

• Utilisation, input errors, output drops

– Traceroute information

But…

• Historical data only useful for diagnosing problems when you already know something is wrong

• What users really need are…

ALARMS

Requirements

• A network Alarms Service

– Allows the timely detection of problems

– Notifies users

– Gives an “at a glance” view of network status

Specific Requirements

• Motivated by the LHCOPN

– 10 Gb/s private network for moving data generated by the LHC

– perfSONAR based monitoring solution deployed and operated by DANTE

• Need following alarms as minimum

– Unexpected path changes

– Routing out of private network

– Router Interface Congestion

• Packets lost
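
The first two alarm conditions on this slide can be illustrated with a small sketch. The Python below is not the Alarms Service's own code; it only suggests, under stated assumptions, how unexpected path changes and routing out of a private network might be detected from traceroute hop lists. The names `PathAlarm` and `check_path`, and the example addresses (taken from the reserved documentation ranges), are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Sequence


@dataclass
class PathAlarm:
    kind: str    # "path_change" or "left_private_network" (hypothetical labels)
    detail: str


def check_path(expected: Sequence[str], observed: Sequence[str],
               allowed_prefixes: Sequence[str]) -> List[PathAlarm]:
    """Compare an observed traceroute hop list against a reference path."""
    alarms: List[PathAlarm] = []

    # Unexpected path change: the hop sequence differs from the reference.
    if list(observed) != list(expected):
        alarms.append(PathAlarm(
            "path_change",
            f"expected {list(expected)}, saw {list(observed)}"))

    # Routing out of the private network: a hop lies outside every
    # prefix that is assumed to belong to the private network.
    networks = [ip_network(prefix) for prefix in allowed_prefixes]
    for hop in observed:
        if not any(ip_address(hop) in net for net in networks):
            alarms.append(PathAlarm("left_private_network", f"hop {hop}"))

    return alarms


if __name__ == "__main__":
    # Assumed example data: one extra hop that leaves the allowed prefix.
    reference = ["192.0.2.1", "192.0.2.2"]
    latest = ["192.0.2.1", "203.0.113.9", "192.0.2.2"]
    for alarm in check_path(reference, latest, ["192.0.2.0/24"]):
        print(alarm.kind, alarm.detail)
```

A single detour can raise both alarms at once, which is why the sketch returns a list rather than a single status.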

Strategy

• Query

• Detect

• Notify

Architecture

[Architecture diagram. Components shown: Measurement Archives (two), MA Query Interface, MA Notification Interface, Current Status Analyser, Configuration Parser, Configuration Files, Status Notifiers, Alarms Archive]

Details

• Query

– NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives

• Passive Router Data – interface errors, drops, utilisation

• Traceroute Information

• Detect

– Rules based mechanism to process data against rules defined in configuration files

• DROOLS library

• Notify

– Output status in form usable by Nagios

• Status display, notifications, history

– Easily implement more status notifiers
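
The Detect and Notify steps can be sketched as follows. The service itself evaluates rules defined in configuration files with the DROOLS library; the Python below swaps in plain predicate functions purely for illustration. The `Rule` type, the metric names, and the thresholds are hypothetical. The output line follows the Nagios external command format for passive service check results, which is one way (assumed here, not confirmed by the slides) of producing status "in form usable by Nagios".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Standard Nagios return codes.
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Rule:
    name: str
    metric: str                         # key in the sample dictionary
    condition: Callable[[float], bool]  # True when the alarm should fire
    severity: int


# Hypothetical rules and thresholds, standing in for a configuration file.
RULES = [
    Rule("high utilisation", "utilisation", lambda v: v > 0.9, WARNING),
    Rule("output drops", "output_drops", lambda v: v > 0, CRITICAL),
    Rule("input errors", "input_errors", lambda v: v > 0, CRITICAL),
]


def detect(sample: Dict[str, float], rules: List[Rule]) -> List[Rule]:
    """Return the rules whose condition fires for this interface sample."""
    return [r for r in rules
            if r.metric in sample and r.condition(sample[r.metric])]


def nagios_passive_result(host: str, service: str,
                          fired: List[Rule], now: int) -> str:
    """Format the worst fired severity as a passive check result line."""
    code = max((r.severity for r in fired), default=OK)
    text = ", ".join(r.name for r in fired) or "no alarm conditions"
    return f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}"


if __name__ == "__main__":
    # Assumed sample values, as might be returned by a Measurement Archive.
    sample = {"utilisation": 0.95, "output_drops": 0.0, "input_errors": 0.0}
    fired = detect(sample, RULES)
    print(nagios_passive_result("router1", "if0", fired, 1240531200))
```

Keeping the rules as data separate from the evaluation loop mirrors the slide's point that new conditions and new status notifiers should be easy to add.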

Examples

Current Status

• Prototype is currently being used by DANTE to monitor some LHCOPN paths and interfaces, for the required alarm conditions

– Test functionality

– Gather feedback from users

• Will be further developed and deployed to monitor whole of LHCOPN during this year

• Actively looking for other users

Further Work

• Implement more alarm conditions

• Send status information to other consumers, e.g. network weather map

• Think about data processing

– e.g. “cleaning” of data to remove bad data points

– Statistical processing etc

Summary

• Monitoring of federated networks is a challenge

• An Alarms Service is critical for problem discovery

• The LHCOPN is being monitored using an initial version

– and will be developed further to be deployed to monitor the whole network

• Acknowledgements

– Funding

• UK Joint Information Systems Committee (JISC)

• EGEE-II (INFSO-RI-031688)

• DEISA2 (RI-222919)

– Collaboration

• DANTE

• DFN WiN-Labor Erlangen

• LHC-OPN