An introduction to network performance monitoring with perfSONAR www.geant.org Szymon Trocha (Poznań Supercomputing and Networking Center) WP6T3 DeiC, Frederica (DK), 31 October 2019 Public




Motivations

• Identify problems when they happen or (better) earlier

• The tools must be available (at campus endpoints, demarcations between networks, at exchange points, and near data resources such as storage and computing elements, etc.)

• Access to testing resources

Source: https://www.deic.dk/


Heterogeneous world

• The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale networks

• While these networks all interconnect, each network is owned and operated by a separate organization (called a “domain”) with different policies, customers, funding models, hardware, bandwidth and configurations

• This complex, heterogeneous set of networks must operate seamlessly from “end to end” to support your science and research collaborations that are distributed globally


Challenges

• Delivering end-to-end performance

• Get the user, service delivery teams, local campus and metro/backbone network operators working together effectively

• Have tools in place

• Know your (network) expectations

• Be aware of network troubleshooting

Source: https://www.deic.dk/


What is perfSONAR?

• It’s infeasible to perform at-scale data movement all the time – as we see in other forms of science, we need to rely on simulations

• perfSONAR can be used to:
  • Set network performance expectations
  • Find network problems (“soft failures”)
  • Help fix these problems
  • All in multi-domain environments

• These problems are all harder when multiple networks are involved

• perfSONAR provides a standard way to publish monitoring data

• This data is interesting to network researchers as well as network operators


The good old times of iperf

• A good tool
  • But a server has to be started on the remote end

• perfSONAR is a framework using a set of tools
  • Including iperf

Source: https://www.deic.dk/


The Toolkit

• Network performance comes down to a couple of key metrics:
  • Throughput (e.g. “how much can I get out of the network”)
  • Latency (time it takes to get to/from a destination)
  • Packet loss/duplication/ordering (for some sampling of packets, do they all make it to the other side without serious abnormalities occurring?)

• We can get many of these from a selection of measurement tools: the perfSONAR Toolkit
  • Plus more, such as disk-to-disk transfer, HTTP or DNS request time

• The “perfSONAR Toolkit” is an open source implementation and packaging of the perfSONAR measurement infrastructure and protocols

• All components are available as RPMs, DEBs, and bundled as a CentOS ISO

• Very easy to install and configure (usually takes less than 30 minutes for default install)
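The key metrics listed above can be derived from a plain series of probe results. The following is a minimal Python sketch, not perfSONAR code: the function name, the input layout and the sample values are hypothetical, and in practice the numbers would come from the Toolkit's measurement tools.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarize(sent: int, arrivals: List[Tuple[int, float]],
              bytes_moved: int, seconds: float) -> Dict[str, Optional[float]]:
    """Summarize one hypothetical test run.

    sent        -- number of probe packets sent
    arrivals    -- (sequence number, delay in ms) in order of arrival
    bytes_moved -- payload bytes transferred by a separate throughput test
    seconds     -- duration of that throughput test
    """
    seen = set()
    first_delays = []
    duplicates = reordered = 0
    highest = -1
    for seq, delay_ms in arrivals:
        if seq in seen:
            duplicates += 1          # same packet delivered more than once
            continue
        seen.add(seq)
        first_delays.append(delay_ms)
        if seq < highest:
            reordered += 1           # arrived after a later-numbered packet
        highest = max(highest, seq)
    return {
        "throughput_mbps": bytes_moved * 8 / seconds / 1e6,
        "mean_delay_ms": mean(first_delays) if first_delays else None,
        "loss_pct": 100.0 * (sent - len(seen)) / sent,
        "duplicates": duplicates,
        "reordered": reordered,
    }
```

For example, `summarize(5, [(0, 10.0), (1, 12.0), (3, 11.0), (2, 15.0), (3, 11.5)], 1_250_000_000, 10.0)` describes a run in which one of five probes was lost, one arrived out of order and one was duplicated.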


The importance of regular testing

• We can’t wait for users to report problems and then fix them

• Things just break sometimes
  • Bad system or network tuning
  • Failing optics
  • Somebody messed around in a patch panel and kinked a fiber
  • Hardware goes bad

• Problems that get fixed have a way of coming back

• System defaults come back after hardware/software upgrades

• New employees may not know why the previous employee set things up a certain way and back out fixes

• Important to continually collect, archive, and alert on active test results
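One way to picture "collect, archive, and alert" is a check that compares each new result with the archived history. The sketch below is illustrative only: the function name, the 80% threshold and the minimum sample count are assumptions, not perfSONAR defaults.

```python
from statistics import median
from typing import List, Optional


def check_regression(history_mbps: List[float], latest_mbps: float,
                     min_fraction: float = 0.8,
                     min_samples: int = 5) -> Optional[str]:
    """Return an alert message if the latest result falls well below the archived baseline."""
    if len(history_mbps) < min_samples:
        return None  # not enough archived data to know what "normal" is
    baseline = median(history_mbps)
    if latest_mbps < min_fraction * baseline:
        return (f"throughput {latest_mbps:.0f} Mbps is below "
                f"{min_fraction:.0%} of baseline {baseline:.0f} Mbps")
    return None
```

Using the median of past results as the baseline keeps a single bad run from redefining "normal", which matters when a fixed problem quietly comes back after an upgrade.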


perfSONAR deployment possibilities

• Dedicated server (example)
  • A single CPU with multiple cores (2.7 GHz for 10 Gbps tests)
  • 4 GB RAM
  • 1 Gbps onboard (mgmt + delay)
  • 10 Gbps PCI-slot NIC (throughput)

• Low cost: small PC
  • e.g. GIGABYTE BRIX GB-BACE-3150, Intel NUC

• Ad hoc testing
  • Docker


Deployment styles

Beacon

Island

Mesh


Measurement node location criteria

• Where can it be integrated into the facility software/hardware management systems?

• Where can it do the most good for the network operators or users?

• Where can it do the most good for the community?

EDGE

NEXT TO SERVICES

NEXT TO SERVICES


Mesh building and results visualization

• Key challenges
  • Scheduling the tasks you want to run at each location
  • Visualization components to display results of the measurements from multiple hosts

• pSConfig is a template framework for describing and configuring a topology of tasks. If you manage more than one perfSONAR host, it helps with the challenges above by providing tools to automate each of these configuration tasks.

• pSConfig Web Admin (PWA) is a web-based UI for perfSONAR administrators to define and publish pSConfig meshes, which automate the tests executed by test nodes and provide topology information to various services, such as MadDash.

• MadDash collects and presents two-dimensional monitoring data as a set of grids referred to as a dashboard.
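To make the "two-dimensional" idea concrete: a full mesh of N hosts expands to N × (N − 1) ordered source/destination pairs, and a dashboard lays the results out with sources as rows and destinations as columns. The sketch below uses plain Python rather than pSConfig or the dashboard software itself; the host names, loss thresholds and status labels are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def mesh_pairs(hosts: List[str]) -> List[Pair]:
    """Every ordered (source, destination) pair in a full mesh."""
    return list(permutations(hosts, 2))


def status(loss_pct: float, warn: float = 0.01, crit: float = 1.0) -> str:
    """Map a packet-loss percentage to a dashboard cell status (assumed thresholds)."""
    if loss_pct >= crit:
        return "CRIT"
    if loss_pct >= warn:
        return "WARN"
    return "OK"


def build_grid(hosts: List[str], loss_by_pair: Dict[Pair, float]) -> List[List[str]]:
    """Rows are sources, columns are destinations.

    '-' marks the diagonal (a host is not tested against itself) and
    '?' marks a pair with no result yet.
    """
    grid = []
    for src in hosts:
        row = []
        for dst in hosts:
            if src == dst:
                row.append("-")
            elif (src, dst) in loss_by_pair:
                row.append(status(loss_by_pair[(src, dst)]))
            else:
                row.append("?")
        grid.append(row)
    return grid
```

A missing result is kept distinct from a good one, since a cell that silently stops reporting is itself something an operator would want to notice.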


perfSONAR how-to in 3 minutes

HOSTS

Choose your home
  • Connect to network

Install Toolkit software
  • By site administrator

Configure hosts
  • Networking
  • 2 interfaces
  • Visibility

Point to central host
  • To consume central mesh configuration

CENTRAL SERVER

Install and configure
  • By mesh administrator
  • Central data storage
  • Dashboard GUI
  • Home for mesh configuration

Configure mesh
  • Who, what and when
  • Every 6 hours (bandwidth)
  • Be careful 10G -> 1G

Publish mesh configuration
  • To be consumed by measurement hosts

Run dashboard
  • Observe thresholds
  • Look for errors
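The "who, what and when" of a mesh also determines how much load it creates. The following back-of-the-envelope sketch estimates how busy each host is with bandwidth tests and lists the pairs where a faster sender tests towards a slower receiver (the 10G -> 1G case). The 30-second test duration, the host names and the assumption that bandwidth tests on a host run one at a time are illustrative assumptions, not values from the slides.

```python
from typing import Dict, List, Tuple


def host_busy_fraction(n_hosts: int, test_seconds: float, repeat_hours: float) -> float:
    """Fraction of time one host spends in bandwidth tests in a full mesh.

    Each host is the source of (n - 1) tests and the destination of
    (n - 1) tests per repeat interval; tests are assumed not to overlap.
    """
    tests_per_host = 2 * (n_hosts - 1)
    return tests_per_host * test_seconds / (repeat_hours * 3600)


def mismatched_pairs(nic_gbps: Dict[str, int]) -> List[Tuple[str, str]]:
    """(source, destination) pairs where the source NIC is faster than the destination NIC."""
    return [(src, dst)
            for src in nic_gbps
            for dst in nic_gbps
            if src != dst and nic_gbps[src] > nic_gbps[dst]]
```

With 10 hosts, 30-second tests and a 6-hour repeat, `host_busy_fraction(10, 30, 6)` gives 0.025, i.e. each host spends about 2.5% of its time in bandwidth tests. Pairs returned by `mismatched_pairs` are candidates for a rate limit or for exclusion, since a 10G sender can overrun a 1G receiver.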


Performance Measurement Platform (PMP): an example managed service

• Consists of a set of low-cost hardware nodes with preinstalled perfSONAR software

• The central components that manage the platform elements and gather, store and present the performance data are operated and maintained by the GÉANT project

• Coupled with GÉANT MPs to create a partial mesh for NRENs
  • Small-node users can shape the predefined setup, configure additional measurements to their needs and get more familiar with the platform
  • Can become an example experimentation and training platform for network measurement, network management and network performance

• Can provide an easy way to set up new perfSONAR small nodes on new small devices (not tied to the hardware) by providing image creation tools and guidelines

• Can serve as an example of a managed service


A Production GÉANT Service

Throughput

Latency / packet loss

IPv6 (and IPv4)


PMP update: current coverage

• +3 in African NRENs

• Some countries have >1

Countries covered (map labels): UK, PT, ES, NL, BY, EE, LT, DE, BE, RO, CY, AT, HU, RS, IT, ME, GE, AM, DK, PL, SI, HR, AL, FR, IE, LU, UA, GR


perfSONAR latest changes (4.2.0 - 4.2.2) and near future

• 4.2.0 contains a number of changes including:
  • New pScheduler Disk-to-Disk test
  • pScheduler Task Priorities
  • pSConfig Web Admin (PWA) RPMs
  • perfSONAR Ansible Roles
  • MaDDash ServiceNow Integration

• 4.2.1, 4.2.2
  • Bug fix releases

• 4.3 (1Q2020) plans
  • Transition to Python 3

• 4.4 (2H2020) plans
  • Improved archiving to non-Esmond sources (e.g. Elasticsearch)
  • Grafana integration support


Get Involved!

• http://www.perfsonar.net/

• http://docs.perfsonar.net/

• http://www.youtube.com/perfSONARProject/

• perfSONAR Consultancy and Expertise service
  • What can we help you with?

[email protected]


Thank you

www.geant.org

Any questions?

[email protected]

© GÉANT Association on behalf of the GN4 Phase 3 project (GN4-3). The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 856726 (GN4-3).


The scientific/academic work is financed from financial resources for science in the years 2019 - 2022 granted for the realization of the international project co-financed by the Polish Ministry of Science and Higher Education.