Network Monitoring, WAN Performance Analysis, & Data Circuit Support at Fermilab
Phil DeMar
US-CMS Tier-3 Meeting
Fermilab
October 23, 2008

Active Wide-Area Network Monitoring

PerfSONAR: distributed network monitoring infrastructure
• Supported by the US-LHC Tier-1 sites and the Internet2 community

PerfSONAR-PS: active monitoring package
• A collection of web services built on trusted monitoring tools:
  - ping, BWCTL (iperf), OWAMP, NPAD, NDT toolkit
• Web service interface for pulling data into other monitoring tools
• Zero configuration; out-of-box deployment
  - Based on a Knoppix Live CD bootable disk
  - Optional software bundle deployment
• Modest hardware requirements for on-site deployment

PerfSONAR Deployment Status

US-ATLAS is moving ahead with perfSONAR-PS at its Tier-1 & Tier-2s:
• Two dedicated systems per site: one each for latency & bandwidth testing
• Systems are spec'ed devices, $628 each (KOI Computing)
• They use Knoppix disks & standard configurations

We've recommended the same model for US-CMS

Current perfSONAR-PS deployment:
• Both US-LHC Tier-1s (FNAL & BNL)
• UNL (CMS), U-Mich (ATLAS), U-Delaware, Internet2, ESnet
• Complete active monitoring matrix of the above
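A complete active monitoring matrix means every deployed node tests against every other node, in both directions. The minimal Python sketch below enumerates that mesh, assuming one latency host and one bandwidth host per site (the model recommended for US-CMS) and using the site names above purely as labels. The function name is hypothetical; this is not the perfSONAR-PS scheduler.

```python
from itertools import permutations

# Sites from the current perfSONAR-PS deployment list (labels only).
SITES = ["FNAL", "BNL", "UNL", "U-Mich", "U-Delaware", "Internet2", "ESnet"]

# Assumption: two dedicated hosts per site, one per test type.
TEST_TYPES = ("latency", "bandwidth")


def monitoring_matrix(sites, test_types=TEST_TYPES):
    """Return every directed (source, destination, test_type) combination.

    One-way latency and per-direction throughput can differ, so each
    ordered pair of distinct sites is tested separately.
    """
    return [
        (src, dst, test)
        for test in test_types
        for src, dst in permutations(sites, 2)
    ]


if __name__ == "__main__":
    matrix = monitoring_matrix(SITES)
    # 7 sites -> 7 * 6 = 42 directed pairs per test type, 84 tests in total.
    print(len(matrix))
    for src, dst, test in matrix[:3]:
        print(f"{test:9s} {src} -> {dst}")
```

The test count grows as n(n-1) per test type, which is one reason dedicated, inexpensive appliances per site make sense as the mesh expands.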

Background information

PerfSONAR-PS project: http://code.google.com/p/perfsonar-ps/

A tour of the perfSONAR-PS services is available: http://code.google.com/p/perfsonar-ps/wiki/CodeTour

Knoppix Live CD bootable disk information: http://code.google.com/p/perfsonar-ps/wiki/NPToolkit

Appliance PCs:
• Vendor: KOI Computing, (630) 627-8811
• Spec: 1U, Intel Pentium Dual-Core E2200, 2.2 GHz
• System cost: $628 each

Performance Analysis Support

In 1999, Matt Mathis coined the term ‘Wizard’s Gap’
• Today, it’s still an issue

Users often don’t know about:
• Common OS tuning issues for WAN data movement
• The wide-area network path, its characteristics, and the available tools

It’s still an end-to-end problem
• And the world is still short on wizards

Our structured analysis methodology seeks to put some of the wizardry into a structured process
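A classic example of an OS tuning issue for WAN data movement is TCP buffer sizing: to keep a long path full, a single stream needs a window of at least one bandwidth-delay product (bandwidth × round-trip time). A minimal Python sketch; the 1 Gb/s and 100 ms figures are illustrative assumptions, not measurements from this talk, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    if bandwidth_bps <= 0 or rtt_s <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    return round(bandwidth_bps * rtt_s / 8)


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Best-case single-stream TCP throughput when the window is the limit."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Illustrative path: 1 Gb/s bottleneck, 100 ms round-trip time.
    link_bps, rtt = 1e9, 0.100
    print(f"buffer needed: {bdp_bytes(link_bps, rtt) / 1e6:.1f} MB")
    # 64 KiB: roughly the largest window without TCP window scaling.
    print(f"64 KiB window: {window_limited_bps(64 * 1024, rtt) / 1e6:.1f} Mb/s")
```

On that illustrative path a single stream needs about 12.5 MB of window, while an untuned 64 KiB window caps it near 5.2 Mb/s, well under 1% of the link. The same host looks fine on the LAN, which is exactly the kind of gap users don't know about.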

Network Application Performance Factors !!!

[Figure: two end systems (applications, operating system, CPU, memory, disks, NIC) connected through LAN, router/switch (R/S), and WAN segments, with numbered points 1-9 and 1'-7' marked along the path]

Factors shown:
• Network delay, bandwidth, packet drop rate
• CPU speed, memory size, system load, disk I/O speed, operating system
• R/W buffer size, disk cache size
• NIC speed

Find the performance problem area(s)
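Network delay and packet drop rate interact strongly. The well-known Mathis et al. (1997) approximation bounds steady-state TCP (Reno-style) throughput by (MSS/RTT)·(C/√p), with C = √(3/2) ≈ 1.22. A minimal Python sketch of that formula; the MSS, RTT, and loss values are illustrative assumptions, and the model is a rough bound, not a prediction for any specific path.

```python
import math

# C = sqrt(3/2) ~= 1.22 (periodic loss, no delayed ACKs).
MATHIS_C = math.sqrt(1.5)


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput bound in bits per second.

    rate <= (MSS / RTT) * (C / sqrt(p))
    """
    if mss_bytes <= 0 or rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need mss > 0, rtt > 0 and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460  # bytes, typical for a 1500-byte Ethernet MTU
    for rtt in (0.010, 0.100):       # regional vs. intercontinental RTT
        for loss in (1e-6, 1e-4):    # packet drop rates
            bps = mathis_throughput_bps(mss, rtt, loss)
            print(f"RTT {rtt * 1000:5.0f} ms  loss {loss:.0e}  -> {bps / 1e6:8.1f} Mb/s")
```

For the same drop rate, going from 10 ms to 100 ms RTT cuts the bound tenfold, which is why a loss rate that is harmless on a campus LAN can cripple a transatlantic transfer.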

Performance Analysis Methodology

Structured approach to performance analysis

Model the process like a medical diagnosis:
• Collect the physical characteristics
• Run diagnostic tests
• Record everything; develop a history of the analysis

Strategic approach: subdivide the problem space:
• Application-related problems
• Host diagnosis and tuning
• Network path analysis

Then divide and conquer
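The medical-diagnosis model (record everything, subdivide the problem space) maps naturally onto a simple case record. A hypothetical Python sketch; the class, field, and site names are illustrative only and are not the structure Fermilab's methodology actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Area(Enum):
    """The three sub-divisions of the problem space."""
    APPLICATION = "application-related problems"
    HOST = "host diagnosis and tuning"
    NETWORK_PATH = "network path analysis"


@dataclass
class Finding:
    area: Area
    test: str          # tool or check used, e.g. "NDT" or "OWAMP"
    observation: str
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DiagnosisCase:
    """Hypothetical case file holding the full history of one analysis."""
    site: str
    symptom: str
    findings: list[Finding] = field(default_factory=list)

    def record(self, area: Area, test: str, observation: str) -> Finding:
        """Append a finding; nothing is ever overwritten."""
        finding = Finding(area, test, observation)
        self.findings.append(finding)
        return finding

    def history(self, area: Area | None = None) -> list[Finding]:
        """Findings in recorded order, optionally restricted to one area."""
        return [f for f in self.findings if area is None or f.area is area]


if __name__ == "__main__":
    case = DiagnosisCase(site="example-tier2", symptom="slow transfers from FNAL")
    case.record(Area.HOST, "NDT", "duplex mismatch suspected")
    case.record(Area.NETWORK_PATH, "OWAMP", "one-way loss observed toward site")
    for f in case.history(Area.HOST):
        print(f.area.value, "|", f.test, "|", f.observation)
```

Keeping findings append-only and tagged by area supports both halves of the approach: the history is preserved, and each sub-problem can be reviewed on its own.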

Network Performance Analysis Architecture

[Figure: two network end systems (NES) joined by an end-to-end path that crosses each site's LAN, border router (BR), and the WAN, with diagnosis servers deployed along the path]

• NES: Network End System
• NESDS: Network End System Diagnosis Server
• NPDS: Network Path Diagnosis Server
• PTDS: Packet Trace Diagnosis Server
• BR: Border Router

Performance Analysis Tools…

Host diagnosis
• Script that pulls the system configuration
• Network Diagnostic Tool (NDT)
  - Faulty network connections & NICs, duplex mismatches

Network path diagnosis
• OWAMP to collect and diagnose one-way network path statistics
  - Packet loss, latency, jitter
• Other tools, such as ping and traceroute, as needed

Packet trace diagnosis
• Port mirror on the border router(s)
• tcpdump to collect packet traces
• tcptrace to analyze packet traces
• xplot for visual examination
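As an illustration of a host-diagnosis script that pulls system configuration, here is a minimal Python sketch for Linux hosts that reads standard TCP tuning parameters from /proc/sys. The parameter list is an assumption chosen for WAN tuning, not the contents of Fermilab's script, and entries absent on a given kernel are reported as missing.

```python
from pathlib import Path

# Linux sysctl entries commonly checked when tuning hosts for WAN transfers.
TCP_SETTINGS = [
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_congestion_control",
]


def read_sysctls(names=TCP_SETTINGS, root="/proc/sys"):
    """Return {sysctl.name: value or None}; None marks an unreadable entry."""
    result = {}
    for name in names:
        key = name.replace("/", ".")
        try:
            # Collapse tabs/newlines, e.g. tcp_rmem "4096\t131072\t..." -> "4096 131072 ..."
            result[key] = " ".join((Path(root) / name).read_text().split())
        except OSError:
            result[key] = None
    return result


if __name__ == "__main__":
    for key, value in read_sysctls().items():
        print(f"{key:35s} {value if value is not None else '(missing)'}")
```

The output is meant to be saved with the case history, so later tuning changes can be compared against the original configuration.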

Network path characteristics collected

• Round-trip time
• Sequence of routers along the path
• One-way delay, delay variance
• One-way packet drop rate
• Packet reordering
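The one-way characteristics listed above can be derived from timestamped probe records of the kind OWAMP collects. A minimal Python sketch, assuming a hypothetical list of (sequence number, send time, receive time or None) tuples from hosts with synchronized clocks; this is not OWAMP's output format or its exact set of statistics.

```python
from statistics import mean, pvariance


def one_way_stats(probes):
    """Summarize one-way probes given as (seq, sent_s, received_s or None).

    Assumes sender and receiver clocks are synchronized (one-way
    measurement depends on this), so received_s - sent_s is the delay.
    """
    if not probes:
        raise ValueError("no probes")
    arrived = [(seq, sent, recv) for seq, sent, recv in probes if recv is not None]
    delays = [recv - sent for _, sent, recv in arrived]

    # Reordering: walk the arrivals in time order and count probes whose
    # sequence number is lower than one that has already arrived.
    reordered, highest = 0, None
    for seq, _, _ in sorted(arrived, key=lambda p: p[2]):
        if highest is not None and seq < highest:
            reordered += 1
        else:
            highest = seq

    return {
        "drop_rate": 1 - len(arrived) / len(probes),
        "mean_delay_s": mean(delays) if delays else None,
        "delay_variance_s2": pvariance(delays) if len(delays) > 1 else None,
        "reordered": reordered,
    }


if __name__ == "__main__":
    # Hypothetical probes: seq 3 overtakes seq 2, and seq 4 is lost.
    probes = [
        (1, 0.000, 0.020),
        (2, 0.010, 0.045),
        (3, 0.020, 0.041),
        (4, 0.030, None),
    ]
    print(one_way_stats(probes))
```

Measuring each direction separately matters because congestion, loss, or reordering is often present on only one leg of a round trip.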

Network Performance Analysis Methodology

Step 1: Define the problem space

Step 2: Collect host information & network path characteristics

Step 3: Host tuning & diagnosis

Step 4: Network path performance analysis
• Route changes frequently?
• Network congestion: is the delay variance large?
• Infrastructure failures: examine the counters one by one
• Packet reordering: load balancing? Parallel processing?

Step 5: Evaluate the packet trace pattern
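Step 5 is normally done with tcpdump, tcptrace, and xplot. To show one thing tcptrace-style analysis looks for, here is a minimal Python sketch that counts retransmitted data segments in one direction of a flow, assuming the trace has already been reduced to hypothetical (time, sequence number, payload length) records. It ignores sequence-number wraparound and is far simpler than tcptrace. With a capture near the sender (e.g. a border-router port mirror on the sending side), a segment starting below the highest byte already seen is a retransmission; near the receiver it could also be reordering.

```python
def count_retransmissions(segments):
    """Count retransmitted data segments in one direction of a TCP flow.

    segments: iterable of (time_s, seq, payload_len) in capture order.
    Returns (retransmitted_segments, data_segments). A segment counts as a
    retransmission when it starts below the highest byte already seen.
    Sequence-number wraparound is ignored in this sketch.
    """
    highest_end = None  # one past the highest byte seen so far
    retransmitted = data = 0
    for _, seq, length in segments:
        if length == 0:
            continue  # pure ACKs / control segments carry no payload
        data += 1
        end = seq + length
        if highest_end is not None and seq < highest_end:
            retransmitted += 1
        if highest_end is None or end > highest_end:
            highest_end = end
    return retransmitted, data


if __name__ == "__main__":
    # Hypothetical sender-side trace: the segment at seq 2448 is resent.
    trace = [
        (0.000, 1000, 1448),
        (0.001, 2448, 1448),
        (0.002, 3896, 1448),
        (0.150, 2448, 1448),
        (0.151, 5344, 1448),
        (0.152, 6792, 0),
    ]
    retx, total = count_retransmissions(trace)
    print(f"{retx}/{total} data segments retransmitted")
```

In an xplot time-sequence graph the same event shows up as a segment drawn below the advancing sequence line, typically after a visible stall.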

Tier-2/Tier-3 sites worked with

• UERJ (Brazil)
• IHEP (China)
• RAL (UK)
• University of Florida
• IFCA (Spain)
• TTU (Texas)
• CIEMAT (Spain)
• Belgium
• OWEA (Austria)
• CSCS (Switzerland)

Performance Analysis Status & Summary

An available service for CMS Tier-2/3 sites
• A work in progress at this point
• Focus is on process as well as results
• Willing to work with others in this area

Future areas of effort:
• Incorporate into a workflow & content management system
• Make use of the perfSONAR monitoring infrastructure

https://plone3.fnal.gov/P0/WAN/netperf/methodology/

How to get hold of us:
• Send email to [email protected]
• Wide Area Work Group videoconference meetings every other Friday

Strategic Direction Toward Circuits

The DOE High Performance Network Planning Workshop established a strategic model to follow:
• High-bandwidth backbones for reliable production IP service
  - ESnet
• Separate high-bandwidth network paths for large-scale science data flows
  - Science Data Network
• Metropolitan Area Networks (MANs) for local access
  - Fermi LightPath is a cornerstone of the Chicago-area MAN

ESnet4: core networks of 50-60 Gbps by 2009-2010 (10 Gb/s circuits)

[Figure: ESnet4 map showing the production IP core and the Science Data Network (SDN) core across US hubs (Seattle, Sunnyvale, LA, San Diego, Boise, Denver, Albuquerque, Kansas City, Tulsa, Houston, Chicago, Cleveland, Atlanta, Jacksonville, Washington DC, New York, Boston), with international connections to CERN (30+ Gbps), USLHCNet, Europe (GEANT), Canada (CANARIE), Asia-Pacific, Australia, GLORIAD (Russia and China), and South America (AMPATH)]

Legend:
• Production IP core (10 Gbps)
• SDN core (20-30-40-50 Gbps)
• MANs (20-60 Gbps) or backbone loops for site access
• International connections

Topology of circuit connections

Circuits utilize the MAN infrastructure:
• 10GE channel(s) reserved for routed IP service (purple)
• LHCOPN circuit (orange) to CERN
• SDN channels for E2E circuits to CMS Tier-2/3s (shades of green)

Circuits are based on end-to-end VLANs
• Direct BGP peering with the remote site

Multiple provider domains are the norm
• Deployed technology varies by the domains involved
• Complexity is higher than for IP service
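With direct BGP peering across an end-to-end VLAN circuit, the remote site's prefixes are learned over the circuit, and traffic to them is typically preferred onto the circuit by longest-prefix match, while everything else stays on routed IP service. A minimal Python sketch of that selection; the prefixes are RFC 5737 documentation addresses and the path names are hypothetical, not FNAL's configuration.

```python
import ipaddress

# Hypothetical table: prefixes learned over direct BGP peerings on E2E
# circuits map to those circuits; everything else uses routed IP service.
# Addresses are RFC 5737 documentation ranges, not real site prefixes.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "routed-ip",
    ipaddress.ip_network("192.0.2.0/24"): "circuit-to-tier2-a",
    ipaddress.ip_network("198.51.100.0/24"): "circuit-to-tier2-b",
}


def select_path(destination, routes=ROUTES):
    """Pick the path for a destination by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route for {destination}")
    return routes[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    for dst in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
        print(dst, "->", select_path(dst))
```

Both ends must make the same choice for return traffic, which is part of why circuits carry more complexity than plain IP service, especially across multiple provider domains.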

FNAL Alternate Path Circuits

Supported since 2004

Serve a wide spectrum of experiments
• CMS Tier-2s are heavy users

Implemented on multiple technologies
• But based on end-to-end layer-2 paths

Usefulness has varied

E2E Circuit Summary

FNAL is currently supporting E2E circuits to the Tier-0 & Tier-2s
• And a few Tier-3s

Today, circuits are largely static configurations

Dynamic circuit services are becoming available
• Driven largely by Internet2 DCN services

Alternate path support services are also emerging
• Lambda Station (FNAL)
• TeraPaths (BNL)

Contact [email protected] for help or information