Detailed diagnosis in enterprise networks
![Page 1: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/1.jpg)
Detailed diagnosis in enterprise networks
Srikanth Kandula, Ratul Mahajan, Patrick Verkaik (UCSD), Sharad Agarwal, Jitu Padhye, Victor Bahl
![Page 2: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/2.jpg)
Network diagnosis
Explaining faulty behavior
ratul | sigcomm | '09
![Page 3: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/3.jpg)
Current landscape of network diagnosis systems
[Figure: network-size axis; large end: big enterprises, large ISPs; small end: small enterprises, marked "?"]
![Page 4: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/4.jpg)
Why study small enterprise networks separately?
Big enterprises, large ISPs
Small enterprises
• Less sophisticated admins
• Less rich connectivity
• Many shared components
IIS, SQL, Exchange, …
![Page 5: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/5.jpg)
Our work
1. Shows that small enterprises need “detailed diagnosis”
• Not enabled by current systems that focus on scale
2. Develops NetMedic for detailed diagnosis
• Diagnoses application faults without application knowledge
![Page 6: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/6.jpg)
Understanding problems in small enterprises
100+ cases
Symptoms, root causes
![Page 7: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/7.jpg)
And the survey says …
Symptom
• App-specific: 60 %
• Failed initialization: 13 %
• Poor performance: 10 %
• Hang or crash: 10 %
• Unreachability: 7 %
Identified cause
• Non-app config (e.g., firewall): 30 %
• Software/driver bug: 21 %
• App config: 19 %
• Overload: 4 %
• Hardware fault: 2 %
• Unknown: 25 %
Detailed diagnosis
• Handle app-specific as well as generic faults
• Identify culprits at a fine granularity
![Page 8: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/8.jpg)
Example problem 1: Server misconfig
Web server
Browser
Browser
Server config
![Page 9: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/9.jpg)
Example problem 2: Buggy client
SQL server
SQL client C2
SQL client C1
Requests
![Page 10: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/10.jpg)
Current formulations sacrifice detail (to scale)
Dependency graph based formulations (e.g., Sherlock [SIGCOMM 2007])
• Model the network as a dependency graph at a coarse level
• Simple dependency model
![Page 11: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/11.jpg)
Example problem 1: Server misconfig
The network model is too coarse in current formulations
![Page 12: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/12.jpg)
Example problem 2: Buggy client
The dependency model is too simple in current formulations
![Page 13: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/13.jpg)
A formulation for detailed diagnosis
Dependency graph of fine-grained components
Component state is a multi-dimensional vector
[Figure: dependency graph with components SQL svr, Exch. svr, IIS svr, IIS config, Process, OS, Config, SQL client C1, SQL client C2; example state variables: % CPU time, IO bytes/sec, Connections/sec, 404 errors/sec]
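The formulation on this slide can be pictured as a small data structure. The following is a minimal sketch, not NetMedic's actual implementation: the class names, the variable names (`cpu_pct`, `errors_404_per_sec`, and so on) and the recorded numbers are all hypothetical, chosen only to mirror the labels in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A fine-grained component (process, OS, config, ...) with a state history."""
    name: str
    variables: list[str]  # names of the state variables (hypothetical names)
    history: list[list[float]] = field(default_factory=list)  # one vector per time bin

    def record(self, values: dict[str, float]) -> None:
        """Append one state vector, ordered like self.variables."""
        self.history.append([values[v] for v in self.variables])


class DependencyGraph:
    """Directed graph; an edge (src, dst) means src can affect dst."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self.edges: set[tuple[str, str]] = set()

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def depends_on(self, dst: str, src: str) -> None:
        """Record that dst depends on src (edge src -> dst)."""
        self.edges.add((src, dst))

    def sources_of(self, dst: str) -> list[str]:
        """Names of the components that can directly affect dst."""
        return sorted(s for s, d in self.edges if d == dst)


# Illustrative graph: an IIS process that depends on its config and on the OS.
graph = DependencyGraph()
graph.add(Component("iis_process", ["cpu_pct", "io_bytes_per_sec",
                                    "connections_per_sec", "errors_404_per_sec"]))
graph.add(Component("iis_config", ["last_modified"]))
graph.add(Component("os", ["cpu_pct"]))
graph.depends_on("iis_process", "iis_config")
graph.depends_on("iis_process", "os")

# One time bin of (made-up) measurements for the IIS process.
graph.components["iis_process"].record({
    "cpu_pct": 12.0,
    "io_bytes_per_sec": 3400.0,
    "connections_per_sec": 5.0,
    "errors_404_per_sec": 0.0,
})
```

The state vector is kept as a plain list of numbers on purpose: nothing in the structure encodes what a variable means, which matches the goal of diagnosing without application knowledge.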
![Page 14: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/14.jpg)
The goal of diagnosis
Identify likely culprits for components of interest
Without using semantics of state variables: no application knowledge
[Figure: Svr, C1, C2; component labels Process, OS, Config]
![Page 15: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/15.jpg)
Using joint historical behavior to estimate impact
[Figure: state histories of D and S over time bins 0 … n; D has variables a, b, c (d0 … dn) and S has variables a, b, c, d (s0 … sn)]
Identify time periods when state of S was “similar”
How “similar” on average states of D are at those times
[Figure: Svr with clients C1 and C2, annotated with request rate (low/high) and response time (high); edges labeled H and L]
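The two steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the estimator from the paper: the distance measure (mean range-scaled absolute difference), the choice of the `k` most similar time bins, and all sample numbers are assumptions made for the example.

```python
def _ranges(history):
    """Historical range (max - min) of each state variable."""
    return [max(col) - min(col) for col in zip(*history)]


def _difference(x, y, ranges):
    """Mean per-variable |x - y|, each scaled to [0, 1] by the variable's range."""
    terms = [abs(a - b) / r if r > 0 else 0.0 for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)


def edge_impact(s_history, d_history, now, k=2):
    """Estimate how well S's state at time bin `now` explains D's state at `now`."""
    s_ranges = _ranges(s_history)
    d_ranges = _ranges(d_history)
    past = [t for t in range(len(s_history)) if t != now]
    # Step 1: past time bins in which S's state was most similar to its state now.
    similar = sorted(
        past, key=lambda t: _difference(s_history[t], s_history[now], s_ranges)
    )[:k]
    # Step 2: how similar, on average, D's state in those bins is to D's state now.
    avg_diff = sum(
        _difference(d_history[t], d_history[now], d_ranges) for t in similar
    ) / len(similar)
    return 1.0 - avg_diff


# Made-up histories, one single-variable state vector per time bin.
c1_request_rate = [[5.0], [40.0], [6.0], [38.0], [9.0], [6.0]]       # low now
c2_request_rate = [[10.0], [12.0], [90.0], [11.0], [95.0], [92.0]]   # high now
svr_response_time = [[20.0], [22.0], [200.0], [21.0], [210.0], [205.0]]  # high now

now = 5
impact_c1 = edge_impact(c1_request_rate, svr_response_time, now)
impact_c2 = edge_impact(c2_request_rate, svr_response_time, now)
```

In this toy data the server was slow whenever C2's request rate was high, so C2 gets a high impact. When C1's request rate was low in the past, the server was sometimes fast and sometimes slow, so C1's impact comes out lower.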
![Page 16: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/16.jpg)
Robust implementation of impact estimation
• Ignore state variables that represent redundant info
• Place higher weight on state variables likely related to faults being diagnosed
• Ignore state variables irrelevant to interaction with neighbor
• Account for aggregate relationships among state variables of neighboring components
• Account for disparate ranges of state variables
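Two of the points above, disparate ranges and redundant variables, lend themselves to a short illustration. The sketch below is one plausible way to handle them and is not taken from the paper: min-max scaling, the Pearson-correlation test, the `0.99` threshold, and the variable names are all assumptions.

```python
def scale_to_unit_range(history):
    """Rescale each variable (column) to [0, 1] using its min and max over the history."""
    cols = list(zip(*history))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return [
        [(v - l) / s if s > 0 else 0.0 for v, l, s in zip(row, lo, span)]
        for row in history
    ]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def drop_redundant(names, history, threshold=0.99):
    """Keep a variable only if it is not near-perfectly correlated with one already kept."""
    cols = list(zip(*history))
    kept = []
    for i in range(len(names)):
        if all(abs(pearson(cols[i], cols[j])) < threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


# Hypothetical variables: the second is the first in different units.
names = ["bytes_per_sec", "kbytes_per_sec", "cpu_pct"]
history = [
    [1000.0, 1.0, 10.0],
    [2000.0, 2.0, 50.0],
    [4000.0, 4.0, 20.0],
    [3000.0, 3.0, 40.0],
]

scaled = scale_to_unit_range(history)
kept_names = drop_redundant(names, history)
```

Scaling keeps a variable measured in thousands from drowning out one measured in percent, and dropping near-duplicates keeps the same signal from being counted twice when states are compared.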
![Page 17: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/17.jpg)
Implementation of NetMedic
Monitor components
Component states
Inputs: target components, diagnosis time, reference time
Diagnose
a. edge impact
b. path impact
Ranked list of likely culprits
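The last two stages, path impact and ranking, can be sketched once edge impacts are available. The scoring rule used here (geometric mean of the edge impacts along a path, taking the best acyclic path to the target) is an assumption made for illustration; the paper defines the exact rule. Component names and edge values are made up.

```python
import math


def path_impacts(edge_impact, target):
    """For every component that can reach `target`, the best path score to it.

    edge_impact maps (src, dst) -> impact in [0, 1]. A path's score is assumed
    to be the geometric mean of its edge impacts.
    """
    incoming = {}
    for (src, dst), w in edge_impact.items():
        incoming.setdefault(dst, []).append((src, w))

    best = {}

    def walk(node, weights, visited):
        # Extend the current path backwards along every edge that enters `node`.
        for src, w in incoming.get(node, []):
            if src in visited:
                continue  # keep paths acyclic
            path = weights + [w]
            score = math.prod(path) ** (1.0 / len(path))
            if score > best.get(src, 0.0):
                best[src] = score
            walk(src, path, visited | {src})

    walk(target, [], {target})
    return best


def rank_culprits(edge_impact, target):
    """Components ordered from most to least likely culprit for `target`."""
    scores = path_impacts(edge_impact, target)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical edge impacts around a SQL server.
edges = {
    ("c2", "sql_server"): 0.9,
    ("c1", "sql_server"): 0.3,
    ("server_os", "sql_server"): 0.5,
    ("server_config", "server_os"): 0.2,
}
ranking = rank_culprits(edges, "sql_server")
```

The search enumerates every acyclic path, which is fine for a toy graph but would need pruning on a graph with around a thousand components.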
![Page 18: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/18.jpg)
Evaluation setup
IIS, SQL, Exchange, …
10 actively used desktops
Diverse set of faults observed in the logs
#components: ~1000
#dimensions per component (avg): 35
![Page 19: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/19.jpg)
NetMedic assigns low ranks to actual culprits
[Figure: cumulative % of faults (y-axis, 0 to 100) vs. rank of actual culprit (x-axis, 0 to 100), for NetMedic and Coarse]
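The curve in this plot is a cumulative distribution over the rank that a diagnosis assigns to the true culprit. A minimal sketch of how such a curve is computed follows; the `ranks` list is made up for illustration and is not data from the evaluation.

```python
def cumulative_percent(ranks, max_rank):
    """For each rank r in 1..max_rank, the % of faults whose culprit ranked <= r."""
    total = len(ranks)
    return [
        100.0 * sum(1 for x in ranks if x <= r) / total
        for r in range(1, max_rank + 1)
    ]


# Hypothetical ranks of the actual culprit across four injected faults.
ranks = [1, 1, 2, 5]
curve = cumulative_percent(ranks, 5)
# curve == [50.0, 75.0, 75.0, 75.0, 100.0]
```

A curve that rises quickly at small ranks means the true culprit usually appears near the top of the list, which is the sense in which low ranks are good here.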
![Page 20: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/20.jpg)
NetMedic handles concurrent faults well
2 simultaneous faults
[Figure: cumulative % of faults (y-axis, 0 to 100) vs. rank of actual culprit (x-axis, 0 to 100), for NetMedic and Coarse]
![Page 21: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/21.jpg)
Other results in the paper
NetMedic needs a modest amount (~60 mins) of history
It compares favorably with a method that understands variable semantics
![Page 22: Detailed diagnosis in enterprise networks](https://reader034.fdocuments.net/reader034/viewer/2022051700/56815f57550346895dce37dd/html5/thumbnails/22.jpg)
Conclusions
NetMedic enables detailed diagnosis in enterprise networks w/o application knowledge
Think small: Small enterprise networks deserve more attention