1 AI Approaches to Network Fault Management Andrew Learn 29 Nov 2001.
AI Approaches to Network Fault Management
Andrew Learn
29 Nov 2001
Outline
• Fault Management Process
• AI Approaches
  – Expert Systems
  – Neural Networks
  – Case-based Reasoning
Network Faults
• Hardware
  – Wear and tear
  – Cut cables
  – Improper installation
• Software
  – Incorrect design
  – Bugs
  – Incorrect data (e.g. routing tables)
Fault Management Process
1. Collect alarms
2. Filter and correlate alarms
3. Diagnose faults
4. Restoration and repair
5. Evaluate effectiveness
1. Collect Alarms
• Types of alarms
  – Physical: Failure in communication
    • e.g. loss of signal, CRC failure
  – Logical: Statistical values exceed threshold
    • e.g. number of packets dropped
• Communication with components
  – Control protocol: Simple Network Management Protocol (SNMP)
  – Data format: Management Information Base (MIB-II, 1990) has ~170 manageable objects
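A logical alarm of the kind described above can be sketched as a simple threshold check. This is a minimal illustration only: the `Alarm` record, the function name, and the threshold value are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    source: str
    kind: str      # "physical" or "logical"
    message: str


def check_dropped_packets(source: str, dropped: int, threshold: int) -> Optional[Alarm]:
    """Return a logical alarm when the drop counter exceeds the threshold."""
    if dropped > threshold:
        return Alarm(source, "logical",
                     f"dropped packets {dropped} > threshold {threshold}")
    return None  # counter is within limits, so no alarm is raised


# Hypothetical readings from a device called "router-a".
print(check_dropped_packets("router-a", dropped=120, threshold=100))
print(check_dropped_packets("router-a", dropped=40, threshold=100))
```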
• Sample MIB Entry

    ipInReceives OBJECT-TYPE
        SYNTAX Counter
        ACCESS read-only
        STATUS mandatory
        DESCRIPTION
            "The total number of input datagrams received from interfaces, including those received in error."
        ::= { ip 3 }

• Sample SNMP “get” call

    snmpget netdev-kbox.cc.cmu.edu public system.sysUpTime.0
    Name: system.sysUpTime.0 Timeticks: (2270351) 6:18:23
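The Timeticks value in the sample reply counts hundredths of a second, which is how 2270351 becomes 6:18:23. A short sketch of that conversion follows; the function name is illustrative and not part of any SNMP library.

```python
def timeticks_to_hms(ticks: int) -> str:
    """Convert SNMP TimeTicks (hundredths of a second) to H:MM:SS."""
    total_seconds = ticks // 100          # drop the fractional second
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(timeticks_to_hms(2270351))  # 6:18:23
```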
2. Filter and Correlate Alarms
• Filter
  – Eliminate redundant alarms
  – Suppress noncritical alarms
  – Inhibit low-priority alarms in presence of high-priority alarms
• Correlate
  – Analyze and interpret multiple alarms to assign new meaning (derived alarm)
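The filter and correlate steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the `Alarm` fields, the 1 to 3 priority scale, inhibiting per source, and the single correlation rule that turns two loss-of-signal alarms into one derived alarm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str
    code: str
    priority: int  # hypothetical scale: 1 = critical, 2 = major, 3 = minor


def filter_alarms(alarms, max_priority=2):
    """Apply the three filter steps to a list of alarms."""
    # 1. Eliminate redundant alarms (exact duplicates), keeping first-seen order.
    unique = list(dict.fromkeys(alarms))
    # 2. Suppress noncritical alarms: drop anything less urgent than max_priority.
    kept = [a for a in unique if a.priority <= max_priority]
    # 3. Inhibit alarms from a source that also raised a more urgent alarm.
    best = {}
    for a in kept:
        best[a.source] = min(best.get(a.source, a.priority), a.priority)
    return [a for a in kept if a.priority == best[a.source]]


def correlate(alarms):
    """Derive one new alarm when both base stations report loss of signal."""
    sources = {a.source for a in alarms if a.code == "LOSS_OF_SIGNAL"}
    if {"BS1", "BS2"} <= sources:
        return Alarm("BSC", "SHARED_LINK_SUSPECTED", 1)
    return None


raw = [
    Alarm("BS1", "LOSS_OF_SIGNAL", 1),
    Alarm("BS1", "LOSS_OF_SIGNAL", 1),  # redundant duplicate
    Alarm("BS1", "CRC_FAILURE", 2),     # inhibited by the priority-1 alarm
    Alarm("BS2", "LOSS_OF_SIGNAL", 1),
    Alarm("BS2", "FAN_WARNING", 3),     # suppressed as noncritical
]
filtered = filter_alarms(raw)
print(filtered)
print(correlate(filtered))
```

Of the five raw alarms, only the two loss-of-signal alarms survive filtering, and the correlation rule then replaces them with a single derived alarm for the operator.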
3. Diagnose Faults
• May require additional tests/diagnostics on circuits or components
  – Automated or manual
• Analyze all info from alarms, tests, performance monitoring
• Identify smallest system module that needs to be repaired or replaced
4. Restoration and Repair
• Restoration: Continue service in presence of fault
  – Switch over to spares
  – Reroute around trouble spot
  – Restore software or data from backup
• Repair
  – Replace parts
  – Repair cables
  – Debug software
• Retest to verify fault is eliminated
5. Evaluate Effectiveness
• Questions to answer:
  – How often do faults occur?
  – How many faults affect service?
  – How long is service interrupted?
  – How long to repair?
• Provides assessment of:
  – Performance of fault management system
  – Reliability of equipment
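The four questions above map onto simple statistics over a fault log. The sketch below assumes a hypothetical `FaultRecord` with two fields; the metric names and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaultRecord:
    outage_hours: float  # time service was interrupted (0 if unaffected)
    repair_hours: float  # time from detection to verified repair


def evaluate(records, period_days):
    """Answer the four evaluation questions for one reporting period."""
    n = len(records)
    affecting = [r for r in records if r.outage_hours > 0]
    return {
        # How often do faults occur?
        "faults_per_day": n / period_days,
        # How many faults affect service?
        "service_affecting_fraction": len(affecting) / n if n else 0.0,
        # How long is service interrupted?
        "mean_outage_hours": (sum(r.outage_hours for r in affecting) / len(affecting)
                              if affecting else 0.0),
        # How long to repair?
        "mean_time_to_repair_hours": (sum(r.repair_hours for r in records) / n
                                      if n else 0.0),
    }


# Hypothetical 30-day log with three faults, two of which interrupted service.
log = [FaultRecord(0.0, 2.0), FaultRecord(1.5, 4.0), FaultRecord(0.5, 3.0)]
metrics = evaluate(log, period_days=30)
print(metrics)
```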
AI Approaches to Fault Management
• Well-developed approach:
  – Expert systems
• New approaches:
  – Neural networks
  – Case-based reasoning
  – Other
Why AI?
• Need for intelligence
  – Data analysis
  – Pattern recognition
  – Clustering and categorization
  – Problem solving
• Need for automation
  – Manual analysis/solution takes time
  – Limited manpower
  – Limited expertise
Well-developed approach: Expert Systems
• Expert systems = Rule-base + Working Memory
• Three parts to rules:
  1. Context trigger (when should rule be considered)
  2. Condition (if X ...)
  3. Conclusion (... then Y)
• Used since the 1980s by major telecom companies
  – Bell: Automated Cable Expertise (ACE) system
  – GTE: Central Office Maintenance Printout Analysis & Suggestion System (COMPASS)
  – AT&T: Network Management Expert System (NEMESYS)
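The rule-base plus working-memory structure can be sketched with a tiny forward-chaining loop. This is a generic illustration under stated assumptions, not how ACE, COMPASS, or NEMESYS were built: the `Rule` fields mirror the three parts listed above, and the rule names, memory keys, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Memory = Dict[str, object]  # working memory: facts known so far


@dataclass
class Rule:
    name: str
    context: Callable[[Memory], bool]     # when should the rule be considered
    condition: Callable[[Memory], bool]   # if X ...
    conclusion: Callable[[Memory], None]  # ... then Y (writes to working memory)


def run(rules, memory: Memory) -> Memory:
    """Forward-chain: fire each applicable rule once until nothing new fires."""
    fired = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.name in fired:
                continue
            if rule.context(memory) and rule.condition(memory):
                rule.conclusion(memory)
                fired.add(rule.name)
                changed = True
    return memory


rules = [
    Rule(
        name="high-crc-implies-bad-link",
        context=lambda m: m.get("alarm_type") == "physical",
        condition=lambda m: m.get("crc_errors", 0) > 100,
        conclusion=lambda m: m.update(suspect="link"),
    ),
    Rule(
        name="bad-link-implies-cable-test",
        context=lambda m: "suspect" in m,
        condition=lambda m: m["suspect"] == "link",
        conclusion=lambda m: m.update(action="test cable"),
    ),
]

memory = run(rules, {"alarm_type": "physical", "crc_errors": 250})
print(memory)
```

The second rule only becomes applicable after the first has written `suspect` into working memory, which is the chaining behavior the rule-base depends on.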
Need for New Approaches
• Weaknesses of expert systems
  – Brittle in unforeseen situations
  – Cannot learn from experience
  – Hard to maintain (adding/deleting/modifying rules)
  – Knowledge acquisition bottleneck
  – Can’t handle incomplete or probabilistic data
• Factors driving new approach
  – Rapidly changing technology
  – Dynamic network topology
  – Network complexity
  – Competition, demand for QoS
Neural Nets
• Structure: input, hidden, output layers
• Training
  – Supervised: Input pattern & desired output
  – Unsupervised: Clustering of similar inputs
[Figure: input, hidden, and output layers connected by weights]
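The layered structure can be made concrete with a forward pass through one hidden layer. This is a minimal sketch: the sigmoid activation, the layer sizes, and all weight values are arbitrary assumptions chosen for illustration, and no training is shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(weighted sum of inputs + bias) per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)


# Arbitrary illustrative weights: 3 inputs -> 2 hidden units -> 1 output.
w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.8, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [[1.2, -0.7]]
b_out = [0.05]

output = forward([1.0, 0.0, 1.0], w_hidden, b_hidden, w_out, b_out)
print(output)
```

In supervised training, the weights would be adjusted so that the output for each input pattern moves toward its desired output.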
Neural Nets
• Advantages
  – Pattern matching & generalization
  – Fast & efficient
  – Trainable
  – Handles incomplete, ambiguous data
• Disadvantages
  – Black box
  – Lack of training data
Neural Net Example
• Example: Alarm correlation in cell phone networks (Univ of Hannover, Germany)
[Figure: cell phone network with mobile units, base stations (BS1, BS2), microwave links, a base station controller (BSC), switching centers, and a maintenance center (MC)]
Neural Net Example
[Figure: neural network with BS-1, BS-2, and BSC alarms as inputs and the initial cause (ML-1 fault, ML-2 fault, ...) as output]
• Test Results:
  – 94 alarms
  – 99.76% correct classification with up to 25% noise
Case-Based Reasoning
• Case-based reasoning = matching previous examples
  – Case library: Set of previous faults, diagnoses, solutions
  – Usually based on “trouble ticket” help-desk databases
• Design considerations:
  – What are key attributes of a case?
  – What attributes will be used to index & access a case?
Case-Based Reasoning
• Advantages
  – Easier knowledge acquisition than expert systems
  – Can learn by adding new cases
  – Doesn’t require extensive maintenance
• Disadvantages
  – Requires time-consuming user interaction
  – No help for first-time problems
Case-Based Reasoning Example
Case 134
Problem Type: Performance
Description: High error rate in comm between POA-SP & DF
No access: Intermittent
Retrieval: Case 103 [Similarity = 0.69]
Description: 64kb line from VendorX drops big datagrams.
Additional Info requested: Is there loss of big datagrams in ping test? (Result: Yes)
Cause: Link 34 inside Bldg 207 was defective
Solution: Vendor replaced cabling.
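Retrieval by similarity can be sketched as a weighted match over indexed attributes. The slides do not say how the 0.69 score was computed, so the measure below is only one plausible choice; the attribute names, weights, and library entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: int
    attributes: dict
    cause: str = ""
    solution: str = ""


# Hypothetical integer weights for the indexed attributes.
WEIGHTS = {"problem_type": 5, "no_access": 2, "link_speed": 3}


def similarity(query: dict, stored: dict) -> float:
    """Weighted fraction of indexed attributes on which the two cases agree."""
    total = sum(WEIGHTS.values())
    matched = sum(w for key, w in WEIGHTS.items()
                  if key in query and query[key] == stored.get(key))
    return matched / total


def retrieve(library, query: dict):
    """Return (best_case, score) for the most similar stored case."""
    return max(((c, similarity(query, c.attributes)) for c in library),
               key=lambda pair: pair[1])


library = [
    Case(1, {"problem_type": "performance", "no_access": "intermittent",
             "link_speed": "64kb"},
         cause="defective link", solution="replace cabling"),
    Case(2, {"problem_type": "connectivity", "no_access": "permanent",
             "link_speed": "T1"},
         cause="incorrect routing data", solution="restore routing data"),
]

# The new problem does not yet know the link speed, so that attribute is unmatched.
query = {"problem_type": "performance", "no_access": "intermittent"}
best, score = retrieve(library, query)
print(best.case_id, score, best.solution)
```

A partial match like this is where the system would ask the user for additional information, as in the ping-test question above, before accepting the retrieved solution.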
Summary of 3 AI Methods
• Expert systems
  – If / then rules
  – Well-developed technology
  – Brittle, hard to maintain
• Neural networks
  – Output = weighted transform of inputs
  – Fast pattern matching, robust to noise
  – Black box, lack of training data
• Case-based systems
  – Trouble-ticket retrieval
  – Easy to build, maintain
  – Slower diagnosis, takes time to build
Other Approaches
• Bayesian networks
  – Model statistical probabilities and dependence of faults
• Mobile intelligent agents
  – Independent software agents cooperate to collect info, suggest solutions
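The simplest Bayesian network has two nodes, a fault and an alarm that depends on it, and reduces to Bayes' rule. The sketch below shows that single step only; the probabilities are hypothetical, and a real network would chain many such dependencies.

```python
def posterior(prior: float, p_alarm_given_fault: float, p_alarm_given_ok: float) -> float:
    """P(fault | alarm) by Bayes' rule for a single fault/alarm pair."""
    evidence = p_alarm_given_fault * prior + p_alarm_given_ok * (1.0 - prior)
    return p_alarm_given_fault * prior / evidence


# Hypothetical numbers: 1% of links are faulty, a faulty link raises the alarm
# 90% of the time, and a healthy link raises it 5% of the time.
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154
```

Even with a fairly reliable alarm, the low prior keeps the posterior modest, which is why combining evidence from several alarms matters.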
Future Trends
• Proactive fault detection
  – Recognizing trouble signs and taking corrective action before service degrades
• Hybrid systems
  – Multiple AI methods integrated