NM functions Configuration, Performance, Fault, Accounting, Security.
Date posted: 20-Dec-2015
Configuration Management
• Middle and long range activities for
  - Controlling physical, electrical and logical inventories
  - Maintaining vendor files and trouble tickets
  - Supporting provisioning and order processing
  - Defining and supervising service level agreements
  - Managing changes
  - Distributing software
• Configuration management is central to all other network management functions
  - All other management functions are supported by configuration details
  - Enhances control over configuring the network and devices
  - Quick access to vital configuration data
  - Helps initialization, maintenance and shutdown of individual components and logical subsystems
Primary Information
• Actual configuration
• Attributes of network elements
• Generated configuration
• Status indicators of network elements
• Vendor data
• Change requests and records
• Order data
• Actual inventory
• Status of service-level indicators
Secondary Information
• Traffic volumes
• More details on indicators
• Performance indicators of the network elements
• etc.
Configuration management functions
• Inventory management
• Network topology services
• Service Level agreements
• Designing, implementing and processing trouble tickets
• Order processing and provisioning
• Change Management
Inventory management
• Automated inventory – online record of currently implemented components and spares, vendor contacts, location of components, maintenance requirements for certain equipment classes, and service statistics such as
  - Number of outages
  - Response time for repair
  - Repair time distribution
Good Inventory Management
• Less redundancy
  - If the same information is stored in different databases, resources and processing time are wasted backing up those databases
• Synchronized change management
• Unique names and addresses
  - Helps during troubleshooting
• Efficient troubleshooting
• Better capacity and contingency planning
Network Topology Services
• Requires current and historical configurations
• Layered configuration displays at network and component level of
  - Electrical layouts
  - Physical layouts
  - Logical layouts
Display of configuration details
[Figure: layered configuration display. A network backbone of T1 and T3 links; clicking a node icon opens the network details, down to the protocol level (PHY, DLC).]
Auto Discovery tool
• An auto-discovery tool can discover devices on the network (periodically)
• Auto mapping produces the network map
• Both consume network bandwidth while they run
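
A minimal sketch of periodic auto-discovery and map updating. It assumes a caller-supplied `probe` function (hypothetical; in practice this might be an ICMP echo or an SNMP query) that returns True when an address answers. Every probed address costs at least one packet, which is the bandwidth overhead noted above.

```python
import ipaddress
from typing import Callable, List, Set


def discover(network: str, probe: Callable[[str], bool]) -> List[str]:
    """Probe every host address in `network` and return those that answer."""
    hosts = ipaddress.ip_network(network).hosts()
    return [str(addr) for addr in hosts if probe(str(addr))]


def map_changes(previous: Set[str], current: Set[str]) -> dict:
    """Compare two discovery runs so the network map can be updated."""
    return {
        "appeared": sorted(current - previous),
        "vanished": sorted(previous - current),
    }


# Illustrative run on a documentation-only address range, with a stand-in
# probe that pretends only 192.0.2.1 is alive (made-up data).
found = discover("192.0.2.0/30", lambda addr: addr == "192.0.2.1")
changes = map_changes({"192.0.2.2"}, set(found))
```

Running `discover` less often, or on smaller address ranges, trades map freshness for lower probe traffic.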
SLA
• Need to evaluate long-term service levels
• Consistency in customer service level
• Increased planning and decreased crisis management
• Service levels
  - Responsiveness, accuracy, availability
• Performance reporting
  - Planned and actual workload characteristics and service levels during the report period
Trouble tickets
• Linking trouble tickets
• Information in a trouble ticket
  - Time reported
  - Time received by responsible group
  - Time network service restored
  - Time vendor notified
  - Time vendor responded
  - Time vendor restored service
  - Total vendor time
  - Total user non-availability
  - Total service outage
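
The totals on a trouble ticket can be derived from its timestamps. The sketch below is one possible reading, not a definition taken from the slides: it assumes total vendor time runs from vendor notification to vendor restoration, and total service outage from the first report to service restoration. Field names and sample times are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TroubleTicket:
    """Timestamps recorded on one ticket (field names are hypothetical)."""
    reported: datetime
    received_by_group: datetime
    vendor_notified: datetime
    vendor_responded: datetime
    vendor_restored: datetime
    service_restored: datetime

    def total_vendor_time(self) -> timedelta:
        # Assumption: vendor time spans notification to vendor restoration.
        return self.vendor_restored - self.vendor_notified

    def total_service_outage(self) -> timedelta:
        # Assumption: the outage spans first report to service restoration.
        return self.service_restored - self.reported


# Made-up sample ticket.
ticket = TroubleTicket(
    reported=datetime(2015, 12, 20, 9, 0),
    received_by_group=datetime(2015, 12, 20, 9, 10),
    vendor_notified=datetime(2015, 12, 20, 9, 30),
    vendor_responded=datetime(2015, 12, 20, 10, 0),
    vendor_restored=datetime(2015, 12, 20, 11, 30),
    service_restored=datetime(2015, 12, 20, 11, 45),
)
```

Total user non-availability is left out because it depends on how many users were affected, which these timestamps alone do not capture.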
Change Management
[Figure: change management flow with the steps User request, Study impact, Plan change, Schedule, Request OK, Execute and Document, backed by the configuration and inventory database.]
Tools for configuration management
• Simple tools
  - Provide simple storage for all network related information
  - Data is collected and entered manually
• Complex tools
  - Automatically gather data – latest information on configuration
  - Compare current configuration with stored configuration
  - Change a device's configuration while running
  - Specify configuration errors that should generate warning messages
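
To illustrate the "compare current configuration with stored configuration" function of a complex tool, here is a minimal sketch. It assumes a configuration is a flat dictionary of setting names to values, and that the operator supplies a set of `watched` settings whose drift should raise a warning; both are assumptions made for the example.

```python
def diff_config(stored: dict, current: dict) -> dict:
    """Return settings that were added, removed or changed on the device."""
    added = {k: current[k] for k in current.keys() - stored.keys()}
    removed = {k: stored[k] for k in stored.keys() - current.keys()}
    changed = {
        k: (stored[k], current[k])
        for k in stored.keys() & current.keys()
        if stored[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def warnings_for(diff: dict, watched: set) -> list:
    """List watched settings that drifted and should generate a warning."""
    touched = set(diff["added"]) | set(diff["removed"]) | set(diff["changed"])
    return sorted(touched & watched)


# Made-up example: the stored record and the configuration read from a device.
stored = {"speed": "T1", "mtu": 1500, "snmp_community": "ops"}
current = {"speed": "T3", "mtu": 1500, "telnet": "on"}
drift = diff_config(stored, current)
alerts = warnings_for(drift, watched={"speed", "telnet"})
```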
Performance Management
• Activities required to continuously evaluate principal performance indicators to check
  - Service level maintenance
  - Identify potential bottlenecks
  - Establish trend reports
  - Network utilization and error rates
Contd..
• Involves
  - Collecting data on current utilization of network devices and links
  - Analyzing data to discern high utilization trends
  - Setting utilization thresholds
  - Using off-line simulation and/or analytical studies on how to maximize performance
Primary Information
• Actual configuration
• Generated configuration
• Performance indicators in real time or near real time
  - Response time
  - Congested channels
  - Resource utilization
• Selected vendor data
• Performance histories for selected facilities
• Operational procedures
Performance Indicators
• Availability
• Response time
• Throughput
• Utilization – channel occupancy
• Grade of service
• Transmission volumes
• Offered load
• Accuracy
Indicators
• Service oriented indicators
  - Have priority
• Efficiency oriented indicators
Service Oriented Indicators
• Availability
  - From the customer's perspective, depends on the technical reliability of components
  - Redundancy?
• Cost benefit
  - Total costs = costs of redundancy + cost of consequences
Availability
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTD} + \text{MTTR} + \text{MTOR}}$$
• MTBF – Mean time between failures
• MTTD – Mean time to diagnose
• MTTR – Mean time to repair (or report)
• MTOR – Mean time of repair
• For better availability, keep MTTD, MTTR and MTOR low
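
The availability ratio above can be evaluated directly. This is a small sketch; the function name and the sample figures are illustrative, and all four inputs are assumed to be in the same time unit.

```python
def availability(mtbf: float, mttd: float, mttr: float, mtor: float) -> float:
    """Availability = MTBF / (MTBF + MTTD + MTTR + MTOR)."""
    if mtbf <= 0 or min(mttd, mttr, mtor) < 0:
        raise ValueError("MTBF must be positive and the other times non-negative")
    return mtbf / (mtbf + mttd + mttr + mtor)


# Made-up figures in hours: 990 h between failures, 10 h lost per failure.
baseline = availability(mtbf=990, mttd=2, mttr=3, mtor=5)

# Halving diagnosis, repair (or report) and restoration times raises availability.
improved = availability(mtbf=990, mttd=1, mttr=1.5, mtor=2.5)
```

Because MTBF appears in both numerator and denominator, shrinking the three downtime terms is the only lever once component reliability is fixed.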
Response Time
• Propagation delays, processing delays, transmission delays, protocol delays
[Figure: response time timeline between user and system, labelled with think time, enter time, network and system response time, output response time, and the overall end user response time.]
Contd..
• Total response time
• Network delays
• Processing delays
• Protocol delays – time outs
• Response time considerations depend on
  - Protocols and their behavior
  - Job priorities
  - Loads in the system
Accuracy
• Accuracy can be affected by
  - Erroneous transmission (wireless & fiber)
  - Characters transmitted but not delivered
  - Characters received which were not sent
  - Characters duplicated
Residual Error Rate
$$\text{Residual error rate} = \frac{\text{CHE} + \text{CHV} + \text{CHN} + \text{CHD}}{\text{CHT}}$$
• CHE = erroneous characters due to media & processing
• CHV = transmitted but not received
• CHN = extra characters received
• CHD = duplicated characters
• CHT = total characters
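
As a worked instance of the residual error rate, the sketch below sums the four error counts and divides by the total number of characters. The function name and the sample counts are made up for illustration.

```python
def residual_error_rate(che: int, chv: int, chn: int, chd: int, cht: int) -> float:
    """(CHE + CHV + CHN + CHD) / CHT, using the definitions listed above."""
    if cht <= 0:
        raise ValueError("CHT (total characters) must be positive")
    return (che + chv + chn + chd) / cht


# Made-up counts: 3 erroneous, 1 not received, 1 extra, 0 duplicated
# characters out of 10,000 in total.
rate = residual_error_rate(che=3, chv=1, chn=1, chd=0, cht=10_000)
```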
Efficiency oriented indicators
• Efficiency oriented indicators represent the interest of the organization
• Do service oriented monitoring and efficiency oriented monitoring conflict?
Efficiency vs service
[Figure: service levels L1, L2 and L3 plotted against efficiency (30%, 40%, 70%) for CPU busy, channel busy and line busy.]
Throughput
• Measure of a server's capacity – MIPS
• Line throughput – kilobits/sec
• Application oriented
  - Number of transactions per unit time
  - Number of customer sessions per application
  - Number of calls serviced
  - Number of jobs provided by a node
Utilization
• Dynamic measure of resources used
• Puts a practical limit on the throughput under operational conditions
• Helps study overlap among component processing, mutual waits etc.
Utilization
• Utilization vs accuracy
• Utilization vs throughput
• Utilization vs goodput
[Figure: two plots against time in seconds, one of link utilization and one of errors per second.]
Overlap effects
[Figure: input subsystem, CPU, output subsystem and output link, with the annotation "Slow link?".]
Availability
• Availability of a system depends on the availability of individual components (very difficult to measure and report on availability)
  - Check on each component and compare with configuration
  - Depends on how components are connected
Example
• Each Component availability = 0.98
• Availability of the serial combination is 0.98 * 0.98 = 0.96
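Example: 2 modems, serial processing of data
[Figure: Configuration 1, two components A connected in series.]
• Probability that one link is not available = 0.02
• Probability that both links are not available is 0.02 * 0.02 = 0.0004
• Availability = 1 - 0.0004 = 0.9996
[Figure: Configuration 2, two components A side by side, so that service is lost only when both fail.]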
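
The two configurations can be checked numerically. This sketch assumes component failures are independent, which is what multiplying the probabilities in the example implies; the function names are illustrative.

```python
from math import prod
from typing import Iterable


def serial_availability(components: Iterable[float]) -> float:
    """All components must work: multiply the individual availabilities."""
    return prod(components)


def parallel_availability(components: Iterable[float]) -> float:
    """Service is lost only if every redundant component fails at once."""
    return 1 - prod(1 - a for a in components)


config_1 = serial_availability([0.98, 0.98])    # components in series
config_2 = parallel_availability([0.98, 0.98])  # redundant components
```

With the figures from the example, the series arrangement gives 0.9604 (rounded to 0.96 above) and the redundant arrangement gives 0.9996.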
Performance measurements
• Data gathering
  - Exhaustive
  - Statistical
• Distribution for sampling times
• Correlation effects
• Performance analysis
  - Data presentation
  - Interpretation
Contd..
• Historical trends
• Real time trends
• Graphical presentation and comparison
• Linking different performance indicators
  - Then set thresholds
Simulation studies
• To improve the performance or identify bottlenecks – model the network and components (primary)
  - Study effects of changes in the model
  - Target optimal performance
  - Requires synthetic traffic generation
• Analytical and simulation tools
Simple tools for PM
• Provide real-time information on network components
  - Graphical – bars, histograms
• Can help find bottlenecks
• Main information
  - Processor utilization
  - Memory utilization
  - Link – pkts/sec, bits/sec
  - Bit error rates
Complex Tools
• Set thresholds
• Take action once thresholds are exceeded
  - Alarm
  - Enable backup
• Near threshold warning
• Store historical data
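
A minimal sketch of the threshold behaviour listed above: raise an alarm when a threshold is exceeded, warn when a reading comes near it, and keep a history of readings. The class name, the 90% warning fraction and the history size are assumptions chosen for the example.

```python
from collections import deque


class ThresholdMonitor:
    """Classify readings against a threshold and remember recent values."""

    def __init__(self, threshold: float, warn_fraction: float = 0.9,
                 history_size: int = 1000):
        self.threshold = threshold
        # Assumption: "near threshold" means within 10% below the threshold.
        self.warn_level = threshold * warn_fraction
        self.history = deque(maxlen=history_size)  # bounded historical data

    def observe(self, value: float) -> str:
        self.history.append(value)
        if value > self.threshold:
            return "alarm"      # e.g. notify the operator or enable a backup
        if value >= self.warn_level:
            return "warning"    # near-threshold warning
        return "ok"


# Made-up link utilization readings in percent against an 80% threshold.
monitor = ThresholdMonitor(threshold=80.0)
states = [monitor.observe(v) for v in (50.0, 75.0, 85.0)]
```

What to do on "alarm" (raise an alert, switch to a backup route) is left to the caller, since that choice is site-specific.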
A complex tool at work
• Performance problem
• Brief periods of interrupted service between systems – no information passes through – at 3 pm and 12 am
[Figure: network with the systems Daisy and Gatsby and a mainframe.]
PM tool at work
• Check error rates in the network
  - Normal
• Check utilization
  - Peaks at 3 pm and 12 am – times of backup
• Check Gatsby and Daisy utilization
  - Peaked to 100% at the specified times
• Check for processor intensive applications
  - Negative
Contd..
• Check network traffic type
  - Located an unknown protocol packet
  - Flooding the network – locating servers
  - Check originator
  - Send a message to the originator
  - Or block the originator's traffic
Fault management
• Activities needed to dynamically maintain the network service level
• High network availability
Primary Information
• Actual configuration
• Generated configuration
• Event reports and alarms
• Status indicators of network elements
• Performance indicators
• Spare components and their status
• Backup routes and their status
• Vendor data for problem dispatch
• Global traffic volumes
• Progress of trouble resolution
Steps in FM
• Identify the occurrence of fault
• Isolate the cause of fault
• Correct the fault if possible
• First is difficult, second is very difficult!
Network Status Supervision
• Layered configuration maps (status), tightly coupled to the topology display
• Zoom in on parts to isolate problems
• Real time traffic status displays
• Good monitoring devices/sensors
• Monitored information to be passed on to agents or management elements
• Process and distribute messages, events and alarms
Status
• Is a measurement of the behavior of an object at a specific instant in time
  - Represented by a set of status information items and their values at a specific time
[Figure: network status built from element statuses for Element 0, Element 1 and Element 2; the statuses shown include CSU1 down, CSU2 down and No Carrier.]
Event
• Change in the status of an element which justifies notification, i.e. is significant to fault management
• An event report can be generated, containing
  - Type of event
  - Change in status
  - Time stamp
  - Reporting entity – object or process that generated the event
  - Managed object whose status changed
  - Managed object information
  - Probable cause
  - Effect of event on the managed object
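
The event report fields listed above map naturally onto a record type. The sketch below is illustrative only: the field names, the string-valued statuses and the helper `report_if_changed` are assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EventReport:
    event_type: str                 # type of event
    old_status: str                 # change in status: before ...
    new_status: str                 # ... and after
    timestamp: datetime             # time stamp
    reporting_entity: str           # object or process that generated the event
    managed_object: str             # managed object whose status changed
    managed_object_info: dict = field(default_factory=dict)
    probable_cause: str = "unknown"
    effect: str = "unknown"         # effect of the event on the managed object


def report_if_changed(managed_object: str, old_status: str, new_status: str,
                      reporter: str, now: datetime) -> Optional[EventReport]:
    """Generate a report only when the status actually changed."""
    if old_status == new_status:
        return None
    return EventReport("status-change", old_status, new_status, now,
                       reporter, managed_object)


# Made-up example: an agent notices that element CSU1 went down.
now = datetime(2015, 12, 20, 15, 0)
quiet = report_if_changed("CSU1", "up", "up", "agent-1", now)
report = report_if_changed("CSU1", "up", "down", "agent-1", now)
```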
Event Filtering
• Multi-layered filtering
[Figure: events (E) from activity on the network pass through three filters in turn, (1) threshold filter, (2) grouping filter and (3) prioritizing filter, and leave as prioritized problems (P).]
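
The three filter layers can be sketched as a small pipeline. This is an illustration under stated assumptions: events are dictionaries with hypothetical `source` and `severity` fields, grouping is done per source, and priority is decided by highest severity, then by event count.

```python
from collections import defaultdict
from typing import Dict, List


def threshold_filter(events: List[dict], min_severity: int) -> List[dict]:
    """Layer 1: drop events below the severity threshold."""
    return [e for e in events if e["severity"] >= min_severity]


def grouping_filter(events: List[dict]) -> Dict[str, List[dict]]:
    """Layer 2: group related events, here simply by their source."""
    groups = defaultdict(list)
    for event in events:
        groups[event["source"]].append(event)
    return dict(groups)


def prioritizing_filter(groups: Dict[str, List[dict]]) -> List[dict]:
    """Layer 3: turn each group into one problem and rank the problems."""
    problems = [
        {"source": source,
         "severity": max(e["severity"] for e in group),
         "count": len(group)}
        for source, group in groups.items()
    ]
    return sorted(problems,
                  key=lambda p: (-p["severity"], -p["count"], p["source"]))


# Made-up events.
events = [
    {"source": "router-a", "severity": 5},
    {"source": "router-a", "severity": 3},
    {"source": "modem-b", "severity": 1},
    {"source": "modem-c", "severity": 4},
]
problems = prioritizing_filter(grouping_filter(threshold_filter(events, 3)))
```

Here four raw events collapse into two prioritized problems, with `router-a` ranked first.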
Filtering Process
[Figure: bit errors plotted over time, annotated with "Investigated, no action", "Investigated", "Action taken" and "Action effective".]
Filtering Process
• Global filtering
  - First process applied to an event: is the event serious, and does it have to be processed?
  - Uses a set of criteria for this assessment
  - Cannot be function specific
Filtering Process
• Distribution filtering
  - An event processor selects the events it wishes to receive
  - There are various event processes running simultaneously
• Event process filtering
  - Filtering done by the event processor
  - Function specific
Event Processor
• Examine and process event reports
• Passive processing
  - Sampling and logging
• Proactive processing
  - Takes automatic corrective action
Process for filtering
[Figure: event reports pass through global filtering into the event distribution unit and its distribution queue (Q); distribution subscription services feed a separate queue (Q) for each of several event processors.]
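
A minimal sketch of the filtering process, assuming the event distribution unit applies one global filter and then copies each surviving event onto the queue of every event processor whose subscription matches. The class, the predicate-based subscriptions and the event fields `kind` and `severity` are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Event = dict
Predicate = Callable[[Event], bool]


class EventDistributionUnit:
    def __init__(self, global_filter: Predicate):
        self.global_filter = global_filter   # not function specific
        self.subscriptions: List[Tuple[Predicate, Deque[Event]]] = []

    def subscribe(self, wants: Predicate) -> Deque[Event]:
        """Distribution filtering: a processor states which events it wants."""
        queue: Deque[Event] = deque()
        self.subscriptions.append((wants, queue))
        return queue

    def publish(self, event: Event) -> int:
        """Return the number of processor queues the event was placed on."""
        if not self.global_filter(event):
            return 0  # rejected by global filtering
        delivered = 0
        for wants, queue in self.subscriptions:
            if wants(event):
                queue.append(event)
                delivered += 1
        return delivered


# Made-up setup: two event processors, one for faults, one for performance.
edu = EventDistributionUnit(global_filter=lambda e: e["severity"] >= 2)
fault_queue = edu.subscribe(lambda e: e["kind"] == "fault")
perf_queue = edu.subscribe(lambda e: e["kind"] == "performance")

dropped = edu.publish({"kind": "fault", "severity": 1})
sent = edu.publish({"kind": "fault", "severity": 4})
```

Each processor can then apply its own function-specific filtering to whatever arrives on its queue.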
Event effect
• Permanent – external action required
• Temporary – will correct automatically
• Impending – will result in failure soon
• Impaired – services can be provided at reduced levels
• Inhibited – services stopped
Dynamic Troubleshooting
• Opens trouble tickets, links them, dispatches to the proper vendors, checks on-line progress of trouble tickets
• Problem detection – Is something wrong?
• Problem determination
  - What is wrong and where is the problem in the network?
• Problem diagnosis & resolution
  - To isolate, fix or provide backup and fix
End-to-end testing
• To verify dynamically correct network operation
  - Conducted during normal network operation, without affecting it
• Can we have overhead-free testing?
• What components should be tested?
• How should tasks be assigned?
  - Local sites
  - Central sites
Contd..
• When to monitor and test?
  - Continually, periodically, on demand
• How to monitor and test?
  - Disruptive, non-disruptive
• What indicators to monitor and test?
  - Service level, efficiency, loops, circuits
• What instruments to use?
  - Hardware, software, analog, digital
• What reports are to be generated?
  - Standard, ad hoc with special evaluations
• What are the triggering events?
  - Time, single or combined events, alarms
Types of faults
• Unobservable
  - Deadlocks between processes
  - Instrument not capable of recording the events
• Partially observable
  - Node failure – actual reason – low level protocol
• Uncertainty in observation
  - Lack of device response
  - Device is down, network partitioned, congestion delays, local timer faulty
Issues in isolating faults
• Multiple potential faults
  - Number of elements failing
• Too many related observations
  - One fault manifests itself as various events
• Interference between diagnosis and local recovery procedures
  - Error recovery sets in before diagnosis
• Absence of automated tools
Example FM
• Problem scenario – Sergeant fails due to buffer overflow
[Figure: nodes Sergeant and Pepper, the segments LAN1, LAN2 and LAN3, and the network management system.]
Contd..
• Buffer in Sergeant is well provisioned
  - Fails due to traffic surge
• Pepper reports link failure to LAN3
  - Message sent to NM system
• NMS asks Pepper to check on carrier presence in the link to LAN3
  - Carrier absence reported
• NMS asks Pepper to perform loopback on link 3
  - OK
Contd..
• NM resets Sergeant
• ?
• Actual reason for failure not identified
• This could have been avoided if there had been an event from Sergeant reporting utilization in excess of 80% or 90%
Simple tool
• Points out problem existence
  - E.g. ICMP ping tells you about the existence of a system
• Complex tool may perform all functions shown in the previous example