Performance and Capacity
Management Guideline
Trunked Network (GRN)
March 2015
Document Reference: 4-3-2-27
4-3-2-27 Perf And Cap Mgmt Guideline (GRN) V1.0.Docx 1
Contents
1 Introduction
1.1 Purpose
1.2 Audience
1.3 Scope
1.4 Objective and guiding principles
2 Performance and Capacity Management
2.1 Overview
2.2 Metrics and thresholds
2.3 Reporting Requirements
2.4 Investigate and Implement
2.5 Approved solutions
2.5.1 Short term
2.5.2 Medium term
2.5.3 Long term
3.0 Emergency Incidents and special Events
3.1 Overview
3.2 Temporary solutions implementation and rollback
3.2.1 Agency operational modification
3.2.2 Talkgroup re-prioritisation and reconfiguration
3.2.3 Site preferencing
3.2.4 Control channel power re-configuration
3.2.5 Installing temporary capacity
4 Glossary and Acronyms
1 Introduction
1.1 Purpose
This guideline establishes a set of metrics, associated thresholds, reporting requirements and processes for
performance and capacity management of government-owned trunked radio networks. It is intended to
ensure the availability of services provided by the Government Radio Network (GRN) for NSW
Government agencies and to ensure that services meet customer performance expectations.
It also covers day-to-day network performance and capacity, including long-term trend analysis and
forecasting. This document also provides information for a number of scenarios and solutions. These
solutions cover standard temporary configuration changes that are approved for use to address
emergency incidents, special events and ad-hoc performance rectification requirements. Additionally,
preferred actions and standardised solutions are provided to address forecast capacity issues.
1.2 Audience
The intended audience of this guideline is limited to:
• the engineering and operations teams of the NSW Telecommunications Authority (TA)
• third-party operations teams responsible for managing the GRN (Airwave and Motorola)
• NSW Government agencies that use the GRN
This document should be used as a baseline for the management of network performance and capacity
and associated reporting requirements in parallel to existing contractual requirements.
1.3 Scope
The scope is limited to the performance and capacity management of the NSW GRN Astro. This document
is limited to defining metrics that relate to the performance and capacity
management of the network. It does not include metrics relating to the performance and capacity
management of supporting networks and systems (backhaul, Operational Support Systems, etc.). Where
the root cause of a performance or capacity issue is identified as the performance of a
supporting network, system or service, the issue shall be referred to the responsible party.
1.4 Objective and guiding principles
The objectives and guiding principles of this guideline are to:
• ensure capacity augmentation occurs prior to customer-impacting degradation in network
performance;
• establish an optimal balance between network performance and network capital expenditure
(CAPEX);
• achieve optimal efficiency by maximising the use of existing network resources prior to making
recommendations for additional capacity;
• ensure a consistent end user experience network wide that is independent of the geographic
location; and
• ensure a consistent network-wide architecture, simplifying operational management of the
network.
2 Performance and Capacity Management
2.1 Overview
The following section identifies metrics and Key Performance Indicators (KPIs) that will be used to measure
performance and available capacity. Thresholds will be defined to be used as a baseline for identifying
sites that require further investigation. The process, approved actions and accountabilities will also be
identified for each metric to ensure that issues are resolved according to a standard process in a prompt
and efficient manner.
A general process description for managing the performance and capacity of the trunked network is
outlined in Figure 1 below. These steps provide a high level overview of the approach that will be taken to
manage the performance and capacity across the GRN network. However, not all issues can be
addressed via a standard approach. As such, the process must allow for some flexibility where required to
address unique or unusual issues by exception.
Figure 1: Process description
Define
• Identify metrics, KPIs and thresholds
Report
• Report on the metrics, KPIs and threshold breaches defined in the step above, and run reports at agreed periods relevant to the specific metric
Investigate
• Investigate performance and capacity issues at a site, cluster and network level
• Apply standard solutions and use engineering judgement to provide solutions to address performance and/or capacity issues
Implement
• Submit change requests to implement network configuration changes
• Instantiate projects to implement physical network changes
2.2 Metrics and thresholds
Metrics used to define the performance of the shared trunked network are listed in the following table. They have been categorised according to performance driver.
Each metric is described by the following fields: Category, Metric, Threshold, Threshold aggregation, Definition, Reference table / counters / formula, Spatial aggregation and Temporal aggregation.

Category: Service Agreement
Metric: Network availability
Threshold: >99.95%
Threshold aggregation: network, monthly; site, monthly
Definition: The proportion of the time that all the base stations comprising the GRN are available, not including scheduled maintenance, in each calendar month.
Reference table, counters, formula:
$$A = \frac{((B \times T) - P) - U}{(B \times T) - P} \times 100\%$$
Where:
B = total base stations in the network
T = total minutes in the month
P = sum of planned base station outage minutes
U = sum of unplanned base station outage minutes
Spatial aggregation: network
Temporal aggregation: monthly

Category: Service Agreement
Metric: Grade of Service (GoS)
Threshold: <0.1%
Threshold aggregation: site, monthly
Definition: The proportion of voice calls placed from any GRN site within the client GRN coverage area which will experience a network busy condition over any given calendar month.
Reference table, counters, formula:
$$GOS = \frac{C_n}{T_n} \times 100\%$$
Where:
C_n = total number of busy calls for site n in the calendar month
T_n = total number of calls for site n in the calendar month
Spatial aggregation: site
Temporal aggregation: monthly

Category: Accessibility
Metric: PTT network access
Threshold: <2%
Threshold aggregation: site, hourly
Definition: The proportion of talkgroup calls that are queued (experience a "busy"). This is an engineering metric utilised to manage network performance above the levels in the current service agreement.
Reference table, counters, formula: AirCall
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Category: Accessibility
Metric: PTT network access queue duration
Threshold: <3s
Threshold aggregation: site, hourly
Definition: The duration of the hold time for queued calls is greater than the defined threshold.
Reference table, counters, formula:
AirCall
Duration and AirSec
Busy time = Duration - AirSec
Busy time (zone) = Average of busy time occurrences per zone
Busy time (network) = Average of busy time occurrences across the network
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Category: Retainability
Metric: Abnormal disconnection
Threshold: <10%
Threshold aggregation: site, daily
Definition: The proportion of talkgroup calls that disconnect due to a reason other than disconnect complete. Signalling between end user and network is expected to disconnect gracefully.
Reference table, counters, formula:
CallEndReason
Disconnect Complete (18), Due to Emergency knock down (33)
$$\text{Abnormal disconnection (\%)} = \frac{\text{total(AirCall)} - (\text{Disconnect Complete (18)} + \text{Due to Emergency knock down (33)})}{\text{total(AirCall)}} \times 100\%$$
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Category: Utilisation
Metric: Protected base utilisation
Threshold: <25%
Threshold aggregation: site, hourly
Definition: The percentage utilisation of the 4th protected base as configured in the network.
Reference table, counters, formula: tbc
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly
Table 1: Metrics, KPIs and threshold definitions
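The formulas in Table 1 can be sketched in code as follows. This is a minimal illustration only: the function and parameter names are hypothetical, the inputs are assumed to have already been extracted from the AirCall records, and the handling of periods with no calls (returning 0) is an assumption rather than something the guideline specifies.

```python
from typing import Iterable, Mapping, Tuple


def network_availability(base_stations: int, minutes_in_month: int,
                         planned_outage_min: float,
                         unplanned_outage_min: float) -> float:
    """A = (((B x T) - P) - U) / ((B x T) - P) x 100%."""
    in_service = base_stations * minutes_in_month - planned_outage_min
    if in_service <= 0:
        raise ValueError("no scheduled in-service minutes in the month")
    return (in_service - unplanned_outage_min) / in_service * 100.0


def grade_of_service(busy_calls: int, total_calls: int) -> float:
    """GoS = Cn / Tn x 100% for one site in one calendar month."""
    if total_calls == 0:
        return 0.0  # assumption: no calls means no busy calls
    return busy_calls / total_calls * 100.0


def mean_busy_time(calls: Iterable[Tuple[float, float]]) -> float:
    """Average of (Duration - AirSec), in seconds, over queued calls."""
    busy = [duration - air_sec for duration, air_sec in calls]
    if not busy:
        return 0.0
    return sum(busy) / len(busy)


def abnormal_disconnection(end_reason_counts: Mapping[int, int]) -> float:
    """Percentage of calls ending with a reason other than 18 or 33."""
    total = sum(end_reason_counts.values())
    if total == 0:
        return 0.0
    graceful = end_reason_counts.get(18, 0) + end_reason_counts.get(33, 0)
    return (total - graceful) / total * 100.0


if __name__ == "__main__":
    # Illustrative figures only, not measured GRN data.
    print(network_availability(100, 43200, 1200, 2000))
    print(grade_of_service(3, 5000))
    print(mean_busy_time([(12.0, 10.0), (8.5, 7.5)]))
    print(abnormal_disconnection({18: 900, 33: 20, 7: 80}))
```

Each result would then be compared with the corresponding threshold in Table 1 (for example, availability above 99.95% and GoS below 0.1%).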
2.3 Reporting Requirements
Each of the metrics outlined in Table 1: Metrics, KPIs and threshold definitions above is to be reported in
line with the requirements detailed in Table 2: Metrics and KPIs reporting requirements below. The period of
reporting is tailored to the purpose of the individual metric (whether it is performance or capacity
related). In addition, the sites for which the defined thresholds are exceeded are to be provided in a
separate report with a summary of the period, metric and duration for which the threshold crossing
occurred. Finally, two further reports should be used as indicators of network performance and change: the
“top 10” sites closest to but not yet exceeding any threshold, and the “top 10” sites with the greatest
deviation (+ or -) between the current and previous monthly averages.
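The two “top 10” reports described above could be produced along the following lines. This is a sketch under stated assumptions: the function names and the per-site dictionaries are hypothetical, and "closest" is taken to mean the smallest remaining margin to the threshold.

```python
from typing import Dict, List, Tuple


def closest_to_threshold(values: Dict[str, float], threshold: float,
                         higher_is_better: bool = False,
                         n: int = 10) -> List[Tuple[str, float]]:
    """Sites still within the threshold, smallest remaining margin first."""
    margins = {}
    for site, value in values.items():
        # For "<" thresholds (e.g. GoS) the margin is threshold - value;
        # for ">" thresholds (e.g. availability) it is value - threshold.
        margin = value - threshold if higher_is_better else threshold - value
        if margin > 0:  # sites already over the threshold are reported separately
            margins[site] = margin
    return sorted(margins.items(), key=lambda item: item[1])[:n]


def greatest_deviation(current: Dict[str, float], previous: Dict[str, float],
                       n: int = 10) -> List[Tuple[str, float]]:
    """Sites with the largest absolute change between monthly averages."""
    deltas = {site: current[site] - previous[site]
              for site in current if site in previous}
    return sorted(deltas.items(), key=lambda item: abs(item[1]),
                  reverse=True)[:n]


if __name__ == "__main__":
    # Illustrative monthly GoS values (%), not real site data.
    this_month = {"A": 0.09, "B": 0.02, "C": 0.12, "D": 0.05}
    last_month = {"A": 0.04, "B": 0.03, "C": 0.12, "D": 0.05}
    print(closest_to_threshold(this_month, threshold=0.1, n=2))
    print(greatest_deviation(this_month, last_month, n=2))
```

The signed deviation is kept in the output so that both improvements and degradations remain visible, in line with the "+ or -" wording above.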
Metric                               Spatial Agg  Temp Agg  Reporting Period  Trending Period  Extrapolation
Network availability                 network      monthly   monthly           12 months        Linear, 6 months
Grade of Service (GoS)               site         monthly   monthly           12 months        Linear, 6 months
PTT network access                   site         hourly    monthly           6 months         n/a
                                     site         daily     monthly           12 months        n/a
                                     zone         hourly    monthly           6 months         n/a
                                     zone         daily     monthly           12 months        n/a
                                     network      hourly    monthly           6 months         n/a
                                     network      daily     monthly           12 months        n/a
PTT network access queue duration    site         hourly    monthly           6 months         n/a
                                     site         daily     monthly           12 months        n/a
                                     zone         hourly    monthly           6 months         n/a
                                     zone         daily     monthly           12 months        n/a
                                     network      hourly    monthly           6 months         n/a
                                     network      daily     monthly           12 months        n/a
Abnormal disconnect                  site         hourly    monthly           6 months         n/a
                                     site         daily     monthly           12 months        n/a
                                     zone         hourly    monthly           6 months         n/a
                                     zone         daily     monthly           12 months        n/a
                                     network      hourly    monthly           6 months         n/a
                                     network      daily     monthly           12 months        n/a
Protected base utilisation (base 4)  site         hourly    monthly           6 months         n/a
                                     site         daily     monthly           12 months        n/a
                                     zone         hourly    monthly           6 months         n/a
                                     zone         daily     monthly           12 months        n/a
                                     network      hourly    monthly           6 months         n/a
                                     network      daily     monthly           12 months        n/a
Table 2: Metrics and KPIs reporting requirements
2.4 Investigate and Implement
The process below will be used to identify the preferred solution (if any) to a performance or capacity
issue when thresholds are exceeded, or when trending indicates that a threshold crossing is likely in the given
forecast period.
• Identify root cause of threshold crossing
o Rule out any temporary or semi-temporary data collection or reporting issues, incidents or
events that may have impacted the reported results. If the root cause of the threshold
crossing is an anomaly, emergency incident or special event that does not warrant
any further changes, mark the site as reviewed and continue to monitor for a further three
reporting periods.
o Use engineering judgement to determine whether the threshold crossing requires further
investigation.
o Review reports related to the specific threshold crossing across different temporal and
spatial aggregation levels and provide a summary of the issue and root cause.
• Solution development
o Review approved permanent solution options and use engineering judgement to determine whether the
options will address the performance or capacity issue experienced.
o Where possible, solutions to reduce demand should be investigated in preference to
capacity augmentation.
o Use the trend forecast to establish whether a project needs to be instantiated to address the
issue this reporting period. Lead times associated with the implementation of additional
channel capacity or a new site build should be assumed to be 4 months and 24 months
respectively.
• Solution implementation
o Instantiate a project or activity to address the performance or capacity issues that have
been identified. Ensure the above lead times are taken into account.
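The lead-time check in the solution development step can be illustrated with a short sketch. The 4 and 24 month lead times come from the text above; the function name, the solution keys and the rule that a project must start now if waiting one more monthly reporting period would leave less than the lead time are assumptions made for illustration.

```python
# Lead times stated in this guideline, in months.
LEAD_TIME_MONTHS = {
    "additional_channel_capacity": 4,
    "new_site": 24,
}


def must_start_this_period(months_until_crossing: float, solution: str,
                           reporting_interval_months: float = 1) -> bool:
    """True if deferring to the next reporting period would be too late.

    Assumption: the project is started now when the time remaining at the
    next report would be shorter than the solution's lead time.
    """
    lead_time = LEAD_TIME_MONTHS[solution]
    remaining_at_next_report = months_until_crossing - reporting_interval_months
    return remaining_at_next_report < lead_time


if __name__ == "__main__":
    # Forecast crossing in 4 months: additional channels must be started now.
    print(must_start_this_period(4, "additional_channel_capacity"))
    # Forecast crossing in 30 months: a new site build can still wait.
    print(must_start_this_period(30, "new_site"))
```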
2.5 Approved solutions
This section provides the approved short, medium and long term solutions to resolve specific performance
or capacity problems. Note that not all issues can be addressed with a standardised solution. Flexibility
remains for the development of bespoke solutions to address specific issues; however, this approach is
expected to occur by exception. Limiting the number of solutions is considered desirable to ensure
a consistent network architecture is maintained.
2.5.1 Short term
There are two approved short term measures to alleviate performance or capacity issues prior to
implementing a long term solution. These are a subset of the options available during emergency incidents
or special events. Some of the available options under the emergency incident and special events
section are deemed unsuitable for longer duration implementation (e.g. reduction in control channel
transmit power).
2.5.1.1 Talkgroup re-prioritisation and reconfiguration
Review the agency talkgroup priorities currently configured for day to day use. Where it is deemed
suitable, re-prioritise key agencies to minimise the impact of performance or capacity degradation for
key government radio users. The configuration changes should be reverted to the original settings
once the long term solution has been implemented. Changes to talkgroup attributes should be in accordance
with each agency’s operational talkgroup plan, if one exists.
2.5.1.2 Site preferencing
Site preferencing can be used to “steer” traffic between adjacent sites. Four preference settings can be
configured in the subscriber radios:
• Always preferred
• Preferred
• No preference
• Least preferred.
Site preferencing is only a contributing factor to the overall site selection process.
The site quality metric is a combination of other factors including received signal strength indication (RSSI).
As such, using this method for congestion relief is expected to yield minimal benefit and is likely to take
many months to implement. This method is only recommended for use where significant overlapping
coverage is present and traffic distribution between adjacent sites is poor, but the overall capacity
across those sites is sufficient to support the current traffic profile.
2.5.2 Medium term
It is recommended that additional base stations and associated passive and active equipment be
installed to alleviate performance or capacity issues related to the number of base stations (channels)
available. However, a lead time of four months is required to design, procure and implement base stations,
and this needs to be considered when reviewing the capacity requirements.
2.5.3 Long term
The long term solution is to implement a new site to address performance and capacity issues. However,
this is the least preferred solution as long lead times are required to design, acquire, construct and
commission a new site. Where this is the preferred network augmentation path, the instantiation of the
project needs to occur with sufficient time to allow the integration of the new site prior to the thresholds
being exceeded. The new site design must ensure that sufficient traffic will be off loaded from the
congested site to improve the necessary metrics.
3.0 Emergency Incidents and special Events
3.1 Overview
The GRN is dimensioned to cater for traffic generated during emergency incidents and special events.
However, not all peak demand scenarios can reasonably be met due to the variability in the type and
location of emergency incidents.
During emergency incidents and special events, it is expected that ad-hoc reporting at an interval
agreed between the Authority and the Network Manager will occur to closely monitor the network
performance and capacity. When issues are identified the same principles outlined in Section 2.5 are to
be applied to resolving the issues with additional options available for temporary relief.
In addition, data related to emergency incidents and special events is to be captured and stored to
facilitate and support future network dimensioning and forecasting of peak incidents. The data capture
requirements for emergency incidents and special events, for all sites affected, are:
• Raw data for all counters should be stored for the incident or special event, covering the duration
of the event and the week prior and following
• Hourly data should be stored for the incident or special event, covering the duration of the event
and the month prior and following
• Daily data should be stored for the incident or special event, covering the duration of the event
and the year prior and following.
The data captured for emergency incidents and special events are in addition to the normal data
capture and reporting requirements.
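The capture windows listed above can be derived from the start and end of an incident as in the sketch below. The names are hypothetical, and a "month" and a "year" are approximated as 30 and 365 days, which is an assumption; the guideline does not define them more precisely.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

# Margin kept before and after the event for each data granularity.
# Assumption: month = 30 days, year = 365 days.
CAPTURE_MARGIN = {
    "raw": timedelta(weeks=1),
    "hourly": timedelta(days=30),
    "daily": timedelta(days=365),
}


def capture_windows(event_start: datetime,
                    event_end: datetime) -> Dict[str, Tuple[datetime, datetime]]:
    """Return the (from, to) storage window for each data granularity."""
    if event_end < event_start:
        raise ValueError("event_end must not be earlier than event_start")
    return {
        granularity: (event_start - margin, event_end + margin)
        for granularity, margin in CAPTURE_MARGIN.items()
    }


if __name__ == "__main__":
    # Illustrative incident dates only.
    windows = capture_windows(datetime(2015, 3, 10, 8, 0),
                              datetime(2015, 3, 12, 20, 0))
    for granularity, (start, end) in windows.items():
        print(granularity, start.isoformat(), end.isoformat())
```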
3.2 Temporary solutions implementation and rollback
A number of options exist for implementing temporary solutions to alleviate performance and capacity
issues. The implementation of these network changes often has other trade-offs associated with them.
Therefore, any deviation from the standard network configuration for this purpose is to be signed off by
the Authority with an agreed rollback plan prior to implementation.
The process to be followed is outlined below:
• Identify root cause of threshold crossing
o Use engineering judgement to determine whether the threshold crossing requires
further investigation
o Review network statistics to identify the root cause
• Review approved temporary solution options and use engineering judgement to determine whether the
options will address the performance or capacity issue being experienced
• Propose solution and seek approval from the Telco Authority
• Implement changes via the network change request process
• Monitor network performance and capacity pre and post change implementation, and confirm that
changes and network behaviour are in line with expectations
• Recommend additional changes if warranted
• Post emergency incident or special event, revert changes to standard configuration.
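The approve, implement and revert steps above can be modelled as a small record that refuses to apply a change without sign-off and a rollback plan, and that restores the standard value afterwards. This is purely illustrative: the class, the field names and the `control_channel_tx_power` key are hypothetical and do not correspond to any actual GRN configuration interface.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TemporaryChange:
    parameter: str
    standard_value: Any
    temporary_value: Any
    approved_by_authority: bool = False
    rollback_plan: str = ""
    applied: bool = False

    def apply(self, config: Dict[str, Any]) -> None:
        """Apply the temporary value; requires sign-off and a rollback plan."""
        if not self.approved_by_authority or not self.rollback_plan:
            raise PermissionError("sign-off and rollback plan are required")
        if config.get(self.parameter) != self.standard_value:
            raise ValueError("configuration is not at the standard value")
        config[self.parameter] = self.temporary_value
        self.applied = True

    def revert(self, config: Dict[str, Any]) -> None:
        """Restore the standard configuration after the incident or event."""
        if self.applied:
            config[self.parameter] = self.standard_value
            self.applied = False


if __name__ == "__main__":
    site_config = {"control_channel_tx_power": "standard"}  # hypothetical key
    change = TemporaryChange("control_channel_tx_power", "standard", "reduced",
                             approved_by_authority=True,
                             rollback_plan="restore standard power after event")
    change.apply(site_config)
    print(site_config)
    change.revert(site_config)
    print(site_config)
```

Recording the standard value alongside the temporary one means the rollback does not depend on anyone remembering the original setting.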
3.2.1 Agency operational modification
During periods of extended network performance degradation and/or congestion, the Authority will
review the causes of congestion to establish the agencies, and specifically the talkgroups, that
contribute most to the traffic profile. Agencies will be requested to review their operational talkgroup
plan and, where possible, to optimise (minimise) their incident or special event talkgroup utilisation. Where
possible, agencies should use the Emergency Service Organisation (ESO) talkgroups and limit the number
of talkgroups in use at a single incident through operational procedure or other means (multi groups or
agency groups).
3.2.2 Talkgroup re-prioritisation and reconfiguration
Another option is to review the agency talkgroup priorities currently configured for day to day use. The
deemed lead agency for a particular incident or special event can be given the highest priority for the
duration of that event (either top of queue or ruthless pre-emption). Changes to agency talkgroups’
priorities will be in accordance with each agency’s Operational Talkgroup Plan.
3.2.3 Site preferencing
Site preferencing can be used to “steer” traffic between adjacent sites, but it is only a contributing factor
to the overall site selection process. Four preference settings can be configured in the subscriber radios:
• Always preferred
• Preferred
• No preference
• Least preferred.
The site quality metric is a combination of other factors including RSSI. As such, using this method for
congestion relief is expected to yield minimal benefit and would require reprogramming of radios during
an emergency incident or prior to a special event. This method is only recommended for use where
significant overlapping coverage is present and traffic distribution between adjacent sites is poor,
but the overall capacity across those sites is sufficient to support the current traffic profile.
3.2.4 Control channel power re-configuration
An alternative to the methods in 3.2.1 to 3.2.3 for reducing traffic at an individual site is to reduce the control
channel transmit power. This technique should only be used where incident or special event specific
traffic will still be carried by the preferred serving site. This change will reduce the coverage footprint of
the site for call setups and will likely lead to a reduction in the overall GRN coverage footprint. The impact
should be carefully reviewed and all agencies should be notified prior to implementation.
3.2.5 Installing temporary capacity
Temporary capacity (additional base stations, inclusive of the necessary multi-coupling) is available for
installation at existing GRN sites to provide temporary capacity relief at sites that are affected by a
prolonged emergency incident (e.g. a large bush fire). The suitability of this solution is to be assessed by the
TA on a site by site basis, including the co-ordination and assignment of the necessary frequencies to
support operation of the new base stations. Approval for usage of this solution to provide temporary
capacity relief is to be sought from the TA prior to installation. The Authority is also responsible for the
co-ordination of spectrum through the SMO and for installation (via the Network Manager).
4 Glossary and Acronyms
Term                  Definition
Spatial Aggregation   Defines the level of aggregation relative to the topology of the network. Valid spatial
                      aggregation levels for the GRN are:
                      • Site
                      • Zone
                      • Core
                      • Network.
                      All reports should meet the above spatial aggregation requirements.
Temporal Aggregation  Defines the period over which the metric, KPI, counter etc. is recorded, stored or
                      reported. Valid temporal aggregation values for the GRN are:
                      • 15min (raw)
                      • Hourly
                      • Daily
                      • Monthly.
                      All reported data should meet the above aggregation definitions.
GRN                   Government Radio Network
NSWTA                 New South Wales Telecommunications Authority
OSS                   Operational Support Systems
KPIs                  Key Performance Indicators
talkgroup             An assigned group on a trunked radio system