Performance and Capacity Management Guideline Trunked Network (GRN) March 2015 Document Reference: 4-3-2-27

Performance and Capacity

Management Guideline

Trunked Network (GRN)

March 2015

Document Reference: 4-3-2-27

4-3-2-27 Perf And Cap Mgmt Guideline (GRN) V1.0.Docx 1

Contents

1 Introduction
1.1 Purpose
1.2 Audience
1.3 Scope
1.4 Objective and guiding principles
2 Performance and Capacity Management
2.1 Overview
2.2 Metrics and thresholds
2.3 Reporting Requirements
2.4 Investigate and Implement
2.5 Approved solutions
2.5.1 Short term
2.5.2 Medium term
2.5.3 Long term
3 Emergency Incidents and Special Events
3.1 Overview
3.2 Temporary solutions implementation and rollback
3.2.1 Agency operational modification
3.2.2 Talkgroup re-prioritisation and reconfiguration
3.2.3 Site preferencing
3.2.4 Control channel power re-configuration
3.2.5 Installing temporary capacity
4 Glossary and Acronyms


1 Introduction

1.1 Purpose

This guideline establishes a set of metrics, associated thresholds, reporting requirements and processes for

performance and capacity management of government owned trunked radio networks. This guideline

will ensure the availability of services provided by the Government Radio Network (GRN) for NSW

Government Agencies and ensure services meet customer performance expectations.

It also covers day to day management of network performance and capacity, including long term trend analysis and

forecasting. This document also describes a number of scenarios and solutions. These

solutions cover standard temporary configuration changes that are approved for use to address

emergency incidents, special events and ad-hoc performance rectification requirements. Additionally,

preferred actions and standardised solutions are provided to address forecast capacity issues.

1.2 Audience

The intended audience of this guideline is limited to:

• the engineering and operations teams of the NSW Telecommunications Authority (TA);

• third party operations teams responsible for managing the GRN (Airwave and Motorola); and

• NSW Government agencies who use the GRN.

This document should be used as a baseline for the management of network performance and capacity

and associated reporting requirements in parallel to existing contractual requirements.

1.3 Scope

The scope is limited to the performance and capacity management of the NSW GRN Astro. The scope of

this document will be limited to defining metrics that relate to the performance and capacity

management of the network. It does not include metrics relating to the performance and capacity

management of supporting networks and systems (backhaul, Operational Support Systems, etc.). Where

the root cause of a performance or capacity issue is identified as the performance of a

supporting network, system or service, the issue shall be referred to the responsible party.

1.4 Objective and guiding principles

The objectives and guiding principles of this guideline are to:

• ensure capacity augmentation occurs prior to customer impacting degradation in network performance;

• establish an optimal balance between network performance and network capital expenditure (CAPEX);

• achieve optimal efficiency by maximising the use of existing network resources prior to making recommendations for additional capacity;

• ensure a consistent end user experience network wide that is independent of geographic location; and

• ensure a consistent network wide architecture, simplifying operational management of the network.

2 Performance and Capacity Management

2.1 Overview

The following section identifies metrics and Key Performance Indicators (KPIs) that will be used to measure

performance and available capacity. Thresholds will be defined to be used as a baseline for identifying

sites that require further investigation. The process, approved actions and accountabilities will also be

identified for each metric to ensure that issues are resolved according to a standard process in a prompt

and efficient manner.

A general process description for managing the performance and capacity of the trunked network is

outlined in Figure 1 below. These steps provide a high level overview of the approach that will be taken to

manage the performance and capacity across the GRN network. However, not all issues can be

addressed via a standard approach. As such, the process must allow for some flexibility where required to

address unique or unusual issues by exception.

Figure 1: Process description

Define

•Identify metrics, KPIs and thresholds

Report

•Report on the metrics, KPIs and threshold breaches defined above, running reports at agreed periods relevant to each metric

Investigate

•Investigate performance and capacity issues at a site, cluster and network level

•Apply standard solutions and use engineering judgement to provide solutions to address performance and/or capacity issues

Implement

•Submit change requests to implement network configuration changes

•Instantiate projects to implement network changes (physical)


2.2 Metrics and thresholds

Metrics used to define the performance of the shared trunked network are listed in the following table. They have been categorised according to performance driver.

Metric: Network availability
Category: Service Agreement
Threshold: >99.95%
Threshold aggregation: network, monthly; site, monthly
Definition: The proportion of the time that all the base stations comprising the GRN are available, not including scheduled maintenance, in each calendar month.
Reference table, counters, formula:
$$A = \frac{((B \times T) - P) - U}{(B \times T) - P} \times 100\%$$
Where:
B = total base stations in the network
T = total minutes in the month
P = sum of planned base station outage minutes
U = sum of unplanned base station outage minutes
Spatial aggregation: network
Temporal aggregation: monthly

Metric: Grade of Service (GoS)
Category: Service Agreement
Threshold: <0.1%
Threshold aggregation: site, monthly
Definition: The proportion of voice calls placed from any GRN site within the client GRN coverage area which will experience a network busy condition over any given calendar month.
Reference table, counters, formula:
$$GOS = \frac{C_n}{T_n} \times 100\%$$
Where:
C_n = total number of busy calls for site n in the calendar month
T_n = total number of calls for site n in the calendar month
Spatial aggregation: site
Temporal aggregation: monthly

Metric: PTT network access
Category: Accessibility
Threshold: <2%
Threshold aggregation: site, hourly
Definition: The proportion of talkgroup calls that are queued (experience a "busy"). This is an engineering metric utilised to manage network performance above the levels in the current service agreement.
Reference table, counters, formula: AirCall
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Metric: PTT network access queue duration
Category: Accessibility
Threshold: <3s
Threshold aggregation: site, hourly
Definition: The duration of the hold time for queued calls. The threshold is breached when this duration is greater than the defined threshold.
Reference table, counters, formula:
AirCall: Duration and AirSec
Busy time = Duration - AirSec
Busy time (zone) = Average of busy time occurrences per zone
Busy time (network) = Average of busy time occurrences across the network
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Metric: Abnormal disconnection
Category: Retainability
Threshold: <10%
Threshold aggregation: site, daily
Definition: The proportion of talkgroup calls that disconnect due to a reason other than disconnect complete. Signalling between end user and network is expected to disconnect gracefully.
Reference table, counters, formula:
CallEndReason: Disconnect Complete (18), Due to Emergency knock down (33)
Abnormal disconnection (%) = [total (AirCall) - [Disconnect Complete (18) + Due to Emergency knock down (33)]] / [total (AirCall)]
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Metric: Protected base utilisation
Category: Utilisation
Threshold: <25%
Threshold aggregation: site, hourly
Definition: The percentage utilisation of the 4th protected base as configured in the network.
Reference table, counters, formula: tbc
Spatial aggregation: site, zone, network
Temporal aggregation: hourly, daily, monthly

Table 1: Metrics, KPIs and threshold definitions
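The formulas in Table 1 can be illustrated with a short calculation. The following is a minimal sketch only: the function and parameter names are illustrative assumptions and do not correspond to any GRN reporting tool, and the handling of periods with zero calls is also an assumption.

```python
def network_availability(base_stations, minutes_in_month,
                         planned_outage_min, unplanned_outage_min):
    """A = (((B x T) - P) - U) / ((B x T) - P) x 100%."""
    scheduled = base_stations * minutes_in_month - planned_outage_min
    if scheduled <= 0:
        raise ValueError("no scheduled service minutes in the period")
    return (scheduled - unplanned_outage_min) / scheduled * 100.0


def grade_of_service(busy_calls, total_calls):
    """GoS = Cn / Tn x 100% for one site in one calendar month."""
    if total_calls == 0:
        return 0.0  # assumption: a site with no calls reports no busies
    return busy_calls / total_calls * 100.0


def busy_time(duration_s, air_s):
    """Busy time = Duration - AirSec for a single queued call."""
    return duration_s - air_s


def abnormal_disconnection(total_calls, disconnect_complete,
                           emergency_knock_down):
    """Share of calls ending for a reason other than codes 18 or 33, in %."""
    if total_calls == 0:
        return 0.0  # assumption: no calls means nothing to report
    normal = disconnect_complete + emergency_knock_down
    return (total_calls - normal) / total_calls * 100.0
```

For example, 100 base stations over a 30 day month (43,200 minutes) with 1,200 planned and 2,000 unplanned outage minutes gives an availability of approximately 99.954%, which meets the >99.95% threshold.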


2.3 Reporting Requirements

Each of the metrics outlined in Table 1: Metrics, KPIs and threshold definitions above is to be reported in

line with the requirements detailed in Table 2: Metrics and KPIs reporting requirements below. The period of

reporting is tailored to the purpose of the individual metric (whether it is performance or capacity

related). In addition, the sites for which the defined thresholds are exceeded are to be provided in a

separate report with a summary of the period, metric and duration for which the threshold crossing

occurred. Finally, two further reports should be used as indicators of network performance and changes:

the “top 10” sites closest to, but not yet exceeding, any threshold, and the “top 10” sites with the

greatest deviation (+ or -) between the current and previous monthly averages.

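Metric | Spatial Agg | Temp Agg | Reporting Period | Trending Period | Extrapolation
Network availability | network | monthly | monthly | 12 months | Linear, 6 months
Grade of Service (GoS) | site | monthly | monthly | 12 months | Linear, 6 months
PTT network access | site | hourly | monthly | 6 months | n/a
PTT network access | site | daily | monthly | 12 months | n/a
PTT network access | zone | hourly | monthly | 6 months | n/a
PTT network access | zone | daily | monthly | 12 months | n/a
PTT network access | network | hourly | monthly | 6 months | n/a
PTT network access | network | daily | monthly | 12 months | n/a
PTT network access queue duration | site | hourly | monthly | 6 months | n/a
PTT network access queue duration | site | daily | monthly | 12 months | n/a
PTT network access queue duration | zone | hourly | monthly | 6 months | n/a
PTT network access queue duration | zone | daily | monthly | 12 months | n/a
PTT network access queue duration | network | hourly | monthly | 6 months | n/a
PTT network access queue duration | network | daily | monthly | 12 months | n/a
Abnormal disconnect | site | hourly | monthly | 6 months | n/a
Abnormal disconnect | site | daily | monthly | 12 months | n/a
Abnormal disconnect | zone | hourly | monthly | 6 months | n/a
Abnormal disconnect | zone | daily | monthly | 12 months | n/a
Abnormal disconnect | network | hourly | monthly | 6 months | n/a
Abnormal disconnect | network | daily | monthly | 12 months | n/a
Protected base utilisation (base 4) | site | hourly | monthly | 6 months | n/a
Protected base utilisation (base 4) | site | daily | monthly | 12 months | n/a
Protected base utilisation (base 4) | zone | hourly | monthly | 6 months | n/a
Protected base utilisation (base 4) | zone | daily | monthly | 12 months | n/a
Protected base utilisation (base 4) | network | hourly | monthly | 6 months | n/a
Protected base utilisation (base 4) | network | daily | monthly | 12 months | n/a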

Table 2: Metrics and KPIs reporting requirements
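The two “top 10” reports described in this section can be sketched as simple rankings. This is an illustrative sketch only: it assumes a metric with an upper limit (for example PTT network access, <2%), that per-site values are already available as dictionaries keyed by site name, and that the function names are hypothetical.

```python
def top_sites_near_threshold(values, threshold, n=10):
    """Sites still below an upper threshold, ordered closest-to-threshold first."""
    below = {site: v for site, v in values.items() if v < threshold}
    return sorted(below, key=lambda site: threshold - below[site])[:n]


def top_sites_by_deviation(current, previous, n=10):
    """Sites with the largest absolute change (+ or -) between two monthly averages."""
    common = sorted(current.keys() & previous.keys())  # sites present in both months
    return sorted(common,
                  key=lambda site: abs(current[site] - previous[site]),
                  reverse=True)[:n]
```

Sites that already exceed the threshold are excluded from the first ranking because they belong in the separate threshold crossing report.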

2.4 Investigate and Implement

The process below will be used to identify the preferred solution (if any) to a performance or capacity

issue when thresholds are exceeded, or when trending indicates that a threshold crossing is likely within the

forecast period.

• Identify root cause of threshold crossing

o Rule out any temporary or semi-temporary data collection or reporting issues, incidents or events that may have impacted the reported results. If the threshold crossing is due to an anomaly, emergency incident or special event and does not warrant any further changes, mark the site as reviewed and continue to monitor for a further three reporting periods.

o Use engineering judgement to determine whether the threshold crossing requires further investigation.

o Review reports related to the specific threshold crossing across different temporal and spatial aggregation levels and provide a summary of the issue and root cause.

• Solution development

o Review approved permanent solution options and use engineering judgement to determine whether the options will address the performance or capacity issue experienced.

o Where possible, solutions to reduce demand should be investigated in preference to capacity augmentation.

o Use the trend forecast to establish whether a project needs to be instantiated this reporting period to address the issue. Lead times for the implementation of additional channel capacity and for a new site build should be assumed to be 4 months and 24 months respectively.

• Solution implementation

o Instantiate a project or activity to address the performance or capacity issues that have been identified. Ensure the above lead times are taken into account.
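The trend forecast step can be illustrated with a simple linear extrapolation, which is the extrapolation method Table 2 specifies for network availability and GoS. The sketch below is a hedged example rather than a prescribed method: it assumes equally spaced monthly values, a metric with an upper limit, and hypothetical function names. Only the 4 month and 24 month lead times come from this guideline.

```python
CHANNEL_LEAD_TIME_MONTHS = 4    # additional channel capacity (from this guideline)
NEW_SITE_LEAD_TIME_MONTHS = 24  # new site build (from this guideline)


def linear_fit(values):
    """Least-squares slope and intercept for values observed at months 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two monthly values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_crossing(values, threshold):
    """Months from the latest sample until the fitted line reaches the threshold.

    Returns 0.0 if the fitted value is already at or above the threshold and
    None if the trend is flat or improving (assumption: lower is better).
    """
    slope, intercept = linear_fit(values)
    latest = intercept + slope * (len(values) - 1)
    if latest >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - latest) / slope


def project_needed_now(values, threshold, lead_time_months):
    """True if the forecast crossing falls within the solution lead time."""
    months = months_until_crossing(values, threshold)
    return months is not None and months <= lead_time_months
```

Under these assumptions a site whose monthly GoS rises steadily from 0.02% to 0.07% over six months is forecast to reach the 0.1% threshold in about three months, which is inside the 4 month channel lead time, so the project would need to start in the current reporting period.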

2.5 Approved solutions

This section provides the approved short, medium and long term solutions to resolve specific performance

or capacity problems. Not all issues can be addressed with a standardised solution. Flexibility

remains for the development of bespoke solutions to address specific issues; however, this approach is

expected to occur by exception. Limiting the number of approved solutions is considered desirable to ensure

a consistent network architecture is maintained.

2.5.1 Short term

There are two approved short term measures to alleviate performance or capacity issues prior to

implementing a long term solution. These are a subset of the options available during emergency incidents

or special events. Some of the options available under the emergency incidents and special events

section are deemed unsuitable for longer duration implementation (e.g. reduction in control channel

transmit power).

2.5.1.1 Talkgroup re-prioritisation and reconfiguration

Review the agency talkgroup priorities currently configured for day to day use. Where it is deemed

suitable, re-prioritise key agencies to minimise the impact of performance or capacity degradation on

key government radio users. The configuration changes should be reverted to the original settings

once the long term solution has been implemented. Changes to talkgroup attributes should be in accordance

with each agency’s operational talkgroup plan, if one exists.

2.5.1.2 Site preferencing

Site preferencing can be used to “steer” traffic between adjacent sites. Four preference settings can be configured

in the subscriber radios:

• Always preferred

• Preferred

• No preference

• Least preferred.

It should be noted that site preferencing is only a contributing factor to the overall site selection process.

The site quality metric is a combination of other factors including received signal strength indication (RSSI).

As such, using this method for congestion relief is expected to yield minimal benefit and is likely to take

many months to implement. This method is only recommended for use where significant overlapping

coverage is present, traffic distribution between adjacent sites is poor, and the overall capacity

across the sites is sufficient to support the current traffic profile.

2.5.2 Medium term

It is recommended that additional base stations and associated passive and active equipment are

installed to alleviate performance or capacity issues related to the number of base stations (channels)

available. However, the four month lead time required to design, procure and implement base stations

needs to be considered when reviewing the capacity requirements.

2.5.3 Long term

The long term solution is to implement a new site to address performance and capacity issues. However,

this is the least preferred solution as long lead times are required to design, acquire, construct and

commission a new site. Where this is the preferred network augmentation path, the instantiation of the

project needs to occur with sufficient time to allow the integration of the new site prior to the thresholds

being exceeded. The new site design must ensure that sufficient traffic will be off loaded from the

congested site to improve the necessary metrics.

3 Emergency Incidents and Special Events

3.1 Overview

The GRN is dimensioned to cater for traffic generated during emergency incidents and special events.

However, not all peak demand scenarios can reasonably be met due to the variability in the type and

location of emergency incidents.

During emergency incidents and special events, it is expected that ad-hoc reporting will occur at an interval

agreed between the Authority and the Network Manager to closely monitor network

performance and capacity. When issues are identified, the same principles outlined in Section 2.5 are to

be applied to resolve them, with additional options available for temporary relief.

In addition, data related to emergency incidents and special events is to be captured and stored to

facilitate and support future network dimensioning and forecasting of peak incidents. The data capture

requirements for emergency incidents and special events, for all sites affected, are:

• Raw data for all counters should be stored for the duration of the incident or special event, the week prior and the week following.

• Hourly data should be stored for the duration of the incident or special event, the month prior and the month following.

• Daily data should be stored for the duration of the incident or special event, the year prior and the year following.

The data captured for emergency incidents and special events is in addition to the normal data

capture and reporting requirements.
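The capture windows above can be derived from the start and end of an incident. The following sketch is illustrative only: it assumes a week is 7 days, a month is 30 days and a year is 365 days (this guideline does not define them), and the names used are hypothetical.

```python
from datetime import datetime, timedelta

# Margin kept before and after the event for each aggregation level.
# The day counts are assumptions; the guideline states week, month and year.
CAPTURE_MARGINS = {
    "raw": timedelta(days=7),
    "hourly": timedelta(days=30),
    "daily": timedelta(days=365),
}


def capture_windows(event_start, event_end):
    """Return {level: (store_from, store_until)} for one incident or event."""
    if event_end < event_start:
        raise ValueError("event_end is before event_start")
    return {
        level: (event_start - margin, event_end + margin)
        for level, margin in CAPTURE_MARGINS.items()
    }
```

For an incident running from 10 March 2015 to 12 March 2015, raw counter data would be kept from 3 March 2015 to 19 March 2015 under these assumptions.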

3.2 Temporary solutions implementation and rollback

A number of options exist for implementing temporary solutions to alleviate performance and capacity

issues. The implementation of these network changes often has other trade-offs associated with them.

Therefore, any deviation from the standard network configuration for this purpose is to be signed off by

the Authority with an agreed rollback plan prior to implementation.

The process to be followed is outlined below:

• Identify root cause of threshold crossing

o Use engineering judgement to determine whether the threshold crossing requires further investigation

o Review network statistics to identify the root cause

• Review approved temporary solution options and use engineering judgement to determine whether the options will address the performance or capacity issue being experienced

• Propose solution and seek approval from the Authority

• Implement changes via the network change request process

• Monitor network performance and capacity pre and post change implementation, and confirm that the changes and network behaviour are in line with expectations

• Recommend additional changes if warranted

• After the emergency incident or special event, revert changes to the standard configuration.

3.2.1 Agency operational modification

During periods of extended network performance degradation and/or congestion, the Authority will

review the causes of congestion to establish the agencies, and specifically the talkgroups, that contribute

most to the traffic profile. Agencies will be requested to review their operational talkgroup

plan and, where possible, to optimise (minimise) their incident or special event talkgroup utilisation. Where

possible, agencies should use the Emergency Service Organisation (ESO) talkgroups and limit the number

of talkgroups in use at a single incident through operational procedure or other means (multi groups or

agency groups).

3.2.2 Talkgroup re-prioritisation and reconfiguration

Another option is to review the agency talkgroup priorities currently configured for day to day use. The

deemed lead agency for a particular incident or special event can be given the highest priority for the

duration of that event (either top of queue or ruthless pre-emption). Changes to agency talkgroups’

priorities will be in accordance with each agency’s Operational Talkgroup Plan.
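The “top of queue” effect of re-prioritisation can be illustrated with a simple priority queue. This is a conceptual sketch only and does not represent how the network controller queues calls: the talkgroup names are hypothetical, the convention that a lower number is served first is an assumption, and ruthless pre-emption is not modelled.

```python
import heapq
from itertools import count


def serve_order(queued_talkgroups, priorities):
    """Order in which queued calls are served: by priority, then by arrival."""
    arrival = count()  # tie-breaker so equal priorities are served first-come
    heap = [(priorities[tg], next(arrival), tg) for tg in queued_talkgroups]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def with_lead_agency(priorities, lead_talkgroups, top_priority=1):
    """Return a copy with lead agency talkgroups raised to the top priority.

    The original mapping is left unchanged so it can be restored as the
    standard configuration after the incident or special event.
    """
    updated = dict(priorities)
    for tg in lead_talkgroups:
        updated[tg] = top_priority
    return updated
```

Keeping the day to day priorities untouched and applying the event priorities as a copy mirrors the requirement to revert to the standard configuration after the event.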

3.2.3 Site preferencing

Site preferencing can be used to “steer” traffic between adjacent sites, but it is only a contributing factor

to the overall site selection process. Four preference settings can be configured in the subscriber radios:

• Always preferred

• Preferred

• No preference

• Least preferred.

The site quality metric is a combination of other factors including RSSI. As such, using this method for

congestion relief is expected to yield minimal benefit and would require reprogramming of radios during

an emergency incident or prior to a special event. This method is only recommended for use where

significant overlapping coverage is present, traffic distribution between adjacent sites is poor,

and the overall capacity across the sites is sufficient to support the current traffic profile.

3.2.4 Control channel power re-configuration

An alternative to the methods in 3.2.1 - 3.2.3 for reducing traffic on an individual site is to reduce the control

channel transmit power. This technique should only be used where incident or special event specific

traffic will still be carried by the preferred serving site. This change will reduce the coverage footprint of

the site for call setups and will likely lead to a reduction in the overall GRN coverage footprint. The impact

of this should be carefully reviewed and all agencies should be notified prior to implementation.

3.2.5 Installing temporary capacity

Temporary capacity (additional base stations) is available for installation at existing GRN sites (inclusive

of the necessary multi-coupling) to provide temporary capacity relief at sites that are affected by a

prolonged emergency incident (e.g. a large bush fire). The suitability of this solution is to be assessed by the

TA on a site by site basis, including the co-ordination and assignment of the necessary frequencies to

support operation of the new base stations. Approval to use this solution to provide temporary

capacity relief is to be sought from the TA prior to installation. The Authority is also responsible for the

co-ordination of spectrum through the SMO and installation (via the Network Manager).


4 Glossary and Acronyms

Term: Definition

Spatial Aggregation: Defines the levels at which data can be aggregated relative to the topology of the network. Valid spatial aggregation levels for the GRN are:

• Site

• Zone

• Core

• Network.

All reports should meet the above spatial aggregation requirements.

Temporal Aggregation: Defines the period over which the metric, KPI, counter etc. is recorded, stored or reported. Valid temporal aggregation values for the GRN are:

• 15min (raw)

• Hourly

• Daily

• Monthly.

All reported data should meet the above aggregation definitions.

GRN: Government Radio Network

NSWTA: New South Wales Telecommunications Authority

OSS: Operational Support Systems

KPIs: Key Performance Indicators

Talkgroup: An assigned group on a trunked radio system

END OF DOCUMENT