iPCA Quality Awareness Technology White Paper


  • iPCA Quality Awareness Technology White Paper

    1 IP/Ethernet Networks Cannot Measure Service Quality

    1.1 iPCA Overview

    1.2 iPCA Benefits

    2 iPCA Applications

    2.1 Device-Level Measurement

    2.2 Network-Level Measurement

    3 Traditional Service Quality Measurements

    3.1 Factors Affecting Service Quality

    3.2 Quality Guarantee and Fault Location Difficulties on IP Networks

    3.3 Problems in Traditional IP Network Quality Measurement Technologies

    4 Summary

    4.1 iPCA Identifies and Locates Problems

  • 1 IP/Ethernet Networks Cannot Measure Service Quality

    1.1 iPCA Overview

    IP and Ethernet have been widely used as basic network technologies. Both IP and

    Ethernet networks are connectionless networks. Connectionless networks feature

    good scalability and service transparency but, in contrast to connection-oriented

    networks, can only identify ingress and egress of data flows and do not provide

    any connection information. Therefore, service quality monitoring and guarantees

    are difficult.

    Packet Conservation Algorithm for Internet (iPCA) provides quality measurement

    at the device and network levels. Device-level measurement can be implemented

    when all devices in an area are Huawei agile devices. In this case, Quality

    Awareness and accurate fault location can be provided for the entire network

    simply by enabling device-level measurement on each agile device. Agile device-

    level measurement encompasses cards, switch fabric units, and links, and can

    accurately detect any problems that affect user experience, no matter the cause.

    Packet conservation means that the number of packets leaving a system (network,

    link, device, or card) equals the number of packets arriving at the system. If data

    flows passing through a system comply with packet conservation, packet loss does

    not occur and packet transmission quality is ensured. iPCA monitors packet loss,

    which is a major factor affecting user experience on IP networks.

    Figure 1-1 iPCA diagram (labels: monitored system, which can be a network, device, card, or link; arriving packets; internally generated packets; absorbed packets)

  • The iPCA quality measurement mechanism is simple. A monitored system is normal

    if the following condition is met:

    Number of packets arriving at the system + Number of internally generated packets

    = Number of packets departing the system + Number of packets absorbed by the system

    If this condition is not met, some packets have been dropped. In practice, however,

    quality measurement is complex because many factors must be considered, such as

    packet counter synchronization and the specific situation of the monitored system.

    The detailed measurement process is outside the scope of this document.
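The packet conservation condition can be illustrated with a minimal Python sketch. The `SystemCounters` type, its field names, and the sample values are hypothetical and chosen only for illustration; the sketch assumes the four counters were sampled over the same, already synchronized interval, which the paper notes is the difficult part in practice.

```python
from dataclasses import dataclass


@dataclass
class SystemCounters:
    """Packet counters for one monitored system over one measurement interval."""
    arrived: int    # packets arriving at the system
    generated: int  # packets generated inside the system
    departed: int   # packets departing the system
    absorbed: int   # packets absorbed (terminated) by the system


def dropped_packets(c: SystemCounters) -> int:
    """Return the number of unaccounted packets; 0 means conservation holds."""
    return (c.arrived + c.generated) - (c.departed + c.absorbed)


healthy = SystemCounters(arrived=1000, generated=20, departed=990, absorbed=30)
lossy = SystemCounters(arrived=1000, generated=20, departed=975, absorbed=30)

print(dropped_packets(healthy))  # 0
print(dropped_packets(lossy))    # 15
```

A nonzero result only says that packets were lost somewhere inside the monitored system; it does not say why.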

    1.2 iPCA Benefits

    iPCA helps you monitor network quality in real time to find and solve network

    problems in a timely manner. This technology ensures good user experience and

    quick identification of failure points. To implement device-level measurement, you

    only need to enable the device-level measurement capability on each device. To

    implement network-level measurement, you need to define a monitored domain

    and enable an iPCA instance. After you configure the iPCA domain and alarm

    thresholds, iPCA can detect problems in the domain. Then you can handle the

    problems before these problems degrade the user experience.

    When Huawei agile devices establish a network with Huawei non-agile devices

    or third-party devices, you can enable network-level measurement in addition to

    device-level measurement on the agile devices. In this way, the agile devices can

    monitor the quality of non-agile devices and the network while distinguishing

    the problems in agile and non-agile areas. To implement this function, familiarize

    yourself with iPCA implementation and properly define the monitored flows and

    packet conservation domain.

    If iPCA does not cover the entire network or the alarm thresholds are not properly

    set, network problems are often discovered through user complaints rather than

    real-time alarms. In this case, historical device-level or network-level measurement data

    provided by iPCA can be used to locate faults. For example, if you do not know where

    a fault has occurred, you can check iPCA records to find the possible failure points.

  • 2 iPCA Applications

    2.1 Device-Level Measurement

    iPCA device-level measurement treats line cards, links, and switch fabric units as

    Multiple-Input-Multiple-Output (MIMO) systems, and monitors them to measure

    transmission quality in each device as well as links between devices. It not only

    quickly measures network quality, but also identifies the specific line card, switch

    fabric unit, or link to replace, helping you quickly identify and handle network

    problems. After iPCA device-level measurement is deployed on a network, network

    administrators only need to check alarms against the predefined quality parameter

    thresholds.

    As shown in the following figure, iPCA monitors loss of incoming and outgoing

    packets in area 1 (ENP cards), area 2 (switch fabric units), and area 3 (links), to

    measure network quality and accurately identify failure points.

    In area 1, iPCA treats each ENP card as an independent MIMO area, and measures

    the packet loss ratio on each ENP card. Normally, the number of packets leaving an

    ENP card is equal to the number of packets arriving at the ENP card. If the number

    of outgoing packets is smaller than the number of incoming packets, packets have

    been dropped.

    The switch fabric units in area 2 are not programmable, but can be monitored

    using ENP cards. Each switch fabric unit is connected to multiple ENP cards.

    Figure 2-2 Typical networking for device-level measurement (labels: areas 1, 2, and 3; CPU; switch fabric units; ENP cards; non-ENP card)

  • Packet loss on the switch fabric units and links between them can be measured by

    counting incoming and outgoing packets on the ENP cards.

    If all devices in the network or an area are Huawei agile devices, you only need to

    enable device-level measurement on the agile devices. Then the agile devices can

    provide Quality Awareness and accurate fault location.
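The way device-level measurement narrows a problem down to a card, switch fabric unit, or link can be sketched as follows. This is an illustration under stated assumptions, not the device implementation: the function name, the area names, and the counter values are hypothetical, and the switch fabric counters are assumed to be derived from the ENP cards attached to the fabric, as the text describes.

```python
def areas_with_loss(counters: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each area that lost packets to its loss ratio.

    `counters` maps an area name to (packets_in, packets_out) for one interval.
    """
    result = {}
    for area, (packets_in, packets_out) in counters.items():
        if packets_in > 0 and packets_out < packets_in:
            result[area] = (packets_in - packets_out) / packets_in
    return result


interval = {
    "ENP card 1": (50_000, 50_000),
    "ENP card 2": (80_000, 79_200),
    # Assumed to be derived from ENP card counters:
    # in = packets the cards sent to the fabric, out = packets they received from it.
    "switch fabric": (120_000, 120_000),
    "link to peer": (40_000, 40_000),
}

print(areas_with_loss(interval))  # {'ENP card 2': 0.01}
```

Because each area is checked independently, the output names the component to inspect rather than only reporting that the device lost packets.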

    2.2 Network-Level Measurement

    iPCA network-level measurement applies to networks established by agile and non-

    agile devices. To implement network-level measurement, you should be familiar

    with iPCA implementation and be able to properly define the monitored domain.

    The following conditions must be met to implement network-level measurement:

    - The monitored object is a network established by multiple network devices. The
      network can be a single-input-single-output system or a
      multiple-input-multiple-output system.

    - The monitored flow must traverse the monitored network, and cannot be
      generated or terminated in the network.

    - The edge devices (measurement systems) of the network must support iPCA.

    Typical networking for network-level measurement works in the following way:

    1. The access devices for video and VoIP services are Huawei agile devices.

    Network-level measurement is deployed to monitor the quality and faults of

    Huawei non-agile devices or third-party devices. In this way, problems in the

    agile and non-agile domains can be distinguished and end-to-end service quality

    can be monitored.

    [Figure: network diagram with labels Headquarters, Branch (two), WAN, and eSight; iPCA device-level measurement is enabled]

    Non-agile devices and agile devices connected to them form a MIMO domain.

    iPCA measurement is enabled on interfaces of Huawei agile devices. Based on the

    packet conservation principle, input traffic volume is the total number of packets

    sent from Huawei agile devices to non-agile devices in the domain, and output

    traffic volume is the total number of packets that Huawei agile devices receive

    from the non-agile devices. If input traffic volume is larger than output traffic

    volume, packets have been dropped in the domain.

    2. Measurement of the WAN (outside the campus network) is implemented as

    follows:

    - Huawei agile devices at the edge of the WAN form a measurement domain.

    - The agile devices monitor all the interfaces through which packets are sent to
      or from the WAN.

    - The agile devices count the total number of incoming and outgoing packets
      to determine the domain's packet loss ratio.
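The domain-level calculation described above can be sketched in Python. The function name, the site names, and the counter values are hypothetical; the sketch assumes every monitored flow enters and leaves the domain through a monitored edge interface and that all counters cover the same interval.

```python
def domain_loss_ratio(sent_into_domain: dict[str, int],
                      received_from_domain: dict[str, int]) -> float:
    """Loss ratio of a measurement domain from its edge interface counters."""
    total_in = sum(sent_into_domain.values())
    total_out = sum(received_from_domain.values())
    if total_in == 0:
        return 0.0
    return max(total_in - total_out, 0) / total_in


# Packets each edge device sent into the WAN during the interval.
sent = {"Headquarters": 10_000, "Branch A": 4_000, "Branch B": 6_000}
# Packets each edge device received from the WAN during the interval.
received = {"Headquarters": 9_950, "Branch A": 5_900, "Branch B": 3_950}

ratio = domain_loss_ratio(sent, received)
print(f"{ratio:.2%}")  # 1.00%
```

Only the totals are compared, because in a multiple-input-multiple-output domain a packet sent by one edge device is normally received by a different one.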

    [Figure: network diagram with labels Headquarters, Branch (two), WAN, eSight, and device with iPCA enabled]

  • 3 Traditional Service Quality Measurements

    3.1 Factors Affecting Service Quality

    Packet loss: Packet loss is the most important factor affecting service quality. For

    Transmission Control Protocol (TCP) flows on an IP network, packet loss causes

    retransmission and a rapid reduction of the TCP sending window, which greatly reduces transmission

    speed, increases response time, and lowers bandwidth utilization. For User

    Datagram Protocol (UDP) flows (mainly video and voice flows), packet loss severely

    affects service quality. Most quality issues in data, voice, and video services, such

    as slow access speeds, delayed response, video pixelation, and garbled audio, are

    caused by packet loss. The following table summarizes the causes of packet loss.

    Table 3-1 Possible causes of packet loss

    Type of packet loss: Controlled packet loss on network devices

    Cause: Network devices drop some packets according to certain rules. When the network is properly planned and network devices are correctly configured, controlled packet loss is mainly triggered by network attacks.

    - Packet loss in port queues: link bandwidth is insufficient
    - ACL-triggered packet loss: the packets do not meet ACL rules, the ACL configuration is incorrect, or the network is attacked
    - CAR-triggered packet loss: the traffic rate exceeds the CAR limit, the CAR configuration is incorrect, or the network is attacked
    - Loss of packets with TTL of 0: a routing loop may exist on the network
    - Packet loss caused by route absence: route calculation or routing configuration is incorrect, or the network is attacked
    - Error packet loss: the configuration is incorrect or the network is attacked

    Type of packet loss: Unexpected packet loss

    Cause: Packets are dropped due to link or hardware failures. Such packet loss is uncontrollable, and packets of any priority may be dropped. Unexpected packet loss is the major factor that affects service quality.

    - Small buffer: switches with small buffer sizes are used at incorrect positions and cannot handle heavy traffic on the network
    - Link failure: optical fibers are broken, optical transceiver parameters are set incorrectly, or network cables are not properly connected
    - Hardware failure: hardware components are aging or affected by harsh environments

  • Bandwidth: Network bandwidth is another fundamental element that affects

    services. A service cannot be provided if the available bandwidth is lower than

    the minimum bandwidth required by the service. Traditional Quality of Service

    (QoS) designs focus on allocation of link bandwidth. TCP flows, which dominate

    IP network traffic, adapt to network bandwidth. That is, TCP reduces its traffic rate

    when bandwidth is insufficient, so insufficient bandwidth will not cause obvious

    packet loss in TCP flows if network devices have a large buffer size. However,

    insufficient bandwidth will cause severe packet loss in port queues for UDP flows

    that carry video and voice services. When this occurs, the network cannot deliver

    normal video and voice services. Note that sufficient bandwidth

    does not necessarily mean high service quality, because quality is also affected by

    other factors like network failures, misuse, or misconfigurations.

    Table 3-2 Impact of insufficient bandwidth on services

    Service flows: TCP flows (data services, streaming media, desktop cloud)

    Impact of insufficient bandwidth: Transmission speed is low. Packets are forwarded normally on devices with large buffer sizes, while devices with small buffer sizes drop a large number of packets.

    Service flows: UDP flows (video, IPTV, and voice services)

    Impact of insufficient bandwidth: A large number of service packets are dropped, and the network cannot deliver these services.

    Latency/jitter: Packet loss and retransmission are the major factors that cause

    latency and jitter in service transmission. A network's own latency and jitter are

    significant only to voice, video, or desktop cloud services that have high requirements

    for real-time transmission.

    End-to-end network latency = Signal transmission latency + Device

    forwarding latency + Port buffering latency

    Signal transmission and device forwarding latency are almost constant and only

    need to be measured at the early stage of network construction. If signal

    transmission and device forwarding latency cannot meet service requirements, you

    need to adjust network planning, including the latency on primary and backup paths.

    Real-time monitoring of signal transmission and device forwarding latency is

    therefore not very helpful for improving service quality.

    The only factor that can cause changes in network latency is port buffering latency,

    which is mostly caused by network overloads. Variable port buffering latency

    depends on link loads. The only way to shorten variable latency is to avoid link

    overloads by properly planning traffic transmission.
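The latency breakdown above can be made concrete with a small Python sketch. The function names and sample values are hypothetical. The buffering model is a simplifying assumption that does not come from the paper: a packet waits for the bytes already queued ahead of it to drain at the link rate.

```python
def port_buffering_latency_ms(queued_bytes: int, link_rate_bps: float) -> float:
    """Time in milliseconds for the bytes queued ahead of a packet to drain."""
    return queued_bytes * 8 * 1000 / link_rate_bps


def end_to_end_latency_ms(signal_ms: float, forwarding_ms: float,
                          buffering_ms: float) -> float:
    """Sum of the three latency components named in the text."""
    return signal_ms + forwarding_ms + buffering_ms


# 125,000 bytes queued on a 1 Gbit/s port.
buffering = port_buffering_latency_ms(queued_bytes=125_000, link_rate_bps=1e9)
print(buffering)  # 1.0

# The first two components are treated as constants measured once at deployment.
print(end_to_end_latency_ms(signal_ms=5.0, forwarding_ms=0.5, buffering_ms=buffering))  # 6.5
```

With an empty queue the buffering term is zero, which is why the variable part of latency tracks link load.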

    Timely identification and resolution of user experience issues: Service quality

    guarantees are important measures for identifying and resolving problems that

    affect user experience. A "problem" may include but is not limited to a failure. A

    failure is a problem that interrupts services. Actually, there are many problems that

    will not cause service interruption but can degrade service quality, for example,

    packet loss on links, hardware-triggered packet loss, incorrect configuration,

    incorrect network planning, and network attacks.

    3.2 Quality Guarantee and Fault Location Difficulties on IP Networks

    On traditional connection-oriented networks such as Asynchronous Transfer Mode

    (ATM) and Frame Relay (FR) networks, each Virtual Channel (VC) has an identifier.

    Quality monitoring and guarantees and fault location are performed based on the

    Operation, Administration and Maintenance (OAM) settings of each VC.

    Currently, most IP networks use coarse-grained bandwidth management policies

    and do not have quality monitoring and guarantee mechanisms. As a result, the

    networks provide only connectivity and cannot ensure good user experience.

    However, the networks and network administrators are unaware of these issues

    because there is no system to monitor service quality on the entire network.

    Administrators try to locate network problems only after receiving complaints from

    users. Even then, it often takes a long time to locate and solve a problem due to

    lack of real-time monitoring mechanisms and effective problem location methods.

    This problem location process is inefficient and severely affects user experience.

    3.3 Problems in Traditional IP Network Quality Measurement Technologies

    Some technologies, such as ping, BFD/NQA, and Y.1731, have been developed to monitor

    network quality and check connectivity on IP networks. However, these

    technologies require point-to-point connections, and many connections need to be

    created to monitor transmission quality of all services. In addition, the application

    of these technologies is restricted by their disadvantages such as inconsistent

    service paths and small scope of fault detection. For these reasons, they are mainly

    used for quality monitoring or fault location between a few nodes or on private

    links, and cannot provide effective quality measurement and fault detection.

    Traditional network quality measurement technologies have the following problems:

    - N² connection issue of point-to-point monitoring: N nodes on a network must
      set up N × (N-1)/2 bidirectional connections or N × (N-1) unidirectional connections.
      When there are many nodes on a network, network expansion is difficult.

    - Inconsistent service paths: If load balancing functions such as Eth-Trunk+VSS
      and ECMP are configured, NQA/Y.1731 packets are not transmitted over the
      same paths as the service packets. Therefore, NQA/Y.1731 cannot accurately
      monitor service flows.

    - Inability to detect some failure points: For example, NQA and BFD probe packets
      are transmitted along the path CPU -> control channel -> uplink forwarding
      channel -> uplink interface. Service packets, however, are transmitted along the
      path access interface -> access forwarding channel -> switch fabric -> uplink
      forwarding channel -> uplink interface. Therefore, NQA and BFD cannot detect
      failures of access links, access interface cards, or switch fabric units. The measured
      latency and jitter mostly reflect CPU processing latency rather than real
      network latency and jitter.

    - Inability to simulate real service traffic through out-of-band measurement: NQA
      and BFD use out-of-band traffic to simulate service traffic, but out-of-band traffic
      is only a sample of real service traffic and is sent at a much lower rate.
      Therefore, NQA and BFD can only detect network disconnections or severe
      packet loss. Increasing out-of-band traffic only worsens network congestion.
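The N² connection issue listed above can be quantified with a short Python sketch. The function names are hypothetical; the formulas are the ones given in the list.

```python
def bidirectional_connections(n: int) -> int:
    """Number of node pairs among n nodes: n x (n-1) / 2."""
    return n * (n - 1) // 2


def unidirectional_connections(n: int) -> int:
    """Number of ordered node pairs among n nodes: n x (n-1)."""
    return n * (n - 1)


for n in (10, 100, 1000):
    print(n, bidirectional_connections(n), unidirectional_connections(n))
# 10 45 90
# 100 4950 9900
# 1000 499500 999000
```

A tenfold increase in nodes requires roughly a hundredfold increase in monitoring sessions, which is why full-mesh probing does not scale.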

    4 Summary

    4.1 iPCA Identifies and Locates Problems

    iPCA technology provides quality measurement and problem detection capabilities in

    connectionless networks. If packet loss is detected in an iPCA domain, network devices

    can trigger alarms according to the preconfigured alarm threshold (packet loss ratio and

    duration). In this way, the fault location scope is narrowed from the entire network to

    an iPCA domain, greatly improving the efficiency of fault location and rectification. To

    analyze the specific cause of packet loss, network administrators need to check the

    packet counters of iPCA and other device information.

    iPCA significantly improves fault location efficiency while ensuring network quality

    and user experience. Network administrators still need network knowledge

    and fault location experience to find the specific causes of network problems.
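An alarm threshold that combines a packet loss ratio with a duration, as described in section 4.1, might be evaluated as in the following Python sketch. How iPCA devices actually evaluate the threshold is not specified in this paper. The function name, the parameters, and the rule "above the ratio for a number of consecutive measurement intervals" are assumptions made for illustration.

```python
def alarm_raised(loss_ratios: list[float], ratio_threshold: float,
                 min_intervals: int) -> bool:
    """True if the loss ratio exceeds the threshold for
    `min_intervals` consecutive measurement intervals (assumed rule)."""
    consecutive = 0
    for ratio in loss_ratios:
        consecutive = consecutive + 1 if ratio > ratio_threshold else 0
        if consecutive >= min_intervals:
            return True
    return False


sustained = [0.0, 0.02, 0.0, 0.02, 0.03, 0.05]
transient = [0.0, 0.02, 0.03, 0.0, 0.02]

print(alarm_raised(sustained, ratio_threshold=0.01, min_intervals=3))  # True
print(alarm_raised(transient, ratio_threshold=0.01, min_intervals=3))  # False
```

Requiring a duration filters out short bursts of loss, at the cost of reporting sustained problems slightly later.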

  • Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved.

    No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

    General Disclaimer

    The information in this document may contain predictive statements including,

    without limitation, statements regarding the future financial and operating results,

    future product portfolio, new technology, etc. There are a number of factors

    that could cause actual results and developments to differ materially from those

    expressed or implied in the predictive statements. Therefore, such information

    is provided for reference purpose only and constitutes neither an offer nor an

    acceptance. Huawei may change the information at any time without notice.

    Trademark Notice

    HUAWEI is a trademark or registered trademark of Huawei Technologies Co., Ltd.

    Other trademarks, product, service and company names mentioned are the property of their respective owners.

    HUAWEI TECHNOLOGIES CO., LTD.

    Huawei Industrial Base

    Bantian Longgang

    Shenzhen 518129, P.R. China

    Tel: +86-755-28780808

    Version No.: M3-032102-20140218-C-1.0

    www.huawei.com