Voice Quality Impairments Detection. Recommended Call Quality Metrics.

  • 8/12/2019 Voice Quality Impairments Detection. Recommended Call Quality Metrics.

    Voice Quality Impairments Detection

    Introduction

    The purpose of this document is to define a vocabulary for discussing the detection of symptoms of voice quality problems.

    This document is intended to be a living resource: the detection methods for the symptoms listed here are expected to be revised as new problems arise and additional information becomes available.

    Signal-to-noise ratio

    Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. It is defined as the ratio of signal power to noise power. A ratio higher than 1:1 indicates more signal than noise. While SNR is commonly quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice core or biochemical signaling between cells).

    The signal-to-noise ratio, the bandwidth, and the channel capacity of a communication channel are connected by the Shannon-Hartley theorem.

    Signal-to-noise ratio is sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. For example, in online discussion forums and other online communities, off-topic posts and spam are regarded as "noise" that interferes with the "signal" of appropriate discussion.

    Telecommunication systems strive to increase the ratio of signal level to noise level in order to transmit data effectively. In practice, if the transmitted signal falls below the level of the noise (often designated as the noise floor) in the system, data can no longer be decoded at the receiver. Noise in telecommunication systems is a product of sources both internal and external to the system.

    We recommend an SNR of at least 25 dB for speech signals.
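
    The power-ratio definition above translates to decibels as 10 * log10(P_signal / P_noise). The following is a minimal sketch of that conversion and of the 25 dB recommendation; the function names are illustrative and are not part of the Sevana library.

```python
import math

RECOMMENDED_MIN_SNR_DB = 25.0  # recommendation quoted in the text


def mean_power(samples):
    """Mean power of a sequence of samples (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(Ps / Pn)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def snr_is_acceptable(signal_power, noise_power, min_db=RECOMMENDED_MIN_SNR_DB):
    """True if the SNR meets the recommended minimum."""
    return snr_db(signal_power, noise_power) >= min_db
```

    For example, a signal with 1000 times the noise power has an SNR of 30 dB and passes the check, while a factor of 100 (20 dB) does not.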

    Absolute silence

    This impairment relates to silence between stretches of speech, when one cannot tell whether the other person is still there because there is no sound on the line.

    A common cause of this problem is Voice Activity Detection (VAD) without comfort noise. For this symptom to be experienced, the background noise usually has to be loud enough for the inserted silence to be noticeable, but soft enough for VAD to engage.

    Copyright Sevana, 2013

    Sevana Oy, Agricolankatu 11, 00530 Helsinki, Finland. Phone: +358 9 2316 4165
    Sevana OÜ, Rohtlaane 12, 76911 Huuru küla, Estonia (Harjumaa). Phone: +372 53485178

    Silence appearing during a phone call is considered an artifact associated with connection loss. One can therefore set an energy threshold per frame: when the frame energy falls below the threshold, one starts measuring the duration of the silent fragment of the signal. If a loud frame is received, the silence counter is reset to zero. If the counter value exceeds, for example, 1 second, we can report an absolute silence impairment and quality loss due to silent fragments.
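
    The counter logic described above can be sketched as follows. This is an illustration only: the per-frame energies, the frame duration and the threshold value are assumed inputs, and the function name is hypothetical rather than taken from the library.

```python
def detect_absolute_silence(frame_energies, frame_duration_s,
                            energy_threshold, max_silence_s=1.0):
    """Return the start frame index of every silent run longer than max_silence_s.

    A frame counts as silent when its energy is below energy_threshold.
    A loud frame resets the silence counter to zero.
    """
    alerts = []
    silent_frames = 0
    for index, energy in enumerate(frame_energies):
        if energy < energy_threshold:
            silent_frames += 1
            duration = silent_frames * frame_duration_s
            previous = (silent_frames - 1) * frame_duration_s
            # Report each run once, at the frame where it crosses the limit.
            if duration > max_silence_s >= previous:
                alerts.append(index - silent_frames + 1)
        else:
            silent_frames = 0
    return alerts
```

    With 20 ms frames, a run of 60 silent frames (1.2 s) triggers one alert, whereas a run of 20 silent frames (0.4 s) does not.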

    Loudness

    This impairment relates to calls that are too loud or too quiet. If the signal energy changes significantly, one may consider that the usual call quality has been degraded. The most recent version of the library works together with VAD to detect call fragments that are too loud: it calculates the average energy over active signal fragments (frames), and when the average exceeds a predefined threshold, the signal is flagged as too loud.
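
    A minimal sketch of this averaging, assuming samples normalized to the range [-1, 1], one VAD flag per frame, and a caller-supplied threshold in dB relative to full scale. The threshold value and the function names are assumptions made for illustration; the library's actual interface is not shown in this document.

```python
import math


def average_active_level_db(frames, vad_flags):
    """Average energy, in dB relative to full scale, over VAD-active frames.

    Returns None when no frame is active.
    """
    energies = [
        sum(s * s for s in frame) / len(frame)
        for frame, active in zip(frames, vad_flags)
        if active and frame
    ]
    if not energies:
        return None
    mean_energy = sum(energies) / len(energies)
    if mean_energy == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_energy)


def is_too_loud(frames, vad_flags, threshold_db):
    """True if the average level of active frames exceeds threshold_db."""
    level = average_active_level_db(frames, vad_flags)
    return level is not None and level > threshold_db
```

    Inactive frames are skipped so that pauses do not pull the average down and hide a loud talker.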

    Amplitude clipping

    Amplitude clipping, also called buzziness, occurs when the signal amplitude is too high at some point along the analog voice path, so that the signal is clipped when it is converted to digital form. Users report that speech may seem excessively loud and potentially "buzzy" or "fuzzy". A sample of amplitude-clipped audio is available at this link:

    http://www.voiptroubleshooter.com/sound_files/amplitude_clipping.wav

    If more than 2% of the samples are clipped, audio quality drops considerably:

    1) To obtain an integral result over a frame, check dClpLevel and dClpLevelWide. Single clipped frames may be considered non-critical for overall quality, because they may be due to energy normalization only. However, if clipped sequences of samples occur at the same time, we are facing a real clipping impairment. An alert must be raised if dClpLevel > 2% and dClpLevelWide > 0. One can also set a threshold for dClpLevelWide, for example 5% of dClpLevel.

    2) For real-time monitoring, check dFrameClpLevel and dFrameClpLevelWide, as well as dFlyClpLevel and dFlyClpLevelWide. If a single frame is clipped and there are no clipped samples to its left or right, the impairment may be identified as a click. Temporary quality degradation is characterized by clipping over longer parts of the signal, which may be described as a significant increase in input signal loudness.
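
    The alert rule in item 1 can be sketched as below. The way the two percentages are computed here is an assumption: the first value is taken as the share of clipped samples and the second as the share of samples that sit in runs of consecutive clipped samples. The library's exact definitions of dClpLevel and dClpLevelWide are not given in this document, and the function names are illustrative.

```python
def clipping_levels(samples, full_scale=1.0, min_run=2):
    """Return (percent of clipped samples, percent of samples in clipped runs).

    A sample is treated as clipped when its magnitude reaches full_scale.
    A run is min_run or more consecutive clipped samples (an assumption).
    """
    n = len(samples)
    if n == 0:
        return 0.0, 0.0
    clipped = [abs(s) >= full_scale for s in samples]
    in_runs = 0
    run = 0
    for flag in clipped + [False]:  # the sentinel flushes the last run
        if flag:
            run += 1
        else:
            if run >= min_run:
                in_runs += run
            run = 0
    return 100.0 * sum(clipped) / n, 100.0 * in_runs / n


def clipping_alert(clp_level, clp_level_wide, level_threshold=2.0):
    """Rule from the text: alert if the level exceeds 2% and the wide level is above 0."""
    return clp_level > level_threshold and clp_level_wide > 0
```

    Isolated clipped samples raise the first value only, so they do not trigger the alert on their own.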

    Clicking

    The clicking impairment relates to a short-lived increase in energy, a click. If clicks occur more often than once every 3-5 seconds, audio quality is degraded and a clicking alert should be raised.
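
    Assuming click detection already yields a list of click timestamps in seconds, the rate rule can be sketched like this. The function name and the choice of 3 seconds as the default (the lower end of the 3-5 second range) are illustrative assumptions.

```python
def clicking_alert(click_times_s, min_interval_s=3.0):
    """True if any two consecutive clicks are closer than min_interval_s."""
    times = sorted(click_times_s)
    return any(later - earlier < min_interval_s
               for earlier, later in zip(times, times[1:]))
```

    Raising min_interval_s towards 5 seconds makes the check stricter.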

    Stuck

    Stuck means the appearance of a relatively constant signal amplitude level. A stuck signal is perceived as absolute silence, which is not typical for speech. Depending on the energy change, it may also be perceived as a click. We recommend the same threshold for Stuck as for Clicking: no more than one stuck impairment per 3-5 seconds. In addition, a total stuck duration of more than 10% of the whole audio also indicates significant quality degradation.
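
    The 10% duration rule can be illustrated with the sketch below. It assumes that "stuck" means a run of consecutive samples whose values stay within a small tolerance of each other; the minimum run length, the tolerance and the function names are hypothetical parameters, not values from the library.

```python
def stuck_fraction(samples, min_run=80, tolerance=0.0):
    """Fraction of samples that lie in near-constant runs of at least min_run."""
    n = len(samples)
    if n == 0:
        return 0.0
    stuck = 0
    run = 1
    for i in range(1, n + 1):
        if i < n and abs(samples[i] - samples[i - 1]) <= tolerance:
            run += 1
        else:
            # The run ended (or the signal ended): count it if long enough.
            if run >= min_run:
                stuck += run
            run = 1
    return stuck / n


def stuck_duration_alert(samples, max_fraction=0.10, min_run=80, tolerance=0.0):
    """True if stuck runs cover more than max_fraction of the audio."""
    return stuck_fraction(samples, min_run, tolerance) > max_fraction
```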

    VAD clipping

    This impairment indicates that the Voice Activity Detector (VAD) is working incorrectly. The detector locates the edges between active and inactive fragments of the signal and checks whether VAD triggered too late (at the beginning of speech) or too early (at the end of speech).

    Let X be the number of VAD state changes (i.e. transitions between voice and no voice). Then the following formula

    100 * dNumClpFrames / X

    yields a metric that should not exceed 10% for acceptable speech quality.
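
    As a worked illustration of this formula, the sketch below counts VAD state changes from a sequence of per-frame VAD flags and computes the percentage. The flag sequence is an assumed input and the function names are hypothetical; only dNumClpFrames and the formula itself come from the text.

```python
def count_vad_changes(vad_flags):
    """Number of voice/no-voice transitions in a sequence of per-frame flags."""
    return sum(1 for a, b in zip(vad_flags, vad_flags[1:]) if a != b)


def vad_clipping_percent(num_clipped_frames, vad_changes):
    """100 * dNumClpFrames / X, or 0.0 when there are no VAD changes."""
    if vad_changes == 0:
        return 0.0
    return 100.0 * num_clipped_frames / vad_changes


def vad_clipping_acceptable(num_clipped_frames, vad_changes, max_percent=10.0):
    """True if the metric does not exceed the 10% limit from the text."""
    return vad_clipping_percent(num_clipped_frames, vad_changes) <= max_percent
```

    For instance, 2 clipped frames over 40 VAD changes give 5%, which is within the limit, while 1 clipped frame over 4 changes gives 25%.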

    Echo

    Signal reflection (echo) occurs when a signal is transmitted along a transmission medium, such as a copper cable or an optical fiber. Some of the signal power may be reflected back to its origin rather than being carried all the way along the cable to the far end. This happens because imperfections in the cable cause impedance mismatches and non-linear changes in the cable characteristics. These abrupt changes in characteristics cause some of the transmitted signal to be reflected. The ratio of energy bounced back depends on the impedance mismatch. Mathematically, it is defined using the reflection coefficient.

    In telecommunications, the reflection coefficient is the ratio of the amplitude of the reflected wave to the amplitude of the incident wave. In particular, at a discontinuity in a transmission line, it is the complex ratio of the electric field strength of the reflected wave ($E^-$) to that of the incident wave ($E^+$). This is typically represented with a $\Gamma$ (capital gamma) and can be written as:

    $$\Gamma = \frac{E^-}{E^+}$$

    The reflection coefficient may also be established using other field or circuit quantities.

    The reflection coefficient can be given by the equation below, where $Z_S$ is the impedance toward the source and $Z_L$ is the impedance toward the load:

    $$\Gamma = \frac{Z_L - Z_S}{Z_L + Z_S}$$

    Notice that a negative reflection coefficient means that the reflected wave receives a 180°, or $\pi$, phase shift.

    The absolute magnitude (designated by vertical bars) of the reflection coefficient can be calculated from the standing wave ratio, SWR:

    $$|\Gamma| = \frac{SWR - 1}{SWR + 1}$$

    The reflection coefficient ranges from -1 to +1.
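
    As a numeric illustration, the sketch below evaluates the standard textbook relations for the reflection coefficient, computed from the load and source impedances and from the standing wave ratio. The function names are illustrative, and impedances may be real or complex numbers.

```python
def reflection_coefficient(z_load, z_source):
    """Gamma = (Z_L - Z_S) / (Z_L + Z_S) for real or complex impedances."""
    denominator = z_load + z_source
    if denominator == 0:
        raise ZeroDivisionError("Z_L + Z_S must not be zero")
    return (z_load - z_source) / denominator


def reflection_magnitude_from_swr(swr):
    """|Gamma| = (SWR - 1) / (SWR + 1); SWR is at least 1 by definition."""
    if swr < 1:
        raise ValueError("SWR must be >= 1")
    return (swr - 1) / (swr + 1)
```

    A 75 ohm load on a 50 ohm source gives a coefficient of 0.2, a matched load gives 0 (no reflection), and a 25 ohm load gives a negative value, which corresponds to the 180° phase shift mentioned above.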

    Two echo detection algorithms are implemented: correlation based and echo compensation based. One of them is selected when the library is compiled.

    With the echo compensation based algorithm, one compares echo energy with signal energy: if the echo energy is more than 20% of the signal energy, echo is detected in the speech signal. One can also check VAD and compare the energy values only when VAD is active.
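
    A minimal sketch of the 20% energy rule, assuming the echo energy estimate and the signal energy are already available (for example from an echo compensation stage). The function name and the optional VAD flag are illustrative assumptions.

```python
def echo_detected_by_energy(echo_energy, signal_energy,
                            ratio_threshold=0.20, vad_active=True):
    """True if echo energy exceeds ratio_threshold of the signal energy.

    When vad_active is False the comparison is skipped, following the
    suggestion to compare energies only while VAD reports speech.
    """
    if not vad_active or signal_energy <= 0:
        return False
    return echo_energy / signal_energy > ratio_threshold
```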

    With the correlation based algorithm, one considers similarity. Echo can be reported if the correlation is higher than 0.7, but the signal energy level should be checked at the same time: if it is low, echo is not present (a false positive). VAD can be used instead of energy for this check.
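
    The correlation rule can be sketched as follows. This is an illustration under stated assumptions: the similarity is taken to be the normalized correlation between a reference signal and the received signal shifted by a candidate delay, and the lag range and minimum energy are hypothetical parameters. The document does not specify which signals the library actually correlates.

```python
import math


def normalized_correlation(x, y):
    """Normalized (Pearson-style) correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_y) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def echo_detected_by_correlation(reference, received, max_lag,
                                 min_energy, corr_threshold=0.7):
    """True if some delay up to max_lag gives correlation above the threshold.

    A low-energy received signal is rejected first to avoid false positives.
    """
    energy = sum(s * s for s in received) / len(received)
    if energy < min_energy:
        return False
    for lag in range(1, max_lag + 1):
        n = min(len(reference), len(received) - lag)
        if n < 2:
            break
        if normalized_correlation(reference[:n], received[lag:lag + n]) > corr_threshold:
            return True
    return False
```

    The energy check runs before the correlation search so that near-silent fragments, where correlation is unreliable, never raise an echo alert.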

    Appendix 1: Audio compatibility

    We consider the audio quality level acceptable if the analyzed audio meets the following conditions:

    Average loudness is between -30 dB and 0 dB

    The number of clipped audio samples does not exceed 2%

    SNR is at least 20 dB for regular calls and at least 24 dB when a loudspeaker is used

    Non-speech signals are strictly prohibited

    Speech tempo is at most 250%

    Noise reduction and echo compensation algorithms are allowed only if the result complies with the loudness, clipping and SNR requirements
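
    The conditions above can be collected into a single check, sketched below. The AudioStats container and its field names are hypothetical; it is assumed that the measurements have already been produced by some analysis step.

```python
from dataclasses import dataclass


@dataclass
class AudioStats:
    """Hypothetical container for measurements of one analyzed recording."""
    average_loudness_db: float
    clipped_percent: float
    snr_db: float
    loudspeaker: bool = False
    has_non_speech: bool = False
    tempo_percent: float = 100.0


def compatibility_issues(stats):
    """Return the names of the Appendix 1 conditions that the audio violates."""
    issues = []
    if not -30.0 <= stats.average_loudness_db <= 0.0:
        issues.append("loudness")
    if stats.clipped_percent > 2.0:
        issues.append("clipping")
    min_snr = 24.0 if stats.loudspeaker else 20.0
    if stats.snr_db < min_snr:
        issues.append("snr")
    if stats.has_non_speech:
        issues.append("non-speech")
    if stats.tempo_percent > 250.0:
        issues.append("tempo")
    return issues
```

    An empty list means the recording satisfies all listed conditions.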

    Appendix 2: Call quality metrics table (recommended)

    Metric                                   Units  Max  Min  Critical  Major  Minor  Warning  Excellent
    Mean Opinion Score (audio, Sevana AQuA)  -      5    1