IKAR Lab 3


WWW.SPEECHPRO.COM

Authenticity verification | Identification of persons | Speech recognition* | Sound cleaning

A new generation of the forensic audio analysis software and hardware suite

*Available for Russian, English, Spanish, Kazakh, and Arabic


The new generation of IKAR Lab 3

Reliable methods for assessing the authenticity and examinability of an audio recording

Quick access to the audio tracks of video recordings

Reliable, high-speed identification of persons, groups, and the lines spoken by each

Processing of large volumes of audio recordings

New, fast, and precise methods for sound cleaning

IKAR Lab 3 meets the world-class requirements of leading forensic experts: it provides a highly reliable evidence base and automates the workflow as far as possible, which speeds up and simplifies the work.

All operations are performed in a unified work environment with a user-friendly interface. This enables specialists to focus on decision making and providing their expert assessments promptly.


Main components

SIS: Audio forensic software

Sound Cleaner: Noise reduction and audio enhancement software

Caesar: Audio transcription software

STC-H246: Audio hardware


SIS: Audio forensic software

SIS is the core software of the IKAR Lab 3 forensic audio kit. It includes powerful tools for speech signal research and enhanced speech visualisation and analysis, including speech segmentation, text transcription, automatic and semi-automatic identification tools and many others.

Methods

• Visualisation

• Editing and processing

• Detecting speech and noises

• Text transcription and speech segmentation

• Separation of speakers in a dialogue/polylogue

• Multi-window interface

• Signal comparison

• Signal analysis

• Managing projects and creating reports

• Identification

• Automatic comparison

• Comparison of formants

• Pitch comparison

• Identification Wizard

• Overall conclusion

• Analysis of an audio recording extracted from a video file

• EdiTracker and the diagnostic module

Audio recording research for speaker identification based on speech samples (analysis of the formants and pitch of a speech signal)

SIS | Methods

Visualisation

The algorithms used for the spectral representation of the signal ensure the highest possible quality and clarity of visible speech. The user selects the optimal display parameters on the fly or uses presets for various types of spectral analysis.

• Waveform

• FFT and LPC spectrograms

• Average and instant spectrum

• Cepstrogram

• Autocorrelation

• Pitch extractor

• Formant extractor

• Signal energy

• Histogram and histogram correlation

Editing and processing

SIS provides a wide variety of expert editing and signal processing tools that improve the intelligibility of recorded speech and prepare audio recordings for further analysis.

• Amplitude normalisation

• Linear transformation

• DC offset suppression

• Mixing

• Modulation

• Tempo correction*

• Resampling

• Bit depth conversion

• Stereo separation and merging two mono signals to stereo

• Phase change

• Adaptive inverse filter

• Adaptive tone suppressor

• Adaptive broadband noise filter

*Without pitch distortion

Detecting speech and noises

The speech detector automatically marks speech fragments in the audio signal that are suitable for identification. The module can also be configured to detect noisy areas: dial tones, clipped fragments and clicks.
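One of the noisy-area types the speech detector can mark is clipped fragments. The brochure does not describe how SIS finds them; the sketch below shows one common, simple approach under stated assumptions: samples are floats scaled to [-1, 1], and a fragment counts as clipped when several consecutive samples sit at full scale. The function name and all parameters are hypothetical.

```python
def find_clipped_runs(samples, full_scale=1.0, tolerance=0.999, min_run=3):
    """Return (start, end) index pairs of runs stuck at full scale.

    A run is reported when at least `min_run` consecutive samples have an
    absolute value of at least tolerance * full_scale. `end` is exclusive.
    """
    threshold = tolerance * full_scale
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i  # a candidate run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the signal.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


signal = [0.1, 0.5, 1.0, 1.0, 1.0, 1.0, 0.4, -0.2, -1.0, -0.3]
clipped = find_clipped_runs(signal)  # one run covering samples 2..5
```

The single full-scale sample at index 8 is ignored because isolated peaks are normal in unclipped audio; `min_run` controls that trade-off.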


Text transcription and speech segmentation

NEW

The speech-to-text plugin automatically produces the text content of the speech in an audio recording in Russian, English, Spanish, Kazakh, and Arabic. The transcription is accompanied by word-by-word segmentation indicating the location of each spoken word. This functionality allows the expert to work effectively with large volumes of audio recordings.

In manual mode, selected audio fragments can easily be assigned to particular categories (e.g., different speakers, sounds or noises) and annotated with text comments, and the complete text can be exported to MS Word. If two files have transcribed text, the programme can automatically search for all matching words in the audio recordings being compared.

Separating speakers in a dialogue/polylogue

The module automatically marks lines according to speakers. Its reliability is up to 95% when the signal-to-noise ratio is at least 20 dB and each speaker's speech lasts at least 16 seconds.

NEW

Using built-in algorithms, the module can segment the lines spoken by up to 5 speakers.

Automatic text transcription with the segmentation of lines spoken by speakers
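The reliability quoted for speaker separation depends on two conditions: a signal-to-noise ratio of at least 20 dB and at least 16 seconds of speech per speaker. As a rough illustration of how such a precondition could be checked before running the module, the sketch below estimates the SNR from a speech fragment and a noise-only fragment. This is not the SIS algorithm; the function names and the power-subtraction estimate are assumptions made for the example.

```python
import math

MIN_SNR_DB = 20.0          # threshold quoted in the text
MIN_SPEECH_SECONDS = 16.0  # per-speaker duration quoted in the text


def mean_power(samples):
    """Average power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_with_noise, noise_only):
    """Estimate SNR in dB from a noisy speech fragment and a noise-only fragment."""
    noise_power = mean_power(noise_only)
    if noise_power == 0:
        return math.inf
    # Subtract the noise power to approximate the power of the speech alone.
    speech_power = max(mean_power(speech_with_noise) - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


def meets_separation_conditions(snr, speech_seconds_per_speaker):
    """True when both quoted conditions hold for every speaker."""
    return snr >= MIN_SNR_DB and all(
        seconds >= MIN_SPEECH_SECONDS for seconds in speech_seconds_per_speaker
    )


noise = [0.01, -0.01] * 100
speech = [0.3, -0.3] * 100
estimate = snr_db(speech, noise)
ok = meets_separation_conditions(estimate, [30.0, 18.5])
```

In practice the noise-only fragment would come from a pause between utterances, which is itself something a speech detector has to locate.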


Multi-window interface

SIS allows several audio files to be opened in one or several windows at the same time. The windows can be positioned according to a particular task: vertically for identification purposes or horizontally to compare copies of audio recordings or the various sound cleaning options. Signals can be opened in several layers in one window, and their colours and transparency can be changed for better visualisation.

Signal comparison

Windows can be linked in the time and spectral domains, which makes measurement with vertical and horizontal cursors easier. The instant spectrum can be overlaid for better visual comparison. Pitch histograms can be compared visually or numerically using the minimum, maximum, median, asymmetry and general correlation values.

Working with audio recordings in a multi-window interface
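The numerical comparison of pitch histograms mentioned under signal comparison (minimum, maximum, median, asymmetry and correlation) can be illustrated with a small sketch. It assumes that "asymmetry" means statistical skewness and that "general correlation" is a Pearson correlation between two histograms on the same bins; the brochure does not confirm either definition, and the function names, bin range and sample values are hypothetical.

```python
import math
import statistics


def pitch_summary(pitch_hz):
    """Minimum, maximum, median and asymmetry (skewness) of a pitch track."""
    mean = statistics.fmean(pitch_hz)
    std = statistics.pstdev(pitch_hz)
    if std == 0:
        asymmetry = 0.0
    else:
        asymmetry = sum(((p - mean) / std) ** 3 for p in pitch_hz) / len(pitch_hz)
    return {
        "min": min(pitch_hz),
        "max": max(pitch_hz),
        "median": statistics.median(pitch_hz),
        "asymmetry": asymmetry,
    }


def pitch_histogram(pitch_hz, low_hz=50.0, high_hz=400.0, bins=35):
    """Normalised histogram of the pitch values that fall in [low_hz, high_hz)."""
    counts = [0] * bins
    width = (high_hz - low_hz) / bins
    for p in pitch_hz:
        if low_hz <= p < high_hz:
            counts[min(int((p - low_hz) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_correlation(hist_a, hist_b):
    """Pearson correlation between two histograms built on the same bins."""
    mean_a, mean_b = statistics.fmean(hist_a), statistics.fmean(hist_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(hist_a, hist_b))
    var_a = sum((a - mean_a) ** 2 for a in hist_a)
    var_b = sum((b - mean_b) ** 2 for b in hist_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


known = [110, 115, 120, 118, 125, 130, 122]    # pitch values in Hz, made up
unknown = [112, 117, 121, 119, 127, 131, 124]  # pitch values in Hz, made up
similarity = histogram_correlation(pitch_histogram(known), pitch_histogram(unknown))
```

The bin width matters: with 10 Hz bins, two tracks that differ by a few hertz land mostly in the same bins, while much narrower bins would make the correlation far more sensitive to small shifts.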


Signal analysis

SIS automatically calculates the signal characteristics on which the expert bases the decision as to whether the recording is suitable for identification analysis.

• Frequency response

• Signal-to-noise ratio

• Reverberation time

• Clipping and tonal noises

• Clear speech duration

Signal characteristics assessment

Working with projects and creating reports

IKAR Lab 3 organises the expert's workflow efficiently. A project opens the files related to an examination directly from SIS, whether they are audio, text, video or photographic files. These files and the identification results can be saved in a structured way, as can reports created in MS Word. A report can be supplemented with the settings used for illustrations and visible speech representations, and with screenshots of the working screen or a part of it.

Identification

This unique tool, based on biometric algorithms and expert modules, automates and formalises the processes involved in forensic audio identification research: searching for comparable words and sounds, selecting sounds and melodic fragments to be compared, comparing speakers’ formants and pitches, and performing speech analysis. The results are presented as numerical indicators that contribute to the overall identification conclusion.


Automatic comparison

The module performs 1:1 voice signal comparison. The method it uses depends on the speech signal characteristics of the audio recordings studied. All results are based on the extraction of voice biometric traits and the calculation of their similarity.

The module's biometric engine was trained on audio recordings of tens of thousands of speakers of different genders, ages, ethnicities, and languages. The varied speech material was captured over various channels and in multiple recording sessions. The high reliability of the biometric engine has been confirmed in NIST testing.

NEW

More methods of comparison: cxvector (a development of xvector) is used as the main method, supplemented by smart-speaker and gen6-v3 (when an audio recording contains from 1.5 to 5 seconds of clear speech). The new functionality offers faster and more secure identification.

Automatic identification results
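The automatic comparison is described as extracting voice biometric traits and calculating their similarity. The brochure does not say how the similarity is scored; a widely used choice for embedding-based speaker comparison is cosine similarity between two fixed-length vectors, which is what the sketch below shows. The embeddings here are made-up numbers, and nothing in this example reflects the actual cxvector scoring.

```python
import math


def cosine_similarity(embedding_a, embedding_b):
    """Cosine similarity between two equal-length speaker embeddings."""
    if len(embedding_a) != len(embedding_b):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(embedding_a, embedding_b))
    norm_a = math.sqrt(sum(a * a for a in embedding_a))
    norm_b = math.sqrt(sum(b * b for b in embedding_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings; real ones have hundreds of dimensions.
known_speaker = [0.12, -0.48, 0.33, 0.80]
unknown_speaker = [0.10, -0.51, 0.29, 0.77]
score = cosine_similarity(known_speaker, unknown_speaker)
```

A raw score like this is only meaningful once it has been calibrated against known same-speaker and different-speaker pairs recorded in comparable conditions.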


Comparison of formants

The process of comparing formants with the module involves two stages.

1. Search and selection of reference sound fragments for known and unknown speakers:

• using the scatter plot with the vowel triangle and highlighting the search area

• specifying the frequency range for the formant search

• by the position of horizontal marks indicating the limits in hertz and as a percentage

• using a graphical vowel chart

2. Expert comparison. The module automatically calculates FR, FA and LR for the selected sounds and decides whether the outcome of identification is positive, negative or undefined.

Additional features:

• Visual comparison of selected sounds on a vowel chart

• Comparison of the average formant values for selected sounds of two speakers

• Specifying words or triads as textual comments on reference fragments

• Exporting tables of reference fragments and results to MS Word

Speaker identification using the expert method of formant comparison


Pitch comparison

The pitch comparison module compares the specific features of speakers’ melodic patterns. The module lets the expert select melodic fragments, assign each to 1 of 18 possible melodic types and compare them according to 15 parameters, including the maximum, average and minimum pitch values, the rate of pitch change, skewness, kurtosis and others.

The algorithm generates results in the form of a match percentage for each parameter and delivers an overall identification/elimination conclusion or an inconclusive result. All data can be easily exported as text reports.

Identification wizard

This plugin offers a step-by-step identification process, displays the stages of research, and visualises the results for any comparison made.

Speaker identification using pitch comparison
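Several of the pitch comparison parameters named in the text (maximum, average and minimum pitch, rate of pitch change, skewness and kurtosis) are standard statistics of a pitch contour. The sketch below computes them for one melodic fragment. It assumes the contour is a list of pitch values in hertz sampled at a fixed step, takes the rate of change as the average slope between the first and last value, and reports excess kurtosis; the exact definitions used by the module are not given in the brochure.

```python
import statistics


def pitch_parameters(pitch_hz, frame_step_s):
    """Basic statistics of a pitch contour sampled every frame_step_s seconds."""
    n = len(pitch_hz)
    mean = statistics.fmean(pitch_hz)
    std = statistics.pstdev(pitch_hz)
    if std == 0:
        skewness = 0.0
        kurtosis = 0.0
    else:
        skewness = sum(((p - mean) / std) ** 3 for p in pitch_hz) / n
        # Excess kurtosis: 0 for a normal distribution.
        kurtosis = sum(((p - mean) / std) ** 4 for p in pitch_hz) / n - 3.0
    duration_s = (n - 1) * frame_step_s
    rate_hz_per_s = (pitch_hz[-1] - pitch_hz[0]) / duration_s if duration_s > 0 else 0.0
    return {
        "max": max(pitch_hz),
        "mean": mean,
        "min": min(pitch_hz),
        "rate_hz_per_s": rate_hz_per_s,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# A rising contour: 100 Hz to 140 Hz over 40 ms, one value every 10 ms.
rising = pitch_parameters([100.0, 110.0, 120.0, 130.0, 140.0], frame_step_s=0.01)
```

Comparing two fragments then reduces to comparing these dictionaries parameter by parameter; how the module turns the differences into a match percentage is not stated in the brochure, so it is left out here.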


Overall conclusion

The outcome of each method can be saved in a given project. The programme is designed to take the results from each module into account when making an overall conclusion. The expert can adjust the relative weight of each method in the overall conclusion, or the weights can be assigned automatically from a calculation of the qualitative and quantitative characteristics of the audio recordings being compared. Based on the results, the expert can automatically generate a detailed report.

Analysis of an audio track extracted from a video

NEW

With the new SIS method, the expert gets immediate access to the audio track of a video file without needing any additional editors. Just upload the video file, and SIS will automatically extract the audio track and open it in a separate window.

The module allows simultaneous work on a video in the video player and on its audio track in the editor. The video and the sound are synchronised, and the video is automatically modified while the audio track is being edited.

Extraction and analysis of the audio track from a video
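The overall conclusion combines the results of the individual methods using weights that the expert can adjust. One simple way to picture this is a weighted average of per-method scores, sketched below. The method names, the 0-to-1 score scale and the weights are all hypothetical; the brochure does not describe the actual combination rule.

```python
def overall_score(method_scores, weights):
    """Weighted average of per-method scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in method_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(score * weights[name] for name, score in method_scores.items())
    return weighted_sum / total_weight


scores = {"automatic": 0.9, "formants": 0.8, "pitch": 0.6}
expert_weights = {"automatic": 0.5, "formants": 0.3, "pitch": 0.2}
combined = overall_score(scores, expert_weights)  # 0.45 + 0.24 + 0.12 = 0.81
```

Dividing by the total weight means the weights do not have to sum to one, so a method can be down-weighted, for example when its recording offers too little clear speech, without rescaling the others.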


EdiTracker

The plugin performs diagnostics of the authenticity of analogue and digital audio recordings and greatly simplifies expert analysis using SIS by providing the user with manual and automatic analysis methods.

EdiTracker analysis methods

• Specifying the recording device parameters

• Identifying traces of previous digital signal processing

• Auditory analysis

• Detecting traces of tampering through phase shifts in the harmonics and phase scanning

• Scanning background noise

Authenticity check for the use of digital preprocessing of the audio recording


Specifying the recording device parameters

Every analogue recording device has unique characteristics, such as frequency response, total harmonic distortion, pitch variation, effective frequency range, tempo deviation, etc.

EdiTracker automatically assesses these characteristics using a test signal. A mismatch between recording device parameters and characteristics of a signal allegedly recorded with that unit may be an indication of tampering.

Identifying traces of digital preprocessing

Digital processing of analogue signals always requires a specific sample rate. During digitisation, signal components above half the sample rate are folded back onto lower frequencies, a phenomenon known as aliasing, which degrades the audio quality.

The vast majority of analogue-to-digital and digital-to-analogue converters use anti-aliasing filters. EdiTracker automatically detects traces of such filters, the presence of which may suggest that the audio has been digitised.

Detecting traces of tampering through phase shifts in the harmonics

EdiTracker automatically scans audio for technical narrow-band signals, which normally come from an electrical network (ENF), batteries, nearby electrical appliances, etc., and estimates their phase continuity. An unjustified phase break can be interpreted as potential evidence of audio editing.
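The idea behind checking the phase continuity of a narrow-band signal can be shown on a synthetic example. The sketch below is not EdiTracker's method: it assumes a single tone at a known, stable frequency (here a 50 Hz hum), measures the tone's phase frame by frame against a common time reference, and reports frames where the phase jumps. The frame length is chosen to hold a whole number of tone cycles; all names and thresholds are hypothetical.

```python
import cmath
import math


def frame_phases(samples, sample_rate, tone_hz, frame_len):
    """Phase of a known tone in each frame, measured against absolute time."""
    phases = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0j
        for n in range(start, start + frame_len):
            # Correlate with a complex exponential at the tone frequency.
            acc += samples[n] * cmath.exp(-2j * math.pi * tone_hz * n / sample_rate)
        phases.append(cmath.phase(acc))
    return phases


def phase_breaks(phases, threshold_rad=0.5):
    """Indices of frames whose phase differs from the previous frame's phase."""
    breaks = []
    for k in range(1, len(phases)):
        diff = phases[k] - phases[k - 1]
        diff = (diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        if abs(diff) > threshold_rad:
            breaks.append(k)
    return breaks


# Synthetic hum with a 90-degree phase jump at sample 500, as a cut might leave.
sample_rate = 1000
tone_hz = 50.0
hum = [
    math.cos(2 * math.pi * tone_hz * n / sample_rate + (0.0 if n < 500 else math.pi / 2))
    for n in range(1000)
]
breaks = phase_breaks(frame_phases(hum, sample_rate, tone_hz, frame_len=100))
```

A real mains hum drifts slowly in frequency and is buried under speech and noise, so a practical detector has to track the frequency and filter the band first; the sketch only shows why a cut leaves a visible phase step.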


Scanning background noise

Background scanning detects dramatic changes in the spectrum that are unnoticeable on the waveform and may be signs of audio editing. EdiTracker also automatically checks the integrity of background noises and marks any abrupt change in noise level.

Auditory analysis

During the playback of an original audio recording, the entirety of the audio communication (verbal and nonverbal speaker output and additional background interference) comes together to form a complete and integrated picture of the audio and speech environment. Auditory analysis of these events, based on the known characteristics of the recording equipment and methods used, can reveal possible violations of the integrity of the overall audio picture and identify the location, fact and method of such violations. EdiTracker provides an extended list of auditory and linguistic indicators that may point to breaches in the authenticity of a recording. These resources can be used to create a textual report.

Authenticity check for hidden editing of an audio recording based on the uniformity of the background noise
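Marking abrupt changes in background noise level can be pictured with a frame-by-frame level measurement. The sketch below is an illustration only, not the EdiTracker algorithm: it assumes the input already contains background noise alone (speech removed), computes the RMS level of each frame in decibels and flags frames where the level jumps by more than a chosen amount. The function names and the 6 dB threshold are hypothetical.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level in dB (relative to full scale 1.0) of each complete frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # floor avoids log(0)
    return levels


def abrupt_level_changes(levels_db, jump_db=6.0):
    """Indices of frames whose level differs from the previous frame by more than jump_db."""
    return [
        k for k in range(1, len(levels_db))
        if abs(levels_db[k] - levels_db[k - 1]) > jump_db
    ]


# Stand-in for background noise: a quiet section spliced to a louder one.
quiet = [0.01, -0.01] * 200  # about -40 dB
loud = [0.05, -0.05] * 200   # about -26 dB
levels = frame_levels_db(quiet + loud, frame_len=100)
changes = abrupt_level_changes(levels)
```

Level alone misses splices between sections of equal loudness, which is why the text also mentions changes in the spectrum that are not visible on the waveform.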


Diagnostic module

NEW

A new SIS module for a more reliable assessment of the authenticity and examinability of an audio recording. The module detects various signal features that explain the nature of the signal's origin or possible processing methods, which may be either unknown or deliberately hidden. In addition to EdiTracker, it detects the application of certain operations to a signal using the following methods:

• Spoofing detection

• DC offset analysis

• Analysis of A/μ encoding traces

• Analysis of MP3 encoding traces

Spoofing detection

The spoofing detector searches for traces of spoofing attacks in the audio recording, such as replays, speech synthesis and voice disguising. This algorithm is based on a neural network trained on various types of spoofing. As a result, it can conclude whether or not the audio recording is masquerading as the authentic recording of a speaker.

Expert spoofing detection analysis


DC offset analysis

This module analyses the audio recording to identify any dramatic change in DC offset, as this may be a sign of integrity violation. If such a violation is detected, the module highlights the corresponding areas.

Detecting the disturbance of DC offset uniformity in two areas in the audio recording

Detection of A/μ coding

This module analyses the audio recording to detect areas with signs of A/μ (A-law or μ-law) encoding. The recording format does not indicate whether an audio recording has been processed with these codecs. If such coding is detected, the module highlights the corresponding areas or the entire audio recording.

Detecting A/μ coding areas
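Why A/μ coding leaves traces even after the file has been converted to another format can be shown with a toy experiment. μ-law companding stores each sample in 8 bits, so decoded audio can take at most 256 distinct amplitude values, far fewer than ordinary 16-bit audio. The sketch below uses the continuous μ-law formula with μ = 255 rather than the bit-exact G.711 tables, and counting distinct levels is only one possible heuristic; it is not a description of how the SIS module works.

```python
import math
import random

MU = 255.0


def mu_law_encode(x):
    """Compress a sample in [-1, 1] to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mu_law_decode(code):
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def to_int16(x):
    """Convert a sample in [-1, 1] to a 16-bit integer value."""
    return int(round(max(-1.0, min(1.0, x)) * 32767))


def distinct_levels(pcm):
    """Number of different amplitude values used by the recording."""
    return len(set(pcm))


rng = random.Random(0)  # fixed seed so the example is repeatable
original = [rng.uniform(-1.0, 1.0) for _ in range(8000)]
plain_pcm = [to_int16(x) for x in original]
companded_pcm = [to_int16(mu_law_decode(mu_law_encode(x))) for x in original]
plain_count = distinct_levels(plain_pcm)
companded_count = distinct_levels(companded_pcm)
```

Here `companded_count` cannot exceed 256 while `plain_count` runs into the thousands. A very short or very quiet recording also uses few levels, and any later processing such as filtering or resampling blurs the pattern, so a real detector needs stronger evidence than this count.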


Detection of MP3 coding

This module analyses the audio recording to identify signs of MP3 coding. The recording format does not indicate whether an audio recording has been processed with this codec. If MP3 coding is detected, the module displays a message describing the signs detected. Additionally, spectrograms, graphs and histograms are displayed, explaining the decision made by the algorithm.

Detecting MP3 coding


Sound Cleaner: Noise reduction and audio enhancement software

Usually, the examination of audio recordings requires the creation of a verbatim record, or transcription, thereof. Since audio recordings obtained in an operational context are often recorded in difficult conditions and not readily intelligible, the first step is to clean the sound of noise. To do this, the IKAR Lab 3 suite is optionally equipped with Sound Cleaner. It includes modern signal processing algorithms that are effective at suppressing broadband noise, tonal interference and pulses, while performing frequency response correction, equalising the signal, etc.

To determine the characteristics of noise and interference, the user can build spectrograms, including 3D FFT spectrograms. This increases the speed and accuracy of noise reduction.

NEW

A waveform and stereo signal processing scheme with filters and parameters pre-set by an expert

Simultaneous display of the spectrogram and waveform for quick assessment of signal and frequency processing needs


All filters work in real time — the result can be heard immediately after the filter has been added to the processing chain so that the user can select the optimal parameters by ear.

Sound Cleaner saves its processing results in WAV format and automatically generates comprehensive textual reports that log the process. The programme is compatible with any sound editor that supports the VST 3 format.

STC Auto Filter: Significantly reduces the level of the most common types of noise using a single controller.

Cell Phone Noise Filter: Reduces interference from the characteristic intermittent sounds of incoming mobile phone calls.

Broadband Noise Filter: Reduces the noise level from rooms and streets and interference from communication channels or recording equipment. Such noises take the form of a hiss and cannot be suppressed by other methods, since the interference spectra intersect or coincide with the useful signal spectra.

Inverse Filter: Equalises the frequency response of the communication channel in which the recording was created. The filter has two settings: amplification of the weak and suppression of the strong spectral components of the signal (flattening the average spectra).

Tone Suppressor: Suppresses stationary narrow-band and regular interference (vibrations, network pickup, noises from home appliances, slow music, the sound of a car passing, noise from water or a room, reverberation, etc.).

Reverb Suppressor: Increases the intelligibility of speech, decreases the level of reverberation in recordings and reduces user fatigue by making a reverberated speech signal easier to perceive despite the presence of additional noise.

Click Suppressor: Automatically restores speech or music signals distorted by pulse noises (clicks, radio noises, knocks, crackles, etc.).

Dynamic Range Control: Improves intelligibility when the signal level varies widely, for example by amplifying a weak signal and suppressing a strong one to balance the amplitude of the output signal.

Equaliser: A 4096-band graphic equaliser with a built-in spectrograph for detailed spectrum correction in distorted recordings.

Clip Restorer: Restores overloaded recording fragments by reconstructing their waveforms.

Reference Noise Suppressor: Suppresses in the main channel any noise that is also present in the reference channel (for example, TV or radio broadcasts, music, etc.).

DTMF Suppressor: Processes phone dialling signals, which are sequences of short rectangular pulses with dual-frequency filling.
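Of the filters listed above, Dynamic Range Control is the easiest to illustrate: weak passages are amplified and strong ones attenuated so that the output level is balanced. The sketch below is a deliberately simple frame-based leveller, not Sound Cleaner's algorithm. It assumes float samples in [-1, 1], and the target level, gain limit and function name are hypothetical.

```python
import math


def level_frames(samples, frame_len, target_rms=0.1, max_gain=10.0):
    """Scale each frame towards target_rms, limiting the boost to max_gain."""
    output = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Silent frames are left untouched to avoid dividing by zero.
        gain = min(target_rms / rms, max_gain) if rms > 0 else 1.0
        output.extend(x * gain for x in frame)
    return output


quiet_part = [0.01, -0.01] * 50  # weak signal, boosted by a factor of 10
loud_part = [0.5, -0.5] * 50     # strong signal, reduced by a factor of 5
balanced = level_frames(quiet_part + loud_part, frame_len=100)
```

Changing the gain abruptly at frame boundaries is audible on real recordings; practical implementations smooth the gain over time with attack and release settings.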


Caesar: Audio recording transcription module

The module is designed to produce a verbatim transcription of recorded speech. The text is output into MS Word and automatically synchronised with the audio recording, which simplifies both text editing and the search for the corresponding audio fragment. The ability to play back the recording and transcribe it in a single, offline interface makes the expert’s work easier.

Audio recording text transcription


STC-H246: Audio hardware

To guarantee the high quality of input and output signals, the IKAR Lab 3 suite is equipped with a professional STC-H246 audio hardware device.

STC-H246 is perfect for setting up a workstation for digitising analogue audio recordings. The device is designed to measure parameters and generate electrical signals in the audio frequency range.

Parameter: Value

Sampling rate: 8–200 kHz

ADC/DAC resolution: 16-bit, 24-bit

Signal-to-noise ratio in the end-to-end channel, in the frequency band from 20 Hz to 20 kHz: 105 dB

Input/output channel connector types: XLR, RCA, S/PDIF, TRS 6.3 mm

Number of channels: 2

Power: 110/220 V, 60/50 Hz

Case: Metal

Size: 111×166×190 mm

OS: Windows 7, 8, 10
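To put the resolution and signal-to-noise figures in the table into context, the theoretical signal-to-noise ratio of an ideal N-bit quantiser driven by a full-scale sine wave is about 6.02·N + 1.76 dB. The short sketch below evaluates that textbook formula for the two listed bit depths. It describes an ideal converter only, and says nothing about how the 105 dB end-to-end figure for the STC-H246 was measured.

```python
import math


def ideal_quantiser_snr_db(bits):
    """Theoretical SNR of an ideal quantiser for a full-scale sine wave."""
    # 20*log10(2**bits) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)


snr_16_bit = ideal_quantiser_snr_db(16)  # about 98.1 dB
snr_24_bit = ideal_quantiser_snr_db(24)  # about 146.3 dB
```

Real converters fall short of the ideal figure because analogue noise in the signal path, rather than the bit depth, usually sets the limit.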


About the company

Speech Technology Center is a global developer of products and solutions based on intelligent speech technologies and voice and facial biometrics. Over the last 30 years, we have accumulated highly sought-after expertise in artificial intelligence and machine learning.

We hold a leading position in the NIST, ASVspoof Challenge, VOiCES and CHiME world ratings. So far, we have implemented projects for more than 5,000 clients in 75 countries.

Many of our solutions have been applied in the public and commercial sectors to great advantage, from small expert laboratories to complex national security systems all around the world, including those in the USA, Latin America, the Middle East and Europe.

Images in this document are for example purposes only and may differ from the actual product. Full product specifications are available on request. Speech Technology Center Limited reserves the right to modify product characteristics and/or withdraw any product without prior notice.

St. Petersburg: Vyborgskaya Embankment 45, Bldg. E, Unit 1-Н, Off. 133, 194044, +7 812 325 8848, [email protected]

Moscow: 3 Marksistskaya St., Bldg. 2, 109147, +7 495 669 [email protected]
