IKAR Lab 3
WWW.SPEECHPRO.COM
Authenticity verification | Identification of persons | Speech recognition* | Sound cleaning
IKAR Lab 3
A new generation of the forensic audio analysis software and hardware suite
*Available for Russian, English, Spanish, Kazakh, and Arabic
The new generation of IKAR Lab 3
• Reliable methods for assessing the authenticity and examinability of an audio recording
• Quick access to the audio tracks of video recordings
• Reliable, high-speed identification of persons, groups, and the lines spoken by each
• Processing of large volumes of audio recordings
• New, fast, and precise methods for sound cleaning
IKAR Lab 3 meets the world-class requirements of leading forensic experts: it provides a highly reliable evidence base and automates the process as much as possible, making the work faster and simpler.
All operations are performed in a unified work environment with a user-friendly interface. This enables specialists to focus on decision making and providing their expert assessments promptly.
Main components
SIS: Audio forensic software
Sound Cleaner: Noise reduction and audio enhancement software
Caesar: Audio transcription software
STC-H246: Audio hardware
SIS: Audio forensic software
SIS is the core software of the IKAR Lab 3 forensic audio kit. It includes powerful tools for speech signal research and enhanced speech visualisation and analysis, including speech segmentation, text transcription, automatic and semi-automatic identification tools and many others.
Methods
• Visualisation
• Editing and processing
• Detecting speech and noises
• Text transcription and speech segmentation
• Separation of speakers in a dialogue/polylogue
• Multi-window interface
• Signal comparison
• Signal analysis
• Managing projects and creating reports
• Identification
• Automatic comparison
• Comparison of formants
• Pitch comparison
• Identification Wizard
• Overall conclusion
• Analysis of an audio recording extracted from a video file
• EdiTracker and the diagnostic module
Audio recording research for speaker identification based on speech samples (analysis of the formants and pitch of a speech signal)
SIS | Methods
Visualisation
Editing and processing
Detecting speech and noises
The algorithms used for the spectral representation of the signal ensure the highest possible quality and clarity of visible speech. The user selects the optimal display parameters on the fly or uses presets for various types of spectral analysis.
SIS provides a wide variety of expert editing and signal processing tools that improve the intelligibility of recorded speech and prepare audio recordings for further analysis.
The speech detector automatically marks speech fragments in the audio signal that are suitable for identification. The module can also be configured to detect noisy areas: dial tones, clipped fragments and clicks.
• Waveform
• FFT and LPC spectrograms
• Average and instantaneous spectrum
• Cepstrogram
• Autocorrelation
• Pitch extractor
• Formant extractor
• Signal energy
• Histogram and histogram correlation
• Amplitude normalisation
• Linear transformation
• DC Offset Suppression
• Mixing
• Modulation
• Tempo correction*
• Resampling
• Bit depth conversion
• Stereo separation and merging two mono signals to stereo
• Phase change
• Adaptive inverse filter
• Adaptive tone suppressor
• Adaptive broadband noise filter
*Without pitch distortion
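SIS does not document how its speech detector works. As a rough illustration of the idea only, the Python sketch below labels fixed-length frames by short-term energy and flags frames that reach full scale as clipped. The function name `mark_frames`, the 160-sample frame, the -40 dBFS threshold and the clipping level are hypothetical choices, and a pure energy threshold cannot tell speech from loud noise, which a real detector must do.

```python
import math

def mark_frames(samples, frame_len=160, energy_db_threshold=-40.0, clip_level=0.999):
    """Label each frame as 'speech', 'silence' or 'clipped' (hypothetical sketch).

    Samples are floats in [-1.0, 1.0]. A frame is 'clipped' if any sample
    reaches clip_level in magnitude; otherwise it is 'speech' if its mean
    energy in dB relative to full scale exceeds the threshold.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if any(abs(x) >= clip_level for x in frame):
            labels.append("clipped")
            continue
        energy = sum(x * x for x in frame) / frame_len
        # Guard against log10(0) for digital silence.
        energy_db = 10.0 * math.log10(energy) if energy > 0 else float("-inf")
        labels.append("speech" if energy_db > energy_db_threshold else "silence")
    return labels
```

In practice, a threshold set relative to the recording's own noise floor tends to be more robust than a fixed dBFS value.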
Text transcription and speech segmentation
Separating speakers in a dialogue/polylogue
In manual mode, selected audio fragments can easily be assigned to particular categories (e.g., different speakers, sounds or noises) with text comments, and the full text can be exported to MS Word. If there are two transcribed text files, the programme can automatically search for all matching words in the compared audio recordings.
The module automatically marks lines according to speaker. Its reliability reaches up to 95% when the signal-to-noise ratio is at least 20 dB and each speaker's speech lasts at least 16 seconds.
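The 95% figure is tied to two measurable conditions. A minimal sketch of checking them before running speaker separation, assuming the signal and noise powers (mean squares) have already been estimated and the total speech time of each speaker is known; the function names are hypothetical:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two mean-square powers."""
    return 10.0 * math.log10(signal_power / noise_power)

def meets_separation_conditions(signal_power, noise_power, speech_seconds,
                                min_snr_db=20.0, min_speech_s=16.0):
    """True if SNR >= 20 dB and every speaker has >= 16 s of speech.

    `speech_seconds` holds the total speech duration of each speaker;
    the defaults mirror the conditions quoted for the 95% figure.
    """
    if not speech_seconds:
        return False
    if snr_db(signal_power, noise_power) < min_snr_db:
        return False
    return all(t >= min_speech_s for t in speech_seconds)
```

A power ratio of 100 corresponds to exactly 20 dB, the lowest SNR at which the quoted reliability applies.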
The speech-to-text plugin automatically obtains the text content of the speech in an audio recording in Russian, English, Spanish, Kazakh, and Arabic. The transcription is also accompanied by word-by-word segmentation indicating where each word is spoken. This functionality allows the expert to work effectively with large volumes of audio recordings.
Using built-in algorithms, the module can segment the lines spoken by up to 5 speakers.
NEW
NEW
Automatic text transcription with the segmentation of lines spoken by speakers
Multi-window interface
Signal comparison
SIS allows several audio files to be opened in one or several windows at the same time. The windows can be positioned according to a particular task: vertically for identification purposes or horizontally to compare copies of audio recordings or the various sound cleaning options. Signals can be opened in several layers in one window, and their colours and transparency can be changed for better visualisation.
Windows can be linked in the time and frequency domains, which makes measurement with vertical and horizontal cursors easier. Instantaneous spectra can be overlaid for better visual comparison. Pitch histograms can be compared visually or numerically using their minimum, maximum, median, asymmetry and overall correlation.
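Numerical comparison of pitch histograms could look roughly like the sketch below. It assumes two pitch tracks in hertz (voiced frames only), reads "asymmetry" as skewness and "general correlation" as the Pearson correlation of the two histograms, and uses a hypothetical range of 50 to 400 Hz with 10 Hz bins; SIS's own definitions may differ.

```python
import statistics

def skewness(values):
    """Population skewness: positive when the distribution has a long upper tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def histogram(values, lo=50.0, hi=400.0, bins=35):
    """Count pitch values in equal-width bins between lo and hi (Hz)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def compare_pitch(f0_a, f0_b):
    """Side-by-side statistics of two pitch tracks plus histogram correlation."""
    result = {}
    for name, stat in (("min", min), ("max", max),
                       ("median", statistics.median), ("skewness", skewness)):
        result[name] = (stat(f0_a), stat(f0_b))
    result["hist_correlation"] = pearson(histogram(f0_a), histogram(f0_b))
    return result
```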
Working with audio recordings in a multi-window interface
Signal analysis
Working with projects and creating reports
Identification
SIS automatically calculates the signal characteristics, on the basis of which the expert decides whether the recording is suitable for identification analysis.
IKAR Lab 3 organises the expert's workflow efficiently. A project opens files related to the examination directly from SIS, whether they are audio, text, video or photographic files. These files, identification results and reports created in MS Word can be saved in a structured way. A report can be supplemented with the settings used for illustrations and visual representations of speech, and with screenshots of the whole working screen or part of it.
This unique tool, based on biometric algorithms and expert modules, automates and formalises the processes involved in audio forensic identification research: searching for comparable words and sounds, selecting sounds and melodic fragments for comparison, comparing speakers’ formants and pitch, and performing speech analysis. The results are presented as numerical indicators that contribute to the overall identification conclusion.
• Frequency response
• Signal-to-noise ratio
• Reverberation time
• Clipping and tonal noises
• Clear speech duration
Signal characteristics assessment
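SIS does not say how it measures reverberation time. One standard room-acoustics technique is Schroeder backward integration of an impulse response, fitting the decay between -5 and -25 dB (the T20 range) and extrapolating to a 60 dB drop. The sketch below shows that technique only; it assumes an impulse response is available, which ordinary case recordings usually do not provide, and the function name is hypothetical.

```python
import math

def rt60_schroeder(impulse_response, sample_rate, upper_db=-5.0, lower_db=-25.0):
    """Estimate RT60 (seconds) from an impulse response using the T20 method."""
    # Schroeder backward integration: energy remaining from each sample onward.
    edc = []
    remaining = 0.0
    for x in reversed(impulse_response):
        remaining += x * x
        edc.append(remaining)
    edc.reverse()
    # Energy decay curve in dB relative to the total energy.
    edc_db = [10.0 * math.log10(e / edc[0]) if e > 0 else float("-inf") for e in edc]
    # Least-squares line through the part of the curve between upper_db and lower_db.
    points = [(n / sample_rate, d) for n, d in enumerate(edc_db) if lower_db <= d <= upper_db]
    t_mean = sum(t for t, _ in points) / len(points)
    d_mean = sum(d for _, d in points) / len(points)
    slope = (sum((t - t_mean) * (d - d_mean) for t, d in points)
             / sum((t - t_mean) ** 2 for t, _ in points))  # dB per second
    # Extrapolate the fitted decay rate to a 60 dB drop.
    return -60.0 / slope
```

For a synthetic exponential decay exp(-t/τ), the estimate matches the analytic value 3τ·ln 10 (about 0.69 s for τ = 0.1 s).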
Automatic comparison
The module's engine was trained on audio recordings of tens of thousands of speakers of different genders, ages, ethnicities and languages. The speech material was varied and was captured over various channels and in multiple recording sessions. The high reliability of the biometric engine has been confirmed in NIST testing.
More comparison methods: cxvector (a development of xvector) is used as the main method, with smart-speaker and gen6-v3 as additional methods (when an audio recording contains from 1.5 to 5 seconds of clear speech). The new functionality offers faster and more secure identification.
NEW
The module performs 1:1 voice signal comparison. The method it uses depends on the speech signal characteristics of the audio recordings under study. All results are based on extracting voice biometric traits and calculating their similarity.
Automatic identification results
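How cxvector, smart-speaker and gen6-v3 extract and score voice embeddings is not described. For orientation only, a common final step in embedding-based 1:1 verification is a similarity score between two fixed-length embeddings followed by a threshold. The sketch uses cosine similarity and a hypothetical threshold of 0.6; production systems often use calibrated back-ends such as PLDA instead, and the function names are illustrative.

```python
import math

def cosine_score(emb_a, emb_b):
    """Cosine similarity of two equal-length embedding vectors (range -1..1)."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.6):
    """1:1 comparison: return the score and whether it reaches the threshold."""
    score = cosine_score(emb_a, emb_b)
    return score, score >= threshold
```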
Comparison of formants
The process of comparing formants with the module involves two stages.
Additional features:
• Visual comparison of selected sounds on a vowel chart
• Comparison of the average formant values for selected sounds of two speakers
• Specifying words or triads as textual comments on reference fragments
• Exporting tables of reference fragments and results to MS Word
1. Search and selection of reference sound fragments for known and unknown speakers:
• using a scatter plot with the vowel triangle and highlighting the search area
• specifying the frequency range for the formant search
• by the position of horizontal marks indicating the limits in hertz and percentage
• using a graphical vowel chart
2. Expert comparison. The module automatically calculates FR, FA and LR for the selected sounds and decides whether the outcome of identification is positive, negative or undefined.
Speaker identification using the expert method of formant comparison
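The brochure does not define FR, FA and LR. Assuming they denote false-rejection rate, false-acceptance rate and likelihood ratio, as they usually do in speaker recognition, the sketch below models same-speaker and different-speaker comparison scores as Gaussians with hypothetical parameters and derives all three for one score. The "undefined" band (LR between 1/10 and 10) and the function names are likewise hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def evaluate(score, same=(2.0, 1.0), diff=(-2.0, 1.0), band=10.0):
    """Return (LR, FR, FA, outcome) for one comparison score.

    `same` and `diff` are hypothetical (mean, sd) pairs of the score
    distributions for same-speaker and different-speaker comparisons.
    """
    lr = normal_pdf(score, *same) / normal_pdf(score, *diff)
    fr = normal_cdf(score, *same)        # share of same-speaker scores below this one
    fa = 1.0 - normal_cdf(score, *diff)  # share of different-speaker scores above it
    if lr >= band:
        outcome = "positive"
    elif lr <= 1.0 / band:
        outcome = "negative"
    else:
        outcome = "undefined"
    return lr, fr, fa, outcome
```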
Pitch comparison
Identification wizard
The pitch comparison module compares the characteristics of speakers’ melodic patterns. It enables melodic fragments to be selected, assigns them to one of 18 melodic types and compares them according to 15 parameters, including maximum, average and minimum pitch values, rate of pitch change, skewness, kurtosis and others.
The algorithm generates results in the form of a match percentage for each parameter and delivers an overall identification/elimination conclusion or an inconclusive result. All data can be easily exported as text reports.
This plugin offers a step-by-step identification process, displays the stages of research, and visualises the results for any comparison made.
Speaker identification using pitch comparison
Overall conclusion
Analysis of an audio track extracted from a video
The outcome of each method can be saved in the project. The programme takes the results of each module into account when forming an overall conclusion. The expert can adjust the relative weight of each method in the overall conclusion, or the weights can be assigned automatically by calculating the qualitative and quantitative characteristics of the audio recordings being compared. Based on the results, a detailed report can be generated automatically.
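As an illustration only, a weighted overall conclusion can be sketched as a weighted mean of per-method scores mapped to a verdict. The score scale (-1 for a clear elimination to +1 for a clear identification), the method names and the ±0.5 decision limits are hypothetical; the brochure does not specify how SIS weights or combines its methods.

```python
def overall_conclusion(method_scores, weights=None, positive=0.5, negative=-0.5):
    """Weighted mean of per-method scores in [-1, 1], mapped to a verdict.

    `method_scores` maps a method name to its score; `weights` maps the same
    names to non-negative weights (equal weights if omitted).
    """
    if weights is None:
        weights = {name: 1.0 for name in method_scores}
    total_weight = sum(weights[name] for name in method_scores)
    combined = sum(weights[name] * score for name, score in method_scores.items()) / total_weight
    if combined >= positive:
        verdict = "identification"
    elif combined <= negative:
        verdict = "elimination"
    else:
        verdict = "inconclusive"
    return combined, verdict
```

Raising the weight of one method, for example the automatic comparison, pulls the combined score towards that method's result.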
With the new SIS feature, the expert gets immediate access to the audio track of a video file without any additional editors. Simply upload the video file, and SIS will automatically extract the audio track and open it in a separate window.
The module allows the expert to work simultaneously on the video in the video player and on its audio track in the editor. The video and the sound are synchronised, and the video is automatically modified as the audio track is edited.
NEW
Extraction and analysis of the audio track from a video
EdiTracker
The plugin performs diagnostics of the authenticity of analogue and digital audio recordings and greatly simplifies expert analysis using SIS by providing the user with manual and automatic analysis methods.
EdiTracker analysis methods
• Specifying the recording device parameters
• Identifying traces of previous digital signal processing
• Auditory analysis
• Detecting traces of tampering through phase shifts in the harmonics and phase scanning
• Scanning background noise
Authenticity check for the use of digital preprocessing of the audio recording
Every analogue recording device has unique characteristics, such as frequency response, total harmonic distortion, pitch variation, effective frequency range, tempo deviation, etc.
EdiTracker automatically assesses these characteristics using a test signal. A mismatch between the parameters of a recording device and the characteristics of a signal allegedly recorded with it may indicate tampering.
Digitising an analogue signal always requires a specific sampling rate. Any signal components above half of that rate (the Nyquist frequency) fold back onto lower frequencies during digitisation; this phenomenon, known as aliasing, degrades audio quality because high-frequency components are superimposed on low-frequency ones.
The vast majority of analogue-to-digital and digital-to-analogue converters use anti-aliasing filters. EdiTracker automatically detects traces of such filters, the presence of which may suggest that the audio has been digitised.
EdiTracker automatically scans audio for technical narrow-band signals, which normally come from the electrical network (electrical network frequency, ENF), batteries, nearby electrical appliances, etc., and estimates their phase continuity. An unexplained phase break can be interpreted as potential evidence of audio editing.
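The phase-continuity idea can be illustrated with a deliberately simplified sketch: it measures the phase of one known tone (here a hypothetical 50 Hz mains component) in consecutive frames with a single-bin DFT and reports frames where the phase jumps by more than a tolerance. Real ENF analysis must also track the slowly drifting network frequency; the frame length, tolerance and function names are assumptions.

```python
import math

def frame_phases(samples, sample_rate, freq, frame_len):
    """Phase (radians) of a known tone in consecutive frames.

    Each frame is correlated with cos/sin at `freq` (a single-bin DFT). The
    reference uses absolute sample indices, so a continuous tone gives the
    same phase in every frame.
    """
    phases = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        re = im = 0.0
        for n in range(start, start + frame_len):
            angle = 2.0 * math.pi * freq * n / sample_rate
            re += samples[n] * math.cos(angle)
            im -= samples[n] * math.sin(angle)
        phases.append(math.atan2(im, re))
    return phases

def phase_breaks(phases, tolerance=0.3):
    """Indices of frames whose phase differs from the previous frame by > tolerance."""
    breaks = []
    for i in range(1, len(phases)):
        diff = phases[i] - phases[i - 1]
        diff = math.atan2(math.sin(diff), math.cos(diff))  # wrap to (-pi, pi]
        if abs(diff) > tolerance:
            breaks.append(i)
    return breaks
```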
Specifying the recording device parameters
Identifying traces of digital preprocessing
Detecting traces of tampering through phase shifts in the harmonics
Background scanning detects abrupt changes in the spectrum that are not visible on the waveform and may be signs of audio editing. EdiTracker also automatically checks the integrity of background noise and marks any abrupt change in noise level.
During playback of an original audio recording, the entire audio communication, including the speakers' verbal and nonverbal output and background interference, forms a complete and integrated picture of the audio and speech environment. Auditory analysis of these events, based on the known characteristics of the recording equipment and methods used, can reveal possible violations of the integrity of the overall audio picture and establish the fact, location and method of such violations. EdiTracker provides an extended list of auditory and linguistic indicators that may point to breaches of a recording's authenticity. These resources can be used to create a textual report.
Scanning background noise
Auditory analysis
Authenticity check for hidden editing of an audio recording based on the uniformity of the background noise
The spoofing detector searches the audio recording for traces of spoofing attacks, such as replay, speech synthesis and voice disguise. The algorithm is based on a neural network trained on various types of spoofing and concludes whether or not the audio recording is masquerading as an authentic recording of a speaker.
Spoofing detection
Diagnostic module
A new SIS module for a more reliable assessment of the authenticity and examinability of an audio recording. The module detects signal features that reveal how the signal originated or how it may have been processed, which may be unknown or deliberately hidden. Complementing EdiTracker, it detects whether certain operations have been applied to a signal, using the following methods:
• Spoofing detection
• DC offset analysis
• Analysis of A/μ encoding traces
• Analysis of MP3 encoding traces
NEW
Expert spoofing detection analysis
This module analyses the audio recording to identify any abrupt change in DC offset, as this may be a sign of an integrity violation. If such a change is detected, the module highlights the corresponding areas.
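A minimal sketch of the idea, assuming float samples: compute the mean (DC component) of fixed-length frames and report the frames where it changes from the previous frame by more than a threshold. The frame length of 400 samples, the 0.02 threshold and the function name are hypothetical, and each frame should span whole periods of the dominant low-frequency content so that the mean is not biased.

```python
def dc_offset_jumps(samples, frame_len=400, threshold=0.02):
    """Indices of frames whose mean differs from the previous frame's by > threshold."""
    means = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        means.append(sum(frame) / frame_len)  # DC component of this frame
    return [i for i in range(1, len(means)) if abs(means[i] - means[i - 1]) > threshold]
```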
This module analyses the audio recording to detect areas with signs of A-law or μ-law (A/μ) encoding. The recording format does not reveal whether an audio recording has been processed with these codecs. If such encoding is detected, the module highlights the corresponding areas or the entire audio recording.
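One telltale trace of A/μ companding is that a decoded signal can take at most 256 distinct sample values, because each sample was stored as an 8-bit code. The sketch below approximates μ-law with the continuous μ = 255 curve rather than the exact G.711 segment table, and uses a crude distinct-level count as the test; both simplifications and the function names are assumptions, and any later gain change or resampling would defeat this heuristic. An A-law variant would differ only in the companding curve.

```python
import math

MU = 255.0

def mu_law_roundtrip(x):
    """Compress a sample in [-1, 1] with the continuous mu-law curve,
    quantise it to a signed 8-bit code and expand it again (approximates G.711)."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    code = round(y * 127)                      # signed code in -127..127
    y_q = code / 127
    return math.copysign((math.pow(1.0 + MU, abs(y_q)) - 1.0) / MU, y_q)

def looks_companded(samples, max_levels=256):
    """Heuristic: a decoded A-law/mu-law signal uses at most 256 distinct levels."""
    return len(set(samples)) <= max_levels
```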
DC offset analysis
Detection of A/μ coding
Detecting the disturbance of DC offset uniformity in two areas of the audio recording
Detecting A/μ coding areas
This module analyses the audio recording to identify signs of MP3 encoding. The recording format does not reveal whether an audio recording has been processed with this codec. If MP3 encoding is detected, the module displays a message describing the signs found, together with spectrograms, graphs and histograms that explain the algorithm's decision.
Detection of MP3 coding
Detecting MP3 coding
Sound Cleaner: Noise reduction and audio enhancement software
The examination of audio recordings usually requires a verbatim record, or transcription. Since audio recordings obtained in an operational context are often made in difficult conditions and are not readily intelligible, the first step is to clean the sound of noise. For this purpose, the IKAR Lab 3 suite can optionally be equipped with Sound Cleaner. It includes modern signal processing algorithms that effectively suppress broadband noise, tonal interference and impulse noise, and also perform frequency response correction, signal equalisation, etc.
To determine the characteristics of noise and interference, spectrograms can be built, including 3D FFT spectrograms. This increases the speed and accuracy of noise reduction.
NEW
A waveform and stereo signal processing scheme with filters and parameters pre-set by an expert
Simultaneous display of the spectrogram and waveform for quick assessment of signal and frequency processing needs
Sound Cleaner
All filters work in real time: the result can be heard as soon as a filter is added to the processing chain, so the user can select the optimal parameters by ear.
Sound Cleaner saves its processing results in WAV format and automatically generates comprehensive textual reports that log the process. The programme is compatible with any sound editor that supports the VST 3 format.
STC Auto Filter: Significantly reduces the level of the most common types of noise using a single control.
Cell phone Noise Filter: Reduces interference from the characteristic intermittent sounds of incoming mobile phone calls.
Broadband Noise Filter: Reduces room and street noise and interference from communication channels or recording equipment. Such noise takes the form of a hiss and cannot be suppressed by other methods, since its spectrum overlaps or coincides with that of the useful signal.
Inverse Filter: Equalises the frequency response of the communication channel through which the recording was made. The filter has two settings: amplification of weak and suppression of strong spectral components of the signal (flattening the average spectrum).
Tone Suppressor: Suppresses stationary narrow-band and regular interference (vibrations, mains pickup, noise from home appliances, slow music, the sound of a passing car, noise from water or a room, reverberation, etc.).
Reverb Suppressor: Increases speech intelligibility, decreases the level of reverberation in recordings and reduces listener fatigue by making reverberated useful speech easier to perceive despite additional noise.
Click Suppressor: Automatically restores speech or music signals distorted by impulse noise (clicks, radio noise, knocks, crackles, etc.).
Dynamic Range Control: Improves intelligibility when the signal level drops sharply, for example by amplifying a weak signal and attenuating a strong one to balance the amplitude of the output signal.
Equaliser: A 4096-band graphic equaliser with a built-in spectrograph for detailed spectrum correction in distorted recordings.
Clip Restorer: Restores overloaded recording fragments by reconstructing their waveforms.
Reference Noise Suppressor: Suppresses noise in the main channel that is also present in the reference channel (for example, TV or radio broadcasts, music, etc.).
DTMF Suppressor: Processes telephone dialling signals, which consist of a sequence of short rectangular pulses, each filled with two frequencies.
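Before DTMF tones can be suppressed, they have to be located. A common way to do that, shown here as an independent sketch rather than Sound Cleaner's actual method, is the Goertzel algorithm evaluated at the eight standard DTMF frequencies (four row tones from 697 to 941 Hz and four column tones from 1209 to 1633 Hz). The function names are hypothetical, and the sketch omits the energy and twist checks a real detector needs to reject speech.

```python
import math

ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal's spectrum at one frequency (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def detect_dtmf_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row_powers = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    row = max(range(4), key=row_powers.__getitem__)
    col = max(range(4), key=col_powers.__getitem__)
    return KEYS[row][col]
```

Goertzel is a natural fit here because only eight frequencies are of interest, so computing a full FFT per frame would be wasted work.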
Caesar: Audio recording transcription module
The module is designed to produce a verbatim transcription of recorded speech. The text is output to MS Word and automatically synchronised with the audio recording, which simplifies finding the corresponding audio fragment and editing the text. The ability to play back the recording and transcribe it in a single offline interface makes the expert’s work easier.
Audio recording text transcription
Caesar
STC-H246: Audio hardware
To guarantee the high quality of input and output signals, the IKAR Lab 3 suite is equipped with a professional STC-H246 audio hardware device.
STC-H246 is perfect for setting up a workstation for digitising analogue audio recordings. The device is designed to measure parameters and generate electrical signals in the audio frequency range.
Parameter | Value
Sampling rate | 8–200 kHz
ADC/DAC resolution | 16-bit, 24-bit
Signal-to-noise ratio in the end-to-end channel (20 Hz to 20 kHz band) | 105 dB
Input/output connector types | XLR, RCA, S/PDIF, TRS 6.3
Number of channels | 2
Power supply | 110/220 V, 60/50 Hz
Case | Metal
Size | 111×166×190 mm
Operating system | Windows 7, 8, 10
STC-H246
Speech Technology Center is a global developer of products and solutions based on intelligent speech technologies and voice and facial biometrics. Over the last 30 years, we have accumulated highly sought-after expertise in artificial intelligence and machine learning.
We hold a leading position in the NIST, ASVspoof Challenge, VOiCES and CHiME world ratings. So far, we have implemented projects for more than 5,000 clients in 75 countries.
Many of our solutions have been applied to great advantage in the public and commercial sectors, from small expert laboratories to complex national security systems all around the world, including in the USA, Latin America, the Middle East and Europe.
About the company
Images in this document are for example purposes only and may differ from the actual product. Full product specifications are available on request. Speech Technology Center Limited reserves the right to modify product characteristics and/or withdraw any product without prior notice.
St. Petersburg: Vyborgskaya Embankment 45, Bldg. E, Unit 1-Н, Off. 133, 194044, +7 812 325 8848
Moscow: 3 Marksistskaya St., Bldg. 2, 109147, +7 495 669