Introduction to Audio Signal Processing

Human-Computer Interaction

Angelo Antonio Salatino

http://infernusweb.altervista.org

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Overview

• Audio Signal Processing;

• Waveform Audio File Format;

• FFmpeg;

• Audio Processing with Matlab;

• Doing phonetics with Praat;

• Last but not least: Homework.

Audio Signal Processing

• Audio signal processing is an engineering field that focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.

[Diagram: Input Signal → Audio Signal Processing → Output Signal (data with meaning)]

Audio Processing in HCI

Some HCI applications involving audio signal processing are:

• Speech Emotion Recognition

• Speaker Recognition

▫ Speaker Verification

▫ Speaker Identification

• Voice Commands

• Speech to Text

• Etc.

Audio Signals

You can find audio signals represented in either digital or analog format.

• Digital – the pressure wave-form is a sequence of symbols, usually binary numbers.

• Analog – is a smooth wave of energy represented by a continuous stream of data.

Analog to Digital Converter (ADC)

• Don’t worry, it’s only a fast review!!!

Analog Signal → Sample & Hold → Quantization → Encoding → Digital Signal

• Analog Signal: continuous in time, continuous in amplitude.

• After Sample & Hold: discrete in time, continuous in amplitude. The sampling frequency must be defined.

• After Quantization: discrete in time, discrete in amplitude. The number of bits per sample must be defined.

• After Encoding: discrete in time, discrete in amplitude.

• For each measurement a number is assigned according to its amplitude.

• The sampling frequency and the number of bits used to represent a sample are the main features of a digital signal.

• How are these digital signals stored?
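As a small illustration of the quantization stage, the sketch below maps an amplitude to an integer code with a given number of bits. The function name `quantize`, the normalized input range [-1, 1] and the rounding rule are assumptions made for this example; a real converter may use a different mapping.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Uniform quantization sketch (hypothetical helper, not part of the slides):
// maps an amplitude in [-1.0, 1.0] to a signed integer code of `bits` bits.
// Amplitudes outside the range are clipped.
std::int32_t quantize(double amplitude, int bits) {
    assert(bits >= 2 && bits <= 24);
    // Largest positive code, e.g. 32767 for 16 bits.
    const double max_code = static_cast<double>((1 << (bits - 1)) - 1);
    if (amplitude > 1.0) amplitude = 1.0;
    if (amplitude < -1.0) amplitude = -1.0;
    return static_cast<std::int32_t>(std::lround(amplitude * max_code));
}
```

With 16 bits, a full-scale amplitude of 1.0 becomes 32767; with 8 bits it becomes 127, so fewer bits mean coarser amplitude steps.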

Waveform Audio File Format (WAV)

Endianness  Byte Offset  Field Name     Field Size (bytes)  Description
Big         0            ChunkID        4                   RIFF Chunk Descriptor
Little      4            ChunkSize      4                   RIFF Chunk Descriptor
Big         8            Format         4                   RIFF Chunk Descriptor
Big         12           SubChunk1ID    4                   Format SubChunk
Little      16           SubChunk1Size  4                   Format SubChunk
Little      20           AudioFormat    2                   Format SubChunk
Little      22           NumChannels    2                   Format SubChunk
Little      24           SampleRate     4                   Format SubChunk
Little      28           ByteRate       4                   Format SubChunk
Little      32           BlockAlign     2                   Format SubChunk
Little      34           BitsPerSample  2                   Format SubChunk
Big         36           SubChunk2ID    4                   Data SubChunk
Little      40           SubChunk2Size  4                   Data SubChunk
Little      44           Data           SubChunk2Size       Data SubChunk

The Wav file is an instance of a Resource Interchange File Format (RIFF) defined by IBM and Microsoft. The RIFF is a generic file container format for storing data in tagged chunks (basic building blocks). It is a file structure that defines a class of more specific file formats, such as: wav, avi, rmi, etc.

Waveform Audio File Format (WAV)

ChunkID: Contains the letters «RIFF» in ASCII form (0x52494646 in big-endian form).

ChunkSize: The size of the rest of the chunk following this number, that is, the size of the entire file in bytes minus 8 for the two fields not included: ChunkID and ChunkSize.

Format: Contains the letters «WAVE» in ASCII form (0x57415645 in big-endian form).

Waveform Audio File Format (WAV)

SubChunk1ID: Contains the letters «fmt » in ASCII form (0x666d7420 in big-endian form).

SubChunk1Size: 16 for PCM. This is the size of the SubChunk which follows this number.

Waveform Audio File Format (WAV)

AudioFormat: Format code or compression type:

• PCM = 0x0001 (linear quantization, uncompressed)

• IEEE_FLOAT = 0x0003

• Microsoft_ALAW = 0x0006

• Microsoft_MLAW = 0x0007

• IBM_ADPCM = 0x0103

• …

NumChannels: Mono = 1, Stereo = 2, etc. Note: channels are interleaved.

Waveform Audio File Format (WAV)

SampleRate: Sampling frequency in Hz: 8000, 16000, 44100, etc.

ByteRate: Average bytes per second. It is typically determined by Equation 1.

$$\text{ByteRate} = \frac{\text{SampleRate} \cdot \text{NumChannels} \cdot \text{BitsPerSample}}{8} \tag{1}$$

BlockAlign: The number of bytes for one sample including all channels. It is determined by Equation 2.

$$\text{BlockAlign} = \frac{\text{NumChannels} \cdot \text{BitsPerSample}}{8} \tag{2}$$

Waveform Audio File Format (WAV)

BitsPerSample: 8 bits = 8, 16 bits = 16, etc.

SubChunk2ID: Contains the letters «data» in ASCII form (0x64617461 in big-endian form).

SubChunk2Size: The number of bytes in the Data field. If AudioFormat = PCM, then you can compute the number of samples per channel (see Equation 3).

$$\text{NumOfSamples} = \frac{8 \cdot \text{SubChunk2Size}}{\text{NumChannels} \cdot \text{BitsPerSample}} \tag{3}$$
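The three relations among the header fields (ByteRate, BlockAlign and the number of samples for PCM data) can be written as small functions. This is only a sketch: the function names are chosen for this example, and a 64-bit intermediate is used so the products cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Equation 1: average number of bytes per second.
std::uint32_t byte_rate(std::uint32_t sample_rate, std::uint16_t num_channels,
                        std::uint16_t bits_per_sample) {
    const std::uint64_t bits = static_cast<std::uint64_t>(sample_rate) *
                               num_channels * bits_per_sample;
    return static_cast<std::uint32_t>(bits / 8);
}

// Equation 2: bytes for one sample including all channels.
std::uint16_t block_align(std::uint16_t num_channels, std::uint16_t bits_per_sample) {
    return static_cast<std::uint16_t>(num_channels * bits_per_sample / 8);
}

// Equation 3: number of samples per channel (valid for PCM data).
std::uint32_t num_samples(std::uint32_t subchunk2_size, std::uint16_t num_channels,
                          std::uint16_t bits_per_sample) {
    assert(num_channels > 0 && bits_per_sample > 0);
    const std::uint64_t bits = 8ull * subchunk2_size;
    return static_cast<std::uint32_t>(bits / (num_channels * bits_per_sample));
}
```

For a mono, 16-bit file sampled at 16000 Hz these give a ByteRate of 32000 and a BlockAlign of 2; a Data field of 66034 bytes then holds 33017 samples.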

Example of wave header

Bytes (hex)    Field           Value

Chunk Descriptor
52 49 46 46    ChunkID         "RIFF"
16 02 01 00    ChunkSize       66070
57 41 56 45    Format          "WAVE"

Fmt SubChunk
66 6d 74 20    SubChunk1ID     "fmt "
10 00 00 00    SubChunk1Size   16
01 00          AudioFormat     1 (PCM)
01 00          NumChannels     1
80 3e 00 00    SampleRate      16000
00 7d 00 00    ByteRate        32000
02 00          BlockAlign      2
10 00          BitsPerSample   16

Data SubChunk
64 61 74 61    SubChunk2ID     "data"
f2 01 01 00    SubChunk2Size   66034
…              Data
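To see how the little-endian fields of this example turn into the listed values, the sketch below decodes the 44 header bytes one byte at a time. The names `read_le16`, `read_le32`, `WavInfo` and `parse_header` are hypothetical helpers for this illustration. It assumes the canonical layout where the data SubChunk starts at offset 36, which real files do not always follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read little-endian unsigned values starting at `p`.
std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

struct WavInfo {
    std::uint32_t chunk_size, sample_rate, byte_rate, subchunk2_size;
    std::uint16_t audio_format, num_channels, block_align, bits_per_sample;
};

// Decode the numeric fields at the byte offsets of the canonical header.
WavInfo parse_header(const unsigned char* h, std::size_t size) {
    assert(size >= 44);
    WavInfo w;
    w.chunk_size      = read_le32(h + 4);
    w.audio_format    = read_le16(h + 20);
    w.num_channels    = read_le16(h + 22);
    w.sample_rate     = read_le32(h + 24);
    w.byte_rate       = read_le32(h + 28);
    w.block_align     = read_le16(h + 32);
    w.bits_per_sample = read_le16(h + 34);
    w.subchunk2_size  = read_le32(h + 40);
    return w;
}

// The 44 header bytes of the example.
const unsigned char kExampleHeader[44] = {
    0x52, 0x49, 0x46, 0x46, 0x16, 0x02, 0x01, 0x00, 0x57, 0x41, 0x56, 0x45,
    0x66, 0x6d, 0x74, 0x20, 0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
    0x80, 0x3e, 0x00, 0x00, 0x00, 0x7d, 0x00, 0x00, 0x02, 0x00, 0x10, 0x00,
    0x64, 0x61, 0x74, 0x61, 0xf2, 0x01, 0x01, 0x00};
```

Calling `parse_header(kExampleHeader, 44)` reproduces the values of the example, for instance bytes `80 3e 00 00` give 0x00003e80 = 16000. Reading byte by byte also avoids depending on the host's endianness or on struct padding.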

Exercise

For the next 15 min, write a C/C++ program that takes a wav file as input and prints the following values on standard output:

• Header size;

• Sample rate;

• Bits per sample;

• Number of channels;

• Number of samples.

Good work!

Solution

typedef struct header_file
{
    char chunk_id[4];
    int chunk_size;
    char format[4];
    char subchunk1_id[4];
    int subchunk1_size;
    short int audio_format;
    short int num_channels;
    int sample_rate;
    int byte_rate;
    short int block_align;
    short int bits_per_sample;
    char subchunk2_id[4];
    int subchunk2_size;
} header;

/************** Inside Main() **************/
header* meta = new header;
ifstream infile;
infile.exceptions (ifstream::eofbit | ifstream::failbit | ifstream::badbit);
infile.open("foo.wav", ios::in|ios::binary);
infile.read ((char*)meta, sizeof(header));
cout << " Header size: "<<sizeof(*meta)<<" bytes" << endl;
cout << " Sample Rate "<< meta->sample_rate <<" Hz" << endl;
cout << " Bits per samples: " << meta->bits_per_sample << " bit" <<endl;
cout << " Number of channels: " << meta->num_channels << endl;
long numOfSample = (meta->subchunk2_size/meta->num_channels)/(meta->bits_per_sample/8);
cout << " Number of samples: " << numOfSample << endl;

However, this solution contains an error. Can you spot it?

What about reading samples?

short int* pU = NULL;
unsigned char* pC = NULL;
gWavDataIn = new double*[meta->num_channels]; // data structure storing the samples, one array per channel
for (int i = 0; i < meta->num_channels; i++) gWavDataIn[i] = new double[numOfSample];
wBuffer = new char[meta->subchunk2_size]; // data structure storing the bytes
infile.read(wBuffer, meta->subchunk2_size); // load the raw bytes that follow the header

/* data conversion: from byte to samples */
/* channels are interleaved: sample i of channel j is at index i * num_channels + j */
if(meta->bits_per_sample == 16)
{
    pU = (short*) wBuffer;
    for( int i = 0; i < numOfSample; i++)
        for (int j = 0; j < meta->num_channels; j++)
            gWavDataIn[j][i] = (double) (pU[i * meta->num_channels + j]);
}
else if(meta->bits_per_sample == 8)
{
    pC = (unsigned char*) wBuffer;
    for( int i = 0; i < numOfSample; i++)
        for (int j = 0; j < meta->num_channels; j++)
            gWavDataIn[j][i] = (double) (pC[i * meta->num_channels + j]);
}
else
{
    printERR("Unhandled case");
}

This solution is available at: https://github.com/angelosalatino/AudioSignalProcessing
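Because channels are interleaved in the Data field, the byte-to-sample conversion has to pick every NumChannels-th sample for each channel. The following self-contained sketch shows this for 16-bit PCM. The function name `deinterleave16` and the use of `std::vector` are choices made for this example and are not taken from the linked repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved 16-bit little-endian PCM bytes into one vector per channel.
// Trailing bytes that do not form a complete multi-channel sample are ignored.
std::vector<std::vector<double>> deinterleave16(const std::vector<unsigned char>& bytes,
                                                int num_channels) {
    assert(num_channels > 0);
    const std::size_t block = 2 * static_cast<std::size_t>(num_channels); // BlockAlign
    const std::size_t count = bytes.size() / block;                       // samples per channel
    std::vector<std::vector<double>> out(num_channels, std::vector<double>(count));
    for (std::size_t i = 0; i < count; ++i) {
        for (int j = 0; j < num_channels; ++j) {
            const std::size_t k = i * block + 2 * static_cast<std::size_t>(j);
            // Assemble the two bytes (low byte first), then reinterpret as signed.
            const std::uint16_t u =
                static_cast<std::uint16_t>(bytes[k] | (bytes[k + 1] << 8));
            out[j][i] = static_cast<double>(static_cast<std::int16_t>(u));
        }
    }
    return out;
}
```

For a stereo buffer holding the byte pairs `01 00`, `ff ff`, `02 00`, `fe ff`, the first channel receives 1 and 2 and the second channel receives -1 and -2.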

A better solution: FFmpeg

What FFmpeg says about itself:

• FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. No matter if they were designed by some standards committee, the community or a corporation.

Why is FFmpeg better?

• Off-the-shelf;

• Open Source;

• We can read samples from different kinds of formats: wav, mp3, aac, flac and so on;

• The code is always the same for all these audio formats;

• It can also decode video formats.

A little bit of code …

Step 1

• Create AVFormatContext

▫ Format I/O context: nb_streams, filename, start_time, duration, bit_rate, audio_codec_id, video_codec_id and so on.

• Open file

AVFormatContext* formatContext = NULL;

av_open_input_file(&formatContext,"foo.wav",NULL,0,NULL);

A little bit of code …

Step 2

• Create AVStream

▫ Stream structure; It contains: nb_frames, codec_context, duration and so on;

• Association between audio stream inside the context and the new one.

// Find the audio stream (some container files can have multiple streams in them)
AVStream* audioStream = NULL;
for (unsigned int i = 0; i < formatContext->nb_streams; ++i)
    if (formatContext->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO)
    {
        audioStream = formatContext->streams[i];
        break;
    }

A little bit of code …

Step 3

• Create AVCodecContext

▫ Main external API structure; it contains: codec_name, codec_id and so on.

• Create AVCodec

▫ Codec structure; it contains deep level information about the codec.

• Find codec availability

• Open Codec

AVCodecContext* codecContext = audioStream->codec;

AVCodec* codec = avcodec_find_decoder(codecContext->codec_id);

avcodec_open(codecContext,codec);

A little bit of code …

Step 4

• Create AVPacket

▫ This structure stores compressed data.

• Create AVFrame

▫ This structure describes decoded (raw) audio or video data.

AVPacket packet;

av_init_packet(&packet);

AVFrame* frame = avcodec_alloc_frame();

A little bit of code …

Step 5

• Read packets

▫ Packets are read from AVFormatContext

• Decode packets

▫ Frames are decoded with AVCodecContext

// Read the packets in a loop
while (av_read_frame(formatContext, &packet) == 0)
{
    …
    avcodec_decode_audio4(codecContext, frame, &frameFinished, &packet);
    …
    src_data = frame->data[0];
}

Problems with FFmpeg

• Update issues (after a library update, your previous code might not work)

▫ Deprecated methods;

▫ Function names or parameters could change.

• Poor documentation (until today)

Example of migration:

• avcodec_open (AVCodecContext *avctx, const AVCodec *codec)

• avcodec_open2 (AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options)

Audio Processing with Matlab

• Matlab contains a lot of built-in functions to read, play, manipulate and save audio files.

• It also offers the Signal Processing Toolbox and the DSP System Toolbox.

Advantages

• Well documented;

• It works on different levels of abstraction;

• Direct access to samples;

• Coding is simple.

Disadvantages

• Only wave, flac, mp3, mpeg-4 and ogg formats are recognized in audioread (Is it really a disadvantage?);

• License is expensive.

Let’s code: Opening files

%% Reading file
% Section ID = 1
filename = './test.wav';
[data,fs] = wavread(filename); % reads only wav files
% data = sample collection, fs = sampling frequency
% or ---> [data,fs] = audioread(filename);
% write an audio file
audiowrite('./testCopy.wav',data,fs)

Recognized formats by audioread()

Information and play

%% Information & play
% Section ID = 2
numberOfSamples = length(data);
tempo = numberOfSamples / fs;
disp (sprintf('Length: %f seconds',tempo));
disp (sprintf('Number of Samples %d', numberOfSamples));
disp (sprintf('Sampling Frequency %d Hz',fs));
disp (sprintf('Number of Channels: %d', min(size(data))));
% play file
sound(data,fs);
% PLOT the signal
time = linspace(0,tempo,numberOfSamples);
plot(time,data);

Framing

%% Framing
% Section ID = 4
timeWindow = 0.04; % Frame length in seconds. Default: timeWindow = 40 ms
timeStep = 0.01; % seconds between two frames. Default: timeStep = 10 ms (in case of OVERLAPPING)
overlap = 1; % 1 in case of overlap, 0 no overlap
sampleForWindow = timeWindow * fs;
if overlap == 0
    Y = buffer(data,sampleForWindow);
else
    sampleToJump = sampleForWindow - timeStep * fs; % number of samples shared by consecutive frames
    Y = buffer(data,sampleForWindow,ceil(sampleToJump));
end
[m,n]=size(Y); % m corresponds to sampleForWindow
numFrames = n;
disp(sprintf('Number of Frames: %d',numFrames));

$$s(t) = x(t) \cdot \mathrm{rect}\left(\frac{t - \tau}{\#\text{sample}}\right)$$
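The same framing idea can be sketched outside Matlab as a loop that copies a window of samples and then advances by a fixed step. The function name `make_frames` is hypothetical, and unlike Matlab's `buffer` this sketch does not zero-pad: it simply keeps the complete frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split `x` into frames of `frame_len` samples, advancing `hop` samples between
// consecutive frames (hop < frame_len gives overlapping frames).
// Only complete frames are kept; leftover samples at the end are dropped.
std::vector<std::vector<double>> make_frames(const std::vector<double>& x,
                                             std::size_t frame_len, std::size_t hop) {
    assert(frame_len > 0 && hop > 0);
    std::vector<std::vector<double>> frames;
    for (std::size_t start = 0; start + frame_len <= x.size(); start += hop) {
        frames.emplace_back(x.begin() + start, x.begin() + start + frame_len);
    }
    return frames;
}
```

With fs = 16000 Hz, a 40 ms window and a 10 ms step correspond to `frame_len = 640` and `hop = 160`.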

Windowing

%% Windowing
% Section ID = 5
num_points = sampleForWindow;
% some windows (see: help window)
w_gauss = gausswin(num_points);
w_hamming = hamming(num_points);
w_hann = hann(num_points);
plot(1:num_points,[w_gauss,w_hamming, w_hann]);
axis([1 num_points 0 2]);
legend('Gaussian','Hamming','Hann');
old_Y = Y;
for i=1:numFrames
    Y(:,i)=Y(:,i).*w_hann;
end
% see the difference
index_to_plot = 88;
figure
plot (old_Y(:,index_to_plot))
hold on
plot (Y(:,index_to_plot), 'green')
hold off
clear num_points w_gauss w_hamming w_hann

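For $0 \le n \le N-1$:

$$w_{\text{GAUSS}}(n) = e^{-\frac{1}{2}\left(\frac{n - (N-1)/2}{\sigma (N-1)/2}\right)^{2}}, \quad \sigma \le 0.5$$

$$w_{\text{HAMMING}}(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$$

$$w_{\text{HANN}}(n) = 0.5\left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right)$$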
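For readers working outside Matlab, the Hann and Hamming windows can be generated directly. The sketch below uses the convention n = 0 … N−1, in which the cosine term is subtracted so that the window is smallest at its two ends and largest in the middle. The function names are chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generalized cosine window: w(n) = a0 - (1 - a0) * cos(2*pi*n / (N - 1)),
// for n = 0 .. N-1.
std::vector<double> cosine_window(std::size_t n_points, double a0) {
    assert(n_points >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> w(n_points);
    for (std::size_t n = 0; n < n_points; ++n) {
        const double phase = 2.0 * pi * static_cast<double>(n) /
                             static_cast<double>(n_points - 1);
        w[n] = a0 - (1.0 - a0) * std::cos(phase);
    }
    return w;
}

// Hann: a0 = 0.5, Hamming: a0 = 0.54.
std::vector<double> hann_window(std::size_t n_points) { return cosine_window(n_points, 0.5); }
std::vector<double> hamming_window(std::size_t n_points) { return cosine_window(n_points, 0.54); }
```

Multiplying each frame element-wise by such a window tapers the frame edges, which is what the loop over `Y(:,i).*w_hann` does in the Matlab section.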

Energy

%% Energy
% Section ID = 6
% It requires that signal is already framed
% Run Section ID=4
for i=1:numFrames
    energy(i)=sum(abs(old_Y(:,i)).^2);
end
figure, plot(energy)

$$E = \sum_{i=1}^{N} |x(i)|^{2}$$

Fast Fourier Transform (FFT)

%% Fast Fourier Transform (on the whole signal)
% Section ID = 7
NFFT = 2^nextpow2(numberOfSamples); % Next higher power of 2. (in order to optimize FFT computation)
freqSignal = fft(data,NFFT);
f = fs/2*linspace(0,1,NFFT/2+1);
% PLOT
plot(f,abs(freqSignal(1:NFFT/2+1)))
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
clear NFFT freqSignal f
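To make clear what `fft` returns, the sketch below computes the same quantity with the direct definition of the discrete Fourier transform. It is deliberately the slow O(N²) form, not an FFT algorithm, and the function name `dft` is chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct DFT: X[k] = sum over n of x[n] * exp(-j * 2*pi*k*n / N).
// An FFT computes the same values in O(N log N) instead of O(N^2).
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    assert(!x.empty());
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            const double angle = -2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(n) / static_cast<double>(N);
            X[k] += x[n] * std::polar(1.0, angle);
        }
    }
    return X;
}
```

Bin k corresponds to the frequency k·fs/N, which is why the Matlab code plots only the first NFFT/2+1 bins against a frequency axis from 0 to fs/2.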

Short Term Fourier Transform (STFT)

%% Short Term Fourier Transform
% Section ID = 8
% It requires that signal is already framed. Run Section ID=4
NFFT = 2^nextpow2(sampleForWindow);
STFT = ones(NFFT,numFrames);
for i=1:numFrames
    STFT(:,i)=fft(Y(:,i),NFFT);
end
indexToPlot = 80; % frame index to plot
if indexToPlot < numFrames
    f = fs/2*linspace(0,1,NFFT/2+1);
    plot(f,2*abs(STFT(1:NFFT/2+1,indexToPlot))) % PLOT
    title(sprintf('FFT of frame %d', indexToPlot));
    xlabel('Frequency (Hz)')
    ylabel(sprintf('|STFT_{%d}(f)|',indexToPlot))
else
    disp('Unable to create plot');
end
% *********************************************
specgram(data,sampleForWindow,fs) % SPECTROGRAM
title('Spectrogram [dB]')

Auto-correlation

%% Auto-correlation per frame
% Section ID = 9
% It requires that signal is already framed
% Run Section ID=4
for i=1:numFrames
    autoCorr(:,i)=xcorr(Y(:,i));
end
indexToPlot = 80; % frame index to plot
if indexToPlot < numFrames
    % PLOT (non-negative lags of the selected frame)
    plot(autoCorr(sampleForWindow:end,indexToPlot))
else
    disp('Unable to create plot');
end
clear indexToPlot

$$R_x(n) = \sum_{i=1}^{N} x(i) \cdot x(i+n)$$
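The autocorrelation sum can also be written as two nested loops. This sketch assumes that terms whose index i + n falls outside the frame are simply left out. The function name `autocorrelation` and the `max_lag` parameter are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// R_x(n) = sum over i of x(i) * x(i + n), for lags n = 0 .. max_lag.
// Products that would read past the end of the frame are skipped.
std::vector<double> autocorrelation(const std::vector<double>& x, std::size_t max_lag) {
    assert(max_lag < x.size());
    std::vector<double> r(max_lag + 1, 0.0);
    for (std::size_t n = 0; n <= max_lag; ++n) {
        for (std::size_t i = 0; i + n < x.size(); ++i) {
            r[n] += x[i] * x[i + n];
        }
    }
    return r;
}
```

At lag 0 the result equals the frame energy, the sum of the squared samples. For a periodic frame the autocorrelation peaks again at lags that are multiples of the period.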

A system for doing phonetics: Praat

• PRAAT is a comprehensive speech analysis, synthesis, and manipulation package developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam, The Netherlands.

Pitch with Praat

Formants with Praat

[Figure: formants in Praat, labeled 1st to 5th]

Other features with Praat

• Intensity;

• Mel-Frequency Cepstral Coefficients (MFCC);

• Linear Predictive Coefficients (LPC);

• Harmonics-to-Noise Ratio (HNR);

• and many others.

Scripting in Praat

• Praat can run scripts containing all the different commands available in its environment and perform the operations and functionalities that they represent.

Here is an example that produces a pitch listing and saves it in a text file.

fileName$ = "test.wav"
Read from file... 'fileName$'
name$ = fileName$ - ".wav"
select Sound 'name$'
To Pitch (ac)... 0.0 50.0 15 off 0.1 0.60 0.01 0.35 0.14 500.0
numFrame=Get number of frames
for i to numFrame
    time=Get time from frame number... i
    value=Get value in frame... i Hertz
    if value = undefined
        value=0
    endif
    path$=name$+"_pitch.txt"
    fileappend 'path$' 'time' 'value' 'newline$'
endfor
select Pitch 'name$'
Remove
select Sound 'name$'
Remove

Homework

• Exercise 1) Consider a speech signal containing silence, unvoiced and voiced regions, as shown in the figure, and write a Matlab function (or use whatever language you prefer) capable of identifying these sections.

• Exercise 2) Then, in the voiced regions, identify the fundamental frequency, the so-called pitch.

Please, try this at home!!

[Figure: speech waveform with its Voiced, Unvoiced and Silence regions labeled]

References and further reading

• Signal Processing

▫ http://deecom19.poliba.it/dsp/Teoria_dei_Segnali.pdf (Italian)

• WAV

▫ https://ccrma.stanford.edu/courses/422/projects/WaveFormat/

▫ http://www.onicos.com/staff/iz/formats/wav.html

• MATLAB

▫ http://www.mathworks.com/products/signal/

▫ http://www.mathworks.com/products/dsp-system/

▫ http://homepages.udayton.edu/~hardierc/ece203/sound.htm

▫ http://www.utdallas.edu/~assmann/hcs7367/classnotes.html