Tim Lindquist - About Me - Voice Recognition...Trained with 44 english sounds Python Code Found...

Voice Recognition

By: Tim Lindquist & Alex Christenson

Overview

● Project Objective

● Background

● Feature Extraction Process

● Feature Matching Process

● Implementation

● Demonstration

● Python

Objective

Develop a real-time speaker identification system using Python

Project Status:

MATLAB=working

Python=in progress

Background

Speaker Identification:

-determining who is speaking

Speaker Verification:

-accepting or rejecting the identity claim of a speaker

Speech Recognition vs. Speaker Recognition:

-identifying what is said vs. who said it

Overall Process

Feature Extraction

Input audio signal sampled at fs=10000 Hz

Human voice is assumed to have a maximum frequency of 3000 Hz, so the Nyquist rate is 6000 Hz and fs=10000 Hz satisfies it

Frame Blocking

Blocking: The signal is blocked into frames of N samples. Adjacent frames start M samples apart, so they overlap by N-M samples

N=256, M=100
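The blocking step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `block_frames` is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption of the sketch.

```python
def block_frames(signal, frame_len=256, hop=100):
    """Split a sequence into overlapping frames of frame_len samples.

    Consecutive frames start `hop` samples apart, so they share
    frame_len - hop samples. Trailing samples that do not fill a
    whole frame are dropped (an assumption of this sketch).
    """
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


# Stand-in for 0.1 s of audio sampled at 10000 Hz.
signal = list(range(1000))
frames = block_frames(signal)          # N=256, M=100 as on the slide
```

With N=256 and M=100, each frame shares its last 156 samples with the start of the next frame.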

Windowing

Each frame is windowed to minimize discontinuities at the end points of each frame

A Hamming window is applied over 0 ≤ n ≤ N-1
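A small sketch of the windowing step, assuming the standard Hamming definition w(n) = 0.54 - 0.46 cos(2πn/(N-1)). The slides name the window but do not give the formula, and the helper names here are hypothetical.

```python
import math


def hamming(n_samples):
    """Return the Hamming window w(n) for n = 0 .. n_samples - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame, window):
    """Multiply a frame by the window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


window = hamming(256)
windowed = apply_window([1.0] * 256, window)   # constant frame, tapered at both ends
```

The window tapers each frame toward 0.08 at both end points, which reduces the discontinuity between the last and first samples of a frame.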

FFT

DFT: computed with the FFT function, converts each frame from the time domain into the frequency domain
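To show what this step computes, here is a direct (slow) DFT in plain Python followed by the power spectrum of a frame. It is an illustrative sketch only: a real implementation would call an FFT routine such as `numpy.fft.rfft`, and the function names below are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def power_spectrum(frame):
    """Squared magnitude of the non-negative-frequency DFT bins."""
    n = len(frame)
    return [abs(c) ** 2 for c in dft(frame)[: n // 2 + 1]]


# A cosine that completes exactly 8 cycles in a 64-sample frame.
frame = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
power = power_spectrum(frame)
peak_bin = max(range(len(power)), key=power.__getitem__)
```

For a frame of N samples at sampling rate fs, bin k corresponds to the frequency k·fs/N, so the peak bin identifies the dominant frequency in the frame.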

Mel-Frequency Wrapping

Filterbank with triangular bandpass frequency response

Linear frequency spacing below 1000 Hz, logarithmic frequency spacing above 1000 Hz

Human speech is band-limited to roughly 300-3000 Hz

K = number of mel spectrum coefficients = 20
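The mel spacing can be sketched as below. The conversion formula mel(f) = 2595·log10(1 + f/700) is a common approximation and is an assumption here, since the slides do not state one. The function names and the 0 to fs/2 = 5000 Hz range are also illustrative choices.

```python
import math


def hz_to_mel(f_hz):
    """Common approximation of the mel scale (assumed, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centers(n_filters=20, f_low=0.0, f_high=5000.0):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def triangular_weight(f_hz, left, center, right):
    """Response of one triangular filter: 1 at center, 0 at and beyond the edges."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0


centers = filter_centers()   # K = 20 filters up to fs/2
```

Because the centers are evenly spaced in mel, they are closely spaced at low frequencies and spread further apart at high frequencies.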

Cepstrum

DCT: converts the log mel spectrum coefficients back to the time domain

Provides a good representation of the local spectral properties for a given frame

Output is a set of coefficients called an acoustic vector
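A minimal sketch of the cepstrum step, assuming the DCT is applied to the logarithm of the K mel filterbank energies and that the zeroth coefficient is dropped. The function name and the choice of 12 output coefficients are hypothetical.

```python
import math


def mfcc_from_mel_spectrum(mel_energies, n_coeffs=12):
    """DCT-II of the log mel energies, skipping the zeroth coefficient.

    mel_energies must be strictly positive (the log is undefined otherwise).
    """
    k_total = len(mel_energies)
    log_energies = [math.log(e) for e in mel_energies]
    return [sum(log_energies[k] * math.cos(n * (k + 0.5) * math.pi / k_total)
                for k in range(k_total))
            for n in range(1, n_coeffs + 1)]


# A perfectly flat mel spectrum has no spectral shape, so every
# coefficient above the zeroth comes out (numerically) zero.
flat = mfcc_from_mel_spectrum([2.0] * 20)

# A tilted spectrum produces non-zero coefficients.
tilted = mfcc_from_mel_spectrum([1.0 + k for k in range(20)])
```

The returned list plays the role of the acoustic vector for one frame.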

Feature Matching

Vector Quantization (VQ): Process of mapping vectors to a finite number of regions in space

Cluster: A region that the VQ maps vectors to

Codeword: center of a cluster

Codebook: collection of codewords
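In code, quantizing a vector amounts to finding its nearest codeword. The sketch below assumes Euclidean distance, which the slides do not specify, and uses hypothetical names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(vector, codebook):
    """Return (index of nearest codeword, distance to it)."""
    distances = [euclidean(vector, codeword) for codeword in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, distances[index]


codebook = [[0.0, 0.0], [10.0, 10.0]]        # two codewords (cluster centers)
index_a, dist_a = quantize([1.0, 1.0], codebook)
index_b, dist_b = quantize([9.0, 8.0], codebook)
```

The distance returned for each vector is its quantization distortion, which is the quantity compared across speakers in the matching step.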

Feature Matching

Speaker 1 - acoustic vectors (circles)

Speaker 2 - acoustic vectors (triangles)

Acoustic vectors form clusters of speaker samples

Codewords (black shapes) = centers of clusters

Codebook (yellow box) = collection of codewords

Clustering the Training Vectors

1. Design a 1-vector codebook

2. Split codebook according to rule

3. Search for the nearest neighbor

4. Update the centroid

5. Iterate 3, 4 until average distance < threshold (ε)

6. Iterate 2, 3 and 4 until a codebook size (M) is designed
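The six steps above can be sketched as follows. Two points are assumptions because the slides leave them open: the splitting rule (here each codeword y becomes y(1+ε) and y(1-ε), a common choice) and the stopping test (here the relative improvement in average distortion falling below a threshold). All names are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(vectors, codebook_size, split=0.01, threshold=1e-3):
    """Grow a codebook by repeated splitting; codebook_size should be a power of 2."""
    codebook = [centroid(vectors)]                         # step 1
    while len(codebook) < codebook_size:                   # step 6
        # Step 2: split every codeword (assumed multiplicative rule).
        codebook = [[c * (1 + s) for c in cw]
                    for cw in codebook for s in (split, -split)]
        previous = float("inf")
        while True:                                        # step 5
            clusters = [[] for _ in codebook]
            total = 0.0
            for v in vectors:                              # step 3
                dists = [sq_dist(v, cw) for cw in codebook]
                best = dists.index(min(dists))
                clusters[best].append(v)
                total += dists[best]
            # Step 4: move each codeword to its cluster centroid
            # (an empty cluster keeps its old codeword).
            codebook = [centroid(c) if c else cw
                        for c, cw in zip(clusters, codebook)]
            average = total / len(vectors)
            if previous - average <= threshold * max(average, 1e-12):
                break
            previous = average
    return codebook


training = [[0.9, 1.0], [1.1, 1.0], [1.0, 0.9], [1.0, 1.1],
            [4.9, 5.0], [5.1, 5.0], [5.0, 4.9], [5.0, 5.1]]
codebook = lbg(training, 2)
```

Note that the multiplicative split does not separate a codeword that sits exactly at the origin. An additive perturbation would be needed in that case.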

Implementation

Training Phase

● Input: signal used as reference for verification

● Output: vector quantized codebook

Process

1. Read audio signal

2. Block into frames of 256 samples

3. Apply a Hamming window to each frame

4. Compute DFT of each frame

5. Compute power spectrum & Mel filter

6. Take DCT to produce Mel frequency cepstral coefficients

7. Assemble codebook through the VQLBG algorithm

Testing Phase

● Input: new signal & reference codebook

● Output: the reference signal that matches

Process

1. Repeat training steps 1-6 on the new signal

2. Find minimum distance to codeword

3. Identify speaker from cluster
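The testing phase can be sketched as a comparison of average distortion against each trained codebook. This assumes one codebook per speaker stored in a dictionary and squared Euclidean distance. The names and the toy data are hypothetical.

```python
def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = 0.0
    for v in vectors:
        total += min(sum((x - y) ** 2 for x, y in zip(v, cw))
                     for cw in codebook)
    return total / len(vectors)


def identify_speaker(test_vectors, codebooks):
    """Return the key of the codebook with the lowest average distortion."""
    return min(codebooks,
               key=lambda name: average_distortion(test_vectors, codebooks[name]))


# Toy codebooks standing in for the output of the training phase.
codebooks = {
    "speaker_1": [[1.0, 1.0], [2.0, 2.0]],
    "speaker_2": [[8.0, 8.0], [9.0, 9.0]],
}
test_vectors = [[8.2, 7.9], [9.1, 9.0], [8.6, 8.4]]
best_match = identify_speaker(test_vectors, codebooks)
```

Averaging over all frames of the test signal makes the decision less sensitive to a few frames that happen to fall near another speaker's codewords.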

Demonstration

code = train('traindir2\', 2);

test('testdir2\', 2, code);

test('testdir1\', 4, code);

Trained with 44 English sounds

Python Code

Found libraries that use MATLAB commands

Manually rewriting scripts

So far

● Record audio from mic, automatically split when silence occurs

● Progress on the melfb and mfcc functions
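The slides do not describe how the recording is split on silence. One simple possibility, shown purely as an assumption, is to threshold the short-time energy of fixed-size frames and cut whenever enough consecutive frames fall below it. The function name, threshold, and frame counts below are hypothetical, and the sketch works on an already-recorded list of samples rather than a live microphone.

```python
def split_on_silence(samples, frame_len=256, energy_threshold=0.01,
                     min_silent_frames=3):
    """Split samples into segments separated by runs of low-energy frames.

    Frames whose mean squared amplitude is below energy_threshold are
    treated as silence and dropped. A segment ends once min_silent_frames
    silent frames occur in a row.
    """
    segments, current, silent_run = [], [], 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            current.extend(frame)
            silent_run = 0
        else:
            silent_run += 1
            if current and silent_run >= min_silent_frames:
                segments.append(current)
                current = []
    if current:
        segments.append(current)
    return segments


# Two loud frames, four silent frames, one loud frame.
recording = [0.5] * 512 + [0.0] * 1024 + [0.5] * 256
segments = split_on_silence(recording)
```

Each returned segment could then be passed through the feature extraction steps as a separate utterance.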

Sources

http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/

http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs

https://en.wikipedia.org/wiki/Vector_quantization




