Voice Recognition
By: Tim Lindquist & Alex Christenson
Overview
● Project Objective
● Background
● Feature Extraction Process
● Feature Matching Process
● Implementation
● Demonstration
● Python
Objective
Develop a real-time speaker identification system using Python
Project Status:
MATLAB=working
Python=in progress
Background
Speaker Identification:
- determining which speaker is talking
Speaker Verification:
- the process of accepting or rejecting a speaker's claimed identity
Speech Recognition vs. Speaker Recognition:
- identifying what is said vs. who said it
Overall Process
Feature Extraction
Input audio signal sampled at fs = 10000 Hz
Most of the information in human speech lies below about 3000 Hz, so fs = 10000 Hz exceeds the Nyquist rate of 2 × 3000 Hz = 6000 Hz
Frame Blocking
The signal is blocked into frames of N samples; each frame starts M samples after the previous one, so adjacent frames overlap by N - M samples
N = 256, M = 100
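A minimal NumPy sketch of frame blocking with the slide's values N = 256 and M = 100. It assumes trailing samples that do not fill a complete frame are simply dropped; the function name `frame_signal` is hypothetical, not the project's code.

```python
import numpy as np

def frame_signal(signal, N=256, M=100):
    """Split a 1-D signal into frames of N samples, each starting M samples after the previous one."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < N:
        return np.empty((0, N))
    num_frames = 1 + (len(signal) - N) // M
    # Row i holds samples [i*M, i*M + N); consecutive rows share N - M samples.
    idx = np.arange(N)[None, :] + M * np.arange(num_frames)[:, None]
    return signal[idx]

frames = frame_signal(np.arange(1000))
print(frames.shape)  # (8, 256)
```

Building an index matrix avoids a Python loop over frames; with 1000 samples the last full frame starts at sample 700.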
Windowing
Each frame is windowed to minimize the signal discontinuities at the beginning and end of the frame
Hamming window: w(n) = 0.54 - 0.46 cos(2πn/(N-1)), 0 ≤ n ≤ N-1
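A short sketch of the windowing step, writing out the standard Hamming formula explicitly and applying it to every frame (one frame per row). The helper names are hypothetical; NumPy's built-in `np.hamming` computes the same window and is used here only as a cross-check.

```python
import numpy as np

def hamming(N=256):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for 0 <= n <= N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def window_frames(frames):
    """Multiply every frame (one per row) by a Hamming window of matching length."""
    frames = np.asarray(frames, dtype=float)
    return frames * hamming(frames.shape[1])

w = hamming(256)
print(np.allclose(w, np.hamming(256)))  # True
```

The window tapers to 0.08 at both ends, which is what suppresses the jumps at the frame boundaries.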
FFT
DFT: the FFT algorithm converts each frame from the time domain into the frequency domain
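A minimal sketch of the FFT step using NumPy's `np.fft.rfft`, which returns only the nfft/2 + 1 non-negative frequency bins of a real signal. It also squares the magnitude to get the power spectrum used before the mel filterbank; the function name and the 1250 Hz test tone are illustrative choices.

```python
import numpy as np

def power_spectrum(windowed_frames, nfft=256):
    """Magnitude-squared FFT of each frame (one per row), keeping bins 0..nfft//2."""
    spectrum = np.fft.rfft(windowed_frames, n=nfft, axis=1)
    return np.abs(spectrum) ** 2

fs = 10000
n = np.arange(256)
tone = np.cos(2 * np.pi * 1250 * n / fs)   # 1250 Hz = 32 * fs / 256, i.e. exactly on bin 32
P = power_spectrum(tone[None, :])
print(P.shape, P[0].argmax())   # (1, 129) 32
```

Bin k corresponds to frequency k * fs / nfft, so with fs = 10000 Hz and nfft = 256 the bins are about 39 Hz apart and the last bin sits at 5000 Hz.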
Mel-Frequency Warping
Filterbank in which each filter has a triangular bandpass frequency response
Linear frequency spacing below 1000 Hz, logarithmic frequency spacing above 1000 Hz
Human speech is band-limited to roughly 300 to 3000 Hz
K = number of mel spectrum coefficients = 20
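A minimal sketch of a triangular mel filterbank with K = 20 filters for nfft = 256 and fs = 10000 Hz. It assumes the common mel formula mel(f) = 2595 log10(1 + f/700), which is roughly linear below 1000 Hz and logarithmic above, filters spanning 0 Hz to fs/2, and a peak height of 1. None of these choices are stated on the slide, and this is not the project's `melfb` function.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(num_filters=20, nfft=256, fs=10000, fmin=0.0, fmax=None):
    """Return a (num_filters, nfft//2 + 1) matrix of triangular filters equally spaced in mel."""
    if fmax is None:
        fmax = fs / 2.0
    # num_filters + 2 edge/center points, equally spaced on the mel scale
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), num_filters + 2))
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft   # frequency (Hz) of each rfft bin
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(num_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)      # 0 at left edge, 1 at center
        falling = (right - bin_freqs) / (right - center)   # 1 at center, 0 at right edge
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb = mel_filterbank()
print(fb.shape)  # (20, 129)
```

Multiplying a power spectrum by the transposed matrix (`P @ fb.T`) gives the K = 20 mel spectrum coefficients per frame.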
Cepstrum
DCT: converts the log mel spectrum coefficients back to the time domain
The result is a good representation of the local spectral properties of a given frame
The output for each frame is a set of coefficients called an acoustic vector
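A minimal sketch of the cepstrum step: take the log of the K mel spectrum coefficients and apply the DCT c_n = Σ_{k=1..K} log(S_k) cos(n (k - 1/2) π / K). The function name and the 1e-10 floor (to avoid log(0)) are assumptions; the result equals SciPy's unnormalized type-II DCT divided by 2.

```python
import numpy as np

def mel_cepstrum(mel_energies, num_coeffs=None):
    """DCT of the log mel spectrum, one acoustic vector per row."""
    log_s = np.log(np.maximum(np.asarray(mel_energies, dtype=float), 1e-10))  # floor avoids log(0)
    K = log_s.shape[-1]
    if num_coeffs is None:
        num_coeffs = K
    n = np.arange(num_coeffs)[:, None]          # coefficient index n = 0..num_coeffs-1
    k = np.arange(1, K + 1)[None, :]            # filter index k = 1..K
    basis = np.cos(n * (k - 0.5) * np.pi / K)   # (num_coeffs, K) cosine basis
    return log_s @ basis.T

flat = np.full((1, 20), np.e)   # every log mel energy is exactly 1
c = mel_cepstrum(flat)          # c_0 = 20, all other coefficients ~0
```

A flat spectrum only shows up in c_0, which tracks overall level; for that reason c_0 is often dropped from the acoustic vector.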
Feature Matching
Vector Quantization (VQ): the process of mapping vectors from a large vector space to a finite number of regions in that space
Cluster: a region that VQ maps vectors to
Codeword: the center of a cluster
Codebook: the collection of codewords
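A minimal sketch of the VQ mapping defined above, using Euclidean distance: each vector is assigned to the cluster of its nearest codeword, and the average distance to those codewords is the distortion. The function name and toy data are hypothetical.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each vector to its nearest codeword; return cluster indices and average distortion."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # dists[i, j] = Euclidean distance from vector i to codeword j
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                               # cluster index of each vector
    distortion = dists[np.arange(len(vectors)), nearest].mean()  # mean distance to assigned codeword
    return nearest, distortion

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 2.0]])
labels, d = quantize(vectors, codebook)
print(labels, d)   # [0 1 0] 1.333...
```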
Feature Matching
Speaker 1: acoustic vectors (circles)
Speaker 2: acoustic vectors (triangles)
Each speaker's acoustic vectors form clusters of speaker samples
Codewords (black shapes) = centers of the clusters
Codebook (yellow box) = collection of codewords
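Clustering the Training Vectors
1. Design a 1-vector codebook (the centroid of all training vectors)
2. Double the codebook size by splitting each codeword y according to the rule y(1 + ε) and y(1 - ε)
3. Nearest-neighbor search: assign each training vector to its closest codeword
4. Update each codeword to the centroid of the vectors assigned to it
5. Iterate steps 3 and 4 until the average distance falls below a threshold (ε)
6. Iterate steps 2, 3 and 4 until a codebook of size M is designed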
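A minimal NumPy sketch of the LBG steps above, assuming M is a power of 2 and ε = 0.01 serves as both the splitting factor and the stopping tolerance. It stops when the relative drop in average distance falls below ε, which may differ from the slide's exact stopping rule. The function name `lbg` is hypothetical, not the project's `vqlbg`.

```python
import numpy as np

def lbg(vectors, codebook_size, eps=0.01):
    """Design a codebook of `codebook_size` codewords (a power of 2) by LBG splitting."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = vectors.mean(axis=0, keepdims=True)            # step 1: 1-vector codebook
    while codebook.shape[0] < codebook_size:
        # step 2: split every codeword into y(1 + eps) and y(1 - eps)
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev_dist = np.inf
        while True:
            # step 3: nearest-neighbor search
            dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = dists.argmin(axis=1)
            avg_dist = dists[np.arange(len(vectors)), nearest].mean()
            # step 4: move each codeword to the centroid of its cluster (skip empty clusters)
            for j in range(codebook.shape[0]):
                members = vectors[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
            # step 5: stop once the average distance no longer improves by more than eps (relative)
            if prev_dist - avg_dist <= eps * avg_dist:
                break
            prev_dist = avg_dist
    return codebook

vectors = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
                    [9.0, 9.0], [9.2, 8.8], [8.8, 9.2]])
cb = lbg(vectors, 2)
print(cb[np.argsort(cb[:, 0])])   # codewords at the two cluster centers, (1, 1) and (9, 9)
```

The multiplicative split has one known weakness: a codeword that is exactly the zero vector splits into two identical copies. Adding a small random perturbation instead is a common workaround.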
Implementation
Training Phase
● Input: signal used as the reference for verification
● Output: vector-quantized codebook
Process
1. Read the audio signal
2. Block it into frames of 256 samples
3. Apply a Hamming window to each block
4. Compute the DFT of each block
5. Compute the power spectrum and apply the mel filterbank
6. Take the DCT to produce the mel-frequency cepstral coefficients
7. Assemble the codebook with the VQ LBG algorithm
Testing Phase
● Input: new signal & reference codebooks
● Output: the reference speaker that matches
Process
1. Repeat training steps 1-6 on the new signal
2. Find the minimum distance to the codewords
3. Identify the speaker from the cluster
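The testing phase's matching steps (2 and 3) can be sketched as follows. The sketch assumes each reference speaker has one codebook and that the match is the codebook with the lowest average nearest-codeword distance over the test utterance. `identify_speaker` and the toy codebooks are hypothetical.

```python
import numpy as np

def identify_speaker(test_vectors, codebooks):
    """Return the index of the best-matching codebook and the distortion for every codebook."""
    test_vectors = np.asarray(test_vectors, dtype=float)
    distortions = []
    for codebook in codebooks:
        dists = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=2)
        # distance from each test vector to its nearest codeword, averaged over the utterance
        distortions.append(dists.min(axis=1).mean())
    return int(np.argmin(distortions)), distortions

codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]),      # hypothetical speaker 0
             np.array([[10.0, 10.0], [11.0, 11.0]])]  # hypothetical speaker 1
speaker, dist = identify_speaker(np.array([[10.2, 10.1], [10.9, 11.0]]), codebooks)
print(speaker)   # 1
```

Averaging over all test vectors, rather than trusting a single frame, makes the decision robust to a few frames that land near another speaker's codewords.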
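Demonstration
code = train('traindir2\', 2);   % build codebooks from the 2 training files in traindir2
test('testdir2\', 2, code);      % identify the speakers of the 2 test files in testdir2
test('testdir1\', 4, code);      % identify the speakers of the 4 test files in testdir1
Trained with 44 English sounds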
Python Code
Found libraries that use MATLAB-style commands
Manually rewriting the MATLAB scripts
So far:
● Record audio from the mic, automatically splitting when silence occurs
● In progress: porting the melfb and mfcc functions
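A minimal sketch of silence-based splitting applied to an already-recorded NumPy array. It assumes a short-time energy threshold decides speech vs. silence. Microphone capture is not shown, and the function name, frame length, energy threshold, and minimum-silence duration are hypothetical choices, not the project's code.

```python
import numpy as np

def split_on_silence(signal, fs=10000, frame_ms=20, energy_threshold=1e-3, min_silence_ms=200):
    """Cut a signal into speech segments separated by at least `min_silence_ms` of low energy."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(fs * frame_ms / 1000)
    num_frames = len(signal) // frame_len
    frames = signal[:num_frames * frame_len].reshape(num_frames, frame_len)
    voiced = (frames ** 2).mean(axis=1) > energy_threshold   # short-time energy per frame
    min_silent_frames = max(1, int(min_silence_ms / frame_ms))
    segments, start, silent_run = [], None, 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i                      # a new segment begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                end = i - silent_run + 1       # first silent frame ends the segment
                segments.append(signal[start * frame_len:end * frame_len])
                start, silent_run = None, 0
    if start is not None:                      # close a segment still open at the end
        end = num_frames - silent_run
        segments.append(signal[start * frame_len:end * frame_len])
    return segments

fs = 10000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(5000) / fs)
audio = np.concatenate([tone, np.zeros(5000), tone[:3000]])   # speech, 0.5 s silence, speech
segments = split_on_silence(audio, fs)
print([len(s) for s in segments])   # [5000, 3000]
```

Requiring a minimum run of silent frames keeps short pauses inside a word from splitting it into separate segments.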
Sources
http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/
http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs
https://en.wikipedia.org/wiki/Vector_quantization