Voice Recognition
Josh Lintag, Regie Longoria, Ryan Mendez
Initial Problem
• Problems with variation
  – Sample length and emphasis
  – Time domain issue: starting and ending at the same time
• Program design
  – Using the frequency domain to compare
  – Take an average of voice
Basic Recording
• Create a for loop for recording 10 different samples of voice to be averaged
for i = 1:10
    file = sprintf('%s%d.wav','g',i);   % g1.wav ... g10.wav
    input('You have 2 seconds to say your name. Press enter when ready to record--> ');
    y = wavrecord(88200,44100);         % record 88200 samples at 44100 Hz
    sound(y,44100);                     % play the recording back
    wavwrite(y,44100,file);             % save it to disk
end
• Each recording is written to the WAV file named by the variable "file" (g1.wav through g10.wav)
Basic Recording 2
• You're probably wondering what this line means: y = wavrecord(88200,44100); This line sets the length of the recording. The first argument (88200) is the number of samples to record, and the second (44100) is the sampling rate in Hz, that is, samples per second. Dividing 88200 samples by 44100 samples per second gives you two seconds.
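The arithmetic above can be checked in a couple of lines. This is only a sketch; the variable names nSamples and duration_s are made up for illustration and do not appear in the original program.

```matlab
% Duration of a recording = number of samples / sampling rate.
fs = 44100;                   % sampling rate in Hz (samples per second)
nSamples = 88200;             % first argument passed to wavrecord
duration_s = nSamples / fs;   % length of the recording in seconds
fprintf('Recording length: %g seconds\n', duration_s);
```

Going the other way, a recording of T seconds needs T*fs samples.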
Coding of the Action
name = input('Enter the name that must be recognized -- >','s');
ytemp = zeros(88200,20);
r = zeros(10,1);
for j = 1:10
    file = sprintf('%s%d.wav','g',j);
    [t, fs] = wavread(file);
    s = abs(t);
    start = 1;
    last = 88200;
    % Scan forward for the first sample at or above the 0.1 threshold
    for i = 1:88200
        if s(i) >= .1 && i <= 7000
            start = 1;
            break
        end
        if s(i) >= .1 && i > 7000
            start = i-7000;
            break
        end
    end
    % Scan backward for the last sample at or above the threshold
    for i = 1:88200
        k = 88201-i;
        if s(k) >= .1 && k >= 81200
            last = 88200;
            break
        end
        if s(k) >= .1 && k < 81200
            last = k + 7000;
            break
        end
    end
    r(j) = last-start;
    % Each trimmed recording is stored in two adjacent columns
    ytemp(1:last-start+1, 2*j) = t(start:last);
    ytemp(1:last-start+1, 2*j-1) = t(start:last);
end
What This Means
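• This bit of code makes it look like there is a lot going on. Really, it reads each WAV file into a vector and stores the trimmed result as columns of a matrix. The first loop determines where your voice starts: it scans forward for the first sample whose amplitude reaches 0.1 and then backs up 7000 samples. The second loop determines where it ends by scanning backward the same way and adding 7000 samples. Note that the threshold is on amplitude in the time domain, not on frequency. Finally, r(j) records the length of each trimmed recording.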
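The same trimming idea can be sketched without explicit loops by using find. This is not the original program: the synthetic signal t and the names firstIdx, lastIdx, pad, and trimmed are assumptions made for illustration. Only the 0.1 threshold and the 7000-sample padding come from the slides.

```matlab
% Synthetic stand-in for a recording: silence, a burst of "speech", silence.
t = zeros(88200, 1);
t(20001:50000) = 0.5;         % hypothetical voiced region

threshold = 0.1;              % amplitude threshold used in the slides
pad = 7000;                   % padding (in samples) used in the slides

s = abs(t);
firstIdx = find(s >= threshold, 1, 'first');  % first loud sample
lastIdx  = find(s >= threshold, 1, 'last');   % last loud sample

% If nothing crosses the threshold, keep the whole recording,
% which matches the defaults start = 1 and last = 88200.
if isempty(firstIdx)
    firstIdx = 1;
    lastIdx = numel(t);
end

start = max(firstIdx - pad, 1);          % back up, but not before sample 1
last  = min(lastIdx + pad, numel(t));    % extend, but not past the end

trimmed = t(start:last);
```

For this synthetic signal the voiced region spans samples 20001 to 50000, so the crop runs from sample 13001 to sample 57000.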
Truncation, FFT, Normalization
y = zeros(min(r),20);
for i = 1:20
    y(:,i) = ytemp(1:min(r),i);
end

fy = fft(y);
fy = fy.*conj(fy);

fn = zeros(600,20);
for i = 1:20
    fn(1:600,i) = fy(1:600,i)/sqrt(sum(abs(fy(1:600,i)).^2));
end
What This Means
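• The first part truncates every recording to the length of the shortest one, so all 20 columns have the same number of samples.
• The second part moves the recordings into the frequency domain: it takes the FFT of each column and multiplies it by its complex conjugate, which gives the power spectrum.
• The third part keeps only the first 600 frequency bins (the low-frequency band, which drops most of the higher-frequency background noise) and scales each spectrum to unit length so that loudness does not matter. For a full 2-second recording the bins are 0.5 Hz apart, so 600 bins reach about 300 Hz; shorter truncated recordings have wider bins.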
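To see what the FFT and normalization steps do, here is a minimal sketch on a made-up signal. The 100 Hz test tone and the names p, nBins, binWidth, peakBin, and peakHz are assumptions for illustration. The 600-bin cutoff and the unit-length scaling follow the slides.

```matlab
fs = 44100;                        % sampling rate in Hz
N = 88200;                         % 2 seconds of samples
n = (0:N-1)';
y = sin(2*pi*100*n/fs);            % hypothetical 100 Hz test tone

fy = fft(y);
p = real(fy .* conj(fy));          % power spectrum, |FFT|^2

nBins = 600;                       % keep only the first 600 bins
fn = p(1:nBins) / sqrt(sum(p(1:nBins).^2));   % scale to unit length

binWidth = fs / N;                 % Hz per bin (0.5 Hz for N = 88200)
[~, peakBin] = max(fn);            % index of the strongest bin
peakHz = (peakBin - 1) * binWidth; % bin 1 is 0 Hz, so subtract 1
```

Because bin 1 corresponds to 0 Hz, bin k corresponds to (k-1)*fs/N Hz. That is why the frequency range covered by 600 bins depends on the length of the truncated recording.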
Average Vector, Norm, and STD
pu = zeros(600,1);
for i = 1:20
    pu = pu + fn(1:600,i);
end
pu = pu/20;

tn = pu/sqrt(sum(abs(pu).^2));

std = 0;
for i = 1:20
    std = std + sum(abs(fn(1:600,i)-tn).^2);
end
std = sqrt(std/19);
What This Means
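• The first part creates the average vector by adding up the 20 normalized spectra from the last bit of code and dividing by 20.
• The second portion normalizes that average to unit length.
• The third finds the standard deviation: it sums the squared distances between each of the 20 spectra and the normalized average, divides by 19 (one less than the number of spectra), and takes the square root.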
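The three steps above can be sketched on a tiny made-up matrix. The data in fn and the names nCols, diffs, and spread are assumptions for illustration. The slides call the last quantity std, which also hides MATLAB's built-in std function, so a different name is used here.

```matlab
% Hypothetical data: 4 unit-length "spectra" of 3 bins each, one per column.
fn = [1 1 0 0;
      0 0 1 1;
      0 0 0 0];
nCols = size(fn, 2);

pu = mean(fn, 2);                  % average vector across the columns
tn = pu / sqrt(sum(abs(pu).^2));   % normalize the average to unit length

% Sum of squared distances from each column to the normalized average,
% divided by (number of columns - 1), then square-rooted.
diffs = fn - repmat(tn, 1, nCols);
spread = sqrt(sum(abs(diffs(:)).^2) / (nCols - 1));
```

With 20 columns of 600 bins, as in the slides, nCols - 1 is the 19 that appears in the original code.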
Verification
• Verification process
input('You will have 2 seconds to say your name. Press enter when ready')
usertemp = wavrecord(88200,44100);
sound(usertemp,44100);
rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
while rec == 1
    rec = 0;
    input('You will have 2 seconds to say your name. Press enter when ready')
    usertemp = wavrecord(88200,44100);
    sound(usertemp,44100);
    rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
end
What This Means
• This is the part where you record your voice for two seconds and hear it played back. If you're unhappy with it, you enter 1, and a fresh recording overwrites the last one. Pressing enter alone moves on.
Test Crop
s = abs(usertemp);
start = 1;
last = 88200;
for i = 1:88200
    if s(i) >= .1 && i <= 5000
        start = 1;
        break
    end
    if s(i) >= .1 && i > 5000
        start = i-5000;
        break
    end
end
for i = 1:88200
    k = 88201-i;
    if s(k) >= .1 && k >= 83200
        last = 88200;
        break
    end
    if s(k) >= .1 && k < 83200
        last = k + 5000;
        break
    end
end
What This Means
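• Like a couple of slides ago, this bit crops the new two-second recording down to the spoken part. It finds where the amplitude first and last reaches 0.1 and keeps 5000 samples of padding on each side (the earlier crop used 7000).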
FFT, Plot
user = usertemp(start:last);
userftemp = fft(user);
userftemp = userftemp.*conj(userftemp);
userf = userftemp(1:600);
userfn = userf/sqrt(sum(abs(userf).^2));

hold on;
subplot(2,1,1);
plot(userfn)
title('Normalized Frequency Spectra Of Recording')
subplot(2,1,2);
plot(tn);
title('Normalized Frequency Spectra of Average')
What This Means
• Computes the FFT of the recording and then normalizes it
• Both the recording and the average vector are graphed in one figure: the top panel is the recording and the bottom panel is the average vector
Testing
s = sqrt(sum(abs(userfn - tn).^2));
if s < 2*std
    name = strcat('HELLO----',name,' !!!!');
    name
else
    name = strcat('YOU ARE NOT---- ',name,' !!!!');
    name
end
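The test above accepts the speaker when the distance between the new normalized spectrum and the average is less than two standard deviations. Here is a small sketch of that rule on made-up numbers. The vectors closeFn and farFn, the value of spread, and the other names are assumptions for illustration, not values from the project.

```matlab
tn = [0.6; 0.8; 0];                % hypothetical normalized average spectrum
spread = 0.1;                      % hypothetical value of the slides' "std"

closeFn = [0.62; 0.78; 0];         % hypothetical spectrum near the average
closeFn = closeFn / norm(closeFn); % scale to unit length
farFn = [0.8; 0.6; 0];             % hypothetical spectrum far from the average

threshold = 2 * spread;            % acceptance threshold from the slides
dClose = norm(closeFn - tn);       % Euclidean distance to the average
dFar = norm(farFn - tn);

acceptClose = dClose < threshold;  % would print the HELLO message
acceptFar = dFar < threshold;      % would print the YOU ARE NOT message
```

The factor of 2 sets how strict the check is: a larger factor lets in more variation in the speaker's own voice, but also makes it easier for someone else to pass.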