Speech Recognition Using Matlab

Project leader: Anurag Kumar Dwivedi

Team members: Gaurav Srivastava

Nitin Banarasi

Ram Prakash Mishra

Objective: This project aims to develop a system that can recognize a person on the basis of the words they pronounce.

Technical details: Speech recognition is a multilevel pattern recognition task in which acoustic signals are examined and structured into a hierarchy of subword units (e.g., phonemes), words, phrases, and sentences. Each level may provide additional temporal constraints, e.g., known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints is best exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level. Speech recognition is governed by three basic parameters:

Energy (amplitude)

Spectral parameters

Fundamental frequency
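The energy and fundamental-frequency parameters in this list can be measured on short frames of the digitized signal. The following is a minimal Python sketch for illustration, not the project's Matlab code. The function names, the autocorrelation pitch method, and the 50 to 400 Hz search range are all assumptions.

```python
import math


def short_time_energy(frame):
    """Energy parameter: sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Rough fundamental-frequency estimate (Hz) from the autocorrelation peak.

    Assumed method: only lags between sample_rate/f0_max and sample_rate/f0_min
    are searched. Returns None when the frame is too short for the longest lag.
    """
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = int(sample_rate / f0_min)
    if lag_max >= len(frame):
        return None
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag: similarity of the frame with a shifted copy.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


# Usage: a 50 ms frame of a 100 Hz sine sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 100 * n / sr) for n in range(400)]
print(short_time_energy(frame), estimate_f0(frame, sr))
```

For this periodic test frame, the autocorrelation peaks at a lag of one period (80 samples). Unvoiced frames have no clear periodicity, so the returned value is only meaningful for voiced frames.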

Because speech contains voiced, unvoiced, and transient excitation, the first step in our speech recognition system is to separate the vocal tract response from the excitation. For this we use MFCCs (mel-frequency cepstral coefficients). The speech fed into the system is first sampled and digitized. The digitized signal is then passed through a pre-emphasis filter, which removes the unwanted spectral slope caused by the excitation. Next, the filtered signal is windowed so that only a short part of it is analyzed at a time. Because the window is shifted by less than its own length at each step, successive frames overlap, and this sequence of overlapping frames is what is used to recognize the speech.

Just as a time signal can be filtered by manipulating it in the frequency domain, cepstral liftering (filtering in the cepstral domain) produces a smooth spectrum without ripple. The power spectrum is real-valued and symmetric, so a cosine transform can be used instead of a full complex Fourier transform.

To recognize speech, we present an input to the system and compute the difference between the actual input and the desired (reference) input; the smaller the difference, the more accurate the recognition. Different distance measures can be used for this. For pattern matching we use dynamic programming (DP), which in this setting is known as dynamic time warping (DTW). DP finds the optimal time-warping function needed to compare two vector sequences X and M and, at the same time, computes the distance D(X, M) between them. The reference sequence with the smallest distance is preferred for further analysis.

Because speech patterns are highly variable, a large number of prototypes is needed to model a given class adequately, and the variability is extremely high for speaker-independent tasks. We therefore use statistical modeling of the speech production process, with HMMs (hidden Markov models) as the stochastic models. First, all objects are classified into different classes, and a stochastic model is constructed for each class. For a given unknown vector sequence, the model-specific emission probability density is computed, and the sequence is assigned to the stochastic model with the highest probability density.

Building a separate model for each word requires a huge amount of training data, and changing the vocabulary requires recording new speech samples. We therefore build word models from subword units, so that a limited number of subword units can be trained with a limited amount of speech data. Finally, competing word sequences often differ only in permutations of the most ambiguous words; to handle this we use word graphs, which represent the alternative sequences compactly.
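The front-end steps described above can be sketched in Python as follows: pre-emphasis, overlapping windowed frames, power spectrum, cosine transform and liftering. This is an illustrative sketch, not the project's Matlab implementation. It assumes the following values, and the function names are also assumptions:

- 8 kHz audio
- 25 ms Hamming-windowed frames (200 samples) with a 10 ms hop (80 samples)
- a pre-emphasis coefficient of 0.97
- 20 triangular mel filters
- 12 output coefficients

The sketch also includes the mel-spaced filterbank, which gives MFCCs their name but is not described in the text. A naive DFT is used for readability; a practical implementation would use an FFT.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]: boosts high frequencies to flatten the spectral tilt."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming_frames(x, frame_len, hop):
    """Overlapping frames (hop < frame_len), each multiplied by a Hamming window."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [[x[s + n] * w[n] for n in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2; for a real frame the other half mirrors these."""
    N = len(frame)
    spec = []
    for k in range(N // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(frame))
        im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def log_mel_energies(spec, sample_rate, n_filters=20):
    """Log outputs of triangular filters spaced evenly on the mel scale."""
    n_fft = 2 * (len(spec) - 1)
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    out = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * sample_rate / n_fft  # frequency of DFT bin k
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)
        out.append(math.log(e + 1e-10))  # small floor avoids log(0)
    return out


def dct_ii(v, n_out):
    """Cosine transform; returning only n_out low-order terms acts as the lifter."""
    M = len(v)
    return [sum(v[j] * math.cos(math.pi * i * (j + 0.5) / M) for j in range(M))
            for i in range(n_out)]


def mfcc(signal, sample_rate, frame_len=200, hop=80, n_filters=20, n_ceps=12):
    """One n_ceps-dimensional feature vector per overlapping frame."""
    x = pre_emphasis(signal)
    return [dct_ii(log_mel_energies(power_spectrum(fr), sample_rate, n_filters), n_ceps)
            for fr in hamming_frames(x, frame_len, hop)]


# Usage: 0.1 s of a 440 Hz tone at 8 kHz gives 8 frames of 12 coefficients.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(800)]
features = mfcc(tone, sr)
```

Keeping only the first `n_ceps` coefficients discards the high-quefrency part of the cepstrum. That part carries the fine harmonic ripple of the excitation, so what remains describes the smooth spectral envelope.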

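The DTW matching and the minimum-distance decision described above can be sketched as follows. The text does not say which distance measures or step constraints were used. This sketch therefore assumes two things:

- a Euclidean local distance
- the basic symmetric step pattern (diagonal, horizontal and vertical moves with equal weight)

`templates` is a hypothetical dictionary mapping each word label to a reference sequence of feature vectors, for example MFCC vectors.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(X, M, dist=euclidean):
    """Accumulated distance D(X, M) along the optimal time-warping path.

    D[i][j] is the cheapest cost of aligning the first i vectors of X with the
    first j vectors of M. A cell is reached by a diagonal step (both advance)
    or by a horizontal/vertical step (one sequence repeats a vector).
    """
    n, m = len(X), len(M)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(X[i - 1], M[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def recognize(X, templates):
    """Label of the reference template with the smallest DTW distance to X."""
    return min(templates, key=lambda label: dtw_distance(X, templates[label]))


# Usage: the unknown input is a time-stretched version of the "yes" template.
templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]}
unknown = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(recognize(unknown, templates))  # prints "yes"
```

The table has len(X) * len(M) cells, so the cost grows with the product of the two sequence lengths. Restricting the path to a band around the diagonal is a common way to reduce it.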
Innovativeness & Usefulness: In this project we have used the HMM, a model with a wide variety of applications. When an HMM is applied to speech recognition, the states are interpreted as acoustic models indicating which sounds are likely to be heard during their corresponding segments of speech, while the transitions provide temporal constraints indicating how the states may follow each other in sequence. Because speech always moves forward in time, transitions in a speech application always go forward (or form a self-loop, which allows a state to have arbitrary duration).
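The forward-only topology and the rule that an unknown sequence goes to the model with the highest probability can be sketched with the forward algorithm. To keep the example short, it assumes discrete observation symbols, for instance codebook indices obtained by vector quantization of the feature vectors. The text itself mentions continuous emission densities. The self-loop probability of 0.6, the start in state 0, and the toy models are also assumptions, not values from the project.

```python
import math


def left_to_right_transitions(n_states, stay=0.6):
    """Each state may only loop on itself or move to the next state,
    so the model can only go forward in time. The last state only loops."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[-1][-1] = 1.0
    return A


def log_likelihood(obs, A, B):
    """Forward algorithm: log P(obs | model) for discrete observation symbols.

    B[i][o] is the probability that state i emits symbol o; the model starts
    in state 0 (assumption). alpha is rescaled at every step to avoid underflow,
    and the log of each scale factor is accumulated.
    """
    n = len(A)
    alpha = [B[0][obs[0]]] + [0.0] * (n - 1)
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify(obs, models):
    """Assign obs to the word model with the highest likelihood."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))


# Usage: two hypothetical 2-state word models over the symbols {0, 1}.
A = left_to_right_transitions(2)
models = {
    "yes": (A, [[0.9, 0.1], [0.1, 0.9]]),  # emits mostly 0, then mostly 1
    "no": (A, [[0.1, 0.9], [0.9, 0.1]]),   # emits mostly 1, then mostly 0
}
print(classify([0, 0, 1, 1], models))  # prints "yes"
```

Raw forward probabilities of long utterances quickly shrink below the smallest representable floating-point number. Rescaling the forward variables and summing the logarithms of the scale factors avoids this.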

Tools used: We used Matlab R2008b.