Speech recognition techniques
-
Upload
sonukumar142 -
Category
Engineering
-
view
188 -
download
0
Transcript of Speech recognition techniques
Acoustic Speech Recognition Techniques
Audio Signal Recognized Text
Sonu Kumar Mishra
BE Comp 2015-16
Introduction Historical Survey Motivation General Model of ASR Feature Extraction Hidden Markov Model Existing Systems Developing ASR Systems Revolution in ASR
The first speech recognition system (Audrey) was developed at bell laboratories in 1952. It could recognise numbers spoken by one person.
In 1970s Carnegie Mellon came out with HARPY system which could recognise 1011 words with different pronunciation.
In 1980s new systems based on Hidden Markov Model was introduced. HMM was statistical approach and more robust than the earlier technology.
Language is the fundamental mode of communication. Communicating with machines in natural language effectively is a challenge.
We can use ASR systems to control machines, find contents online and contribute to generate contents.
Most of the speech recognition systems and contents (about 80 %) are available for 10 major languages. Hence, there is a need to expand the system for local languages.
If we can interact with machines in our own local language then it would be greater achievement for modern era.
Speech Input
Analog to digital
Feature Extraction
(Generate speech fingerprint)
Compare and Select wordsProb
Matrices updated
Compare and Select Sentence level match
Pick most probable word
Output
Training algorithm
Word fingerprint template
Sentence fingerprint template
HMM
“Dividing sound waves, extract phonemes & represent using some parameters”
LPCC MFCC RASTA-PLP
Low resource, High popularity,Easy implementation,Single speaker, single language, Below 300 words
Moderate resource, High popularity, Easy-moderate impl,Multi speaker, Multi-language, Moderate vocabulary
High resource, Low popularity, Modrate-hard impl, Multi-speaker, Multi-language, Large vocabulary
Power Spectral Analysis (FFT)
First Order Derivative (DELTA)
Energy Normalization
Outdated Techniques
DNN
Multiple User
Multiple Language
Large Vocabulary
Abundant Resources
Phonetic Dictionary (TTS Synthesizer)
Z1
XnX2X1
Z2 Zn
Observed Data
Hidden or latent data
“Markov Chain”
Why HMM ?
Simple for sequential and temporal data. Handle real world applications. It works on the principle of New State = ʄ (old state, noise)
Initial Probability
Transition Probability
Observed Probability
Applications :
Speech Recognition. Facial Expression Recognition. Handwriting Recognition. Bioinformatics : Analyzing biological
data.
Large systems like Siri, Google voice and Cortana are based on neural network.
High computing processors.
AI algorithms Parallel processing.
Time
Money
ScientistsComputing Power
Engineers
Using built-in supportIn order to build from scratch
Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.
An open source toolkit for speech recognition, which includes a recognizer library written in C; an adjustable, modifiable recognizer written in Java. Acoustic model, language model, Input source, Dictionary
Use of ASR systems to interact with the devices used in daily life.
ASR systems working in local languages.
Developing Neural network based ASR systems working in all major languages.