2009 Almost-Spring Short Course on Speech Recognition
Instructors: Bhiksha Raj and Rita Singh
Welcome
What will the course be about
• We will cover most relevant topics of speech recognition
• The focus will be on the theory and practice
– We will not discuss code for the most part
– We will keep maths out of it as far as possible, however
• We will discuss algorithms and implementation details
Instructors
• Bhiksha Raj: Carnegie Mellon University
– Expert in speech recognition
• Rita Singh: Carnegie Mellon University
– Expert in speech recognition
• Peter Wolf: Independent Consultant
– Previously at Dragon Systems Inc.
– Sphinx4 expert, expert in speech recognition application development
– Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions
Format of Course
• 3 lectures daily
– Morning: 8.00 AM, 1.00 – 1.30 hours
– Late Morning / Early Afternoon: 11:00 AM
– Afternoon: 2.30 PM
• The schedule is flexible – timings may vary depending on how much is covered
• Lectures expected to last 1.00 – 1.5 hours each
• Intervening times expected to be taken up by exercises
Instruction Format
• Lectures will be pictorially oriented
• Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
– Most other systems are similar
• Exercises will be based on Sphinx
Lecture Outline: Day 1
• Lecture 1: “Speech recognition for dummies”
– A quick development of speech recognition as string matching
• Lecture 2: “Feature computation”
– Explaining how features are computed for speech recognition, including all signal processing
• Lecture 3: “Hidden Markov Models”
– Describing HMMs and all associated problems
Lecture Outline: Day 2
• Lecture 1: “Training From Continuous Speech”
– How to train models from continuous speech
– Phonemes, why we need them and how to train them
• Lecture 2: “Context dependent phonemes”
– What are context dependent phonemes
– Various types of context dependent phonemes
– Training CD phonemes
• Lecture 3: “Decision Trees and State Tying”
– All about decision trees for parameter sharing in ASR systems
Lecture Outline: Day 3
• Lecture 1: “Training context-dependent models with tied states”
– A (relatively) short lecture explaining the final overall process for training models
• Lecture 2: “Language Modelling”
– How to model “language” for speech recognition
– Statistical language modelling
• Lecture 3: “Decoding: Basics”
– Describing the basic ideas behind the decoding strategies for continuous speech
Lecture Outline: Day 4
• Lecture 1: “Decoding: Advanced”
– Explaining various more advanced approaches to decoding
    • Arriving at the state of the art
• Lecture 2: “Advanced Topics”
– Adaptation, Normalization, Discriminative Training etc.
• Session 3: Open.
– Any spillover
– Question Answering
Exercises: Day 1
• There will be exercises following most lectures
• Lecture 1: None
• Lecture 2: Exercise on capture and feature computation from speech signals
• Lecture 3: None
Exercises: Day 2
• Lecture 1: “Training From Continuous Speech”
– Exercise on training phoneme models and recognizing with them
• Lecture 2: “Context dependent phonemes”
– Exercise on training models for context-dependent phonemes and recognizing with them
• Lecture 3: “Decision Trees and State Tying”
– Exercise on learning decision trees
Exercises: Day 3
• Lecture 1: “Training context-dependent models with tied states”
– Exercise on complete training of the ASR system
• Lecture 2: “Language Modelling”
– Exercises on building JSGF grammars and N-gram LMs for speech recognition
• Lecture 3: “Decoding: Basics”
Exercises: Day 4
• Lecture 1: “Decoding: Advanced”
– Decoding with various speech recognition system variants:
    • Sphinx3 flat, Sphinx3 tree, Sphinx4
• Lecture 2: “Advanced Topics”
– No exercises
• Session 3: Open.
– No exercises
Software to Install
• We will be using CMU Sphinx extensively
– Sphinxtrain
– Sphinx3 decoder
– Sphinx4 decoder
– CMU LM Toolkit or SRI LM Toolkit
• We will need additional software to go with it
– Java, ant, groovy for Sphinx4
Sphinx Downloads: http://cmusphinx.sourceforge.net
• Sphinxbase:
– Click on the “sphinxbase” link on the left
– Click “all releases”
– Download version 0.4.1
    • http://downloads.sourceforge.net/cmusphinx/sphinxbase-0.4.1.tar.bz2?use_mirror=superb-east
• Sphinx3:
– Click on the “sphinx3” link on the left
– Click on “all releases”
– Download version 3-0.8
    • http://downloads.sourceforge.net/cmusphinx/sphinx3-0.8.zip?use_mirror=internap
• Cepview:
– Click on the “cepview” link on the left
• lm3g2dmp:
– Click on the “lm3g2dmp” link on the left
• The above two are visualization / data-structure optimization tools and are not critical
– But they are small, so you might as well download them
• CMU LM toolkit: You may install the SRI LM toolkit instead
– The SRI toolkit is better maintained; the CMU toolkit is not currently maintained
• Sphinx4:
– For this workshop download a copy of Sphinx that is under development at github.com
– http://github.com/juanzanos/sphinx4/tree/master
    • Click on download link
– Caveat: some scripts may not run; if so we will revert to release version
• Sphinx4 will also need
– Java JDK 1.6, from http://javasoft.com
– Apache ant, from http://ant.apache.org
– A useful scripting tool (some of our latest scripts are in it): Groovy
– Groovy can be had from http://groovy.codehaus.org
• Bookmark this link:
– http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
Operating Systems
• Sphinxbase and Sphinx3 packages have been tried and tested on Linux
– We are not Windows people
• Suggestion: Prefer Linux-based machines
– You may also try to run these programs on Cygwin under Windows
    • Sphinx* should compile under Cygwin
    • Install “tcsh” under Cygwin
    • We will provide tcsh scripts
• Sphinx4 is platform independent
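Before the first exercise it may help to confirm that the supporting tools named on these slides (Java, ant, Groovy, tcsh) are reachable from your shell. The following is a minimal sketch, not part of the course material: the function name `check_tools` is hypothetical, and the command names `java`, `ant`, `groovy` and `tcsh` are assumptions about how each tool is installed on your machine.

```python
import shutil

# Command names are assumptions: the slides name the tools,
# not the executables they install.
REQUIRED_TOOLS = ["java", "ant", "groovy", "tcsh"]


def check_tools(tools=REQUIRED_TOOLS):
    """Return a dict mapping each command name to True if it is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    # Print one line per tool so missing prerequisites are easy to spot.
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

A tool reported as missing may simply be installed outside your PATH, so treat the output as a hint rather than a verdict. The check says nothing about versions (for example, whether the JDK is 1.6).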
Additional Packages
• Would be useful to have a visualization tool
– Need to visualize matrices as surfaces
• Matlab would be great
• If you don’t have Matlab, download Octave
– http://www.gnu.org/software/octave/
Data
• You may use any data you wish to
• For the exercises we will attempt to provide a small amount of data
– As much as can be dealt with on your computers
Questions
• ?