Transcript of Sslis
- 1. Speech to Sign Language Interpreter System (SSLIS)
By: Khalid El-Darymli (G0327887). Supervisor: Dr. Othman O. Khalifa.
International Islamic University Malaysia, Kulliyyah of Engineering,
ECE Dept.
- 2. OUTLINE
- Research goal and objectives,
- Main parts of the SSLIS,
- General structure: AM, dictionary, LM, and decoding (the Viterbi
beam search),
- ASL, ASL alphabets & Signed English,
- Structure & flow of SSLIS,
- Parameter tuning & accuracy measurements,
- Conclusions, Shortcomings & Further work.
- 3. Problem Statement
- There is no free software, or even reasonably priced software,
that converts speech into sign language in live mode.
- Only one commercially available program converts uttered speech
in live mode to sign language video.
- This software is called iCommunicator, and to purchase it a deaf
person has to pay USD 6,499!
IS IT FAIR?
- 4. RESEARCH GOAL AND OBJECTIVES
- Design and manipulation of a Speech to Sign Language Interpreter
System.
- The SW is open source and freely available, which in turn will
benefit the deaf community.
- To bridge the gap between deaf and non-deaf people in two ways:
firstly, by using this SW for educational purposes for deaf people,
and secondly, by facilitating communication between deaf and
non-deaf people.
- To increase independence and self-confidence of the deaf
person.
- To increase opportunities for advancement and success in
education, employment, personal relationships, and public access
venues.
- To improve quality of life.
- 5. Main Parts of the SSLIS
[Figure: Continuous Input Speech -> Speech-Recognition Engine ->
Recognized Text -> Sign Language Database -> ASL Translation]
- 6. Automatic Speech Recognition ( ASR ):
- SR systems are classified along three dimensions: isolated vs.
continuous, speaker dependent vs. speaker independent, and small
vs. large vocabulary.
- The expected task of our software entails using a large
vocabulary, speaker independent, continuous speech recognizer.
[Figure: Input Voice -> SR Engine -> Recognized Text]
- 7. Sphinx 3.5
- It originated at CMU and was later released as open source SW.
- It is still in development but already includes trainers,
recognizers, AMs, LMs and some limited documentation.
- It works best on continuous speech and large vocabulary.
- It does not provide any interface to make the integration of all
components easier.
- In other words, it is a collection of tools and resources that
enables developers/researchers to build successful speech
recognizers.
- 8. The Structure of SR Engine (LVCSR)
[Figure: Input Audio -> Signal Processing -> $X = \{x_1, x_2, \dots, x_T\}$
-> Hypothesis Evaluation (Decoder), $P(X \mid W) \cdot P(W)$ ->
Best Hypotheses $H = \{W_1, W_2, \dots, W_k\}$ -> $W_{BEST}$.
Knowledge sources built in TRAINING and used in DECODING:
AM $P(A_1, \dots, A_T \mid P_1, \dots, P_k)$;
Dictionary $P(P_1, P_2, \dots, P_k \mid W)$;
LM $P(W_n \mid W_1, \dots, W_{n-1})$.]
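The decoder box in the diagram combines the knowledge sources through Bayes' rule. The block below is a sketch of the standard formulation in the slide's own symbols (W a candidate word sequence, X the observed feature vectors); it is the usual textbook derivation, not a statement of how Sphinx implements the search internally.

```latex
\begin{aligned}
W_{BEST} &= \arg\max_{W} P(W \mid X) \\
         &= \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)} \\
         &= \arg\max_{W} P(X \mid W)\, P(W)
\end{aligned}
```

The denominator P(X) is dropped because it does not depend on W. P(X | W) is supplied by the AM together with the dictionary, and P(W) by the LM.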
- 9. SR ENGINE SPECS
Parameter                        Default Value
Sampling Rate                    16000.0 Hz
Frame Rate                       100 frames/sec
Window Length                    0.025625 sec
Filterbank Type                  Mel Filterbank
Number of Cepstra                13
Number of Mel Filters            40
DFT Size                         512
Lower Filter Frequency (f_l)     133.33334 Hz
Higher Filter Frequency (f_h)    6855.4976 Hz
Pre-Emphasis                     0.97
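To make the table concrete, the sketch below derives the frame geometry implied by these defaults and applies a first-order pre-emphasis filter. The function names are hypothetical, and treating the sample before the first one as zero is an assumption; this is not the Sphinx front-end code.

```python
# Values copied from the SR ENGINE SPECS table.
SAMPLING_RATE_HZ = 16000.0
WINDOW_LENGTH_S = 0.025625
FRAME_RATE = 100          # frames per second
DFT_SIZE = 512
PRE_EMPHASIS = 0.97


def frame_geometry():
    """Return (window length, frame shift), both in samples."""
    window = round(WINDOW_LENGTH_S * SAMPLING_RATE_HZ)
    shift = round(SAMPLING_RATE_HZ / FRAME_RATE)
    return window, shift


def pre_emphasize(samples, alpha=PRE_EMPHASIS):
    """Apply y[n] = x[n] - alpha * x[n-1].

    Assumption: the sample preceding the first one is taken as 0.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - alpha * previous)
        previous = x
    return out


if __name__ == "__main__":
    window, shift = frame_geometry()
    print(f"window = {window} samples, shift = {shift} samples")
    print(f"window fits in the DFT: {window <= DFT_SIZE}")
```

Under these defaults each analysis window spans 410 samples and consecutive windows start 160 samples apart, so neighbouring frames overlap; the window is shorter than the 512-point DFT.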
- 10. KNOWLEDGE BASE
Acoustic Model
- It was trained using the mel-frequency cepstral (MFC) vectors
derived from 140 hours of 1996 and 1997 hub4 training data.
- Each vector is 39-dimensional.
- The acoustic model consists of 3-state within-word and cross-word
triphone HMMs with no skips permitted between states.
- It is continuous and comprises 6000 senones with 8 Gaussians
per state.
- 11. DICTIONARY
- We are using the CMU dictionary (v. 0.6).
- It is a machine-readable pronunciation dictionary for North
American English that contains over 125,000 words and their
transcriptions.
- It maps words to their pronunciations in the given phoneme set,
which comprises 39 phonemes.
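As an illustration of the word-to-pronunciation mapping, here is a minimal sketch of loading dictionary lines into a lookup table. It assumes one entry per line in the form `WORD PH1 PH2 ...`, with alternate pronunciations written as `WORD(2)`; the sample lines and the function name are illustrative and are not taken from the SSLIS code.

```python
import re

# Matches a head word with an optional "(n)" alternate-pronunciation suffix.
_HEAD = re.compile(r"(.+?)(?:\((\d+)\))?")


def parse_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    entries = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with # or ;).
        if not line or line[0] in "#;":
            continue
        head, *phones = line.split()
        word = _HEAD.fullmatch(head).group(1)
        entries.setdefault(word, []).append(phones)
    return entries


# Illustrative sample lines only.
SAMPLE_LINES = [
    "## illustrative sample",
    "HELLO  HH AH L OW",
    "HELLO(2)  HH EH L OW",
    "SIGN  S AY N",
]

if __name__ == "__main__":
    for word, prons in parse_dictionary(SAMPLE_LINES).items():
        print(word, prons)
```

Keeping a list of pronunciations per word lets the recognizer consider every listed variant of a word during decoding.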
- 12. LM
- It was taken from CMU open source resources.
- It is a trigram model, which has been built for tasks similar
to broadcast news.
- The vocabulary covers 64,000 words.
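A trigram model estimates the probability of each word given the two words before it. The sketch below computes plain maximum-likelihood trigram probabilities from a toy corpus; it omits the smoothing and back-off that a real broadcast-news LM needs, and the corpus and function names are made up for illustration.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over padded sentences."""
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(words)):
            trigrams[tuple(words[i - 2:i + 1])] += 1
            contexts[tuple(words[i - 2:i])] += 1
    return trigrams, contexts


def trigram_prob(trigrams, contexts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    seen = contexts[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0


if __name__ == "__main__":
    tri, ctx = train_trigram(["the news is good", "the news is bad"])
    print(trigram_prob(tri, ctx, "news", "is", "good"))
```

In the toy corpus "news is" is followed once by "good" and once by "bad", so each continuation receives probability 0.5.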
- 13. SIGN LANGUAGE
- Sign language is a communication system using gestures that are
interpreted visually.
- As a whole, sign languages share the same modality, the sign,
but they differ from country to country.
- 14. AMERICAN SIGN LANGUAGE ( ASL )
- ASL is the dominant sign language in the US, anglophone Canada
and parts of Mexico.
- Currently, approximately 450,000 deaf people in the United
States use ASL as their primary language.
- ASL signs follow a certain order, just as words do in spoken
English. However, in ASL one sign can express meaning that would
require several words in speech.
- The grammar of ASL uses spatial locations, motion, and context
to indicate syntax.
- 15. ASL ALPHABETS
- It is a manual alphabet representing all the letters of the
English alphabet, using only the hands.
- Making words using a manual alphabet is called fingerspelling.
- Manual alphabets are a part of sign languages.
- For ASL, the one-handed manual alphabet is used.
- Fingerspelling is used to complement the vocabulary of ASL when
spelling individual letters of a word is the preferred or only
option, such as with proper names or the titles of works.
[Figure: the American Manual Alphabet, Aa to Zz]
- 16. SIGNED ENGLISH ( SE )
- SE is a reasonable manual parallel to English.
- The idea behind SE and other signing systems parallel to
English is that deaf people will learn English better if they are
exposed, visually through signs, to the grammatical features of
English.
- SE uses two kinds of gestures: sign words and sign markers.
- Each sign word stands for a separate entry in a Standard
English dictionary.
- The sign words are signed in the same order as words appear in
an English sentence. Sign words are presented in singular, non-past
form.
- Sign markers are added to these basic signs to show, for
example, that you are talking about more than one thing or that
something has happened in the past.
- When these signs do not represent the intended word, the manual
alphabet can be used to fingerspell the word.
- Most of the signs in SE are taken from American Sign Language,
but these signs are now used in the same order as English words and
with the same meaning.
- 17. ASL vs. SE (an Example)
Sentence: It is alright if you have a lot.
[Figure: ASL Translation and SE Translation of the sentence, with
the gloss IT IS ALL RIGHT IF YOU HAVE A LOT]
- 18. DEMONSTRATION OF THE ASL IN OUR SW
[Flowchart]
- Input: the recognized word (SR engine's output).
- In case of a nonbasic word, extract the basic word out of it.
- Is the basic word within the ASL database vocabulary (2,600
prerecorded ASL video clips)?
- Yes: the equivalent ASL video clip of the input word; only in
case of a nonbasic input word, append some suitable marker.
- No (none of the database contents matched the input basic word):
fingerspelling of the original input word with the American Manual
Alphabet.
- Final Output.
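The decision flow on this slide can be sketched in a few lines. Everything concrete below is a hypothetical stand-in: the two-entry clip table, the file names, and the naive suffix rules used to find the basic word are assumptions for illustration, not the SSLIS database or its actual word analysis.

```python
# Hypothetical stand-in for the database of prerecorded ASL clips.
ASL_CLIPS = {"WALK": "walk.avi", "BOOK": "book.avi"}

# Hypothetical suffix -> sign-marker rules (longest suffix first).
SUFFIX_MARKERS = (("ING", "marker_ing"), ("ED", "marker_past"), ("S", "marker_plural"))


def split_marker(word):
    """Return (basic word, marker or None) using naive suffix stripping."""
    if word in ASL_CLIPS:
        return word, None  # already a basic word
    for suffix, marker in SUFFIX_MARKERS:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], marker
    return word, None


def interpret(word):
    """Return the clips to play for one recognized word."""
    word = word.upper()
    basic, marker = split_marker(word)
    if basic in ASL_CLIPS:
        clips = [ASL_CLIPS[basic]]
        if marker is not None:
            clips.append(marker + ".avi")  # marker only for nonbasic words
        return clips
    # No match in the database: fingerspell the original input word.
    return [f"letter_{ch.lower()}.avi" for ch in word if ch.isalpha()]


if __name__ == "__main__":
    for w in ("walk", "walked", "books", "Khalid"):
        print(w, "->", interpret(w))
```

Note that the fallback fingerspells the original word rather than the stripped basic word, which matches the "No" branch of the flowchart.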
- 19. STRUCTURE AND FLOW OF SSLIS
- Flowchart of the main program
[Flowchart nodes: Program Start; Initialize Class; Has Exit been
clicked? (YES/NO); Select program to execute; Is program selected?
(YES/NO); Run selected program; Is selected program Live Decode?
(YES/NO); Enable button to stop Live Decode; Has stop button been
clicked? (YES/NO); Stop program Live Decode; Show program output;
Wait for event from Class; Is Live Decode running? (YES/NO); Stop
Live Decode; End Program. Continued on the next slide.]
- Irregular Past Participle Verbs
- 20.
- Flowchart of the class procedure
[Flowchart nodes: Class running in the background; Wait for INFO
event; Event raised.
Decisions (YES/NO): Live Decode starting received? Word hypothesis
entry received? Total hypotheses entry received? Total Frames entry
received? Word lattice entry received? Word lattice ending
received? Word lattice total received?
Actions: Display msg box to user to start decoding live speech
("Press ENTER to start recording"); Call program function
AddWordHypothesis (add word hypothesis entry to an appropriate
table); Call program function AddTotalHypothesis (add Total
Hypotheses entry to an appropriate table); Display Total Frames
entry in appropriate position; Call program function AddWordLattice
(add word lattice entry to an appropriate table); Display Speech to
Text output.]
- 21. PARAMETER TUNING & ACCURACY MEASUREMENTS
TUNING THE PRUNING BEHAVIOUR:
- -beam: Determines which HMMs remain active at any given point
(frame) during recognition. (Based on the best state score within
each HMM.)
- -pbeam: Determines which active HMM can transition to its
successor in the lexical tree at any point. (Based on the exit
state score of the source HMM.)
- -wbeam: Determines which words are recognized at any frame
during decoding. (Based on the exit state scores of leaf HMMs in
the lexical trees.)
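The three beams share one idea: keep only candidates whose score is within a fixed ratio of the best score in the current frame. The sketch below shows that relative-threshold pruning for a single frame. It assumes scores are natural-log probabilities and the beam is a probability ratio in (0, 1]; the function name and the example scores are hypothetical, and Sphinx's internal score scaling may differ.

```python
import math


def prune(scores, beam):
    """Keep entries whose log score is within log(beam) of the best one.

    scores: dict mapping an HMM (or word) id to its log score in this frame.
    beam:   probability ratio in (0, 1]; smaller values prune less.
    """
    if not 0.0 < beam <= 1.0:
        raise ValueError("beam must be in (0, 1]")
    if not scores:
        return {}
    threshold = max(scores.values()) + math.log(beam)
    return {key: s for key, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    frame_scores = {"hmm_a": -10.0, "hmm_b": -12.0, "hmm_c": -40.0}
    print(sorted(prune(frame_scores, 1e-6)))   # narrow beam
    print(sorted(prune(frame_scores, 1e-20)))  # wide beam
```

A narrower beam (a ratio closer to 1) discards more candidates and speeds up decoding at the risk of pruning the correct hypothesis; a wider beam does the opposite, which is the trade-off explored when tuning.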
- 22.
- 23. TUNING LM RELATED PARAMETERS:
- -lw : The language weight.
- -wip : The word insertion penalty.
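A common way for these two parameters to enter a decoder's score is shown below: the LM log probability of each word is scaled by the language weight, and the log of the word insertion penalty is added once per word. This is a sketch of that general scheme with natural logs and illustrative values; the function name is hypothetical and the exact arithmetic inside Sphinx is not reproduced here.

```python
import math


def path_score(acoustic_logprob, lm_logprobs, lw, wip):
    """Combine acoustic and LM scores for one hypothesis.

    acoustic_logprob: log P(X | W) for the whole hypothesis.
    lm_logprobs:      one LM log probability per word in the hypothesis.
    lw:               language weight, scales every LM log probability.
    wip:              word insertion penalty, a probability-like factor
                      applied once per word.
    """
    return acoustic_logprob + sum(lw * lp + math.log(wip) for lp in lm_logprobs)


if __name__ == "__main__":
    lm = [math.log(0.1), math.log(0.5)]
    # Illustrative values only, not recommended settings.
    print(path_score(-100.0, lm, lw=1.0, wip=1.0))
    print(path_score(-100.0, lm, lw=9.5, wip=0.7))
```

Raising lw makes the decoder trust the LM more relative to the acoustics, while a wip below 1 lowers the score of hypotheses with many words.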
- 24. SSLIS CAPABILITIES
- Real time speech to text to video sign language.
- Text to video sign language.
- Automatic WER calculation.
- Text to computer generated voice with synchronized lips.
- Speed control of ASL movies in play.
- Minimize to Auto allows text dragged and dropped from any text
editor to be signed.
- Demonstration of the SE manual as a parallel to English.
- Demonstration of the decoding process of speech.
- The Live Decode program allows real time speech recognition,
while Live Pretend and Decode allow speech recognition in batch
mode.
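Word error rate (WER), listed among the capabilities above, is conventionally the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below implements that conventional definition; it is an assumption that SSLIS computes WER the same way, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the processed ref prefix and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    ref = "it is all right if you have a lot"
    hyp = "it is all right if you have lot"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.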
- 25. Conclusions
- The research aim of offering freely available and open source
SSLIS is fulfilled.
- Sphinx 3.5 was manipulated as the SR engine.
- SE manual was followed for translation.
- 26. Shortcomings & Further Work
- Degradation in the speech recognition accuracy.
- Using a poor quality microphone would highly degrade the
recognition accuracy of our system.
- Virtual memory constraints.
- 29. Thank You