Speech Recognition Application
![Page 1: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/1.jpg)
Speech Recognition Application
Voice Enabled Phone Directory
- Yousef Rabah
يوسف رباح -
![Page 2: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/2.jpg)
Why Speech Enabled Phone Directory
Growing Technology
Easy Access
Mainly used for:
– Educational purposes
– People with certain Disabilities
– Mobile use
![Page 3: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/3.jpg)
Problem
Automatic phone directory assistance that interacts with the user through speech
![Page 4: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/4.jpg)
Automatic Speech Recognition - Sphinx
Speaker Dependent vs. Independent
Acoustic modeling
Isolated vs. Continuous
HMM – Probabilities, Parameters, Training
Language Model
– Unigrams: <s> & </s>
– Bigrams: P(word2 | word1)
Phonemes
Lexicon Structure
– ZERO Z IH R OW
– TWO T UW
– H A HEIGH H
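The bigram term P(word2 | word1) on this slide can be illustrated with a small maximum-likelihood estimate. This is a minimal sketch in Python, not the Sphinx language-model tooling: the function name `bigram_probabilities` and the three-name training corpus are hypothetical, and only the two complete lexicon entries shown on the slide are reproduced.

```python
from collections import Counter

# Lexicon entries copied from the slide: word -> phoneme sequence.
LEXICON = {
    "ZERO": ["Z", "IH", "R", "OW"],
    "TWO": ["T", "UW"],
}

def bigram_probabilities(sentences):
    """Estimate P(word2 | word1) by maximum likelihood (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # <s> and </s> mark the start and end of each utterance.
        tokens = ["<s>"] + list(words) + ["</s>"]
        # Every token except the final </s> acts as a history word.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Hypothetical training data: names spelled letter by letter.
corpus = [["S", "A", "M"], ["S", "U", "E"], ["A", "L"]]
probs = bigram_probabilities(corpus)
print(probs[("<s>", "S")])  # share of utterances that start with S
print(probs[("S", "A")])    # share of S occurrences followed by A
```

A real language model would also smooth these estimates so that unseen letter pairs do not receive zero probability.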
![Page 5: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/5.jpg)
Input / Output
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace (null)
LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330        (Total)
FWDVIT: H E L L (null)
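The (Total) row of the backtrace is the sum of the per-segment acoustic (AScr) and language (LScr) scores, and the FWDVIT line lists the segment words with the silence and end-of-utterance markers dropped. The following Python sketch checks that arithmetic on the rows copied from the slide. The parser `parse_backtrace` is a hypothetical helper written for this illustration, not part of Sphinx, and it assumes the type and word columns are separated by whitespace.

```python
# Segment rows copied from the backtrace on the slide
# (LatID, SFrm, EFrm, AScr, LScr, Type, word).
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        lat_id, sfrm, efrm, ascr, lscr, seg_type, word = line.split()
        rows.append({
            "start_frame": int(sfrm),
            "end_frame": int(efrm),
            "ascr": int(ascr),
            "lscr": int(lscr),
            "word": word,
        })
    return rows

segments = parse_backtrace(BACKTRACE)
total_ascr = sum(seg["ascr"] for seg in segments)
total_lscr = sum(seg["lscr"] for seg in segments)
# Keep real words only: <sil> and </s> are markers, not letters.
hypothesis = [seg["word"] for seg in segments if not seg["word"].startswith("<")]

print(total_ascr, total_lscr)  # matches the (Total) row: -1818214 -818330
print(" ".join(hypothesis))    # H E L L
```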
![Page 6: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/6.jpg)
Difficulties
Hardware issues
ASR software issues
Letter phonemes
Time
![Page 7: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/7.jpg)
Solution
4 Stage Process:
![Page 8: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/8.jpg)
Solution
Database (PostgreSQL)
– Names
– Phone numbers
– Fast access
![Page 9: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/9.jpg)
Solution
Architecture of application
– db.pm
– people.pm
– people.pl
– record.pl
– wav_to_raw.pl
– get_speech.pl
– display_speech.pm
– display_speech.pl
– VEPD.pm
– VEPD.pl
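The file name wav_to_raw.pl suggests a step that converts a recorded WAV file into the headerless raw samples that the decoder reads (the log on the Input / Output slide reads a .raw file). The slides do not show that script, so the following is only a Python sketch of what such a conversion could look like, using the standard `wave` module; the function name `wav_to_raw` and its arguments are assumptions, not the author's Perl code.

```python
import wave

def wav_to_raw(wav_path, raw_path):
    """Copy the PCM samples of a WAV file into a headerless raw file.

    Returns the number of bytes written. Sample rate and sample width
    are left unchanged, so the recording must already be in the format
    the recognizer expects.
    """
    with wave.open(wav_path, "rb") as wav_file:
        frames = wav_file.readframes(wav_file.getnframes())
    with open(raw_path, "wb") as raw_file:
        raw_file.write(frames)
    return len(frames)
```

A call such as `wav_to_raw("name.wav", "name.raw")` (hypothetical file names) would leave only the sample data in the output file.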
Example:…
PC: press space bar before and after you speak:
User: S AH EM
PC: Decoded as, SAM ?
Results | 1
1. SAM | SMITH | 765-973-2145
…
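The exchange above amounts to joining the recognized letters into a name and looking that name up in the directory. Below is a minimal Python sketch of that flow. It uses an in-memory SQLite database as a stand-in for the PostgreSQL database named in the slides, and the table name `people`, its columns, and the helper functions are all hypothetical; only the SAM SMITH record comes from the example.

```python
import sqlite3

def open_directory():
    """Create a throwaway directory (SQLite stands in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (first_name TEXT, last_name TEXT, phone TEXT)"
    )
    # The single record shown in the slide's example.
    conn.execute(
        "INSERT INTO people VALUES (?, ?, ?)",
        ("SAM", "SMITH", "765-973-2145"),
    )
    conn.commit()
    return conn

def letters_to_name(letters):
    """Join the decoded letters into one upper-case name."""
    return "".join(letter.strip().upper() for letter in letters)

def find_by_first_name(conn, name):
    """Return (first_name, last_name, phone) rows matching the name."""
    cursor = conn.execute(
        "SELECT first_name, last_name, phone FROM people "
        "WHERE first_name = ? ORDER BY last_name",
        (name,),
    )
    return cursor.fetchall()

conn = open_directory()
decoded = letters_to_name(["S", "A", "M"])
results = find_by_first_name(conn, decoded)
print(f"Decoded as, {decoded} ?")
print(f"Results | {len(results)}")
for index, (first, last, phone) in enumerate(results, start=1):
    print(f"{index}. {first} | {last} | {phone}")
```

The parameterized query (`?` placeholders) keeps the decoded text out of the SQL string, which matters once the input comes from a recognizer rather than a trusted source.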
![Page 10: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/10.jpg)
Solution
![Page 11: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/11.jpg)
Results
A first step towards a hands-free, speech-enabled phone directory
Speaker Independent
Application's Features:
- Adding user
- Retrieving user (via speech)
- Manual search
- Viewing current phone directory
![Page 12: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/12.jpg)
Possible Future Enhancement
ASR enabled for:
– Adding users
– Phone # search
– Word Recognition (instead of letters)
More accurate ASR (as the technology grows)
Graphical interface (via Perl/Tk)
Communication through VoiceXML
![Page 13: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/13.jpg)
Special Thanks
To friends and family
– Jim Rogers
– Hassan Halta
– Skylar Thompson
– Kushboo Goel
– Rabah family
– El-Shabab el-taybeh
![Page 14: Speech Recognition Application](https://reader036.fdocuments.net/reader036/viewer/2022082408/56812c4c550346895d90d313/html5/thumbnails/14.jpg)
Questions/Comments