Automatic Switchboard Operator
Luboš Šmídl, Tomáš Valenta
Department of Cybernetics
Faculty of Applied Sciences
University of West Bohemia in Pilsen
Contents
- A dialogue system
- The dialogue
- Automatic speech recognition and speech grammar
- The data
- Advanced features
- Maintenance-free running
- Experiences and future
- Interesting facts
- Other applications
Purpose
Automatic Switchboard Operator is a voice application that answers phone calls and transfers callers to the requested person. Callers give their input preferably by voice, and the system responds by voice as well.
- A voice dialogue system
- Covers the whole UWB in Pilsen
- The first application of its kind at this scale in the Czech Republic
A dialogue system
[Diagram: Data, Document server, Dialogue controller, Scripting engine, ASR, TTS, Telephone back-end, Telephone network, Caller]
- Data: MySQL, Oracle, …
- Document server: PHP
- Dialogue controller: VoiceXML interpreter
- Speech engines: ERIS by SpeechTech and the Department of Cybernetics
- Telephony: SIP or ISDN
The dialogue
- Experienced users vs. newcomers
  - Shortcuts
  - Call n-th number
- Specifying the called person
  - First name and surname
  - Titles and degrees
  - Department and function
- Voice or DTMF input
  - Smith → 76484#
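The DTMF example above is consistent with spelling the surname on a standard telephone keypad and closing the entry with `#`. A minimal sketch of that mapping follows; the function name, the diacritics handling and the role of `#` as a terminator are assumptions, not details taken from the system itself.

```python
import unicodedata

# Letter groups printed on a standard telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def surname_to_dtmf(surname: str, terminator: str = "#") -> str:
    """Spell a surname as keypad digits followed by a terminator key.

    Assumption: diacritics are dropped before mapping (e.g. "Šmídl" is
    keyed like "smidl"); characters that are not letters are skipped.
    """
    decomposed = unicodedata.normalize("NFKD", surname)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    digits = "".join(
        LETTER_TO_DIGIT[c] for c in plain.lower() if c in LETTER_TO_DIGIT
    )
    return digits + terminator


print(surname_to_dtmf("Smith"))  # 76484#
```

Because several letters share one key, different surnames can map to the same digit string, so a real directory would still need a disambiguation step after the lookup.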
Automatic speech recognition
Methods
- LVCSR
- Isolated words
- Grammars
person = (
[(salut function) | (function salut) | salut | function]
[degrees] (([firstname] surname) | (surname firstname))
[degrees]
[function | department]
) | function;
Speech grammar complexity
Prof. Ing. Josef Psutka, CSc., boss of DCy
1. Josef Psutka
2. Engineer Psutka
3. Boss of the Department of Cybernetics
4. Mister Psutka, professor
5. Professor Psutka, the Department of Cybernetics
6. Psutka Josef
7. etc.
26,042 acceptable utterances
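To see why the number of acceptable utterances grows so quickly, the `person` rule can be expanded mechanically. The sketch below does this in Python for a toy vocabulary built from the slide's example; the vocabulary entries and the helper names `seq` and `opt` are illustrative assumptions, and the result is not meant to reproduce the 26,042 figure.

```python
from itertools import product


def seq(*parts):
    """All concatenations that take one alternative from each part, in order."""
    return {" ".join(w for w in combo if w) for combo in product(*parts)}


def opt(part):
    """Optional part: any of its alternatives, or nothing."""
    return set(part) | {""}


# Toy vocabulary for a single person (hypothetical, based on the example above).
salut = {"mister"}
function = {"boss of the department of cybernetics"}
degrees = {"professor", "engineer"}
firstname = {"josef"}
surname = {"psutka"}
department = {"the department of cybernetics"}


def person():
    # [(salut function) | (function salut) | salut | function]
    prefix = opt(seq(salut, function) | seq(function, salut) | salut | function)
    # ([firstname] surname) | (surname firstname)
    name = seq(opt(firstname), surname) | seq(surname, firstname)
    # Whole rule: the full name form, or the function on its own.
    full = seq(prefix, opt(degrees), name, opt(degrees), opt(function | department))
    return full | function


utterances = person()
print(len(utterances), "distinct utterances for one person")
```

Even with one salutation, one function and two degrees, the rule already yields several hundred distinct utterances; richer vocabularies multiply the count further.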
The data
Visual data vs. aural data
- Prof. Ing. Psutka → professor engineer psutka
Generating pronunciations
- Rule-based, for TTS vs. for ASR
- Tomáš → Tomáš, Thomas, Tom
Field tagging
- Better grammar matching, faster DB search
- J(firstname) P(surname) D(department) F(function) T(degree)
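A rough sketch of the two data steps above, turning a written ("visual") record into spoken ("aural") words and tagging each field with its letter. The abbreviation table, the function names and the exact tag format are assumptions for illustration; the rules used by the real system are not shown in the slides.

```python
# Hypothetical abbreviation table: written form -> spoken form.
SPOKEN_FORMS = {"prof.": "professor", "ing.": "engineer"}

# Field letters as listed on the slide.
FIELD_TAGS = {
    "firstname": "J",
    "surname": "P",
    "department": "D",
    "function": "F",
    "degree": "T",
}


def to_aural(visual: str) -> str:
    """Lower-case the text and replace known abbreviations with spoken words."""
    words = []
    for token in visual.split():
        key = token.lower()
        # Unknown tokens are kept, minus trailing punctuation.
        words.append(SPOKEN_FORMS.get(key, key.strip(".,")))
    return " ".join(words)


def tag_fields(record: dict[str, str]) -> list[str]:
    """Wrap each non-empty field in its tag letter, e.g. P(psutka)."""
    return [
        f"{FIELD_TAGS[field]}({to_aural(value)})"
        for field, value in record.items()
        if value
    ]


print(to_aural("Prof. Ing. Psutka"))  # professor engineer psutka
print(tag_fields({"degree": "Prof.", "firstname": "Josef", "surname": "Psutka"}))
```

Keeping the tag next to each recognized word means a recognition result can be turned into a database query on specific columns rather than a free-text search, which is presumably where the "faster DB search" comes from.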
Advanced features
- Web presentation
- Administration
  - Rules for pronunciations
  - Shortcuts or direct numbers
  - Callers' rights
- Phonebook searching
- Monitoring
- Statistics
Maintenance-free running
- Windows services, daemons
- Task scheduler
  1. Import data
  2. Generate pronunciations
  3. Generate and compile grammar
  4. Optional sanitary restart
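The scheduled steps above form a simple ordered pipeline. A minimal sketch of such a runner is given below; the function name, the stop-on-first-failure policy and the stub steps are assumptions, since the slides do not describe how the real job handles errors.

```python
from typing import Callable, Optional

Step = tuple[str, Callable[[], None]]


def run_update(steps: list[Step], restart: Optional[Callable[[], None]] = None) -> list[str]:
    """Run the steps in order and return the names of those that completed.

    Assumption: the run stops at the first failing step, so that a failed
    import never leads to compiling a grammar from incomplete data.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            print(f"step '{name}' failed: {exc}; stopping this run")
            return completed
        completed.append(name)
    if restart is not None:
        # Optional final restart, performed only after every step succeeded.
        restart()
        completed.append("restart")
    return completed


# Usage with placeholder steps (the real work is not shown in the slides).
steps = [
    ("import data", lambda: None),
    ("generate pronunciations", lambda: None),
    ("generate and compile grammar", lambda: None),
]
print(run_update(steps, restart=lambda: None))
```

A task scheduler would invoke a script like this periodically; with a grammar compilation that takes hours, running it off-peak is the natural choice.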
Experiences
- Running since 2008
- Extended grammar accepting
  - Hello, Please, Thank you
  - I would like to talk to
- Optimizing prompts
- Application made general
Future
- Using statistics for person/number selection
- More info about employees
- More features and speed for experienced users
- New technologies: better TTS and ASR
Interesting numbers
2,095 persons
2,322 telephones
35,566,194 utterances
2.5 hours – grammar compilation time
Other dialogue applications
- Entrance exams
  - Since June 2000
  - 3,000–5,000 calls a year
- Exams
  - Web access alternative
- Recent news reading
  - RSS from www.idnes.cz
  - Categories: general, sport, economics, …
- ASR demo
  - Users can test ASR capabilities
  - Web interface, users log in, own set of utterances
- and others…
Thank you for your attention
Do you have any questions?
VoiceXML
- Mark-up language based upon XML
- Main advantages
  - Minimizes client/server communication (more interactions in a document)
  - Hides low-level implementation details from the programmer
  - Enables better portability
  - Designed for content providers and dialogue designers
  - Separates user interface (VoiceXML) from program logic
  - Easy for both simple and complex applications
- VoiceXML interpreter (like a web browser)
  - Document getter
  - Document interpreter (dialogue controller)
  - I/O interface – speech engine: telephony, ASR and TTS units
- Two kinds of dialogue: forms and menus
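To illustrate what a document server might hand to the VoiceXML interpreter, the sketch below builds a minimal form with one field in Python. The element names (`vxml`, `form`, `field`, `prompt`, `grammar`, `filled`, `value`) come from VoiceXML 2.0; the prompt wording, the grammar URL and the grammar media type are assumptions, and the real application generates its documents with PHP rather than Python.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_person_form(grammar_url: str) -> str:
    """Return a VoiceXML 2.0 document with one form that asks for a person."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "ask_person"})
    field = ET.SubElement(form, "field", {"name": "person"})

    # Prompt played to the caller (hypothetical wording).
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Who would you like to talk to?"

    # Reference to the compiled speech grammar (URL and type are assumptions).
    ET.SubElement(field, "grammar", {"src": grammar_url, "type": "application/srgs+xml"})

    # Executed once the field has been filled by a recognition result.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "Transferring you to "
    ET.SubElement(confirm, "value", {"expr": "person"})

    return ET.tostring(vxml, encoding="unicode")


doc = build_person_form("http://example.org/grammars/person.grxml")
print(doc)
```

A form collects values into fields, as here; a menu would instead offer a fixed set of choices, which fits shortcuts better than open name entry.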