Speaker Identification and Verification. Dan Burnett, Nuance. 58th IETF.
Speaker Identification and Verification
Dan Burnett, Nuance
58th IETF
Terminology
• Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers
• Speaker verification -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)
• Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify/verify a speaker. Applies to both SI/SV.
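To make the three terms concrete, here is a minimal sketch. It assumes a "voiceprint" is just an averaged feature vector and that matching is done by cosine similarity with a fixed threshold. These are illustrative stand-ins, not the method of any real engine, and the names `train`, `identify`, `verify`, and the threshold value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train(utterances):
    """Training: average utterance feature vectors into one toy voiceprint."""
    n = len(utterances)
    return [sum(col) / n for col in zip(*utterances)]

def identify(voiceprints, utterance):
    """Identification: pick the best match out of a set of known speakers."""
    return max(voiceprints, key=lambda name: cosine(voiceprints[name], utterance))

def verify(voiceprints, claimed, utterance, threshold=0.9):
    """Verification: accept or reject one identity claim (threshold is assumed)."""
    return cosine(voiceprints[claimed], utterance) >= threshold

# Toy data: two enrolled speakers, each trained from two utterances.
voiceprints = {
    "alice": train([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "bob": train([[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]),
}
sample = [0.95, 0.15, 0.05]

who = identify(voiceprints, sample)              # no identity claim needed
is_alice = verify(voiceprints, "alice", sample)  # requires an identity claim
is_bob = verify(voiceprints, "bob", sample)
```

Note that `identify` always returns one of the known speakers, while `verify` needs the claimed identity as an input and returns only yes or no.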
draft-burnett-mrcpext-00.txt
• Created by Nuance and Intervoice
• Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)
• Based originally on Nuance functionality, modified to be more general
• Starting point for MRCP v2 functionality discussions
• Also extensions for speaker-enrolled grammars, hotword recognition, and to the recognition resource
Proposed SI/SV process (simplified, see section 6.7)
VER-START-SESSION
VER-BUFFERING-START
VER-SET-VOICEPRINT
VER-END-SESSION
VER-DELETE-VOICEPRINT
VER-ROLLBACK
GET-PARAMS
SET-PARAMS
VERIFY
VER-FROM-BUFFER*
VER-BUFFERING-STOP
VER-BUFFERING-CONTROL
VER-FROM-BUFFER*
* Requires active buffering and ver/id sessions.
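The footnote's precondition can be sketched as a small state check. This is a toy model and not the draft's state machine: the method names come from the slide, but the class `VerifierSession`, the error type, and the choice to track only the start and end of a session are assumptions. The slide does not say how VER-BUFFERING-STOP or VER-BUFFERING-CONTROL affect the precondition, so the sketch deliberately leaves them unmodelled.

```python
class VerifierSession:
    """Toy tracker for the two preconditions of VER-FROM-BUFFER (hypothetical)."""

    def __init__(self):
        self.session_active = False     # set by VER-START-SESSION
        self.buffering_started = False  # set by VER-BUFFERING-START
        self.log = []

    def handle(self, method):
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_started = True
        elif method == "VER-FROM-BUFFER":
            # Footnote: requires active buffering and a ver/id session.
            if not (self.session_active and self.buffering_started):
                raise RuntimeError(
                    "VER-FROM-BUFFER requires active buffering and a ver/id session"
                )
        # All other methods are accepted without any state change in this sketch.
        self.log.append(method)
        return method

session = VerifierSession()
for m in ["VER-START-SESSION", "VER-BUFFERING-START", "VER-FROM-BUFFER"]:
    session.handle(m)
```

Calling `handle("VER-FROM-BUFFER")` on a fresh `VerifierSession` raises, because neither precondition has been met.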
Discussion points
• Why buffering?
• Registry for return info
• Anything else before I convert to MRCPv2?
Voice/Text Grammar Enrollment (simplified, see section 5.5)
• Extension to existing recognition resource
• Creates speaker-produced grammar entries
• E.g., voice-enrolled entries for voice dialing
• Both speech and text can be used to create grammar entries
START-ENROLLMENT-SESSION
END/ABORT-ENROLLMENT-SESSION
PAUSE/RESUME-ENROLLMENT-SESSION
ENROLLMENT-ROLLBACK
RECOGNIZE/STOP*
ADD/DELETE/MODIFY-PHRASE
* These methods already exist in the recognizer resource
Hotword (see section 7)
• New recognition resource
• Instead of listening for a set time period, listens continuously until it matches a grammar
• Non-matching speech is ignored and does not affect the state of the recognizer
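The hotword behaviour described above can be sketched as a loop that keeps consuming input until something matches. The sketch assumes utterances arrive as already-transcribed strings and that a grammar is a plain set of phrases. Both are simplifications for illustration, and `hotword_listen` is a hypothetical name.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar; ignore the rest.

    Non-matching input changes no state: the loop simply moves on.
    Returns the matched phrase, or None if the stream ends without a match.
    """
    for text in utterances:
        phrase = text.strip().lower()
        if phrase in grammar:
            return phrase
    return None

# An iterator stands in for a continuous audio stream.
stream = iter(["um", "what time is it", "Call Home", "goodbye"])
matched = hotword_listen(stream, {"call home", "call office"})
leftover = next(stream)  # listening stopped at the match, so this is unread input
```

There is no timeout parameter: unlike a recognizer that listens for a set period, the loop ends only on a match or when the input runs out.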
Other Extensions
• Record method (sec. 4.4) -- Allows end-pointed recording of an audio stream
• Interpret method (sec. 4.5) -- Behaves as a recognition except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
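As a rough illustration of the Interpret idea, the sketch below builds the same result structure for audio and text input, and simply omits the audio-derived fields in the text case. The result layout, the field names (`confidence`, `start_ms`, `end_ms`), and the dictionary-based grammar are all assumptions made for the example. The draft defines the actual result format.

```python
def build_result(phrase, grammar, audio_info=None):
    """Shared result builder; audio_info carries audio-specific values, if any."""
    result = {"input": phrase, "interpretation": grammar[phrase]}
    if audio_info is not None:
        result.update(audio_info)
    return result

def interpret(text, grammar):
    """Interpret: match text against the grammar, with no audio-specific values."""
    phrase = text.strip().lower()
    if phrase not in grammar:
        return None  # no match
    return build_result(phrase, grammar)

# Hypothetical grammar mapping phrases to semantic interpretations.
grammar = {"call home": {"action": "dial", "target": "home"}}

text_result = interpret("Call Home", grammar)
audio_result = build_result(
    "call home", grammar,
    audio_info={"confidence": 0.87, "start_ms": 120, "end_ms": 940},
)
```

Both results share the `input` and `interpretation` fields, and only the audio-based one carries the extra values.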