Dynamic Match Lattice Spotting
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace
Overview
• Phonetic-based index, hence open-vocabulary
• Based on lattice-spotting technique
• Two-tier database
• Dynamic-match rules
• Algorithmic optimisations
NOTE: Patented technology
Concept
Search term: greasy
Phone decomposition: g r iy s iy
Indexed phone sequences (does any match the target?):
g r ax s iy
th ay n r nx
ow d m nx ae
… … … … …
Concept
Target sequence: g r iy s iy
Observed sequences:
g r ax s ih
th ay n r nx
ow d m nx ae
… … … … …
Dynamic matching: costs are incurred for the substituted phones ax and ih
Indexing
Audio → Feature Extraction → Segmentation → Speech Recognition → Lattices → Sequence Generation → Sequence DB → Hyper-Sequence Generation → Hyper-Sequence DB
Hyper-sequence Mapping
• Map individual phones to “parent” classes
– We use Vowels, Fricatives, Glides, Stops and Nasals
• Simple example
– Parent classes: Vowels, Consonants
– Map each phone to parent class to create hyper-sequence
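– For a phone sequence $(p_1, p_2, p_3, \ldots)$, each $p_i$ maps to $V$ if $p_i$ is a vowel and to $C$ if $p_i$ is a consonant
– The Sequence DB entries are mapped in this way to build the Hyper-Sequence DB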
Hyper-sequence Mapping
Search term: g r iy s iy
Hyper-sequence: C C V C V
Hyper-sequence DB:
C C V C V
V C C V C
C V C V C
… … … … …
Sequence DB:
g r oy s ih
t l ow p iy
nx s eh r ay
d r ax b ae
b f ax d aa
oy b r aa f
eh g r iy m
… … … … …
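The two-tier lookup in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presented system: it uses the simple Vowel/Consonant classes from the example, an exact hyper-sequence match, and a hand-written `VOWELS` set covering only the phones shown on the slide. The function names are hypothetical.

```python
from collections import defaultdict

# Assumption: only the vowels that appear in the slide's example are listed.
VOWELS = {"aa", "ae", "ax", "ay", "eh", "ih", "iy", "ow", "oy"}

def to_hyper_sequence(phones):
    """Map each phone to its parent class: V for vowels, C otherwise."""
    return tuple("V" if p in VOWELS else "C" for p in phones)

def build_hyper_index(sequences):
    """Group phone sequences under their hyper-sequence (the first-tier key)."""
    index = defaultdict(list)
    for seq in sequences:
        index[to_hyper_sequence(seq)].append(seq)
    return index

# Sequence DB entries copied from the slide.
sequence_db = [s.split() for s in [
    "g r oy s ih", "t l ow p iy", "nx s eh r ay", "d r ax b ae",
    "b f ax d aa", "oy b r aa f", "eh g r iy m",
]]
hyper_db = build_hyper_index(sequence_db)

target = "g r iy s iy".split()
candidates = hyper_db[to_hyper_sequence(target)]
print(len(candidates), "of", len(sequence_db), "sequences need dynamic matching")
```

Only the sequences filed under the target's hyper-sequence are passed on to the more expensive dynamic matching step, which is the point of keeping two tiers.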
Searching
Term → Phone decomp. → Split long terms → Hyper-mapping → Dynamic Matching (against the Hyper-Sequence DB and Sequence DB) → Merge long terms → Keyword Verification → Results
Dynamic Matching
• Minimum Edit Distance (MED)
• i.e. Levenshtein Distance
• Insertions, deletions, substitutions
• Finds minimum cost of transformation
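As a concrete illustration of Minimum Edit Distance, the sketch below computes the Levenshtein distance between two phone sequences with the standard dynamic-programming recurrence. The uniform unit costs and the function name are assumptions made for the example; the presented system uses phone-dependent costs.

```python
def min_edit_distance(target, observed, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance between two sequences via dynamic programming."""
    rows, cols = len(target) + 1, len(observed) + 1
    # dist[i][j] = cheapest way to turn target[:i] into observed[:j]
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = target[i - 1] == observed[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                         # deletion
                dist[i][j - 1] + ins_cost,                         # insertion
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]

target = "g r iy s iy".split()
print(min_edit_distance(target, "g r ax s ih".split()))  # 2.0 (two substitutions)
print(min_edit_distance(target, "th ay n r nx".split()))
```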
• Substitution costs
– Derived from phone confusion statistics
– $C_s(x, y)$: cost of substituting phone $x$ with phone $y$
– $E_x$: phone $x$ was emitted by the recogniser
– $R_y$: phone $y$ was in the reference transcript
$$C_s(x, y) = I(R_y \mid E_x) = -\log p(R_y \mid E_x)$$
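One way such costs could be estimated from phone confusion statistics is sketched below. It assumes the cost of substituting x with y is the information of "y was in the reference" given "x was emitted", i.e. the negative log of a relative frequency. The conditioning direction, the lack of smoothing, the function name and the counts are all assumptions for illustration.

```python
import math
from collections import defaultdict

def substitution_costs(confusions):
    """confusions[(emitted, reference)] = count.

    Returns cost[(x, y)] = -log p(R_y | E_x), estimated by relative frequency.
    Pairs with a zero count are skipped (no smoothing in this sketch).
    """
    emitted_totals = defaultdict(int)
    for (emitted, _reference), count in confusions.items():
        emitted_totals[emitted] += count
    costs = {}
    for (emitted, reference), count in confusions.items():
        if count > 0:
            costs[(emitted, reference)] = -math.log(count / emitted_totals[emitted])
    return costs

# Hypothetical confusion counts, not taken from the presentation.
counts = {
    ("ax", "iy"): 20, ("ax", "ax"): 70, ("ax", "ih"): 10,
    ("iy", "iy"): 90, ("iy", "ih"): 10,
}
costs = substitution_costs(counts)
print(round(costs[("ax", "iy")], 3))
```

Frequently confused phone pairs receive a low cost, so an observed sequence that differs from the target only by common recogniser confusions still scores as a close match.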
Optimisations
• Prefix sequence optimisation
• Early stopping optimisation
• Linearised MED search approximation
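The slide names these optimisations without detail. As one plausible reading of early stopping, the sketch below abandons an edit-distance computation as soon as every cell in the current row exceeds a cost budget, since costs can only grow from there. The threshold parameter, the unit costs and the function name are assumptions. The prefix sequence and linearised search optimisations are not shown.

```python
def med_with_early_stop(target, observed, max_cost, sub_cost=1.0, indel_cost=1.0):
    """Return the edit distance, or None once it provably exceeds max_cost."""
    prev = [j * indel_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * indel_cost]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + indel_cost,                        # deletion
                curr[j - 1] + indel_cost,                    # insertion
                prev[j - 1] + (0.0 if t == o else sub_cost), # match or substitution
            ))
        # Every alignment passes through this row and costs never decrease,
        # so if the whole row is over budget the final distance is too.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None

target = "g r iy s iy".split()
print(med_with_early_stop(target, "g r ax s ih".split(), max_cost=2))
print(med_with_early_stop(target, "th ay n r nx".split(), max_cost=1))
```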
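Long Term Merging
Search term: olympic sites
Phone decomposition: ow l ih m p ih k s ay t s
Split into overlapping sub-sequences: ow l ih m p ih k / p ih k s ay t s
Search each sub-sequence separately, then merge the results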
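The split-and-merge idea can be sketched as below. The maximum piece length of 7 phones is an assumption chosen because it reproduces the split shown for "olympic sites". The hit format (start and end times in seconds), the merging rule, the timestamps and the function names are hypothetical, and merging is shown for two pieces only.

```python
def split_long_term(phones, max_len):
    """Split a phone sequence into overlapping pieces of at most max_len phones."""
    n = len(phones)
    if n <= max_len:
        return [phones]
    pieces = -(-n // max_len)        # ceil(n / max_len)
    last_start = n - max_len         # the final piece ends at the last phone
    starts = [(i * last_start) // (pieces - 1) for i in range(pieces)]
    return [phones[s:s + max_len] for s in starts]

def merge_hits(first_hits, second_hits, max_gap=0.0):
    """Join hits (start_sec, end_sec) of two consecutive pieces that overlap in time."""
    merged = []
    for s1, e1 in first_hits:
        for s2, e2 in second_hits:
            if s1 < s2 <= e1 + max_gap and e2 > e1:
                merged.append((s1, e2))
    return merged

phones = "ow l ih m p ih k s ay t s".split()
for piece in split_long_term(phones, max_len=7):
    print(" ".join(piece))
# ow l ih m p ih k
# p ih k s ay t s

# Hypothetical hit times for the two pieces.
print(merge_hits([(10.0, 10.5), (42.0, 42.4)], [(10.3, 10.9), (77.0, 77.5)]))
# [(10.0, 10.9)]
```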
Keyword Verification
• Acoustic
– Use acoustic score from lattice to boost occurrences with high confidence
• Neural Network
– Produce a confidence score by fusing
• MED score and Acoustic score
• Term phone length
• Term phone classes
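A rough sketch of how such a fusion could be wired up is given below. Everything specific here is an assumption: the encoding of phone classes as per-class counts, the single hidden layer, the class labels given to the phones of "greasy", the acoustic score value, and the weights, which are untrained placeholders rather than values from the presented system.

```python
import math

# Parent classes as listed on the hyper-sequence slide.
PHONE_CLASSES = ["vowel", "fricative", "glide", "stop", "nasal"]

def verification_features(med_score, acoustic_score, phone_classes):
    """Build the input vector: both scores, term length, and per-class phone counts."""
    counts = [float(phone_classes.count(c)) for c in PHONE_CLASSES]
    return [med_score, acoustic_score, float(len(phone_classes))] + counts

def confidence(features, hidden_weights, hidden_bias, out_weights, out_bias):
    """One hidden layer of tanh units followed by a sigmoid output in (0, 1)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

# Assumed class labels for "g r iy s iy"; scores are hypothetical.
features = verification_features(
    2.0, -35.0, ["stop", "glide", "vowel", "fricative", "vowel"]
)
# Untrained placeholder weights, for shape only.
score = confidence(
    features,
    hidden_weights=[[0.1] * 8, [-0.1] * 8],
    hidden_bias=[0.0, 0.0],
    out_weights=[0.5, -0.5],
    out_bias=0.0,
)
print(score)
```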
Results

| Source type | DevSet phone error rate | Primary system | Contrastive: No Acous. | Contrastive: LTS Only |
|---|---|---|---|---|
| Bnews | 24% | 0.246 | 0.245 | 0.208 |
| CTS | 45% | 0.104 | 0.102 | 0.080 |
| Confmtg | 56% | 0.021 | 0.019 | 0.016 |

System scores are Maximum Term-Weighted Value on EvalSet terms.

• Index size: 558 MB/Sh (297 MB/Sh for No Acous.)
• Index speed: 18x real-time
• Search speed: 3 hr searched / CPU-sec
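For readers unfamiliar with the metric, the sketch below shows how Term-Weighted Value is computed as defined in the NIST 2006 Spoken Term Detection evaluation plan: one minus the term-averaged sum of miss probability and weighted false-alarm probability, maximised over the decision threshold to give the Maximum Term-Weighted Value. The data layout and function names are assumptions, and the counts in the usage lines are hypothetical.

```python
BETA = 999.9  # false-alarm weight from the NIST STD 2006 evaluation plan

def term_weighted_value(term_stats, speech_seconds, beta=BETA):
    """term_stats: list of (n_true, n_correct, n_spurious), one tuple per term,
    all measured at a single decision threshold."""
    losses = []
    for n_true, n_correct, n_spurious in term_stats:
        if n_true == 0:
            continue  # terms with no reference occurrences are excluded
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_spurious / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)

def maximum_twv(stats_by_threshold, speech_seconds):
    """Best TWV over all candidate thresholds."""
    return max(
        term_weighted_value(stats, speech_seconds)
        for stats in stats_by_threshold.values()
    )

# Hypothetical counts for two terms at two thresholds, over 10000 s of speech.
stats = {
    0.3: [(10, 9, 4), (5, 4, 2)],
    0.7: [(10, 7, 0), (5, 3, 0)],
}
print(maximum_twv(stats, speech_seconds=10000.0))
```

Because the false-alarm weight is large, a handful of spurious detections can cost more than several misses, which is why the best threshold is often a conservative one.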
Conclusion
• Open-vocabulary and phone-based
• Patented technology utilises
– sequence and hyper-sequence databases
– optimisations for rapid searches
• Advantages
– Other languages
– Economy of scale
• Limitations
– Indexing speed and size
– Need to split long sequences
• Future work
– Keyword Verification
• Word-level information (e.g. LVCSR)
• Acoustic features (e.g. prosody)
– Indexing/searching frameworks
– Spoken Document Retrieval and other semantic applications
References
1. A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
2. K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
3. CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
4. S. J. Young, P.C. Woodland, W.J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
5. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.