SNR-Aware PLDA Modeling for Robust Speaker Verification
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University
Sun Yat-sen University-Carnegie Mellon University International Joint Research Institute, Shunde, Guangdong (SYSU-CMU Joint Research Institute)
28 Dec. 2015
Man-Wai MAK
http://www.eie.polyu.edu.hk/~mwmak
http://www.eie.polyu.edu.hk/~mwmak/papers/SYSU-CMU-2015.pdf

Contents
1. I-Vector/PLDA for Speaker Verification
2. SNR-Aware PLDA Modeling
   – SNR-Invariant PLDA
   – Mixture of PLDA
3. Experiments on SRE12
4. Conclusions

I-Vectors for Speaker Verification
• State-of-the-art method for speaker verification
• Factor analysis model:
$$\vec{\mu}_s = \vec{\mu} + \mathbf{T}\mathbf{x}_s$$
where $\vec{\mu}$ is the UBM supervector, $\mathbf{T}$ is the low-rank total variability matrix (61440×500), and $\mathbf{x}_s$ is the speaker-dependent i-vector.
• Instead of using the high-dimensional supervector $\vec{\mu}_s$ to represent speaker s, we use the low-dimensional (typically 500) i-vector $\mathbf{x}_s$.
• T is estimated by an EM algorithm using the utterances of many speakers. T represents the subspace in which the i-vectors vary.
• Given T, we estimate $\mathbf{x}_s$ for each target speaker and $\mathbf{x}_t$ for each test utterance.

I-Vectors for Speaker Verification
• Given an utterance, we align its acoustic vectors against a UBM to obtain the sufficient statistics.
• The i-vector of the utterance is the posterior mean of the latent factor of the factor analysis model.
i-vector of utterance i:
$$\langle \mathbf{x}_i \mid \mathcal{O}_i \rangle = \mathbf{L}_i^{-1}\mathbf{T}^{\mathsf T}\big(\boldsymbol{\Sigma}^{(b)}\big)^{-1}\tilde{\mathbf{f}}_i$$
$$\mathbf{L}_i^{-1} = \operatorname{cov}(\mathbf{x}_i, \mathbf{x}_i \mid \mathcal{O}_i) = \Big(\mathbf{I} + \mathbf{T}^{\mathsf T}\big(\boldsymbol{\Sigma}^{(b)}\big)^{-1}\mathbf{N}_i\mathbf{T}\Big)^{-1}$$
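
I-Vectors for Speaker Verification
Aligning the acoustic vectors $\mathbf{o}_t$ with the UBM gives, for utterance i:
$$\mathbf{N}_i = \begin{bmatrix} n_{i,1}\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & n_{i,2}\mathbf{I} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & n_{i,M}\mathbf{I} \end{bmatrix}, \qquad \tilde{\mathbf{f}}_i = \begin{bmatrix} \tilde{\mathbf{f}}_{i,1} \\ \vdots \\ \tilde{\mathbf{f}}_{i,M} \end{bmatrix}$$
$$\langle \mathbf{x}_i \mid \mathcal{O}_i \rangle = \mathbf{L}_i^{-1}\mathbf{T}^{\mathsf T}\big(\boldsymbol{\Sigma}^{(b)}\big)^{-1}\tilde{\mathbf{f}}_i$$
$$\mathbf{L}_i^{-1} = \operatorname{cov}(\mathbf{x}_i, \mathbf{x}_i \mid \mathcal{O}_i) = \Big(\mathbf{I} + \mathbf{T}^{\mathsf T}\big(\boldsymbol{\Sigma}^{(b)}\big)^{-1}\mathbf{N}_i\mathbf{T}\Big)^{-1}$$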
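
The posterior-mean formula for the i-vector can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the UBM covariance $\boldsymbol{\Sigma}^{(b)}$ is taken to be diagonal, and the function name `extract_ivector` and its argument layout are hypothetical, not taken from the slides.

```python
import numpy as np

def extract_ivector(T, sigma_diag, n, f_tilde):
    """Posterior mean <x_i | O_i> = L_i^{-1} T^T (Sigma^(b))^{-1} f~_i.

    T          : (M*F, R) total variability matrix
    sigma_diag : (M*F,)   diagonal of Sigma^(b) (assumed diagonal)
    n          : (M,)     zeroth-order statistics n_{i,1..M}
    f_tilde    : (M*F,)   stacked first-order statistics f~_{i,1..M}
    """
    M = n.shape[0]
    F = T.shape[0] // M
    n_diag = np.repeat(n, F)                  # diagonal of the block matrix N_i
    Tt_Sinv = T.T / sigma_diag                # T^T (Sigma^(b))^{-1}
    L = np.eye(T.shape[1]) + (Tt_Sinv * n_diag) @ T
    return np.linalg.solve(L, Tt_Sinv @ f_tilde)
```

Solving the linear system with `np.linalg.solve` avoids forming $\mathbf{L}_i^{-1}$ explicitly, which is usually the more stable choice when only the posterior mean is needed.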

I-Vectors for Speaker Verification
[Figure: system block diagram. The training data and the UBM are used to train the total variability matrix T. The i-vector extractor maps the utterance from target speaker s and the test utterance t to $\mathbf{x}_s$ and $\mathbf{x}_t$; LDA+WCCN projects them to $\mathbf{W}^{\mathsf T}\mathbf{x}_s$ and $\mathbf{W}^{\mathsf T}\mathbf{x}_t$; the scoring method feeds a decision maker that accepts if the score ≥ θ and rejects if the score < θ.]
• Given an utterance from speaker s and a total variability matrix T, we estimate the speaker's i-vector $\mathbf{x}_s$.
• Because T defines the combined space describing both speaker variability and channel variability, we use LDA+WCCN to remove channel variability.

I-Vectors for Speaker Verification
[Figure: i-vectors before LDA ($\mathbf{x}$) and after LDA ($\mathbf{W}^{\mathsf T}\mathbf{x}$). Each point represents an utterance. Each marker type represents a speaker.]

I-Vectors Scoring
• Given the i-vector of a target speaker and the i-vector of a test utterance, we compute the cosine-distance score:
$$S_{\mathrm{CD}}(\mathbf{x}_s, \mathbf{x}_t) = \frac{\langle \mathbf{W}^{\mathsf T}\mathbf{x}_s,\ \mathbf{W}^{\mathsf T}\mathbf{x}_t \rangle}{\lVert \mathbf{W}^{\mathsf T}\mathbf{x}_s \rVert \, \lVert \mathbf{W}^{\mathsf T}\mathbf{x}_t \rVert}, \qquad S_{\mathrm{CD}}(\mathbf{x}_s, \mathbf{x}_t) \in [-1, 1]$$
• If the score is larger than a threshold θ, then we accept the speaker; otherwise we reject the speaker.
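
A minimal sketch of cosine-distance scoring and the threshold decision, assuming the LDA+WCCN projection matrix `W` is already available. The function names are hypothetical, and the accept rule uses score ≥ θ as in the block diagram.

```python
import numpy as np

def cosine_score(x_s, x_t, W):
    """Cosine of the angle between the projected i-vectors W^T x_s and W^T x_t."""
    a = W.T @ x_s
    b = W.T @ x_t
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(x_s, x_t, W, theta):
    """Accept (True) if the score reaches the threshold theta, else reject."""
    return cosine_score(x_s, x_t, W) >= theta
```

Because the score is normalized by both vector lengths, it depends only on the direction of the projected i-vectors, not on their magnitude.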

Probabilistic LDA for SV
• PLDA is based on a generative model that uses pre-processed i-vectors as input.
• It aims to model the speaker and channel variability in the i-vector space.
• The method assumes that there is a speaker subspace V within the i-vector space.
• The i-vector $\mathbf{x}_s$ is written as:
$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\epsilon}_s$$
where $\mathbf{x}_s$ is the i-vector extracted from the utterance of speaker s, $\mathbf{m}$ is the global mean of all i-vectors, $\mathbf{V}$ defines the speaker subspace, $\mathbf{z}_s$ is the speaker factor, and $\boldsymbol{\epsilon}_s$ is residual noise with covariance $\boldsymbol{\Sigma}$.

Probabilistic LDA for SV
• Similarly, the i-vector $\mathbf{x}_t$ from a test utterance is written as:
$$\mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\epsilon}_t$$
• Intuitively, you may think of $\mathbf{z}_s$ and $\mathbf{z}_t$ as the projections of the i-vectors onto the speaker subspace defined by the eigenvectors in V.
• But unlike PCA, given an i-vector $\mathbf{x}_t$, there are infinitely many possible values of $\mathbf{z}_t$. So we need to consider the joint density of $\mathbf{x}_t$ and $\mathbf{z}_t$ when computing the likelihood of $\mathbf{x}_t$.

PLDA Scoring
H0 (same speaker):
$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\epsilon}_s, \qquad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\epsilon}_t$$
against
H1 (different speakers):
$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\epsilon}_s, \qquad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\epsilon}_t$$
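
The same-speaker versus different-speaker test can be sketched as a log-likelihood ratio between two Gaussians. This sketch assumes standard normal priors on the speaker factors, so that sharing z under H0 adds a cross-covariance $\mathbf{V}\mathbf{V}^{\mathsf T}$ between the two i-vectors; the function names are hypothetical, and the direct evaluation below is chosen for clarity rather than speed.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian N(x | mean, cov)."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(x_s, x_t, m, V, Sigma):
    """log p(x_s, x_t | H0: same speaker) - log p(x_s, x_t | H1: different speakers)."""
    tot = V @ V.T + Sigma          # covariance of a single i-vector
    ac = V @ V.T                   # cross-covariance when z is shared (H0)
    joint_cov = np.block([[tot, ac], [ac, tot]])
    same = log_gauss(np.concatenate([x_s, x_t]), np.concatenate([m, m]), joint_cov)
    diff = log_gauss(x_s, m, tot) + log_gauss(x_t, m, tot)
    return same - diff
```

A positive score favours the same-speaker hypothesis; in practice the score would be compared with a decision threshold.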

Conventional Noise Robust PLDA
• In conventional multi-condition training, we pool i-vectors from various background noise levels to train m, V and Σ.
[Figure: i-vectors from 2 SNR ranges are pooled and passed to the EM algorithm, which outputs {m, V, Σ}.]

Conventional Noise Robust PLDA
• Conventional i-vector/PLDA systems use a channel space (with covariance Σ) to handle all SNR conditions.
[Figure: enrollment utterances are scored by i-vector/PLDA scoring with {m, V, Σ} to give PLDA scores.]

SNR Invariant PLDA
• We argue that the variation caused by SNR variability can be modeled by an SNR subspace, and that utterances falling within a narrow SNR range should share the same SNR factor (Li & Mak, Interspeech15; Li & Mak, T-ASLP 15).
[Figure: SNR subspace showing three groups of utterances (Group 1, Group 2, Group 3) and SNR Factors 1, 2 and 3.]

SNR Invariant PLDA
• Method of modeling SNR information
[Figure: i-vectors of clean, 15 dB and 6 dB utterances in the i-vector space, with the SNR factors $\mathbf{w}_{\mathrm{cln}}$, $\mathbf{w}_{15\mathrm{dB}}$ and $\mathbf{w}_{6\mathrm{dB}}$ in the SNR subspace.]

SNR-invariant PLDA
• PLDA:
$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}$$
• By adding an SNR factor to the conventional PLDA, we have SNR-invariant PLDA:
$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$
where $\mathbf{U}$ denotes the SNR subspace, $\mathbf{w}_k$ is an SNR factor, and $\mathbf{h}_i$ is the speaker (identity) factor for speaker i. Here i is the speaker index, j is the session index, and k is the SNR index.
• Note that it is not the same as PLDA with channel subspace R, in which the channel factor $\mathbf{r}_{ij}$ differs for every session, whereas $\mathbf{w}_k$ is shared by all utterances in SNR group k:
$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{R}\mathbf{r}_{ij} + \boldsymbol{\epsilon}_{ij}$$
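
To make the sharing structure of the SNR-invariant PLDA model concrete, the following sketch draws synthetic i-vectors from it. It assumes standard normal priors on the speaker factors $\mathbf{h}_i$ and the SNR factors $\mathbf{w}_k$, which the slides do not state explicitly; the function name and argument layout are hypothetical.

```python
import numpy as np

def sample_snr_invariant_plda(m, V, U, Sigma, n_speakers, n_groups, n_sessions, rng):
    """Draw x[i, k, j] = m + V h_i + U w_k + eps for every speaker i, SNR group k, session j."""
    D = m.shape[0]
    h = rng.standard_normal((n_speakers, V.shape[1]))   # one speaker factor per speaker
    w = rng.standard_normal((n_groups, U.shape[1]))     # one SNR factor per SNR group
    chol = np.linalg.cholesky(Sigma)                    # to draw residuals with covariance Sigma
    x = np.empty((n_speakers, n_groups, n_sessions, D))
    for i in range(n_speakers):
        for k in range(n_groups):
            eps = rng.standard_normal((n_sessions, D)) @ chol.T
            x[i, k] = m + V @ h[i] + U @ w[k] + eps
    return x, h, w
```

All sessions of speaker i reuse the same `h[i]`, and all utterances in SNR group k reuse the same `w[k]`; only the residual changes from session to session.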

SNR-invariant PLDA
• We separate i-vectors into different groups according to the SNR of their utterances.
$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$
[Figure: the SNR-grouped i-vectors are passed to the EM algorithm, which outputs {m, V, U, Σ}.]

Compared with Conventional PLDA
Conventional PLDA:
$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}$$
SNR-Invariant PLDA:
$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$

PLDA vs SNR-invariant PLDA

| | PLDA | SNR-invariant PLDA |
|---|---|---|
| Generative model | $\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}$ | $\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$ |
| Marginal density | $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf T} + \boldsymbol{\Sigma})$ | $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf T} + \mathbf{U}\mathbf{U}^{\mathsf T} + \boldsymbol{\Sigma})$ |
| Parameters | $\boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \boldsymbol{\Sigma}\}$ | $\boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \mathbf{U}, \boldsymbol{\Sigma}\}$ |

PLDA vs SNR-invariant PLDA: E-Step
PLDA:
$$\langle \mathbf{h}_i \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1}\mathbf{V}^{\mathsf T}\boldsymbol{\Sigma}^{-1}\sum_{j=1}^{H_i}(\mathbf{x}_{ij} - \mathbf{m})$$
$$\langle \mathbf{h}_i\mathbf{h}_i^{\mathsf T} \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1} + \langle \mathbf{h}_i \mid \mathcal{X} \rangle \langle \mathbf{h}_i \mid \mathcal{X} \rangle^{\mathsf T}$$
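
The PLDA E-step for one speaker can be sketched as below. The posterior precision $\mathbf{L}_i = \mathbf{I} + H_i\mathbf{V}^{\mathsf T}\boldsymbol{\Sigma}^{-1}\mathbf{V}$ is not legible in the extracted slide text; it is the usual Gaussian PLDA form and is used here as an assumption. The function name is hypothetical.

```python
import numpy as np

def plda_e_step(X_i, m, V, Sigma):
    """Posterior moments of the speaker factor h_i.

    X_i : (H_i, D) i-vectors of speaker i.
    Returns (<h_i | X>, <h_i h_i^T | X>).
    """
    H_i = X_i.shape[0]
    Vt_Sinv = np.linalg.solve(Sigma, V).T               # V^T Sigma^{-1} (Sigma is symmetric)
    L_i = np.eye(V.shape[1]) + H_i * (Vt_Sinv @ V)      # assumed posterior precision
    mean = np.linalg.solve(L_i, Vt_Sinv @ (X_i - m).sum(axis=0))
    second = np.linalg.inv(L_i) + np.outer(mean, mean)
    return mean, second
```

The second moment is the posterior covariance $\mathbf{L}_i^{-1}$ plus the outer product of the posterior mean, which is what the M-step consumes.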

PLDA versus SNR-invariant PLDA: M-Step
PLDA:
$$\mathbf{V} = \Big[\sum_{ij}(\mathbf{x}_{ij} - \mathbf{m})\langle \mathbf{h}_i \mid \mathcal{X} \rangle^{\mathsf T}\Big]\Big[\sum_{ij}\langle \mathbf{h}_i\mathbf{h}_i^{\mathsf T} \mid \mathcal{X} \rangle\Big]^{-1}$$
$$\boldsymbol{\Sigma} = \frac{\sum_{ij}\big[(\mathbf{x}_{ij} - \mathbf{m})(\mathbf{x}_{ij} - \mathbf{m})^{\mathsf T} - \mathbf{V}\langle \mathbf{h}_i \mid \mathcal{X} \rangle(\mathbf{x}_{ij} - \mathbf{m})^{\mathsf T}\big]}{\sum_i H_i}$$
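
A sketch of the PLDA M-step, written directly from the two update equations for V and Σ. The function name and the list-of-arrays data layout are hypothetical; `means` and `seconds` are assumed to hold the posterior moments $\langle \mathbf{h}_i \mid \mathcal{X} \rangle$ and $\langle \mathbf{h}_i\mathbf{h}_i^{\mathsf T} \mid \mathcal{X} \rangle$ produced by an E-step.

```python
import numpy as np

def plda_m_step(X_by_speaker, m, means, seconds):
    """Update V and Sigma from per-speaker data and posterior moments.

    X_by_speaker : list of (H_i, D) arrays
    means        : list of (Q,) arrays,   <h_i | X>
    seconds      : list of (Q, Q) arrays, <h_i h_i^T | X>
    """
    D, Q = m.shape[0], means[0].shape[0]
    num = np.zeros((D, Q))
    den = np.zeros((Q, Q))
    for X_i, mu_i, S_i in zip(X_by_speaker, means, seconds):
        num += np.outer((X_i - m).sum(axis=0), mu_i)    # sum_j (x_ij - m) <h_i>^T
        den += X_i.shape[0] * S_i                       # sum_j <h_i h_i^T>
    V = np.linalg.solve(den.T, num.T).T                 # num @ inv(den)
    total = sum(X_i.shape[0] for X_i in X_by_speaker)   # sum_i H_i
    Sigma = np.zeros((D, D))
    for X_i, mu_i in zip(X_by_speaker, means):
        c = X_i - m
        Sigma += c.T @ c - np.outer(V @ mu_i, c.sum(axis=0))
    return V, Sigma / total
```

Alternating the E-step and this update until the likelihood stops improving gives the usual EM training loop.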
SNR-‐invariant PLDA Score

Mixture of PLDA (mPLDA)
• Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
[Figure: enrollment i-vectors are scored by a single PLDA model {m, V, Σ} to give PLDA scores.]

Mixture of PLDA (mPLDA)
• We argue that a PLDA model should focus on a small range of SNR.
[Figure: PLDA Models 1, 2 and 3, each producing its own PLDA score.]

Mixture of PLDA (mPLDA)
• The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR (Mak, Interspeech14; Mak et al., T-ASLP 16).
[Figure: an SNR estimator and an SNR posterior estimator combined with PLDA Models 1, 2 and 3 to give one PLDA score.]
M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.

Motivation of mPLDA
• The idea of mPLDA is based on two hypotheses:
1. Different levels of background noise will cause the i-vectors to fall on different regions of the i-vector space.
2. SNR variability negatively affects PLDA speaker recognition accuracy, but its effect can be mitigated by explicitly modelling the SNR-dependent speaker subspaces through a mixture of PLDA.

Motivation of mPLDA
• To verify these two hypotheses, we corrupted 7,156 clean telephone utterances from 763 speakers with babble noise at 6 dB and 15 dB using the FaNT tool.
• This results in 3 sets of i-vectors: clean, 15 dB, and 6 dB.
• Then, a GMM is constructed as shown below.
[Figure: the clean speech and its FaNT-corrupted 6 dB and 15 dB versions each pass through i-vector extraction and mean and covariance computation; the three resulting mean and covariance pairs, with equal weights of 1/3, form a 3-component GMM.]

Motivation of mPLDA
• We used partition coefficients (PC) and partition entropy coefficients (PE) to quantify the cluster separability of the three groups of i-vectors.
• PC → 1 and PE → 0 mean that the clusters are well separated.
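
For illustration, the two separability measures can be computed from a matrix of cluster memberships. The sketch assumes the standard definitions, PC as the mean of the squared memberships summed over clusters and PE as the mean membership entropy, with the memberships taken to be GMM posteriors; the slides do not spell out the formulas, so this is an assumption.

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/N) * sum_i sum_k u_ik^2, where U is (N, K) with rows summing to 1."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def partition_entropy(U, eps=1e-12):
    """PE = -(1/N) * sum_i sum_k u_ik * log(u_ik); eps guards against log(0)."""
    return float(-np.mean(np.sum(U * np.log(U + eps), axis=1)))
```

With hard, unambiguous memberships PC equals 1 and PE equals 0; with completely uniform memberships over K clusters PC drops to 1/K and PE rises to log K.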

Motivation of mPLDA
• To verify the 2nd hypothesis, we perform speaker identification experiments under SNR-match and SNR-mismatch conditions.
• There are 9 combinations of PLDA models and SNR groups, of which three are matched in training and test conditions and six are mismatched.
• The SID accuracy gradually decreases when the SNR of the training data progressively deviates from that of the test data.

mPLDA: Model Parameters
• Model parameters: one group for modeling the SNR of utterances and one group for modeling SNR-dependent i-vectors.

Graphical Model of mPLDA
[Figure: graphical model with one part for modeling the SNR of utterances and one part for modeling SNR-dependent i-vectors.]
$\ell_{ij}$: SNR of the j-th utterance from the i-th speaker
$\mathbf{x}_{ij}$: i-vector of the j-th utterance from the i-th speaker
$\mathbf{V} = \{\mathbf{V}_k\}_{k=1}^{K}$
$\boldsymbol{\pi} = \{\pi_k\}_{k=1}^{K}$

Graphical Model: PLDA vs. mPLDA
[Figure: graphical models of PLDA and mPLDA.]
$\ell_{ij}$: SNR of the j-th utterance from the i-th speaker

Generative Model for mPLDA
The model uses the posterior probability of the SNR of each utterance.
[Figure: posterior of SNR plotted against the SNR in dB.]
PLDA vs. mPLDA
PLDA Mixture of PLDA
Generative Model
EM: PLDA vs. mPLDA Auxiliary Function
PLDA:
Mixture of PLDA:
Latent indicator variables:
SNR of training utterances:
Speaker indexes
Session indexes
No. of mixtures
Latent speaker factors:
EM: PLDA vs. mPLDA
PLDA Mixture of PLDA
E-Step

EM: PLDA vs. mPLDA
PLDA Mixture of PLDA
M-Step

Likelihood-Ratio Scores of mPLDA
• Same-speaker likelihood: computed from the i-vectors of the target and test speakers and the SNR of the target and test utterances.
• Different-speaker likelihood: computed from the same quantities under the different-speaker hypothesis.
• Verification score = Same-speaker likelihood / Different-speaker likelihood
For the full derivation, see http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

Complexity Analysis
Dimension of i-vectors

Types of mPLDA
• The mixture of PLDA models can be of two types:
1. SNR-independent mPLDA (SI-mPLDA)
2. SNR-dependent mPLDA (SD-mPLDA)
• SNR-independent mPLDA is the supervised version of Hinton's mixture of factor analyzers, where the supervision comes from the speaker labels.
• It is equivalent to clustering in the i-vector space, with the subspaces $\mathbf{V}_k$ of the clusters determined by PLDA.
• There is no guidance from SNR information.

SI-mPLDA vs. SD-mPLDA
• SNR-independent mPLDA: the mixture weights are independent of the SNR of utterances.
$$p(\mathbf{x}) = \sum_{k=1}^{K} \rho_k\, \mathcal{N}(\mathbf{x} \mid \mathbf{m}_k, \mathbf{V}_k\mathbf{V}_k^{\mathsf T} + \boldsymbol{\Sigma}_k)$$
• SNR-dependent mPLDA: the mixture weights are the posterior probabilities of the SNR, obtained from a 1-D GMM.
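
The posterior probabilities of the SNR from a 1-D GMM can be sketched as follows. The function name and parameter names (`weights`, `means`, `stds`) are hypothetical; in a real system the GMM would be fitted to the SNRs of the training utterances.

```python
import numpy as np

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior of each 1-D Gaussian component given utterance SNRs in dB.

    Returns an (N, K) array whose rows sum to 1.
    """
    snr = np.atleast_1d(np.asarray(snr_db, dtype=float))[:, None]
    weights, means, stds = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    log_p = (np.log(weights) - np.log(stds) - 0.5 * np.log(2.0 * np.pi)
             - 0.5 * ((snr - means) / stds) ** 2)
    log_p -= log_p.max(axis=1, keepdims=True)     # subtract the row maximum for stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

These posteriors play the role of SNR-dependent mixture weights, so an utterance whose SNR is close to a component mean is handled mostly by the corresponding PLDA model.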

Cluster Alignment in mPLDA
[Figure: cluster alignment for SNR-independent mPLDA and SNR-dependent mPLDA.]
In SD-mPLDA, i-vectors that are aligned to the same mixture component have similar SNR.

SNR-dependent vs. SNR-independent
[Figure: performance on CC4 of NIST12 (male) for PLDA, SNR-independent mPLDA and SNR-dependent mPLDA.]

Data and Features
• Evaluation dataset: Common evaluation conditions 1 and 4 of the NIST SRE 2012 core set.
• Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives → 60-dim
• UBM: gender-dependent, 1024 mixtures
• Total variability matrix: gender-dependent, 500 total factors
• I-vector preprocessing:
   – Whitening by WCCN, then length normalization
   – For SI-PLDA, followed by NFA (500-dim → 200-dim) + WCCN
   – For mPLDA, followed by LDA (500-dim → 200-dim) + WCCN
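
The first preprocessing step, WCCN whitening followed by length normalization, can be sketched as below. This is a generic illustration rather than the exact recipe used in the experiments: the function names are hypothetical, the within-class covariance is a plain average over speakers, and any global mean removal is left out.

```python
import numpy as np

def wccn_matrix(X, labels):
    """Return B with B B^T = W^{-1}, where W is the average within-speaker covariance.

    X : (N, D) i-vectors, labels : (N,) speaker labels.
    """
    classes = np.unique(labels)
    D = X.shape[1]
    W = np.zeros((D, D))
    for c in classes:
        d = X[labels == c] - X[labels == c].mean(axis=0)
        W += d.T @ d / d.shape[0]
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))

def preprocess(X, B):
    """Project each i-vector x to B^T x, then scale it to unit length."""
    Y = X @ B
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)
```

After the projection the average within-speaker covariance becomes the identity, and length normalization then places every i-vector on the unit sphere.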

Distribution of SNR in SRE12
[Figure: distribution of SNR in SRE12. Each SNR region is handled by a specific set of SNR factors.]

Finding SNR Groups
[Figure: SNR groups found from the training utterances.]

SNR Distributions
• SNR distribution of training and test utterances in CC4
[Figure: SNR distributions of the test utterances and the training utterances.]

Performance on SRE12 (CC1)

| Method | K | Q | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|---|---|
| PLDA | - | - | 5.42 | 0.371 | 7.53 | 0.531 |
| SD-mPLDA | - | - | 5.28 | 0.415 | 7.70 | 0.539 |
| SNR-Invariant PLDA | 3 | 40 | 5.42 | 0.382 | 6.93 | 0.528 |
| SNR-Invariant PLDA | 5 | 40 | 5.28 | 0.381 | 6.89 | 0.522 |
| SNR-Invariant PLDA | 6 | 40 | 5.29 | 0.388 | 6.90 | 0.536 |
| SNR-Invariant PLDA | 8 | 30 | 5.56 | 0.384 | 7.05 | 0.545 |

K: no. of SNR groups. Q: no. of SNR factors (dim of $\mathbf{w}_k$).

Performance on SRE12 (CC2)

| Method | K | Q | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|---|---|
| PLDA | - | - | 2.40 | 0.332 | 2.19 | 0.335 |
| SNR-dependent mPLDA | - | - | 2.47 | 0.283 | 2.07 | 0.328 |
| SNR-Invariant PLDA | 3 | 40 | 1.96 | 0.277 | 1.74 | 0.290 |
| SNR-Invariant PLDA | 6 | 40 | 1.99 | 0.278 | 1.72 | 0.290 |

K: no. of SNR groups. Q: no. of SNR factors (dim of $\mathbf{w}_k$).

Performance on SRE12 (CC4)

| Method | K | Q | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|---|---|
| PLDA | - | - | 3.13 | 0.312 | 2.82 | 0.341 |
| SD-mPLDA | - | - | 2.88 | 0.329 | 2.71 | 0.332 |
| SNR-Invariant PLDA | 3 | 40 | 2.72 | 0.289 | 2.36 | 0.314 |
| SNR-Invariant PLDA | 5 | 40 | 2.67 | 0.291 | 2.38 | 0.322 |
| SNR-Invariant PLDA | 6 | 40 | 2.63 | 0.287 | 2.43 | 0.319 |
| SNR-Invariant PLDA | 8 | 30 | 2.70 | 0.292 | 2.29 | 0.313 |

K: no. of SNR groups. Q: no. of SNR factors (dim of $\mathbf{w}_k$).

Performance on SRE12 (CC5)

| Method | K | Q | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|---|---|
| PLDA | - | - | 2.86 | 0.286 | 2.47 | 0.343 |
| SNR-dependent mPLDA | - | - | 2.86 | 0.295 | 2.59 | 0.332 |
| SNR-Invariant PLDA | 3 | 40 | 2.47 | 0.273 | 2.07 | 0.294 |
| SNR-Invariant PLDA | 6 | 40 | 2.48 | 0.275 | 2.04 | 0.294 |

K: no. of SNR groups. Q: no. of SNR factors (dim of $\mathbf{w}_k$).

Performance on SRE12
[Figure: performance on CC4 (female) for conventional PLDA and SNR-invariant PLDA.]

Conclusions
• We show that while i-vectors of different SNR fall on different regions of the i-vector space, they vary within a single cluster in an SNR subspace.
• Therefore, it is possible to model the SNR variability by adding an SNR loading matrix and SNR factors to the conventional PLDA model.
• We also show that i-vectors derived from utterances of different SNR live in different speaker subspaces.
• Therefore, it is possible to model SNR variability by a mixture of SNR-dependent PLDA.

Bibliography
1. M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.
2. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification", IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 23, no. 10, pp. 1648-1659, Oct. 2015.
3. W. Rao and M.W. Mak, "Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning", IEEE Trans. on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 1012-1022, May 2013.
4. N. Li and M.W. Mak, "SNR-Invariant PLDA with Multiple Speaker Subspaces", ICASSP'16, March 2016.
5. X.M. Pang and M.W. Mak, "Noise Robust Speaker Verification via the Fusion of SNR-Independent and SNR-Dependent PLDA", International Journal of Speech Technology, Oct. 2015.
6. M.W. Mak, "Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification", Proc. APSIPA'15, pp. 587-593, Dec. 2015, Hong Kong.
7. M.W. Mak and H.B. Yu, "A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations", Computer Speech & Language, vol. 28, no. 1, pp. 295-313, Jan. 2014.
8. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling for Robust Speaker Verification", Interspeech'15, Sept. 2015, Dresden, Germany, pp. 2317-2321.
9. P. Kenny, "Bayesian speaker verification with heavy-tailed priors", in Proc. of Odyssey: Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
10. N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification", IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.

Acknowledgment
Xiaomin Pang, Zhili Tan, Shibiao Wan, Wei RAO, Na LI