iVector approach to Phonotactic LRE


Transcript of iVector approach to Phonotactic LRE

iVector approach to Phonotactic LRE
Mehdi Soufifar, 2nd May 2011

Phonotactic LRE

Train: Utterance -> Recognizer (HVite, BUTPR, ...) with acoustic model (AM) -> Phoneme sequence -> Extract n-gram statistics -> N-gram counts -> Train classifier (LR, SVM, LM, GLC, ...); one language-dependent classifier per target language (L in total).

Test: Test utterance -> Recognizer (HVite, BUTPR, ...) with AM -> Phoneme sequence -> Extract n-gram statistics -> N-gram counts -> Classifier -> Language-dependent score.
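The "extract n-gram statistics" step above turns each decoded phoneme sequence into a fixed-length vector of n-gram counts. A minimal sketch, assuming the recognizer output is already available as a plain list of phoneme labels (the phone set and function names here are illustrative, not from the slides):

```python
from collections import Counter
from itertools import product

def trigram_counts(phonemes, phone_set):
    """Return a fixed-length vector of 3-gram counts, indexed over all
    N^3 possible 3-grams of the phone set, for one utterance."""
    observed = Counter(zip(phonemes, phonemes[1:], phonemes[2:]))
    all_trigrams = product(phone_set, repeat=3)   # fixed ordering -> comparable vectors
    return [observed.get(tg, 0) for tg in all_trigrams]

# Toy example: 3 phones -> 3^3 = 27-dimensional count vector
phones = ["a", "b", "c"]
decoded = ["a", "b", "c", "a", "b", "c", "a"]
vec = trigram_counts(decoded, phones)
print(len(vec), sum(vec))   # 27 dimensions, 5 observed 3-grams
```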

N-gram Counts

N^3 = 61^3 = 226,981 possible 3-grams for the RU phoneme set (61 phonemes)

• Problem: huge vector of n-gram counts
• Solutions:
▫ Choose the most frequent n-grams
▫ Choose the top N n-grams discriminatively (LL)
▫ Compress the n-gram counts: Singular Value Decomposition (SVD) decomposes the document matrix D = USV^T; the transformation matrix U is then used to reduce the n-gram vector dimensionality (PCA-based dimensionality reduction; a small sketch follows below)
▫ iVector feature selection
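A rough sketch of the SVD/PCA compression mentioned above, assuming the document matrix D holds one utterance's n-gram count vector per row (with that orientation, the right singular vectors play the role of the slide's transformation matrix U). Matrix sizes and the retained dimensionality are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.poisson(0.2, size=(100, 500)).astype(float)   # 100 utterances x 500 n-grams (toy)

# Center and decompose: D_c = U S V^T
mean = D.mean(axis=0)
U, S, Vt = np.linalg.svd(D - mean, full_matrices=False)

k = 50                     # retained dimensionality (illustrative)
basis = Vt[:k]             # k x 500 projection onto the leading directions

# Reduce a new utterance's n-gram count vector from 500 to k dimensions
x = rng.poisson(0.2, size=500).astype(float)
x_red = basis @ (x - mean)
print(x_red.shape)         # (50,)
```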

Sub-space multinomial modeling

• Every vector of n-gram counts consists of E events (#n-grams).
• The log probability of the nth utterance under the multinomial (MN) distribution, and the event probabilities φ it uses, are given below.
• The model parameters to be estimated by ML are t and w.
• There is no analytical solution! We use Newton-Raphson updates as a numerical solution.
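For reference, a reconstruction of the subspace multinomial model these bullets refer to, assumed here to be consistent with the Newton-Raphson update rules shown later in this deck:

\log P(\mathbf{l}_n) = \mathrm{const} + \sum_{e=1}^{E} l_{ne} \log \phi_{ne},
\qquad
\phi_{ne} = \frac{\exp(m_e + t_e w_n)}{\sum_{i=1}^{E} \exp(m_i + t_i w_n)}

where l_{ne} is the count of event e in utterance n, m_e is the origin (background) log-weight of event e, t_e is the e-th row of the subspace matrix T, and w_n is the low-dimensional iVector of utterance n.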

N^3 = 226,981 3-grams for the RU phoneme set

Sub-space multinomial modeling
• 1st solution:
▫ Consider all 3-grams to be components of a Bernoulli trial.
▫ Model the entire vector of 3-gram counts with one multinomial distribution.
▫ However, n-gram events are not independent (not consistent with the Bernoulli-trial assumption!).
• 2nd solution:
▫ Cluster 3-grams based on their histories (a small sketch follows below).
▫ Model each history with a separate MN distribution; this leads to a data sparsity problem!
▫ Cluster 3-grams based on a binary decision tree.
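As an illustration of the 2nd solution, a small sketch (hypothetical helper names, not from the slides) that groups 3-gram counts by their two-phoneme history, so each history can be modelled by its own MN distribution:

```python
from collections import defaultdict

def group_by_history(trigram_counts):
    """Map each history (p_i, p_j) to the counts of the third phoneme p_k.

    trigram_counts: {(p_i, p_j, p_k): count}  ->  {(p_i, p_j): {p_k: count}}
    """
    histories = defaultdict(dict)
    for (pi, pj, pk), c in trigram_counts.items():
        histories[(pi, pj)][pk] = c
    return histories

counts = {("a", "b", "c"): 4, ("a", "b", "a"): 1, ("b", "c", "a"): 2}
for hist, dist in group_by_history(counts).items():
    total = sum(dist.values())
    # One multinomial per history (here just the ML relative frequencies)
    print(hist, {pk: c / total for pk, c in dist.items()})
```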

Training of iVector extractor

• Number of iterations: 5-7 (depends on the sub-space dimension)
• Sub-space dimension: 600

[Plot: results for the 3-second, 10-second, and 30-second conditions]

Classifiers

• Configuration: L one-to-all linear classifiers (a minimal sketch follows below)
▫ L: number of targeted languages
• Classifiers:
▫ SVM
▫ LR
▫ Linear Generative Classifier (GLC)
▫ MLR (to be done!)
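A minimal sketch of the "L one-to-all linear classifiers" configuration, using scikit-learn's one-vs-rest wrapper around logistic regression as a stand-in (the slides do not say which implementation was used; dimensions and data are toy values):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
L, dim = 5, 600                          # 5 target languages, 600-dim iVectors (toy)
X = rng.normal(size=(200, dim))          # training iVectors
y = rng.integers(0, L, size=200)         # language labels

# One binary linear classifier per target language (one-to-all)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)

scores = clf.decision_function(rng.normal(size=(3, dim)))
print(scores.shape)                      # (3, 5): one language-dependent score per class
```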

Results on different classifiers

• Task: NIST LRE 2009

Classifier   Dev-3s   Dev-10s   Dev-30s   Eval-3s   Eval-10s   Eval-30s
PCA-SVM        2.83      7.05     17.77      3.62       8.82      21.00
PCA-LR         2.22      6.22     17.26      2.93       8.29      22.60
PCA-GLC        2.81      8.25     19.83      3.50       9.88      22.88
iVec-SVM       6.54     14.07     26.79      8.54      17.50      18.06
iVec-LR        2.44      6.88     18.01      3.05       8.10      21.39
iVec-GLC       2.58      7.13     18.18      2.92       8.03      21.13

Results of different systems LRE09

System            Dev-3s   Dev-10s   Dev-30s   Eval-3s   Eval-10s   Eval-30s
BASE-HU-SVM         2.83      7.05     17.77      3.62       8.82      21.00
PCA-HU-LR           2.22      6.22     17.26      2.93       8.29      22.60
iVect-HU-LR         2.81      8.25     19.83      3.05       8.10      21.05
iVec+PCA-HU-LR      2.05      5.74     16.71      2.79       7.63      21.05
iVec-RU-LR          2.66      6.46     17.50      2.59       7.42      19.83
iVec-LR HU+RU       1.54      4.44     13.30      2.09       5.34      16.53
iVec-LR HURU        1.90      5.10     14.69      2.06       5.80      17.79

N-gram clustering

• Remove all 3-grams with a repetition count < 10 over all training utterances
• Model each history with a separate MN distribution
• 1084 histories, up to 33 3-grams each

System        Dev-3s   Dev-10s   Dev-30s   Eval-3s   Eval-10s   Eval-30s
>10 3-gram      8.84     16.04     27.94     10.34      19.92      32.35

Merging histories using BDT

• In the case of a 3-gram P_i P_j P_k, merge histories which do not increase the entropy by more than a certain value:

  P_i P_22 P_k  +  P_i P_33 P_k  ->  P_i P_(33+22) P_k

  E1 = Entropy(Model 1),  E2 = Entropy(Model 2),  D = E1 - E2
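A small sketch of the entropy criterion behind the merging, assuming each history is represented by its table of third-phoneme counts and using count-weighted entropies so that the values of separate models are additive (helper names are illustrative; the sign convention here, merged minus separate, is a choice, while the slide writes D = E1 - E2):

```python
import math

def weighted_entropy(counts):
    """Count-weighted entropy (in nats) of the ML multinomial fitted to the counts."""
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values() if c > 0)

def entropy_increase(hist_a, hist_b):
    """Entropy change caused by merging two history-conditioned count tables."""
    merged = dict(hist_a)
    for pk, c in hist_b.items():
        merged[pk] = merged.get(pk, 0) + c
    e_separate = weighted_entropy(hist_a) + weighted_entropy(hist_b)  # two multinomials
    e_merged = weighted_entropy(merged)                               # one shared multinomial
    return e_merged - e_separate   # always >= 0; merge when below a chosen threshold

a = {"a": 8, "b": 2}
b = {"a": 7, "b": 3}
print(entropy_increase(a, b))      # small value: similar histories, safe to merge
```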

Results on DT history merging

• 1089-60
• More iterations of training T => the T matrix moves toward the zero matrix!

System        Dev-3s   Dev-10s   Dev-30s   Eval-3s   Eval-10s   Eval-30s
DT              4.36     10.41     22.20      5.46      12.80      27.09
>10 3-gram      8.84     16.04     27.94     10.34      19.92      32.35

Deeper insight into the iVector extractor

Per-utterance iVector update (for utterance n):

w_n^{new} = w_n^{old} + H_n^{-1} g_n

g_n = \sum_{i=1}^{E} t_i^{T} \left( l_{ni} - \phi_{ni}^{old} \sum_{j=1}^{E} l_{nj} \right)

H_n = \sum_{i=1}^{E} t_i^{T} t_i \, \max\left( l_{ni},\ \phi_{ni}^{old} \sum_{j=1}^{E} l_{nj} \right)

Subspace update (for each row t_e of T):

t_e^{new} = t_e^{old} + H_e^{-1} g_e

g_e = \sum_{n=1}^{N} \left( l_{ne} - \phi_{ne}^{old} \sum_{i=1}^{E} l_{ni} \right) w_n^{T}

H_e = \sum_{n=1}^{N} \max\left( l_{ne},\ \phi_{ne}^{old} \sum_{i=1}^{E} l_{ni} \right) w_n w_n^{T}
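A compact numerical sketch of the per-utterance Newton-Raphson step above, written for a single utterance; the matrix T (rows t_e), the origin m, and the count vector l_n are assumed given, and all sizes here are toy values:

```python
import numpy as np

def update_w(w, l_n, T, m):
    """One Newton-Raphson step for the iVector w of a single utterance.

    l_n : (E,)   n-gram counts of the utterance
    T   : (E, D) subspace matrix, rows t_e
    m   : (E,)   origin of the subspace
    """
    logits = m + T @ w
    phi = np.exp(logits - logits.max())
    phi /= phi.sum()                         # phi_n under the current (old) w
    total = l_n.sum()                        # sum_j l_nj

    g = T.T @ (l_n - phi * total)            # g_n = sum_i t_i^T (l_ni - phi_ni * total)
    k = np.maximum(l_n, phi * total)         # per-event weights of the Hessian bound
    H = (T * k[:, None]).T @ T               # H_n = sum_i k_i t_i^T t_i
    return w + np.linalg.solve(H, g)         # w_n^new = w_n^old + H_n^{-1} g_n

rng = np.random.default_rng(0)
E, D = 200, 10                               # toy: 200 events, 10-dim subspace
T = 0.01 * rng.normal(size=(E, D))
m = np.log(np.full(E, 1.0 / E))
l_n = rng.poisson(0.5, size=E).astype(float)

w = np.zeros(D)
for _ in range(5):                           # a few iterations are typically enough
    w = update_w(w, l_n, T, m)
print(w)
```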

Strange results

• 3-grams with no repetition throughout the whole training set should not affect system performance!
• Remove all the 3-grams with no repetition throughout the whole training set.
• 35973 -> 35406 (a reduction of 567)
• Even worse results if we prune more!

# 3-grams   Dev-30s   Dev-10s   Dev-3s   Eval-30s   Eval-10s   Eval-3s
35973          2.44      6.88    18.01       3.05       8.10     21.39
35406          3.35      8.05    19.73       3.63       9.18     22.60

DT clustering of n-gram histories

• The overall likelihood is an order of magnitude higher than with the 1st solution.
• The change in model likelihood is quite notable in each iteration!
• The T matrix is mainly zero after some iterations!

[Plots: the T matrix after the 1st through 6th training iterations]

Closer look at TRAIN set

Language   TRAIN-voa   TRAIN-cts   DEV-voa   DEV-cts   EVAL-voa   EVAL-cts
amha          ✔           ✗           ✔         ✗          ✔          ✗
bosn          ✔           ✗           ✔         ✗          ✔          ✗
cant          ✔           ✔           ✗         ✔          ✔          ✔
creo          ✔           ✗           ✔         ✗          ✔          ✗
croa          ✔           ✗           ✗         ✔          ✔          ✗
dari          ✔           ✗           ✔         ✗          ✔          ✗
engi          ✗           ✔           ✗         ✔          ✗          ✔
engl          ✔           ✔           ✗         ✔          ✔          ✔
fars          ✗           ✔           ✗         ✔          ✔          ✔
fren          ✔           ✔           ✔         ✔          ✔          ✗
geor          ✔           ✗           ✔         ✗          ✔          ✗
haus          ✔           ✗           ✔         ✗          ✔          ✗
hind          ✔           ✔           ✗         ✔          ✔          ✔
kore          ✔           ✔           ✗         ✔          ✔          ✔
mand          ✔           ✔           ✗         ✔          ✔          ✔
pash          ✔           ✗           ✔         ✗          ✔          ✗
port          ✔           ✔           ✔         ✗          ✔          ✗
russ          ✔           ✔           ✗         ✔          ✔          ✔
span          ✔           ✔           ✗         ✔          ✔          ✗
turk          ✔           ✗           ✔         ✗          ✔          ✗
ukra          ✔           ✗           ✔         ✗          ✔          ✗
urdu          ✔           ✔           ✗         ✔          ✔          ✔
viet          ✔           ✔           ✗         ✔          ✔          ✔

iVector inspection

[Plots: iVectors for Cantonese (cant) and English (engl)]

iVector inspection (cont.)

• Multiple data sources cause bimodality.
• We also see this effect in some single-source languages.

[Plot: iVectors for Amharic (amha)]