By the Novel Approaches team, With site leaders: Nelson Morgan, ICSI Hynek Hermansky, OGI Dan Ellis, Columbia Kemal Sönmez, SRI Mari Ostendorf, UW Hervé Bourlard, IDIAP/EPFL George Doddington, NA-sayer “Pushing the Envelope” A six month report
  • date post

    22-Dec-2015

Page 1:

By the Novel Approaches team, with site leaders:

Nelson Morgan, ICSI
Hynek Hermansky, OGI
Dan Ellis, Columbia
Kemal Sönmez, SRI
Mari Ostendorf, UW
Hervé Bourlard, IDIAP/EPFL
George Doddington, NA-sayer

“Pushing the Envelope”

A six month report

Page 2:

Overview

Nelson Morgan, ICSI

Page 3:

The Current Cast of Characters

• ICSI: N. Morgan, Q. Zhu, B. Chen, G. Doddington

• UW: M. Ostendorf, Ö. Çetin

• OGI: H. Hermansky, S. Sivadas, P. Jain

• Columbia: D. Ellis, M. Athineos

• SRI: K. Sönmez

• IDIAP: H. Bourlard, J. Ajmera, V. Tyagi

Page 4:

Rethinking Acoustic Processing for ASR

• Escape dependence on spectral envelope

• Use multiple front-ends across time/freq

• Modify statistical models to accommodate new front-ends

• Design optimal combination schemes for multiple models

Page 5:

Task 1: Pushing the Envelope (aside)

• Problem: Spectral envelope is a fragile information carrier

[Diagram, OLD: 10 ms of signal along the time axis → estimate of sound identity]

• Solution: Probabilities from multiple time-frequency patches

[Diagram, PROPOSED: time-frequency patches spanning up to 1 s → ith estimate, kth estimate, nth estimate → information fusion → estimate of sound identity]

Page 6:

Task 2: Beyond Frames…

• Problem: Features & models interact; new features may require different models

• Solution: Advanced features require advanced models, free of fixed-frame-rate paradigm

[Diagram, OLD: short-term features → conventional HMM]

[Diagram, PROPOSED: advanced features → multi-rate, dynamic-scale classifier]

Page 7:

Today’s presentation

• Infrastructure: training, testing, software

• Initial Experiments: pilot studies

• Directions: where we’re headed

Page 8:

Infrastructure

Kemal Sönmez, SRI
(SRI/UW/ICSI effort)

Page 9:

Initial Experimental Paradigm

• Focus on a small task to facilitate exploratory work (later move to CTS)

• Choose a task where LM is fixed & plays a minor role (to focus on acoustics)

• Use mismatched train/test data:
  To avoid tuning to the task
  To facilitate later move to CTS

• Task: OGI numbers / Train: swbd+macrophone

Page 10:

Hub5 “Short” Training Set

• Composition (total ~ 60 hours), in hours:

Corpus          Male    Female
callhome         2.8     13.8
switchboard*     5.9      4.3
credit-card      6.7      7.1
macrophone      12.4      5.8

* subset of SWB-1 hand-checked at SRI for accuracy of transcriptions and segmentations

• WER 2-4% higher vs. full 250+ hour training

Page 11:

Reduced UW Training Set

• A reduced training set to shorten expt. turn-around time

• Choose training utterances with per-frame likelihood scores close to the training set average

• 1/4th of the original training set

• Statistics (gender, data set constituencies) are similar to those of the full training set.

• For OGI Numbers, no significant WER sacrifice in the baseline HMM system (worse for Hub 5).

Data set constituencies and male/female split:

Training set    macrophone   callhome   credit-card   other switchboard   male/female
“short”         32%          32%        12%           24%                 45/55%
Reduced (UW)    38%          28%        12%           22%                 48/52%
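
The selection rule on this slide (keep utterances whose per-frame likelihood is close to the training set average, down to about a quarter of the data) could be sketched as follows. This is a minimal illustration and not the UW procedure itself: the function name `select_reduced_set`, the input layout, and the use of a frame-weighted average are all assumptions.

```python
def select_reduced_set(utt_scores, fraction=0.25):
    """Pick the utterances whose per-frame log-likelihood is closest to the set average.

    utt_scores: dict mapping utterance id -> (total_log_likelihood, num_frames).
    This layout is hypothetical; the slides do not describe the actual data format.
    """
    # Per-frame score for every utterance that has at least one frame.
    per_frame = {u: ll / n for u, (ll, n) in utt_scores.items() if n > 0}
    # Assumption: the "training set average" is weighted by frame count.
    total_ll = sum(ll for ll, _ in utt_scores.values())
    total_frames = sum(n for _, n in utt_scores.values())
    average = total_ll / total_frames
    # Rank by distance from the average; ties are broken by utterance id.
    ranked = sorted(per_frame, key=lambda u: (abs(per_frame[u] - average), u))
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]
```

Calling it with `fraction=0.25` mirrors the 1/4th reduction described above; matching the gender and corpus proportions of the full set would need a separate check.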

Page 12:

Development Test Sets

• A “Core-Subset” of OGI’s Numbers 95 corpus – telephone speech of people reciting addresses, telephone numbers, zip codes, or other miscellaneous items

• “Core-Subset” or “CS” consists of utterances that were phonetically hand-transcribed, intelligible, and contained only numbers

• Vocabulary Size: 32 words (digits + eleven, twelve… twenty… hundred…thousand, etc.)

Data Set Name                    Total Utterances   Total Words   Duration (hours)
Numbers95-CS Cross Validation    357                1353          ~0.2
Numbers95-CS Development         1206               4673          ~0.6
Numbers95-CS Test                1227               4757          ~0.6

Page 13:

Statistical Modeling Tools

• HTK (Hidden Markov Toolkit) for establishing an HMM baseline, debugging

• GMTK (Graphical Models Toolkit) for implementing advanced models with multiple feature/state streams
  Allows direct dependencies across streams
  Not limited by single-rate, single-stream paradigm
  Rapid model specification/training/testing

• SRI Decipher system for providing lattices to rescore (later in CTS expts)

• Neural network tools from ICSI for posterior probability estimation, other statistical software from IDIAP

Page 14:

Baseline SRI Recognizer for the numbers task

• Bottom-up state-clustered Gaussian mixture HMMs for acoustic modeling

• Acoustic adaptation to speakers using affine mean and variance transforms [Not used for numbers]

• Vocal-tract length normalization using maximum likelihood estimation [Not helpful for numbers]

• Progressive search with lattice recognition and N-best rescoring [To be used in later work]

• Bigram LM

Page 15:

Initial Experiments

Barry Chen, ICSI
Hynek Hermansky, OHSU (OGI)
Özgür Çetin, UW

Page 16:

Goals of Initial Experiments

• Establish performance baselines
  HMM + standard features (MFCC, PLP)
  HMM + current best from ICSI/OGI

• Develop infrastructure for new models
  GMTK for multi-stream & multi-rate features
  Novel features based on large timespans
  Novel features based on temporal fine structure

• Provide fodder for future error analysis

Page 17:

ICSI Baseline experiments

• PLP based - SRI system

• “Tandem” PLP-based ANN + SRI system

• Initial combination approach

Page 18:

Development Baseline: Gender Independent PLP System

Word and sentence error rates on the Numbers95-CS test set:

Training Set                                     Word Error Rate   Sentence Error Rate
Full “Short” Hub5 (85k utterances, ~64.9 hrs)    3.4%              10.2%
UW Reduced Hub5 (20k utterances, ~18.8 hrs)      3.8%              11.4%

Page 19:

Phonetically Trained Neural Net

• Multi-Layer Perceptron (input, hidden, and output layer)

• Trained Using Error-Backpropagation Technique – outputs interpreted as posterior probabilities of target classes

• Training Targets: 47 mono-phone targets from forced alignment using SRI Eval 2002 system

• Training Utterances: UW Reduced Hub5 Set

• Training Features: PLP12+e+d+dd, mean & variance normalized on per-conversation side basis

• MLP Topology:
  9 Frame Context Window (4 frames in past + current frame + 4 frames in future)
  351 Input Units, 1500 Hidden Units, and 47 Output Units
  Total Number of Parameters: ~600k
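
The topology figures on this slide can be checked with a little arithmetic. The sketch below assumes that PLP12+e means 12 cepstra plus energy (13 values), that d and dd each add another 13, and that the parameter count includes one bias per hidden and output unit; the function name `mlp_sizes` is made up for illustration.

```python
def mlp_sizes(n_base=13, n_context=9, n_hidden=1500, n_out=47):
    """Return (input units, weight count, weights plus biases) for a 3-layer MLP."""
    # Assumption: static + delta + delta-delta features, 13 values each.
    per_frame = 3 * n_base
    # 9-frame context window: 4 past frames + current frame + 4 future frames.
    n_in = per_frame * n_context
    weights = n_in * n_hidden + n_hidden * n_out
    # Assumption: one bias per hidden unit and per output unit.
    biases = n_hidden + n_out
    return n_in, weights, weights + biases
```

Under these assumptions the call `mlp_sizes()` gives 351 input units and just under 600,000 parameters, in line with the "~600k" quoted on the slide.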

Page 20:

Baseline ICSI Tandem

• Outputs of Neural Net before final softmax non-linearity used as inputs to PCA

• PCA without dimensionality reduction

• 4.1% Word and 11.7% Sentence Error Rate on Numbers95-CS test set

Page 21:

Baseline ICSI Tandem+PLP

• PLP Stream concatenated with neural net posteriors stream

• PCA reduces dimensionality of posteriors stream to 16 (keeping 95% of overall variance)

• 3.3% Word and 9.5% Sentence Error Rate on Numbers95-CS test set
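
As a rough illustration of the two steps on this slide, the sketch below picks the number of principal components needed to retain a given share of the variance and then concatenates two per-frame feature streams. It assumes the PCA eigenvalues are already available; the helper names `n_components_for_variance` and `concat_streams` are hypothetical and not part of any ICSI tool.

```python
def n_components_for_variance(eigenvalues, keep=0.95):
    """Smallest number of leading components whose variance share reaches `keep`."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for count, value in enumerate(vals, start=1):
        running += value
        if running >= keep * total:
            return count
    return len(vals)


def concat_streams(plp_frames, projected_frames):
    """Concatenate two feature streams frame by frame (assumes equal frame counts)."""
    return [list(p) + list(q) for p, q in zip(plp_frames, projected_frames)]
```

On the slide, this kind of variance criterion is what leads to 16 retained dimensions for the posteriors stream before it is appended to the PLP stream.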

Page 22:

Word and String Error Rates on Numbers95-CS Test Set

Page 23:

OGI Experiments: New Features in EARS

• Develop on home-grown ASR system (phoneme-based HTK)

• Pass the most promising to ICSI for running in SRI LVCSR system

• So far new features match the performance of the baseline PLP features but do not exceed it
  advantage seen in combination with the baseline

Page 24:

Looking to the human auditory system for design inspiration

• Psychophysics
  Components within certain frequency range (several critical bands) interact [e.g. frequency masking]
  Components within certain time span (a few hundreds of ms) interact [e.g. temporal masking]

• Physiology
  2-D (time-frequency) matched filters for activity in auditory cortex [cortical receptive fields]

Page 25:

TRAP-based HMM-NN hybrid ASR

[Diagram: mean & variance normalized, Hamming windowed critical band trajectory (101 point input) → Multilayer Perceptrons (MLP) → posterior probabilities of phonemes → search for the best match]
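
The input preparation named in the diagram (a 101-point critical band trajectory, mean and variance normalized, then Hamming windowed) might look like the sketch below. It is an illustration only: normalizing over the 101-point segment itself, and the function name `trap_input`, are assumptions rather than details taken from the OGI system.

```python
import math


def trap_input(band_energies, center, length=101):
    """Build one TRAP-style input vector for a single critical band.

    band_energies: per-frame energies of one critical band (a list of floats).
    center: index of the frame being classified.
    """
    half = length // 2
    if center - half < 0 or center + half >= len(band_energies):
        raise ValueError("not enough context around the center frame")
    segment = band_energies[center - half:center + half + 1]
    # Assumption: mean and variance are computed over this segment only.
    mean = sum(segment) / length
    variance = sum((x - mean) ** 2 for x in segment) / length
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero on a flat segment
    # Standard Hamming window of the same length as the trajectory.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
              for n in range(length)]
    return [w * (x - mean) / std for w, x in zip(window, segment)]
```

One such vector per critical band would then be fed to the band's MLP.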

Page 26:

Feature estimation from linearly transformed temporal patterns

[Diagram: transform → MLP (two parallel paths shown) → TANDEM HMM ASR, annotated "? ? ?"]

Page 27:

Preliminary TANDEM/TRAP results (OGI-HTK)

WER% on OGI numbers, training on UW reduced training set, monophone models

BASELINE            4.5
TANDEM              4.1
TANDEM with TRAP    3.9

Page 28:

Features from more than one critical-band temporal trajectory

Studying KLT-derived basis functions, we observe:

[Diagram: cosine transform + average frequency derivative]

Page 29:

UW Baseline Experiments

• Constructed an HTK-based HMM system that is competitive with the SRI system

• Replicated the HMM system in GMTK

• Move on to models which integrate information from multiple sources in a principled manner:
  Multiple feature streams (multi-stream models)
  Different time scales (multi-rate models)

• Focus on statistical models not on feature extraction

Page 30:

HTK HMM Baseline

• An HTK-based standard HMM system:
  • 3 state triphones with decision-tree clustering,
  • Mixture of diagonal Gaussians as state output dists.,
  • No adaptation, fixed LM.

• Dimensions explored:
  • Front-end: PLP vs. MFCC, VTLN
  • Gender dependent vs. independent modeling

• Conclusions:
  • No significant performance differences
  • Decided on PLPs, no VTLN, gender-independent models for simplicity

Page 31:

HMM Baselines (cont.)

• Replicated HTK baseline with equivalent results in GMTK

WER %
tool    dev    test
HTK     3.7    3.2
GMTK    3.7    3.0

• To reduce experiment turn-around time, wanted to reduce the training set

• For HMMs and Numbers95, 3/4 of the training data can be safely ignored:

WER %
Training set         dev    test
Full “short”         3.7    3.2
1/4th (“reduced”)    3.4    3.4

Page 32:

Multi-stream Models

• Information fusion from multiple streams of features

• Partially asynchronous state sequences

[Diagram, STATE TOPOLOGY: states of stream X against states of stream Y]

[Diagram, GRAPHICAL MODEL: state seq. of stream X and state seq. of stream Y, with feature stream X and feature stream Y]

WER %
model                      dev    test
HMM (PLP)                  3.9    4.2
multi-stream (PLP+MFCC)

Page 33:

Temporal envelope features (Columbia)

• Temporal fine structure is lost (deliberately) in STFT features:

[Figure: spectrogram (0 to 8000 Hz, dB scale) and waveform of mpgr1-sx419 over 0.65 to 0.9 s (time / sec), with 10 ms windows marked]

• Need a compact, parametric description...

Page 34:

Frequency-Domain Linear Prediction (FDLP)

• Extend LPC with LP model of spectrum

TD-LP: $y[n] = \sum_i a_i\, y[n-i]$

(DFT)

FD-LP: $Y[k] = \sum_i b_i\, Y[k-i]$

• ‘Poles’ represent temporal peaks:

[Figure: mpgr1-sx419: TDLPC env (60 poles / 300 ms), over 0.65 to 0.9 s]

• Features ~ pole bandwidth, ‘frequency’
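
To make the idea of linear prediction in the frequency domain concrete, here is a small pure-Python sketch. It is not the Columbia implementation: it uses a DCT in place of the DFT named on the slide, the autocorrelation method with a Levinson-Durbin recursion, and hypothetical function names (`dct2`, `levinson`, `fdlp_envelope`). The all-pole model fitted to the transform coefficients is then read out as a smooth temporal envelope.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence (O(N^2), fine for short segments)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
            for k in range(n)]


def levinson(r, order):
    """Levinson-Durbin recursion: returns (a, err) with A(z) = sum_i a[i] z^-i, a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole temporal envelope of x, from linear prediction on its DCT coefficients."""
    n = len(x)
    c = dct2(x)
    r = [sum(c[i] * c[i + k] for i in range(n - k)) for k in range(order + 1)]
    if r[0] == 0:
        raise ValueError("segment has zero energy")
    a, err = levinson(r, order)
    env = []
    for m in range(n):
        # Time sample m corresponds to angle pi * (m + 0.5) / n under the DCT-II.
        w = math.pi * (m + 0.5) / n
        denom = sum(a[i] * cmath.exp(-1j * w * i) for i in range(order + 1))
        env.append(err / abs(denom) ** 2)
    return env
```

For a segment containing a single click, the envelope of a 2-pole model peaks close to the click position, which is the sense in which poles represent temporal peaks.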

Page 35:

Preliminary FDLP Results

• Distribution of pole magnitudes for different phone classes (in 4 bands):

[Figure: four distribution panels (0-500 Hz band, 500-1000 Hz band, 1-2 kHz band, 2-4 kHz band) comparing /ah/ and /p/; x-axis: -log(1-|pole|)]

• NN Classifier Frame Accuracies:

plp12N          57.0%
plp12N+FDLP4    58.2%

Page 36:

Directions

Dan Ellis, Columbia
(SRI/UW/Columbia work)

Nelson Morgan, ICSI
(OGI/IDIAP/ICSI work + summary)

Page 37:

Multi-rate Models (UW)

[Diagram: coarse state chain over long-term features and fine state chain over short-term features, linked by cross-scale dependencies (example)]

• Integrate acoustic information from different time scales

• Account for dependencies across scales

• Better robustness against time- and/or frequency-localized interferences

• Reduced redundancy gives better confidence estimates

Page 38:

SRI Directions

• Task 1: Signal-adaptive weighting of time-frequency patches

Basis-entropy based representation

Matching pursuit search for optimal weighting of patches

Optimality based on minimum entropy criterion

• Task 2: Graphical models of patch combinations

Tiling-driven dependency modeling

GM combines across patch selections

Optimality based on information in representation

Page 39:

Data-derived phonetic features (Columbia)

• Find a set of independent attributes to account for phonetic (lexical) distinctions
  phones replaced by feature streams

• Will require new pronunciation models
  asynchronous feature transitions (no phones)
  mapping from phonetics (for unseen words)

Joint work with Eric Fosler-Lussier

Page 40:

ICA for feature bases

• PCA finds decorrelated bases; ICA finds independent bases

• Lexically-sufficient ICA basis set?

[Figure: test/dr1/faks0/sa2, with a panel of basis vectors; axes: time / labels (d ow n ae s m iy t ix k eh r iy ix n oy l iy r ae g l ay k dh ae tcl) and frequency / Bark]

Page 41:

OGI Directions: Targets in sub-bands

• Initially context-independent and band-specific phonemes

• Gradually shifted to band-specific 6 broad phonetic classes (stops, fricatives, nasals, vowels, silence, flaps)

• Moving towards band-independent speech classes (vocalic-like, fricative-like, plosive-like, ???)

Page 42:

More than one temporal pattern?

[Diagram: mean & variance normalized, Hamming windowed critical band trajectory (101 dim) → KLT1 ... KLTn → MLP → MLP]

Page 43:

Pre-processing by 2-D operators with subsequent TRAP-TANDEM

3×3 operators applied on the time-frequency plane (frequency vertical, time horizontal):

differentiate f, average t:
 1  2  1
 0  0  0
-1 -2 -1

differentiate t, average f:
-1  0  1
-2  0  2
-1  0  1

diff upwards, av downwards:
 0  1  2
-1  0  1
-2 -1  0

diff downwards, av upwards:
-2 -1  0
-1  0  1
 0  1  2
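
Applying one of these 3×3 operators to a time-frequency representation amounts to a small 2-D correlation. The sketch below is an illustration under assumptions: the spectrogram is a list of rows (one row per frequency band, one column per frame), borders are simply dropped, and the pairing of the two diagonal kernels with their labels follows the order in which they appear on the slide.

```python
# Kernel rows run along frequency, columns along time (an assumed orientation).
OPERATORS = {
    "differentiate f, average t": [[1, 2, 1], [0, 0, 0], [-1, -2, -1]],
    "differentiate t, average f": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "diff upwards, av downwards": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "diff downwards, av upwards": [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],
}


def apply_operator(spec, kernel):
    """Correlate a 2-D list `spec` with a 3x3 kernel, keeping only interior points."""
    rows, cols = len(spec), len(spec[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * spec[r - 1 + i][c - 1 + j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

On a pattern that only changes along time, the time-differentiating operator responds while the frequency-differentiating one gives zero, which is the behaviour the labels describe.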

Page 44:

IDIAP Directions: Phase AutoCorrelation Features

Traditional Features: Autocorrelation based. Very sensitive to additive noise, other variations.

Phase AutoCorrelation (PAC):

If $\{R[k]\},\ k = 0, 1, \ldots, N-1$ represents autocorrelation coeffs derived from a frame of length $N$, the PACs are:

$P[k] = \cos^{-1}\!\left(\frac{R[k]}{R[0]}\right)$, where $R[0]$ is the frame energy.
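
The PAC definition on this slide, the inverse cosine of each autocorrelation coefficient divided by the frame energy R[0], can be sketched directly. The code assumes an ordinary (non-circular) autocorrelation, which the slide does not specify, and the function name `phase_autocorrelation` is made up for illustration.

```python
import math


def phase_autocorrelation(frame):
    """Return P[k] = acos(R[k] / R[0]) for k = 0 .. N-1 of one signal frame."""
    n = len(frame)
    # Assumption: plain (linear) autocorrelation; R[0] is the frame energy.
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(n)]
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    # Clamp the ratio to [-1, 1] to guard against rounding just outside the range.
    return [math.acos(max(-1.0, min(1.0, rk / r[0]))) for rk in r]
```

Because every coefficient is divided by R[0], scaling the frame by a constant gain leaves the PAC values unchanged.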

Page 45:

Entropy Based Multi-Stream Combination

• Combination of evidence from more than one expert to improve performance

• Entropy as a measure of confidence

• Experts having low entropy are more reliable as compared to experts having high entropy

• Inverse entropy weighting criterion

• Relationship between entropy of the resulting (recombined) classifier and recognition rate
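
One way to read the inverse entropy weighting criterion is sketched below: each expert's posterior vector for a frame gets a weight proportional to the inverse of its entropy, and the weighted posteriors are summed. The linear combination rule, the small `eps` constant, and the function names are assumptions; the slide names the criterion but gives no formula.

```python
import math


def entropy(posterior):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0)


def inverse_entropy_combine(posteriors, eps=1e-6):
    """Combine per-expert posteriors for one frame with inverse-entropy weights.

    posteriors: list of posterior vectors, one per expert, all of equal length.
    Returns (weights, combined_posterior).
    """
    # Low-entropy (confident) experts get large weights; eps avoids division by zero.
    inverse = [1.0 / (entropy(p) + eps) for p in posteriors]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    n_classes = len(posteriors[0])
    combined = [sum(w * p[k] for w, p in zip(weights, posteriors))
                for k in range(n_classes)]
    return weights, combined
```

Since the weights sum to one, the combined vector is again a probability distribution, dominated by whichever expert is most confident on that frame.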

Page 46:

ICSI Directions: Posterior Combination Framework

• Combination of Several Discriminative Probability Streams

Page 47:

Improvement of the Combo Infrastructure

• Improve basic features:

Add prosodic features: voicing level, energy continuity,

Improve PLP by further removing the pitch difference among speakers.

• Tandem

Different targets, different training features. E.g.: word boundary.

• Improve TRAP (OGI)

• Combination

Entropy based, accuracy based stream weighting or stream selection.

Page 48:

New types of tandem features: Possible word/syllable boundary

[Diagram: Input feature → NN Processing → Target posterior]

Input feature:
• Traditional or improved PLP
• Spectral continuity
• Voicing, voicing continuity
• Formant continuity feature
• …more

Target posterior:
• Phonemes
• Word/syllable boundary
• Broad phoneme classes
• Manner / place / articulation… etc

Page 49:

Data Driven Subword Unit Generation (IDIAP/ICSI)

• Motivation: Phoneme-based units may not be optimal for ASR.

• Approach (based on speaker segmentation method):
  Initial segmentation: large number of clusters
  Is thresholdless BIC-like merging criterion met?
  Yes: Merge, re-segment, and re-estimate, then test the criterion again
  No: Stop

Page 50:

Summary

• Staff and tools in place to proceed with core experiments

• Pilot experiments provided coherent substrate for cooperation between 6 sites

• Future directions for individual sites are all over the map, which is what we want

• Possible exploration of collaborations w/MS in this meeting
