LETTER TO PHONEME ALIGNMENT

Reihaneh Rabbany

Shahin Jabbari

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

TEXT TO SPEECH PROBLEM

Conversion of Text to Speech: TTS
◦ Automated Telecom Services
◦ E-mail by Phone
◦ Banking Systems
◦ Handicapped People

PRONUNCIATION

Pronunciation of the words
◦ Dictionary Words: Dictionary Look-up
◦ Non-Dictionary Words: Phonetic Analysis
  ◦ Language is alive, new words are added
  ◦ Proper Nouns

[Diagram: Word → Phonetic Analysis → Pronunciation]

PROBLEM

Letter to Phoneme Alignment (L2P)
◦ Letter: c a k e
◦ Phoneme: k ei k

CHALLENGES

No Consistency
◦ City / s /
◦ Cake / k /
◦ Kid / k /

No Transparency
◦ K i d (3) / k i d / (3)
◦ S i x (3) / s i k s / (4)
◦ Q u e u e (5) / k j u: / (3)
◦ A x e (3) / a k s / (3)

ONE-TO-ONE EM
Daelemans et al., 1996

Length of word = length of pronunciation
Produce all possible alignments
◦ Inserting null letter/phoneme
Alignment probability

$$P(A) = \prod_i P(p_i \mid l_i)$$
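A minimal sketch of the one-to-one idea, under stated assumptions: nulls are inserted only into the shorter sequence until both have the same length, every resulting pairing is a candidate alignment, and a candidate is scored as the product of P(phoneme | letter). The names `NULL`, `pad_with_nulls`, `one_to_one_alignments`, `alignment_probability`, the `floor` value, and the probabilities in `prob` are all illustrative and not taken from the cited work.

```python
from itertools import combinations

NULL = "_"  # assumed placeholder for a null letter or null phoneme


def pad_with_nulls(seq, target_len):
    """Yield every way of inserting nulls into seq so it reaches target_len."""
    for keep in combinations(range(target_len), len(seq)):
        padded = [NULL] * target_len
        for pos, sym in zip(keep, seq):
            padded[pos] = sym
        yield padded


def one_to_one_alignments(letters, phonemes):
    """Yield candidate alignments as lists of (letter, phoneme) pairs."""
    n = max(len(letters), len(phonemes))
    for ls in pad_with_nulls(letters, n):
        for ps in pad_with_nulls(phonemes, n):
            yield list(zip(ls, ps))


def alignment_probability(alignment, prob, floor=1e-6):
    """Product over positions of P(phoneme | letter); unseen pairs get a floor."""
    result = 1.0
    for letter, phoneme in alignment:
        result *= prob.get((letter, phoneme), floor)
    return result


# Invented probabilities P(phoneme | letter), for illustration only.
prob = {("c", "k"): 0.6, ("a", "ei"): 0.3, ("k", "k"): 0.9, ("e", NULL): 0.5}
candidates = list(one_to_one_alignments(list("cake"), ["k", "ei", "k"]))
best = max(candidates, key=lambda a: alignment_probability(a, prob))
```

In an EM setting the table `prob` would be re-estimated from the alignments rather than fixed by hand as it is here.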

DECISION TREE
Black et al., 1996

Train a CART Using Aligned Dictionary
Why CART?
A Single Tree for Each Letter

KONDRAK

Alignments are not always one-to-one
◦ A x e / a k s /
◦ B oo k / b ú k /

Only Null Phoneme

Similar to one-to-one EM
◦ Produce All Possible Alignments
◦ Compute the Probabilities

FORMAL MODEL

Word: sequence of letters

$$L = l_1 l_2 \ldots l_n$$

Pronunciation: sequence of phonemes

$$P = p_1 p_2 \ldots p_m$$

Alignment: sequence of subalignments

$$A = a_1 a_2 \ldots a_k, \quad a_i = (L_i, P_i), \quad |L_i|, |P_i| \le 2$$

Problem: Finding the most probable alignment

$$A_{best} = \arg\max_A P(A \mid L, P)$$
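The search space in this model can be made concrete with a short sketch that lists every alignment of a word with its pronunciation. It assumes each subalignment pairs a chunk of 1 or 2 letters with a chunk of 1 or 2 phonemes; empty (null) chunks are left out for simplicity, which is an assumption rather than something the slide states. The function name `enumerate_alignments` is hypothetical.

```python
def enumerate_alignments(letters, phonemes, max_len=2):
    """Return all alignments as lists of (letter_chunk, phoneme_chunk) tuples.

    Each chunk holds between 1 and max_len symbols (an assumed constraint).
    """
    if not letters and not phonemes:
        return [[]]  # exactly one way to align two empty sequences
    found = []
    for i in range(1, min(max_len, len(letters)) + 1):
        for j in range(1, min(max_len, len(phonemes)) + 1):
            head = (tuple(letters[:i]), tuple(phonemes[:j]))
            for rest in enumerate_alignments(letters[i:], phonemes[j:], max_len):
                found.append([head] + rest)
    return found


# "six" has three letters but four phonemes, so some chunk must hold two phonemes.
alignments = enumerate_alignments(list("six"), ["s", "i", "k", "s"])
```

Picking the most probable alignment then amounts to scoring each returned candidate and taking the maximum.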

MANY-TO-MANY EM

1. Initialize prob(SubAlignments)

// Expectation Step
2. For each word in training_set
   2.1. Produce all possible alignments
   2.2. Choose the most probable alignment

// Maximization Step
3. For all subalignments
   3.1. Compute new_p(SubAlignments)

$$P(a_i) = \frac{M[l_i, p_i]}{M[l_i]}$$
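The loop above can be sketched in Python as follows. This is a sketch under several assumptions: chunks hold 1 or 2 symbols with no nulls, the initial probabilities come from counts over every candidate alignment, and each expectation step keeps only the single most probable alignment per word, as the steps describe. The names `enumerate_alignments`, `estimate`, `score`, and `train` are hypothetical.

```python
from collections import Counter
from math import prod


def enumerate_alignments(letters, phonemes, max_len=2):
    """All segmentations into (letter chunk, phoneme chunk) pairs."""
    if not letters and not phonemes:
        return [[]]
    found = []
    for i in range(1, min(max_len, len(letters)) + 1):
        for j in range(1, min(max_len, len(phonemes)) + 1):
            head = (tuple(letters[:i]), tuple(phonemes[:j]))
            for rest in enumerate_alignments(letters[i:], phonemes[j:], max_len):
                found.append([head] + rest)
    return found


def estimate(alignments):
    """Step 3: relative frequency M[l, p] / M[l] over the given alignments."""
    pair_counts = Counter(sub for alignment in alignments for sub in alignment)
    letter_counts = Counter()
    for (letter_chunk, _), count in pair_counts.items():
        letter_counts[letter_chunk] += count
    return {sub: c / letter_counts[sub[0]] for sub, c in pair_counts.items()}


def score(alignment, probs, floor=1e-9):
    """Probability of an alignment as a product of subalignment probabilities."""
    return prod(probs.get(sub, floor) for sub in alignment)


def train(training_set, iterations=5):
    """Return (subalignment probabilities, best alignment per alignable word)."""
    candidates = [enumerate_alignments(l, p) for l, p in training_set]
    candidates = [c for c in candidates if c]  # skip words with no alignment
    # Step 1 (assumed scheme): initialise from counts over all candidates.
    probs = estimate([a for cands in candidates for a in cands])
    best = []
    for _ in range(iterations):
        # Step 2: keep the most probable alignment of each word.
        best = [max(cands, key=lambda a: score(a, probs)) for cands in candidates]
        # Step 3: re-estimate probabilities from the chosen alignments.
        probs = estimate(best)
    return probs, best
```

A call such as `train([(list("kid"), ["k", "i", "d"]), (list("six"), ["s", "i", "k", "s"])])` returns the probability table together with one alignment per word. Because products of probabilities favour alignments with fewer chunks, a practical version would likely need smoothing or a different initialisation.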

DYNAMIC BAYESIAN NETWORK

Model

[Diagram: network over the nodes l_i, P_i and a_i]

$$P(A) = \prod_{i=1}^{k} P(a_i \mid L_i, P_i)$$

Subalignments are considered as hidden variables

Learn DBN by EM

$$P(a_i) = \frac{M[a_i]}{M[p_i, l_i]}$$

CONTEXT DEPENDENT DBN

Context independency assumption
◦ Makes the model simpler
◦ It is not always a correct assumption
◦ Example: Chat and Hat

Model

[Diagram: network over the nodes l_i, P_i, a_{i-1} and a_i]

$$P(A) = \prod_{i=1}^{k} P(a_i \mid a_{i-1}, L_i, P_i)$$

$$P(a_i) = \frac{M[a_i]}{M[a_{i-1}, p_i, l_i]}$$
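The Chat and Hat example can be made concrete with a small counting sketch. It is a simplification, not the estimator on the slide: it conditions each subalignment only on the previous subalignment and the current letter, and it uses relative frequencies from two hand-aligned words. The alignments, the ASCII phoneme symbols `tS` and `ae`, the `START` and `NULL` markers, and all function names are illustrative assumptions.

```python
from collections import Counter

START = ("<s>", "<s>")  # assumed marker for "no previous subalignment"
NULL = "_"              # assumed null phoneme symbol


def count_with_context(aligned_words):
    """Count subalignments together with the subalignment that precedes them."""
    joint = Counter()    # (previous subalignment, current subalignment)
    context = Counter()  # (previous subalignment, current letter)
    for alignment in aligned_words:
        prev = START
        for letter, phoneme in alignment:
            joint[(prev, (letter, phoneme))] += 1
            context[(prev, letter)] += 1
            prev = (letter, phoneme)
    return joint, context


def context_dependent_prob(sub, prev, joint, context):
    """Relative frequency of sub given the previous subalignment and its letter."""
    total = context[(prev, sub[0])]
    return joint[(prev, sub)] / total if total else 0.0


def context_independent_prob(sub, aligned_words):
    """Relative frequency of sub among all subalignments with the same letter."""
    subs = [s for alignment in aligned_words for s in alignment]
    same_letter = [s for s in subs if s[0] == sub[0]]
    return same_letter.count(sub) / len(same_letter) if same_letter else 0.0


# Hand-made one-to-one alignments, for illustration only.
chat = [("c", "tS"), ("h", NULL), ("a", "ae"), ("t", "t")]
hat = [("h", "h"), ("a", "ae"), ("t", "t")]
joint, context = count_with_context([chat, hat])
```

Without context, the letter h maps to the phoneme h only half of the time in this toy data; once the previous subalignment is taken into account, the word-initial h and the h that follows c are no longer confused.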

EVALUATION DIFFICULTIES

Unsupervised Evaluation
◦ No Aligned Dictionary

Solutions
◦ How much it boosts a supervised module
  ◦ Letter to Phoneme Generator
◦ Comparing the result with a gold alignment
  ◦ AER

LETTER TO PHONEME GENERATOR

Percentage of correctly generated phonemes and words

How does it work?
◦ Finding Chunks
  ◦ Binary Classification Using Instance-Based Learning
◦ Phoneme Prediction
  ◦ Phoneme is predicted independently for each letter
  ◦ Phoneme is predicted for each chunk
  ◦ Hidden Markov Model
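The two percentages used to score the generator can be computed as in the sketch below. It assumes that a word counts as correct only when its whole phoneme sequence matches the reference, and that phonemes are compared position by position against the reference; a real evaluation might use an edit-distance alignment instead. The function name `accuracies` is hypothetical.

```python
def accuracies(predicted, reference):
    """Return (word accuracy, phoneme accuracy) as percentages.

    predicted and reference are parallel lists of phoneme sequences.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    words_correct = 0
    phonemes_correct = 0
    phonemes_total = 0
    for pred, ref in zip(predicted, reference):
        words_correct += pred == ref
        # Position-wise comparison; extra or missing phonemes count as errors.
        phonemes_correct += sum(p == r for p, r in zip(pred, ref))
        phonemes_total += len(ref)
    if phonemes_total == 0:
        raise ValueError("reference contains no phonemes")
    word_acc = 100.0 * words_correct / len(reference)
    phoneme_acc = 100.0 * phonemes_correct / phonemes_total
    return word_acc, phoneme_acc


predicted = [["k", "ei", "k"], ["s", "i", "k", "z"]]
reference = [["k", "ei", "k"], ["s", "i", "k", "s"]]
word_acc, phoneme_acc = accuracies(predicted, reference)
```

Word accuracy is always the stricter of the two figures, since a single wrong phoneme makes the whole word count as an error.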

ALIGNMENT ERROR RATIO

AER: Evaluating by Alignment Error Ratio

Counting common pairs between
◦ Our aligned output
◦ Gold alignment

Calculating AER

$$AER = \frac{|A \cap G|}{|A|}$$
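A small sketch of the counting step, assuming that an alignment is represented as a set of (letter position, phoneme position) links and that the ratio is the number of links shared by the output A and the gold alignment G, divided by |A|. If the intended quantity is an error rate, it would be one minus this value. The function name and the example links are illustrative.

```python
def common_pair_ratio(aligned, gold):
    """Share of links in the aligned output that also occur in the gold alignment."""
    aligned, gold = set(aligned), set(gold)
    if not aligned:
        raise ValueError("aligned output contains no links")
    return len(aligned & gold) / len(aligned)


# Links for "cake" / k ei k as (letter index, phoneme index); invented example.
gold_links = {(0, 0), (1, 1), (2, 2)}
our_links = {(0, 0), (1, 1), (3, 2)}
ratio = common_pair_ratio(our_links, gold_links)
```

Here two of the three output links agree with the gold alignment, so the ratio is two thirds.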

RESULTS

10-fold cross-validation

Model                       Word Accuracy   Phoneme Accuracy
Best previous results       66.82%          92.45%
One-to-One EM               53.87%          85.66%
Many-to-Many EM             76%             94.5%
DBN, Context Independent    79.12%          95.23%
DBN, Context Dependent      81.54%          96.70%