“Improving Pronunciation Dictionary Coverage of Names by Modelling Spelling Variation” - Justin Fackrell and Wojciech Skut Presented by Han

Page 2:

The Problem:

• The pronunciation of out-of-vocabulary (OOV) words is a major problem in TTS.

• Many OOV words are names.

• For English names, the orthography is highly irregular.

• Current methods of approaching this problem have low accuracy.

– Using hand-written or automatically learned rules to replace a sequence of graphemes by a sequence of phonemes.

Page 3:

The Challenge

Page 4:

Their Method

• Scope: English surnames, forenames, street names and place names.

• Based on: the observation that some of the words in the above categories have the same pronunciation but slightly different spellings.

• Approach: learn the rules governing these variations from existing data (data-driven), so that when we see an OOV word we can try applying the rules to transform it into an in-vocabulary (IOV) word.

Page 5:

Different Orthographical Expressions for the Same Pronunciation

Page 6:

Hypothesis

• Given a name that is not in the dictionary, there is about a 10% chance that it DOES have a valid pronunciation in the dictionary, listed under a different spelling. We have to somehow map it to that valid in-dictionary word.

Page 7:

A Hierarchical Approach

Dictionary → Filter 1 → Filter 2 → etc.
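The stages listed above can be read as a fallback cascade. The sketch below is one plausible reading, not the authors' implementation: it assumes each stage is tried in order and the first one to return a pronunciation wins. The toy dictionary, the phoneme strings, and the single hard-coded y -> i filter are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical toy dictionary: spelling -> pronunciation (made-up phoneme strings).
DICTIONARY = {"smith": "S M IH TH"}


def dictionary_lookup(word: str) -> Optional[str]:
    """Stage 0: plain dictionary lookup."""
    return DICTIONARY.get(word.lower())


def spelling_variant_filter(word: str) -> Optional[str]:
    """Hypothetical filter: try one hard-coded y -> i respelling."""
    variant = word.lower().replace("y", "i")
    return DICTIONARY.get(variant)


def cascade(word: str,
            stages: Sequence[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each stage in order; return the first pronunciation found."""
    for stage in stages:
        pron = stage(word)
        if pron is not None:
            return pron
    return None  # every stage failed; a later fallback would take over here
```

Calling `cascade("Smyth", [dictionary_lookup, spelling_variant_filter])` misses in the dictionary stage and is resolved by the filter stage.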

Page 8:

Two Ways of Using This Method and Their Results

• Online

– Results: the suggested pronunciations are good in 80% of cases.

• Offline

– For surnames, a model trained on a 23,000-entry dictionary was able to add 5,000 new entries, increasing the coverage by about 1%.

Page 9:

The Algorithm (Part I): Training

• 1) Reverse the dictionary (pron -> ortho).

• 2) Delete one-to-one mappings, i.e. pronunciations that have only one spelling.

• 3) Each pair of spellings that share a common pronunciation generates a set of rewrite rules, r_i where i = 0 to n, in the form “A -> B / L _ R” (rewrite A as B when preceded by L and followed by R).
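The three training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the slides do not say how the set of rules for a pair is built, so the sketch assumes the changed substring is found by stripping the common prefix and suffix, and that the rules r_0 to r_n differ only in how much left and right context they keep. The `#` word-boundary marker, the `max_context` parameter, and the phoneme strings in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import NamedTuple


class Rule(NamedTuple):
    a: str      # substring to replace (A)
    b: str      # replacement (B)
    left: str   # left context (L)
    right: str  # right context (R)


def reverse_dictionary(lexicon):
    """Steps 1 and 2: map pronunciation -> spellings, dropping one-to-one entries."""
    by_pron = defaultdict(set)
    for spelling, pron in lexicon.items():
        by_pron[pron].add(spelling)
    return {p: sorted(s) for p, s in by_pron.items() if len(s) > 1}


def rules_for_pair(src, dst, max_context=2):
    """Step 3: candidate rules A -> B / L _ R that turn `src` into `dst`."""
    s, d = f"#{src}#", f"#{dst}#"  # '#' marks the word boundaries (assumption)
    # Length of the common prefix.
    i = 0
    while i < min(len(s), len(d)) and s[i] == d[i]:
        i += 1
    # Length of the common suffix, not overlapping the prefix.
    j = 0
    while j < min(len(s), len(d)) - i and s[-1 - j] == d[-1 - j]:
        j += 1
    a, b = s[i:len(s) - j], d[i:len(d) - j]
    rules = []
    for width in range(max_context + 1):  # widen the context step by step
        left = s[max(0, i - width):i]
        right = s[len(s) - j:len(s) - j + width]
        rules.append(Rule(a, b, left, right))
    return rules


def generate_rules(lexicon):
    """Candidate rule sets for every ordered pair of same-sounding spellings."""
    rule_sets = {}
    for spellings in reverse_dictionary(lexicon).values():
        for src, dst in permutations(spellings, 2):
            rule_sets[(src, dst)] = rules_for_pair(src, dst)
    return rule_sets
```

For a toy lexicon in which "smyth" and "smith" share a pronunciation, `rules_for_pair("smyth", "smith")` yields y -> i with no context, then with context m _ t, then with context sm _ th.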

Page 10:

The Algorithm (Part I): Training

Page 11:

The Algorithm (Part I): Training

• Each rule, r_i, is then evaluated on the rest of the dictionary to see how useful it is, with four possible outcomes:

– MISS

– OOV

– DIFF

– GOOD

• Each rule thus gets four scores: n_i^MISS, n_i^OOV, n_i^DIFF, and n_i^GOOD.

• From each set of rules generated by a pair, only one rule is chosen: the shortest rule with n_i^DIFF = 0.
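The evaluation and selection step above might look like the sketch below. The slides name the four outcomes but do not define them, so the definitions here are an assumed interpretation: MISS means the rule does not apply to the word, OOV means the rewritten spelling is not in the dictionary, DIFF means it is in the dictionary with a different pronunciation, and GOOD means it is in the dictionary with the same pronunciation. Measuring "shortest" as the total length of A, B, L and R is also an assumption, as are the `#` boundary marker and the toy phoneme strings.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    a: str      # substring to replace (A)
    b: str      # replacement (B)
    left: str   # left context (L)
    right: str  # right context (R)


def apply_rule(rule: Rule, word: str) -> Optional[str]:
    """Rewrite the first site matching L A R; return None if the rule does not apply."""
    padded = f"#{word}#"  # '#' marks the word boundaries (assumption)
    pattern = rule.left + rule.a + rule.right
    if not pattern:
        return None  # an empty rule has nothing to match
    pos = padded.find(pattern)
    if pos < 0:
        return None
    start = pos + len(rule.left)
    rewritten = padded[:start] + rule.b + padded[start + len(rule.a):]
    return rewritten.strip("#")


def evaluate(rule: Rule, lexicon: dict) -> dict:
    """Count the four outcomes of applying `rule` to every entry in `lexicon`."""
    counts = {"MISS": 0, "OOV": 0, "DIFF": 0, "GOOD": 0}
    for word, pron in lexicon.items():
        rewritten = apply_rule(rule, word)
        if rewritten is None or rewritten == word:
            counts["MISS"] += 1   # rule does not apply (assumed meaning)
        elif rewritten not in lexicon:
            counts["OOV"] += 1    # result is not a known spelling
        elif lexicon[rewritten] != pron:
            counts["DIFF"] += 1   # known spelling, different pronunciation
        else:
            counts["GOOD"] += 1   # known spelling, same pronunciation
    return counts


def choose_rule(candidates, lexicon) -> Optional[Rule]:
    """Pick the shortest candidate whose DIFF count is zero."""
    safe = [r for r in candidates if evaluate(r, lexicon)["DIFF"] == 0]
    return min(safe,
               key=lambda r: len(r.left) + len(r.a) + len(r.b) + len(r.right),
               default=None)
```

The slide says each rule is evaluated on the rest of the dictionary, so a caller would pass in the lexicon with the generating pair removed; the sketch leaves that filtering to the caller.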

Page 12:

The Algorithm (Part I): Prediction

• Sort all rules by score.

• When given an OOV word, use the rule with the highest score that can map it into an IOV word.
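The prediction step above can be sketched as a loop over the rules in descending score order. The slides do not say how the score is computed, so this sketch simply takes (score, rule) pairs as input. The `Rule` type, the `apply_rule` helper, the `#` boundary marker, and the toy lexicon in the usage note are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    a: str      # substring to replace (A)
    b: str      # replacement (B)
    left: str   # left context (L)
    right: str  # right context (R)


def apply_rule(rule: Rule, word: str) -> Optional[str]:
    """Rewrite the first site matching L A R; return None if the rule does not apply."""
    padded = f"#{word}#"  # '#' marks the word boundaries (assumption)
    pattern = rule.left + rule.a + rule.right
    if not pattern:
        return None
    pos = padded.find(pattern)
    if pos < 0:
        return None
    start = pos + len(rule.left)
    return (padded[:start] + rule.b + padded[start + len(rule.a):]).strip("#")


def predict(word: str, scored_rules, lexicon: dict) -> Optional[Tuple[str, str]]:
    """Return (IOV spelling, pronunciation) from the best-scoring rule that works."""
    ranked = sorted(scored_rules, key=lambda pair: pair[0], reverse=True)
    for _score, rule in ranked:
        candidate = apply_rule(rule, word)
        if candidate is not None and candidate in lexicon:
            return candidate, lexicon[candidate]
    return None  # no rule maps the word into the dictionary
```

With a lexicon containing "smith" and a rule y -> i / m _ t, `predict("smyth", ...)` returns the entry for "smith"; a word that no rule can map into the dictionary returns `None` and would be handed to the next stage.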

Page 13:

Some Examples of Resulting Rewrite Rules

Page 14:

Some Results

Page 15:

Accuracy Test Results

Page 16:

Accuracy Test Results