Inducing and using phonetic similarity

Inducing and using phonetic similarity

Martijn Wieling and John Nerbonne

University of Tübingen / University of Groningen

Language Diversity Conference, July 20, 2013

1 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

OverviewI Inducing phonetic similarity

I The need for phonetic similarity (i.e. sensitive sound segment distances)I Obtaining sensitive sound segment distancesI Evaluating the quality of sensitive sound segment distances

I Using phonetic similarity: English accentsI The Speech Accent ArchiveI Linking computational and perceptual pronunciation distancesI A visualization of English accents

I Time permitting: Pronunciation similarity via articulography


Collaborators


The need for sensitive segment distancesI In our research on language variation, we employ pronunciation

distances (on the basis of alignments)I We would like to improve alignment quality and the distances

I There is no widely accepted procedure to determine phonetic similarity(Laver, 1994)

I Here we use the distribution of pronunciation variation to determinesimilarity

I In line with language as “un systême oû tout se tient” (focus on relationsbetween items, not items themselves; Meillet, 1903)


Our starting point: the Levenshtein distanceRestriction: vowels are not aligned with consonants

I The Levenshtein distance measures the minimum nr. of insertions,deletions and substitutions to transform one string into another

mO@lk@ delete O 1m@lk@ subst. @/E 1mElk@ delete @ 1mElk insert @ 1mEl@k

4

m O @ l k @m E l @ k

1 1 1 1

I Note the implicit identification of sound correspondences


Counting sound segment correspondencesI Counting the frequency of sound segments (in the Levenshtein alignments)

p b ... U u Total5 × 105 2 × 105 ... 90,000 9 × 105 108

I Counting the frequency of the aligned sound segments (in the Levenshtein alignments)

p b ... U up 2 × 105 10,650 ... 0 0b 88,000 ... 0 0...

......

...U 65,400 5,500u 4 × 105

Total: 5 × 107

I Probability of observing [p]: 5 × 105 / 108 = 0.005 (0.5%)I Probability of observing [b]: 2 × 105 / 108 = 0.002 (0.2%)I Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002 (0.02%)


Association strength between segment pairsI Pointwise Mutual Information (PMI): assesses degree of statistical

dependence between aligned segments (x and y )

PMI(x , y) = log2

(p(x , y)

p(x)p(y)

)I p(x , y): relative occurrence of the aligned segments x and y in the whole

datasetI p(x) and p(y): relative occurrence of x and y in the whole dataset

I The greater the PMI value, the more segments tend to cooccur incorrespondences


Association strength between segment pairsI Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002I Probability of observing [p]: 5 × 105 / 108 = 0.005I Probability of observing [b]: 2 × 105 / 108 = 0.002

PMI(x , y) = log2

(p(x , y)

p(x)p(y)

)⇒

PMI([p], [b]) = log2

(0.0002

0.005× 0.002

)

PMI([p], [b]) ≈ 4.3


Using PMI values with the Levenshtein algorithmI Idea: use association strength to weight edit operationsI PMI is large for strong associations, so invert it (0 - PMI)

I Strongly associated segments will have a low distanceI PMI range varies, so normalize it between 0 and 1I Use PMI-induced weights as costs in Levenshtein algorithm

I Cost of substituting identical sound segments is always set to 0


The PMI-based Levenshtein algorithmI We use the standard Levenshtein algorithm to calculate the initial PMI

weights and convert these to costs (i.e. sound distances)

I These sensitive sound distances are then used as edit operation costs inthe Levenshtein algorithm to obtain new alignments, new counts, andnew PMI sound distances

I This process is repeated until alignments and PMI sound distancesstabilize

I Besides improved alignments (Wieling et al., 2009), this procedureautomatically yields sensitive sound segment distances

m O @ l k @m E l @ k

0.20 0.15 0.12 0.12


Pronunciation dataI Six independent dialect data sets (IPA pronunciations)

I Dutch: 562 words in 613 locations (Wieling et al., 2007)I German: 201 words in 186 locations (Nerbonne and Siedle, 2005)I U.S. English: 153 words in 483 locations (Kretzschmar, 1994)I Bantu (Gabon): 160 words in 53 locations (Alewijnse et al., 2007)I Bulgarian: 152 words in 197 locations (Prokic et al., 2009)I Tuscan: 444 words in 213 locations (Montemagni et al., 2012)

I For all datasets sound segment distances are obtained using thePMI-based Levenshtein algorithm

I We use a slightly adapted version: ignoring identical sound segmentsubstitutions in the counts


Acoustic dataI For the evaluation, we obtained acoustic vowel measurements (F1 and

F2) reported in the scientific literatureI Pols et al. (1973; NL), van Nierop et al. (1973; NL), Sendlmeier and

Seebode (2006; GER), Hillenbrand et al. (1995; US), Nurse and Phillipson(2003, p. 22; BAN), Lehiste and Popov (1970; BUL), Calamai (2003; TUS)

I To determine acoustic vowel distance, we calculate the Euclideandistance of the formant frequencies

I Our perception of frequency is non-linear and calculating the Euclideandistance on the basis of Hertz values would not give enough weight to thefirst formant

I We therefore first scale the Hertz frequencies to Bark


Comparison procedure between acoustic and PMIdistances

I We assess the relation between the generated and acoustic distancesusing the Pearson correlation

I We visualize the relative position of the sound segments by applyingmultidimensional scaling (MDS) to the distance matrices

I Missing distances are not allowed in the (classical) MDS procedure, so insome cases not all sound segments are visualized


Acoustic vs. PMI vowel distances

Pearson’s r Explained variance (r2)Dutch 0.672 45.2%

Dutch w/o Frisian 0.686 47.1%German 0.633 40.1%

German w/o @ 0.785 61.6%US English 0.608 37.0%

Bantu 0.642 41.2%Bulgarian 0.677 45.8%

Tuscan 0.758 57.5%


MDS visualization of Dutch vowelsPMI visualization captures 76% of the variation

(a) IPA

u

o

ɑ

a

ɛ

eɪ

i y

ʏøɔ

(b) Acoustics

ɒ

e

ɛ

ə

a

ɪ

ɑ

ɔ

i

o

æ

ʌ

u

œʊɵ

y

ø

(c) PMI distances


MDS visualization of German vowelsPMI visualization captures 70% of the variation

(a) IPA

a

e

ɛ

ɪ

ioʊ

u

ʏy

œ

ø

ə ɔ

(b) Acoustics

ɔə

o

ɒɑ

ɤ

u

ɐ

ʊ

ʌ

a

æ

ɪ

ɛe

œø

i

ʏɯy

(c) PMI distances


MDS visualization of U.S. English vowelsPMI visualization captures 65% of the variation

(a) IPA

iɪ

e

ɛæ

ɑ

oʊu

ʌ ɔ

(b) Acoustics

əɪ

ɛ

u ʊ

ɑ

ɒ

ɜɔ

ʌ

ɐ

o

ɞ

(c) PMI distances


MDS visualization of Bantu vowelsPMI visualization captures 90% of the variation

(a) IPA

i

e

əɛ

a

ɔ

ou

(b) Acoustics

e

i

ɛ ɔ

u

a

o

ə

(c) PMI distances


MDS visualization of Bulgarian vowelsPMI visualization captures 86% of the variation

(a) IPA

i

e ə

a

o

u

(b) Acoustics

ɑ

ei

ə

ɛ

ɤ

o uʊ

a

(c) PMI distances


MDS visualization of Tuscan vowelsPMI visualization captures 97% of the variation

(a) IPA

a

ɛ

ei

o

u

ɔ

(b) Acoustics

a

i

o

e

u

(c) PMI distances


What about consonants?I Induced distances correlate strongly with acoustic vowel distances

I Causation is probably the reverse: acoustics explains distributionsSweeney’s insight: “I gotta use words when I talk to you...”

I But for other segments (consonants) acoustic/phonetic distances are notwell accepted, and this procedure provides a measure of distance


MDS visualization of Dutch consonantsPMI visualization captures 50% of the variation


MDS visualization of Dutch consonantsPlace (3 groups) dominates over manner (2 groups) and voicing (no groups)


RecapI The PMI approach yields sensible sound distances in a flexible,

data-driven manner

I For more details and references, see: Martijn Wieling, Eliza Margarethaand John Nerbonne (2012). Inducing a measure of phonetic similarityfrom pronunciation variation. Journal of Phonetics, 40(2), 307-314

I In the following, this method is used to obtain pronunciation distances onthe basis of English accented speech


The Speech Accent ArchiveAvailable online at http://accent.gmu.edu


Audio example

Listen to an example


Validating the PMI-based Levenshtein pronunciationdistances

I There is only a single study investigating the relation betweenLevenshtein distances and perceptual distances

I Gooskens and Heeringa (LVC, 2004)I Focusing on Norwegian dialectsI The reported correlation strength was r ≈ 0.7

I We conducted a new study based on the Speech Accent Archive,investigating the relation between perceptual and Levenshteinpronunciation distances


Outline of the perception experimentI We provided six sets of 50 audio samples (286 distinct speakers; 272

non-native) and asked native American English speakers to rate theirnative-likeness on a scale from 1 (very foreign sounding) to 7 (nativeAmerican English)

I More than 1100 (!) people filled in the questionnaire


Results of the perception experiment(Wieling, Bloem, Mignella, Timmermeister and Nerbonne, in prep.)

I We calculated the average native-likeness score for every speaker

I We used the PMI-based Levenshtein algorithm to obtain the automaticpronunciation distances for each of these speakers and the average U.S.speaker (based on 115 samples of native U.S. English speakers born inthe U.S.)

I The distance between two speakers is calculated by averaging the wordpronunciation distances between the two speakers

I The correlation between the two measures was r ≈ 0.8

I We may conclude that the Levenshtein distance is a valid measure todetermine perceptually sound pronunciation distances


Visualizing English pronunciationsI We used 989 phonetically transcribed samples from the SAAI We grouped the transcriptions (i.e. speakers) per countryI For non-English speaking countries, we excluded speakers who moved to

an English-speaking country before age 13I We only included countries with at least 5 speakers

I Pronunciation distances between countries were calculated using thePMI-based Levenshtein algorithm and are visualized using MDS


MDS visualization of accent distances88% of the variation in the original pronunciation distances is captured


ConclusionsI The PMI approach allows us to induce sensible sound distances

I The approach is readily applicable to any (dialect) dataset with similarpronunciations

I The PMI-based Levenshtein algorithm yields pronunciation distanceswhich correspond well with perceptual pronunciation distances

I Next step: determining pronunciation distances on the basis of themovements of the articulators during speech

I Work in progress...I If interested, ask about it!


Thank you for your attention!


Pronunciation similarity via articulography: intro


Connected to the articulography device


Pilot experiment

“The men hit the squirrel that walked on this verybroad street, but they just failed to kill him.”

American speaker German speaker


Tongue movement


Tongue tip movement

palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate

low

high

front backPosition front−back

Hei

ght

Start pronunciation ....... End pronunciation

Position tongue tip in mouth

American speaker: "that"


low

high


Hei

ght



German speaker: "that"

Listen to the pronunciations


Statistical analysis

low

high

Startpronun.

Endpronun.

Time

Hei

ght

Speaker

American

German

Height tongue tip: "that"

fron

tba

ck

Startpronun.

Endpronun.

Time

Pos

ition

fron

t−ba

ck

Speaker

American

German

Position tongue tip front−back: "that"


Tongue tip movement


low

high


Hei

ght



American speaker: "squirrel"


low

high


Hei

ght



German speaker: "squirrel"

Listen to the pronunciations


Statistical analysis

low

high

Startpronun.

Endpronun.

Time

Hei

ght

Speaker

American

German

Height tongue tip: "squirrel"

fron

tba

ck

Startpronun.

Endpronun.

Time

Pos

ition

fron

t−ba

ck

Speaker

American

German

Position tongue tip front−back: "squirrel"


Pilot conclusionsI Pronunciation and tongue movement may reveal different patternsI Generalized additive modeling allows us to extract the general movement

patterns while taking into account the individual variationI It also allows the incorporation of additional informative variables (e.g., the

age at which a non-native speaker learned English)


Thank you for your attention!


Inducing and using phonetic similarity

Documents

Transcript of Inducing and using phonetic similarity