Inducing and using phonetic similarity

45
Inducing and using phonetic similarity Martijn Wieling and John Nerbonne University of Tübingen / University of Groningen Language Diversity Conference, July 20, 2013 1 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Transcript of Inducing and using phonetic similarity

Page 1: Inducing and using phonetic similarity

Inducing and using phonetic similarity

Martijn Wieling and John Nerbonne

University of Tübingen / University of Groningen

Language Diversity Conference, July 20, 2013

1 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 2: Inducing and using phonetic similarity

OverviewI Inducing phonetic similarity

I The need for phonetic similarity (i.e. sensitive sound segment distances)I Obtaining sensitive sound segment distancesI Evaluating the quality of sensitive sound segment distances

I Using phonetic similarity: English accentsI The Speech Accent ArchiveI Linking computational and perceptual pronunciation distancesI A visualization of English accents

I Time permitting: Pronunciation similarity via articulography

2 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 3: Inducing and using phonetic similarity

Collaborators

3 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 4: Inducing and using phonetic similarity

The need for sensitive segment distancesI In our research on language variation, we employ pronunciation

distances (on the basis of alignments)I We would like to improve alignment quality and the distances

I There is no widely accepted procedure to determine phonetic similarity(Laver, 1994)

I Here we use the distribution of pronunciation variation to determinesimilarity

I In line with language as “un systême oû tout se tient” (focus on relationsbetween items, not items themselves; Meillet, 1903)

4 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 5: Inducing and using phonetic similarity

Our starting point: the Levenshtein distanceRestriction: vowels are not aligned with consonants

I The Levenshtein distance measures the minimum nr. of insertions,deletions and substitutions to transform one string into another

mO@lk@ delete O 1m@lk@ subst. @/E 1mElk@ delete @ 1mElk insert @ 1mEl@k

4

m O @ l k @m E l @ k

1 1 1 1

I Note the implicit identification of sound correspondences

5 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 6: Inducing and using phonetic similarity

Our starting point: the Levenshtein distanceRestriction: vowels are not aligned with consonants

I The Levenshtein distance measures the minimum nr. of insertions,deletions and substitutions to transform one string into another

mO@lk@ delete O 1m@lk@ subst. @/E 1mElk@ delete @ 1mElk insert @ 1mEl@k

4

m O @ l k @m E l @ k

1 1 1 1

I Note the implicit identification of sound correspondences

5 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 7: Inducing and using phonetic similarity

Counting sound segment correspondencesI Counting the frequency of sound segments (in the Levenshtein alignments)

p b ... U u Total5 × 105 2 × 105 ... 90,000 9 × 105 108

I Counting the frequency of the aligned sound segments (in the Levenshtein alignments)

p b ... U up 2 × 105 10,650 ... 0 0b 88,000 ... 0 0...

......

...U 65,400 5,500u 4 × 105

Total: 5 × 107

I Probability of observing [p]: 5 × 105 / 108 = 0.005 (0.5%)I Probability of observing [b]: 2 × 105 / 108 = 0.002 (0.2%)I Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002 (0.02%)

6 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 8: Inducing and using phonetic similarity

Association strength between segment pairsI Pointwise Mutual Information (PMI): assesses degree of statistical

dependence between aligned segments (x and y )

PMI(x , y) = log2

(p(x , y)

p(x)p(y)

)I p(x , y): relative occurrence of the aligned segments x and y in the whole

datasetI p(x) and p(y): relative occurrence of x and y in the whole dataset

I The greater the PMI value, the more segments tend to cooccur incorrespondences

7 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 9: Inducing and using phonetic similarity

Association strength between segment pairsI Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002I Probability of observing [p]: 5 × 105 / 108 = 0.005I Probability of observing [b]: 2 × 105 / 108 = 0.002

PMI(x , y) = log2

(p(x , y)

p(x)p(y)

)⇒

PMI([p], [b]) = log2

(0.0002

0.005× 0.002

)

PMI([p], [b]) ≈ 4.3

8 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 10: Inducing and using phonetic similarity

Using PMI values with the Levenshtein algorithmI Idea: use association strength to weight edit operationsI PMI is large for strong associations, so invert it (0 - PMI)

I Strongly associated segments will have a low distanceI PMI range varies, so normalize it between 0 and 1I Use PMI-induced weights as costs in Levenshtein algorithm

I Cost of substituting identical sound segments is always set to 0

9 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 11: Inducing and using phonetic similarity

The PMI-based Levenshtein algorithmI We use the standard Levenshtein algorithm to calculate the initial PMI

weights and convert these to costs (i.e. sound distances)

I These sensitive sound distances are then used as edit operation costs inthe Levenshtein algorithm to obtain new alignments, new counts, andnew PMI sound distances

I This process is repeated until alignments and PMI sound distancesstabilize

I Besides improved alignments (Wieling et al., 2009), this procedureautomatically yields sensitive sound segment distances

m O @ l k @m E l @ k

0.20 0.15 0.12 0.12

10 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 12: Inducing and using phonetic similarity

The PMI-based Levenshtein algorithmI We use the standard Levenshtein algorithm to calculate the initial PMI

weights and convert these to costs (i.e. sound distances)

I These sensitive sound distances are then used as edit operation costs inthe Levenshtein algorithm to obtain new alignments, new counts, andnew PMI sound distances

I This process is repeated until alignments and PMI sound distancesstabilize

I Besides improved alignments (Wieling et al., 2009), this procedureautomatically yields sensitive sound segment distances

m O @ l k @m E l @ k

0.20 0.15 0.12 0.12

10 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 13: Inducing and using phonetic similarity

Pronunciation dataI Six independent dialect data sets (IPA pronunciations)

I Dutch: 562 words in 613 locations (Wieling et al., 2007)I German: 201 words in 186 locations (Nerbonne and Siedle, 2005)I U.S. English: 153 words in 483 locations (Kretzschmar, 1994)I Bantu (Gabon): 160 words in 53 locations (Alewijnse et al., 2007)I Bulgarian: 152 words in 197 locations (Prokic et al., 2009)I Tuscan: 444 words in 213 locations (Montemagni et al., 2012)

I For all datasets sound segment distances are obtained using thePMI-based Levenshtein algorithm

I We use a slightly adapted version: ignoring identical sound segmentsubstitutions in the counts

11 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 14: Inducing and using phonetic similarity

Acoustic dataI For the evaluation, we obtained acoustic vowel measurements (F1 and

F2) reported in the scientific literatureI Pols et al. (1973; NL), van Nierop et al. (1973; NL), Sendlmeier and

Seebode (2006; GER), Hillenbrand et al. (1995; US), Nurse and Phillipson(2003, p. 22; BAN), Lehiste and Popov (1970; BUL), Calamai (2003; TUS)

I To determine acoustic vowel distance, we calculate the Euclideandistance of the formant frequencies

I Our perception of frequency is non-linear and calculating the Euclideandistance on the basis of Hertz values would not give enough weight to thefirst formant

I We therefore first scale the Hertz frequencies to Bark

12 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 15: Inducing and using phonetic similarity

Comparison procedure between acoustic and PMIdistances

I We assess the relation between the generated and acoustic distancesusing the Pearson correlation

I We visualize the relative position of the sound segments by applyingmultidimensional scaling (MDS) to the distance matrices

I Missing distances are not allowed in the (classical) MDS procedure, so insome cases not all sound segments are visualized

13 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 16: Inducing and using phonetic similarity

Acoustic vs. PMI vowel distances

Pearson’s r Explained variance (r2)Dutch 0.672 45.2%

Dutch w/o Frisian 0.686 47.1%German 0.633 40.1%

German w/o @ 0.785 61.6%US English 0.608 37.0%

Bantu 0.642 41.2%Bulgarian 0.677 45.8%

Tuscan 0.758 57.5%

14 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 17: Inducing and using phonetic similarity

MDS visualization of Dutch vowelsPMI visualization captures 76% of the variation

(a) IPA

u

o

ɑ

a

ɛ

i y

ʏøɔ

(b) Acoustics

ɒ

e

ɛ

ə

a

ɪ

ɑ

ɔ

i

o

æ

ʌ

u

œʊɵ

y

ø

(c) PMI distances

15 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 18: Inducing and using phonetic similarity

MDS visualization of German vowelsPMI visualization captures 70% of the variation

(a) IPA

a

e

ɛ

ɪ

ioʊ

u

ʏy

œ

ø

ə ɔ

(b) Acoustics

ɔə

o

ɒɑ

ɤ

u

ɐ

ʊ

ʌ

a

æ

ɪ

ɛe

Ͽ

i

ʏɯy

(c) PMI distances

16 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 19: Inducing and using phonetic similarity

MDS visualization of U.S. English vowelsPMI visualization captures 65% of the variation

(a) IPA

e

ɛæ

ɑ

oʊu

ʌ ɔ

(b) Acoustics

əɪ

ɛ

u ʊ

ɑ

ɒ

ɜɔ

ʌ

ɐ

o

ɞ

(c) PMI distances

17 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 20: Inducing and using phonetic similarity

MDS visualization of Bantu vowelsPMI visualization captures 90% of the variation

(a) IPA

i

e

əɛ

a

ɔ

ou

(b) Acoustics

e

i

ɛ ɔ

u

a

o

ə

(c) PMI distances

18 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 21: Inducing and using phonetic similarity

MDS visualization of Bulgarian vowelsPMI visualization captures 86% of the variation

(a) IPA

i

e ə

a

o

u

(b) Acoustics

ɑ

ei

ə

ɛ

ɤ

o uʊ

a

(c) PMI distances

19 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 22: Inducing and using phonetic similarity

MDS visualization of Tuscan vowelsPMI visualization captures 97% of the variation

(a) IPA

a

ɛ

ei

o

u

ɔ

(b) Acoustics

a

i

o

e

u

(c) PMI distances

20 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 23: Inducing and using phonetic similarity

What about consonants?I Induced distances correlate strongly with acoustic vowel distances

I Causation is probably the reverse: acoustics explains distributionsSweeney’s insight: “I gotta use words when I talk to you...”

I But for other segments (consonants) acoustic/phonetic distances are notwell accepted, and this procedure provides a measure of distance

21 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 24: Inducing and using phonetic similarity

MDS visualization of Dutch consonantsPMI visualization captures 50% of the variation

22 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 25: Inducing and using phonetic similarity

MDS visualization of Dutch consonantsPlace (3 groups) dominates over manner (2 groups) and voicing (no groups)

23 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 26: Inducing and using phonetic similarity

RecapI The PMI approach yields sensible sound distances in a flexible,

data-driven manner

I For more details and references, see: Martijn Wieling, Eliza Margarethaand John Nerbonne (2012). Inducing a measure of phonetic similarityfrom pronunciation variation. Journal of Phonetics, 40(2), 307-314

I In the following, this method is used to obtain pronunciation distances onthe basis of English accented speech

24 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 27: Inducing and using phonetic similarity

The Speech Accent ArchiveAvailable online at http://accent.gmu.edu

25 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 28: Inducing and using phonetic similarity

Audio example

Listen to an example

26 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 29: Inducing and using phonetic similarity

Validating the PMI-based Levenshtein pronunciationdistances

I There is only a single study investigating the relation betweenLevenshtein distances and perceptual distances

I Gooskens and Heeringa (LVC, 2004)I Focusing on Norwegian dialectsI The reported correlation strength was r ≈ 0.7

I We conducted a new study based on the Speech Accent Archive,investigating the relation between perceptual and Levenshteinpronunciation distances

27 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 30: Inducing and using phonetic similarity

Outline of the perception experimentI We provided six sets of 50 audio samples (286 distinct speakers; 272

non-native) and asked native American English speakers to rate theirnative-likeness on a scale from 1 (very foreign sounding) to 7 (nativeAmerican English)

I More than 1100 (!) people filled in the questionnaire

28 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 31: Inducing and using phonetic similarity

Results of the perception experiment(Wieling, Bloem, Mignella, Timmermeister and Nerbonne, in prep.)

I We calculated the average native-likeness score for every speaker

I We used the PMI-based Levenshtein algorithm to obtain the automaticpronunciation distances for each of these speakers and the average U.S.speaker (based on 115 samples of native U.S. English speakers born inthe U.S.)

I The distance between two speakers is calculated by averaging the wordpronunciation distances between the two speakers

I The correlation between the two measures was r ≈ 0.8

I We may conclude that the Levenshtein distance is a valid measure todetermine perceptually sound pronunciation distances

29 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 32: Inducing and using phonetic similarity

Visualizing English pronunciationsI We used 989 phonetically transcribed samples from the SAAI We grouped the transcriptions (i.e. speakers) per countryI For non-English speaking countries, we excluded speakers who moved to

an English-speaking country before age 13I We only included countries with at least 5 speakers

I Pronunciation distances between countries were calculated using thePMI-based Levenshtein algorithm and are visualized using MDS

30 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 33: Inducing and using phonetic similarity

MDS visualization of accent distances88% of the variation in the original pronunciation distances is captured

31 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 34: Inducing and using phonetic similarity

ConclusionsI The PMI approach allows us to induce sensible sound distances

I The approach is readily applicable to any (dialect) dataset with similarpronunciations

I The PMI-based Levenshtein algorithm yields pronunciation distanceswhich correspond well with perceptual pronunciation distances

I Next step: determining pronunciation distances on the basis of themovements of the articulators during speech

I Work in progress...I If interested, ask about it!

32 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 35: Inducing and using phonetic similarity

Thank you for your attention!

33 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 36: Inducing and using phonetic similarity

Pronunciation similarity via articulography: intro

34 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 37: Inducing and using phonetic similarity

Connected to the articulography device

35 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 38: Inducing and using phonetic similarity

Pilot experiment

“The men hit the squirrel that walked on this verybroad street, but they just failed to kill him.”

American speaker German speaker

36 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 39: Inducing and using phonetic similarity

Tongue movement

37 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 40: Inducing and using phonetic similarity

Tongue tip movement

palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate

low

high

front backPosition front−back

Hei

ght

Start pronunciation ....... End pronunciation

Position tongue tip in mouth

American speaker: "that"

palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate

low

high

front backPosition front−back

Hei

ght

Start pronunciation ....... End pronunciation

Position tongue tip in mouth

German speaker: "that"

Listen to the pronunciations

38 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 41: Inducing and using phonetic similarity

Statistical analysis

low

high

Startpronun.

Endpronun.

Time

Hei

ght

Speaker

American

German

Height tongue tip: "that"

fron

tba

ck

Startpronun.

Endpronun.

Time

Pos

ition

fron

t−ba

ck

Speaker

American

German

Position tongue tip front−back: "that"

39 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 42: Inducing and using phonetic similarity

Tongue tip movement

palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate

low

high

front backPosition front−back

Hei

ght

Start pronunciation ....... End pronunciation

Position tongue tip in mouth

American speaker: "squirrel"

palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate

low

high

front backPosition front−back

Hei

ght

Start pronunciation ....... End pronunciation

Position tongue tip in mouth

German speaker: "squirrel"

Listen to the pronunciations

40 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 43: Inducing and using phonetic similarity

Statistical analysis

low

high

Startpronun.

Endpronun.

Time

Hei

ght

Speaker

American

German

Height tongue tip: "squirrel"

fron

tba

ck

Startpronun.

Endpronun.

Time

Pos

ition

fron

t−ba

ck

Speaker

American

German

Position tongue tip front−back: "squirrel"

41 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 44: Inducing and using phonetic similarity

Pilot conclusionsI Pronunciation and tongue movement may reveal different patternsI Generalized additive modeling allows us to extract the general movement

patterns while taking into account the individual variationI It also allows the incorporation of additional informative variables (e.g., the

age at which a non-native speaker learned English)

42 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen

Page 45: Inducing and using phonetic similarity

Thank you for your attention!

43 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen