The Challenge of Morphology Mapudungun (Indigenous Language of Chile and Argentina, ~1 Million...
The Challenge of Morphology
Mapudungun (Indigenous Language of Chile and Argentina, ~1 Million Speakers)
Allkütulekefun
Allkütu-le-ke-fu-n
“I used to listen”

Allkütu: Listen
-le: prog.
-ke: habitual
-fu: past
-n: indic.1sg

Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
• Learn these tasks
  – unsupervised
  – from data
  – for any language
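The two tasks can be illustrated on Allkütulekefun itself. The sketch below is only a hand-built illustration: the feature table is copied from the gloss of the example word, and the names `MORPHEME_FEATURES`, `segment`, and `gloss` are hypothetical. The stated goal is to learn both steps from data rather than write them by hand.

```python
# Hand-written lookup taken from the gloss of the example word.
# A learned system would have to induce both the segmentation and
# these labels; nothing here is induced.
MORPHEME_FEATURES = {
    "allkütu": "Listen",
    "le": "prog.",
    "ke": "habitual",
    "fu": "past",
    "n": "indic.1sg",
}


def segment(word, root, suffixes):
    """Task 1: split a word into a known root plus a chain of known suffixes."""
    word = word.lower()
    if not word.startswith(root):
        return None
    pieces, rest = [root], word[len(root):]
    while rest:
        # Try longer suffixes first so a short suffix cannot shadow a long one.
        match = next(
            (s for s in sorted(suffixes, key=len, reverse=True) if rest.startswith(s)),
            None,
        )
        if match is None:
            return None  # some part of the word is not covered by the suffix list
        pieces.append(match)
        rest = rest[len(match):]
    return pieces


def gloss(pieces):
    """Task 2: map each morpheme onto its feature label."""
    return [MORPHEME_FEATURES[p] for p in pieces]


pieces = segment("Allkütulekefun", "allkütu", ["le", "ke", "fu", "n"])
print(pieces)         # ['allkütu', 'le', 'ke', 'fu', 'n']
print(gloss(pieces))  # ['Listen', 'prog.', 'habitual', 'past', 'indic.1sg']
```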
Leverage the Natural Structure of Morphology
• Paradigm
  – Set of affixes that interchangeably attach to a set of stems
  – English Example
    • Regular Verbs: Ø.s.ing.ed
    • Regular Adj: Ø.er.est
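As a small illustration of the dot notation, a paradigm can be read as a list of suffixes that each stem takes in turn, with Ø standing for the empty suffix. The helper `expand` and the stem "tall" are assumptions made for this example only.

```python
def expand(stem, paradigm):
    """Return the surface forms a stem takes under a dot-separated paradigm."""
    return [stem + ("" if suffix == "Ø" else suffix) for suffix in paradigm.split(".")]


REGULAR_VERBS = "Ø.s.ing.ed"
REGULAR_ADJ = "Ø.er.est"

print(expand("roam", REGULAR_VERBS))  # ['roam', 'roams', 'roaming', 'roamed']
print(expand("tall", REGULAR_ADJ))    # ['tall', 'taller', 'tallest']
```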
Example Vocabulary
blame blamed blames roamed roaming roams solve solves solving
Schemes found in the Example Vocabulary (suffixes: stems)

Ø: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving
Ø.s: blame, solve
Ø.d: blame
Ø.s.d: blame
s: blame, roam, solve
d: blame, roame
s.d: blame
e: blam, solv
es: blam, solv
ed: blam, roam
e.es: blam, solv
e.ed: blam
es.ed: blam
e.es.ed: blam
me: bla
mes: bla
med: bla, roa
me.mes: bla
me.med: bla
mes.med: bla
me.mes.med: bla
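The schemes listed for the example vocabulary can be reproduced with a few lines of code. This is a minimal sketch under two assumptions: every non-empty prefix of a word is a candidate stem, and Ø is represented by the empty string. The names `suffix_index` and `scheme_stems` are illustrative and not taken from the slides.

```python
from collections import defaultdict

VOCAB = ["blame", "blamed", "blames", "roamed", "roaming",
         "roams", "solve", "solves", "solving"]


def suffix_index(vocab):
    """Map every candidate suffix to the set of stems it attaches to."""
    index = defaultdict(set)
    for word in vocab:
        # Cut after each character; the last cut leaves the empty suffix (Ø).
        for cut in range(1, len(word) + 1):
            index[word[cut:]].add(word[:cut])
    return index


def scheme_stems(index, suffixes):
    """Stems that occur in the vocabulary with every suffix of the scheme."""
    stem_sets = [index.get(s, set()) for s in suffixes]
    return set.intersection(*stem_sets) if stem_sets else set()


index = suffix_index(VOCAB)
for scheme in (["", "s"], ["", "s", "d"], ["s"], ["e", "es"]):
    label = ".".join(s or "Ø" for s in scheme)
    print(label, sorted(scheme_stems(index, scheme)))
# Ø.s ['blame', 'solve']
# Ø.s.d ['blame']
# s ['blame', 'roam', 'solve']
# e.es ['blam', 'solv']
```

Because a stem must occur with all suffixes of a scheme, adding a suffix can only keep or shrink the stem set, which is why the larger schemes above have fewer stems.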
Spanish Newswire Corpus: 40,011 Tokens, 6,975 Types

| Suffixes | Stem Type Count | Stems |
|---|---|---|
| a.as.o.os.tro | 1 | cas |
| a.as.o.os | 43 | african, cas, jurídic, l, ... |
| a.as.o | 59 | cas, citad, jurídic, l, ... |
| a.as.os | 50 | afectad, cas, jurídic, l, ... |
| a.o.os | 105 | impuest, indonesi, italian, jurídic, ... |
| as.o.os | 54 | cas, implicad, jurídic, l, ... |
| a.as | 199 | huelg, incluid, industri, inundad, ... |
| a.o | 214 | id, indi, indonesi, inmediat, ... |
| a.os | 134 | impedid, impuest, indonesi, inundad, ... |
| as.o | 85 | intern, jurídic, just, l, ... |
| as.os | 68 | cas, implicad, inundad, jurídic, ... |
| o.os | 268 | human, implicad, indici, indocumentad, ... |
| a.tro | 2 | cas, cen |
| a | 1237 | huelg, ib, id, iglesi, ... |
| as | 404 | huelg, huelguist, incluid, industri, ... |
| o | 1139 | hub, hug, human, huyend, ... |
| os | 534 | humorístic, human, hígad, impedid, ... |
| tro | 16 | catas, ce, cen, cua, ... |
Each scheme in the network shows:
• Suffixes (e.g. a.as.o.os)
• Stem Type Count (e.g. 43)
• Stems (e.g. african, cas, jurídic, l, ...)

Level 5 = 5 suffixes
• a.as.o.os (43 stems: african, cas, jurídic, l, ...): Adjective Inflection Class
• a.as.o.os.tro (1 stem), a.tro (2 stems), tro (16 stems): From the spurious suffix “tro”
Basic Search Procedure
[Figure: the Spanish scheme network, annotated with two arrows labeled “Increasing Suffix Count” and “Decreasing Stem Count”]
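The transcript names the search procedure but does not spell out its steps. One way to move through the network in the direction of increasing suffix count is a greedy climb: start from a single-suffix scheme and keep adding whichever suffix retains the most stems. The sketch below is an illustrative assumption and not the authors' procedure; the function `climb`, the stopping threshold `min_ratio`, and the reuse of the English example vocabulary are all hypothetical.

```python
from collections import defaultdict

VOCAB = ["blame", "blamed", "blames", "roamed", "roaming",
         "roams", "solve", "solves", "solving"]


def suffix_index(vocab):
    """Map every candidate suffix ("" stands for Ø) to the stems it attaches to."""
    index = defaultdict(set)
    for word in vocab:
        for cut in range(1, len(word) + 1):
            index[word[cut:]].add(word[:cut])
    return index


def climb(index, seed, min_ratio=0.5):
    """Greedy upward walk (hypothetical): add the suffix that keeps the most stems.

    Stops when no suffix keeps any stem, or when the best candidate would keep
    fewer than min_ratio of the current stems.
    """
    suffixes = {seed}
    stems = set(index[seed])
    while True:
        best, best_stems = None, set()
        for candidate, cand_stems in index.items():
            if candidate in suffixes:
                continue
            kept = stems & cand_stems
            if len(kept) > len(best_stems):
                best, best_stems = candidate, kept
        if best is None or len(best_stems) < min_ratio * len(stems):
            return suffixes, stems
        suffixes.add(best)
        stems = best_stems


index = suffix_index(VOCAB)
suffixes, stems = climb(index, "s", min_ratio=0.6)
print(sorted(suffixes), sorted(stems))  # ['', 's'] ['blame', 'solve']
```

With the looser default `min_ratio=0.5`, the same climb continues one level further, to Ø.s.d with the single stem blame. The trade-off between more suffixes and fewer stems is visible even on nine words.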
Scaling Up
• Scaling Up
  – 1 Million word corpus
  – Network built on demand
• New Approach to Search
  – High Recall initial search
  – Weed the results to improve precision
• Results
  – Boost Recall of Suffixes in Spanish
    • from 0.5 to 0.8
  – But very low precision currently
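For reference, recall and precision of a proposed suffix set against an answer key can be computed as follows. The gold and found sets are toy values chosen for illustration, not the evaluation data behind the reported figures; they show how a permissive search can reach good recall while precision stays low.

```python
def precision_recall(found, gold):
    """Precision and recall of a proposed suffix set against a gold set."""
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): 4 gold suffixes, 8 proposed, 3 of them correct.
gold = {"a", "as", "o", "os"}
found = {"a", "as", "o", "tro", "me", "mes", "med", "l"}

precision, recall = precision_recall(found, gold)
print(precision, recall)  # 0.375 0.75
```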
Top Examples of Selected Schemes
1 Million Words of Spanish

| Suffixes | # of Suffixes | Part of Speech |
|---|---|---|
| Ø.s | 2 | Noun |
| a.as.o.os | 4 | Adjective |
| Ø.ba.ban.da.das.do.dos.n.ndo.r.ron.rse.rá.rán.ría.rían | 16 | Verb (-ar) |
| Ø.es | 2 | Noun |
| a.aba.aban.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.e.en.ó | 18 | Verb (-ar) |
| Ø.a.emos.on.se.á.án.ía.ían | 9 | Verb (-ar/-er/-ir) |
| ones.ón | 2 | Nominalization |
| l.les | 2 | Noun |
Next Steps for Morphology Induction
• Clean the Selected Schemes
  – Current Work
• Convert Paradigms into a Segmenter
  – Soon
• Agglutinative sequences of suffixes
  – Soon
• Learn Mappings from Morphemes to Features
  – Future Goal