Online Linguistic Database

Joel Dunham

Sunday, April 28, 2013

Context

• The tasks of documenting and analyzing endangered languages are urgent

• These are hard tasks

• ... but there is an intuition that the “grunt” work could be accomplished more quickly

Online Linguistic Database (OLD)

• Software for creating web applications that facilitate language documentation and linguistic analysis

• www.onlinelinguisticdatabase.org

OLD overview

• a program for creating web applications

• collaboratively created language databases

• open source – https://github.com/jrwdunham/old

• good documentation

• platform-agnostic (Mac, Linux, Windows)

• Python (Pylons), MySQL (SQLAlchemy), JavaScript (CoffeeScript, Backbone), HTML5

OLD: a program for creating web applications ...

onlinelinguisticdatabase.org

• BLA OLD: bla.onlinelinguisticdatabase.org

• OKA OLD: oka.onlinelinguisticdatabase.org

• GIT OLD: git.onlinelinguisticdatabase.org

• NTK OLD: ntk.onlinelinguisticdatabase.org

• KUT OLD: kut.onlinelinguisticdatabase.org

• CRD OLD: crd.onlinelinguisticdatabase.org

• CRK OLD: crk.onlinelinguisticdatabase.org

• KWK OLD: kwk.onlinelinguisticdatabase.org

Aside on versions

OLD 0.2.7

• in production: Blackfoot, Okanagan, Gitxsan, Nata, etc.

• Python server-side logic

• HTML/JavaScript GUI

OLD 1.0

• under development

• RESTful web service (Python)

• HTML5/CoffeeScript GUI

multi-user, concurrent collaboration

• Imagine a field methods class producing a valuable resource of structured and formatted endangered language data simply as a byproduct of their normal workflow

multi-user, concurrent collaboration

• a typical field methods class generates at least 1,000 hours of recordings plus transcriptions & analysis over the course of the 5 years of its impact

structure, presentation, exploration

[Diagram: Files, Forms, a Collection and Corpora]

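structure, presentation, exploration

Lexical forms (transcription / morpheme break / gloss / translation / category):

• chien / chien / dog / ‘dog’ / N

• s / s / PL / ‘plural’ / Phi

• le / le / the / ‘the’ / D

A phrasal form, one field per line (transcription, morpheme break, gloss, translation, syntactic category, syntactic category string, break-gloss-category):

les chiens
le-s chien-s
the-PL dog-PL
‘the dogs’
DP
D-Phi N-Phi
le|the|D-s|PL|Phi chien|dog|N-s|PL|Phi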
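The last two values of the ‘les chiens’ form are the kind of thing a system can derive from the morpheme break and gloss lines plus a lexicon. The following is a minimal sketch of that derivation, not the OLD's actual code: the function names, the dictionary-based lexicon and the "?" fallback for unknown morphemes are all assumptions made for illustration.

```python
# Toy lexicon keyed on (morpheme shape, gloss); values are categories.
# The entries come from the slide; the data structure is an assumption.
LEXICON = {
    ("chien", "dog"): "N",
    ("s", "PL"): "Phi",
    ("le", "the"): "D",
}


def analyze_words(morpheme_break, gloss, lexicon=LEXICON, unknown="?"):
    """Return, for each word, a list of (shape, gloss, category) triples."""
    words = []
    for mb_word, gl_word in zip(morpheme_break.split(), gloss.split()):
        triples = []
        for shape, gl in zip(mb_word.split("-"), gl_word.split("-")):
            triples.append((shape, gl, lexicon.get((shape, gl), unknown)))
        words.append(triples)
    return words


def category_string(morpheme_break, gloss):
    """Join the categories of each word with '-', and words with spaces."""
    words = analyze_words(morpheme_break, gloss)
    return " ".join("-".join(cat for _, _, cat in word) for word in words)


def break_gloss_category(morpheme_break, gloss):
    """Join shape|gloss|category triples with '-', and words with spaces."""
    words = analyze_words(morpheme_break, gloss)
    return " ".join("-".join("|".join(t) for t in word) for word in words)


print(category_string("le-s chien-s", "the-PL dog-PL"))
print(break_gloss_category("le-s chien-s", "the-PL dog-PL"))
```

Keying the lexicon on shape and gloss together is what keeps homophonous morphemes apart in the generated strings.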

input validation

search

• an unlimited number of filter expressions composed via boolean operators into a tree structure

• regular expressions

search

[Diagram: a tree of filter expressions joined by “and” and “or” nodes. The filter expressions are:]

• there are 3 words in the transcription

• the morpheme break field contains the morpheme iksi

• none of the translations contains the string Bark or bark

• it was entered earlier than Apr 24, 2013 at 4:57 p.m.

• it was modified after Apr 24, 2013 at 4:57 p.m.

• it was not elicited on the first or the third day of 2012

• it was entered by a user with an odd-numbered ID

• it has an ID greater than 10,000

• it is not tagged as pseudo-data
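To make the idea of a filter tree concrete, here is a small sketch of how boolean-composed filters could be evaluated against records. It is only an illustration under stated assumptions: the nested-list query shape, the relation names, the attribute names and the two sample records are all invented here and should not be read as the OLD's query language or data.

```python
import re

# Leaf relations. These names are illustrative, not the OLD's vocabulary.
RELATIONS = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "regex": lambda a, b: re.search(b, a) is not None,
}


def matches(record, node):
    """Evaluate a filter tree against one record (a plain dict)."""
    op = node[0]
    if op == "and":
        return all(matches(record, child) for child in node[1])
    if op == "or":
        return any(matches(record, child) for child in node[1])
    if op == "not":
        return not matches(record, node[1])
    # Anything else is a leaf: [attribute, relation, value].
    attribute, relation, value = node
    return RELATIONS[relation](record[attribute], value)


# Three conditions joined by "and"; the last is itself an "or" subtree.
query = ["and", [
    ["transcription", "regex", r"^\S+( \S+){2}$"],   # exactly three words
    ["not", ["translation", "regex", r"[Bb]ark"]],   # no Bark / bark
    ["or", [["id", ">", 10000], ["enterer_id", "=", 7]]],
]]

# Made-up records, for illustration only.
forms = [
    {"id": 10001, "enterer_id": 2,
     "transcription": "le chien court", "translation": "the dog runs"},
    {"id": 5, "enterer_id": 2,
     "transcription": "le chien aboie", "translation": "the dog barks"},
]

hits = [form["id"] for form in forms if matches(form, query)]
print(hits)
```

Because every operator node simply holds more nodes, the tree can be nested to any depth without changing the evaluator.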

category search

• search for high-level morpho-syntactic patterns via the system-generated syntactic category string value

• E.g., find DPs with one or more adjectival modifiers:

• D (A )+N

• the big dog

• the big bad cat
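As a rough illustration of category search, the pattern can be run as an ordinary regular expression over category strings. In this sketch the pattern is written so that each A carries its own trailing space; the category strings assigned to the three English phrases are assumptions, since the slide only lists the phrases.

```python
import re

# One D, one or more "A " units, then N, matched on whole category labels.
DP_WITH_ADJECTIVES = re.compile(r"\bD (A )+N\b")

# Assumed category strings for the example phrases.
category_strings = {
    "the dog": "D N",
    "the big dog": "D A N",
    "the big bad cat": "D A A N",
}

hits = [phrase for phrase, cats in category_strings.items()
        if DP_WITH_ADJECTIVES.search(cats)]
print(hits)
```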

unambiguous morpheme search

• get exactly the morpheme you are looking for via the system-generated break gloss category value

• E.g., search for s|PL to get exactly the morpheme /s/ glossed as “PL” and not s|PRES or i|PL
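One way to picture this search is as a delimiter-aware pattern over break-gloss-category strings. The sketch below assumes morphemes are separated by hyphens or spaces and written as shape|gloss|category; the helper name and the second and third sample values are invented for illustration.

```python
import re


def morpheme_pattern(shape, gloss):
    """Build a regex matching exactly one shape|gloss morpheme."""
    # The morpheme must start at the string start or after a delimiter,
    # and its shape|gloss pair must be followed by "|category".
    return re.compile(
        r"(?:^|[ -])" + re.escape(shape) + r"\|" + re.escape(gloss) + r"\|"
    )


# The first value is from the slides; the other two are made up.
values = [
    "le|the|D-s|PL|Phi chien|dog|N-s|PL|Phi",
    "walk|walk|V-s|PRES|Agr",
    "alumn|alumnus|N-i|PL|Phi",
]

pattern = morpheme_pattern("s", "PL")
hits = [value for value in values if pattern.search(value)]
print(hits)
```

Anchoring on the delimiters is what stops the search from matching the tail of a longer morpheme shape.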

phrase structure search

• supply your forms with syntactic representations via the syntax field:

(TP
  (DP
    (D le-s)
    (N chien-s))
  (VP
    (V courr-aient)))

phrase structure search

• give me all forms where a TP immediately dominates a DP and a VP:

TP < DP < VP
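The query reads as "TP immediately dominates a DP, and TP immediately dominates a VP". A minimal sketch of that check over a bracketed tree follows; the tiny parser and the function names are assumptions for illustration and say nothing about how the OLD itself evaluates such queries.

```python
def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def read():
        nonlocal position
        assert tokens[position] == "("
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read())
            else:
                children.append(tokens[position])  # a leaf word
                position += 1
        position += 1  # consume ")"
        return (label, children)

    return read()


def subtrees(tree):
    """Yield the tree and every subtree below it."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def immediately_dominates(tree, parent, *children):
    """True if some `parent` node has all of `children` as direct daughters."""
    for label, kids in subtrees(tree):
        if label == parent:
            daughters = {kid[0] for kid in kids if isinstance(kid, tuple)}
            if all(child in daughters for child in children):
                return True
    return False


tree = parse_tree("(TP (DP (D le-s) (N chien-s)) (VP (V courr-aient)))")
print(immediately_dominates(tree, "TP", "DP", "VP"))
```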

phonological models

• Imagine you could specify phonological mappings in a familiar notation and use those specifications to implement phonological generators and parsers

phonological models

• Applications:

• enter morpheme segmentations and have phonological models generate orthographic and phonetic transcriptions automatically

• incorporate phonological models into morphophonological parsers that output morpheme segmentations and glosses when given transcriptions as input

• specify competing phonologies, test them against large datasets and compare them

phonological models

• Finite State Transducers – one way to do it

• context-sensitive phonological rewrite rules (cf. Chomsky & Halle, 1968) actually describe regular relations (Johnson 1972) and these can be represented by finite-state transducers (FSTs) (cf. Karttunen & Beesley, 2001)

phonological models

[Diagram: nested language classes, with regular inside context-free inside context-sensitive]

phonological models

• a phonology represented as ordered context-sensitive rewrite rules can be implemented as an FST

• FSTs are computationally tractable

• FSTs work equally well for parsing and generation

• OLD uses foma (open source, https://code.google.com/p/foma/) to compile phonologies to FSTs so we can put them to good use

phonological models

Blackfoot phonological rules (Frantz 1997)

phonological models

Frantz’s Blackfoot phonology as a Foma script

phonological models

Phonology:

Ø -> s | I _ t
I -> Ø
w -> Ø | # _
...

generate: /waanIt/ → [aanist]

phonological models

Phonology:

Ø -> s | I _ t
I -> Ø
w -> Ø | # _
...

parse: [aanist] → /waanIt/, /waanist/, /aanIt/, ...
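The generate and parse directions can be imitated with a toy set of ordered rewrite rules. This is a sketch only: it uses Python regular-expression substitutions rather than a compiled FST, and it is not foma. The extracted slide text shows the second rule as I -> Ø; the sketch instead assumes I -> i so that /waanIt/ surfaces as [aanist]. Parsing here is a naive filter over a supplied candidate list, whereas an FST obtains the underlying forms by running the same transducer in reverse.

```python
import re

# Ordered rules as (pattern, replacement) pairs.
RULES = [
    (r"I(?=t)", "Is"),  # Ø -> s | I _ t  (insert s between I and t)
    (r"I", "i"),        # I -> i          (ASSUMED; see the note above)
    (r"^w", ""),        # w -> Ø | # _    (delete word-initial w)
]


def generate(underlying):
    """Apply every rule once, in order, to an underlying form."""
    form = underlying
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


def parse(surface, candidates):
    """Keep the candidate underlying forms that surface as `surface`."""
    return [c for c in candidates if generate(c) == surface]


print(generate("waanIt"))
print(parse("aanist", ["waanIt", "waanist", "aanIt", "aanit"]))
```

One surface form maps back to several underlying forms, which is why a parser needs a morphology and a ranker to narrow the candidates down.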

phonological models

• “But I work in OT, autosegmental representations, etc. – why should I rewrite my phonology as ordered rewrite rules?”

• practical benefits of implementation in the OLD

• probably not a bad idea to try to capture your generalizations in a variety of frameworks

morphological models

• Imagine the database could auto-generate a morphology that accepted sequences of morphemes that constitute valid words and rejected those that do not

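morphological models

Lexicon Corpus (transcription / morpheme break / gloss / translation / category):

• chien / chien / dog / ‘dog’ / N

• s / s / PL / ‘plural’ / Phi

• le / le / the / ‘the’ / D

• la / la / the / ‘the’ / D

• chat / chat / cat / ‘cat’ / N

Words Corpus (transcription / morpheme break / gloss / translation / category string):

• les chiens / le-s chien-s / the-PL dog-PL / ‘the dogs’ / D-Phi N-Phi

• le chat / le chat / the cat / ‘the cat’ / D N

Morphology:

D -> [ le, la ]
N -> [ chien, chat ]
Phi -> [s]
word -> [ N | D | N-Phi | D-Phi ]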

morphological models

Morphology:

D -> [ le, la ]
N -> [ chien, chat ]
Phi -> [s]
word -> [ N | D | N-Phi | D-Phi ]

‘chien-s’ → ( (‘chien’, ‘dog’), ‘-’, (‘s’, ‘PL’))

morphological models

Morphology:

D -> [ le, la ]
N -> [ chien, chat ]
Phi -> [s]
word -> [ N | D | N-Phi | D-Phi ]

‘s-chien’ → Ø
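The step from the two corpora to an accepting and rejecting morphology can be sketched as follows. This is an illustration under assumptions, not the OLD's implementation: word templates are read off the category strings of the words corpus, lexical entries are grouped by category, and a word is accepted if its morphemes fit some template. All names are hypothetical, and the analyses omit the ‘-’ delimiter elements shown on the slides.

```python
from itertools import product

# (shape, gloss, category) entries, as in the lexicon corpus on the slides.
LEXICON = [
    ("chien", "dog", "N"), ("s", "PL", "Phi"), ("le", "the", "D"),
    ("la", "the", "D"), ("chat", "cat", "N"),
]
# Category strings of the forms in the words corpus.
WORD_CATEGORY_STRINGS = ["D-Phi N-Phi", "D N"]


def build_morphology(lexicon, category_strings):
    """Group morphemes by category and collect attested word templates."""
    by_category = {}
    for shape, gloss, category in lexicon:
        by_category.setdefault(category, []).append((shape, gloss))
    templates = sorted({tuple(word.split("-"))
                        for string in category_strings
                        for word in string.split()})
    return by_category, templates


def analyze(word, morphology):
    """Return all analyses of a hyphen-segmented word; [] means rejected."""
    by_category, templates = morphology
    shapes = word.split("-")
    analyses = []
    for template in templates:
        if len(template) != len(shapes):
            continue
        # For each position, the lexical entries of the right category
        # whose shape matches the morpheme in that position.
        options = [[(s, g) for s, g in by_category.get(cat, []) if s == shape]
                   for shape, cat in zip(shapes, template)]
        analyses.extend(product(*options))
    return analyses


morphology = build_morphology(LEXICON, WORD_CATEGORY_STRINGS)
print(morphology[1])
print(analyze("chien-s", morphology))
print(analyze("s-chien", morphology))
```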

morphological parsers

Frantz’s Blackfoot Phonology as FST

parse: [nitsspiyi] →

...
/-N---M-i---t---sí-hsi-hsp---M-i--y-i-/
/-N---M-i---t---sí-hsi-hsp---M-i--y-i--/
/-N---M-i---t---sí-hsi-hsp---M-i--y-i---/
/-N---M-i---t---sí-hsi-hsp---M-i--yM-i/
/-N---M-i---t---sí-hsi-hsp---M-i--yM-i-/
/-N---M-i---t---sí-hsi-hsp---M-i--yM-i--/
/-N---M-i---t---sí-hsi-hsp---M-i--yM-i---/
/-N---M-i---t---sí-hsi-hsp---M-i--yN-i/
/-N---M-i---t---sí-hsi-hsp---M-i--yN-i-/
/-N---M-i---t---sí-hsi-hsp---M-i--yN-i--/
...
/nit-ihpiyi/
...
/-N---M-i---t---si-hsspI-yy-S-i---/
/-N---M-i---t---si-hsspI-yy-I/
/-N---M-i---t---si-hsspI-yy-I-/
/-N---M-i---t---si-hsspI-yy-I--/
/-N---M-i---t---si-hsspI-yy-I---/
/-N---M-i---t---si-hsspI-yy-I-y/
/-N---M-i---t---si-hsspI-yy-I-y-/
/-N---M-i---t---si-hsspI-yy-I-y--/
/-N---M-i---t---si-hsspI-yy-i/
/-N---M-i---t---si-hsspI-yy-i-/
/-N---M-i---t---si-hsspI-yy-i--/
/-N---M-i---t---si-hsspI-yy-i---/
...

morphological parsers

Blackfoot Morphology FST (extracted from database)

recognize: /-N---M-i---t---si-hsspI-yy-I-/ → Ø

morphological parsers

Blackfoot Morphology FST (extracted from database)

recognize: /nit-ihpiyi/ → ( (‘nit’, ‘1’), ‘-’, (‘ihpiyi’, ‘dance’))

morphological parsers

Phonology FST and Morphology FST, combined into a Morpho-phonological FST

parse: nitsspiyi →

(
(('nit', '1'), '-', ('ihpiyi', 'dance')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', '0')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', '3PL')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', '3pl')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', '4PL')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', 'be')),
(('nit', '1'), '-', ('sspi', 'among'), '-', ('yi', 'have')),
(('n', '1'), '-', ('it', 'LOC'), '-', ('ihpiyi', 'dance')),
(('n', '1'), '-', ('it', 'loc'), '-', ('ihpiyi', 'dance'))
)

morphological parsers

Phonology FST and Morphology FST, combined into a Morpho-phonological FST

Language model corpus → Ranker

parse: nitsspiyi → (('nit', '1'), '-', ('ihpiyi', 'dance'))
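The slides do not say how the ranker scores candidate parses. As a stand-in, the sketch below ranks candidates with add-one-smoothed unigram counts taken from a language model corpus. The corpus contents are made-up toy counts that reuse only morphemes appearing on the slides, and parses are simplified to tuples of (shape, gloss) pairs without the ‘-’ delimiters.

```python
import math
from collections import Counter

# Toy language model corpus of already-analysed words (invented counts).
corpus = [
    (("nit", "1"), ("ihpiyi", "dance")),
    (("nit", "1"), ("ihpiyi", "dance")),
    (("n", "1"), ("it", "LOC"), ("ihpiyi", "dance")),
]

counts = Counter(morpheme for word in corpus for morpheme in word)
total = sum(counts.values())
vocabulary = len(counts) + 1  # one extra slot for unseen morphemes


def score(parse):
    """Sum of smoothed log-probabilities of the morphemes in a parse."""
    return sum(math.log((counts[m] + 1) / (total + vocabulary))
               for m in parse)


# A few of the candidate parses of nitsspiyi from the slides.
candidates = [
    (("nit", "1"), ("sspi", "among"), ("yi", "be")),
    (("n", "1"), ("it", "LOC"), ("ihpiyi", "dance")),
    (("nit", "1"), ("ihpiyi", "dance")),
]

best = max(candidates, key=score)
print(best)
```

A unigram model favours short parses built from frequent morphemes; a real ranker might well condition on context instead.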

morphological parsers

• user writes a phonology

• user assembles three corpora: lexicon, words and language model

→ Morpho-phonological parser

morphological parsers

[Diagram: multiple parsers]

repurposing data

• imagine our collaboratively created repository of language data could be used to advance the goals of those involved in language revitalization and teaching

• imagine a linguistics field methods class being able to immediately give back to the speech community

repurposing data: OLD architecture

OLD RESTful web service, reached by each client over HTTP with JSON:

• Browser App: GUI for fieldwork & research

• Mobile App: talking dictionary

• Mobile App: learning games

• > App: get data quickly & flexibly
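Any client in this architecture only needs to speak HTTP and JSON. The sketch below builds, without sending, a search request that such a client might issue. The base URL, the /forms/search path and the shape of the JSON body are assumptions made for illustration; consult the OLD documentation for the real interface.

```python
import json
from urllib.request import Request


def build_search_request(base_url, filter_tree):
    """Build (but do not send) a JSON search request for a web service."""
    # Hypothetical body shape and endpoint path.
    body = json.dumps({"query": {"filter": filter_tree}}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/forms/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "https://example.org/old",  # placeholder host, not a real OLD instance
    ["Form", "transcription", "regex", "^chien"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending would be: urllib.request.urlopen(request), then json.loads()
# on the response body.
```

Keeping the service behind a plain HTTP/JSON interface is what lets a browser GUI, a mobile dictionary and a script share the same data.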

repurposing data:

• Form: chien ‘dog’

• File (audio): recording of speaker saying chien

• File (image): image of a dog

Conclusions

• The OLD is an interesting and useful application for language documentation, linguistic analysis and (potentially) revitalization efforts

Conclusions

• There is useful work to be done in the intersection of theoretical linguistics, language documentation, computational linguistics, language revitalization and software development

Conclusions

• Use the OLD for your own fieldwork

• Use the OLD in your next field methods class

• Contribute to the OLD

• contribute to the code of the core web service app

• write a user-facing app that makes use of an OLD web service!

• use it, critique it, read the docs, critique them, ...

Works cited

• Chomsky, N. and Halle, M. 1968. The Sound Pattern of English. Harper and Row, New York.

• Johnson, C. D. 1972. Formal Aspects of Phonological Description. Mouton, The Hague.

• Karttunen, L. and Beesley, K. R. 2001. A Short History of Two-Level Morphology. In ESSLLI-2001.

Thank you

• NWLC organizers & presenters

• brilliant & hard-working field linguists

• SSHRC ITST grant no. 849-2009-0056

OLD

• www.onlinelinguisticdatabase.org
