Nov 21, 2005 University of Texas at Austin
The E-MELD Project
Helen Aristar Dry & Anthony Aristar
The LINGUIST List
Eastern Michigan U & Wayne State U
E-MELD: Electronic Metastructure for Endangered Languages Documentation
5-year NSF project, 2001-2006
Partners: LINGUIST List, ELF, LDC
Goal: to aid in
• the preservation of endangered languages data
• the development of infrastructure for electronic archives
Summary of the problem (2001):
EL (endangered language) resources were, and still are:
• Difficult to find
• Difficult to use
• Difficult to preserve
Needed:
• More uniformity in naming, cataloguing, and annotating, i.e., interoperable standards
• More knowledge of how to create digital resources that last
Problems with EL resources
Difficult to find:
• Resources are at distributed sites
• Language names are ambiguous
• No central catalog of resources or cataloguing information (metadata)
• Lack of interoperability among archives
Difficult to display accurately:
• Idiosyncratic character encoding
• Specific fonts needed
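To illustrate the display problem: legacy data often used ordinary keyboard characters as stand-ins for phonetic symbols, and the text displayed correctly only when one particular font was installed. A minimal Python sketch of converting such data to Unicode, assuming a hypothetical legacy font in which `@` was displayed as ɔ and `$` as ŋ. The mapping `LEGACY_TO_UNICODE` and the function `convert_legacy` are illustrative only, not part of any E-MELD tool.

```python
import unicodedata

# Hypothetical legacy font: these ASCII characters were typed in place of
# IPA letters and only looked right when that specific font was installed.
LEGACY_TO_UNICODE = {
    "@": "\u0254",  # LATIN SMALL LETTER OPEN O
    "$": "\u014b",  # LATIN SMALL LETTER ENG
}

def convert_legacy(text: str) -> str:
    """Replace font-dependent stand-ins with standard Unicode characters,
    then apply NFC normalization so that a precomposed letter and the
    equivalent base-plus-combining-mark sequence end up identical."""
    converted = "".join(LEGACY_TO_UNICODE.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", converted)

word = convert_legacy("k@$")
print(word)  # kɔŋ
print([unicodedata.name(ch) for ch in word])
# ['LATIN SMALL LETTER K', 'LATIN SMALL LETTER OPEN O', 'LATIN SMALL LETTER ENG']
```

Once converted, each character has a standard code point and name, so the text no longer depends on a particular font to be read correctly. A real conversion table would have to be built from the documentation of the specific legacy font.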
Problems with EL resources, 2
Difficult to compare:
• Non-standard terminology
• Idiosyncratic markup and annotation schemes
Difficult to manipulate or reuse:
• Specific software needed (including a specific software version), e.g. MS Word 1.0
• Meaning represented via formatting that was not documented, e.g. bold represents "headword"
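The alternative to meaning-by-formatting is markup that names the role of each field explicitly. A minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names `entry`, `headword`, `pos` and `gloss`, the function `make_entry`, and the placeholder values are hypothetical, not an E-MELD schema.

```python
import xml.etree.ElementTree as ET

def make_entry(headword: str, pos: str, gloss: str) -> ET.Element:
    """Build a lexical entry in which each field's role is stated by its
    element name instead of being implied by bold or italic type."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    ET.SubElement(entry, "gloss").text = gloss
    return entry

entry = make_entry("HEADWORD", "n", "GLOSS")  # placeholder values
print(ET.tostring(entry, encoding="unicode"))
# <entry><headword>HEADWORD</headword><pos>n</pos><gloss>GLOSS</gloss></entry>

# Any program can now recover the headword without knowing how it was typeset.
print(entry.find("headword").text)  # HEADWORD
```

Formatting can still be applied at display time (for example, with a stylesheet that renders `headword` in bold), but the meaning no longer depends on it.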
Problems with EL resources, 3
Impermanent, vulnerable to:
• Deterioration of the physical media
• Hardware obsolescence
• Software obsolescence
Phonogrammarchiv, Austrian Academy of Sciences
(slide from Dietrich Schüller, Director)
Toward a Solution: E-MELD Components
• Involve the linguistics community in developing standards
• Promote consensus about: language identification, metadata, annotation and markup
• Teach and facilitate implementation of "best practices" in the creation of digital language documentation
Promoting consensus: annual workshops
2001, Santa Barbara, CA: The Need for Standards
E-MELD 2002, Ann Arbor, MI: Digitizing Lexical Information
E-MELD 2003, Lansing, MI: Digitizing Texts
E-MELD 2004, Detroit, MI: Databases and Best Practice
E-MELD 2005, Cambridge, MA: Linguistic Ontologies & Terminology
2006 E-MELD Workshop on Digital Language Documentation
Michigan State University, June 20-22, 2006, in conjunction with the 2006 Summer Meeting of the Linguistic Society of America
Topic: Electronic Archiving and Digital Tools: Current State & Future Directions
Please come!
Finding resources: metadata
• OLAC metadata standards (OLAC is a subcommunity of OAI, the Open Archives Initiative)
• OLAC search engine on the LL site: http://linguistlist.org/olac
• OLAC metadata editor on the LL site: http://linguistlist.org/olac/ore
• XSL stylesheets for transformation and presentation of OLAC metadata
• Ethnologue/LL language codes proposed as an ISO standard
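Language codes address the ambiguity of language names: a catalog search can map every known name variant to a single code and then match records on the code rather than on free-text names. A minimal Python sketch; the name index, the catalog records, and `find_by_language` are hypothetical, while `nav` (Navajo) and `pot` (Potawatomi) are the ISO 639-3 codes for those languages.

```python
# Hypothetical name index: several names for one language share one code.
NAME_TO_CODE = {
    "navajo": "nav",
    "navaho": "nav",
    "diné bizaad": "nav",
    "potawatomi": "pot",
}

# Hypothetical catalog records; only the language codes are real.
RECORDS = [
    {"title": "Wordlist A", "language": "nav"},
    {"title": "Narrative B", "language": "pot"},
    {"title": "Recording C", "language": "nav"},
]

def find_by_language(name: str) -> list[str]:
    """Resolve a language name to its code, then return matching titles."""
    code = NAME_TO_CODE.get(name.strip().lower())
    if code is None:
        return []
    return [r["title"] for r in RECORDS if r["language"] == code]

print(find_by_language("Navaho"))  # ['Wordlist A', 'Recording C']
```

Searching for "Navaho" finds records catalogued under "Navajo" because both resolve to the same code; a name-only search would miss them.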
Using resources: comparing and finding annotation
Ontologies developed (as an interlingua between markup schemes and as search aids):
• GOLD: General Ontology for Linguistic Description (morphosyntax)
• OPF: Ontology of Phonetic Features (based on Ladefoged & Maddieson)
• ODIN Project: mining interlinear glossed text on the web (Will Lewis et al.)
Using resources: Tools
Tools to encourage use of the ontology:
• OntoElan: text annotation (modification of MPI’s Elan)
• OntoGloss: stand-off annotation tool
• FIELD: lexical input
Tool to encourage use of Unicode:
• CharWrite: input of Unicode characters
Facilities to encourage use of OLAC metadata:
• Stylesheet library
• ORE
Facilitating ‘Best Practices’ in resource creation
Creation of a reference website: School of Best Practices in Digital Language Documentation, http://emeld.org/school/
Addressed to the individual linguist who creates language documentation
What should the linguist do?
To ensure that digital data endure long into the future:
1. Create an archival copy: Put the materials into an enduring file format.
2. Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.
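Each migration to new media is a chance for silent corruption. One common archival safeguard, assumed here rather than taken from the School's materials, is to record a checksum for every file and compare it after each copy. A minimal Python sketch using the standard library's `hashlib`; the function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks so that large
    audio or video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_is_intact(original: Path, migrated: Path) -> bool:
    """True if the migrated copy is bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(migrated)

# Usage with throwaway files standing in for an archival copy and its migration.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master.wav"            # hypothetical file names
    copy = Path(tmp) / "copy_on_new_media.wav"
    master.write_bytes(b"example audio bytes")
    copy.write_bytes(master.read_bytes())
    intact = copy_is_intact(master, copy)
    copy.write_bytes(b"example audio byteZ")     # simulate silent corruption
    corrupted_detected = not copy_is_intact(master, copy)

print(intact, corrupted_detected)  # True True
```

A checksum only confirms that bytes were copied unchanged. Converting to an enduring file format (step 1) necessarily changes the bytes, so under this approach the checksum would be recorded for the archival copy after conversion and re-checked at every later migration.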
Organization of the School
• Entrance Hall: orientation
• Classroom: lessons & tutorials
• Reading Room: bibliography
• Work Room: online work
• Tool Room: links to tools
• Help (incl. Ask an Expert)
• Case Studies: documentation of 10 ELs digitized according to best practices
Currently the School has:
Documentation from 12 ELs: Mocovi, Kayardild, Monguor, Potawatomi, Tofa, Ega, Saliba, Navajo, Biao Mien, W. Sissala, (Chorote), (Nivacle)
Current Initiatives
Identify and record metadata for legacy documentation
Improve the ontology (GOLD) – incorporate suggestions from 2005 E-MELD workshop
Finish prototyped software
Future: finish prototyped software
OntoElan: ontology-aware modification of MPI’s Elan annotation tool
OntoGloss: ontology-aware stand-off annotation tool
CharWrite: downloadable tool for web-input of Unicode characters
FIELD: Field Input Environment for Linguistic Data
All but OntoGloss are available through the School of Best Practices website
Current Initiatives: School of BP
• Make the School even more practical
• Distinguish between good, better, and best practice
• Emphasize explicit 'how-to' pages
• Different paths for different user types
• Advice from experts, e.g. "equipment on a budget" page, Ask-An-Expert
Practices in resource creation
• Good practice: ensure preservation
• Better practice: ensure long-term intelligibility ("We don't want to create another Rosetta Stone" - Whalen, 2003)
• Best practice: promote interoperability
School of Best Practices in Digital Language Documentation
http://emeld.org/school/
Future Directions
• MultiTree
• LL-MAP
What is MultiTree?
• 3-year grant
• Database of all hypothesized language relations
• Ultimately linked to a GIS database
• Interface to allow linguists to input updates
• Panel of experts to assess input
LL-MAP
• Collect geographically linked linguistic data
• Build this into a GIS system, allowing layers of information to be built into a single map
Then…
• Build tools for querying, annotating and discussing this data
• Build tools that allow new language data from linguists and anthropologists to be incorporated into this system