Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Metadata Helen Aristar Dry Eastern Michigan...
Jan 9, 2004
Symposium on Best Practice
LSA, Boston, MA
Metadata
Helen Aristar Dry
Eastern Michigan University
LINGUIST List
Outline
What is metadata?
Why use OLAC metadata?
How can you write OLAC metadata for your resources?
  Metadata in XML
  Using ORE
Preliminaries
Language documentation is valuable only if it is findable
On the Internet, this means “findable by computational means”
Efficient search and retrieval of language resources requires the use of metadata
Metadata is:
Structured data about data
Similar to catalogue information
Usually consists of a set of elements, each of which describes a property of the resource
The elements of a metadata set can be encoded in different “languages,” e.g., HTML, XML, RDF/XML
An example
Title: Biao Min Data
Creator (depositor): David Solnit
Subject (linguistic field): Language Description
Subject (language): Biao Min
Date created: April 5, 1982
Description: The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . .
Example in HTML
<meta name="DC.title" content="Biao Min Data" />
<meta name="DC.creator" content="David Solnit" />
<meta name="DC.subject" content="Language Description" />
<meta name="DC.subject" content="Biao Min" />
<meta name="DCTERMS.created" content="1982-04-05" />
<meta name="DC.description" content="The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . ." />
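The HTML encoding above can also be generated from the element/value pairs rather than typed by hand. The following is a minimal sketch in Python, not part of the original slides: the function name to_meta_tags is hypothetical, and the description is left out because the slide truncates it.

```python
from html import escape

# Element/value pairs from the Biao Min example.
# A repeated element (DC.subject) simply appears more than once.
record = [
    ("DC.title", "Biao Min Data"),
    ("DC.creator", "David Solnit"),
    ("DC.subject", "Language Description"),
    ("DC.subject", "Biao Min"),
    ("DCTERMS.created", "1982-04-05"),
]


def to_meta_tags(pairs):
    """Render (name, content) pairs as HTML <meta> tags, one per line."""
    return "\n".join(
        '<meta name="{}" content="{}" />'.format(
            escape(name, quote=True), escape(content, quote=True)
        )
        for name, content in pairs
    )


print(to_meta_tags(record))
```

Escaping the values keeps a quotation mark inside a title or description from closing the content attribute early.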
Example in XML
<title>Biao Min Data</title>
<creator xsi:type="olac:role" olac:code="depositor">David Solnit</creator>
<subject xsi:type="olac:linguistic-field" olac:code="language_description"/>
<subject xsi:type="olac:language" olac:code="x-sil-BJE">Biao Min</subject>
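For readers who would rather build such a record programmatically, here is a rough sketch using Python's standard xml.etree.ElementTree. Several things are assumptions of the sketch and do not appear on the slide: the three namespace URIs, the olac:olac wrapper element, and the helper names add_element and build_record. Every xsi:type value is written with the olac: prefix.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions of this sketch; the slide does not show them.
DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.0/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("dc", DC), ("olac", OLAC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def add_element(parent, name, text=None, olac_type=None, code=None):
    """Append one DC element, optionally qualified by an OLAC extension."""
    el = ET.SubElement(parent, "{%s}%s" % (DC, name))
    if olac_type:
        el.set("{%s}type" % XSI, "olac:" + olac_type)
    if code:
        el.set("{%s}code" % OLAC, code)
    if text:
        el.text = text
    return el


def build_record():
    """Build the Biao Min record from the slide (wrapper element assumed)."""
    root = ET.Element("{%s}olac" % OLAC)
    add_element(root, "title", "Biao Min Data")
    add_element(root, "creator", "David Solnit", "role", "depositor")
    add_element(root, "subject", None, "linguistic-field", "language_description")
    add_element(root, "subject", "Biao Min", "language", "x-sil-BJE")
    return root


xml_text = ET.tostring(build_record(), encoding="unicode")
print(xml_text)
```

The refinement lives in two attributes: xsi:type names the OLAC extension and olac:code carries the value from its controlled vocabulary, while the element text stays free-form.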
Metadata
Different metadata specifications: MARC, METS, Dublin Core, IMDI, OLAC
IMDI & OLAC designed specifically for language documentation
OLAC Metadata
Product of the Open Language Archives Community
http://www.language-archives.org/
Strengths:
Ease of creation
Search & retrieval via the protocols of the Open Archives Initiative
Open Archives Initiative
Cross-disciplinary initiative for search and retrieval of metadata from multiple archives
Establishes protocols for “harvesting” metadata records of participating archives and making them available via “Service Providers.”
Supports formation of discipline-specific sub-communities such as OLAC (Open Language Archives Community)
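To make “harvesting” concrete: under the OAI Protocol for Metadata Harvesting (OAI-PMH), a harvester sends an HTTP request naming a verb such as ListRecords and a metadataPrefix such as oai_dc, and the archive answers with XML records. The sketch below only builds such a request URL and makes no network call. The base URL is a placeholder and the function name list_records_url is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the archive.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder base URL; a real archive publishes its own.
print(list_records_url("http://archive.example.org/oai"))
# prints http://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A service provider would issue requests like this against each participating archive and index the returned records for searching.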
LINGUIST List = OLAC Gateway
LINGUIST List is the main service provider for OLAC
Harvests metadata from 27 major archives
Collects metadata from individual linguists about their language documentation
Offers search interface for over 30,000 records of language-related data
See: http://linguistlist.org/olac/
OLAC Metadata
OAI uses the Dublin Core (DC) metadata standard
  15 elements (each optional & repeatable)
  Core vocabulary for refining elements (dcterms)
Sub-communities may qualify DC metadata to suit their specific needs
OLAC has qualified DC metadata to better describe language resources.
OLAC Qualifies 5 of the 15 DC Elements
Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type
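Because every DC element is optional and repeatable, a record can be modelled as a mapping from element name to a list of values, and the only structural check needed is that each name is one of the fifteen. A minimal sketch follows. The lowercase spelling of the names and the function name unknown_elements are choices made for this illustration.

```python
# The fifteen Dublin Core element names listed above, in lowercase.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def unknown_elements(record):
    """Return the names in a record that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)


# Each element maps to a list, so repeating an element is just a longer list,
# and leaving an element out is allowed.
record = {
    "title": ["Biao Min Data"],
    "creator": ["David Solnit"],
    "subject": ["Language Description", "Biao Min"],
}

print(unknown_elements(record))               # prints []
print(unknown_elements({"author": ["x"]}))    # prints ['author']
```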
OLAC recommends 5 extensions:
Language: OLAC Language
Subject: OLAC Language, Linguistic Field
Type: Linguistic Data Type, Discourse Type
Contributor: Role
Creator: Role
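The pairing of elements and extensions above is effectively a lookup table. As an illustrative sketch, with the names OLAC_EXTENSIONS, extensions_for, and elements_using invented for the example, it could be represented like this:

```python
# Which OLAC extensions may qualify which DC element, per the list above.
OLAC_EXTENSIONS = {
    "language": ["OLAC Language"],
    "subject": ["OLAC Language", "Linguistic Field"],
    "type": ["Linguistic Data Type", "Discourse Type"],
    "contributor": ["Role"],
    "creator": ["Role"],
}


def extensions_for(element):
    """Extensions that can qualify a DC element (empty list if none)."""
    return OLAC_EXTENSIONS.get(element.lower(), [])


def elements_using(extension):
    """DC elements that a given extension applies to, sorted by name."""
    return sorted(
        element
        for element, extensions in OLAC_EXTENSIONS.items()
        if extension in extensions
    )


print(extensions_for("Subject"))   # prints ['OLAC Language', 'Linguistic Field']
print(extensions_for("Title"))     # prints []
print(elements_using("Role"))      # prints ['contributor', 'creator']
```

A metadata editor could use a table like this to offer only the relevant controlled vocabularies when a user picks an element.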
Participant Role
Provides a controlled vocabulary for identifying the role of a Creator or Contributor more precisely. The vocabulary identifies approximately twenty roles that are common in the development of language resources.
Examples: depositor, signer, transcriber, respondent, editor, consultant, researcher.
Documentation: http://www.language-archives.org/REC/role.html
Language Identification:
Provides codes for identifying all known languages, both living and extinct.
Applies to: Language, Subject
Linguistic Field
Provides codes for identifying the content of a resource as relevant to a particular subfield of linguistic science
Applies to: Subject
Examples: anthropological_linguistics, applied_linguistics, cognitive_science, computational_linguistics, lexicography, discourse_analysis
Linguistic Data Type
Describes the resource as representing a recognized structural type of linguistic information
Applies to: Type
Examples: Lexicon, Primary text, Language description, Dataset (already in DCterms)
Discourse Type
Provides a controlled vocabulary for identifying approximately ten discourse types. It is used with Type to identify the genre of a language resource (particularly a primary text).
Types: Interactive Discourse, Report, Singing, Oratory, Narrative, Formulaic Discourse, Procedural Discourse, Language Play, Unintelligible Speech
http://www.language-archives.org/REC/discourse.html
Writing metadata
See “metadata” in the E-MELD School of Best Practices: http://emeld.org/school/classroom/metadata
Or use the OLAC Repository Editor:
See: http://linguistlist.org/ore/