Representing dictionaries with the TEI
Proposal for basic guidelines
Laurent Romary - Max Planck Digital Library
With the help of Susanne Alt - CNRS
Background
• The P5 edition of the TEI guidelines
  – XML
  – ODD - Roma
    • Modules and classes
  – DTD, RelaxNG, W3C schemas
• The dictionary chapter
  – Very close to the P4 version
  – Work to be done
    • Enhancing the coherence with the class system
    • Providing more examples
    • …
Proposal for today
• Browse through the main features of the dictionary chapter
  – Identify questionable issues
  – Select best practices
• Work with Roma and implement (part of) the best practices
  – Minimal schema that dictionary projects can start with
    • Bottom-up approach to customization
• Discuss conformance
Dictionaries as TEI documents
• Same general document structure as any other TEI document
  – <teiHeader>, <text>
    • Define a common strategy concerning source identification with general text sources
    • Specific documentation of previous editions
    • Intuition that <teiCorpus> is not to be retained here
  – <front>, <body>, <back>
  – Divisions…
    • Strong case for unnumbered <div>s
    • Can we recommend/implement a basic dictionary-oriented typology?
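Example
• A minimal sketch of the overall document structure described above. It is a skeleton only (not schema-valid as it stands), and the values of 'type' and 'n' on <div> are hypothetical illustrations of a dictionary-oriented typology, not values fixed by the guidelines.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- source identification and publication details go here -->
  </teiHeader>
  <text>
    <front>
      <!-- hypothetical type value for the front matter -->
      <div type="preface"/>
    </front>
    <body>
      <!-- unnumbered <div>, one per letter of the alphabet (assumed typology) -->
      <div type="letter" n="A">
        <entry/>
        <entry/>
      </div>
    </body>
    <back>
      <div type="appendix"/>
    </back>
  </text>
</TEI>
```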
Issues
[see Wuerzburg.xml]
• Providing precise guidelines for
  – <publicationStmt>
    • Clarify the role and possible content of <publisher>
  – <sourceDesc>
    • Base the guidelines on <biblStruct> (<biblItem>?) and <listBibl>
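Example
• A possible shape for the file description, assuming that <publisher> in <publicationStmt> names the body responsible for the electronic edition while <sourceDesc> describes the printed source. This split is one reading of the issue, not a settled guideline; all bracketed content is placeholder text.

```xml
<fileDesc>
  <titleStmt>
    <title>[title of the electronic edition]</title>
  </titleStmt>
  <publicationStmt>
    <!-- assumption: publisher of the electronic edition, not of the print source -->
    <publisher>[body responsible for the electronic edition]</publisher>
    <date>[date of the electronic edition]</date>
  </publicationStmt>
  <sourceDesc>
    <listBibl>
      <!-- one <biblStruct> per printed edition used as a source -->
      <biblStruct>
        <monogr>
          <title>[title of the printed dictionary]</title>
          <imprint>
            <pubPlace>[place]</pubPlace>
            <publisher>[publisher of the printed edition]</publisher>
            <date>[year]</date>
          </imprint>
        </monogr>
      </biblStruct>
    </listBibl>
  </sourceDesc>
</fileDesc>
```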
Describing dictionary entries
• A variety of possible objects
  – <entry>, <entryFree>, <superEntry>, <dictScrap>
  – <hom>, <re>
• First issue: dealing with the editorial workflow
  – Keep <dictScrap> for ongoing tagging activity
    • Depends on the degree of structure of the dictionary
  – Stay consistent in the use of entry/entryFree/superEntry/hom
    • Strong feeling for limiting ourselves to <entry>
  – Point to the importance of <re>
    • Embedded entries
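Example
• A sketch of the two situations mentioned above: an entry still being tagged, with the unanalysed remainder kept in <dictScrap>, and a fully tagged entry with an embedded <re>. The value type="derivative" and the choice of "chaton" as related entry are assumptions made for illustration.

```xml
<!-- ongoing tagging: the not yet analysed text is kept in <dictScrap> -->
<entry>
  <form><orth>chat</orth></form>
  <dictScrap>n. m. Petit animal familier</dictScrap>
</entry>

<!-- fully tagged entry with an embedded related entry -->
<entry>
  <form><orth>chat</orth></form>
  <sense><def>Petit animal familier</def></sense>
  <re type="derivative">
    <form><orth>chaton</orth></form>
    <sense><def>Jeune chat</def></sense>
  </re>
</entry>
```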
Finding the right granularity
• The core lexical unit: <entry>
  – Should be used coherently in a dictionary project to gather up homogeneous lexical objects
• Possible combination with:
  – <superEntry> to group sets of homographs
    • Should only be used to record such a feature when it exists in legacy data
    • Should be avoided for new editorial projects
  – <hom> to subdivide senses in groups of homonyms
Example
• Recording a series of homographs with <superEntry>
<body>
  <entry/>
  <entry/>
  <superEntry>
    <entry type="hom" n="1"/>
    <entry type="hom" n="2"/>
  </superEntry>
</body>
• Issues
  – Values of ‘n’ attribute according to the source
  – Values of type defined in ‘att.entryLike’
Example
• Recording a series of homographs with <hom>
<entry>
  <hom n="1">
    <sense n="1"/>
    <sense n="2"/>
  </hom>
  <hom n="2">
    <sense n="1"/>
    <sense n="2"/>
    <sense n="3"/>
  </hom>
</entry>
• Issues
  – Weak boundary between polysemes and homonyms
  – Why not just have separate entries?
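Example
• The "separate entries" alternative could look as follows. This is a sketch: each homograph becomes a sibling <entry>, and the use of 'n' to carry the homograph number is an assumption mirroring the examples above.

```xml
<body>
  <!-- first homograph: its own entry, numbered as in the source -->
  <entry n="1">
    <form><orth><!-- headword --></orth></form>
    <sense n="1"/>
    <sense n="2"/>
  </entry>
  <!-- second homograph: same headword, separate entry -->
  <entry n="2">
    <form><orth><!-- same headword --></orth></form>
    <sense n="1"/>
    <sense n="2"/>
    <sense n="3"/>
  </entry>
</body>
```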
From word to senses…
• Background
  – Semasiological vs. onomasiological views on lexical data
    • Two complementary data organisations
    • Two sets of standards
  – In ISO: TMF (ISO 16642) vs. LMF
  – In the TEI: Terminology vs. Print dictionary chapters
The LMF Model
[Figure: LMF model diagram with the classes Lexical DB, Global Info, Lexical Entry, Form and Sense, linked by 1..1 and 0..n cardinalities]
Consequences for dictionaries
• Strong <form> to <sense> orientation
  – <form> qualifies the entry, with the identification of the headword and its morphological variations
  – <sense> is subordinated to the choice made for <form>
  – Role of grammatical information
    • Overall qualification of the entry
    • Qualification of morphological variants
• Issue
  – <re> does not necessarily fit into the theory
Example
• Basic structure of an <entry>
<entry>
  <form>
    <orth>chat</orth>
  </form>
  <sense>
    <def>Petit animal familier</def>
  </sense>
</entry>
Representing form and grammar
• General issues
  – Multiple forms
    • <orth>, <pron>, etc.
  – Compounds
    • May be represented using embedded forms
  – Role of grammar (<gramGrp>)
    • In isolation: qualifies the entry
    • Within a form: marks special features associated with the form
  – Inflexions
    • Can be represented by means of additional <form>s
Example
• A simple entry
<entry>
  <form>
    <orth>chat</orth>
    <pron>ʃa</pron>
  </form>
  <gramGrp>
    <pos>N</pos>
    <gen>m</gen>
  </gramGrp>
</entry>
Example
• Simple entry with inflected form
<entry>
  <form type="lemma">
    <orth>chat</orth>
  </form>
  <gramGrp>
    <pos>N</pos>
    <gen>m</gen>
  </gramGrp>
  <form type="inflected">
    <orth>chats</orth>
    <gramGrp>
      <number>p</number>
    </gramGrp>
  </form>
</entry>
<form>: the case of the Campe dictionary
• Step 1: Dealing with the presence of determiners
<form type="lemma">
  <form type="determiner">
    <orth>Das</orth>
  </form>
  <form type="headword">
    <orth>Aak</orth>
  </form>
</form>
<form>: the case of the Campe dictionary
• Step 2: adding grammatical information
<form type="lemma">
  <form type="determiner">
    <orth>Das</orth>
    <gramGrp>
      <pos value="D"/>
      <gen>n</gen>
    </gramGrp>
  </form>
  <form type="headword">
    <orth>Aak</orth>
    <gramGrp>
      <pos>N</pos>
      <gen>n</gen>
    </gramGrp>
  </form>
</form>
<form>: the case of the Campe dictionary
• Step 3: dealing with inflected forms
<form type="inflected">
  <form type="determiner">
    <orth>des</orth>
    <gramGrp>…</gramGrp>
  </form>
  <form type="headword">
    <orth><oVar><oRef/>-es</oVar></orth>
    <gramGrp>
      <case value="G">G</case>
    </gramGrp>
  </form>
</form>
Main arguments for the proposed changes
• Coherent use of <form> and <orth>
  – Accounts for a coherent access to orthographic information in form/orth
• Coherent use of grammatical features
  – Danger of tag abuse with
    • <gram type="art_n">Das</gram>
  – ‘type’ attribute should indicate a grammatical feature
  – <gram> content should be the value of that feature
  – Non-differentiation of features (art_n -> pos + gen)
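Example
• The contrast argued for above can be sketched as follows: in the first form 'type' fuses two features and the content is a word form; in the second, each <gram> carries one feature in 'type' and its value as content. The value codes "D" and "n" follow the Campe examples; writing them with <gram> rather than <pos> and <gen> is shown only to illustrate the feature/value reading.

```xml
<!-- discouraged: two features fused in 'type', word form as content -->
<gram type="art_n">Das</gram>

<!-- feature/value reading: one feature per <gram>, word form in <orth> -->
<form type="determiner">
  <orth>Das</orth>
  <gramGrp>
    <gram type="pos">D</gram>
    <gram type="gen">n</gram>
  </gramGrp>
</form>
```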
<sense>: main components
• Core elements
  – <def>: to provide the definition
  – <dicteg>
    • Need to establish guidelines on the identification of sources
  – <etym>: a complex issue…
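Example
• A sketch combining the three core components in one <sense>. The element name <dicteg> is used as in these slides; the example sentence is invented for illustration, and placing <etym> inside <sense> rather than directly in <entry> is an assumption.

```xml
<entry>
  <form><orth>chat</orth></form>
  <sense n="1">
    <def>Petit animal familier</def>
    <dicteg>
      <!-- invented example sentence; <oRef/> stands for the headword -->
      <q>Le <oRef/> dort.</q>
    </dicteg>
    <!-- minimal etymology: source language and etymon -->
    <etym><lang>lat.</lang> <mentioned>cattus</mentioned></etym>
  </sense>
</entry>
```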
Documenting the examples
<dicteg>
  <cit>
    <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>
    <biblStruct>
      <monogr>
        <author>BENOIT M, MICHEL C.</author>
        <title>Le Parler de Metz et du pays messin</title>
        <imprint>
          <pubPlace>Metz</pubPlace>
          <publisher>Serpenoise</publisher>
          <date>2001</date>
          <biblScope>p. 38</biblScope>
        </imprint>
      </monogr>
    </biblStruct>
  </cit>
</dicteg>

<dicteg>
  <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>
</dicteg>

<dicteg>
  <cit>
    <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>
    <bibl>Benoit M., Michel C., Le Parler de Metz...</bibl>
  </cit>
</dicteg>
A quick glimpse into Roma
• A journey in three steps
  – Adding the PD module and generating a schema
  – Checking out elements
  – Expressing constraints on specific values
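Example
• The three steps could end up in an ODD customization roughly like the one below. This is a sketch under assumptions: the schema name "tei_dict_sketch" is hypothetical, the module key "dictionaries" is the name used in the released P5 (the slides call it the PD module), the deleted elements reflect the preference for <entry> expressed earlier, and the closed value list for 'type' on <form> is taken from the Campe examples.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="tei_dict_sketch" start="TEI">
  <!-- step 1: the basic modules plus the dictionary module -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>

  <!-- step 2: remove elements the project decides not to use -->
  <elementSpec ident="superEntry" module="dictionaries" mode="delete"/>
  <elementSpec ident="entryFree" module="dictionaries" mode="delete"/>

  <!-- step 3: constrain 'type' on <form> to a closed list of values -->
  <elementSpec ident="form" module="dictionaries" mode="change">
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="lemma"/>
          <valItem ident="inflected"/>
          <valItem ident="determiner"/>
          <valItem ident="headword"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

Roma generates a specification of this kind from its web forms; the resulting file can then be compiled into a DTD, RelaxNG or W3C schema.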