A natural-language approach to modeling structured documents
Yves Marcoux - OLST-RALI - March 21, 2007
A natural-language approach to modeling structured documents
Yves MARCOUX
GRDS – EBSI
Université de Montréal
A natural-language approach to modeling: Why is some XML so difficult to write?
<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
Writing well-formed XML: author’s choices
• <sex><male /></sex>
• <is-female>FALSE</is-female>
• <gender gender="♂" />
• <note>It's a boy!</note>
♂ = U+2642 (male sign)
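All four encodings above are well-formed XML, which is why well-formedness alone leaves the author with so many choices. A minimal Python sketch, using only the standard library, that checks this; the helper name `is_well_formed` is ours and does not come from the talk:

```python
import xml.etree.ElementTree as ET

# The four candidate encodings from the slide.
CANDIDATES = [
    "<sex><male /></sex>",
    "<is-female>FALSE</is-female>",
    '<gender gender="\u2642" />',
    "<note>It's a boy!</note>",
]


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a well-formed XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


results = [is_well_formed(candidate) for candidate in CANDIDATES]
print(results)
```

A parser accepts all of them equally; nothing at this level tells the author which one the modeler would have wanted.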
Writing valid XML is collaborative work
• Modeler has chosen the markup (container)
• Author supplies the contents
• Much like a form
• Collaborative work requires communication between the parties: modeler and author
• But the modeler is gone…
Problem
• Authoring environments are:
  – good at conveying the syntactic intentions (or decisions) of the modeler
  – not as good at conveying the semantic intentions of the modeler
• Often, all there is is a generic ID or some slightly more developed form
  – Ex.: “date” in a memo
What is available?
• More or less developed forms of generic IDs (and attribute names)
• General documentation of the model
• Per element (attribute) documentation
• OK for tooltips or popups
• Could we do better?
• (Applications / stylesheets are not appropriate)
Could we aim at…
• Having a semantic conversation right in the editing window?
• In the same way that there is currently a syntactic conversation?
• Yes…
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
Key idea
• Have the modeler prepare bits of natural-language (NL) prose
• These can be intertwined with author-supplied contents to give them meaning
• Allows “fill-in”-like sentences
• And thus, a semantic conversation in the editing window
• NB: modeler segments can contain hyperlinks
Example
Facts about some US cities
City          Population  Annual snowfall (inches)
Denver        850,000     23
Rochester     240,000     88
Palm Springs  48,000      0
Raw XML
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
  ...
</facts-about-US-cities>
Prose equivalent
Here are facts about some US cities. The city of Denver has a population of 850,000 and an annual snowfall of 23 inches. The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city of Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.
Modeler prepares “peritext” segments
Element                    text-before                             text-after
facts-about-US-cities      "Here are facts about some US cities."  empty
city                       " The city "                            "."
name                       "named "                                empty
population                 " has a population of "                 empty
annual-snowfall-in-inches  " and an annual snowfall of "           " inches"
Possible “semantic” view
Here are facts about some US cities. The city named Denver has a population of 850,000 and an annual snowfall of 23 inches. The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city named Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.
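The semantic view can be derived mechanically from the raw XML and the peritext table: emit each element's text-before, then its contents, then its text-after. The following Python sketch illustrates that idea under our own assumptions (a depth-first traversal, whitespace-only text ignored, and the names `PERITEXTS` and `semantic_view`, none of which come from the talk):

```python
import xml.etree.ElementTree as ET

# Peritext table from the slide: element name -> (text-before, text-after).
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city ", "."),
    "name": ("named ", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}


def semantic_view(element):
    """Interleave modeler peritexts with author-supplied contents, depth-first."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    parts = [before]
    # Author-supplied text directly inside this element (indentation is skipped).
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.append(semantic_view(child))
        # Text following a child element, if any (mixed content).
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    parts.append(after)
    return "".join(parts)


RAW_XML = """
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
</facts-about-US-cities>
"""

view = semantic_view(ET.fromstring(RAW_XML.strip()))
print(view)
```

Elements without an entry in the table contribute only their contents, so an incomplete peritext table degrades to plain text rather than failing.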
What it allows during editing (in semantic view)
• Peritexts convey the semantic intentions of the modeler
• A semantic conversation takes place in the editing window (instead of a syntactic one)
• Fill-in sentences:
  – Make “tag abuse” embarrassing…
  – Likely to reduce some kinds of errors
• Other views / fragment viewing / hyperlink
Discussion
• This is not like defining an application
  – Not a stylesheet mechanism
• Peritexts (fixed here) could be allowed to vary with some parameters:
  – position among siblings
  – attribute value
  – etc.
• (Attributes should also be handled)
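One way to picture peritexts that vary with parameters is to make each peritext a small function of the element's position among its siblings and of its attributes. The sketch below is only an illustration: the rule names, the "Next, the city" wording, and the `estimated` attribute are hypothetical and are not part of the model shown in the talk.

```python
import xml.etree.ElementTree as ET


def const(text):
    """Build a peritext rule that ignores its context."""
    return lambda position, attrs: text


def city_before(position, attrs):
    # Hypothetical rule: wording depends on the position among siblings.
    return " The city " if position == 0 else " Next, the city "


def population_before(position, attrs):
    # Hypothetical "estimated" attribute, not part of the model in the talk.
    if attrs.get("estimated") == "yes":
        return " has an estimated population of "
    return " has a population of "


# Element name -> (text-before rule, text-after rule).
RULES = {
    "facts-about-US-cities": (const("Here are facts about some US cities."), const("")),
    "city": (city_before, const(".")),
    "name": (const("named "), const("")),
    "population": (population_before, const("")),
    "annual-snowfall-in-inches": (const(" and an annual snowfall of "), const(" inches")),
}


def semantic_view(element, position=0):
    """Compute the prose view; text after child elements is ignored in this sketch."""
    before, after = RULES.get(element.tag, (const(""), const("")))
    parts = [before(position, element.attrib), (element.text or "").strip()]
    for index, child in enumerate(element):
        parts.append(semantic_view(child, index))
    parts.append(after(position, element.attrib))
    return "".join(parts)


doc = ET.fromstring(
    "<facts-about-US-cities>"
    "<city><name>Denver</name><population>850,000</population>"
    "<annual-snowfall-in-inches>23</annual-snowfall-in-inches></city>"
    '<city><name>Rochester</name><population estimated="yes">240,000</population>'
    "<annual-snowfall-in-inches>88</annual-snowfall-in-inches></city>"
    "</facts-about-US-cities>"
)
view = semantic_view(doc)
print(view)
```

Keeping the constant case as `const(...)` means the fixed table from the talk is just the special case where no rule looks at its arguments.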
Why does it work?
• Sometimes tricky (see paper), but…
• NL has very high affordance
• NL can act as its own metalanguage
• XML contents + NL usually mix pretty well
Intertextual semantics
• Meaning of a text fragment is given by placing it in a network of other texts
• That network can simply consist of a sentence (or “quasi-sentence”)
• Or a more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths
  – Too much hyperlinking can spoil the idea!
Interpretation workflow
• d is a document or fragment, H is a human
• S(d) is the intertextual semantics of d
• S(d) is in NL
• S is machine computable
• Actual meaning of d for H may vary:
  – with H
  – for the same H, from one “reading” of S(d) to another
d --S--> S(d) --H--> actual “meaning” of d for H
Interpretation workflow
[Diagram: d → S(d), then S(d) read by three different humans H1, H2 and H3]
Suggests a modeling process
• Modeler starts with the prose
• Identify peritexts
• Work out more and more abbreviated forms
  – Will correspond to different “views” in the editor
• Tersest level gives markup
• Increase model usability?
Mixed content question revisited
• Known: can get rid of mixed content with
  <!ELEMENT text (#PCDATA)>
  Example:
  <!ELEMENT (e1 | e2 | … | #PCDATA)*>
  becomes:
  <!ELEMENT (e1 | e2 | … | text)*>
• Why does it feel bad?
  – The “text” tags are not abbreviations of any reasonable peritexts!
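At the instance level, the transformation above amounts to wrapping every run of character data in a `text` element. A minimal Python sketch of that rewriting follows; the function name `wrap_text_nodes` and the sample element `p` are our assumptions, and only one level of the tree is rewritten:

```python
import copy
import xml.etree.ElementTree as ET


def wrap_text_nodes(element, wrapper="text"):
    """Return a copy of element whose direct text runs are wrapped in <text>."""
    result = ET.Element(element.tag, element.attrib)

    def add_text(run):
        # Whitespace-only runs are dropped; other runs become <text> children.
        if run and run.strip():
            ET.SubElement(result, wrapper).text = run

    add_text(element.text)
    for child in element:
        clone = copy.deepcopy(child)
        # Detach the text that followed the child so it can be wrapped separately.
        tail, clone.tail = clone.tail, None
        result.append(clone)
        add_text(tail)
    return result


mixed = ET.fromstring("<p>See <e1>Hamlet</e1> for details.</p>")
flat = wrap_text_nodes(mixed)
serialized = ET.tostring(flat, encoding="unicode")
print(serialized)
```

The output is structurally cleaner, yet no natural text-before or text-after suggests itself for `text`, which is the discomfort the slide points at.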
Is NL too much to ask for?
• Relative to some “target” community
• Can go a long way (previous slide)
• Hyperlinks are allowed in peritexts
  – Allows defining “sense-making” or learning paths
• (Almost) anything formal can be turned into NL…
NL as formalism common denominator
[Diagram: Expression in artificial formalism + Textbook explaining formalism, joined by a STAPLER = Equivalent expression in NL]
Editing setup without intertextual semantics
[Diagram components: Modeler; Author; XML EDITOR; XML DTD; Doc. / tr. material; Valid XML instance or fragment; World; NL and presupposed knowledge of target community]
Editing setup with intertextual semantics
[Diagram components: Modeler; Author; XML EDITOR; XML DTD; text-before and text-after segments; NL equivalent; Valid XML instance or fragment; World; NL and presupposed knowledge of target community]
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
What it suggests
• Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design
  – E.g., don’t abuse hyperlinking
• Literate modeling, literate interfaces
  – Literate interface / interaction design
• Benefit: make explicit prerequisite knowledge & sense-making / learning paths
Other possible uses of intertextual semantics
• Legal documents with multiple renditions
• NLP systems that cannot handle markup
  – Including full-text indexing
    • <ex>Hamlet</ex>
    • “Exit Hamlet”
• Other data models
  – Ex.: relational
• Normal forms
  – A new look at expressivity
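For the full-text indexing case, the peritexts can be expanded before the text reaches an indexer that knows nothing about markup. The Python sketch below assumes a text-before of "Exit " for `ex`, inferred from the “Exit Hamlet” example, and uses a naive whitespace tokenizer as a stand-in for a real indexing pipeline:

```python
import xml.etree.ElementTree as ET

# Assumed peritext for <ex>, inferred from the "Exit Hamlet" example.
PERITEXTS = {"ex": ("Exit ", "")}


def to_plain_text(element):
    """Expand peritexts so that markup-unaware tools see ordinary prose."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    inner = (element.text or "") + "".join(
        to_plain_text(child) + (child.tail or "") for child in element
    )
    return before + inner + after


plain = to_plain_text(ET.fromstring("<ex>Hamlet</ex>"))
# Naive tokenization standing in for a real full-text indexer.
tokens = plain.lower().split()
print(plain, tokens)
```

Without the expansion, an indexer that strips tags would see only "Hamlet" and lose the fact that the fragment is a stage exit.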
Future work
• Editing:
  – Work out a few existing / new models
  – Properly integrate attributes
  – More powerful peritext computation
  – Implement ideas in a real editor
    • Display peritexts when choosing insertion
    • Hyperlinks in displayed peritexts
  – Experiment with real authors
Future work
• More than peritexts?
• More than NL (icons, sound, …)?
• Compare with other semantic frameworks
  – Downstream semantics: Wrightson, Renear et al.
• Other models
• Tackle literate modeling / interface design
Thank you!
Questions?