Cognitive corpus-based LSP lexicography – research and implementation issues – a case study on the Multilingual Glossary on Risk Management
Gerhard Budin
University of Vienna
Austrian Academy of Sciences
8 April 2011
Motivations and Methods: Terminologies for Risk Communication
• The role of LSP lexicography in domain communication
– Increase the “transparency” of terms
– Help negotiate a common understanding of terms in intra-, inter- and trans-disciplinary and transcultural discourse
– Help increase the consistency of risk discourse (written and spoken) and increase understanding in target audiences
– Reduce unnecessary synonyms, disambiguate polysemous terms, help separate homonyms
– Help create risk terminologies in many languages
– Support knowledge sharing and knowledge transfer in cooperative work environments
– Support cross-cultural discourse (e.g. translation and parallel texts)
The Domains of Risk Management
• Multidisciplinary, diverse, and fragmented - or
• Transdisciplinary, overlapping, converging, integrated, and complementary
• The need for mediating between different approaches, cultures, and discourses:
– Technological, engineering, research, science
– Administration, legislation, monitoring
– Social, sociological, political, cultural
– Domain approaches (financial, ecological, chemical, safety, geographical, planning and forecast, health, etc.)
WIN Project (FP6 2004-2009): WP “Human Language Interoperability”
• Objectives
– WP 2200 is designed to support international risk management and risk communication processes (within the WIN project and beyond)
• Achieved results (with ongoing work)
– Large parallel corpora collection with risk-related texts and lexical resources (fr, en, de, es, ro, fi, hu, ru)
– Multilingual index with conceptual structure
– Bibliography and codes of sources
– Risk Ontology
– Multilingual online terminology database
Integrative R&D Approach
• A combination of theoretical approaches and their methods in order to achieve a result that is targeted towards the needs of the project consortium and the cooperation partners
– Quantitative (computational) and qualitative (intellectual) methods of corpus analysis
– Lexicographical and terminographical (word/text-oriented and concept/knowledge-oriented)
– Text linguistics and translation studies
– Cross-cultural comparative approach and knowledge system approach, multi-domain communication
– Knowledge engineering, computational semantics/Web 2.0 (ontologies, frame semantics, etc.)
– Cognitive Science approach (media pedagogy – eLearning, specific learner support, interactive approach (mental lexicon), usability engineering)
Motivation and Convergence of Research Interests and Contexts
• Interest in cognitive science research applied to terminology management, ontology engineering, translation technologies, E-Learning systems design and implementation
• Research Cluster 1 “Translation – Cognition – Technologies” at the Center for Translation Studies, University of Vienna
• Interdisciplinary Research Platform on Cognitive Science – Cluster on Cognitive Linguistics
• Research Priority 1 Lexicology, Terminology, and Parallel Corpora at the Institute for Corpus Linguistics and Text Technology at the Austrian Academy of Sciences
Research contexts in several projects
• Previous and ongoing projects
– Dynamont
• Methodology for Creating Dynamic Ontologies, BMVIT, national research programme “Semantic Systems” – multi-dimensional ontology modelling
– WIN (Wide Area Information Network on Risk Management) MGRM Multilingual Glossary on Risk Management
• IP (Integrated Project) in FP6, 2004-2008, focus on creating a multilingual terminology and ontology of risk management – risk ontology for natural hazards
– Montific - Multilingual ONTology for Internal Financial Control, an LLP project (Leonardo da Vinci II)
• Building a “learning ontology” for an eLearning environment
– Stability and Adaptation of Classification Systems in a Cross-Cultural Perspective - European Science Foundation: COST A 31 project
• Cognitive linguistics – how “classifiers” are embodied in language incl. ontologies
– TES4IP - Terminology Services for the Intellectual Property Domain (Bridge project funded by FFG, Austrian Research Agency)
• Term extraction, multi-word term recognition, named entity recognition, legal vocabularies and legal ontologies
• -> Ongoing study
– Cognitive Ontologies
• Designing, Generating and Using Domain Ontologies
Ontology Engineering and Cognitive Science
• Cognitive aspects have been of interest in a variety of ontology engineering approaches
– Barry Smith
• Epistemological focus combined with work on domain ontologies (mainly bio-medical)
• Criticizing the epistemological foundations of terminology theory in elaborating his foundational theory of ontology
– Aldo Gangemi
• DOLCE: Descriptive Ontology for Linguistic and Cognitive Engineering
• Foundational theory of ontology
• Many projects, also on tools and on domain ontologies
– But also many others (Guarino, Sheth, Obrst, Noy, et al.) have done research on these aspects
– Some criticism that the focus in ontology evaluation is on syntactic evaluation for computational uses (only) – the classical scenario
“Cognitive Ontologies”
• Conceptual clarification:
– Ontologies of cognitive processes
• In neuroscience research, similar to other bio-medical ontologies (cognitive atlas, neuropsychiatric phenomena, ontology of cognitive objects, etc.)
– Ontologies with a focus on their cognitive aspects
• DOLCE and other cognitive-oriented approaches
• Constructivist epistemology for ontology building, concerning the relation to “reality”
• Increasing convergence of these two concepts
Our own research
• Our previous and ongoing projects have been focusing on cognitive adequacy of domain ontologies and their use in knowledge acquisition in learning situations
– Terminology studies as a contribution from this perspective (related research by Nistrup Madsen/Erdman Thomsen 2005, 2009, etc.)
• Using DOLCE design patterns for multi-dimensional conceptual modeling for ontology building – the DYNAMONT project
• From domain corpora to terminologies and from there to domain ontologies
– For eLearning scenarios – the MONTIFIC project
– For domain experts – the WIN/MULTH/MGRM project
Moving up (and down) the Ontology Spectrum
• The challenge: from linguistic-cultural diversity of discourse and free-form lexical structures to a unified, formalized, axiomatized ontology – and back, to support human understanding and social processes such as collaborative learning
• The method: an integrative, multi-level modelling approach specifying the steps in a process-oriented workflow framework (with variable, combinable steps depending on concrete needs) for
– Gradual semantic enrichment
– Gradual semantic formalization
– Multi- and cross-lingual referencing/alignment for text management
– Constant interaction between full texts and lex-term resources
• The technology: a multi-component workbench (i.e. Dynamont-WB incl. ProTerm as a central element), using XML, RDF, OWL, SKOS, WordNet + GlobalWordnet, MLIF (containing TBX, TMX, XLIFF, LMF, TMF, etc.), FrameNet, etc.
• The advantage: full exploitation of all types of language resources (LR) and knowledge organization systems (KOS), providing a framework not only for their semantic enrichment and formalization as ontologies but also for ontology-based multilingual authoring, text generation and translation
The global risk communication scenario
• Several projects since 1994 covering the following activities:
– Thesaurus building
– Creating multilingual terminology databases
– Creating multilingual text corpora
– Lexicographical glossary
– Semantic enrichment (e.g. conceptual links, frame semantics)
– Collection and analysis of relevant knowledge organization systems
– Annotation of resources
– Mark-up of resources (TBX, etc.)
– Ontology building
– Communication design
From texts and terminologies to ontologies - and back to texts
• Using the Risk scenario
– Termbase
• Export XML
• Domain Models – meta-models -> patterns
– Text corpus
• Term extraction – comparative testing ProTerm, MultiTerm Extract, MultiCorpora
• Aligning with termbase
• Convert to RDF
– Ontology import -> editor
– Mappings (GMT, XML, RDF, OWL, UML, comma delimited, RDB, for different kinds of lex-term resources, FN->OWL, etc.)
• The MULTH-WIN Project as an example of methods integration
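The "Export XML" and "Convert to RDF" steps on this slide could look roughly like the following minimal Python sketch. It is an illustration only, not the project's actual tooling: the XML layout (`termbase`, `concept`, `term` with a `lang` attribute), the base IRI `http://example.org/risk/`, and the function names are all assumptions, and SKOS labels serialized as N-Triples are just one plausible target among the formats the slides list. The first term per language is treated as the preferred label and any further term in that language as a synonym.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/risk/"  # hypothetical namespace for concept IRIs

# Hypothetical, simplified termbase export; a real export will differ.
SAMPLE_EXPORT = """
<termbase>
  <concept id="c1">
    <term lang="en">backwater</term>
    <term lang="de">Rückstau</term>
  </concept>
</termbase>
"""


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def termbase_to_ntriples(xml_text: str) -> list[str]:
    """Turn each concept into a skos:Concept with language-tagged labels."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for concept in root.iter("concept"):
        subject = f"<{BASE}{concept.get('id')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        seen_languages = set()
        for term in concept.iter("term"):
            lang = term.get("lang")
            # First term per language is preferred, later ones are synonyms.
            label = "altLabel" if lang in seen_languages else "prefLabel"
            seen_languages.add(lang)
            literal = escape_literal((term.text or "").strip())
            triples.append(f'{subject} <{SKOS}{label}> "{literal}"@{lang} .')
    return triples


for line in termbase_to_ntriples(SAMPLE_EXPORT):
    print(line)
```

Emitting plain N-Triples keeps the sketch free of third-party dependencies; the resulting lines can be loaded into any RDF-aware ontology editor for the "Ontology import" step.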
Terminological frame semantics
• INTERVENTION (ACTOR(S), ACTIVITIES/PHASES):
• RISK DETECTING (PRE-EVENT)
• - R-ASSESSMENT
• - R-PERCEPTION (X is risk)
• - EXPERIENCE (statistics, case studies)
• - OBSERVATION (monitoring)
• - METHOD
• - SATELLITE
• - PROGNOSES
• - R-ANALYSIS
• - R-FEATURES
• - SITUATION/CONTEXT (danger/hazard)
• - SIMULATION (course of events)
• - PROBABILISTIC METHODS (safety)
• - RELIABILITY
• - R-IDENTIFICATION (DAMAGE)
• - R-SOURCE
• - DAMAGE CAUSE
• - VULNERABILITY (DAMAGE TARGET)
• - SUSCEPTIBILITY (capacity/people)
Rothkegel
Terminological frame semantics
I. Pre-event B. Public awareness and planning, II. In-event: C. Events and response
afflux/Hochwasser durch Aufstau
BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE=Aufstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]]
backwater/Rückstau
BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE=Rückstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]]
Rothkegel
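To make the frame notation above easier to inspect, the two flood frames can be written as nested dictionaries, as in the minimal Python sketch below. The slot names (BE, HAVE, CAUSE, NIEDERSCHLAG, STAU, DAMAGE, HAPPEN) come from the slide; the dictionary encoding, the use of empty strings for unfilled slots, and the helper names are assumptions made for illustration and are not Rothkegel's formalism.

```python
def make_flood_frame(stau_type: str) -> dict:
    """Build one flood frame; empty strings stand for unfilled slots."""
    return {
        "BE": {"TYPE": "flood", "PLACE": "", "TIME": ""},
        "HAVE": {
            "CAUSE": {
                "ORIGIN": "",
                "NIEDERSCHLAG": {"TYPE": ""},
                "STAU": {"TYPE": stau_type},
            },
            "DAMAGE": {"TARGET": "", "SOURCE": "", "DEGREE": ""},
        },
        "HAPPEN": {"STATES": "", "PROCESSES": ""},
    }


def filled_slots(frame: dict, prefix: tuple = ()):
    """Yield (dotted path, value) for every slot that carries a value."""
    for key, value in frame.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from filled_slots(value, path)
        elif value:
            yield ".".join(path), value


def frame_difference(a: dict, b: dict) -> dict:
    """Return the slots whose values differ between two frames."""
    a_slots, b_slots = dict(filled_slots(a)), dict(filled_slots(b))
    return {
        path: (a_slots.get(path), b_slots.get(path))
        for path in sorted(set(a_slots) | set(b_slots))
        if a_slots.get(path) != b_slots.get(path)
    }


afflux = make_flood_frame("Aufstau")
backwater = make_flood_frame("Rückstau")
print(frame_difference(afflux, backwater))
```

Comparing the two frames shows that the concepts share every filled slot except the STAU type, which is the distinction the frame notation is designed to expose.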
Dynamont architecture, tools and workflows
Phase 1: Identify the Problem
Phase 2: Structure the Problem
Phase 3: Identify Purpose and Scenario
Phase 4: Identify concepts of domain / subject matter
Phase 5: Create Knowledge Model
Phase 6: Create Application Profile
Phase 7: Create Acceptance
Phase 8: Create System
Phase 9: Implement System
[Architecture diagram; component labels: Ontology Creation, Postgres, Visualization, Storage, Methodology, Collaboration, MDA-Component]
The Glossary
• The paper version of the glossary is used by risk managers and civil engineers, but also by teachers, students, translators, journalists, etc.
• Generally, the purpose of such multilingual conceptual glossaries is to improve domain communication and to facilitate mutual understanding across linguistic boundaries.
• The concepts of risk management and their definitions presented in this glossary were carefully selected from a large body of technical literature and authentic text corpora in the respective languages.
• These sources are referenced in the bibliography.
• The multilingual glossary presented here includes 8 languages: English and French as main pivot languages, as well as German, Spanish, Romanian, Finnish, Hungarian, and Russian.
• It comprises about 230 central concepts of risk management with about 400 definitions and about 1400 terms representing these concepts in each language (including synonyms and hyperonyms), indicating the conceptual relations between the entries.
The Glossary
• The following themes are used as the macro-structure of the glossary:
– A. Risk assessment and technology assessment
– B. Public perception of risk, planning, preparation and alarm
– C0. Risk events, equipment and operations, general terms
– C1. Fire - events, equipment and operations
– C2. Floods - events, equipment and operations
– C3. Oil spills - events, equipment and operations
• Each glossary entry follows the same micro-structure with the following information elements:
– A conceptual number combined with a theme from the macro-structure
– The equivalent terms in the 8 languages, accompanied by grammatical information
– The definitions of the concept in each language, including multiple definitions that may differ from each other, accompanied by the textual source of the definition, also including structural semantic information on the concept
– Related terms and expressions
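The macro- and micro-structure described above maps naturally onto a small data model. The following Python sketch is one possible encoding, not the schema of the actual MGRM database: the class and field names, the `theme-number` identifier format, and the sample entry (its concept number, theme assignment, and grammatical labels) are illustrative assumptions. Only the theme codes and the eight language codes are taken from the slides.

```python
from dataclasses import dataclass, field

# Macro-structure themes, as listed on the slide.
THEMES = {
    "A": "Risk assessment and technology assessment",
    "B": "Public perception of risk, planning, preparation and alarm",
    "C0": "Risk events, equipment and operations, general terms",
    "C1": "Fire - events, equipment and operations",
    "C2": "Floods - events, equipment and operations",
    "C3": "Oil spills - events, equipment and operations",
}
LANGUAGES = ("en", "fr", "de", "es", "ro", "fi", "hu", "ru")


@dataclass
class Term:
    lang: str
    text: str
    grammar: str = ""  # grammatical information, e.g. part of speech or gender


@dataclass
class Definition:
    lang: str
    text: str
    source: str  # textual source of the definition


@dataclass
class GlossaryEntry:
    theme: str
    number: int
    terms: list[Term] = field(default_factory=list)
    definitions: list[Definition] = field(default_factory=list)
    related: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme: {self.theme}")
        for item in [*self.terms, *self.definitions]:
            if item.lang not in LANGUAGES:
                raise ValueError(f"unsupported language: {item.lang}")

    @property
    def concept_id(self) -> str:
        """Conceptual number combined with its macro-structure theme."""
        return f"{self.theme}-{self.number}"

    def missing_languages(self) -> list[str]:
        """Languages that do not yet have an equivalent term."""
        covered = {term.lang for term in self.terms}
        return [lang for lang in LANGUAGES if lang not in covered]


# Illustrative entry only; number, theme, and grammar labels are assumptions.
entry = GlossaryEntry(
    theme="C2",
    number=1,
    terms=[Term("en", "backwater", "n."), Term("de", "Rückstau", "m.")],
)
print(entry.concept_id, entry.missing_languages())
```

Keeping definitions as a list per entry, rather than one per language, reflects the slide's point that a concept may carry several differing definitions, each with its own source.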
Research issues
• Experimental settings
• User studies, user modelling
• Data modelling
• Corpus analysis
• Multilingual – multi-domain – cross-cultural
• Knowledge dynamics - dynamic knowledge representations
• Cognitive studies
Conclusions and Outlook I
• Online terminology database is continuously used
• 8-language Glossary Version produced in February 2011
• Next steps in 2011:
– Work in progress!
– Database to be extended from 5 to 8 languages
– Full text corpora to be extended
– Promotion of the glossary in different user communities
– Term extraction, research
– Extension into more languages
– More scientific publications
Conclusions and outlook II
• Research perspectives
– Further research in
• Cognitive ontologies
• User modelling, usability of terminological databases and LSP dictionaries
• Corpus-linguistic research – semantic annotation modelling
• Multilingual, multi-domain, cross-cultural issues
Selected References
• Budin, G. Socio-terminology and computational terminology – toward an integrated, corpus-based research approach. In: De Cillia, Rudolf et al. (eds.). Discourse, Politics, Identity. Tübingen: Stauffenburg Verlag, 2010, 21-31.
• Budin, G. Semantic Systems supporting Cross-Disciplinary Environmental Communication. In: Hryniewicz, O.; Studzinski, J.; Szediw, A. (eds.). Environmental Informatics and Systems Research. Vol. 2: Workshop and application papers. EnviroInfo 2007. Aachen 2007, 23-30.
• CEDIM, Center for Disaster Management and Risk Reduction Technology, c/o University of Karlsruhe (2005). Glossar: Begriffe und Definitionen aus den Risikowissenschaften.
• Gangemi DOLCE
• Greciano, G. (2001). L'harmonisation de la terminologie en Sciences du Risque. In Proceedings of Security Conference, Montpellier XII. Council of Europe-FER. Strasbourg, France.
• Greciano, G. (2001). Les sciences du risque: convergences interculturelles. In Proceedings of Risk Conference, Strasbourg X. Council of Europe-FER. Strasbourg, France.
• Greciano, G. (2001). Pour un glossaire combinatoire plurilingue du Risque. In Proceedings of Risk Conference, Mèze V. Council of Europe-FER. Strasbourg, France.
• Massué, J.P. (2001). "Mobilisation de la Communauté scientifique au service de l'amélioration de la gestion des risques". Mèze, FER-EUR-OPA. Strasbourg.
• Nistrup Madsen/Erdman Thomsen 2005, 2009
Acknowledgements
GLOSSAIRE MULTILINGUE DE LA GESTION DU RISQUE
French / German / English / Spanish / Romanian / Finnish / Hungarian / Russian
Edited by Gertrud Gréciano, Gerhard Budin, Danielle Candel, John Humbley, with the support of the Commission of the European Union, the Universities of Strasbourg, Vienna and Helsinki, the Région Alsace, the Délégation générale à la langue française et aux langues de France, and the Austrian Academy of Sciences.
Authors: Gertrud Gréciano (Strasbourg), Gerhard Budin (Vienna), Annely Rothkegel (Chemnitz), Ulrike Hass (Essen)
Translators: Cornelia Cujba (Iasi), Attila Frigyer (Budapest), Luis Gonzalez (Caracas-Paris), Csilla Höfler-Bornemisza (Vienna), Annikki Liimatainen (Helsinki), Alexei Milko (Strasbourg-Moscow)
Scientific and technical cooperation: Steffi Baumann (Chemnitz), Aban Budin (Vienna), Christian Burghard (Chemnitz), Dimitrij Dobrovolskij (Moscow-Vienna), Eva Haas (Munich-Ispra), Natalia Jonkova (Moscow), Andra Moga (Iasi-Vienna), Maren Runte (Essen), Julia Steuber (Essen), Virginie Tombeux (Paris), Elena Volgina (Moscow)
Thank you for your attention
Gerhard Budin
Center for Translation Studies
University of Vienna
Institute of Corpus Linguistics and Text Technology
Austrian Academy of Sciences
[email protected]
mgrm.univie.ac.at