• Context
• Problem
• Research Question
• Background
• Framework
• Results
• Demo
• Conclusions
• Further Work
Ricardo Gacitua¹, Pete Sawyer¹, Paul Rayson¹, Scott Piao²
¹ Computing Department, Lancaster University, Lancaster, UK
² School of Computer Science, Manchester University, UK
A Framework to Experiment with Different NLP Techniques
Workshop - Issues in Ontology Development and Use, Nottingham, UK.
2007
Index
Context
Problems
Research Question
Objectives
Framework
Brief Demo: OntoLancs Workbench
Further Work
Context
Most initiatives for ontology learning combine techniques to find concepts and the relationships between them.
Focus:
Learning taxonomic relations between concepts
Deriving a concept hierarchy organizing these concepts
Extracting the relevant domain terminology and synonyms from a text collection
Extending an existing concept hierarchy with new concepts
Discovering concepts which can be regarded as abstractions of human thought
Populating the ontology with instances of relations and concepts
Learning non-taxonomic relations between concepts
Discovering other axiomatic relationships or rules involving concepts and relations.
Methods for term extraction range from the simple to the sophisticated:
• counting raw frequency,
• applying information retrieval methods such as TF-IDF (Baeza-Yates & Ribeiro-Neto, 1999), or
• applying more elaborate methods such as the C-value/NC-value method [Frantzi & Ananiadou 1999].
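The three families of term extraction methods above can be sketched in Python as follows. This is a minimal sketch, not the OntoLancs implementation: it assumes candidate terms have already been extracted and normalised, uses the common tf × log(N/df) weighting for TF-IDF, and implements only the C-value part of C-value/NC-value (no frequency threshold, no NC-value context weighting). All function names are illustrative.

```python
import math
from collections import Counter

# Assumed input: each document is a list of candidate terms, already extracted
# and normalised (plain strings for raw frequency and TF-IDF, word tuples for C-value).

def raw_frequency(docs):
    """Raw frequency: total occurrences of each term in the corpus."""
    return Counter(term for doc in docs for term in doc)

def tf_idf(docs):
    """TF-IDF per document, using tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]

def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def c_value(freq):
    """C-value for multi-word candidates, given {word_tuple: frequency}.

    A nested candidate is penalised by the mean frequency of the longer
    candidates that contain it. Single words score 0 because log2(1) = 0.
    """
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if len(b) > len(a) and contains(b, a)]
        adjusted = f_a - sum(freq[b] for b in containers) / len(containers) if containers else f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores
```

For instance, if "ontology learning" occurs 5 times and is nested only in "ontology learning framework" (2 occurrences), its C-value is log2(2) × (5 - 2/1) = 3.0, while the longer, non-nested term scores log2(3) × 2, about 3.17.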
Concept hierarchies can also be learned with unsupervised clustering techniques known from machine learning [Cimiano et al. 2005; Faure & Nédellec 1999; Caraballo 1999].
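A generic example of this kind of technique is bottom-up (agglomerative) clustering of terms by the similarity of their contexts. The sketch below is not the method of any of the cited works; it assumes each term is represented by a hypothetical context vector (counts of co-occurring words) and uses average-link merging with cosine similarity. The merge history is what turns a flat term list into a binary hierarchy.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * v.get(feature, 0) for feature, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def agglomerative(vectors, threshold):
    """Bottom-up average-link clustering of terms.

    vectors: {term: context vector}. Repeatedly merges the two most similar
    clusters until no pair reaches `threshold`. Returns the remaining clusters
    and the merge history, which can be read as a binary concept hierarchy.
    """
    def similarity(a, b):
        # Average-link: mean pairwise similarity between members of a and b.
        return sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([term]) for term in vectors]
    merges = []
    while len(clusters) > 1:
        (a, b), best = max(
            (((a, b), similarity(a, b)) for a, b in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, best))
    return clusters, merges
```

With vectors in which "car" and "truck" share driving-related contexts and "apple" and "pear" share eating-related ones, a threshold of 0.5 yields the two clusters {car, truck} and {apple, pear}.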
Context
However, researchers have realised that the output of the ontology learning process is far from perfect [Cimiano, 2005].
Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7): 315-320, October 2006.
Problem
A challenging issue is how to quantitatively evaluate the usefulness and accuracy of techniques, and of combinations of techniques, when they are applied to ontology learning [1].
A key issue not addressed yet:
Reinberger and Spyns (2005) point out the importance of evaluating the effectiveness of ontology learning techniques: “To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning” (page 2).
[1] Reinberger, M. L. and Spyns, P. (2005). Unsupervised Text Mining for the Learning of DOGMA-inspired Ontologies. In: Buitelaar, P., Cimiano, P. and Magnini, B. (eds.), Ontology Learning from Text: Methods, Evaluation and Applications. Advances in Artificial Intelligence, vol. 24. Amsterdam: IOS Press, pp. 305-339.
In most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain.
Research Question
Can shallow semantic analysis, of the kind enabled by semantic tagging, together with a range of other statistical NLP techniques, identify key domain concepts?
Can it do so with sufficient confidence in the correctness and completeness of the result?
Background
A number of frameworks that support the ontology learning process have been reported:
ASIUM, OntoLT, DODDLE, Text2Onto, OntoLearn
They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management.
Most frameworks use a pre-defined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations of techniques, or the ability to include new ones.
Text2Onto is based on the GATE framework, which is flexible with respect to the set of algorithms it can use.
A Flexible Framework
Phase 1: Part-of-Speech (POS) and semantic annotation of the corpus: domain texts are tagged morpho-syntactically and semantically.
Phase 2: Extraction of concepts: the domain terminology is extracted from the tagged domain corpus by identifying a list of candidate domain terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine.
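One way to picture how an ontology engineer combines techniques in Phase 2 is as a chain of filters over POS-tagged tokens. In this sketch each technique is a function from a token list to a token list, and `combine` chains them. The function names, the tiny stopword list and the Penn Treebank-style tags (NN, NNS) are assumptions for illustration; the actual tagset depends on the tagger used in Phase 1.

```python
from collections import Counter
from typing import Callable, List, Tuple

Token = Tuple[str, str]                           # (word, POS tag)
Technique = Callable[[List[Token]], List[Token]]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "is", "in"}

def stopword_filter(tokens: List[Token]) -> List[Token]:
    """Drop function words."""
    return [t for t in tokens if t[0].lower() not in STOPWORDS]

def pos_filter(keep=("NN", "NNS")) -> Technique:
    """Keep only tokens whose POS tag is in `keep` (nouns by default)."""
    def technique(tokens: List[Token]) -> List[Token]:
        return [t for t in tokens if t[1] in keep]
    return technique

def frequency_filter(min_count: int) -> Technique:
    """Keep only words that occur at least `min_count` times."""
    def technique(tokens: List[Token]) -> List[Token]:
        counts = Counter(word for word, _ in tokens)
        return [t for t in tokens if counts[t[0]] >= min_count]
    return technique

def combine(*techniques: Technique) -> Technique:
    """Chain techniques so that each one filters the output of the previous one."""
    def pipeline(tokens: List[Token]) -> List[Token]:
        for technique in techniques:
            tokens = technique(tokens)
        return tokens
    return pipeline

tagged = [("the", "DT"), ("ontology", "NN"), ("of", "IN"), ("the", "DT"),
          ("domain", "NN"), ("ontology", "NN"), ("is", "VBZ"), ("large", "JJ")]
pipeline = combine(stopword_filter, pos_filter(), frequency_filter(2))
print(sorted({word for word, _ in pipeline(tagged)}))  # ['ontology']
```

Because every technique shares the same signature, any subset can be chained in any order, which is what makes systematic experimentation with combinations possible.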
An existing DAML ontology can be used as a reference to calculate precision and recall.
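Given the concept labels of a reference ontology, precision and recall of the extracted concepts reduce to set operations. This sketch assumes the DAML reference ontology has already been read into a set of labels (parsing DAML is not shown) and treats a match as exact string equality after lowercasing, which is a simplification; the F-measure is added as a common way to combine the two scores.

```python
def precision_recall(extracted, reference):
    """Precision, recall and F1 of extracted concept labels against the
    concept labels of a reference ontology (case-insensitive exact match)."""
    extracted = {label.lower() for label in extracted}
    reference = {label.lower() for label in reference}
    hits = extracted & reference
    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall({"Ontology", "Concept", "Tagger", "Corpus"},
                           {"ontology", "concept", "relation"})
```

Here two of the four extracted labels appear among the three reference concepts, so precision is 2/4 = 0.5, recall is 2/3, and F1 is 4/7 (about 0.57).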
Phase 3: Domain ontology construction: the concepts extracted during the previous phase are added to a concept hierarchy.
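The slides do not say how extracted concepts are placed in the hierarchy. One simple heuristic, shown here purely as an assumption, exploits the head-modifier structure of English noun phrases: a multi-word concept such as "semantic annotation" is attached under the longest proper suffix that is itself a concept ("annotation"), and everything else goes under a root named "Thing". The function name is hypothetical.

```python
def build_hierarchy(concepts):
    """Return {concept: parent} using a head-modifier heuristic: a multi-word
    concept goes under its longest proper suffix that is also a concept;
    all other concepts go under the root "Thing"."""
    known = {tuple(c.split()) for c in concepts}
    parent = {}
    for concept in concepts:
        words = tuple(concept.split())
        parent[concept] = "Thing"
        # words[1:] is the longest proper suffix, words[-1:] the shortest.
        for i in range(1, len(words)):
            if words[i:] in known:
                parent[concept] = " ".join(words[i:])
                break
    return parent

hierarchy = build_hierarchy(["annotation", "semantic annotation",
                             "automatic semantic annotation", "ontology"])
```

This attaches "automatic semantic annotation" under "semantic annotation", that under "annotation", and both "annotation" and "ontology" under "Thing". Concepts whose head word is not itself a concept stay at the top level, which is where manual editing in Phase 4 would come in.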
Phase 4: Domain ontology editing: the bootstrap ontology is converted into OWL and then processed with an ontology editor (Protégé) to manage versioning of the domain ontology and to modify or improve it.
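As a sketch of the conversion step in Phase 4, a {concept: parent} mapping can be written out as OWL classes linked by rdfs:subClassOf. Turtle is used here because it is a compact OWL serialisation that current versions of Protégé can open; the namespace http://example.org/ontolancs# is a placeholder, and the convention that the parent "Thing" means owl:Thing is an assumption of this example.

```python
import re

OWL_PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

def local_name(label):
    """Turn 'semantic annotation' into the CamelCase name 'SemanticAnnotation'."""
    return re.sub(r"[^0-9A-Za-z]", "", label.title())

def to_turtle(parent, base="http://example.org/ontolancs#"):
    """Serialise a {concept: parent concept} mapping as OWL classes in Turtle.
    The parent value "Thing" is mapped to owl:Thing."""
    lines = [OWL_PREFIXES + f"@prefix : <{base}> .", ""]
    for concept, sup in sorted(parent.items()):
        # Escape backslashes and quotes so the label is a valid Turtle string.
        escaped = concept.replace("\\", "\\\\").replace('"', '\\"')
        sup_ref = "owl:Thing" if sup == "Thing" else ":" + local_name(sup)
        lines.append(f":{local_name(concept)} a owl:Class ;")
        lines.append(f'    rdfs:label "{escaped}" ;')
        lines.append(f"    rdfs:subClassOf {sup_ref} .")
    return "\n".join(lines)

print(to_turtle({"annotation": "Thing", "semantic annotation": "annotation"}))
```

Keeping the original wording in rdfs:label while deriving CamelCase class names means the editor can show readable labels even after the names are normalised.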
Preliminary Results
Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming:
• Stopword filtering: [Bloehdorn et al., 2006]
• Lemmatization: [Buitelaar and Ramaka, 2005]
• Stemming: [Kietz et al., 2000]
From the preliminary experiments, we can conclude that the lemmatization technique (Group 3) produces better results than the stemming technique (Group 2) for the domain concept acquisition process.
Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.
[3] Alkula, R. (2001). From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Information Retrieval 4 (3-4): 195-208, September 2001.
• S. Bloehdorn, P. Cimiano and A. Hotho. Learning Ontologies to Improve Text Clustering and Classification. Proc. of GfKl, 2005.
• P. Buitelaar and S. Ramaka. Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In: Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August 2005.
• J. Kietz et al. A Method for Semi-automatic Ontology Acquisition from a Corporate Intranet. In: Proc. EKAW-2000, France, 2000.
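The difference between the two techniques is easy to see on typical domain terms. The toy functions below are deliberately simplistic and are not the stemmer or lemmatizer used in the experiments (the slides do not name them): `crude_stem` strips a few suffixes, and `lemmatize` looks words up in a small hypothetical dictionary. Stemming yields fragments such as "ontolog" and "annot", which are poor concept labels, and reduces the noun "learning" to "learn", while lemmatization returns dictionary forms and leaves "learning" intact. This illustrates one plausible reason why lemmas suit concept acquisition; it is not evidence for the result above.

```python
# Crude suffix stripping (NOT the Porter algorithm); longer suffixes are tried first.
SUFFIXES = ("ations", "ation", "ies", "ing", "es", "s")

# Hypothetical lemma dictionary; real lemmatizers use morphological analysis.
LEMMAS = {"ontologies": "ontology", "annotations": "annotation",
          "taxonomies": "taxonomy", "concepts": "concept"}

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Map an inflected form to its dictionary form; unknown words are kept as-is."""
    return LEMMAS.get(word, word)

for w in ["ontologies", "annotations", "learning"]:
    print(w, crude_stem(w), lemmatize(w))
```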
Brief Demo
Ontology Framework
Conclusions
Main challenge:
Our research project addresses an important challenge of ontology research, i.e. how to quantitatively evaluate the usefulness and accuracy of both techniques and combinations of techniques when they are applied to ontology learning.
This framework is designed as a cyclical process for experimenting with different techniques. Techniques are included as plug-ins.
It provides support for determining which techniques, or combinations of techniques, give the best performance for ontology learning.
Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also an experimental platform for identifying the most effective techniques or combinations of techniques.
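As a rough illustration of the experimental loop, the sketch below scores every non-empty subset of a set of techniques against a reference concept set and keeps the best one. Exhaustive search is an assumption made for simplicity (it evaluates 2^n - 1 subsets, which is only practical for a handful of techniques), techniques are applied in a fixed order, and the function and technique names are hypothetical.

```python
from itertools import combinations

def f1(extracted, reference):
    """F-measure of an extracted concept set against a reference set."""
    hits = len(extracted & reference)
    if not hits:
        return 0.0
    precision, recall = hits / len(extracted), hits / len(reference)
    return 2 * precision * recall / (precision + recall)

def best_combination(techniques, candidates, reference):
    """Try every non-empty subset of `techniques` (applied in the order given)
    and return (best subset, best F-measure)."""
    names = list(techniques)
    best_subset, best_score = (), -1.0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            terms = candidates
            for name in subset:
                terms = techniques[name](terms)
            score = f1(set(terms), reference)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

techniques = {
    "drop_short": lambda ts: [t for t in ts if len(t) > 3],
    "drop_digits": lambda ts: [t for t in ts if not t.isdigit()],
}
subset, score = best_combination(techniques, ["ontology", "of", "2007", "concept", "tag"],
                                 {"ontology", "concept"})
```

In this toy run, neither filter alone removes all the noise, but applying both leaves exactly the two reference concepts, so the pair wins with an F-measure of 1.0.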
Further Work
Our Project:
OntoLancs: A Flexible Framework for Ontology Learning
Including new techniques (as plug-ins) from different tools.
Future Work:
Experimenting with techniques in supervised and unsupervised modes.
The End
OntoLancs
Computing Department
Lancaster University 2006, UK
Text2Onto vs. OntoLancs
Text2Onto treats user interaction as a core aspect, whereas our framework supports running algorithms in an unsupervised mode.
Our framework provides a graphical workflow engine that supports the composition of complex ensembles of techniques.
Like Text2Onto, our framework uses a plug-in-based structure. In contrast, however, it can include techniques from existing linguistic and ontology tools through their Java APIs.
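The graphical workflow engine is not described in detail here. The sketch below shows only an execution model that such an engine could use: techniques are nodes of a directed acyclic graph, each node receives the outputs of its predecessors, and an ensemble node merges the results of two techniques. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the node names and the intersection-style ensemble are illustrative assumptions.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(nodes, edges, source):
    """Execute a workflow DAG.

    nodes: {name: function}; edges: {name: [predecessor names]}.
    Each node is called with its predecessors' outputs, in the listed order;
    nodes without predecessors receive `source`. Returns every node's output.
    """
    results = {}
    for name in TopologicalSorter(edges).static_order():
        preds = edges.get(name, [])
        inputs = [results[p] for p in preds] if preds else [source]
        results[name] = nodes[name](*inputs)
    return results

nodes = {
    "tokenise": lambda text: text.lower().split(),
    "frequency": lambda tokens: Counter(tokens),
    "long_words": lambda tokens: {t for t in tokens if len(t) > 5},
    # Ensemble node: keep words accepted by both upstream techniques.
    "ensemble": lambda counts, longs: sorted(t for t in longs if counts[t] >= 2),
}
edges = {"frequency": ["tokenise"], "long_words": ["tokenise"],
         "ensemble": ["frequency", "long_words"]}
print(run_workflow(nodes, edges, "Ontology learning uses ontology tools and learning tools")["ensemble"])
```

Separating the graph (which a GUI could edit) from the node functions (which could wrap external tools) is one way to let new techniques be plugged in without changing the engine.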
Techniques included in OntoLancs
1. Grouping by POS
2. Raw Frequency Filtering
3. POS Filtering
4. Lemmatization
5. Stemming
6. Stopword Filtering
7. Frequency Profiling
8. Syntactic Pattern Co-occurrences
9. Window-based Collocations
10. Semantic Filter (soon)
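Two of the listed statistical techniques can be sketched briefly. The slides do not define "Frequency Profiling"; here it is assumed to mean comparing word frequencies in the domain corpus with a reference corpus using the log-likelihood (G²) statistic, G² = 2 Σ O_i ln(O_i / E_i) with E_i = N_i (O_1 + O_2) / (N_1 + N_2), where a high score marks a word as characteristic of one corpus. "Window-based Collocations" is sketched as counting unordered word pairs that co-occur within a fixed window. Both are minimal sketches with hypothetical function names.

```python
import math
from collections import Counter

def log_likelihood(word, domain, reference):
    """G2 score of `word` in a domain corpus versus a reference corpus.
    domain, reference: Counters of word frequencies (both assumed non-empty)."""
    o1, o2 = domain[word], reference[word]
    n1, n2 = sum(domain.values()), sum(reference.values())
    e1 = n1 * (o1 + o2) / (n1 + n2)   # expected frequency in the domain corpus
    e2 = n2 * (o1 + o2) / (n1 + n2)   # expected frequency in the reference corpus
    # Terms with an observed count of 0 contribute 0 (limit of o * ln(o / e)).
    return 2 * sum(o * math.log(o / e) for o, e in ((o1, e1), (o2, e2)) if o > 0)

def window_collocations(tokens, window=2):
    """Count unordered word pairs that co-occur within `window` positions."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((w, v)))] += 1
    return pairs

domain = Counter({"ontology": 10, "the": 50})
reference = Counter({"ontology": 1, "the": 500})
print(round(log_likelihood("ontology", domain, reference), 2))
print(window_collocations(["semantic", "tagging", "of", "text"], window=1))
```

A word with the same relative frequency in both corpora scores exactly 0, so ranking candidates by G² favours words that are unusually frequent in the domain texts.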