A Framework to Experiment with Different NLP Techniques

Ricardo Gacitua 1, Pete Sawyer 1, Paul Rayson 1, Scott Piao 2

1 Computing Department, Lancaster University, Lancaster, UK
2 School of Computer Science, Manchester University, U

Workshop - Issues in Ontology Development and Use
Nottingham, UK. 2007

• Context

• Problem

• Research Question

• Background

• Framework

• Results

• Demo

• Conclusions

• Further Work

Index

Context

Problems

Research Question

Objectives

Framework

Brief Demo – OntoLancs Workbench

Further Work

Context

Most initiatives for Ontology Learning combine techniques to find concepts and relationships between them.

Focus:

• Learning taxonomic relations between concepts

• Deriving a concept hierarchy organizing these concepts

• Extracting the relevant domain terminology and synonyms from a text collection

• Extending an existing concept hierarchy with new concepts

• Discovering concepts which can be regarded as abstractions of human thought

• Populating the ontology with instances of relations and concepts

• Learning non-taxonomic relations between concepts

• Discovering other axiomatic relationships or rules involving concepts and relations

Methods for term extraction range from simple to sophisticated:

• counting raw frequency,

• applying information retrieval methods such as TFIDF (Baeza-Yates & Ribeiro-Neto, 1999), or

• applying sophisticated methods such as the C-value / NC-value method [Frantzi & Ananiadou 1999]
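As a rough illustration of the two simpler options above, the following sketch counts raw frequency and computes a basic TF-IDF weight. The toy corpus, the function names and the particular weighting (term frequency times log(N / df)) are assumptions made for this example; they are not taken from the slides or from OntoLancs.

```python
import math
from collections import Counter

# Toy corpus (illustrative data only): each document is a list of tokens.
docs = [
    "the ontology describes domain concepts".split(),
    "the framework extracts domain terms".split(),
    "the parser tags the text".split(),
]

def raw_frequency(documents):
    """Count how often each token occurs across the whole corpus."""
    return Counter(token for doc in documents for token in doc)

def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores

freq = raw_frequency(docs)
weights = tf_idf(docs)
```

In this toy corpus "the" has the highest raw frequency but a TF-IDF weight of zero, because it occurs in every document. That is the kind of difference between techniques that an experimental framework would let an engineer compare.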

Unsupervised clustering techniques known from Machine Learning are also applied [Cimiano et al. 2005, Faure & Nedellec 1999, Caraballo 1999].

Context

However, researchers have realised that the output of the ontology learning process is far from perfect [Cimiano, 2005].

Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7): 315-320. October 2006.

Problem

A key issue not addressed yet:

A challenging issue is to quantitatively evaluate the usefulness and accuracy of techniques, and of combinations of techniques, when applied to ontology learning [1].

Reinberger and Spyns (2005) point out the importance of evaluating the effectiveness of the techniques for ontology learning: “To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning” (page 2).

[1] Reinberger, M. L. and P. Spyns (2005). Unsupervised Text Mining for the Learning of DOGMA-inspired Ontologies. In: Ontology Learning from Text: Methods, Evaluation and Applications, Advances in Artificial Intelligence. P. Buitelaar, P. Cimiano, B. Magnini (eds.). Amsterdam, IOS Press. vol. 24: pages 305-339.

In most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain.

Research Question

Can shallow semantic analysis of the kind enabled by semantic tagging, together with a range of other statistical NLP techniques, identify key domain concepts?

Can it do so with sufficient confidence in the correctness and completeness of the result?

Background

A number of frameworks that support the ontology learning process have been reported:

• ASIUM

• OntoLT

• DODDLE

• Text2Onto

• OntoLearn

They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management.

Most frameworks use a pre-defined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations or the ability to include new ones.

Text2Onto is based on the GATE framework, so it is flexible with respect to the set of algorithms used.

A Flexible Framework

Phase 1: Part-of-Speech (POS) and Semantic annotation of corpus: Domain texts are tagged morpho-syntactically and semantically.

Phase 2: Extraction of concepts: The domain terminology is extracted from the tagged domain corpus by identifying a list of domain candidate terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine. An existing DAML ontology can be used as a reference to calculate precision and recall.

Phase 3: Domain Ontology Construction: Concepts extracted during the previous phase are then added to a concept hierarchy.

Phase 4: Domain Ontology Editing: The bootstrap ontology is turned into OWL. It is then processed using an ontology editor (Protégé) to manage the versioning of the domain ontology and to modify or improve it.
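The precision and recall check against a reference ontology mentioned in Phase 2 can be sketched as follows. This is a minimal example, assuming the concept names of the reference ontology have already been read into a list of strings. Loading the DAML file itself is not shown, and the function name and sample terms are hypothetical.

```python
def precision_recall(extracted, reference):
    """Compare extracted candidate terms with reference concept names.

    Matching is by case-insensitive exact string equality, which is an
    assumption of this sketch rather than the framework's matching rule.
    """
    extracted = {term.lower() for term in extracted}
    reference = {term.lower() for term in reference}
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical candidate terms and reference concepts.
candidates = ["Player", "Team", "Match", "Stadium"]
reference_concepts = ["player", "team", "referee", "goal", "match"]

p, r = precision_recall(candidates, reference_concepts)
```

Here three of the four candidates appear in the reference (precision 0.75) and three of the five reference concepts were found (recall 0.6).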

Preliminary Results

Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.

[3] Alkula, R. 2001. From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Inf. Retr. 4, 3-4 (Sep. 2001), 195-208.

Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming.

• StopWord Filtering: [Bloehdorn et al., 2006]

• Lemmatization: [Buitelaar and Ramaka, 2005]

• Stemming: [Kietz et al., 2000]

• S. Bloehdorn, P. Cimiano and A. Hotho: Learning Ontologies to Improve Text Clustering and Classification. Proc. of GFKL, 2005.

• Paul Buitelaar, Srikanth Ramaka: Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In: Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August 2005.

• J. Kietz, et al.: A Method for semi-automatic ontology acquisition from a corporate intranet. In: Proc. EKAW-2000, France. 2000.

From the preliminary experiments, we can conclude that the lemmatization technique (Group 3) produces better results than the stemming technique (Group 2) for the domain concept acquisition process.
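To make the difference between the two techniques concrete, the toy comparison below contrasts a naive suffix stripper with a lookup-based lemmatizer. It is only an illustration. The suffix list is not the Porter algorithm, the lemma table is a hypothetical stand-in for a real morphological lexicon, and neither is the implementation used in the experiments.

```python
# Toy suffix stripper (NOT the Porter algorithm; illustration only).
SUFFIXES = ("ies", "ing", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table standing in for a real morphological lexicon.
LEMMAS = {
    "studies": "study",
    "ontologies": "ontology",
    "classes": "class",
    "running": "run",
}

def lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)

words = ["studies", "ontologies", "classes", "running"]
stems = [naive_stem(w) for w in words]    # ['stud', 'ontolog', 'class', 'runn']
lemmas = [lemmatize(w) for w in words]    # ['study', 'ontology', 'class', 'run']
```

Stems such as "ontolog" are not valid words, so they are harder to match against concept names in a reference ontology, whereas lemmas are dictionary forms. This is one plausible reading of the result above, not a claim about its cause.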

Brief Demo

Ontology Framework

Conclusions

Main challenge:

Our research project addresses an important challenge of ontology research, i.e. how to quantitatively evaluate the usefulness and accuracy of both techniques and combinations of techniques when they are applied to ontology learning.

1. This framework is designed as a cyclical process to experiment with different techniques. Techniques are included as plug-ins.

2. It provides support to determine which techniques, or combinations of techniques, provide optimal performance for ontology learning.

Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also an experimental platform for identifying the most effective technique or combination of techniques.
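The idea of techniques as interchangeable plug-ins that can be combined and compared can be sketched as below. This is a hypothetical design, assuming each technique is a function from a token list to a token list. The names `run_pipeline`, `lowercase` and `stopword_filter` are invented for this example and do not describe the OntoLancs plug-in API.

```python
from typing import Callable, Iterable, List

# Assumption: a technique maps a list of tokens to a new list of tokens.
Technique = Callable[[List[str]], List[str]]

def lowercase(tokens: List[str]) -> List[str]:
    """Normalise every token to lower case."""
    return [t.lower() for t in tokens]

def stopword_filter(tokens: List[str]) -> List[str]:
    """Drop tokens found in a small, illustrative stop list."""
    stopwords = {"the", "of", "a", "is"}
    return [t for t in tokens if t not in stopwords]

def run_pipeline(tokens: List[str], techniques: Iterable[Technique]) -> List[str]:
    """Apply each technique in order, feeding its output to the next one."""
    for technique in techniques:
        tokens = technique(tokens)
    return tokens

tokens = "The Ontology is a Model of the Domain".split()

# Two combinations of the same techniques, differing only in order.
combo_a = run_pipeline(tokens, [lowercase, stopword_filter])
combo_b = run_pipeline(tokens, [stopword_filter, lowercase])
```

The two combinations give different results: `combo_a` removes the capitalised "The", while `combo_b` keeps it because the stop list is applied before lower-casing. Differences of this kind are what an experimental platform is meant to expose.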

Further Work

Our Project:

OntoLancs – A Flexible Framework For Ontology Learning

Future Work:

• Including new techniques (plug-ins) from different tools.

• Experimenting with techniques in supervised and unsupervised modes.

The End

OntoLancs

Computing Department

Lancaster University 2006, UK

Text2Onto vs. OntoLancs

Text2Onto defines user interaction as a core aspect, whereas our framework provides support to process algorithms in an unsupervised mode.

Our framework provides a graphical workflow engine to support the composition of complex ensemble techniques.

Like Text2Onto, our framework uses a plug-in-based structure. In contrast, however, it can include techniques from existing linguistic and ontology tools by using Java APIs.

Techniques included in OntoLancs

1. Grouping by POS
2. Raw Frequency Filtering
3. POS Filtering
4. Lemmatization
5. Stemming
6. StopWord Filtering
7. Frequency Profiling
8. Syntactic Pattern Co-occurrences
9. Window-based Collocations
10. Semantic Filter (soon)
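As an example of one technique from this list, window-based collocations can be approximated by counting pairs of tokens that occur within a fixed distance of each other. The sketch below is a generic illustration under that assumption. The function name, the window size and the sample sentence are hypothetical, and the slides do not describe how OntoLancs implements this technique.

```python
from collections import Counter

def window_collocations(tokens, window=2):
    """Count unordered pairs of distinct tokens at most `window` positions apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look only forward so that each co-occurrence is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

tokens = "domain ontology learning uses domain ontology texts".split()
pairs = window_collocations(tokens, window=2)
```

Frequently co-occurring pairs such as ("domain", "ontology") could then be proposed as candidate multi-word terms, usually after filtering with one of the other techniques in the list.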