Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites

Marcirio Silveira Chaves (1) - marcirioc@uatlantica.pt

Cássia Trojahn (2) - cassia.trojahn@inrialpes.fr

(1) Universidade Atlântica, Oeiras, Portugal

(2) INRIA & LIG, Grenoble, France

Workshop on Cross-Cultural and Cross-Lingual Aspects of the Semantic Web

Shanghai, China, November 7th, 2010

In conjunction with the 9th International Semantic Web Conference (ISWC2010)

Motivation

• The Social Semantic Web is highly dependent on the development of multilingual ontologies.

• Only 2.5% of the ontologies in the OntoSelect library are multilingual.

• (Multilingual) Hotel domain ontologies are rare.

• Multilingual comments need to be processed.

• Ontology-driven mining of comments from Social Web sites.

November 7th C3LSW2010

Context

• Customer Knowledge Management (CKM)

– Customer Relationship Management (CRM) and

– Knowledge Management (KM).

• Multilingual comments to support CKM


Outline

• Multilingual Ontology Application

• Hontology

• Related Ontologies

• Extending Hontology

• Conclusion

• Ongoing Work


Multilingual Ontology Application

[Figure: architecture of the multilingual ontology application. Components: social web data; extraction, transformation, and loading; comment annotator; multilingual ontology; ontology augmenter; knowledge base; user interface. Stages: data pre-processing, ontology enrichment, comments annotation, searching. Actors: expert, manager (CKM).]

Hontology

• Development Methodology

– Identify existing ontologies on related domains

– Select the main concepts and properties

– Organize concepts and properties hierarchically into categories

– Translate the ontology (manual)

– Expand concepts and properties based on comments

– Translate the new concepts and properties (manual)

– Generate the ontology in several formats


Hontology

• Category: contains all the types of categories into which a Hotel can be classified, e.g., tourist, comfort, and luxury.

• Facility: includes the utility options offered by each hotel, e.g., beauty salon, kids club, and pool bar.

• Hospitality: contains the existing kinds of hotels, e.g., hostel, pension, and motel.


Hontology

• Hotel: details the kinds of hotels, e.g., bunker, cave, and capsule.

• Leisure: lists the leisure options, e.g., gym, jacuzzi, and sauna.

• Points of interest: often mentioned in comments about the hotels, e.g., stadium, museum, and monument.

• Room: splits into Hostel Room and Hotel Room, which have different kinds and nomenclature for rooms.


Hontology

• Hontology supports three languages

– English, French and Portuguese

• 97 concepts

• 9 object properties

• 25 data properties



Related Ontologies

                          Mondeca          HarmoNET                  Travel Itinerary  Hontology
# concepts                1000             54                        8                 97
# properties              n.a.             166                       24                34
# instances               Zero             Zero                      Zero              Zero
Domain                    Tourism          Tourism                   Travel            Hotel
Multilingual              No               No                        No                Yes
Use                       Mondeca Project  Accommodation and events  n.a.              Hotel Sector Support Decision
Public, freely available  No               Yes                       Yes               Yes


Extending Hontology

• Ontology augmenter

• Multilingual ontology matching

• Machine-learning methods

• (Semi-)automatic multilingual extension

• Hontology can be used as a multilingual resource for cross-language information retrieval.
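
The cross-language retrieval idea can be sketched as query expansion over multilingual labels. This is only an illustration: the `LABELS` table, the function names, and the sample comments are hypothetical and not taken from Hontology, and the substring matching is deliberately crude (for instance, "room" would also match "bathroom").

```python
# Hypothetical concept -> language -> label table (not copied from Hontology).
LABELS = {
    "Room": {"en": "room", "fr": "chambre", "pt": "quarto"},
    "Pool": {"en": "swimming pool", "fr": "piscine", "pt": "piscina"},
}


def expand_query(term, labels=LABELS):
    """Return every label of the concepts that `term` names in any language."""
    term = term.lower()
    expanded = set()
    for by_lang in labels.values():
        if term in by_lang.values():
            expanded.update(by_lang.values())
    return expanded


def search(term, comments):
    """Return comments mentioning the term or one of its translations."""
    variants = expand_query(term) or {term.lower()}
    return [c for c in comments if any(v in c.lower() for v in variants)]


comments = ["La chambre était propre.", "O quarto é pequeno.", "Great breakfast."]
hits = search("room", comments)
print(hits)
```

An English query for "room" retrieves the French and Portuguese comments because all three labels point to the same concept.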


• Ontology augmenter

Term correlation: considers potential terms that are mentioned in the comments together with terms already present in Hontology.

• In "Rooms are comfortable, but pillows are very hard", the terms "room" (in the ontology) and "pillow" (not in the ontology) should probably be related through a property linking them in Hontology.

• Once the ontology is enriched with the term "pillow", a comment containing, for instance, only the sentence "Pillows are very hard" can be found under the concept "room".
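
A minimal sketch of term correlation, assuming that "correlated" simply means "occurs in the same sentence as a known ontology term". The function names, the stopword list, and the plural stripping are assumptions made for illustration; the slides do not specify how the ontology augmenter computes correlation.

```python
import re
from collections import Counter


def singular(word):
    """Crude plural stripping (an assumption; a real system would use a lemmatizer)."""
    if len(word) > 4 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def correlate_terms(comments, ontology_terms, stopwords=frozenset()):
    """Count (ontology term, candidate term) pairs that share a sentence."""
    known = {t.lower() for t in ontology_terms}
    pairs = Counter()
    for comment in comments:
        for sentence in re.split(r"[.!?]+", comment.lower()):
            words = {singular(w) for w in re.findall(r"[a-z]+", sentence)}
            candidates = words - known - stopwords
            for known_term in words & known:
                for candidate in candidates:
                    pairs[(known_term, candidate)] += 1
    return pairs


comments = ["Rooms are comfortable, but pillows are very hard."]
stop = {"are", "but", "very"}
pairs = correlate_terms(comments, {"room"}, stop)
print(sorted(pairs))
```

The result pairs "room" with "pillow" but also with the adjectives in the sentence, so the candidates would still need filtering or expert review before a property is added to the ontology.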


Extending Hontology

• Ontology augmenter

Rules (or lexical patterns): comments usually contain a set of common adjectives, e.g., good, cheap, and soft.

• Use lexical patterns to extract relevant terms that precede or follow the adjective, e.g.,

• "Air-conditioned is loud", "Small bathroom".
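
The lexical patterns can be sketched with two regular expressions, one for a term that precedes the adjective and one for a term that follows it. The adjective list and both patterns are assumptions chosen to cover the two example comments; they are not the rules used by the authors.

```python
import re

ADJECTIVES = ["good", "cheap", "soft", "loud", "small"]  # sample list; an assumption
ADJ = "|".join(ADJECTIVES)

# "<term> is/are/was/were <adjective>": the term precedes the adjective.
PREDICATIVE = re.compile(
    rf"\b([a-z][a-z-]*)\s+(?:is|are|was|were)\s+(?:very\s+)?({ADJ})\b"
)
# "<adjective> <term>": the term follows the adjective.
ATTRIBUTIVE = re.compile(rf"\b({ADJ})\s+([a-z][a-z-]*)\b")


def extract_terms(comment):
    """Return (term, adjective) pairs found by the two patterns."""
    text = comment.lower()
    found = [(m.group(1), m.group(2)) for m in PREDICATIVE.finditer(text)]
    found += [(m.group(2), m.group(1)) for m in ATTRIBUTIVE.finditer(text)]
    return found


print(extract_terms("Air-conditioned is loud"))
print(extract_terms("Small bathroom"))
```

The attributive pattern accepts any word after the adjective, so phrases such as "small and quiet" would produce a spurious term; a stopword list or part-of-speech tagging would be needed in practice.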


Extending Hontology

• Ontology augmenter

Synonyms

• Synonyms must be considered when improving Hontology.

• They have already been considered in the process of adding labels to the concepts.

• This task can be extended with the help of dictionaries and lexical resources within an automatic process.
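
One way the automatic extension with synonyms might look, assuming labels are stored per concept and language and that a dictionary or thesaurus is available as a plain lookup table. The data layout, the function name, and the sample entries are hypothetical.

```python
def add_synonym_labels(labels, synonyms):
    """Return a copy of `labels` extended with synonyms from a lexical resource.

    labels:   concept -> language -> list of labels
    synonyms: language -> label -> list of synonyms (stand-in for a dictionary)
    """
    enriched = {}
    for concept, by_lang in labels.items():
        enriched[concept] = {}
        for lang, terms in by_lang.items():
            extended = list(terms)
            for term in terms:
                for syn in synonyms.get(lang, {}).get(term, []):
                    if syn not in extended:
                        extended.append(syn)
            enriched[concept][lang] = extended
    return enriched


# Illustrative entries only; not taken from Hontology.
labels = {"Gym": {"en": ["gym"], "pt": ["ginásio"]}}
synonyms = {"en": {"gym": ["fitness centre", "fitness room"]}}
enriched = add_synonym_labels(labels, synonyms)
print(enriched["Gym"]["en"])
```

Languages without an entry in the lexical resource keep their original labels, and the input table is left unmodified.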



Ongoing work

(1) enrich Hontology by using potential terms from comments

(2) exploit Hontology in Multilingual Ontology Matching (i.e., creating correspondences between Hontology and other ontologies)

(3) include labels in other languages

(4) explore the issues related to ontology localization and internationalization.

• Main contribution

– to make available to the community a multilingual ontology that can serve as a baseline for many uses and applications in the context of the Multilingual Semantic Web.


Final Remarks