Social Tags and Linked Data for Ontology Development:
A Case Study in the Financial Domain
Andrés García-Silva†, Leyla Jael García-Castro±,
Alexander García*, Oscar Corcho†
†{hgarcia, ocorcho}@fi.upm.es
Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Universitat Jaume I, Castellón de la Plana, Spain
State University, Florida, USA
June 2014
FPI grant BES-2008-007622
Introduction: Folksonomies
Web 2.0: user-generated content, social networks, and tools for organizing, sharing and discovering information.
Tagging systems → Folksonomy
[Figure: example tags attached to resources, such as "Java", "Programming language", "Tutorial", "Java Persistent Access", "Database" and "Knowledge Base".]
Introduction: Folksonomies as a source of knowledge
• Vocabulary emerges around resources and users: Golder and Huberman (2006), Marlow et al. (2006)
• Maintained by a large user community
• Flexible (not restricted)
• Up to date
• Emergent semantics from the aggregation of individual classifications: Gruber (2007), Mika (2007), Specia and Motta (2007)
State of the art: Folksonomies
[Figure: approaches for extracting semantics from folksonomies, grouped into statistical-based, ontology-based and hybrid approaches.]
Tag similarity measures (when are two tags related?): Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
Ontology generation, statistical-based: Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
Ontology generation, ontology-based: Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
Ontology generation, hybrid: Giannakidou et al. (2008), Specia and Motta (2007)
State of the art: Folksonomies
Approach | Type | Auto | Data Src. | Select. & Cleaning | Context Ident. | Disambiguation | Sem. Ident. | Output | Evaluation | Domain Knowledge
Mika, 2007 | Stat | Yes | Del, Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
Jäschke et al., 2008 | Stat | Yes | Del, Bib | Yes | Yes | No | No | Hier | Desc Study | No
Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Pres/Rec | No
Begelman et al., 2006 | Stat | Yes | Del, Raw | Yes | Yes | No | No | Clus | Desc Study | No
Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Pres/Rec | No
Heymann & Garcia-Molina, 2006 | Stat | Yes | Del, Cit | No | Yes | No | No | Hier | Task-based | No
Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Pres/Rec | No
Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
Specia & Motta, 2007 | Hyb | Semi | Del, Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Cantador et al., 2008 | Ont | Yes | Fli, Del | Yes | Yes | No | Yes | Inst | Pres/Rec | No
Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No
Limitations
Statistical-based:
• Most approaches do not distinguish between classes and instances.
• Relation semantics is limited to a few types and is not precisely defined.
• No domain knowledge.
Ontology-based:
• All approaches produce either enrichments or instances (no classes).
• Relations are not identified.
• No domain knowledge.
Hybrid:
• Semi-automatic ontology generation.
• No domain knowledge.
Proposal
Goal: Generate a baseline domain ontology, containing classes and relationships, from folksonomy information.
[Figure: Folksonomy → Terminology Extraction → List of domain terms → Semantic Elicitation over Linked Open Data*. Domain experts supply domain-relevant resources (URLs) as input; the domain terms drive the extraction of domain classes and relationships from LOD.]
*"Linking Open Data cloud diagram", by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Benefits
We propose a process, driven by the domain terminology in the folksonomy, that extracts domain knowledge from large, generic knowledge bases.
• It may save time in the ontology development process.
• It allows ontology engineers to understand the domain with limited participation of domain experts.
• It yields smaller, more focused ontologies that are potentially easier to understand and maintain.
• Complex queries and reasoning tasks may execute faster on smaller data sets.
• Following methodological practice, the technique harvests community knowledge and reuses existing ontologies.
• The ontology links to external classes and relationships available in the Linked Open Data cloud.
Challenges
Problem: tags lack semantics.
• Ambiguity
• Synonyms
• Acronyms
• Morphological variations: plurals, singulars, verb conjugations
• Misspellings
Approach: Terminology Extraction
Goal: To extract domain terminology from the folksonomy.
Folksonomy: $A \subseteq U \times T \times R$, with graph $G = (V, E)$ where $V = U \cup T \cup R$ and $E = \{(u, t, r) \mid (u, t, r) \in A\}$.
Resource graph: $G' = (V', E')$ where $V' = R$ and $E' = \{(r_i, r_j) \mid \exists\, (u, t_m, r_i) \in A \wedge (u, t_n, r_j) \in A \wedge t_m = t_n\}$.
Spreading activation:
• Seeds: domain-relevant resources provided by domain experts.
• Nodes are weighted with an activation value, which is used to start the search.
• The activation value spreads to adjacent nodes through an activation function.
• The activation function depends on the tags shared by the visited node and the source node, and on the activation value of the source node.
• If the activation function exceeds a threshold, the node is marked as activated and the spreading continues to its adjacent nodes.
• The tags of activated nodes are collected as domain terms.
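The terminology extraction step can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the activation function (source activation multiplied by the Jaccard overlap of the two resources' tag sets), the seed activation of 1.0, and the helper names `resource_tags`, `resource_graph` and `spread_activation` are hypothetical choices. The slide only states that the function depends on shared tags and on the source node's activation. The edge rule follows $E'$: two resources are linked when the same user gave both the same tag.

```python
from collections import defaultdict, deque

# Hypothetical sketch of terminology extraction; not the authors' code.


def resource_tags(assignments):
    """Map each resource to the set of tags it received in A = {(user, tag, resource)}."""
    tags = defaultdict(set)
    for _user, tag, resource in assignments:
        tags[resource].add(tag)
    return tags


def resource_graph(assignments):
    """Build E': link r_i and r_j when the same user gave both the same tag."""
    by_user_tag = defaultdict(set)
    for user, tag, resource in assignments:
        by_user_tag[(user, tag)].add(resource)
    adjacency = defaultdict(set)
    for resources in by_user_tag.values():
        for r_i in resources:
            adjacency[r_i] |= resources - {r_i}
    return adjacency


def spread_activation(assignments, seeds, threshold=0.8):
    """Spread activation from expert-provided seeds; return activations and domain terms."""
    tags = resource_tags(assignments)
    adjacency = resource_graph(assignments)
    activation = {seed: 1.0 for seed in seeds}  # assumed initial value for seeds
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for node in adjacency[source]:
            if node in activation:
                continue  # already activated
            shared = tags[source] & tags[node]
            # Assumed activation function: source activation x Jaccard tag overlap.
            value = activation[source] * len(shared) / len(tags[source] | tags[node])
            if value > threshold:
                activation[node] = value
                queue.append(node)
    # Tags of all activated resources become the domain terms.
    terms = set().union(*(tags[r] for r in activation))
    return activation, terms


A = [
    ("u1", "finance", "r1"), ("u1", "stock", "r1"),
    ("u1", "finance", "r2"), ("u1", "stock", "r2"),
    ("u1", "finance", "r3"), ("u1", "java", "r3"), ("u1", "tutorial", "r3"),
]
activation, terms = spread_activation(A, seeds=["r1"], threshold=0.8)
print(sorted(terms))  # ['finance', 'stock']
```

With the threshold of 0.8, `r3` is not activated because it shares only one of four distinct tags with `r1`; lowering the threshold lets its tags ("java", "tutorial") leak into the terminology, which illustrates why the threshold controls how focused the extracted vocabulary is.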
Approach: Semantic Elicitation
Goal: To relate domain terms (tags) to DBpedia resources.
A. García-Silva, I. Cantador, Ó. Corcho. Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8(3), 24-41, 2012.
• Normalize the tag to the standard notation of DBpedia resource titles.
• Search with SPARQL for a resource whose label equals the normalized tag.
  • If none exists: use a spelling suggestion service and search again.
  • If one exists: check whether it is related to a disambiguation resource.
    • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context:
      • Vector space model
      • Candidate resources represented by their textual descriptions
      • Tag represented by its context (i.e., co-occurring tags)
      • The most similar candidate is selected using cosine similarity
    • If false: select the resource (the default sense in Wikipedia).
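Two of the grounding sub-steps, tag normalization and context-based disambiguation, can be sketched in Python. This is a hypothetical sketch under stated assumptions: the normalization rules (hyphens, underscores and whitespace separate words; first letter upper-cased; words joined with underscores) approximate DBpedia title notation, and the vectors use plain term frequencies, whereas the original approach may weight terms differently. The function names and the two candidate descriptions are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical sketch of two grounding sub-steps; not the authors' implementation.


def normalize_tag(tag):
    """Rewrite a tag in DBpedia title notation, e.g. 'stock-exchange' -> 'Stock_exchange'."""
    words = [w for w in re.split(r"[\s_\-]+", tag.strip()) if w]
    title = "_".join(words)
    return title[:1].upper() + title[1:]


def tokens(text):
    """Lower-cased alphanumeric tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidate(context_tags, candidates):
    """Pick the candidate whose description is most similar to the tag context.

    context_tags: tags co-occurring with the ambiguous tag.
    candidates: mapping from candidate resource name to its textual description.
    """
    context = Counter(tokens(" ".join(context_tags)))
    return max(candidates, key=lambda name: cosine(context, Counter(tokens(candidates[name]))))


candidates = {
    "Stock_exchange": "A stock exchange is a market where traders buy and sell stock shares.",
    "Telephone_exchange": "A telephone exchange connects telephone calls between subscribers.",
}
print(normalize_tag("stock-exchange"))  # Stock_exchange
print(select_candidate(["stock", "market", "finance"], candidates))  # Stock_exchange
```

For an ambiguous tag such as "exchange" co-occurring with "stock", "market" and "finance", the financial sense wins because its description shares terms with the context, while the telephone sense shares none.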
Approach: Semantic Elicitation
Goal: To identify classes among the resources.
• Use an ASK query to verify whether the entity is a class.
• If it is not, create queries that traverse all possible paths of equivalence relations between the entity and a class in the RDF graph.
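PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Prepend the prefixes above to each query; <resource> stands for the IRI of the grounded resource.
# Query 1: is the resource itself a class?
ASK { <resource> rdf:type rdfs:Class }
# Query 2: a class reachable through one owl:sameAs link
SELECT ?class WHERE { <resource> ?rel1 ?class . ?class rdf:type rdfs:Class . FILTER (?rel1 = owl:sameAs) }
# Query 3: a class reachable through two owl:sameAs links
SELECT ?class WHERE { <resource> ?rel1 ?node . ?node ?rel2 ?class . ?class rdf:type rdfs:Class . FILTER ((?rel1 = owl:sameAs) && (?rel2 = owl:sameAs)) }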
RelFinder: Revealing Relationships in RDF Knowledge Bases. Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
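The query family above generalizes to any path length. The Python sketch below generates it as strings; it is a hypothetical helper (the names `is_class_query` and `class_path_query` and the example IRI are assumptions), and it only builds the SPARQL text without contacting an endpoint.

```python
# Hypothetical helper that generates the class-identification queries
# for any path length; not the authors' code.

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def is_class_query(resource_iri):
    """Query 1: ASK whether the resource itself is a class."""
    return PREFIXES + f"ASK {{ <{resource_iri}> rdf:type rdfs:Class }}"


def class_path_query(resource_iri, length):
    """Queries 2, 3, ...: follow `length` owl:sameAs hops from the resource to a class."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Chain of nodes: resource, intermediate variables, and the target ?class.
    nodes = [f"<{resource_iri}>"] + [f"?node{i}" for i in range(1, length)] + ["?class"]
    patterns = [f"  {nodes[i]} ?rel{i + 1} {nodes[i + 1]} ." for i in range(length)]
    condition = " && ".join(f"(?rel{i + 1} = owl:sameAs)" for i in range(length))
    return (
        PREFIXES
        + "SELECT ?class WHERE {\n"
        + "\n".join(patterns)
        + "\n  ?class rdf:type rdfs:Class .\n"
        + f"  FILTER({condition})\n}}"
    )


print(class_path_query("http://dbpedia.org/resource/Stock_exchange", 2))
```

Keeping the relations as variables constrained by FILTER, rather than writing `owl:sameAs` directly in each triple pattern, makes it easy to widen the set of accepted equivalence relations later by extending the condition.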
Approach: Semantic Elicitation
Goal: To identify relations between classes.
• For each pair of classes, create queries that traverse all possible paths between the two classes in the RDF graph, and retrieve the relationships along them.
Caveats
• This may add information that is not relevant to the domain to the ontology, for example when:
  • the path is long;
  • the path passes through abstract concepts or relationships, such as cyc:ObjectType or umbel:RefConcept.
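The path search between two classes, together with the caveats above (long paths, abstract concepts), can be illustrated with an in-memory sketch. This is a hypothetical Python sketch, not the authors' method: the approach issues SPARQL queries in the style of RelFinder, whereas this code enumerates paths over a plain list of triples. The `ex:` names, the `blocked` parameter and the choice to follow edges in both directions are assumptions.

```python
from collections import defaultdict

# Hypothetical sketch: enumerate relation paths between two classes in an
# in-memory list of triples, instead of issuing SPARQL queries.


def find_paths(triples, start, end, max_length, blocked=frozenset()):
    """Return simple paths of at most max_length edges from start to end.

    Edges are followed in both directions; each step is recorded as
    (from_node, predicate, direction, to_node). Nodes or predicates listed in
    `blocked` (e.g. abstract concepts) are never traversed.
    """
    neighbours = defaultdict(list)
    for s, p, o in triples:
        neighbours[s].append((p, o, "->"))
        neighbours[o].append((p, s, "<-"))
    paths = []

    def walk(node, path, visited):
        if node == end and path:
            paths.append(list(path))
            return
        if len(path) == max_length:
            return  # keep paths short
        for predicate, nxt, direction in neighbours[node]:
            if nxt in visited or nxt in blocked or predicate in blocked:
                continue
            path.append((node, predicate, direction, nxt))
            visited.add(nxt)
            walk(nxt, path, visited)
            visited.remove(nxt)
            path.pop()

    walk(start, [], {start})
    return paths


triples = [
    ("ex:Company", "ex:listedOn", "ex:StockExchange"),
    ("ex:StockExchange", "ex:locatedIn", "ex:Country"),
    ("ex:StockExchange", "rdf:type", "cyc:ObjectType"),
    ("ex:Country", "rdf:type", "cyc:ObjectType"),
]
for path in find_paths(triples, "ex:Company", "ex:Country", 3, blocked={"cyc:ObjectType"}):
    print([step[1] for step in path])  # ['ex:listedOn', 'ex:locatedIn']
```

Without the `blocked` set, the same call also returns a three-step path through `cyc:ObjectType`, which connects the two classes only because both are typed with the same abstract concept.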
Approach: Semantic Elicitation
Minimizing the risk of adding non-relevant information to the ontology:
• Keep path lengths short.
  • Our experiments show satisfactory results with short path lengths, which let us enrich the initial set of classes while preserving the precision of the ontology.
• Avoid high-level concepts.
  • Create lists of high-level concepts collected from the knowledge base vocabularies, and filter out the paths that contain those concepts.
  • Knowledge base core vocabularies are usually well documented:
    • http://umbel.org/specications/vocabulary
    • http://mappings.dbpedia.org/server/ontology/classes/
    • http://www.cyc.com/kb/thing
• Use semantic similarity measures:
  • Wu and Palmer (1994): depth of the two classes and of their least common subsumer in the taxonomy.
  • Jiang and Conrath (1997): subclasses per class, class depth, information content, etc.
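The Wu and Palmer measure can be computed as $\mathrm{sim}(c_1, c_2) = \frac{2 \cdot \mathrm{depth}(\mathrm{lcs})}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$, where lcs is the least common subsumer. The Python sketch below assumes single inheritance, counts the root as depth 1, and uses a hypothetical toy taxonomy loosely modelled on the Stock Exchange / Exchange of User Rights example in this deck; the class names and helper functions are illustrative only.

```python
# Hypothetical toy taxonomy (child -> parent); class names are illustrative only.
PARENT = {
    "Organization": "Thing",
    "Company": "Organization",
    "StockExchange": "Organization",
    "Event": "Thing",
    "ExchangeOfUserRights": "Event",
}


def ancestors(cls, parent):
    """Return cls followed by its ancestors up to the root (single inheritance assumed)."""
    chain = [cls]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def depth(cls, parent):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(ancestors(cls, parent))


def wu_palmer(c1, c2, parent):
    """Wu and Palmer (1994): 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    common = set(ancestors(c2, parent))
    # The first ancestor of c1 that is also an ancestor of c2 is the least common subsumer.
    lcs = next((c for c in ancestors(c1, parent) if c in common), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs, parent) / (depth(c1, parent) + depth(c2, parent))


print(round(wu_palmer("Company", "StockExchange", PARENT), 3))  # 0.667
print(round(wu_palmer("StockExchange", "ExchangeOfUserRights", PARENT), 3))  # 0.333
```

A low score, as for an organization linked to an event, could flag a path for review; the cut-off value for discarding such paths would be an additional assumption not fixed by the slides.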
Evaluation: Experiment in the financial domain
[Figure: experiment setup; input: finance vocabulary.]
Evaluation: Experiment in the financial domain
[Figure panels: Terminology Extraction, Finance Vocabulary, Finance Ontology.]
Evaluation: Inspecting a financial ontology
• We ran the process with an activation threshold of 0.8.
• The resulting ontology consists of 187 classes, 378 relations of 8 different types, and 12 modules.
Evaluation: Inspecting a financial ontology
Class precision = 80.67%, relation precision = 96.4%
Ontology modules:
Module | Precision (Class) | Module | Precision (Class)
Organization | 77.80% | Stock Exchange | 84.60%
Company | 88.50% | Money Transactions | 100%
Person | 55.60% | Country | 100%
Union | 3.74% | Research | 100%
Banker | 100% | Driver | 0%
Human | 100% | Member | 100%
Conclusions
• We have developed a method for automatically generating domain ontologies.
  • Limited user participation.
  • We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary.
  • In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data).
  • We tap into existing links between data sets to collect related semantic information.
  • To some extent, we avoid semantic mismatches.
  • We avoid heterogeneous representations.
• In practice, we expect ontology engineers to use the method to generate baseline ontologies that can later be refined according to the ontology requirements.
Future Work
• Develop a method to automatically assess the validity of the relationships found in the Linked Data cloud. For example, OpenCyc's Stock Exchange is owl:sameAs UMBEL's Exchange of User Rights; however:
  • Stock Exchange is an organization.
  • Exchange of User Rights is an event.
• Use semantic similarity measures to decide whether to include the relationships found along a path between two classes.
• Discover and use datasets in the Linked Data cloud that cover the domain of interest.