Folksonomy-Based Collabulary Learning

Leandro Balby Marinho, Krisztian Buza, Lars Schmidt-Thieme {marinho,buza,schmidt-thieme}@ismll.uni-hildesheim.de, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Transcript of Folksonomy-Based Collabulary Learning

  • Folksonomy-Based Collabulary Learning

    Leandro Balby Marinho, Krisztian Buza, Lars Schmidt-Thieme {marinho,buza,schmidt-thieme}@ismll.uni-hildesheim.de

    Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany

  • Motivation Scenario

  • Motivation Scenario

  • Outline
      • Problem Definition
      • Collabulary Learning
      • Folksonomy Enrichment
      • Frequent Itemset Mining for Ontology Learning from Folksonomies
      • Recommender Systems for Ontology Evaluation
      • Experiments and Results
      • Conclusions and Future Work

  • Problem Definition
      • The Semantic Web suffers from a knowledge bottleneck
      • Folksonomies can help. How? Voluntary annotators
      • Educated towards shareable annotation. How? Through a collabulary

  • Problem Definition

    "A possible solution to the shortcomings of folksonomies and controlled vocabulary is a collabulary, which can be conceptualized as a compromise between the two: a team of classification experts collaborates with content consumers to create rich, but more systematic content tagging systems."

    Wikipedia article on Folksonomies (http://en.wikipedia.org/wiki/Folksonomy)

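  • Problem Definition
      • An ontology whose concepts belong to the exclusive union of the tags of a folksonomy F and the concepts of a domain-expert ontology O, together with a knowledge base whose instances belong to the exclusive union of the resources of F and the instances of O, is called a collabulary over F and O
      • Problem: learn a collabulary that best represents the folksonomy and the domain-expert vocabulary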

  • Collabulary Learning

  • Folksonomy to trivial ontology (figure labels: User, Resource, Tag)

  • Matching Concepts

  • Additional tag assignments

  • Expert conceptualization

  • Frequent Itemsets for Learning Ontologies from Folksonomies
      • Most approaches rely on co-occurrence models
      • In sparse structures, positive correlations carry essential information about the data
      • Project the folksonomy to a transactional database and use state-of-the-art frequent itemset mining algorithms

  • Frequent Itemsets for Learning Ontologies from Folksonomies
      • Assumptions for relation extraction from frequent itemsets
          • High-level tag: the more popular a tag is, the more general it is. A tag x is a super-concept of a tag y if there are frequent itemsets containing both tags such that sup({x}) ≥ sup({y})
          • Frequency: the higher the support of an itemset, the more strongly correlated are the items in it
          • Large itemset: preference is given to items contained in larger itemsets
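A minimal sketch of the high-level-tag and frequency assumptions above, restricted to itemsets of size two. The slides do not say how the folksonomy is projected to transactions, so the sketch assumes one transaction per resource holding all tags assigned to it. The function names, the `min_support` threshold and the alphabetical tie-break are hypothetical, and the large-itemset preference is not modelled.

```python
from collections import Counter, defaultdict
from itertools import combinations


def to_transactions(assignments):
    """Project (user, tag, resource) triples to one tag set per resource.

    ASSUMPTION: the projection is per resource; the slides do not specify it.
    """
    tags_by_resource = defaultdict(set)
    for _user, tag, resource in assignments:
        tags_by_resource[resource].add(tag)
    return list(tags_by_resource.values())


def candidate_relations(transactions, min_support=2):
    """Return (super_concept, sub_concept, pair_support) for frequent tag pairs."""
    single = Counter()
    pair = Counter()
    for tags in transactions:
        single.update(tags)
        pair.update(combinations(sorted(tags), 2))

    relations = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue  # the pair is not a frequent itemset
        # High-level tag assumption: the more popular tag is the more general one.
        # Ties keep alphabetical order (a hypothetical tie-break).
        general, specific = (a, b) if single[a] >= single[b] else (b, a)
        relations.append((general, specific, support))

    # Frequency assumption: pairs with higher support come first.
    relations.sort(key=lambda rel: (-rel[2], rel[0], rel[1]))
    return relations


assignments = [
    ("u1", "classical", "r1"), ("u1", "chopin", "r1"),
    ("u2", "classical", "r2"), ("u2", "chopin", "r2"),
    ("u3", "classical", "r3"), ("u3", "jazz", "r3"),
]
print(candidate_relations(to_transactions(assignments)))
# prints [('classical', 'chopin', 2)]
```

A dedicated frequent itemset miner would replace the pair counting here once itemsets larger than two are needed.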

  • Frequent Itemsets for Learning Ontologies from Folksonomies

  • Recommender Systems for Ontology Evaluation
      • Ontologies can facilitate browsing, search and information finding in folksonomies
      • They should be evaluated in this respect
      • Recommender systems are programs for personalized information finding
      • Let the recommender tell which is the best ontology

  • Recommender Systems for Ontology Evaluation
      • Task: recommend useful resources
      • Application: ontology-based collaborative filtering
      • Ontologies: a trivial ontology (folksonomy), a domain-expert ontology, and a collabulary
      • Gold standard: test set

    Porzel, R., Malaka, R.: A task-based approach for ontology evaluation. In: Proc. of ECAI 2004, Workshop on Ontology Learning and Population, Valencia, Spain

  • Recommender Systems for Ontology Evaluation

    User 1 := (res1:=1)T

    Ziegler, C., Schmidt-Thieme, L., Lausen, G.: Exploiting semantic product descriptions for recommender systems. In: Proc. of the 2nd ACM SIGIR Semantic Web and Information Retrieval Workshop (SWIR 2004), Sheffield, UK

  • Experiments and Results
      • Datasets
          • Last.fm (folksonomy)
          • Musicmoz (domain-expert ontology)
          • Only the resources contained in both were considered

  • Experiments and Results
      • Folksonomy enrichment
          • Edit distance to handle duplications
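The slide only names edit distance as the tool for handling duplications. As an illustration, here is a sketch of the standard Levenshtein distance with a small helper that flags near-duplicate names. The helper `near_duplicates`, the lower-casing and the `max_distance` threshold are assumptions, not details taken from the talk.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def near_duplicates(name, candidates, max_distance=2):
    """Hypothetical helper: candidates within max_distance edits of name, ignoring case."""
    key = name.lower()
    return [c for c in candidates if edit_distance(key, c.lower()) <= max_distance]


print(edit_distance("kitten", "sitting"))  # prints 3
print(near_duplicates("Chopin", ["chopin", "Choppin", "Jazz"]))
# prints ['chopin', 'Choppin']
```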

  • Frequent Itemsets for Learning Ontologies from Folksonomies

  • Frequent Itemsets for Learning Ontologies from Folksonomies

  • Recommender Systems for Ontology Evaluation

      • Top-10 best recommendations / Allbut1 protocol
      • Neighborhood size: 20
      • Recall := number of hits / number of test users
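The recall measure above can be sketched as follows, assuming that under the Allbut1 protocol exactly one resource is withheld per test user and that a hit means the withheld resource appears in that user's top-N list. The function name and the dictionary layout are hypothetical.

```python
def recall_at_n(recommendations, held_out, n=10):
    """Recall := number of hits / number of test users.

    recommendations: user -> ranked list of recommended resources
    held_out:        user -> the single resource withheld for that user
    """
    if not held_out:
        return 0.0
    hits = sum(
        1
        for user, resource in held_out.items()
        if resource in recommendations.get(user, [])[:n]
    )
    return hits / len(held_out)


recommendations = {"u1": ["a", "b"], "u2": ["c"]}
held_out = {"u1": "b", "u2": "x", "u3": "y"}
print(recall_at_n(recommendations, held_out))       # one hit among three test users
print(recall_at_n(recommendations, held_out, n=1))  # prints 0.0
```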

  • Conclusions and Future Work
      • Conclusions
          • Folksonomies can alleviate the knowledge bottleneck
          • Users need to be educated towards a more shareable vocabulary, though
          • Collabularies can help
      • Our Contributions
          • Definition of the collabulary learning problem
          • An approach for enriching folksonomies with domain-expert knowledge
          • A new algorithm for learning ontologies from folksonomies
          • A new benchmark for task-based ontology evaluation
      • Future Work
          • Non-taxonomic relations?
          • Different enrichment strategies?
          • Optimized structure for the task with constraints?

  • Thanks for your attention!

  • Frequent Itemsets for Learning Ontologies from Folksonomies

    Let's start with a motivating scenario. Suppose you are trying to organize your digital music collection using tags. You hear a piece by Chopin, you find it relaxing, and so you assign the tag "chill out" to it. This is perfectly fine for you and will probably help you find this music again in the future, but will it also help others? "Chill out" is a very broad concept, and there is a lot of music to chill out to, so if you really want to share this music with other users you should choose a more widely agreed vocabulary. Domain experts are great at that, because they usually define a vocabulary that most people agree with. If you asked an expert, though, he would probably tag Chopin with "classic music", a more widely agreed tag than "chill out".

    Sometimes users try to make their resources more shareable, but since they are not specialists, they may choose concepts that are not well agreed upon for the resource in question. This is a very typical scenario in folksonomies: users are free to tag and are even allowed to make mistakes. That is perhaps why folksonomies are so popular.

    When I think about folksonomy users, I think about my flat: a friendly mess, but a mess where I can usually find what I want. But if I want to make it more sociable, if I want to invite you over, I would probably need the help of an expert at making things look nicer: my mother. Now, as you can see, my flat looks nicer, but I cannot find my stuff anymore.

    That's the problem we tried to tackle in this paper: finding a trade-off between the friendly mess of folksonomies and the more systematic world of domain experts.

    Here is the outline of the talk. We start by defining the problem we want to address. Then we show our approach, which is decomposed into two sub-tasks: enriching a folksonomy with domain-expert knowledge, and using frequent itemset mining techniques to learn an ontology over the enriched folksonomy.

    The Semantic Web needs ontologies; ontologies need annotators; annotators are usually expensive; but folksonomies have voluntary annotators. Voluntary annotators are not necessarily good annotators, however. One needs to educate folksonomy users towards a more widely agreed vocabulary. The question is how to do it. We argue that this can be done through a collabulary.

    For our purposes, we define a collabulary as a special ontology (e.g. a domain-expert ontology) whose concepts belong to the exclusive union of the tags of a folksonomy and the concepts of an ontology, and whose instances belong to the exclusive union of the instances of an ontology and the resources of a folksonomy.

    Here we have simplified definitions of folksonomies, ontologies and knowledge bases.

    A folksonomy is a tuple F := (U, T, R, Y), where U, T and R are finite sets of users, tags and resources, and Y is a ternary relation between them whose elements are called tag assignments.

    An ontology is a tuple O := (C, ≤_C), where C is a set of concept identifiers and ≤_C is a partial order on C with a unique top element called root.

    A knowledge base is a structure with a set of instance identifiers I and a function ι associating concepts with instances.
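The three definitions above can be written down as plain data structures. This is only an illustrative sketch: the class and field names are hypothetical, and storing the partial order as a parent map assumes the ontology is a tree, which is narrower than a general partial order with a unique top element. The concept "romantic" in the example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Folksonomy:
    """Users U, tags T, resources R and tag assignments Y as (user, tag, resource) triples."""
    users: frozenset
    tags: frozenset
    resources: frozenset
    assignments: frozenset

    def __post_init__(self):
        # Every tag assignment may only use known users, tags and resources.
        for user, tag, resource in self.assignments:
            if user not in self.users or tag not in self.tags or resource not in self.resources:
                raise ValueError(f"invalid tag assignment: {(user, tag, resource)}")


@dataclass(frozen=True)
class Ontology:
    """Concepts with a partial order. ASSUMPTION: the order is stored as a tree."""
    root: str
    parent: dict  # concept -> direct super-concept (the root has no entry)

    def concepts(self):
        return {self.root} | set(self.parent)

    def is_sub_concept(self, c, d):
        """True if c is below or equal to d, i.e. d is c or one of its ancestors."""
        while c != d:
            if c == self.root:
                return False
            c = self.parent[c]
        return True


@dataclass(frozen=True)
class KnowledgeBase:
    """Instance identifiers I plus iota, stored as concept -> set of instances."""
    instances: frozenset
    iota: dict


folksonomy = Folksonomy(
    users=frozenset({"u1"}),
    tags=frozenset({"chill out"}),
    resources=frozenset({"chopin"}),
    assignments=frozenset({("u1", "chill out", "chopin")}),
)
expert = Ontology(root="root", parent={"classic music": "root", "romantic": "classic music"})
print(expert.is_sub_concept("romantic", "root"))  # prints True
```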

    Of course, given this definition of a collabulary, there are several ways to build one. Here we show our approach and argue why it is a good one.

    Here we have an overview of our approach. We take a folksonomy and a domain-expert ontology as input and generate a folksonomy enriched with domain-expert knowledge. We then use frequent itemset mining techniques to induce an ontology, in this case a collabulary, from it.

    Turning the folksonomy into a trivial ontology is quite easy: we just connect all the tags to the root and check the tag assignments to identify the instances of each concept.

    Notice that we now have an ontology matching problem. To solve it, we want a well-defined similarity measure that does not depend on the lexical layer of the ontologies, since relying on the syntactic description of tags is not suitable given the uncontrolled vocabulary of folksonomy users.
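The trivial-ontology step described above (connect every tag to the root, then read the instances of each concept off the tag assignments) can be sketched in a few lines. The function name and the `(user, tag, resource)` triple layout are assumptions made for illustration.

```python
from collections import defaultdict


def trivial_ontology(assignments, root="root"):
    """Build a flat ontology from (user, tag, resource) tag assignments.

    Returns (parent, instances):
      parent    maps every tag to the root, giving a one-level hierarchy
      instances maps every tag to the set of resources tagged with it
    """
    parent = {}
    instances = defaultdict(set)
    for _user, tag, resource in assignments:
        parent[tag] = root
        instances[tag].add(resource)
    return parent, dict(instances)


assignments = [
    ("u1", "chill out", "chopin"),
    ("u2", "classic music", "chopin"),
    ("u2", "classic music", "bach"),
]
parent, instances = trivial_ontology(assignments)
print(parent)  # prints {'chill out': 'root', 'classic music': 'root'}
```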

    For that we used the well-known Jaccard coefficient, computed for two concepts A and B coming from different ontologies.

    Why learn ontologies in the first place? The non-hierarchical nature of folksonomies can somewhat restrict users' ability to find information, as browsing is constrained to a flat structure in which tag relations are disregarded.
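For the Jaccard coefficient mentioned above, the transcript does not preserve the formula. The sketch below uses the standard definition, intersection size over union size, and assumes it is applied to the instance sets of the two concepts, which is consistent with a measure that ignores the lexical layer. The function name and the handling of two empty sets are assumptions.

```python
def jaccard(instances_a, instances_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two instance sets.

    ASSUMPTION: concepts are compared through their instances, not their names.
    Two empty sets are given similarity 0.0 to avoid dividing by zero.
    """
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# A folksonomy tag and an expert concept that share one of three instances.
print(jaccard({"chopin", "bach"}, {"chopin", "mozart"}))
print(jaccard({"chopin"}, {"chopin"}))  # prints 1.0
```

Because only instance sets are compared, a tag such as "chill out" can match an expert concept even when the two names have nothing in common.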