Semantic Need: Guiding Metadata Annotations by Questions People #ask
Hans-Jörg Happel, FZI Karlsruhe, Germany
2010-11-09 @ 9th Int. Semantic Web Conference (ISWC 2010), Shanghai, China
Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook
Semantic Need: Guiding Metadata Annotations by Questions People #ask - ISWC 2010; Shanghai, China 2
Semantic MediaWiki (SMW)
• SMW is a popular Semantic Web application for annotating wiki pages semantically
• Semantic interpretation of the existing wiki categories
• Syntax extension for [[Wiki links]]
– Relations to other pages: [[Capital::Abuja]]
– Literals: [[Inhabitants::182418]]
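To make the annotation syntax concrete, here is a minimal Python sketch that pulls [[Property::Value]] pairs out of page text. The function name and the regular expression are illustrative assumptions, not SMW's own parser, and the pattern deliberately ignores plain links and category links.

```python
import re

# Matches [[Property::Value]] and [[Property::Value|label]].
# Plain links ([[Wiki links]]) and category links ([[Category:Country]])
# contain no "::" and are therefore skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def extract_annotations(wikitext):
    """Return a list of (property, value) pairs found in the page text."""
    return [(prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]

# Hypothetical page text built from the two annotations shown above.
page = ("The capital is [[Capital::Abuja]] with [[Inhabitants::182418]] "
        "inhabitants. See also [[Wiki links]].")
annotations = extract_annotations(page)
print(annotations)
```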
Structured Queries in SMW
• SMW also allows for structured queries
{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}
SMW Query Result
{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}
SMW resembles the Semantic Web in miniature
What happened to „Nigeria“?
• Info might be missing
• …not annotated properly
• Different property name
• Distributed data source not available
Semantic Gaps
• Observation:
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– Due to the evolutionary nature of the (Semantic) Web (OWA)
• What is missing? – i.e.:
– KB: Axioms that are known (e.g. statements about Nigeria)
– XKB: Axioms not yet known but people would like to know
Towards Semantic Need
• Research questions
– How to identify „Semantic Gaps“?
– Do „Semantic Gaps“ exist?
– If yes, how to close these gaps?
• Research approach
– Propose heuristics
– Explorative: Analyze Public Semantic Web
– Constructive: Design and evaluate tools
Idea: Guide Annotation by Information Needs
• Means for deriving information needs
– (Structured) queries
– Information access/browsing
– Context
– …?
• We chose to focus on queries
– Explicit; can be captured easily
– Express a „demand“ [Mik09]
– Recur across time and different people (at least in IR! [Smy05, Tee06, Zha09])
Identifying „Semantic Gaps“
• Focus on
– Conjunctive queries
– Retrieving instances and their properties
• Core elements
{{#ask: [[Category:Country]] [[OfContinent::Africa]] |?hasArea |?population |?hasCapital |?Currency}}
– Conditions: [[Category:Country]] [[OfContinent::Africa]]
– Printout statements: ?hasArea, ?population, ?hasCapital, ?Currency
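The split into conditions and printout statements can be sketched in a few lines of Python. This is a simplified illustration, not SMW's query parser: the function name parse_ask is hypothetical, and the sketch assumes that conditions contain no "|" labels and that extra parameters such as output formats can be ignored.

```python
import re

def parse_ask(query):
    """Split an inline #ask query into conditions and printout statements."""
    body = query.strip()
    # Drop the {{#ask: ... }} wrapper if present.
    if body.startswith("{{#ask:") and body.endswith("}}"):
        body = body[len("{{#ask:"):-2]
    parts = [part.strip() for part in body.split("|")]

    # The first part holds the conditions, written as [[...]] atoms.
    conditions = []
    for inner in re.findall(r"\[\[(.+?)\]\]", parts[0]):
        if "::" in inner:
            prop, value = inner.split("::", 1)
            conditions.append((prop.strip(), value.strip()))
        elif inner.startswith("Category:"):
            conditions.append(("Category", inner[len("Category:"):].strip()))

    # Remaining parts that start with "?" are printout statements.
    printouts = [part[1:].strip() for part in parts[1:] if part.startswith("?")]
    return conditions, printouts

conditions, printouts = parse_ask(
    "{{#ask: [[Category:Country]] [[OfContinent::Africa]]"
    "|?hasArea|?population|?hasCapital|?Currency}}"
)
print(conditions)
print(printouts)
```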
Semantic Gap Heuristic #1: Near Matches
• Instance I ∈ KB is considered a near match of a query q if:
– I is not in the result set of q in KB
– There is at least one conjunctive query atom of q for which I is part of the result set
– I would be in the result set of q in KB ∪ XKB
• Correspondingly, we consider q to have an incomplete result set if it has „near matches“
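A rough Python sketch of this heuristic follows. The in-memory KB layout (page name mapped to property/value sets, with the category stored under a "Category" key) and all function names are assumptions made for illustration. Only the first two criteria can be checked automatically; the third depends on XKB, which is unknown by definition, so the function returns candidates that still need validation.

```python
def satisfies(page_facts, condition):
    """True if the page carries the (property, value) pair of one query atom."""
    prop, value = condition
    return value in page_facts.get(prop, set())

def result_set(kb, conditions):
    """Pages that satisfy every atom of a conjunctive query."""
    return {page for page, facts in kb.items()
            if all(satisfies(facts, c) for c in conditions)}

def near_match_candidates(kb, conditions):
    """Pages outside the result set that satisfy at least one atom.

    Maps each candidate page to the atoms it lacks.
    """
    candidates = {}
    for page, facts in kb.items():
        missing = [c for c in conditions if not satisfies(facts, c)]
        if missing and len(missing) < len(conditions):
            candidates[page] = missing
    return candidates

# Hypothetical toy KB; Egypt lacks the OnContinent annotation.
kb = {
    "Nigeria": {"Category": {"Country"}, "OnContinent": {"Africa"}},
    "Egypt": {"Category": {"Country"}},
    "Germany": {"Category": {"Country"}, "OnContinent": {"Europe"}},
    "Nile": {"Category": {"River"}},
}
conditions = [("Category", "Country"), ("OnContinent", "Africa")]
results = result_set(kb, conditions)
candidates = near_match_candidates(kb, conditions)
print(results)
print(candidates)
```

In this toy KB, Germany also shows up as a candidate although nothing is missing there, which illustrates why candidates have to be validated before they are treated as gaps.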
Semantic Gap Heuristic #1: Example
{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}
Egypt: 1.001.449 km², 83.082.869, Cairo, Egyptian pound
Lacks annotation [[OnContinent::Africa]]: „Near Match“
Semantic Gap Heuristic #2: Missing Printout Values
• Instance I ∈ KB is considered to have missing printout values for a query q if:
– I is part of the result set of q
– q contains a printout statement x for which no property value of I exists in the KB
• Note: Technically, „missing printout values“ can be considered equivalent to near matches
– SPARQL requires the „OPTIONAL“ modifier to yield missing printout values
– SMW-QL allows printout values to be marked as required
• Correspondingly, we consider q to have a sparse result set if it has at least one „missing printout value“
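The second heuristic can be sketched in the same style. As before, the KB layout and the function name are assumptions for illustration, not the extension's implementation; the check mirrors the two criteria above: the page must be in the result set, and at least one printout property must have no value.

```python
def missing_printout_values(kb, conditions, printouts):
    """Map each result page to the printout properties it has no value for."""
    gaps = {}
    for page, facts in kb.items():
        # Criterion 1: the page must be part of the result set of the query.
        in_result = all(value in facts.get(prop, set())
                        for prop, value in conditions)
        if not in_result:
            continue
        # Criterion 2: a printout statement without any property value.
        missing = [prop for prop in printouts if not facts.get(prop)]
        if missing:
            gaps[page] = missing
    return gaps

# Hypothetical toy KB; Nigeria has no Currency annotation.
kb = {
    "Nigeria": {"Category": {"Country"}, "OnContinent": {"Africa"},
                "hasCapital": {"Abuja"}},
    "Egypt": {"Category": {"Country"}, "OnContinent": {"Africa"},
              "hasCapital": {"Cairo"}, "Currency": {"Egyptian pound"}},
}
conditions = [("Category", "Country"), ("OnContinent", "Africa")]
printouts = ["hasCapital", "Currency"]
gaps = missing_printout_values(kb, conditions, printouts)
print(gaps)
```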
Semantic Gap Heuristic #2: Example
„Missing Printout Values“
Public SMW Analysis: Design
• Goal
– Do „Semantic Gaps“ exist?
– Find out significance of missing result values and near matches in real world queries
• Crawling public SMW installations
– Collected ~200 public SMW installations via overview lists and search engines
– Selection of 8 SMW instances (filtered based on data and technical reasons and random choice)
– Those have on average 1880 annotations and 35 inline queries
• Checking for sparse & incomplete query results
– Analyzing 25 (out of 285) queries (only ASK queries, only "Table" output format, only queries with printout statements or conjunctions)
– 17 of these queries were located on Template pages
Public SMW Analysis: Results
• Printout values
– On average, 16% of cells in a result set were empty due to missing annotations (up to 63% for certain queries)
– This allows for identifying a total of 296 missing printout values
– Validation showed that 13 out of 15 manually investigated empty cells could be considered missing information
• Near matches
– On average, 22% of all potential result pages of a query lack a selective annotation (up to 94% for certain queries)
– This allows for identifying a total of 147 potentially missing annotations for “selective” properties
– Validation showed that 10 out of 15 manually investigated near matches could be considered missing information
• Note: based on evaluation conditions, only around 9% of the overall inline queries were analyzed
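The two per-query measures can be written down as simple ratios. The sketch below is one plausible reading, not the study's evaluation code: it assumes that "potential result pages" means actual results plus near-match candidates, and the function names and toy table are hypothetical. The last lines reproduce the coverage figure from the numbers given in the study design (25 of 285 queries).

```python
def empty_cell_share(table):
    """Share of empty cells in a result table (rows = pages, columns = printouts)."""
    cells = [cell for row in table for cell in row]
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell is None) / len(cells)

def near_match_share(num_results, num_near_matches):
    """Share of potential result pages that lack a selective annotation.

    Assumption: potential result pages = actual results + near matches.
    """
    potential = num_results + num_near_matches
    return num_near_matches / potential if potential else 0.0

# Toy result table with one empty cell out of four.
table = [["Abuja", None], ["Cairo", "Egyptian pound"]]
sparse_share = empty_cell_share(table)
incomplete_share = near_match_share(num_results=3, num_near_matches=1)

# Coverage of the analysis: 25 analyzed queries out of 285 inline queries.
coverage = 25 / 285
print(sparse_share, incomplete_share, round(coverage * 100))
```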
Extension:Semantic Need: Idea
• Goal:
– How to close „Semantic Gaps“?
– Guide the creation of semantic annotations in SMW
• Design principles
– „Need-driven Knowledge Sharing“ [Hap09b]
– People are willing to contribute missing information, if they recognize that there is concrete demand
– Derived from related work and supported by user studies
• Core features
– Capture and store needs (i.e. #ask-queries)
– Guide annotations by extending and modifying the SMW user interface based on information need heuristics (i.e. „near matches“ and „missing printout values“)
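The "capture and store needs" feature could look roughly like the following in-memory sketch. It is purely illustrative and not how the extension is implemented: the class NeedStore, its methods, and the page titles are hypothetical, and the hint logic ignores query conditions for brevity.

```python
from collections import defaultdict

class NeedStore:
    """Remembers which pages contain queries asking for which properties."""

    def __init__(self):
        # property name -> set of pages whose queries print that property
        self._sources = defaultdict(set)

    def record(self, source_page, printouts):
        """Store the printout properties of a query found on source_page."""
        for prop in printouts:
            self._sources[prop].add(source_page)

    def sources_of_need(self, prop):
        """Pages whose stored queries ask for the given property."""
        return sorted(self._sources.get(prop, set()))

    def hints_for(self, page_facts):
        """Demanded properties that the given page does not annotate yet."""
        return sorted(prop for prop in self._sources if not page_facts.get(prop))

store = NeedStore()
store.record("List of African countries", ["hasCapital", "Currency"])
store.record("Currencies", ["Currency"])

# A page that annotates its capital but not its currency.
hints = store.hints_for({"hasCapital": {"Abuja"}})
sources = store.sources_of_need("Currency")
print(hints, sources)
```

Showing the source pages next to each hint follows the design principle above: contributors see that there is concrete demand for the missing value.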
Screenshot: In-Page Annotation
Hint
Sources of need
Semantic Need Online Survey: Design
• 34 questions on SMW and Semantic Need
• Target group: SMW experts (via mailing list, invitation)
• Data collected in June/July 2010
• 30 complete answers (out of 58)
Semantic Need Online Survey: Semantic Need can help
• Problem patterns do occur
– Sparse result set: 12/30 considered problematic
– Incomplete result set: 23/30 considered problematic
• Stressed in free text
• Core issue: „invisibility“ of the issue
• Usage of SMW differs
– „Structured“ settings focus on quality
– „Open“ settings focus on guidance
– Semantic Need generally considered helpful by both groups
Maintenance practices: mostly ad hoc
• Methods & tools used to maintain semantic data
– (7: n.a.; due to given external data model)
– 12: none
– 5: „simple“
– 7: „advanced“ (e.g. scripts, documentation, team decisions)
• How to find missing annotations for a given page
– 6: Compare similar pages („extensional“)
– 7: Check schema („intensional“)
– 4: Text analysis
– 10: Use query
Insights
• „Semantic Gaps“ do exist
– Information needs are a valuable source to find them
– „Missing printout values“ and „near matches“ seem to be useful heuristics
– Especially „incomplete result sets“ are considered problematic
• No systematic guidance & gardening of SMW knowledge bases
Design Implications
• Semantic Annotation
– Issue: Costly, often driven by a pre-defined ontology structure
– Idea: Consider “incentives for annotation” [Han05]
• Semantic Search
– Issue: Decoupling of provision & access
– Idea: Consider information needs (need specification/ontology; maintain semantic query logs)
• Data Quality/Gardening/Maturing
– Issue: The Semantic Web evolves continuously
– Idea: Allow for better data quality modeling
Summary & Outlook
• Main contributions
– How to identify „Semantic Gaps“? Heuristics based on queries
– Do „Semantic Gaps“ exist? Yes
– If yes, how to close these gaps? Semantic Need
• Next steps
– Large scale analysis of „Semantic Gaps“ (more public SMW instances)
– Provide a stable implementation and gather feedback from field usage of Semantic Need
• Further research opportunities
– Use needs to guide the sharing of semantic annotations
– Use needs to create schema-level mappings or for class/property evolution
– Many more (Semantic query logs, UI, Incentives, …)
References
• Extension:Semantic Need
– http://amazonas.fzi.de/semanticneed/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Semantic_Need
• Extension:Woogle4MediaWiki (for non-SMW wikis)
– http://amazonas.fzi.de/wooglenative/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Woogle4MediaWiki
• Literature
– [Han05] Handschuh, Siegfried: Creating ontology-based metadata by annotation for the semantic web. Dissertation, 2005.
– [Hap09b] Happel, Hans-Jörg: Towards Need-driven Knowledge Sharing in Distributed Teams. In: Proceedings of the 9th International Conference on Knowledge Management (I-KNOW 2009).
– [Hap09c] Happel, Hans-Jörg: Social Search and Need-driven Knowledge Sharing in Wikis with Woogle. In: Proceedings of the 5th International Symposium on Wikis and Open Collaboration (Orlando, Florida, October 25-27, 2009). WikiSym '09. ACM, New York, NY, pp. 1–10.
– [Mik09] Mika, P.; Meij, E.; Zaragoza, H.: Investigating the semantic gap through query log analysis. In: International Semantic Web Conference. Lecture Notes in Computer Science, vol. 5823, pp. 441–455. Springer (2009).
– [Smy05] Smyth, Barry; Balfe, Evelyn; Freyne, Jill; Briggs, Peter; Coyle, Maurice; Boydell, Oisin: Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. In: User Modeling and User-Adapted Interaction 14 (2005), no. 5, pp. 383–423.
– [Tee06] Teevan, Jaime; Adar, Eytan; Jones, Rosie; Potts, Michael: History repeats itself: repeat queries in Yahoo’s logs. In: SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2006, pp. 703–704.
– [Zha09] Zhang, Dell; Lu, Jinsong: What queries are likely to recur in web search? In: SIGIR ’09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2009, pp. 827–828.
The Semantic Web: Problems
• Lack of resources: you might not annotate everything
– Metadata creation is costly
– Access to metadata might be restricted to different spheres of sharing (private, friends, world…)
– “..probably the most important [open question] for the Semantic Web. How to create incentives for annotation?” (Handschuh 2005, p198) [12]
• Lack of guidance: you might annotate the wrong things
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– The two processes of metadata creation and metadata use are decoupled concerning time and actors
– Existing annotation approaches drive the annotation process by the pre-defined ontology structure
• No unified theory of why metadata is created and how it is shared
– The Semantic Web vision does not address the creator side of metadata: it spends a lot of effort on convincing people to use the Semantic Web, but not to contribute to it