Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets
Gong Cheng, Yanan Zhang, Yuzhong Qu
Websoft Research Group, State Key Laboratory for Novel Software Technology
Nanjing University, China
Association search
air pollution ? autism
You ?
Association search on the Web of documents
associations hidden in text
Association search on an entity-relation graph
(Figure: entity-relation graph linking Alice and Bob. Nodes: paper-A, paper-B, paper-C, paper-D, article-A, conf-A, conf-B. Edge labels: firstAuthor, secondAuthor, inProcOf, cites, extends, reviewer, chair.)
associations exposed as graph
association = path
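Associations between Alice and Bob:
Alice --secondAuthor-- paper-A --inProcOf-- conf-A --reviewer-- Bob
Alice --firstAuthor-- paper-B --inProcOf-- conf-B --chair-- Bob
Alice --firstAuthor-- paper-B --cites-- paper-C --firstAuthor-- Bob
Alice --secondAuthor-- paper-D --cites-- paper-C --firstAuthor-- Bob
Alice --secondAuthor-- paper-D --extends-- article-A --firstAuthor-- Bob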
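The idea that an association is a path can be sketched as a bounded depth-first search over the toy graph. This is only an illustration, not the system's path finder: the edge list is read off the slide, edges are treated as undirected because the transcript does not preserve directions, and the names `EDGES`, `build_adjacency`, and `find_associations` are hypothetical.

```python
from collections import defaultdict

# Toy graph from the slides. Directions are an assumption: the transcript
# does not preserve them, so every edge is traversable both ways.
EDGES = [
    ("Alice", "secondAuthor", "paper-A"),
    ("paper-A", "inProcOf", "conf-A"),
    ("conf-A", "reviewer", "Bob"),
    ("Alice", "firstAuthor", "paper-B"),
    ("paper-B", "inProcOf", "conf-B"),
    ("conf-B", "chair", "Bob"),
    ("paper-B", "cites", "paper-C"),
    ("paper-C", "firstAuthor", "Bob"),
    ("Alice", "secondAuthor", "paper-D"),
    ("paper-D", "cites", "paper-C"),
    ("paper-D", "extends", "article-A"),
    ("article-A", "firstAuthor", "Bob"),
]


def build_adjacency(edges):
    """Map each node to its (property, neighbour) pairs, in both directions."""
    adj = defaultdict(list)
    for subject, prop, obj in edges:
        adj[subject].append((prop, obj))
        adj[obj].append((prop, subject))
    return adj


def find_associations(adj, start, end, max_hops):
    """Enumerate simple paths from start to end with at most max_hops edges."""
    results = []

    def dfs(node, path, visited):
        if node == end:
            results.append(tuple(path))
            return
        hops_so_far = (len(path) - 1) // 2  # path alternates node, property, node
        if hops_so_far >= max_hops:
            return
        for prop, neighbour in adj[node]:
            if neighbour not in visited:
                dfs(neighbour, path + [prop, neighbour], visited | {neighbour})

    dfs(start, [start], {start})
    return results


associations = find_associations(build_adjacency(EDGES), "Alice", "Bob", max_hops=3)
for path in associations:
    print(" -- ".join(path))
```

On this toy graph a 3-hop bound returns exactly the five associations listed above. On a graph the size of DBpedia the number of paths grows quickly with the hop bound, which is the challenge the next slide raises.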
Challenge
over 1,000 associations in DBpedia (within 4 hops)
How to explore them?
Exploration methods (1)
• Clustering
• Facets
cluster = pattern
Position:     1             2        3         4           5
association:  secondAuthor  paper-A  inProcOf  conf-A      reviewer
association:  firstAuthor   paper-B  inProcOf  conf-B      chair
pattern:      author        Paper    inProcOf  Conference  role
Both associations match the pattern: each property in the pattern is a common super-property, and each class a common class, of the values at that position.
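A minimal sketch of the matching relation follows. It assumes a toy ontology that is not spelled out on the slides: `SUPER` (super-class and super-property links) and `TYPE_OF` (entity types) are hypothetical and chosen only to be consistent with the example, as are the function names.

```python
# Hypothetical toy ontology: child -> parent for classes and properties.
SUPER = {
    "ConfPaper": "Paper",
    "firstAuthor": "author",
    "secondAuthor": "author",
    "reviewer": "role",
    "chair": "role",
}

# Hypothetical direct types of the entities in the toy graph.
TYPE_OF = {
    "paper-A": "ConfPaper", "paper-B": "ConfPaper",
    "paper-C": "Paper", "paper-D": "Paper",
    "article-A": "Article",
    "conf-A": "Conference", "conf-B": "Conference",
}

# The five associations between Alice and Bob, endpoints omitted:
# positions 1, 3, 5 hold properties; positions 2, 4 hold entities.
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]


def ancestors(term):
    """Return the term together with all of its super-classes/super-properties."""
    chain = [term]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return set(chain)


def matches(association, pattern):
    """True if every position of the association is subsumed by the pattern."""
    if len(association) != len(pattern):
        return False
    for index, (value, required) in enumerate(zip(association, pattern)):
        # Even indices (positions 1, 3, 5) are properties; odd ones are entities.
        term = value if index % 2 == 0 else TYPE_OF[value]
        if required not in ancestors(term):
            return False
    return True


PATTERN = ("author", "Paper", "inProcOf", "Conference", "role")
matched = [a for a in ASSOCIATIONS if matches(a, PATTERN)]
frequency = len(matched) / len(ASSOCIATIONS)
```

Under these assumptions the pattern is matched by the two associations that pass through a conference, so its frequency over the five example associations is 2/5.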
Problem: To recommend k patterns
Step 1: Mining all significant patterns
Example pattern: author -- Paper -- inProcOf -- Conference -- role
Matched by 2 of the 5 associations between Alice and Bob (the ones through paper-A/conf-A and paper-B/conf-B)
frequency = 2/5 > threshold
Formulated as frequent itemset mining
1. transaction = association; item = <position, class> or <position, property>
2. Mining frequent itemsets
3. itemset -> pattern
Example association: secondAuthor -- paper-A -- inProcOf -- conf-A -- reviewer
Its items, by position:
Position 1: <1, secondAuthor>, <1, author>
Position 2: <2, ConfPaper>, <2, Paper>
Position 3: <3, inProcOf>
Position 4: <4, Conference>
Position 5: <5, reviewer>, <5, role>
A frequent itemset: <1, author>, <2, ConfPaper>, <2, Paper>, <3, inProcOf>, <4, Conference>, <5, role>
Resulting pattern: author -- Paper -- inProcOf -- Conference -- role
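Steps 1 and 2 of this formulation can be sketched with a small level-wise (Apriori-style) miner. This is an illustration under assumptions, not the authors' implementation: the toy ontology (`SUPER`, `TYPE_OF`) is hypothetical, the slides do not say which mining algorithm is used, and step 3 (turning an itemset into a pattern) is left out because the transcript does not show how redundant items such as ConfPaper and Paper at the same position are resolved.

```python
from itertools import combinations

# Hypothetical toy ontology, chosen to be consistent with the slide example.
SUPER = {"ConfPaper": "Paper", "firstAuthor": "author", "secondAuthor": "author",
         "reviewer": "role", "chair": "role"}
TYPE_OF = {"paper-A": "ConfPaper", "paper-B": "ConfPaper", "paper-C": "Paper",
           "paper-D": "Paper", "article-A": "Article",
           "conf-A": "Conference", "conf-B": "Conference"}
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]


def ancestors(term):
    """Return the term together with all of its super-classes/super-properties."""
    chain = [term]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return set(chain)


def to_transaction(association):
    """Turn one association into a set of (position, class-or-property) items."""
    items = set()
    for position, value in enumerate(association, start=1):
        # Odd positions hold properties; even positions hold entities.
        term = value if position % 2 == 1 else TYPE_OF[value]
        items.update((position, t) for t in ancestors(term))
    return frozenset(items)


def frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets whose support is >= min_support."""
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    level = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in level if support(s) >= min_support}
    found = {}
    while level:
        for itemset in level:
            found[itemset] = support(itemset)
        size = len(next(iter(level))) + 1
        # Join pairs of frequent itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = {c for c in candidates if support(c) >= min_support}
    return found


transactions = [to_transaction(a) for a in ASSOCIATIONS]
itemsets = frequent_itemsets(transactions, min_support=0.3)
```

With the assumed ontology, the transaction built for the paper-A association contains the eight items listed on the slide, and the six-item itemset shown there is found with support 2/5. The threshold 0.3 is arbitrary; the slides only state that 2/5 exceeds it.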
Step 2: Finding k frequent, informative, and small-overlapping patterns
• Frequency (as previous)
• Informativeness
  • informativeness of a class = self-information of its occurrence (more informative = having fewer instances), e.g. ConfPaper > Paper
  • informativeness of a property = entropy of its values (more informative = having more diverse values), e.g. is-author-of > nationality
• Overlap
  • Ontological overlap: holding subClassOf/subPropertyOf relations
  • Contextual overlap: matched by common associations in the results
Example of ontological overlap between two patterns:
firstAuthor -- Paper -- cites -- Paper -- author
author -- ConfPaper -- inProcOf -- Conference -- role
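The two informativeness measures can be sketched as follows. The slides give only the verbal definitions, so the exact probability estimates are an assumption (class probability as the fraction of entities that are instances; property entropy over observed values), and all counts and values in the example are made up.

```python
import math
from collections import Counter


def class_informativeness(num_instances, num_entities):
    """Self-information, in bits, of an entity being an instance of the class."""
    return -math.log2(num_instances / num_entities)


def property_informativeness(values):
    """Shannon entropy, in bits, of the observed values of a property."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up counts: a class with fewer instances is more informative.
info_conf_paper = class_informativeness(num_instances=100, num_entities=1000)
info_paper = class_informativeness(num_instances=400, num_entities=1000)

# Made-up values: a property with more diverse values is more informative.
info_is_author_of = property_informativeness(["p1", "p2", "p3", "p4"])
info_nationality = property_informativeness(["CN", "CN", "CN", "US"])
```

With these invented inputs, ConfPaper scores higher than Paper and is-author-of scores higher than nationality, which is the ordering the slide gives as examples.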
Formulated as multidimensional 0-1 knapsack
• Find k patterns that maximize frequency * informativeness (goal)
and do not share considerably large overlap (constraints)
• Solved by a greedy algorithm
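One simple greedy heuristic in this spirit is sketched below. It is a simplification and not necessarily the algorithm used in Explass: candidates are ranked by frequency * informativeness, and a candidate is skipped when its overlap with an already chosen pattern exceeds a threshold. Measuring overlap as the Jaccard similarity of matched association sets, the `max_overlap` parameter, and all candidate scores are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_patterns(candidates, k, max_overlap):
    """Greedily pick up to k high-scoring patterns with bounded pairwise overlap.

    Each candidate is a dict with keys 'name', 'frequency', 'informativeness',
    and 'matched' (the set of ids of the associations it matches).
    """
    ranked = sorted(
        candidates,
        key=lambda c: c["frequency"] * c["informativeness"],
        reverse=True,
    )
    chosen = []
    for candidate in ranked:
        if len(chosen) == k:
            break
        if all(jaccard(candidate["matched"], c["matched"]) <= max_overlap for c in chosen):
            chosen.append(candidate)
    return chosen


# Hypothetical candidates; the scores are invented for illustration.
CANDIDATES = [
    {"name": "author-Paper-inProcOf-Conference-role",
     "frequency": 0.4, "informativeness": 2.0, "matched": {1, 2}},
    {"name": "author-ConfPaper-inProcOf-Conference-role",
     "frequency": 0.4, "informativeness": 3.0, "matched": {1, 2}},
    {"name": "author-Paper-cites-Paper-firstAuthor",
     "frequency": 0.4, "informativeness": 2.5, "matched": {3, 4}},
]

top = select_patterns(CANDIDATES, k=3, max_overlap=0.5)
names = [p["name"] for p in top]
```

Here the Paper-based conference pattern is dropped even though k = 3, because it matches exactly the same associations as the higher-scoring ConfPaper variant.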
Exploration methods (2)
• Clustering
• Facets
  • facet values = classes of entities and properties appearing in associations in the results
  • Problem: To recommend k facet values (solved in a similar way)
Example association: secondAuthor -- paper-A -- inProcOf -- conf-A -- reviewer
Its facet values (classes): ConfPaper, Paper, Conference
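Collecting facet values and using one as a filter can be sketched as below. The toy ontology (`SUPER`, `TYPE_OF`) is hypothetical, and including super-properties among the property facet values is an assumption; the slide only shows the class facet values of one association.

```python
# Hypothetical toy ontology, chosen to be consistent with the slide example.
SUPER = {"ConfPaper": "Paper", "firstAuthor": "author", "secondAuthor": "author",
         "reviewer": "role", "chair": "role"}
TYPE_OF = {"paper-A": "ConfPaper", "paper-B": "ConfPaper", "paper-C": "Paper",
           "paper-D": "Paper", "article-A": "Article",
           "conf-A": "Conference", "conf-B": "Conference"}
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]


def ancestors(term):
    """Return the term together with all of its super-classes/super-properties."""
    chain = [term]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return set(chain)


def facet_values(association):
    """Return (class facet values, property facet values) of one association."""
    classes, properties = set(), set()
    for position, value in enumerate(association, start=1):
        if position % 2 == 1:          # odd positions hold properties
            properties |= ancestors(value)
        else:                          # even positions hold entities
            classes |= ancestors(TYPE_OF[value])
    return classes, properties


def filter_by_facet(associations, value):
    """Keep the associations that carry the given class or property facet value."""
    kept = []
    for association in associations:
        classes, properties = facet_values(association)
        if value in classes or value in properties:
            kept.append(association)
    return kept


classes_a, properties_a = facet_values(ASSOCIATIONS[0])
via_conference = filter_by_facet(ASSOCIATIONS, "Conference")
```

Under these assumptions the class facet values of the paper-A association are ConfPaper, Paper, and Conference, as on the slide, and selecting the facet value Conference narrows the five associations to the two that pass through a conference.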
Demo based on DBpedia
ws.nju.edu.cn/explass
(Screenshot annotations: facet values (classes); facet values (properties); an expanded pattern; a collapsed pattern; associations not matching any pattern above)
User study
• 26 association exploration tasks over DBpedia
  • Derived from QALD queries and “People also search for”
• Example (from QALD): Suppose you will write an article about the associations between Abraham Lincoln and George Washington. Use the given system to explore their associations and identify several themes to discuss in the article.
• 20 subjects
• 3 approaches
  • Explass: clustering + facets
  • RelClus: clustering into a hierarchy of patterns
  • RF: facets only (similar to RelFinder)
Post-task questionnaire results
Usability scores (SUS)
User behavior
Conclusion
1. Provide patterns wisely.
  • To avoid deep, complicated hierarchy
  • To avoid very general, almost meaningless concepts
2. Combine patterns and facets wisely.
  • Patterns as meaningful summaries of results
  • Facets as filters for refining the search
Future work
• Performance optimization
  • (online) path finding
  • (online) frequent itemset mining
• Exploring associations between several entities or a data set
Questions?