Towards Exploratory Relationship Search: A Clustering-based Approach

Post on 15-Jun-2015

Presented at JIST2013, Seoul, Korea.

Towards Exploratory Relationship Search: A Clustering-Based Approach

Yanan Zhang, Gong Cheng, Yuzhong Qu
Nanjing University, China

Outline

• Motivation
• Challenges
• Approach
• Evaluation
• Conclusion

Relationship search

Searching graph-structured data

relationship = path

Too many results!
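Because a relationship is a path between two entities, searching for relationships amounts to enumerating paths. The sketch below is a minimal illustration, not the system's implementation: the function name `find_relationships`, the length bound `max_len`, the toy triples, and the choice to traverse edges in both directions are all assumptions made for the example.

```python
from collections import defaultdict

def find_relationships(triples, start, goal, max_len=3):
    """Enumerate simple paths from start to goal with at most max_len edges.

    Each path is a list alternating entities and properties.
    """
    # Assumption: edges may be traversed in either direction.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))

    paths = []

    def dfs(node, path, visited):
        if node == goal:
            paths.append(list(path))
            return
        if (len(path) - 1) // 2 >= max_len:  # edges used so far
            return
        for prop, nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.extend([prop, nxt])
                dfs(nxt, path, visited)
                del path[-2:]
                visited.remove(nxt)

    dfs(start, [start], {start})
    return paths

# Hypothetical toy data, not taken from DBpedia.
triples = [
    ("Alice", "worksAt", "AcmeLab"),
    ("Bob", "worksAt", "AcmeLab"),
    ("Alice", "coauthorOf", "Bob"),
    ("Alice", "livesIn", "Nanjing"),
    ("Bob", "bornIn", "Nanjing"),
]
relationships = find_relationships(triples, "Alice", "Bob", max_len=2)
```

Even this five-triple graph yields several relationships between the two entities; on a large data set the number of paths grows quickly with the length bound.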

Exploratory relationship search

• Exploring a set of relationships interactively and continuously

clustering (our solution: RelClus)

faceted categories (RelFinder)

Challenges

• How to meaningfully label a cluster?
• How to make sense of a cluster hierarchy?
• How to measure similarity between clusters?

Agglomerative hierarchical clustering
• Initially: each relationship is a singleton cluster
• Then: progressively merge the most similar pair of clusters
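The two steps above can be sketched as a short loop. This is a generic illustration under stated assumptions: the function `agglomerative`, the toy `features` table, and the stand-in similarity `shared_features` (a simple count of shared features) are hypothetical and are not the similarity measure used by the authors.

```python
from itertools import combinations

def agglomerative(items, sim):
    """Return the merge history as (left, right, merged) tuples.

    Clusters are frozensets of items; sim(c1, c2) scores two clusters.
    Items are assumed to be distinct and hashable.
    """
    clusters = [frozenset([x]) for x in items]  # initially: singleton clusters
    merges = []
    while len(clusters) > 1:
        # Then: pick and merge the most similar pair of current clusters.
        a, b = max(combinations(clusters, 2), key=lambda pair: sim(*pair))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, merged))
    return merges

# Hypothetical toy data: each relationship is described by a feature set.
features = {
    "R1": {"film", "director"},
    "R2": {"film", "director"},
    "R3": {"film", "actor"},
}

def shared_features(c1, c2):
    """Stand-in similarity: number of features common to all members."""
    return len(set.intersection(*(features[r] for r in c1 | c2)))

merges = agglomerative(features, shared_features)
```

The merge history encodes the cluster hierarchy: the last merge is the root, and earlier merges are its descendants.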

Relationship pattern

• High-level abstraction of relationships
– Vertices: entities or classes
– Edges: properties (undirected)

How to meaningfully label a cluster?

• Using a least common relationship pattern
– Vertices: least common classes (or entities)
– Edges: least common properties

[Figure: relationships R4 and R5 and pattern P1; visible vertex label: Person]

label({R4, R5}) = P1
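The labeling idea can be sketched in code. The sketch assumes, for simplicity, that all relationships in a cluster have the same length and are aligned position by position, and that the class and property hierarchies are single-parent trees. The helper names, the hierarchies, and the two example relationships are invented for illustration and do not reproduce the figure.

```python
def ancestors(term, parent):
    """Return [term, parent(term), ...] up to the root of the hierarchy."""
    chain = [term]
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain

def least_common(terms, parent):
    """Most specific term subsuming every term in `terms` (None if none)."""
    terms = list(terms)
    for candidate in ancestors(terms[0], parent):
        if all(candidate in ancestors(t, parent) for t in terms[1:]):
            return candidate
    return None

def label(relationships, entity_type, class_parent, prop_parent):
    """Least common pattern of equally long, position-aligned relationships."""
    pattern = []
    for column in zip(*relationships):
        pos = len(pattern)
        if pos % 2 == 1:                  # odd positions hold properties
            pattern.append(least_common(column, prop_parent))
        elif len(set(column)) == 1:       # same entity everywhere: keep it
            pattern.append(column[0])
        else:                             # otherwise generalize to a class
            types = [entity_type[e] for e in column]
            pattern.append(least_common(types, class_parent))
    return tuple(pattern)

# Hypothetical hierarchies and data.
class_parent = {"Actor": "Person", "Director": "Person",
                "Person": "Thing", "Film": "Thing"}
prop_parent = {"starring": "participant", "director": "participant"}
entity_type = {"Ann": "Actor", "Bo": "Director",
               "FilmX": "Film", "FilmY": "Film"}

r_a = ("FilmX", "starring", "Ann", "starring", "FilmY")
r_b = ("FilmX", "director", "Bo", "starring", "FilmY")
pattern = label([r_a, r_b], entity_type, class_parent, prop_parent)
```

Here the shared endpoints stay as entities, the differing middle vertex generalizes to `Person`, and the differing first edge generalizes to the assumed super-property `participant`.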

How to make sense of a cluster hierarchy?

• subPatternOf (⊑)
– Vertices: s.t. subClassOf (or instance-type)
– Edges: s.t. subPropertyOf

[Figure: patterns P3, P2, P1]

P2 ⊑ P3, P1 ⊑ P3
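A position-wise check of subPatternOf might look as follows. This is a sketch under assumptions: patterns are tuples alternating vertices and edges, both patterns must have the same length, hierarchies are acyclic single-parent maps, and all names and data are hypothetical.

```python
def is_a(term, other, parent):
    """True if term equals other or reaches it by following parent links."""
    while term != other and term in parent:
        term = parent[term]
    return term == other

def sub_pattern_of(p, q, entity_type, class_parent, prop_parent):
    """True if every vertex/edge of p is subsumed by its counterpart in q."""
    if len(p) != len(q):
        return False
    for pos, (x, y) in enumerate(zip(p, q)):
        if pos % 2 == 1:
            ok = is_a(x, y, prop_parent)            # subPropertyOf
        else:
            # subClassOf, or instance-type followed by subClassOf
            ok = is_a(x, y, class_parent) or (
                x in entity_type and is_a(entity_type[x], y, class_parent))
        if not ok:
            return False
    return True

# Hypothetical hierarchies and patterns.
class_parent = {"Actor": "Person", "Person": "Thing", "Film": "Thing"}
prop_parent = {"starring": "participant"}
entity_type = {"FilmX": "Film", "FilmY": "Film"}

specific = ("FilmX", "participant", "Person", "starring", "FilmY")
general = ("Film", "participant", "Person", "participant", "Film")

is_sub = sub_pattern_of(specific, general, entity_type, class_parent, prop_parent)
is_super = sub_pattern_of(general, specific, entity_type, class_parent, prop_parent)
```

The relation is reflexive in this sketch, so a pattern is a sub-pattern of itself.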

How to measure similarity between clusters?

• sim(Ci, Cj) = how many commonalities they share, which are exactly captured by label(Ci ∪ Cj)
– Measure: −log(probability of seeing label(Ci ∪ Cj)), i.e. the information content associated with label(Ci ∪ Cj)
– Probability estimation: based on the data set

[Figure: patterns P3, P2, P1]
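The information-content measure can be illustrated numerically. The slides do not say how the probability is estimated beyond "based on the data set", so this sketch assumes a simple relative frequency over a sample of relationships. It also simplifies patterns to tuples in which the wildcard `"*"` matches anything; the function names and sample data are hypothetical.

```python
import math

def matches(relationship, pattern):
    """True if the relationship fits the pattern ('*' matches anything)."""
    return len(relationship) == len(pattern) and all(
        p == "*" or p == r for r, p in zip(relationship, pattern))

def information_content(pattern, sample):
    """-log of the estimated probability of seeing the pattern in the sample."""
    hits = sum(matches(r, pattern) for r in sample)
    if hits == 0:
        return math.inf
    # log(n / hits) equals -log(hits / n)
    return math.log(len(sample) / hits)

# Hypothetical sample of relationships.
sample = [
    ("FilmX", "starring", "Ann"),
    ("FilmX", "director", "Bo"),
    ("FilmY", "starring", "Ann"),
    ("FilmZ", "starring", "Cy"),
]

ic_general = information_content(("*", "*", "*"), sample)
ic_specific = information_content(("*", "starring", "Ann"), sample)
```

A label that matches everything carries no information, so two clusters whose union is labeled by it count as dissimilar; a rarer, more specific label gives a higher similarity.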

A running example

[Figure: relationships R1, R2, R3, R4, R5 and patterns P1, P2, P3]

Design
• Data set: DBpedia
• Systems
– RList: just a list of all results
– RFacet: with faceted categories (similar to RelFinder)
– RClus: with hierarchical clustering (our solution)
• Participants and tasks
– 2 participants provide search tasks
   • 3 (well-defined) lookup tasks
   • 3 (open) exploratory search tasks
– 15 participants carry out tasks
• Metrics
– Questionnaire
– SUS (System Usability Scale)
– User feedback
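For readers unfamiliar with SUS, the sketch below shows the standard scoring of one ten-item SUS questionnaire on a 1 to 5 scale. It assumes the study used the standard scoring, which the slides do not state; the function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Standard SUS score (0 to 100) from ten answers on a 1-5 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

neutral = sus_score([3] * 10)
best = sus_score([5, 1] * 5)
```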

Questionnaire results

Some inspiring user feedback

• Dislike deep hierarchies
• Expect more concise visualization
• Need more cognitive support

Performance testing

Conclusion

• Goal: clustering-based exploratory relationship search
• Approach: pattern-centric
• Future work
– Combining faceted categories and hierarchical clustering
– Going beyond them