Social Relation Based Scalable Semantic Search Refinement


This is a talk presented at the 2009 Asian Scalable Semantic Data Processing Workshop co-located with the 2009 Asian Semantic Web Conference.


Page 1: Social Relation Based Scalable Semantic Search Refinement


Social Relation Based Scalable Semantic Search Refinement

Yi Zeng1, Xu Ren1, Yulin Qin1,2, Ning Zhong1,3, Zhisheng Huang4, Yan Wang1

1. International WIC Institute, Beijing University of Technology, China
2. Carnegie Mellon University, USA
3. Maebashi Institute of Technology, Japan
4. Vrije University Amsterdam, the Netherlands

Page 2: Social Relation Based Scalable Semantic Search Refinement


Motivation

• Vague/incomplete queries over large-scale semantic data (how can queries be refined to reduce the size of the result set?)

• Large-scale semantic data vs. the most relevant data for a specific user

Diversity for different users in the context of large scale semantic data

User interests Network of friends, collaborators, etc.

Interests based search refinement

Search refinement through social relationship

Group interests based search refinement

Page 3: Social Relation Based Scalable Semantic Search Refinement


Social Relations and Social Networks

• Approximate power law distribution: few authors have many coauthors, and most authors have very few coauthors.

• Considering the scalability issue: when the number of authors expands rapidly, it will not be hard to rebuild the coauthor network, since most authors have just a few links.

Fig. 1: Coauthor number distribution in the SwetoDBLP dataset.

Fig. 2: Log-log diagram of Figure 1.

• Most social networks follow a power law distribution.

• The DBLP coauthor network is created using the FOAF vocabulary.
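The coauthor-count distribution on this slide can be illustrated with a small sketch. It is not the authors' pipeline: the toy `papers` list and all function names are hypothetical, and the least-squares slope on the log-log points is only a rough check of power-law behavior.

```python
import math
from collections import Counter
from itertools import combinations


def coauthor_network(papers):
    """Map each author to the set of their coauthors."""
    network = {}
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            # Register every author, including authors of single-author papers.
            network.setdefault(a, set())
        for a, b in combinations(unique, 2):
            network[a].add(b)
            network[b].add(a)
    return network


def degree_distribution(network):
    """Count how many authors have each number of coauthors."""
    return Counter(len(coauthors) for coauthors in network.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree), ignoring degree 0."""
    points = [(math.log(d), math.log(c)) for d, c in distribution.items() if d > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical toy data: each entry is the author list of one paper.
papers = [["A", "B"], ["A", "C"], ["A", "D"], ["E"], ["B", "C"]]
network = coauthor_network(papers)
distribution = degree_distribution(network)
```

On a real dataset, a clearly negative slope that is roughly constant across the degree range would be consistent with the straight line seen in a log-log plot such as Fig. 2.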

Page 4: Social Relation Based Scalable Semantic Search Refinement


Search Refinement through Social Relationship

Satisfied authors without social relation refinement | Satisfied authors with social relation refinement
Carl Kesselman (312) | Hans W. Guesgen (117) *
Thomas S. Huang (271) | Virginia Dignum (69) *
Edward A. Fox (269) | John McCarthy (65) *
Lei Wang (250) | Aaron Sloman (36) *
John Mylopoulos (245) | Carl Kesselman (312)
Ewa Deelman (237) | Thomas S. Huang (271)
... | ...

Table 1: A partial result of the expert finding search task “Artificial Intelligence authors” (User name: John McCarthy).

In an enterprise setting, if the found experts have some previous relationship with the employer, the cooperation may be smoother.

Bridging two separate datasets, the domain experts dataset and the coauthor network dataset, through user URIs helps to refine the expert finding task.
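One way to read the reordering in Table 1 is as a stable partition of the expert list: experts with a social relation to the user move to the front, and the original ranking is kept inside each group. The sketch below assumes that reading; the function name is hypothetical, and treating the starred authors as the related ones is an assumption, not something the slide states.

```python
def refine_by_social_relation(experts, related):
    """Move experts who have a social relation with the user to the front.

    `experts` is a list of (name, count) pairs in their original ranking;
    `related` is the set of names linked to the user in the coauthor network.
    Python's sort is stable, so the original order is kept inside each group.
    """
    return sorted(experts, key=lambda expert: expert[0] not in related)


experts = [
    ("Carl Kesselman", 312),
    ("Thomas S. Huang", 271),
    ("Hans W. Guesgen", 117),
    ("Virginia Dignum", 69),
    ("John McCarthy", 65),
    ("Aaron Sloman", 36),
]
# Assumption: the starred authors in Table 1 are the ones related to the user.
related = {"Hans W. Guesgen", "Virginia Dignum", "John McCarthy", "Aaron Sloman"}
refined = refine_by_social_relation(experts, related)
```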

Page 5: Social Relation Based Scalable Semantic Search Refinement


Social Network based Interest Retention Models for Search Refinement

Page 6: Social Relation Based Scalable Semantic Search Refinement


Obtaining the Retained Interests

• Do retained interests appear more frequently than others?

(Frequency) Total Interest: $TI(i) = \sum_{j=1}^{n} m(i,j)$

• Besides frequency, what else is important for correctly obtaining retained interests?

Forgetting mechanism in cognitive memory retention (exponential function model, power function model) [Anderson, Schooler 1991].

(Frequency and Recency) Memory Retention: $P = Ae^{-bT}$; $P = AT^{-b}$

Pictures from: [Schooler 1993] Schooler, L. J. & Anderson, J. R.: Recency and Context: An Environmental Analysis of Memory. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.
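The two memory retention functions can be compared numerically. This is a minimal sketch with illustrative parameter values (A = 1.0, b = 0.5 are assumptions chosen for readability, not values from the talk); T is the elapsed time in arbitrary units.

```python
import math


def exponential_retention(t, a, b):
    """Exponential model: P = A * e^(-b * T)."""
    return a * math.exp(-b * t)


def power_retention(t, a, b):
    """Power model: P = A * T^(-b), defined for T > 0."""
    return a * t ** (-b)


# Illustrative parameters only.
A, B = 1.0, 0.5
for t in (1, 2, 4, 8, 16):
    print(t, round(exponential_retention(t, A, B), 4), round(power_retention(t, A, B), 4))
```

With the same parameters, the power function decays more slowly for large T, so older items keep a larger share of their weight than under the exponential function.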

Page 7: Social Relation Based Scalable Semantic Search Refinement


Obtaining the Retained Interests

[Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao. Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, Coordinated by Jose Quesada and Yi Zeng, March 30, 2009.

[Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.

[Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models. CogSci 2009, July 31-August 1, 2009.

• (Frequency and Recency) Exponential Model for Interest Retention: $EIR(i) = \sum_{j=1}^{n} m(i,j) \times Ae^{-bT_{i,j}}$

• (Frequency and Recency) Power Model for Interest Retention: $PIR(i) = \sum_{j=1}^{n} m(i,j) \times AT_{i,j}^{-b}$
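The two interest retention models can be sketched as follows. The slide does not define its symbols, so the sketch assumes that m(i, j) is the number of papers on interest i in year j and that T_{i,j} is the number of years between year j and a reference year; the data and function names are hypothetical. The values A = 0.855 and b = 1.295 are the power-model parameters quoted later in the talk.

```python
import math


def total_interest(counts):
    """TI(i): sum of m(i, j) over all time intervals j."""
    return sum(counts.values())


def exponential_interest_retention(counts, now, a, b):
    """EIR(i): sum over j of m(i, j) * A * e^(-b * T_ij)."""
    return sum(m * a * math.exp(-b * (now - year)) for year, m in counts.items())


def power_interest_retention(counts, now, a, b):
    """PIR(i): sum over j of m(i, j) * A * T_ij^(-b); requires T_ij > 0."""
    return sum(m * a * (now - year) ** (-b) for year, m in counts.items())


# Hypothetical yearly paper counts m(i, j) for two interests of one author.
counts_by_interest = {
    "web": {2006: 1, 2007: 2, 2008: 3},
    "logic": {1995: 4, 1996: 4},
}
NOW = 2009           # assumed reference year; T_ij = NOW - year
A, B = 0.855, 1.295  # power-model parameters quoted in the talk

for interest, counts in counts_by_interest.items():
    print(
        interest,
        total_interest(counts),
        round(power_interest_retention(counts, NOW, A, B), 3),
    )
```

In this toy data the older interest has the larger total interest but the smaller retention, which is the effect that recency weighting is meant to capture.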

Page 8: Social Relation Based Scalable Semantic Search Refinement


Obtaining the Retained Interests

• To some extent, current interests are relevant to interest retention.

Using the power law model with A=0.855 and b=1.295, we selected all authors with more than 100 publications (1226 persons) and predicted their top 9 interests from 2000 to 2007 using interest retention. For 49.54% of these samples, 3 out of 9 interests can be predicted.

Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models)

Figure 7b: Difference on the contribution values from papers published in different years

• We analyzed research interest retention for all 615,124 computer scientists in the SwetoDBLP dataset. We released the “computer scientists’ research interest RDF dataset”:

http://www.iwici.org/dblp-sse

http://wiki.larkc.eu/csri-rdf

Figure 7c: A comparison of total interests and interest retentions of the author “Ricardo A. Baeza-Yates” (Nov. 2009, from DBLP).

Page 9: Social Relation Based Scalable Semantic Search Refinement


Retained Interests in a Social Environment

Group Retained Interest:

$E(i,p) = \begin{cases} 1 & (i \in RI_p^{top9}) \\ 0 & (i \notin RI_p^{top9}) \end{cases}, \quad GIR(i) = \sum_{p=1}^{n} E(i,p)$

Top 9 Retained Interests | Top 9 Group Retained Interests
Web 7.81 | Search 35
Search 5.59 | Retrieval 30
Retrieval 3.19 | Web 28
Information 2.27 | Information 26
Query 2.14 | System 19
Engine 2.10 | Query 18
Mining 1.26 | Analysis 14
Challenge … | Text …
Analysis … | Model …

Top 9 interests retention of a user and his group interests retention (Ricardo A. Baeza-Yates, based on the May 2008 version of SwetoDBLP).

Group Retained Interests:
• Diversity
• Consistency

For the most prolific authors in DBLP (publication number > 50; 5161 persons), on average 52.55% of an individual’s retained interests are consistent with his/her group retained interests.

[Figure: retained interests of Carlos Castillo (Web, PageRank, Network, Spam, Search, Detection, Analysis, Link, Content) and of Ricardo A. Baeza-Yates (Web, Search, Retrieval, Information, Query, Analysis, Challenge, Engine, Mining).]
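The group retained interest on this slide counts, for each interest, how many group members have it among their top retained interests. A minimal sketch under that reading follows; the member data and function names are hypothetical, and whether the user counts as a member of their own group is not stated in the talk.

```python
from collections import Counter


def top_k(retention, k=9):
    """Return the k interests with the highest retention values."""
    ranked = sorted(retention.items(), key=lambda item: -item[1])
    return [interest for interest, _ in ranked[:k]]


def group_interest_retention(group_retention, k=9):
    """GIR(i): number of group members p whose top-k retained interests contain i."""
    gir = Counter()
    for retention in group_retention.values():
        gir.update(top_k(retention, k))
    return gir


# Hypothetical retention values for three members of a coauthor group.
group = {
    "p1": {"web": 7.8, "search": 5.6, "retrieval": 3.2},
    "p2": {"search": 4.0, "spam": 2.5, "web": 1.0},
    "p3": {"search": 2.0, "query": 1.5, "logic": 0.1},
}
# k=2 keeps the toy example small; the talk uses the top 9.
gir = group_interest_retention(group, k=2)
```

Because GIR is a count over members, its values are integers, which matches the integer scores in the group column of the table above.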

Page 10: Social Relation Based Scalable Semantic Search Refinement


Search Refinement by Interests from Different Perspectives

• Vague/incomplete queries may produce so many results that users have to wade through them.

• Research interests may be closely related to search tasks.

• Research interests can be evaluated from various perspectives:
(1) Total interests;
(2) Retained interests;
(3) Co-author group retained interests.
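The talk does not say how interests are applied to a query. One simple possibility, shown here purely as an assumption, is to re-rank the results of a vague query by how many of the user's interest terms appear in each title. The result titles and the function name are hypothetical; the authors' system may instead add the interests as constraints to a semantic query.

```python
def refine_results(results, interests):
    """Rank result titles by how many interest terms occur in them.

    Titles with the same number of matches keep their original relative
    order because Python's sort is stable.
    """
    terms = [term.lower() for term in interests]

    def matches(title):
        words = title.lower().split()
        return sum(term in words for term in terms)

    return sorted(results, key=lambda title: -matches(title))


# Hypothetical result titles for a vague query, and a user's top interests.
results = [
    "Artificial intelligence in medicine",
    "Artificial intelligence for web search",
    "Artificial intelligence and query analysis on the web",
]
interests = ["Web", "Search", "Query"]
refined = refine_results(results, interests)
```

The same function can be called with total interests, retained interests, or group retained interests to produce the different perspectives listed above.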

Page 11: Social Relation Based Scalable Semantic Search Refinement


Refinement with Retained interests, group retained interests

8 requests to DBLP authors were sent out.

7 replied.

Participants: 7 DBLP authors

• Preference order 100%: List 2, List 3 > List 1

• Preference order 100%: List 2 ≈ List 3

• Preference order 83.3%: List 2 > List 3 > List 1

• Preference order 16.7%: List 3 > List 2 > List 1

Page 12: Social Relation Based Scalable Semantic Search Refinement


Future Research

Page 13: Social Relation Based Scalable Semantic Search Refinement


Semantic Similarity: Obtaining More Accurate Interest Descriptions and Observations of Interest Dynamics

Figure 14. Consistent interests without consideration of semantic similarity.

[Figures 14 and 15: interest graphs of Carlos Castillo (Web, PageRank, Network, Spam, Search, Detection, Analysis, Link, Content) and Ricardo A. Baeza-Yates (Web, Search, Retrieval, Information, Query, Analysis, Challenge, Engine, Mining).]

Figure 15. Consistent interests with consideration of semantic similarity.

Term 1 | Term 2 | Similarity
search | retrieval | 0.645
search | query | 0.552
search | pagerank | 0.813
retrieval | query | 0.467
retrieval | pagerank | 0.293
query | pagerank | 0.098
logic | reasoning | 0.667
logic | inference | 0.606
reasoning | inference | 0.909
ontology | OWL | 0.805

Table. Some examples of semantic similarities based on Normalized Google Distance.
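For reference, the standard Normalized Google Distance between two terms x and y is computed from search hit counts as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) and f(y) are the hit counts of each term, f(x, y) is the hit count of both together, and N is the total number of indexed pages. The sketch below implements that formula with hypothetical hit counts. The talk does not state how the distance is turned into the similarity scores in the table, so that step is left out.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: fx and fy for each term, fxy for both terms together,
    and n for the total number of indexed pages."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical hit counts, for illustration only.
distance = normalized_google_distance(fx=1000, fy=100, fxy=10, n=10**6)
```

Terms that always occur together get a distance of 0, and the distance grows as the joint hit count shrinks relative to the individual counts.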

Page 14: Social Relation Based Scalable Semantic Search Refinement


Thank You!