LinkSUM: Using Link Analysis to Summarize Entity Data

Page 1: LinkSUM: Using Link Analysis to Summarize Entity Data

KIT – The Research University in the Helmholtz Association

INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB)

www.kit.edu

LinkSUM: Using Link Analysis to Summarize Entity Data

Andreas Thalhammer, Nelia Lasierra, and Achim Rettinger

16th International Conference on Web Engineering (ICWE 2016) 08.06.2016

Lugano

Page 2: LinkSUM: Using Link Analysis to Summarize Entity Data

Outline

Introduction
Approach: LinkSUM
  Related Resources
  Predicate Selection
Configuration
Evaluation
  Quantitative
  Qualitative
Conclusions

Page 3: LinkSUM: Using Link Analysis to Summarize Entity Data

INTRODUCTION

Page 4: LinkSUM: Using Link Analysis to Summarize Entity Data

Motivation: Entity Summarization (I)

Examples of entities:
- Movies: Pulp Fiction, Kill Bill Vol. 1
- Books: 1984, A Farewell to Arms
- People: John Travolta, Arnold Schwarzenegger
- etc.

Example of data about an entity: (figure not preserved in the transcript)

Page 5: LinkSUM: Using Link Analysis to Summarize Entity Data

Motivation: Entity Summarization (II)

How to decide:

Which facts should we show?

Facts are unranked in the knowledge base.

Entities have individual features (even if they are of the same type).

Page 6: LinkSUM: Using Link Analysis to Summarize Entity Data

Idea

Use link analysis for selecting facts.

Step 1: Select top-k important related resources.

Step 2: Select the most relevant connecting predicate.

Strongly relevance-oriented.

Lightweight.

Avoids redundancy.

Page 7: LinkSUM: Using Link Analysis to Summarize Entity Data

APPROACH: LINKSUM

Page 8: LinkSUM: Using Link Analysis to Summarize Entity Data

Related Resources (I)

Compute PageRank [1] scores (pr) of entities with (untyped) links that occur in textual descriptions of entities (i.e., Wikipedia).

l(r) – set of incoming links of r.

c(r) – number of outgoing links of r.

d – damping factor (usually 0.85).

Example:

dbpedia:Category:English-language_films 220.961

dbpedia:Quentin_Tarantino 137.403

dbpedia:John_Travolta 105.771

dbpedia:Miramax_Films 993.986

... ...
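
The PageRank formula itself did not survive in this transcript. The following is a minimal sketch, assuming the non-normalized form pr(r) = (1 - d) + d * sum over r_n in l(r) of pr(r_n) / c(r_n), which matches the definitions of l(r), c(r), and d above and is consistent with example scores greater than 1. The function name, the iteration count, and the toy link graph are hypothetical.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Iteratively compute non-normalized PageRank scores.

    out_links maps each resource to the set of resources it links to.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # l(r): set of incoming links of r, derived from the outgoing links.
    incoming = {r: set() for r in nodes}
    for source, targets in out_links.items():
        for target in targets:
            incoming[target].add(source)

    # c(r): number of outgoing links of r.
    c = {r: len(out_links.get(r, ())) for r in nodes}

    pr = {r: 1.0 for r in nodes}
    for _ in range(iterations):
        pr = {
            r: (1 - d) + d * sum(pr[n] / c[n] for n in incoming[r])
            for r in nodes
        }
    return pr


# Hypothetical toy graph, not taken from DBpedia.
toy_graph = {
    "Pulp_Fiction": {"Quentin_Tarantino", "John_Travolta"},
    "Quentin_Tarantino": {"Pulp_Fiction"},
    "John_Travolta": {"Pulp_Fiction"},
}
scores = pagerank(toy_graph)
```

In this toy graph, the resource with two incoming links ends up with the highest score. A fixed number of iterations is used for brevity; a convergence threshold would be the more usual stopping criterion.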

Page 9: LinkSUM: Using Link Analysis to Summarize Entity Data

Related Resources (II)

Use Backlinks [2] for finding strong connections:

Example: Pulp Fiction and Quentin Tarantino (connected via director).

dbpedia:Quentin_Tarantino dbpedia:Roger_Avary

dbpedia:Bruce_Willis dbpedia:Tim_Roth

dbpedia:John_Travolta dbpedia:Ving_Rhames

dbpedia:Samuel_L._Jackson dbpedia:Amanda_Plummer

dbpedia:Harvey_Keitel dbpedia:Lawrence_Bender

dbpedia:Miramax_Films dbpedia:Sally_Menke

dbpedia:Uma_Thurman dbpedia:Maria_de_Medeiros

dbpedia:Andrzej_Sekuła dbpedia:Rosanna_Arquette

dbpedia:Christopher_Walken dbpedia:Eric_Stoltz
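
A minimal sketch of the Backlinks idea, assuming that the backlinks of an entity e are those resources that e links to and that also link back to e. This reading of [2] is an assumption, and the function name and link data below are hypothetical.

```python
def backlinks(entity, out_links):
    """Return resources that `entity` links to and that link back to it."""
    return {
        r
        for r in out_links.get(entity, set())
        if entity in out_links.get(r, set())
    }


# Hypothetical toy link data, not taken from Wikipedia.
toy_links = {
    "Pulp_Fiction": {"Quentin_Tarantino", "Miramax_Films", "Los_Angeles"},
    "Quentin_Tarantino": {"Pulp_Fiction", "Miramax_Films"},
    "Miramax_Films": {"Pulp_Fiction"},
    "Los_Angeles": set(),
}
bl = backlinks("Pulp_Fiction", toy_links)
```

Here the one-directional link to Los_Angeles is dropped, while the mutually linked resources are kept.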

Page 10: LinkSUM: Using Link Analysis to Summarize Entity Data

Related Resources (III)

Combined score for related resources:

Linear combination.

Normalized PageRank scores.

Indicator function on the set of Backlinks of e (bl(e)).

Parameter α (alpha) to be estimated.
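
The combined-score formula is not preserved in this transcript. The sketch below follows the bullet points above: a linear combination of a normalized PageRank score and an indicator on bl(e). Two details are assumptions: PageRank is normalized by the maximum score among the candidate resources, and α weights the PageRank term while (1 - α) weights the Backlinks indicator. The PageRank values are the example scores from the Related Resources (I) slide; the function names and the Backlinks set are hypothetical.

```python
def combined_scores(candidates, pr, bl_e, alpha):
    """Score each candidate as alpha * normalized PageRank + (1 - alpha) * [r in bl(e)]."""
    max_pr = max(pr[r] for r in candidates)
    return {
        r: alpha * pr[r] / max_pr + (1 - alpha) * (1.0 if r in bl_e else 0.0)
        for r in candidates
    }


def top_k(scores, k):
    """Return the k highest-scoring resources, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


pr = {
    "Category:English-language_films": 220.961,
    "Quentin_Tarantino": 137.403,
    "John_Travolta": 105.771,
}
bl_e = {"Quentin_Tarantino", "John_Travolta"}  # assumed Backlinks of the entity

ranking_pr_heavy = top_k(combined_scores(pr.keys(), pr, bl_e, alpha=0.8), 3)
ranking_balanced = top_k(combined_scores(pr.keys(), pr, bl_e, alpha=0.5), 3)
```

With these toy inputs, a high α lets the globally popular category lead the ranking, while a lower α moves the mutually linked resources to the top.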

Page 11: LinkSUM: Using Link Analysis to Summarize Entity Data

Predicate Selection

Problem: multiple predicates connect two resources.

Approaches:
- Frequency (FRQ): number of times the predicate is used.
- Exclusivity (EXC): 1 / (N + M).
- Description (DSC): #domain + #range + #label.
- Combinations of those, e.g., FRQ * EXC.

Example: Pulp Fiction and Quentin Tarantino are connected by both starring and director.
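
A minimal sketch of predicate selection with the FRQ * EXC combination. The slide does not define N and M; the sketch assumes N is the number of triples with the same subject and predicate, and M is the number of triples with the same predicate and object. FRQ is taken as the number of triples in the dataset that use the predicate. DSC is passed in as an optional, precomputed mapping (`dsc`), since computing it needs schema information. All names and the toy triples are hypothetical.

```python
from collections import Counter


def select_predicate(entity, resource, triples, dsc=None):
    """Pick the predicate between entity and resource with the highest FRQ * EXC (* DSC)."""
    # FRQ: how often each predicate is used in the whole dataset.
    frq = Counter(p for _, p, _ in triples)
    # Sorted for a deterministic result when scores tie.
    candidates = sorted({p for s, p, o in triples if s == entity and o == resource})

    def score(p):
        n = sum(1 for s, q, _ in triples if s == entity and q == p)    # assumed N
        m = sum(1 for _, q, o in triples if q == p and o == resource)  # assumed M
        exc = 1.0 / (n + m)
        description = dsc.get(p, 1) if dsc else 1
        return frq[p] * exc * description

    # Raises ValueError if no predicate connects the two resources.
    return max(candidates, key=score)


# Hypothetical toy triples, not taken from DBpedia.
triples = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "John_Travolta"),
    ("Pulp_Fiction", "starring", "Uma_Thurman"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Jaws", "director", "Steven_Spielberg"),
    ("Alien", "director", "Ridley_Scott"),
]
chosen = select_predicate("Pulp_Fiction", "Quentin_Tarantino", triples)
```

On this toy data, director scores 4 * 1/3 and starring scores 3 * 1/4, so director is selected. The outcome depends entirely on the toy counts and says nothing about results on real data.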

Page 12: LinkSUM: Using Link Analysis to Summarize Entity Data

CONFIGURATION

Page 13: LinkSUM: Using Link Analysis to Summarize Entity Data

Dataset and Measure

Introduced in Gunaratna et al. [3].

Contains human-created summaries of 50 entities (DBpedia 3.9, outgoing relations).

Includes seven top-5 and seven top-10 summaries for each entity.

The dataset was created by 15 experts from the Semantic Web field.

Used measure:
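
The formula for the measure is not preserved in this transcript. The sketch below assumes it is the overlap-based quality measure used with this dataset in [3]: the size of the intersection between a system summary and each of the n reference summaries, averaged over the references. The function name and the toy summaries are hypothetical.

```python
def quality(system_summary, reference_summaries):
    """Average overlap between a system summary and the reference summaries."""
    system = set(system_summary)
    overlaps = [len(system & set(ref)) for ref in reference_summaries]
    return sum(overlaps) / len(overlaps)


# Hypothetical toy summaries; each letter stands for one fact.
system_summary = ["a", "b", "c"]
reference_summaries = [
    ["a", "b", "d"],  # overlap 2
    ["a", "e", "f"],  # overlap 1
    ["a", "b", "c"],  # overlap 3
]
q = quality(system_summary, reference_summaries)
```

Under this assumption, the score of a top-5 summary ranges from 0 to 5 and the score of a top-10 summary from 0 to 10.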

Page 14: LinkSUM: Using Link Analysis to Summarize Entity Data

Configuration

Figure panels: top-5 and top-10 (plots not preserved in the transcript).

Parameters:

α – linear combination of PageRank and Backlinks.

Predicate selection – combinations of FRQ, EXC, and DSC.

Best configuration: α = 0.8 / α = 0.9, FRQ*EXC*DSC

Page 15: LinkSUM: Using Link Analysis to Summarize Entity Data

EVALUATION

Page 16: LinkSUM: Using Link Analysis to Summarize Entity Data

Setup: Quantitative Evaluation

Compare results to the FACES system (introduced in [3]).

FACES:

Semantically diverse predicates via clustering.

Basic ranking heuristic for selecting cluster representatives.

Dataset and quality measure: same as in the configuration step.

Evaluated configurations:

config-1: α = 0.8, FRQ*EXC*DSC

config-2: α = 0.9, FRQ*EXC*DSC

Significance testing:

Two-tailed Wilcoxon signed-rank test.

Page 17: LinkSUM: Using Link Analysis to Summarize Entity Data

Results: Quantitative Evaluation

SO: Subject-Object pairs (predicates not considered).

SPO: Full triple.

Significance with respect to both LinkSUM configurations (p < 0.05).

Significance with respect to the best LinkSUM configuration (p < 0.05).

SD: Standard deviation.
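
The difference between SO and SPO matching can be sketched as follows, assuming an overlap-based quality score (average intersection size with the reference summaries, as in [3]). For SO, each triple is reduced to its subject-object pair before comparison, so a differing predicate does not count as a miss. The function names and toy triples are hypothetical.

```python
def quality(system, references, key=lambda t: t):
    """Average overlap with the reference summaries after mapping triples through `key`."""
    system_keys = {key(t) for t in system}
    overlaps = [len(system_keys & {key(t) for t in ref}) for ref in references]
    return sum(overlaps) / len(overlaps)


def subject_object(triple):
    """Project a (subject, predicate, object) triple onto (subject, object)."""
    s, _, o = triple
    return (s, o)


# Hypothetical toy summaries.
system = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "John_Travolta"),
]
references = [
    [("Pulp_Fiction", "director", "Quentin_Tarantino"),
     ("Pulp_Fiction", "starring", "Uma_Thurman")],
    [("Pulp_Fiction", "writer", "Quentin_Tarantino"),
     ("Pulp_Fiction", "starring", "John_Travolta")],
]
spo_quality = quality(system, references)                     # full triples must match
so_quality = quality(system, references, key=subject_object)  # predicates ignored
```

In the second reference, the writer triple matches the system's director triple only under SO, which is why the SO score is at least as high as the SPO score.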

Page 18: LinkSUM: Using Link Analysis to Summarize Entity Data

Setup: Qualitative Evaluation

Scenario: Search Engine Result Page (SERP).

20 users, 10 entities (from the FACES dataset).

Page 19: LinkSUM: Using Link Analysis to Summarize Entity Data

Results: Qualitative Evaluation

In some cases the task is subjective.

Reasons for:

Selection
- the presented related resources are relevant for the entity.

Rejection
- redundancy.
- related resources do not characterize the entity.

Page 20: LinkSUM: Using Link Analysis to Summarize Entity Data

CONCLUSIONS

Page 21: LinkSUM: Using Link Analysis to Summarize Entity Data

Conclusions

LinkSUM improves on the state of the art in entity summarization.

LinkSUM is lightweight and can be applied in other scenarios, e.g.,
- Web sites with semantic annotations.
- Semantic MediaWikis.

Entity summarization in SERP scenarios:
- Focus should be on selecting relevant resources.
- Redundancies at the object level should be avoided.

Page 22: LinkSUM: Using Link Analysis to Summarize Entity Data

Lessons learned and future directions

Selected facts should provide information about the entity (main difference to recommender systems).

Summaries and rankings are often subjective (but general tendencies are noticeable).

Established quantitative evaluation datasets are still missing (although different research efforts already targeted that problem).

Presentation aspects are very important (these should be neutralized in qualitative evaluation [4]).

Personalization and contextualization of entity summaries is becoming an important field (LinkSUM can serve as a basis).

Page 23: LinkSUM: Using Link Analysis to Summarize Entity Data

Questions?

[email protected]

@thalhamm

Page 24: LinkSUM: Using Link Analysis to Summarize Entity Data

Resources

DBpedia PageRank dataset:

http://people.aifb.kit.edu/ath/#DBpedia_PageRank

LinkSUM:

http://km.aifb.kit.edu/services/link/

International Workshop on Summarizing and Presenting Entities and Ontologies:

http://km.aifb.kit.edu/ws/sumpre2015

http://km.aifb.kit.edu/ws/sumpre2016

FACES:

http://wiki.knoesis.org/index.php/FACES

Page 25: LinkSUM: Using Link Analysis to Summarize Entity Data

References

1. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International Conference on World Wide Web 7. Elsevier, 1998.

2. J. Waitelonis and H. Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012. doi:10.1007/s11042-011-0733-1.

3. K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: Diversity-Aware Entity Summarization Using Incremental Hierarchical Conceptual Clustering. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, Texas, USA, 2015.

4. A. Thalhammer and S. Stadtmüller. SUMMA: A Common API for Linked Data Entity Summaries. In Engineering the Web in the Big Data Era. Springer, 2015.