A Linked Data Scalability Challenge: Frequently Reused Concepts Lose their Meaning

Paolo Pareti, University of Edinburgh
ACM Web Science Conference, 29/6/2015

The Semantic Richness of Linked Data Concepts

Vocabulary Reuse Damages Semantics!

The Problem

:x is a Cat

What does class membership tell us?

Semantic Richness

The more facts we can infer about :x, knowing that :x is a Cat, the more Semantically Rich the concept Cat is.

Does it have a tail? Is it a mammal?
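To make the definition concrete, here is a minimal Python sketch under a simplifying assumption: each known Cat is described by a set of facts (plain strings, hypothetical toy data), and the facts we can infer about a new :x from ":x is a Cat" are approximated by the facts shared by every known Cat. This illustrates the idea; it is not the method used in the talk.

```python
# Hypothetical toy data: every known Cat described as a set of facts.
cats: dict[str, set[str]] = {
    "tom": {"is a mammal", "has a tail", "has whiskers", "is black"},
    "felix": {"is a mammal", "has a tail", "has whiskers", "is white"},
    "kitty": {"is a mammal", "has a tail", "has whiskers"},
}


def inferable_facts(members: dict[str, set[str]]) -> set[str]:
    """Facts shared by every member: an assumed stand-in for what membership alone tells us."""
    fact_sets = list(members.values())
    if not fact_sets:
        return set()
    return set.intersection(*fact_sets)


print(sorted(inferable_facts(cats)))
# ['has a tail', 'has whiskers', 'is a mammal']
```

Colour facts such as "is black" hold for only some Cats, so they are not inferable; the three shared facts are.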

Semantic Richness is NOT Specificity / Information Content

For example, this might have been the set of entities in the original definition of the concept Cat.

However, after some time, people started using the term Cat in a more generic way.

Some entities were defined as Cats, despite not being animals.

Even t-shirts could be defined as Cats.

And why not, maybe even some trees...

:x is a Cat

So what do you actually know about :x, if on the Web anything can be a Cat?
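The same hypothetical toy model shows the effect of looser reuse: adding a cartoon cat, a t-shirt and a tree to the members of Cat shrinks the set of facts shared by all members. The entities and facts are invented for illustration, and "shared by all members" remains a simplifying stand-in for inference.

```python
def inferable_facts(members: dict[str, set[str]]) -> set[str]:
    """Facts shared by every member of a class (empty set for an empty class)."""
    fact_sets = list(members.values())
    return set.intersection(*fact_sets) if fact_sets else set()


# Hypothetical toy data: the original, narrow usage of Cat.
cat_members: dict[str, set[str]] = {
    "tom": {"is a mammal", "has a tail", "has whiskers"},
    "felix": {"is a mammal", "has a tail", "has whiskers"},
}

# Hypothetical looser reuses of Cat, added one at a time.
later_reuses = [
    ("cartoon_cat", {"is fictional", "has a tail", "has whiskers"}),
    ("cat_tshirt", {"is clothing", "is made of cotton"}),
    ("some_tree", {"is a plant", "has leaves"}),
]

# Record how many facts remain inferable after each reuse.
sizes = [len(inferable_facts(cat_members))]
for name, facts in later_reuses:
    cat_members[name] = facts
    sizes.append(len(inferable_facts(cat_members)))

print(sizes)  # [3, 2, 0, 0]
```

Once the t-shirt joins, no fact is shared by all members, so in this toy model knowing that :x is a Cat tells us nothing.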

A Linked Data Challenge

The more a concept gets reused, the less Semantically Rich it becomes.

Frequently reused concepts lose their meaning.

http://www.w3.org/2002/07/owl#sameAs

This problem already affects highly reused concepts, such as owl:sameAs *

* H. Halpin, P. J. Hayes, J. P. McCusker, D. L. McGuinness, and H. S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. In The Semantic Web - ISWC 2010, volume 6496 of Lecture Notes in Computer Science, pages 305–320. Springer Berlin Heidelberg, 2010.

[Figure: http://dbpedia.org/resource/Edinburgh linked by owl:sameAs to a picture of Edinburgh and to the location of Edinburgh]

Originally designed to represent strict equality, owl:sameAs is often (mis)used to represent weaker relations.

In this example, the usage of owl:sameAs is incorrect, because Edinburgh, a picture of Edinburgh, and the location of Edinburgh are three different things.
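Why this misuse matters: OWL treats owl:sameAs as symmetric and transitive, and a reasoner may copy every statement about one name to the others. The Python sketch below approximates that behaviour with union-find for the equivalence closure, followed by pooling the statements of each equivalence class. Only the DBpedia URI comes from the slide; the picture and location URIs under http://example.org/ and all the facts are hypothetical.

```python
from collections import defaultdict

EDINBURGH = "http://dbpedia.org/resource/Edinburgh"
PHOTO = "http://example.org/EdinburghPhoto"        # hypothetical URI
LOCATION = "http://example.org/EdinburghLocation"  # hypothetical URI


def same_as_closure(links: list[tuple[str, str]]) -> dict[str, str]:
    """Map each resource to a representative of its owl:sameAs equivalence class."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two classes
    return {x: find(x) for x in list(parent)}


def merge_statements(facts: dict[str, set[str]], canonical: dict[str, str]) -> dict[str, set[str]]:
    """Pool the statements of all resources declared owl:sameAs each other."""
    merged: dict[str, set[str]] = defaultdict(set)
    for subject, statements in facts.items():
        merged[canonical.get(subject, subject)] |= statements
    return merged


links = [(EDINBURGH, PHOTO), (EDINBURGH, LOCATION)]
facts = {
    EDINBURGH: {"type: City"},
    PHOTO: {"type: Image", "format: JPEG"},
    LOCATION: {"type: Point", "has coordinates"},
}

canonical = same_as_closure(links)
edinburgh_view = merge_statements(facts, canonical)[canonical[EDINBURGH]]
print(sorted(edinburgh_view))
# ['format: JPEG', 'has coordinates', 'type: City', 'type: Image', 'type: Point']
```

After merging, the DBpedia resource for Edinburgh is also typed as an image with a JPEG format, which is exactly the kind of wrong inference this misuse produces.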

A Simple Measure of Semantic Richness

We define a measure based on:
● the number of common patterns,
● and their frequency.

For example: if X is a cat, what can we say about X?

● X is a mammal (frequency: 1.00)
● X has a tail (frequency: 0.99)
● ...

Intuitively:
● The more patterns there are, and the more frequent they are, the more semantically rich the concept is.

Measure motivated by:
● Number of Features theory
● Inductive Learning

Main advantage:
● Can be computed automatically and efficiently over large datasets.
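The slides do not spell out a formula, so the following Python sketch uses a hypothetical one that follows the intuition above: a pattern's frequency is the fraction of instances that exhibit it, and the score is the sum of the frequencies of the patterns that reach a threshold (`min_frequency`, an assumed parameter). A single counting pass over the instances is enough, in the spirit of the efficiency claim, but this is not necessarily the measure defined by the author.

```python
from collections import Counter


def pattern_frequencies(instances: list[set[str]]) -> dict[str, float]:
    """Fraction of a concept's instances that exhibit each pattern."""
    n = len(instances)
    if n == 0:
        return {}
    # Each instance is a set, so a pattern is counted at most once per instance.
    counts = Counter(pattern for facts in instances for pattern in facts)
    return {pattern: count / n for pattern, count in counts.items()}


def semantic_richness(instances: list[set[str]], min_frequency: float = 0.5) -> float:
    """Hypothetical score: sum of the frequencies of sufficiently common patterns."""
    return sum(
        freq for freq in pattern_frequencies(instances).values() if freq >= min_frequency
    )


# Hypothetical toy instances of Cat.
cats = [
    {"is a mammal", "has a tail", "is black"},
    {"is a mammal", "has a tail"},
    {"is a mammal", "has a tail"},
    {"is a mammal"},
]
print(pattern_frequencies(cats)["has a tail"])  # 0.75
print(semantic_richness(cats))  # 1.75
```

Both more patterns and more frequent patterns raise the score, matching the intuition on the slide; the threshold simply ignores patterns too rare to be informative (here "is black", at 0.25).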

DBpedia Ontology

The DBpedia ontology tree, plotted according to the Semantic Richness of its concepts (each line represents a subclass relation). As we would expect, Semantic Richness is highly correlated with specificity.

Loss of Semantic Richness in foaf:Person

How quickly does Semantic Richness decrease when reusing a concept? We looked at the concept of foaf:Person as defined in ten different datasets.

As we add external entities of type foaf:Person into a dataset, the Semantic Richness of this concept quickly decreases. In particular, it falls below the average Semantic Richness of the original datasets (dotted line).
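The experiment can be mimicked in miniature with a hypothetical scoring function (sum of the frequencies of patterns shared by at least half of the instances; an assumption, not the author's measure): start from a toy dataset of foaf:Person instances and add, one at a time, external persons that share only foaf:name. The data and the ex: properties are invented; the numbers illustrate the mechanism, not the results reported in the talk.

```python
from collections import Counter


def semantic_richness(instances: list[set[str]], min_frequency: float = 0.5) -> float:
    """Hypothetical score: sum of frequencies of patterns shared by >= min_frequency of instances."""
    n = len(instances)
    if n == 0:
        return 0.0
    counts = Counter(p for facts in instances for p in facts)
    return sum(c / n for c in counts.values() if c / n >= min_frequency)


# Hypothetical local dataset: every person has a name, a mailbox and acquaintances.
persons = [{"foaf:name", "foaf:mbox", "foaf:knows"} for _ in range(4)]

scores = [semantic_richness(persons)]
for i in range(5):
    # Hypothetical external person: shares foaf:name, plus a dataset-specific property.
    persons.append({"foaf:name", f"ex:property{i}"})
    scores.append(semantic_richness(persons))

print([round(s, 2) for s in scores])  # [3.0, 2.6, 2.33, 2.14, 2.0, 1.0]
```

The sharp drop at the last step happens because foaf:mbox and foaf:knows fall below the assumed 50% threshold once external persons outnumber local ones.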

The Challenge

How can concepts be openly reused on the Web, while at the same time remaining semantically rich?

The end. Any questions?