From Linked Data to Tightly Integrated Data

From Linked Data to Tightly Integrated Data. May 2014. Gerard de Melo, Tsinghua University, Beijing.

description

Invited Talk at the 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing. Reykjavik, Iceland, 27th May 2014.

The ideas behind the Web of Linked Data have great allure. Apart from the prospect of large amounts of freely available data, we are also promised nearly effortless interoperability. Common data formats and protocols have indeed made it easier than ever to obtain and work with information from different sources simultaneously, opening up new opportunities in linguistics, library science, and many other areas. In this talk, however, I argue that the true potential of Linked Data can only be appreciated when extensive cross-linkage and integration engender an even higher degree of interconnectedness. This can take the form of shared identifiers, e.g. those based on Wikipedia and WordNet, which can be used to describe numerous forms of linguistic and commonsense knowledge. An alternative is to rely on sameAs and similarity links, which can automatically be discovered using scalable approaches like the LINDA algorithm but need to be interpreted with great care, as we have observed in experimental studies. A closer level of linkage is achieved when resources are also connected at the taxonomic level, as exemplified by the MENTA approach to taxonomic data integration. Such integration means that one can buy into ecosystems already carrying a range of valuable pre-existing assets. Even more tightly integrated resources like Lexvo.org combine triples from multiple sources into unified, coherent knowledge bases. Finally, I also comment on how to address some remaining challenges that are still impeding a more widespread adoption of Linked Data on the Web. In the long run, I believe that such steps will lead us to significantly more tightly integrated Linked Data.

Transcript of From Linked Data to Tightly Integrated Data

Page 1: From Linked Data to Tightly Integrated Data

From Linked Data to Tightly Integrated Data

May 2014

Gerard de Melo

Tsinghua University, Beijing

Page 2: From Linked Data to Tightly Integrated Data

25 Years of the World Wide Web: 1989−2014

http://geekcom.wordpress.com/2009/03/19/

Tim Berners-Lee

Page 3: From Linked Data to Tightly Integrated Data

25 Years of the World Wide Web: 1989−2014

http://geekcom.wordpress.com/2009/03/19/

Tim Berners-Lee

Documents for human viewing

Page 4: From Linked Data to Tightly Integrated Data

From Text to Structured Data

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

IE (information extraction)

Source: Marko Grobelnik, Dunja Mladenic. KDD 2007.

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..

Page 5: From Linked Data to Tightly Integrated Data

The Semantic Web

http://geekcom.wordpress.com/2009/03/19/

Tim Berners-Lee

[Diagram: graph with edges labelled "colleague", "born in Frankfurt", "described by", "created by"]

Publish data in the right form right from the start

Page 6: From Linked Data to Tightly Integrated Data

The Semantic Web

Assign URIs not just to Documents, also to People, etc.

http://www.demelo.org/gdm/#GDM

http://dblp.l3s.de/d2r/page/publications/conf/cikm/MeloW09

Assign URIs to Predicates (Edge Types)

created by

http://purl.org/dc/elements/1.1/creator
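A minimal sketch of what this slide describes: a statement whose subject, predicate, and object are all URIs, written out as one N-Triples line. The direction (publication, creator, person) and the helper name `ntriple` are assumptions made for illustration; the predicate is the Dublin Core creator property, http://purl.org/dc/elements/1.1/creator.

```python
def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of three URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# URIs taken from the slide; which one is subject and which is object
# is an assumption for this illustration.
PAPER = "http://dblp.l3s.de/d2r/page/publications/conf/cikm/MeloW09"
PERSON = "http://www.demelo.org/gdm/#GDM"
CREATOR = "http://purl.org/dc/elements/1.1/creator"

line = ntriple(PAPER, CREATOR, PERSON)
print(line)
```

Because the edge type is itself a URI, any other dataset that uses the same creator property can be combined with this statement without a mapping step.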

Page 7: From Linked Data to Tightly Integrated Data

Challenge: Simplify Publishing

Page 8: From Linked Data to Tightly Integrated Data

Challenge: Simplify Publishing

http://www.gauson.com/blog/2007/12/09/minimal-template-for-blogspot/

Page 9: From Linked Data to Tightly Integrated Data

Challenge: Simplify Publishing

Freebase: Better UI but not universal

Page 10: From Linked Data to Tightly Integrated Data

Big Knowledge Graphs

Page 11: From Linked Data to Tightly Integrated Data

Big Knowledge Graphs

YAGO2. Hoffart et al. WWW 2011.

Page 12: From Linked Data to Tightly Integrated Data

Lexical Knowledge Bases


Page 13: From Linked Data to Tightly Integrated Data

Etymological Wordnet

LREC 2014 Poster Session P17, 16:45-18:05

Also Christian Chiarcos today

Page 14: From Linked Data to Tightly Integrated Data

Lexical Intensity Orderings

okay < good < great < superb (weak to strong)

de Melo & Bansal, Transactions of the ACL, 2013.

Page 15: From Linked Data to Tightly Integrated Data

Metaphors: ICSI MetaNet Project


Page 16: From Linked Data to Tightly Integrated Data

WebChild: Common-Sense

Common-Sense Relations, Properties, Comparisons

Tandon et al. WSDM 2014.

Tandon et al. AAAI 2014.

Tandon et al. AAAI 2011.

Page 17: From Linked Data to Tightly Integrated Data

Linked Data in Use

Input: Keywords, the World's Data

Output: Address User's Needs

Page 18: From Linked Data to Tightly Integrated Data

Linked Data In Use


Page 19: From Linked Data to Tightly Integrated Data

Linked Data In Use

used in IBM's Jeopardy!-winning Watson system


Page 20: From Linked Data to Tightly Integrated Data

The Plan

Linked Data

Really Linked Data

Integrated Data

Tightly Integrated Data

Page 21: From Linked Data to Tightly Integrated Data

The Plan

Linked Data

Really Linked Data

Integrated Data

Tightly Integrated Data

Page 22: From Linked Data to Tightly Integrated Data

Really Linked Data

Just converting to RDF is trivial

Page 23: From Linked Data to Tightly Integrated Data

Really Linked Data

Use entities instead of literals where possible

Book 23 --author--> “Franz Kafka”

Page 24: From Linked Data to Tightly Integrated Data

Really Linked Data

Use entities instead of literals where possible

Book 23 --author--> Author 14

Author 14 --name--> “Franz Kafka”

Author 14 --born in--> Prague

Page 25: From Linked Data to Tightly Integrated Data

Really Linked Data

Use entities instead of literals where possible

Performance 1 --language--> “en”

Performance 2 --language--> “English”

Performance 3 --language--> “engl.”

Page 26: From Linked Data to Tightly Integrated Data

Really Linked Data

Use entities instead of literals where possible

Performance 1 --language--> English

Performance 2 --language--> English

Performance 3 --language--> English

English: http://lexvo.org/id/iso639-3/eng
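A small sketch of the step from the previous slide to this one: free-text language literals are replaced by a single shared entity URI. The lookup table `LANGUAGE_LITERALS` and the function `link_language` are hypothetical; only the Lexvo.org URI for English comes from the slide.

```python
from typing import Optional

LEXVO_ENGLISH = "http://lexvo.org/id/iso639-3/eng"

# Hypothetical hand-made table; a real system would need a far larger
# mapping or a lookup service.
LANGUAGE_LITERALS = {
    "en": LEXVO_ENGLISH,
    "english": LEXVO_ENGLISH,
    "engl.": LEXVO_ENGLISH,
}


def link_language(literal: str) -> Optional[str]:
    """Return the entity URI for a language literal, or None if unknown."""
    return LANGUAGE_LITERALS.get(literal.strip().lower())


records = [
    ("Performance 1", "en"),
    ("Performance 2", "English"),
    ("Performance 3", "engl."),
]
linked = [(perf, "language", link_language(lit)) for perf, lit in records]
for triple in linked:
    print(triple)
```

After linking, all three performances point at the same node, so a query for English-language performances no longer has to guess every spelling of the literal.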

Page 27: From Linked Data to Tightly Integrated Data

Vocabulary / Ontology Re-Use

http://lov.okfn.org/

Page 28: From Linked Data to Tightly Integrated Data

Vocabulary / Ontology Re-Use

Page 29: From Linked Data to Tightly Integrated Data

Vocabulary / Ontology Re-Use

Page 30: From Linked Data to Tightly Integrated Data

Linked Data Cloud

Page 31: From Linked Data to Tightly Integrated Data

Linked Data Cloud

Page 32: From Linked Data to Tightly Integrated Data

Identifiers and Cross-Linkage

Arguably more important than RDF as a format

Example: Google Knowledge Graph

Buy into rich existing ecosystems

Page 33: From Linked Data to Tightly Integrated Data

Focal Point: WordNet

UWN (CIKM 2009): over 1,000,000 words in over 100 languages

Page 34: From Linked Data to Tightly Integrated Data

UWN/MENTA: Universal WordNet

Page 35: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

[Diagram: Lexvo.org entities for Ukrainian, Cyrillic (Script), and Ukraine; the Lexvo.org entity for Ukraine is linked via owl:sameAs to the GeoNames entity for Ukraine]

Page 36: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

[Diagram: "My Resource" linked to the Lexvo.org entity for Ukrainian, which sits alongside the Lexvo.org entities for Cyrillic (Script) and Ukraine]

Lexvo.org API Identifiers

.getLanguageURIforISO639P1("uk")

Page 37: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Lexvo.org API Identifiers

.getTermURI("car", "eng")

RDF: “car”@en l:means sumo:Automobile

lexvo:term/eng/car l:means sumo:Automobile
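The identifier scheme on these slides can be sketched in a few lines. This is not the Lexvo.org API itself (the slides show only the method names `.getLanguageURIforISO639P1` and `.getTermURI`); the Python functions below are hypothetical stand-ins. They assume that the `lexvo:` prefix expands to `http://lexvo.org/id/`, as suggested by the language URI shown earlier in the deck, and that two-letter codes are resolved to ISO 639-3 codes through a lookup table, of which only two entries are included here.

```python
from urllib.parse import quote

LEXVO_ID = "http://lexvo.org/id/"  # assumed expansion of the lexvo: prefix

# Tiny illustrative excerpt of an ISO 639-1 to ISO 639-3 mapping.
ISO639_1_TO_3 = {"uk": "ukr", "en": "eng"}


def language_uri(iso639_3: str) -> str:
    """URI for a language, given its three-letter ISO 639-3 code."""
    return f"{LEXVO_ID}iso639-3/{iso639_3}"


def language_uri_for_iso639_1(code: str) -> str:
    """URI for a language, given its two-letter ISO 639-1 code."""
    return language_uri(ISO639_1_TO_3[code])


def term_uri(term: str, iso639_3: str) -> str:
    """URI for a term (word) in a given language; the term is percent-encoded."""
    return f"{LEXVO_ID}term/{iso639_3}/{quote(term, safe='')}"


print(language_uri_for_iso639_1("uk"))
print(term_uri("car", "eng"))
```

Having a URI for the term itself is what allows the second statement on the slide: the term, rather than a string literal, becomes the subject of `l:means`.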

Page 38: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Page 39: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Page 40: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Page 41: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Semantic Web Journal 2014

Page 42: From Linked Data to Tightly Integrated Data

Focal Point: Lexvo.org

Lexvo.org

Roget's Thesaurus
WordNet Evocation Links
Etymological WordNet
PropBank lexicon
NomBank lexicon
MPQA Subjectivity Lexicon
AFINN Affective Lexicon
CMU Pronunciation Dictionary

Page 43: From Linked Data to Tightly Integrated Data

Linked Entities

Source: Gerhard Weikum. For a few Triples more.

Page 44: From Linked Data to Tightly Integrated Data

Linked Entities

Page 45: From Linked Data to Tightly Integrated Data

LINDA: Creating Links


Page 46: From Linked Data to Tightly Integrated Data

LINDA: Creating Links

LINDA: Böhm et al. CIKM 2012

Page 47: From Linked Data to Tightly Integrated Data

LINDA: Creating Links

LINDA: Böhm et al. CIKM 2012

Page 48: From Linked Data to Tightly Integrated Data

LINDA: Creating Links

LINDA: Böhm et al. CIKM 2012

Page 49: From Linked Data to Tightly Integrated Data

LINDA: Creating Links

LINDA: Böhm et al. CIKM 2012

Scale to Billion Triples Challenge Dataset despite dependencies

Page 50: From Linked Data to Tightly Integrated Data

SameAs Links

[Diagram: the Lexvo.org entity for Ukraine linked via owl:sameAs to the GeoNames entity for Ukraine]

Leibnizian Identity

For all x: x = x

For all x, y, p: x = y => p(x) = p(y)

Page 51: From Linked Data to Tightly Integrated Data

Identity vs. Near-Identity

Official Standard & Leibniz

Automatic linkers & sameas.org

Einstein's Miracle Year --owl:sameAs--> Einstein
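Why near-identity links are risky under the Leibnizian reading can be shown with a short sketch. Assuming strict identity, every property of one resource also holds for the other, so a consumer may pool all statements within a sameAs equivalence class. The `ex:` identifiers, the two facts, and the function `pool_properties` are hypothetical and only illustrate the effect of the link shown on the slide.

```python
from collections import defaultdict


def pool_properties(facts, same_as_links):
    """Union-find over sameAs links, then pool (property, value) pairs per class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_links:
        parent[find(a)] = find(b)

    pooled = defaultdict(set)
    for s, p, o in facts:
        pooled[find(s)].add((p, o))
    # Every known resource inherits everything stated about its class.
    return {e: set(pooled[find(e)]) for e in list(parent)}


facts = [
    ("ex:Einstein", "rdf:type", "ex:Person"),
    ("ex:EinsteinsMiracleYear", "rdf:type", "ex:TimePeriod"),
]
merged = pool_properties(facts, [("ex:EinsteinsMiracleYear", "ex:Einstein")])
print(sorted(merged["ex:Einstein"]))
```

With the single near-identity link in place, the person is also typed as a time period, which is the kind of inconsistency the following slides set out to detect.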

Page 52: From Linked Data to Tightly Integrated Data

Merging Lexical Resources

ACL 2010, AAAI 2013

Page 53: From Linked Data to Tightly Integrated Data

Merging Lexical Resources

ACL 2010, AAAI 2013

Page 54: From Linked Data to Tightly Integrated Data

Identity Constraints

Idea: Exploit Dataset-specific Unique Names Assumptions

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul]

Page 55: From Linked Data to Tightly Integrated Data

Identity Constraints

Idea: Exploit Dataset-specific Unique Names Assumptions

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul]

Page 56: From Linked Data to Tightly Integrated Data

Identity Constraints

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul]

Use set-based formalism to account for exceptions + to avoid quadratic number of pairwise constraints

Page 57: From Linked Data to Tightly Integrated Data

Identity Constraints

Add edge weights

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul; edge weights 2, 1, 1, 1, 1]

Goal: Consistency minimizing weighted edge deletions
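The objective on this slide can be illustrated on a toy graph. The sketch below is an exhaustive search, not the LP-based graph algorithm from the paper, and it only works for a handful of edges. The edge list is hypothetical (the slides do not give the exact topology), as are the function names. A constraint is written as a list of node sets: nodes from different sets must not end up in the same connected component, while nodes within one set (such as a page and its redirect) may.

```python
from itertools import combinations


def components(nodes, edges):
    """Map each node to a representative of its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _weight in edges:
        parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def consistent(nodes, edges, constraints):
    """True if no component contains nodes from two different sets of a constraint."""
    comp = components(nodes, edges)
    for groups in constraints:
        owner = {}  # component representative -> index of the set seen there
        for i, group in enumerate(groups):
            for n in group:
                if owner.setdefault(comp[n], i) != i:
                    return False
    return True


def min_weight_deletion(nodes, edges, constraints):
    """Try every subset of edges; return (cost, removed) with minimal total weight."""
    best = None
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if consistent(nodes, kept, constraints):
                cost = sum(w for _, _, w in removed)
                if best is None or cost < best[0]:
                    best = (cost, list(removed))
    return best


nodes = ["dbpedia:Paul", "dbpedia:Paulie", "musicbrainz:Paulie",
         "dblp:Paula", "dbpedia:Paula", "freebase:Paul"]
edges = [  # hypothetical sameAs links with weights
    ("dbpedia:Paul", "dbpedia:Paulie", 2),
    ("dbpedia:Paulie", "musicbrainz:Paulie", 1),
    ("musicbrainz:Paulie", "dblp:Paula", 1),
    ("dblp:Paula", "dbpedia:Paula", 1),
    ("freebase:Paul", "dbpedia:Paul", 1),
]
# DBpedia unique-names assumption, with the redirect as an allowed exception.
constraints = [[{"dbpedia:Paul", "dbpedia:Paulie"}, {"dbpedia:Paula"}]]

cost, removed = min_weight_deletion(nodes, edges, constraints)
print(cost, removed)
```

Writing the constraint as sets rather than as pairs keeps the redirect exception explicit and avoids listing every pair of distinct DBpedia nodes separately.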

Page 58: From Linked Data to Tightly Integrated Data

Algorithm

Capture separation between nodes, which requires edge deletions along all paths

See paper for details, incl. relationship to Hungarian Algorithm and Graph Cuts

Page 59: From Linked Data to Tightly Integrated Data

Algorithm

Leighton & Rao style Region Growing

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul; edge weights 2, 1, 1, 1, 1]

Page 60: From Linked Data to Tightly Integrated Data

Algorithm

Leighton & Rao style Region Growing

[Graph nodes: dbpedia:Paul, dbpedia:Paulie (redirect), musicbrainz:Paulie, dblp:Paula, dbpedia:Paula, freebase:Paul; edge weights 2, 1, 1, 1, 1]

Page 61: From Linked Data to Tightly Integrated Data

Experiments

BTC: Large Linked Data Web crawl, 20GB gzipped

sameas.org: Most well-known collection of sameAs links, aggregated from various Linked Data sources

Page 62: From Linked Data to Tightly Integrated Data

Identity Constraints

Page 63: From Linked Data to Tightly Integrated Data

Experiments

>500,000 node pairs, but algorithm removes only 280,000 edges

Page 64: From Linked Data to Tightly Integrated Data

Identity Links

Must distinguish identity from near-identity
Can automatically identify 500,000 inconsistent URI pairs
Fix using LP Graph Algorithm

Use more specific properties!
lvont:strictlySameAs (Lexvo.org)
skos:closeMatch
etc.

Page 65: From Linked Data to Tightly Integrated Data

Questions?

Image: Question Answering over Linked Data Workshop

Page 66: From Linked Data to Tightly Integrated Data

The Plan

Linked Data

Really Linked Data

Integrated Data

Tightly Integrated Data

Page 67: From Linked Data to Tightly Integrated Data

Taxonomic Links

A user wants a list of “Art Schools in Europe”

Page 68: From Linked Data to Tightly Integrated Data

Multilingual Taxonomies

A Swedish user wants a list of “Konstskolor i Europa” (Swedish for “Art Schools in Europe”)

Page 69: From Linked Data to Tightly Integrated Data

MENTA

200+ Wikipedia editions
WordNet
Etc.

Page 70: From Linked Data to Tightly Integrated Data

MENTA

Predict Individual Identity Links:
WordNet-Wikipedia
Article-Redirect
Article-Category
etc.

Page 71: From Linked Data to Tightly Integrated Data

MENTA

Predict Individual Taxonomic Links:
Article → Category
Category → WordNet

Page 72: From Linked Data to Tightly Integrated Data

MENTA


Page 73: From Linked Data to Tightly Integrated Data

Taxonomic Links: MENTA

Page 74: From Linked Data to Tightly Integrated Data

Taxonomic Links: MENTA

Use Identity Constraint Algorithm to form equivalence classes

Markov Chain Random Walk with Restarts to Rank Parents
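A generic random walk with restart can be sketched as a power iteration. This is a textbook formulation under stated assumptions, not MENTA's exact model: the toy graph, its weights, the restart probability of 0.15, and the handling of nodes without outgoing edges (their probability mass returns to the start node) are all choices made for this illustration.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    """Approximate the visiting probabilities of a walk that restarts at `start`.

    `graph` maps each node to a dict {neighbour: weight}; every neighbour
    must itself be a key of `graph`.
    """
    nodes = list(graph)
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * score[n]
            out = graph[n]
            if not out:
                nxt[start] += walk_mass  # dead end: go back to the start node
                continue
            total = sum(out.values())
            for m, w in out.items():
                nxt[m] += walk_mass * w / total
        nxt[start] += restart * sum(score.values())  # restart step
        score = nxt
    return score


# Hypothetical candidate parents of one entity, weighted by link confidence.
graph = {
    "entity": {"parent A": 2.0, "parent B": 1.0},
    "parent A": {},
    "parent B": {},
}
scores = random_walk_with_restart(graph, "entity")
ranking = sorted((n for n in graph if n != "entity"), key=scores.get, reverse=True)
print(ranking)
```

Candidate parents that are reachable over more, or more strongly weighted, paths accumulate more probability mass and therefore rank higher.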

Page 75: From Linked Data to Tightly Integrated Data

Taxonomic Links: MENTA

Page 76: From Linked Data to Tightly Integrated Data

UWN/MENTA

CIKM 2010 Best Paper Award

Page 77: From Linked Data to Tightly Integrated Data

MENTA: Multilingual Entity Taxonomy

UWN/MENTA (de Melo & Weikum 2010)

● multilingual extension of WordNet, with 800,000 words in 250 languages

● 4.8 million instances/classes from multilingual Wikipedia editions

Page 78: From Linked Data to Tightly Integrated Data

UWN/MENTA

multilingual extension of WordNet for word senses and taxonomical information over 200 languages

Page 79: From Linked Data to Tightly Integrated Data

Questions?

Image: Question Answering over Linked Data Workshop

Page 80: From Linked Data to Tightly Integrated Data

The Plan

Linked Data

Really Linked Data

Integrated Data

Tightly Integrated Data

Page 81: From Linked Data to Tightly Integrated Data

Challenge: Locked Away Data

Hard to run advanced algorithms over a SPARQL interface

Many sites don't provide downloads.

Page 82: From Linked Data to Tightly Integrated Data

Challenge: Lost Data

http://sparqles.okfn.org/

Servers offline
Poor archiving

Dumps need to be archived and integrated.

Page 83: From Linked Data to Tightly Integrated Data

Challenge: Updates

Need to be able to update when data changes

Need algorithmic solutions, not one-time process.

YAGO2s: Biega et al. 2013

Page 84: From Linked Data to Tightly Integrated Data

Requirement: Integration Algorithm Pipelines

Input: Various Data

Output: Tightly Integrated Data

Page 85: From Linked Data to Tightly Integrated Data

Lexvo.org

Semantic Web Journal 2014

Page 86: From Linked Data to Tightly Integrated Data

Lexvo.org

Page 87: From Linked Data to Tightly Integrated Data

Lexvo.org

Page 88: From Linked Data to Tightly Integrated Data

Lexvo.org

Page 89: From Linked Data to Tightly Integrated Data

Lexvo.org

Semantic Web Journal 2014

Page 90: From Linked Data to Tightly Integrated Data

Knowledge Graphs

Most large-scale knowledge bases have ground facts only

bornIn(Einstein,Ulm)
acquired(Microsoft,Powerset)

But language is much more expressive

● All humans are mortal.
● At least three but not more than 10 people know this secret.
● Three years ago, most people believed that Microsoft would buy Yahoo within months.

Page 91: From Linked Data to Tightly Integrated Data

Challenge: Time

Temporal scope missing

Source: Gerhard Weikum. For a few Triples more.

Page 92: From Linked Data to Tightly Integrated Data

OWL, RDFS, Description Logics

WebProtégé: http://protege.stanford.edu/

Limit expressivity to get decidability.

Focus on class hierarchies and property axioms.

Cannot create new rules, e.g. to model “grandparent”, “uncle”, “legal adult”!
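To make concrete what kind of rule the slide has in mind, here is a sketch of the “grandparent” rule applied to a few ground facts. It says nothing about how a particular ontology language would or would not express the rule; the facts and the function name `derive_grandparents` are hypothetical.

```python
def derive_grandparents(parent_facts):
    """Apply the rule: parent(x, y) and parent(y, z) => grandparent(x, z)."""
    children = {}
    for parent, child in parent_facts:
        children.setdefault(parent, set()).add(child)
    return {
        (x, z)
        for x, ys in children.items()
        for y in ys
        for z in children.get(y, ())
    }


parent_facts = [("Ann", "Bob"), ("Bob", "Cid"), ("Bob", "Dee")]
grandparents = derive_grandparents(parent_facts)
print(sorted(grandparents))
```

The rule joins two facts through a shared variable, which is a different kind of statement from a class hierarchy or a single property axiom.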

Page 93: From Linked Data to Tightly Integrated Data

Reasoning

Humans cannot act before being born (or, actually, before being conceived)

(=>
  (and
    (human ?HUMAN)
    (birthdate ?HUMAN ?T)
    (agent ?PROCESS ?HUMAN))
  (beforeOrEqual
    (daysBefore (BeginFn ?T) 365)
    (BeginFn (WhenFn ?PROCESS))))
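The rule above can be read as a consistency check on facts: a process with a human agent may not begin more than 365 days before that person's birthdate. A minimal sketch of this reading, assuming that `daysBefore` denotes the time point the given number of days earlier; the function name and the example process dates are hypothetical.

```python
from datetime import date, timedelta


def violates_birth_rule(birthdate: date, process_start: date) -> bool:
    """True if the process starts earlier than 365 days before the birthdate."""
    earliest_allowed = birthdate - timedelta(days=365)
    return process_start < earliest_allowed


einstein_born = date(1879, 3, 14)
print(violates_birth_rule(einstein_born, date(1905, 6, 30)))  # plausible fact
print(violates_birth_rule(einstein_born, date(1850, 1, 1)))   # suspicious fact
```

A check of this kind lets a reasoner flag extracted facts that cannot be true, such as a person acting as the agent of an event decades before their birth.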

Page 94: From Linked Data to Tightly Integrated Data

Reasoning: SPASS-XDB

Page 95: From Linked Data to Tightly Integrated Data

Search Interfaces

“Which companies were created during the last century in Silicon Valley ?”

YAGO2: WWW 2011 Best Demo Award

Page 96: From Linked Data to Tightly Integrated Data

Common-Sense Inference

I found the following restaurant near your current location:

La Dolce Vita Pizza. 2318 Columbus Ave.

I'd rather have something healthier

Tandon et al. AAAI 2014

Page 97: From Linked Data to Tightly Integrated Data

Conclusion

Really Linked Data
► Shared Identifiers
► Proper Interlinking

Integrated Data
► Taxonomical Integration

Tightly Integrated Data
► Processing Pipelines
► Towards Common-Sense Inference

Gerard de Melo

[email protected]@demelo.org