ESWC SS 2012 - Tuesday Tutorial Dan Brickley and Denny Vrandecic: Linked Open Data


Linked Open Data
Dan Brickley, Google
Denny Vrandečić, Wikimedia

Session Linked data, Tuesday, 9:45-11:15

2

Agenda
- Notation
- Linked Open Data principles
- Applied LOD principles
- Application: schema.org
- Application: Wikidata
- Open questions
- Hands-on intro: On links
- Hands-on: Exploration
- Hands-on: SPARQL
- Hands-on: Spark

22/05/2012

3

dbpedia:Kalamaki

Notation
- URIs are generally abbreviated here as CURIEs (e.g. http://dbpedia.org/resource/Kalamaki = dbpedia:Kalamaki)
- Entities and literals are labeled rectangles
- Blank nodes are circles
- Triples are arrows, labeled with the property, connecting subject and object


dbpedia:Kalamaki fb:likes
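The CURIE abbreviation used in the notation can be sketched in a few lines of Python. This is only an illustration: the dbpedia prefix is the one given on the slide, the rdfs prefix is the standard RDF Schema namespace, and the function names expand and compact are hypothetical.

```python
# Prefix table. "dbpedia" is taken from the slide; a real application
# would load its prefixes from the data or from configuration.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str) -> str:
    """Turn a CURIE such as dbpedia:Kalamaki into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(uri: str) -> str:
    """Abbreviate a URI with the longest matching namespace, if any."""
    by_length = sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI untouched


print(expand("dbpedia:Kalamaki"))
print(compact("http://dbpedia.org/resource/Kalamaki"))
```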

4

LOD PRINCIPLES Background


Linked Open Data principles

1.  Use URIs as names for things

2.  Use HTTP URIs so that they can be looked up

3.  Provide results in standard formats (e.g. RDF, SPARQL)

4.  Link to other URIs


WHY SEMANTIC COMPUTING? LOD Application


1 2 10 100 10 D

Field 1: Tag
- 0 = Face up
- 1 = Face down

Field 2: Suit
- 1 = Clubs
- 2 = Diamonds
- 3 = Hearts
- 4 = Spades

Field 3: Rank
- 1 = Ace
- 2..10 = 2..10
- 11 = Jack
- 12 = Queen
- 13 = King

Field 4: Address of next card

Field 5: “Human-readable”

Example from Donald Knuth, The Art of Computer Programming, Chapter 1
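To make the field layout concrete, here is a minimal Python sketch that decodes the record 1 2 10 100 10 D using the code tables above. The Card type and the describe function are names invented for this illustration; they are not part of Knuth's example.

```python
from typing import NamedTuple

# Code tables copied from the field descriptions on the slide.
SUITS = {1: "Clubs", 2: "Diamonds", 3: "Hearts", 4: "Spades"}
FACE_RANKS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}


class Card(NamedTuple):
    tag: int    # 0 = face up, 1 = face down
    suit: int   # 1..4, see SUITS
    rank: int   # 1..13
    next: int   # address of the next card
    title: str  # the "human-readable" field


def describe(card: Card) -> str:
    """Spell out what the numeric codes of a card record mean."""
    facing = "face down" if card.tag == 1 else "face up"
    rank = FACE_RANKS.get(card.rank, str(card.rank))
    return f"{rank} of {SUITS[card.suit]}, {facing}, next card at {card.next}"


card = Card(1, 2, 10, 100, "10 D")
print(describe(card))  # 10 of Diamonds, face down, next card at 100
```

Without the code tables the record is just five opaque values, which is the point the following slides build on.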

[Figure sequence: the record 1 2 10 100 10 D is redrawn step by step as a graph, introducing the labels cards:next, cards:d10 and cards:card.]

12

http://example.org/cards/d10
- Oh, an unknown term
- It is an HTTP URI!

GET /cards/d10 HTTP/1.1
Host: www.example.org
Accept: text/rdf+n3, application/rdf+xml

HTTP/1.1 200 OK
Content-type: text/n3; charset=UTF-8

cards:d10 rdf:type cards:Card ;
    rdfs:label "10 of diamonds"@en ;
    cards:suit cards:diamonds ;
    cards:rank cards:rank-10 .
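The lookup above can be sketched with Python's standard library. The snippet only builds the request and does not send it, because example.org does not actually serve this data; the function name build_lookup is hypothetical, and the Accept value is the one shown on the slide.

```python
from urllib.request import Request


def build_lookup(uri: str, accept: str = "text/rdf+n3, application/rdf+xml") -> Request:
    """Prepare an HTTP GET that asks for RDF instead of an HTML page."""
    return Request(uri, headers={"Accept": accept}, method="GET")


request = build_lookup("http://example.org/cards/d10")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Against a server that really publishes the data, one would continue with
# urllib.request.urlopen(request) and parse the returned document.
```

The Accept header is what lets the same URI return a web page to a browser and triples to a data client.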


[Figure sequence: the graph around cards:d10 grows step by step, first next to the record 1 2 10 100 10 D and finally replacing it. Nodes added: cards:diamonds, cards:rank-10, cards:facedown, color:red, and the literals “10 of Diamonds”@en, “Karo 10”@de and “10”^^xsd:int. Edge labels: cards:next, cards:card, cards:rank, rdfs:label.]

cards:suit ○ cards:color ⊑ cards:cardcolor
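The axiom above is a property chain: whenever a card has a suit and that suit has a color, the card has that cardcolor. A minimal sketch of the inference in Python follows. The function name apply_chain is made up, and the triples linking cards:diamonds to color:red and color:yellow are assumptions read off the figures rather than data given in the slides.

```python
def apply_chain(triples, first, second, implied):
    """Add (s, implied, o) wherever s -first-> m and m -second-> o."""
    inferred = {
        (s, implied, o)
        for (s, p, m) in triples if p == first
        for (m2, q, o) in triples if q == second and m2 == m
    }
    return set(triples) | inferred


triples = {
    ("cards:d10", "cards:suit", "cards:diamonds"),
    ("cards:diamonds", "cards:color", "color:red"),    # assumed from the figure
    ("cards:diamonds", "skat:color", "color:yellow"),  # assumed from the figure
}

# cards:suit o cards:color is a subproperty of cards:cardcolor
standard = apply_chain(triples, "cards:suit", "cards:color", "cards:cardcolor")
# The same rule with skat:color plugged in instead of cards:color
skat = apply_chain(triples, "cards:suit", "skat:color", "cards:cardcolor")

print(("cards:d10", "cards:cardcolor", "color:red") in standard)
print(("cards:d10", "cards:cardcolor", "color:yellow") in skat)
```

Switching games changes one axiom and some data, not the code that computes the color.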

18

Programming

Classic:
    function color(card) {
      if ((card[2] == 1) or (card[2] == 4)) {
        return 1;
      } else {
        return 2;
      }
    }

Symbolic constants:
    function color(card) {
      if ((card.suit == cards.clubs) or (card.suit == cards.spades)) {
        return cards.black;
      } else {
        return cards.red;
      }
    }

Wannabe Hacker:
    function color(card) {
      return 2 - int((card[2] == 1) or (card[2] == 4));
    }

Semantic:
    select ?color where { card cards:cardcolor ?color }

Where is the knowledge? How do I edit it?

[Figure: the same graph with color:yellow added.]

cards:suit ○ skat:color ⊑ cards:cardcolor

[Figure: the same graph with color:purple and poker:color added.]

cards:suit ○ poker:color ⊑ cards:cardcolor

23

BUT THOSE ARE KNOWLEDGE-BASED SYSTEMS, AS BUILT FOR DECADES!

24

CHRIS WELTY, IBM

“In the Semantic Web, it is not the ‘Semantic’ which is new, it is the ‘Web’ which is new.”

25

[Figure: the card graph (cards:d10, cards:diamonds, cards:card, color:red, color:yellow, color:purple, poker:color) shown together with aifb:Elena and an fb:like edge.]

26

[Figure: a wider graph of linked concepts. Node labels: Elena, AIFB, Purple, Tatort, Diamond, 10-Diamond, Queen-Diamond, Queen, King, KIT, Culture, University, Karlsruhe, Education, China, Ceylon, India, Airline, Asia, Hotel, Restaurant, Enterprise, Airport, Advertisement, Animal, Vegetarian restaurant, Cosmos, TV Show, Incheon, Mumbai Airport, Mumbai, Human, Carbon, Diamond, Lao Tse, Religion, Philosophy, Semantic Web.]

[Figures: the Semantic Web by year, one slide each for 2007, 2008, 2009, 2010 and 2011.]

32

SCHEMA.ORG Applications


Schema.org A quick look.


Yandex

38

event

place

intangible LocalBusiness

Organization

CivicStructure

CreativeWork

Landform

UserInteraction

39

For example?

40

<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">My Title</span></h2>
  <meta itemprop="duration" content="T1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedUrl"
    content="http://example.com/videoplayer.swf?video=123" />
  <object ...>
    <embed type="application/x-shockwave-flash" ...>
  </object>
  <span itemprop="description">Video description</span>
</div>

Type: http://schema.org/VideoObject
name = My Title
duration = T1M33S
thumbnailUrl = thumbnail.jpg
embedUrl = http://example.com/videoplayer.swf?video=123
description = Video description
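How a consumer gets from the markup to the property list can be sketched with Python's built-in HTML parser. This is a deliberately simplified reader, not a conforming microdata processor: it assumes a single item whose property elements contain no nested tags, and the class name MicrodataItem is invented for the example. The player object from the slide is left out of the sample page.

```python
from html.parser import HTMLParser


class MicrodataItem(HTMLParser):
    """Collect the itemprop values of one flat (non-nested) microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # <meta itemprop=... content=...> carries its value in an attribute
            self.properties[prop] = attrs["content"]
        else:
            # otherwise the value is the text inside the element
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


PAGE = """
<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">My Title</span></h2>
  <meta itemprop="duration" content="T1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedUrl"
    content="http://example.com/videoplayer.swf?video=123" />
  <span itemprop="description">Video description</span>
</div>
"""

parser = MicrodataItem()
parser.feed(PAGE)
parser.close()

print("Type:", parser.item_type)
for name, value in parser.properties.items():
    print(name, "=", value)
```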

42

(this is almost all you need to know about RDF, incidentally)

43

WIKIDATA Applications


Main page Content API Random page Donate to Wikidata Interaction Help About Wikidata Community Recent changes Languages Catalá Cesky Dansk Eesti English Español Esperanto Français Hrvatski Italiano O’zbek Complete list

Berlin edit | x

Continent Europe [3 sources]

Country Germany [2 sources]

Population 3,499,879 As of November 30 2011 Method Extrapolation

[1 source]

3,500,000 As of 2012 Method Estimate

[2 sources]

[further values]

Phone prefix 030 since June 1973

[2 sources]

0311 before June 1973

[1 source]

Mayor Klaus W| [no source]

Registration license B [1 source]

Area 891.85 km² [2 sources]

Twin city Los Angeles [no sources]

[new statement]

edit

edit

Suggestions shown for “Klaus W”:
- Klaus Wowereit: German politician
- Klaus Wunderlich: German musician
- Klaus Wagner: Stalker of the British royal family
- Klaus Wagner: German mathematician
- Klaus Waldeck: Austrian musician and lawyer

Capital of Germany Also known as: City of Berlin

46

[The same Berlin mockup with the interface in German. Labels and values are translated or reformatted for the locale: Kontinent (Continent) Europa, Land (Country) Deutschland, Einwohner (Population) 3.499.879, Telefonvorwahl (Phone prefix), Bürgermeister (Mayor), Amtliches Kennzeichen (Registration license), Fläche (Area) 891,85 km², Partnerstadt (Twin city), Quellen (sources).]

47

Application: Infoboxes
- Now: every article calls an infobox with local values
- In Wikidata: one page with values
- Wikipedias fill infoboxes with Wikidata values

48

49

OPEN QUESTIONS Or: A few dozen possible paper, project and thesis topics

50

UNFINISHED WORK Open questions

51

52

Unfinished work
- What does a unifying logic look like?
- How do we export proofs?
- How do we validate proofs?
- How do we express trust?
- How does the crypto stack really work?
- What are usable interfaces to the Semantic Web?
- How are Semantic Web applications created?

53

IDENTITY AND REPRESENTATION

Open questions

54

http://simpsons.com/id/Bart
http://rdf.freebase.com/id/en.bart_simpson
http://en.wikipedia.org/wiki/Bart_Simpson
http://dbpedia.org/resource/Bart_Simpson
Bart
Bart Simpson
4030 (Character ID on ComicbookDB)

55

Identity and representation
- Is there anything out there?
- How to find the right identifier?
- How to know what an identifier identifies?
- What about the multitude of identifiers?
- How do we know that two identifiers identify the same entity?
- How do we know that two identifiers identify different entities?
- Without this, can we still usefully apply statistical techniques?
- What about creating new identifiers?
- What if identifiers are ambiguous?
- How to find representations for entities fitting my UI?
- How to choose a representation?
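One common, partial answer to the "same entity" question is to collect owl:sameAs links and group identifiers into equivalence classes. The sketch below does this with a small union-find structure in Python. The class name SameAs is invented, and the two links are assumptions made for the example; the slides list the identifiers but do not say who asserts that they are equivalent.

```python
class SameAs:
    """Group identifiers connected by sameAs links (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # path halving keeps later lookups short
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def link(self, a, b):
        """Record that a and b identify the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        return self._find(a) == self._find(b)


ids = SameAs()
# Assumed links, for illustration only.
ids.link("http://simpsons.com/id/Bart", "http://dbpedia.org/resource/Bart_Simpson")
ids.link("http://dbpedia.org/resource/Bart_Simpson",
         "http://rdf.freebase.com/id/en.bart_simpson")

print(ids.same("http://simpsons.com/id/Bart",
               "http://rdf.freebase.com/id/en.bart_simpson"))
print(ids.same("http://simpsons.com/id/Bart", "http://simpsons.com/id/Lisa"))
```

Note the asymmetry: a chain of links can establish sameness, but a missing link does not establish that two identifiers are different, which is exactly the harder question on the slide.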

56

TRUST AND DIVERSITY Open questions

57

Main page Content API Random page Donate to Wikidata Interaction Help About Wikidata Community Recent changes Languages Catalá Cesky Dansk Eesti English Español Esperanto Français Hrvatski Italiano O’zbek Complete list

Berlin edit | x

Continent Europe [3 sources]

Country Germany [2 sources]

Population 3,499,879 As of November 30 2011 Method Extrapolation

[1 source]

3,500,000 As of 2012 Method Estimate

[2 sources]

[further values]

Phone prefix 030 since June 1973

[2 sources]

0311 before June 1973

[1 source]

Mayor Klaus Wowereit [no source]

Registration license B [1 source]

Area 891.85 km² [2 sources]

Twin city Los Angeles [no sources]

[new statement]

edit

edit

Capital of Germany Also known as: City of Berlin

58

A statement in Wikidata

Population 3,499,879 As of November 30 2011 Method Extrapolation

[2 sources]

3,500,000 As of 2012 Method Estimate

[1 source]

Berlin

59

A statement in Wikidata

Population 3,499,879 As of November 30 2011 Method Extrapolation

[2 sources]

3,500,000 [1 source]

Berlin

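[Figure: the first value as a graph. Statement1 has item Berlin, property population and value 3499879, qualified by as of 2011-11-30 and method Extrapolation. The value 3500000 is attached to Berlin through population as well.]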

60

A statement in Wikidata

Population 8,000 As of 15th century Method Estimate

[2 sources]

3,500,000 [1 source]

Berlin

[Figure: Statement1 has item Berlin, property population and value 8000, qualified by as of 15th century and method Estimate. Statement2 has property population and value 3500000.]

61

A statement in Wikidata

Population 3,499,879 As of November 30 2011 Method Extrapolation

[2 sources]

3,500,000 [1 source]

Berlin

[Figure: Statement1 has item Berlin, property population and value 3499879, qualified by as of 2011-11-30 and method Extrapolation. Statement2 has property population and value 3500000. Source1, Source2 and Source3 are attached to the statements by reference edges.]
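The statement structure in these figures (item, property, value, plus qualifiers and references) can be sketched as a small Python data class. This is an illustration of the shape of the data only, not Wikidata's actual data model or API; in particular, which source supports which statement is an assumption, since the figure residue does not make it clear.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    item: str
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)  # e.g. "as of", "method"
    references: list = field(default_factory=list)  # supporting sources


statements = [
    Statement("Berlin", "population", 3499879,
              {"as of": "2011-11-30", "method": "Extrapolation"},
              ["Source1", "Source2"]),  # assignment of sources is assumed
    Statement("Berlin", "population", 3500000,
              {"as of": "2012", "method": "Estimate"},
              ["Source3"]),
]


def values(item, prop, min_references=0):
    """All recorded values, optionally only those with enough references."""
    return [s.value for s in statements
            if s.item == item and s.prop == prop
            and len(s.references) >= min_references]


print(values("Berlin", "population"))
print(values("Berlin", "population", min_references=2))
```

Both values stay in the data; deciding which one to show is left to the consumer.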

62

Trust and diversity
- How to express provenance information?
- How to store provenance of data?
- Can provenance information be expressed such that the data is still easily accessible?
- How to query data with provenance information?
- How to deal with genuinely diverse data?
- How to match diverse vocabularies?
- How to deal with noisy data?
- Is reification really necessary?
- Do named graphs provide solutions?
- Use one graph per statement?

63

UNITS AND ACCURACY Open questions


Units and accuracy
- How to express “17th century” next to literal dates?
- How to express heterogeneous accuracies?
- Is a functional value of 40,000 km really inconsistent with 39,987 km?
- How to express confidence values?
- How to express units?
- Is 176 cm equal to 5 ft 9? 177 cm too? Is equality transitive?
- How to express ranges (e.g. property “active” for bands)?
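The transitivity question can be made concrete with a short Python sketch. It assumes one possible reading of "equal", namely "within a tolerance of 1 cm"; the tolerance and the function names are choices made for this example, not something the slides prescribe.

```python
CM_PER_INCH = 2.54


def feet_inches_to_cm(feet: int, inches: int) -> float:
    return (feet * 12 + inches) * CM_PER_INCH


def roughly_equal(a_cm: float, b_cm: float, tolerance_cm: float = 1.0) -> bool:
    """Treat two lengths as equal if they differ by at most the tolerance."""
    return abs(a_cm - b_cm) <= tolerance_cm


five_nine = feet_inches_to_cm(5, 9)  # about 175.26 cm

print(roughly_equal(five_nine, 176.0))  # 5 ft 9 vs 176 cm
print(roughly_equal(176.0, 177.0))      # 176 cm vs 177 cm
print(roughly_equal(five_nine, 177.0))  # 5 ft 9 vs 177 cm
```

Under this reading the first two comparisons hold and the third does not, so tolerance-based equality is not transitive and cannot simply be treated like owl:sameAs.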


SERIALIZATIONS Open questions

66

[Figure: an example graph. Nodes: http://simpsons.com/id/Bart (label Bart), http://simpsons.com/id/Marge (label Marge), http://simpsons.com/id/Lisa (label Lisa), and the classes http://family.org/id/Child and http://family.org/id/Adult. Properties: http://family.org/id/parent, http://family.org/id/sibling, http://www.w3.org/2000/01/rdf-schema#label and http://www.w3.org/1999/02/22-rdf-syntax-ns#type.]

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:family="http://family.org/id/">
  <rdf:Description rdf:about="http://simpsons.com/id/Marge">
    <rdf:type rdf:resource="http://family.org/id/Adult"/>
    <rdfs:label>Marge</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://simpsons.com/id/Bart">
    <rdfs:label>Bart</rdfs:label>
    <rdf:type rdf:resource="http://family.org/id/Child"/>
    <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
    <family:sibling rdf:resource="http://simpsons.com/id/Lisa"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://simpsons.com/id/Lisa">
    <rdfs:label>Lisa</rdfs:label>
    <rdf:type rdf:resource="http://family.org/id/Child"/>
    <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
  </rdf:Description>
</rdf:RDF>

68

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix family: <http://family.org/id/> .
@prefix simpsons: <http://simpsons.com/id/> .

simpsons:Marge rdf:type family:Adult ;
    rdfs:label "Marge" .

simpsons:Bart rdf:type family:Child ;
    rdfs:label "Bart" ;
    family:parent simpsons:Marge ;
    family:sibling simpsons:Lisa .

simpsons:Lisa rdf:type family:Child ;
    rdfs:label "Lisa" ;
    family:parent simpsons:Marge .

{
  "id" : "Bart",
  "type" : "Child",
  "sibling" : "Lisa",
  "parent" : "Marge"
}

69

Child(Bart).
sibling(Bart, Lisa).
parent(Bart, Marge).

Bart is a son of [[parent::Marge]] and the brother of [[sibling::Lisa]].

0 HEAD
1 FILE simpsons
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME Marge /Bouvier/
2 SURN Simpson
1 SEX F
1 FAMS @F1@
0 @I2@ INDI
1 NAME Bart /Simpson/
1 SEX M
1 FAMC @F1@
0 @I3@ INDI
1 NAME Lisa /Simpson/
1 SEX F
1 FAMC @F1@
0 @F1@ FAM
1 WIFE @I1@
1 CHIL @I2@
1 CHIL @I3@
0 TRLR


70

Serializations
- Do all tools need to understand all serializations?
- Are all serializations lossless?
- How to ensure they are up-to-date?
- What about current tools that don’t understand anything?
- Is the data sufficiently complete?
- How to seamlessly ground and lift data to RDF?
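As a sketch of what "lifting" can mean, the Python snippet below turns the plain JSON record for Bart into N-Triples lines. The mapping is an assumption made for the example: identifiers are resolved against http://simpsons.com/id/, and keys and types against http://family.org/id/, mirroring the namespaces used in the RDF/XML and Turtle versions. The function name lift is hypothetical.

```python
import json

BASE = "http://simpsons.com/id/"   # namespace assumed for "id" values
VOCAB = "http://family.org/id/"    # namespace assumed for keys and types
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def lift(record: dict) -> list:
    """Map a flat JSON record to N-Triples lines under the mapping above."""
    subject = f"<{BASE}{record['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{VOCAB}{record['type']}> ."]
    for key, value in record.items():
        if key in ("id", "type"):
            continue
        # every remaining key is treated as a link to another individual
        lines.append(f"{subject} <{VOCAB}{key}> <{BASE}{value}> .")
    return lines


record = json.loads(
    '{"id": "Bart", "type": "Child", "sibling": "Lisa", "parent": "Marge"}'
)
for line in lift(record):
    print(line)
```

The JSON itself carries none of this mapping, which is why the lossless and complete questions above are not trivial.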

71

ONTOLOGIES Open questions

72

Ontologies
- “An ontology is a formal specification of a shared conceptualization”
- Defines concepts and their formal relations to each other
- You can understand a concept without having a word for it
- Axiom not possible in OWL L, can only be approximated

parent ○ brother = uncle

Bart

Marge Selma Sideshow Bob ⚭Homer ⚭Herb

parent ○ sister ○ husband V ⊑ sibling

73

Ontologies
- “An ontology is a formal specification of a shared conceptualization”
- Strict taxonomies
  - Bart a FictionalPerson
- owl:sameAs
  - GDR sameAs Germany
- Classes as individuals
  - Eagle a EndangeredSpecies
- rdfs:domain and rdfs:range
  - family:child rdfs:range foaf:Person
- “Unauthorized” extensions
  - foaf:favouriteMovie
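The rdfs:range item is a frequent source of surprises, because in RDFS a range is an inference rule rather than a validation constraint. A minimal Python sketch follows; the data triples, including the deliberately odd ex:Springfield value, are invented for the illustration.

```python
RDF_TYPE = "rdf:type"


def infer_range_types(data, schema):
    """RDFS range rule: if p has range C and (s, p, o) holds, then o is a C."""
    ranges = {p: c for (p, r, c) in schema if r == "rdfs:range"}
    inferred = {(o, RDF_TYPE, ranges[p]) for (s, p, o) in data if p in ranges}
    return set(data) | inferred


schema = {("family:child", "rdfs:range", "foaf:Person")}  # from the slide
data = {
    ("simpsons:Marge", "family:child", "simpsons:Bart"),  # made-up data
    ("simpsons:Marge", "family:child", "ex:Springfield"), # a likely data error
}

closed = infer_range_types(data, schema)
print(("simpsons:Bart", RDF_TYPE, "foaf:Person") in closed)
print(("ex:Springfield", RDF_TYPE, "foaf:Person") in closed)
```

The erroneous triple is not rejected; the reasoner simply concludes that ex:Springfield is a person, so mistakes in the data spread instead of being caught.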

74

Ontologies
- How to achieve and measure sharedness?
- Who defines the semantics of a term?
- How to achieve correctness?
- Does sharedness mean correctness?
- How to overcome limitations on expressivity?
- How to deal with wishes for more expressivity?
- How to deal with undecidability?
- What does inconsistency mean?
- How to deal with brittleness?

75

PRIVACY Open questions


78

Privacy
- How to ensure privacy?
- What does privacy mean?
- How to publish linked data that is not open?
- What about the ethics of combining data?

79

SCALABILITY Open questions

80

Web Data Commons
- Extracts data from Common Crawl (5 billion pages, 20 TB compressed)
- 65,408,946 domains with triples
- 1,222,563,749 typed entities
- 3,294,248,653 triples
- www.webdatacommons.org


Scalability
- How to efficiently use Semantic Web data?
- How to select the appropriate set?
- How to cache it?
- How to deal with frequent updates?
- How to deal with SPARQL endpoints vs RDF?
- How to do federated queries?
- Who pays for it and when?

82

QUESTIONS?

83

WHAT ABOUT THE LINKS? Introduction to Hands-On


What are the links in "linked data"?

Are they links between things?

Are they links between documents?

How exactly do the "Web hyperlinks" we know and love relate to the factual "typed links" of data modeling?

85

Links and Links
- These questions motivate and drive the Linked Data project, and have been with the Web from the start.
- They explain our most boring debates ("http-range-14").
- And show how 'Semantic Web' is a project to improve the mainstream Web itself.

86

87

In the beginning...

(1989, 1994, ...)


What's in a (hyper)link?
- Does a node in the graph stand for 'Stephen Fry'-the-Person? Or 'a page about Stephen Fry'?
- What about when there are multiple pages about the same person? In different voices? Sometimes disagreeing?
- RDF thinks in triples, but data management is often in quads: asking who-said-what in SPARQL

97

1989 again

One flat graph? What if we disagree?

98

A Graph of Graphs?
- Classic WWW hypertext is a top-level document graph.
- Those documents make claims about the world; factual graphs, e.g. schema.org, RDFa.
- SPARQL lets us store and query all this.
- Each Web 'node' may give us its own 'nodes and links' description, including links.

99

100

IMDB

BBC

stephenfry.com

Freebase

sameas.org

dbpedia.org

NewYorkTimes RottenTomatoes

VIAF

101

We can emphasize the landscape of sites/datasets...

(No single 'correct' view)


103

Or we can zoom in, and see how records can be merged / flattened into a single set of triples...

104

Summary
- Linked datasets, pages, real world things...
- ... all of these are represented in RDF datasets.
- To query this hands on, we can use SPARQL to ask questions, and 'named graphs' to organize factual claims into groups.
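The idea of organizing claims into named graphs can be sketched without any RDF library by storing quads, that is, triples with a fourth element naming the graph they came from. The Python example below reuses the Berlin population values from the Wikidata slides; the graph names graph:sourceA and graph:sourceB are hypothetical.

```python
# Each claim is (subject, property, value, graph). The graph names are
# invented; the population values are those from the Wikidata mockup.
quads = [
    ("Berlin", "population", 3499879, "graph:sourceA"),
    ("Berlin", "population", 3500000, "graph:sourceB"),
    ("Berlin", "country", "Germany", "graph:sourceA"),
    ("Berlin", "country", "Germany", "graph:sourceB"),
]


def who_said(subject, prop, value):
    """Names of the graphs that contain the given triple."""
    return sorted(g for (s, p, o, g) in quads if (s, p, o) == (subject, prop, value))


def merged_triples():
    """Flatten all graphs into one set of triples, dropping provenance."""
    return {(s, p, o) for (s, p, o, _) in quads}


print(who_said("Berlin", "country", "Germany"))
print(who_said("Berlin", "population", 3500000))
print(len(merged_triples()))
```

After merging, the two population values sit side by side with no trace of who claimed which; keeping the fourth element is what makes the who-said-what question answerable.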

105

EXPLORATION Hands-on

106

Hands-on
- You will explore datasets about Stephen Fry with SPARQL
- SPARQL yourself and your colleagues
- Spark: SPARQL on the Web

107

Thinking about data
- We made a data/ folder for you
- Real public RDF data about a real person
- Sources: DBpedia, Freebase, VIAF, sameas.org, New York Times, Identi.ca, BBC, Rotten Tomatoes, IMDB and us.
- I’ll briefly introduce the data now, then see info/data-and-queries-intro.txt

http://192.168.0.20:8080/openrdf-workbench/repositories/Tuesday

108

What to do
- “Get your hands dirty” with real Linked Data
- If you hit a problem, make a note of it, and ask!
- Most files have RDF describing Stephen Fry; he is real and human, please bear that in mind.
- Study the shape and patterns of the data, ask yourself questions, using SPARQL to explore.

109

Questions
- What RDF schemas/ontologies do you see?
- How are people and other things identified?
- Are there common patterns across sources?
- Can you write queries that integrate these?
- What bugs in the data are there? How do you think they got there?

110

Internet Detectives
- For each triple, can you figure out “how it got there”? In whose voice is it?
- Is there a real schema? (if the Wifi is up)
- How would you check its truth? Who “said” it and how could a machine tell?
- Which sources (or parts) aggregate different points of view within a single RDF graph?

111

data-and-queries-intro.txt
- See the info/ folder for more details: SPARQL setup and a short querying tutorial.
- The goal is to study the Linked Data Web and understand how it might evolve.
- Identify project and research topics, and ways of helping to improve the Web.

112

SPARQL YOURSELF Hands-on

113

SPARQL yourself

SPARQL Web Form: http://192.168.0.20:8080/openrdf-workbench/repositories/Students/query

SPARQL endpoint: http://192.168.0.20:8080/openrdf-sesame/repositories/Students
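For those who prefer a script to the web form, a query can be sent to the endpoint following the SPARQL protocol, as a GET request with a query parameter. The Python sketch below only constructs the request; the address is the workshop's local server, so actually sending it works only on that network. The function name sparql_request and the sample query are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint address as given on the slide (a workshop LAN address).
ENDPOINT = "http://192.168.0.20:8080/openrdf-sesame/repositories/Students"


def sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = sparql_request(QUERY)
print(request.full_url)

# On the workshop network one would continue with:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```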

114

SPARK Hands-on

115

Spark

116

Spark visualizations

117

Spark visualizations

118

Exercise

119

Exercise

120

Semantic MediaWiki

121

Semantic MediaWiki - Export

122

Task
- Let’s add semanticweb.org as an additional source in order to add Dan from there to the lists of the “Friends of Spark”.
- Expand spark.zip, then check test/index.html
