Linking Open Data

Page 1: Linking Open Data

Linking Open Data

Linking the world of data

From the LOD mailing list

Acknowledgement: Tom Heath (Talis)

Ying Ding ([email protected])

http://info.slis.indiana.edu/~dingying/

Page 2: Linking Open Data

Where we are now

User-generated content is growing tremendously

Isolated content badly needs to be connected.

The world is connected, and so are data, information, and knowledge

Page 3: Linking Open Data

Old terms

Data: sensing the world
• What you sense (see, hear, smell, touch, …)

Information: perceiving the world
• Perceive the sensed data

Knowledge: contextualizing information
• Comprehend the perceived information
• Add context

Context ultimately determines what’s actually what.

Page 4: Linking Open Data

What our daily life looks like

Access data
Manipulate data (add, delete, change)
Process data

Generate information (tables, forms)
Create knowledge (reports, papers, …)

Page 5: Linking Open Data

Data is our life

Data is our daily bread

Do we have identifiers for data?
• Not really important if the data is small and individual
• Really important if the data is huge and connected

Do we need identifiers for our data? Why do we need a name or a social security number?

Can you refer to someone without an identifier? (For example, "a person with a good heart")

Page 6: Linking Open Data

Make our busy life less messy

We have only 24 hours per day, no more

Add identifiers to our data
• Give each piece of data a unique identifier that everyone agrees on: the perfect world of our dreamland. We would have no integration problems, and most IT departments could be closed.
• Different groups give different identifiers to the same data: we can live with that, and it is closer to daily life. Standardization bodies and IT people are helping us.

We are happy that we can refer to data

Page 7: Linking Open Data

Where are our data?

On a computer
On the Web
In my paper notes
In printed books
…

Data are being digitized and made available online: Web data

Page 8: Linking Open Data

Web data

Data on the Web
• Online journal
• Blog
• Wiki
• …

Data in the physical world
• Yourself
• Table
• Book in a library
• The computer you are using
• …

The boundary is blurring
• A paper is both in your hand and on the Web

Page 9: Linking Open Data

How to refer to data

Web data
• DOI (Digital Object Identifier)
• OpenID (people, …)
• URI (blog, wiki, homepage, …)
• …

Page 10: Linking Open Data

URI (Uniform Resource Identifier)

Identifies or names a resource on the Internet

The main purpose is to enable interaction with representations of the resource over a network, typically the WWW, using specific protocols (from Wikipedia)

URN: like a person's name
• urn:isbn:0-486-27557-4 (the book "Romeo and Juliet")

URL: like a street address
• http://www.slis.indiana.edu
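As a rough illustration of the URN/URL distinction, the two example identifiers on this slide can be told apart by their scheme. This is a minimal sketch using Python's standard `urllib.parse`; the helper name `classify_uri` and the short list of schemes treated as locators are assumptions made for the example, not part of any standard API.

```python
from urllib.parse import urlparse

def classify_uri(uri: str) -> str:
    """Label a URI as a URN (a name) or a URL (a locator) by its scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    # Assumption: only these schemes are treated as locators in this sketch.
    if scheme in ("http", "https", "ftp"):
        return "URL"
    return "other URI"

for uri in ("urn:isbn:0-486-27557-4", "http://www.slis.indiana.edu"):
    print(uri, "->", classify_uri(uri))
```

Both strings are URIs; the scheme only tells us whether the identifier names the thing or says where to fetch it.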

Page 11: Linking Open Data

Linked Data

A term coined by Tim Berners-Lee

It describes HTTP-based data access by reference for the Web

The current Web is changing from hypertext links (linking documents) to hyperdata links (linking data)
• Data are small components of resources
• Linking drills down to the details of resources

Linked Data provides a powerful mechanism for meshing disparate and heterogeneous data

Page 12: Linking Open Data

Vision from Sir Tim Berners-Lee

“The Semantic Web isn’t just about putting data on the web. It is about making links”.

Four rules for linking data
• Use URIs as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information (URI dereferencing)
• Include links to other URIs, so that they can discover more things

“Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web”

Page 13: Linking Open Data

W3C SWEO Linking Open Data Project

The project aims to
• Publish existing open-license datasets as linked data on the web
• Interlink things between different data sources
• Develop clients and applications that consume linked data from the web

Page 14: Linking Open Data

Bubbles in May 2007

Over 500M RDF triples

Around 120K RDF links between data sources

Page 15: Linking Open Data

Bubbles in April 2008

>2B RDF triples

Around 3M RDF links

Page 16: Linking Open Data

Bubble now

Page 17: Linking Open Data

Organizations participating in the LOD community

Academic
• MIT, Univ Southampton, DERI, Open Univ, Univ London, Univ Hannover, Penn State Univ, Univ Leipzig, Univ Karlsruhe, Joanneum (AT), Free Univ Berlin, Cyc, SouthEast Univ (CN), …

Commercial
• BBC, OpenLink, Talis, Zitgist, Garlik, Mondeca, Renault, Boad Interactive

Page 18: Linking Open Data

What are Linked Data?

Linked Data require RDF
• Why not XML? It has a different model theory

But not all RDF data are linked data
• Your RDF data have to comply with the four rules stated by Berners-Lee

What is RDF?

Page 19: Linking Open Data

Basic Ideas behind RDF

RDF uses Web identifiers (URIs) to identify resources

RDF describes resources with properties and property values

Everything can be represented as triples
• The essence of RDF is the (s, p, o) triple

[Diagram: Resource (subject), Property (predicate), Value (object)]

The subject has a property whose value is the object: (s, p, o)

Page 20: Linking Open Data

RDF Triples

Triple
• A Resource (Subject) is anything that can have a URI: URIs or blank nodes
• A Property (Predicate) is one of the features of the Resource: URIs
• A Property value (Object) is the value of a Property, which can be a literal or another resource: URIs, literals, blank nodes

[Diagram: Resource (subject), Property (predicate), Value (object)]

Literals can be the object of an RDF statement, but cannot be the subject or the predicate
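The position rules above (URIs or blank nodes as subject, URIs only as predicate, anything as object) can be sketched as a small type check. This is an illustrative model only, not an RDF library: the class names `URIRef`, `BlankNode`, and `Literal`, the function `make_triple`, and the `http://example.org/people#ying` URI are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def make_triple(subject, predicate, obj):
    """Build an (s, p, o) tuple, enforcing which node kinds fit each position."""
    if not isinstance(subject, (URIRef, BlankNode)):
        raise TypeError("subject must be a URI or a blank node")
    if not isinstance(predicate, URIRef):
        raise TypeError("predicate must be a URI")
    if not isinstance(obj, (URIRef, BlankNode, Literal)):
        raise TypeError("object must be a URI, a blank node, or a literal")
    return (subject, predicate, obj)

# Hypothetical subject URI; foaf:name is the FOAF property for a person's name.
ying = URIRef("http://example.org/people#ying")
name = URIRef("http://xmlns.com/foaf/0.1/name")
triple = make_triple(ying, name, Literal("Ying Ding"))
print(triple)
```

Passing a `Literal` as the subject or predicate raises `TypeError`, which mirrors the rule that literals may only appear in the object position.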

Page 21: Linking Open Data

Do you have linked data?

Linked data are just RDF triples

How can I get RDF triples?
• Relational databases: D2R tools can convert them for you
• RDFizers from SIMILE: can convert JPEG, MARC/MODS, OAI-PMH, OCW (MIT OpenCourseWare), email, BibTeX, Java, Javadoc, etc. to RDF

<rdf:Description rdf:about="http://example.org/smith#albert">
  <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
  <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
</rdf:Description>
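To see that this fragment really is two triples, it can be wrapped in an `rdf:RDF` root and read with Python's standard XML parser. This is a sketch under assumptions: the `fam` namespace URI (`http://example.org/family#`) is invented here because the slide does not declare it, and the code handles only this simple `rdf:Description` / `rdf:resource` shape, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FAM_NS = "http://example.org/family#"  # hypothetical namespace for fam:

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:fam="{FAM_NS}">
  <rdf:Description rdf:about="http://example.org/smith#albert">
    <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
    <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(rdf_xml: str):
    """Return (subject, predicate, object) URI tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is not None:
                triples.append((subject, predicate, obj))
    return triples

for triple in extract_triples(RDF_XML):
    print(triple)
```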

Page 22: Linking Open Data

Rules of thumb

Understand your data
• What do you want to have in your data?

Do not reinvent: REUSE!
• Potential ontologies/vocabularies: FOAF, SIOC, Geo

URI aliases
• Different URIs for the same non-information resource (Berlin, etc.)
• Use owl:sameAs to link these URI aliases
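One way to picture what owl:sameAs links do for URI aliases: treat each link as an edge and group every URI that is reachable from another into one alias set. The sketch below uses a small union-find for that; the function name `alias_groups` and the `example.org` / `example.net` URIs are assumptions for illustration, and only the DBpedia Berlin URI comes from the slides.

```python
from collections import defaultdict

def alias_groups(same_as_pairs):
    """Group URIs connected by owl:sameAs statements into alias sets."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())

# Hypothetical sameAs statements between three data sources.
pairs = [
    ("http://dbpedia.org/resource/Berlin", "http://example.org/geo/berlin"),
    ("http://example.org/geo/berlin", "http://example.net/cities/Berlin"),
]
for group in alias_groups(pairs):
    print(sorted(group))
```

All three URIs end up in one group even though only two direct links were stated, because owl:sameAs is treated as transitive and symmetric here.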

Page 23: Linking Open Data

More principles

Linked Data is simply about using the Web to create typed links between data from different sources.

The principles of Linked Data are to:
• Use the RDF data model to publish structured data on the web
• Use RDF links to interlink data from different data sources
• Use HTTP URIs to identify resources
• Avoid other URI schemes (URNs or DOIs)

Page 24: Linking Open Data

Power of Linked Data

[Figure: RDF graph of linked resources. Labels in the figure: ying, foaf:Person, rdf:type, "Ying Ding", foaf:name, Stefan, foaf:knows, db:Galway, 72K, dp:population, dp:Cities_in_Ireland, skos:subject, dp:Dublin, foaf:based_near, dblp:publications, foaf:publication]

Page 25: Linking Open Data

How to become a bubble

Publishing your bubble
• Are you ready?

Dereferencing HTTP URIs
• Information resources (resources available on the web): HTTP GET returns HTTP response code 200 OK
• Non-information resources (real-world objects that exist outside of the web): HTTP GET returns HTTP 303 See Other (303 redirect)

You are not your homepage, but you can be dereferenced by your homepage
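The 200 versus 303 behaviour can be sketched as a lookup, without any real HTTP server. Everything here is a toy model: the `DOCUMENTS` and `THINGS` tables and the `dereference` function are assumptions for illustration, the DBpedia URIs are used only as familiar examples, and the 404 fallback is added for completeness rather than taken from the slide.

```python
# Information resources: documents that can be returned directly.
DOCUMENTS = {
    "http://dbpedia.org/data/Berlin",
    "http://dbpedia.org/page/Berlin",
}

# Non-information resources: real-world things, each mapped to a
# document that describes it.
THINGS = {
    "http://dbpedia.org/resource/Berlin": "http://dbpedia.org/data/Berlin",
}

def dereference(uri: str):
    """Return (status_code, redirect_location) for an HTTP GET on uri."""
    if uri in DOCUMENTS:
        return 200, None          # the document itself is sent back
    if uri in THINGS:
        return 303, THINGS[uri]   # See Other: go read the description
    return 404, None              # unknown URI (not covered by the slide)

print(dereference("http://dbpedia.org/resource/Berlin"))
print(dereference("http://dbpedia.org/data/Berlin"))
```

The city cannot be sent over the wire, so its URI redirects to a document about it; the document's own URI answers with 200.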

Page 26: Linking Open Data

Publish your bubble

Step 1: Choosing URIs
• Use HTTP URIs for everything (http://)
• Make them dereferenceable
• Try to reuse existing dereferenceable URIs to represent common things (city, music, artist, etc.): http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• For instance: Geonames, DBpedia, Musicbrainz, dbtune, RDF Book Mashup
• Keep implementation details out of your URIs
• Keep your URIs stable and persistent

Page 27: Linking Open Data

Publish your bubble

Step 1: Choosing URIs

Three related URIs per thing (the thing itself, its HTML page, its RDF data), shown in three possible naming patterns:

• http://dbpedia.org/resource/Berlin, http://dbpedia.org/page/Berlin, http://dbpedia.org/data/Berlin
• http://id.dbpedia.org/Berlin, http://pages.dbpedia.org/Berlin, http://data.dbpedia.org/Berlin
• http://dbpedia.org/Berlin, http://dbpedia.org/Berlin.html, http://dbpedia.org/Berlin.rdf

Reference: Sauermann et al.: Cool URIs for the Semantic Web (tutorial on URI dereferencing and content-negotiation)
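The first naming pattern on this slide (`/resource/`, `/page/`, `/data/`) can be sketched as a small helper that derives the two document URIs from the URI of the thing. The function name `related_uris` and the dictionary keys are assumptions for illustration; a real publisher may lay out its URIs differently.

```python
def related_uris(resource_uri: str) -> dict:
    """Derive the HTML and RDF document URIs for a /resource/ style URI."""
    base, sep, name = resource_uri.rpartition("/resource/")
    if not sep:
        raise ValueError("expected a URI containing /resource/")
    return {
        "thing": resource_uri,          # the non-information resource
        "html": f"{base}/page/{name}",  # human-readable page
        "rdf": f"{base}/data/{name}",   # machine-readable RDF
    }

for role, uri in related_uris("http://dbpedia.org/resource/Berlin").items():
    print(role, uri)
```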

Page 28: Linking Open Data

Publish your bubble

Step 2: Choose the vocabularies to represent information

Reuse terms from well-known vocabularies wherever possible
• Friend of a Friend (FOAF)
• Dublin Core (DC)
• Semantically-Interlinked Online Communities (SIOC)
• Description of a Project (DOAP)
• Simple Knowledge Organization System (SKOS)
• Creative Commons (CC)
• More: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

You should only define new terms yourself if you cannot find the required terms in existing vocabularies
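Reusing a vocabulary in practice means writing terms such as foaf:knows, where the prefix stands for the vocabulary's namespace URI. A minimal sketch of that expansion follows; the prefix table lists the commonly used namespace URIs for the vocabularies named above, while the function name `expand` and the choice of prefixes are assumptions for the example.

```python
# Commonly used namespace URIs for the vocabularies on this slide.
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "doap": "http://usefulinc.com/ns/doap#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "cc": "http://creativecommons.org/ns#",
}

def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'foaf:knows' into a full term URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local

for term in ("foaf:knows", "dc:title", "skos:subject"):
    print(term, "->", expand(term))
```

Because the expanded term is itself an HTTP URI, anyone who meets it in your data can look up what it means.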

Page 29: Linking Open Data

Publish your bubble

Step 2: Choose the vocabularies to represent information

If you really have to define your own vocabulary:
• Do not define new vocabularies from scratch
• Provide for both humans and machines (rdfs:comment, rdfs:label)
• Make term URIs dereferenceable
• Make use of other people’s terms
• State all important information explicitly
• Do not create over-constrained, brittle models; leave some flexibility for growth

Page 30: Linking Open Data

Publish your bubble

Step 3: Link your bubble with other bubbles
• RDF links enable browsers and crawlers to navigate between data sources and to discover additional data
• foaf:knows, foaf:based_near, foaf:topic_interest
• owl:sameAs (maps different URI aliases)

Page 31: Linking Open Data

Publish your bubble

Step 3: Link your bubble with other bubbles

Auto-generating RDF links:
• ISBN for books (e.g., RDF Book Mashup)
• More complex property-based algorithms
  • Interlinking DBpedia and Geonames
  • Interlinking Jamendo and MusicBrainz

<http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince> owl:sameAs <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088>
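Key-based link generation of this kind can be sketched as a join on a shared identifier. The code below matches books from two data sources by normalized ISBN and emits owl:sameAs statements in N-Triples form. It is only an illustration of the idea, not the RDF Book Mashup's actual implementation: the function names and the sample input dictionaries are assumptions, and the hyphenated ISBN attached to the DBpedia URI is sample data for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize_isbn(isbn: str) -> str:
    """Drop hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()

def same_as_links(source: dict, target: dict) -> list:
    """Link URIs from two {uri: isbn} mappings that share an ISBN."""
    by_isbn = {normalize_isbn(isbn): uri for uri, isbn in target.items()}
    links = []
    for uri, isbn in source.items():
        match = by_isbn.get(normalize_isbn(isbn))
        if match is not None:
            links.append(f"<{uri}> <{OWL_SAME_AS}> <{match}> .")
    return links

# Sample inputs (assumed for illustration).
dbpedia_books = {
    "http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince": "0-7475-8108-8",
}
bookmashup_books = {
    "http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088": "0747581088",
}
for line in same_as_links(dbpedia_books, bookmashup_books):
    print(line)
```

This works because an ISBN is a reliable shared key; sources without such a key need the more complex property-based matching mentioned above.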

Page 32: Linking Open Data

Publish your bubble

Recipes for publishing different information as Linked Data on the Web
• Things must be identified with dereferenceable HTTP URIs
• If such a URI is dereferenced asking for the MIME type application/rdf+xml, the data source must return an RDF/XML description of the identified resource
• URIs that identify non-information resources should return an HTTP 303 redirect
• Besides RDF links to resources within the same data source, RDF descriptions should also contain RDF links to resources in other data sources, so that you can browse the web of data
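The MIME-type rule in this recipe relies on the HTTP Accept header. A simplified sketch of choosing between an HTML page and an RDF/XML description is shown below. It is deliberately incomplete: the function name `preferred_type` is an assumption, wildcards such as `*/*` are ignored, and a production server would normally rely on its web framework's content negotiation instead.

```python
def preferred_type(accept_header: str,
                   available=("application/rdf+xml", "text/html")):
    """Pick the available media type with the highest q-value, or None."""
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media in available and q > best_q:
            best, best_q = media, q
    return best

print(preferred_type("application/rdf+xml"))
print(preferred_type("text/html,application/rdf+xml;q=0.5"))
```

A linked data client that sends `Accept: application/rdf+xml` is thus steered to the RDF/XML description, while an ordinary browser gets the HTML page.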

Page 33: Linking Open Data

Test your bubble

Step 4: Test and debug linked data
• Vapour: a linked data validation service (http://vapour.sourceforge.net/)
• Use linked data browsers to see whether your information displays correctly and your RDF links work
  • Tabulator, Marbles, OpenLink RDF Browser, Disco

Page 34: Linking Open Data

Welcome to the bubble world

Very excited!

Then what is my contribution and benefit?
• Add more data to RDF data
• Increase semantic content
• ……
• Bring the Web to its full potential!

Page 35: Linking Open Data

What can LOD bring?

It will lift the current document web up to a data web

LOD browsers let you navigate between different data sources by following RDF links

It can drill down to a finer granularity of information
• allowing finer-grained search on the web
• making question-answering search on the Web possible
• meshing up different data through RDF links

It makes applications built on top easier to develop

Page 36: Linking Open Data

Document Web vs. Data Web

Document Web
• Glued by hyperlinks
• Data are HTML pages
• Query results are HTML pages, which cannot be further processed
• Data are just interlinked, but not integrated
• Data access through different APIs

Data Web
• Glued by RDF links
• Data are RDF triples
• Query results are RDF triples, which can easily be further processed (e.g., by web services)
• Data are interlinked and integrated, and links are typed
• Data access through a single, standardized access mechanism (perhaps in the future it will be called the LOD API?)

Page 37: Linking Open Data

More about LOD

LOD Wiki: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Tutorial on how to publish LOD data: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

Further readings and tools
• W3C Track LOD WWW2008: http://www.w3.org/2008/Talks/WWW2008-W3CTrack-LOD.pdf
• Linked Data Planet in New York 2008: http://linkeddata.org/slides/2008-06-nyc-ldp.pdf
• LDOW2008 workshop at WWW2008: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-369/
• ISWC 2008 LOD tutorial: http://events.linkeddata.org/iswc2008tutorial
• LOD mailing list