Linked Data Tutorial

Linked Data Tutorial presented to the open data community on February 16, 2012, in Prague.

Transcript of Linked Data Tutorial

Page 1: Linked Data Tutorial

Faculty of Mathematics and Physics, Charles University in Prague

Linked Data Tutorial*

Tomáš Knap, Jindřich Mynarz, Martin Nečaský, Jakub Stárka

February 16, 2012

(*Partially based on slides of Chris Bizer [8])

Page 2: Linked Data Tutorial

16th February 2012 | Linked Data Tutorial 2

Motivation

Page 3: Linked Data Tutorial

Motivational Scenario

[Diagram: data sources: basic data, employees, departments, public contracts, budget, and expenses on the WWW page of the institution; Business Register; ÚFIS; Buyer's Profile; ISVZUS; gov.cz]

Data Consumer: Show me suppliers of the public contracts for the Ministry of Finance (MF) in the Liberec region. Show me the data on Google Maps on my iPhone. For every public contract, I am also looking for the aggregate of all payments made by MF, with links to the budget and the responsible person.

• Where can I get the data about public contracts, responsible persons, expenses, and budget of MF?

• How should I aggregate and link the data?

• How can I view the data on a map?

Page 4: Linked Data Tutorial

Current Common Practice

1 – MF public contracts

2 – MF public contracts + employees

3 – Expenses

Sources the consumer did not discover: ? ? ?

Information integration is very time-consuming, tedious, and inefficient!

[Diagram: the same data sources as in the motivational scenario]

Page 5: Linked Data Tutorial

Linked Data - Basics

Page 6: Linked Data Tutorial

Linked Data

• A set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web, using Semantic Web technologies and standards
  ‒ the Semantic Web is the goal; Linked Data provides the means to reach it

Page 7: Linked Data Tutorial

Linked Data Principles

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.

[Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006]

Page 8: Linked Data Tutorial

Architecture of the Classic Web

• Single global information space
• Small set of simple standards:
  ‒ HTTP URIs
    • globally unique IDs
    • retrieval mechanism
  ‒ HTML as the document format
  ‒ Hyperlinks to connect everything
• Applications work on top of the complete information space

Page 9: Linked Data Tutorial

Web 2.0 APIs and Mashups

• No single global dataspace
• Shortcomings:
  ‒ APIs have proprietary interfaces
  ‒ No hyperlinks between data items within different APIs
  ‒ Mashups are based on a fixed set of data sources

Web APIs slice the Web into walled gardens!

Page 10: Linked Data Tutorial

Linked Data

• Extends the Web with a single global dataspace
  ‒ by using RDF to publish structured data on the Web
  ‒ by setting links between data items within different data sources
• Physically distributed, but behaves like a single dataspace

Page 11: Linked Data Tutorial

RDF Data Model

• Flexible graph-based data model [2]

• HTTP URIs take the role of global primary keys.

  ‒ pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
  ‒ dbpedia:Berlin = http://dbpedia.org/resource/Berlin
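To make the triple model concrete, here is a minimal Python sketch that stores a graph as a set of (subject, predicate, object) tuples and expands prefixed names such as pd:cygri into the full HTTP URIs that act as global keys. The foaf prefix and the foaf:based_near statement are illustrative assumptions not taken from the slide, and a real application would use an RDF library rather than plain tuples.

```python
# Prefix table; "foaf" is an illustrative addition not shown on the slide.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbpedia:Berlin' into a full HTTP URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# A graph is a set of (subject, predicate, object) triples.
# Hypothetical statement: pd:cygri foaf:based_near dbpedia:Berlin
graph = {
    (expand("pd:cygri"), expand("foaf:based_near"), expand("dbpedia:Berlin")),
}


def objects(g: set, subject: str, predicate: str) -> set:
    """Return every object linked from `subject` via `predicate`."""
    return {o for s, p, o in g if s == subject and p == predicate}
```

Because every node is a full URI, the same tuples could be merged with triples from any other source without key collisions.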

Page 12: Linked Data Tutorial

Resolving URIs over the Web

• The HTTP protocol brings together identification and retrieval
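As a sketch of how identification and retrieval come together, the Python snippet below builds an HTTP GET request for a resource URI and uses the Accept header to ask for RDF instead of HTML (content negotiation). The chosen media types and their q-values are assumptions. The request is only constructed here, not sent; servers such as DBpedia typically answer a request for a resource URI with a redirect to a document that describes it.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (without sending) a GET request that asks for RDF rather than HTML."""
    return urllib.request.Request(
        uri,
        # Content negotiation: prefer Turtle, accept RDF/XML as a fallback.
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
        method="GET",
    )


req = rdf_request("http://dbpedia.org/resource/Berlin")

# Sending it needs network access; urlopen follows redirects automatically:
# with urllib.request.urlopen(req) as resp:
#     print(resp.geturl(), resp.headers.get("Content-Type"))
```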

Page 13: Linked Data Tutorial

Following Links deeper into the Web
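Following links can be sketched as a breadth-first traversal: dereference a URI, collect its triples, and queue every object URI for dereferencing in turn. The in-memory MOCK_WEB dictionary, its URIs under the reserved .example domain, and the max_hops limit are hypothetical stand-ins for real HTTP lookups.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical stand-in for the Web: each URI "dereferences" to a set of triples.
MOCK_WEB = {
    "http://a.example/alice": {
        ("http://a.example/alice", FOAF + "knows", "http://b.example/bob"),
    },
    "http://b.example/bob": {
        ("http://b.example/bob", FOAF + "based_near", "http://c.example/berlin"),
    },
    "http://c.example/berlin": set(),
}


def follow_links(start: str, max_hops: int = 2) -> set:
    """Breadth-first: dereference a URI, collect its triples, queue object URIs."""
    seen = {start}
    triples = set()
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in MOCK_WEB.get(uri, set()):
            triples.add((s, p, o))
            # Crude URI test for this sketch: objects starting with "http" are links.
            if o.startswith("http") and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return triples
```

Starting from Alice, the crawler discovers Bob's data source without knowing about it in advance, which is the property the slide illustrates.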

Page 14: Linked Data Tutorial

Pubby – Linked Data Browser

http://dbpedia.org/page/Český_Krumlov

Page 15: Linked Data Tutorial

Properties of the Web of Linked Data

• Global, distributed data space built on a simple set of standards: RDF, URIs, HTTP

• Entities are connected by links, creating a global data graph that
  ‒ spans data sources
  ‒ enables the discovery of new data sources

• Data co-existence
  ‒ Everyone can publish data to the Web of Linked Data
  ‒ Everyone can express their personal view on things

Page 16: Linked Data Tutorial

Linked Data Deployment on the Web

Is it real?

Page 17: Linked Data Tutorial

W3C Linking Open Data Project

• Grassroots community effort to
  ‒ publish existing open-license datasets as Linked Data on the Web
  ‒ interlink things between different data sources

Page 18: Linked Data Tutorial

Linked Data Cloud 2007

Page 19: Linked Data Tutorial

Linked Data Cloud 2009

Page 20: Linked Data Tutorial

Linked Data Cloud 2011

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.pdf

http://thedatahub.org/

Page 21: Linked Data Tutorial

More Statistics

http://stats.lod2.eu/stats

Page 22: Linked Data Tutorial

Uptake in Governmental Domain

• The EU is publishing Linked Data
  ‒ EuroStat: http://estatwrap.ontologycentral.com/

• National efforts
  ‒ UK: the Government is releasing public data, http://data.gov.uk/; lots of initiatives in Great Britain
  ‒ Budget in Germany: http://bund.offenerhaushalt.de/
  ‒ Open Data in Catalonia: http://opendata.gencat.cat/en/dades-obertes.html

Page 23: Linked Data Tutorial

Data.gov.uk

http://data.gov.uk/organogram/cabinet-office

Page 24: Linked Data Tutorial

Linked Data Applications

[Diagram: categories of Linked Data applications, with Linked Data browsers labeled]

Page 25: Linked Data Tutorial

Search Engines - Sig.ma

http://sig.ma

Page 26: Linked Data Tutorial

Mashups – Public Contracts On the Map

http://gd.projekty.ms.mff.cuni.cz:2021/new/map.html

Page 27: Linked Data Tutorial

Mashups – Crime, Transport, Education

http://apps.seme4.com/see-uk/

Page 28: Linked Data Tutorial

Other Applications

• Browsers:
  ‒ Disco Hyperdata Browser: http://www4.wiwiss.fu-berlin.de/rdf_browser/
  ‒ OpenLink RDF Browser: http://ode.openlinksw.com/

• Search engines:
  ‒ Falcons: http://ws.nju.edu.cn/falcons/
  ‒ Watson: http://watson.kmi.open.ac.uk/WatsonWUI/

• Mashups

Page 29: Linked Data Tutorial

Linked Data Applications – Summary

• Linked Data browsers
• Search engines
• Linked Data mashups

Page 30: Linked Data Tutorial

Publishing Linked Data

Page 31: Linked Data Tutorial

Publishing Tasks – Bizer 38

1. Make data available as RDF via HTTP
   ‒ requires ways to serialize the RDF data model
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive
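Task 1 needs a serialization. As a minimal sketch, the Python functions below write triples in N-Triples, the simplest line-based RDF syntax: one triple per line, terminated by a dot. For illustration only, they assume that any string starting with http://, https:// or mailto: is a URI and that everything else is a plain literal; a real serializer would use typed term objects and handle language tags and datatypes.

```python
def nt_term(term: str) -> str:
    """URIs go in angle brackets; anything else becomes an escaped plain literal."""
    if term.startswith(("http://", "https://", "mailto:")):
        return f"<{term}>"
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One triple per line, terminated by ' .', in sorted order for stable output."""
    return "".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in sorted(triples)
    )


# Example data taken from the Turtle slide (Eric Miller's contact description).
EM = "http://www.w3.org/People/EM/contact#me"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
document = to_ntriples({(EM, CONTACT + "fullName", "Eric Miller")})
```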

Page 32: Linked Data Tutorial

RDF/XML

• W3C Recommendation, 2004 [2]

Page 33: Linked Data Tutorial

Turtle Syntax

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dataModel: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix myContact: <http://www.w3.org/People/EM/contact#> .

myContact:me rdf:type dataModel:Person ;
    dataModel:fullName "Eric Miller" ;
    dataModel:mailbox <mailto:em@w3.org> ;
    dataModel:personalTitle "Dr." .

• W3C Team Submission, 2011 [4]

Page 34: Linked Data Tutorial

RDFa

• A way to add RDF directly to XHTML pages
  ‒ provides new attributes to handle additional markup

• W3C Recommendation, 2008 [5]

• HTML is not extensible, but
  ‒ most RDFa parsers will recognize RDFa attributes in any version of HTML

Page 35: Linked Data Tutorial

RDFa

• Provides new attributes to handle additional markup and reuses existing ones
  ‒ new: about, resource, …
  ‒ existing: href, src, …

• Can be used with any supported element; preferred:
  ‒ span, div (in the body)
  ‒ a (linking element)
  ‒ meta, link (in the head)

Page 36: Linked Data Tutorial

RDFa Example

• XHTML page: http://example.com/alice/posts/42

• Original XHTML code:
  All content on this site is licensed under <a href="http://cc.org/licenses/by/3.0/">a Creative Commons License</a>.

• XHTML + RDFa:
  All content on this site is licensed under <a rel="cc:license" href="http://cc.org/licenses/by/3.0/">a Creative Commons License</a>.

• RDF triple distilled from XHTML + RDFa:
  <http://example.com/alice/posts/42> cc:license <http://cc.org/licenses/by/3.0/> .
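The distillation step can be sketched with Python's standard html.parser. The hypothetical RelLinkExtractor below handles only the rel/href pattern from the example, uses the page URI as the subject, and assumes that cc: expands to http://creativecommons.org/ns#. A real RDFa processor also handles about, property, typeof, and prefix declarations inside the document.

```python
from html.parser import HTMLParser


class RelLinkExtractor(HTMLParser):
    """Tiny RDFa subset: turns <a rel="prefix:term" href="..."> into triples.

    The page URI is the subject. about, property, typeof, and in-document
    prefix declarations are deliberately not handled.
    """

    def __init__(self, page_uri: str, prefixes: dict):
        super().__init__()
        self.page_uri = page_uri
        self.prefixes = prefixes
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "a" or not a.get("rel") or not a.get("href"):
            return
        for value in a["rel"].split():  # rel may hold several space-separated terms
            prefix, sep, local = value.partition(":")
            if sep and prefix in self.prefixes:
                self.triples.append(
                    (self.page_uri, self.prefixes[prefix] + local, a["href"]))


snippet = ('All content on this site is licensed under <a rel="cc:license" '
           'href="http://cc.org/licenses/by/3.0/">a Creative Commons License</a>.')
# Assumed prefix mapping for cc: (not declared in the snippet itself).
parser = RelLinkExtractor("http://example.com/alice/posts/42",
                          {"cc": "http://creativecommons.org/ns#"})
parser.feed(snippet)
parser.close()
```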

Page 37: Linked Data Tutorial

RDF store + Linked Data Interface

• Virtuoso + Pubby

Page 38: Linked Data Tutorial

D2R server

• A way to publish data from relational databases as Linked Data

• Requests from the Web are rewritten into SQL queries via a mapping
  ‒ on-the-fly translation eliminates the need to replicate the data into a dedicated RDF triple store
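A minimal sketch of the on-the-fly idea, using Python's built-in sqlite3: a lookup of a contract URI is rewritten into an SQL query through a column-to-property mapping, and the resulting row is returned as triples without copying the data into a triple store. The table, the example.org URI scheme, and the use of dcterms:title are hypothetical; D2R itself is configured with its own mapping language rather than Python code.

```python
import sqlite3

# Hypothetical URI scheme and relational data, for illustration only.
BASE = "http://example.org/contract/"
PC = "http://purl.org/procurement/public-contracts#"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contract (id INTEGER PRIMARY KEY, title TEXT, award_date TEXT)")
db.execute("INSERT INTO contract VALUES (5161, 'Office supplies', '2011-11-11')")

# The mapping: table column -> RDF property (dcterms:title is an assumption).
COLUMN_MAP = {
    "title": "http://purl.org/dc/terms/title",
    "award_date": PC + "awardDate",
}


def describe(uri: str) -> list:
    """Rewrite a lookup of a contract URI into SQL and return triples on the fly."""
    if not uri.startswith(BASE) or not uri[len(BASE):].isdigit():
        return []
    columns = ", ".join(COLUMN_MAP)  # fixed by the mapping, never user input
    row = db.execute(
        f"SELECT {columns} FROM contract WHERE id = ?", (int(uri[len(BASE):]),)
    ).fetchone()
    if row is None:
        return []
    return [(uri, prop, value) for prop, value in zip(COLUMN_MAP.values(), row)]
```

The database stays the single source of truth: every lookup reads the current row, so nothing has to be re-exported when the data changes.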

Page 39: Linked Data Tutorial

Publishing Tasks

1. Make data available as RDF via HTTP
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive

Page 40: Linked Data Tutorial

2. Set RDF links

<http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .

• There are tools that help you generate links, e.g. Silk [6]
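The kind of link generation such tools automate can be sketched as comparing the labels of two datasets and emitting owl:sameAs links when they are similar enough. This is a toy stand-in, not Silk's method: it uses difflib.SequenceMatcher for string similarity, a 0.9 threshold chosen for illustration, and a hypothetical second entry on the Geonames side.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def generate_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Emit owl:sameAs links between URIs whose labels are similar enough.

    source/target map URI -> label. Every pair is compared, which is fine for
    a sketch but far too slow for large datasets.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


dbpedia = {"http://dbpedia.org/resource/Berlin": "Berlin"}
geonames = {
    "http://sws.geonames.org/2950159": "Berlin",
    "http://example.org/place/bern": "Bern",  # hypothetical near-miss
}
links = generate_links(dbpedia, geonames)
```

"Berlin" versus "Bern" scores 0.8 and is rejected, so only the link shown on the slide is produced.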

Page 41: Linked Data Tutorial

Publishing Tasks

1. Make data available as RDF via HTTP
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive

Page 42: Linked Data Tutorial

3. Make your data self-descriptive

• Increase the usefulness of your data and ease data integration

• Aspects of self-descriptiveness:
  1. Reuse terms from common vocabularies
  2. Enable clients to retrieve the schema
  3. Publish schema mappings for proprietary terms
  4. Metadata
     ‒ Provide provenance metadata
     ‒ Provide licensing metadata
     ‒ Provide dataset-level metadata using voiD

Page 43: Linked Data Tutorial

About Vocabularies

• We have to be able to define the meaning of subjects and properties
  ‒ vocabularies, e.g. the Public Contracts Ontology

Page 44: Linked Data Tutorial

Public Contracts Ontology

http://purl.org/procurement/public-contracts#

Page 45: Linked Data Tutorial

RDFS

• RDFS = RDF Schema, W3C Recommendation
  ‒ http://www.w3.org/TR/rdf-schema/

• Vocabulary for RDF
  ‒ Definition of classes
    • is:Student rdf:type rdfs:Class
  ‒ Definition of properties
    • is:name rdf:type rdf:Property
  ‒ Domains and ranges of properties
    • is:name rdfs:domain is:Student
    • is:name rdfs:range xsd:string
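To show what domain and range declarations give a client, here is a minimal Python sketch of the two RDFS entailment rules for rdfs:domain and rdfs:range, applied in a single pass to the is: example above. The instance is:alice and the convention that literals are strings starting with a double quote are assumptions; subclass reasoning and repeated passes to a fixpoint are left out.

```python
RDF_TYPE = "rdf:type"


def is_literal(term: str) -> bool:
    # Convention for this sketch: literals are written with surrounding double quotes.
    return term.startswith('"')


def rdfs_infer(triples: set) -> set:
    """Apply the rdfs:domain and rdfs:range entailment rules once."""
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # subject is an instance of the domain
        for prop, cls in ranges:
            # Literals cannot be subjects of RDF triples, so skip them here.
            if p == prop and not is_literal(o):
                inferred.add((o, RDF_TYPE, cls))  # object is an instance of the range
    return triples | inferred


schema = {("is:name", "rdfs:domain", "is:Student"), ("is:name", "rdfs:range", "xsd:string")}
data = {("is:alice", "is:name", '"Alice"')}  # hypothetical instance data
result = rdfs_infer(schema | data)
```

Note that rdfs:domain does not validate data: anything that has an is:name is simply concluded to be an is:Student.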

Page 46: Linked Data Tutorial

OWL

• OWL = Web Ontology Language, W3C Recommendation
  ‒ http://www.w3.org/TR/owl2-overview/

• Ontologies
  ‒ More complex constructs
    • Class or property equivalences
    • Cardinality restrictions
    • …

Page 47: Linked Data Tutorial

3. Make your data self-descriptive

• Increase the usefulness of your data and ease data integration

• Aspects of self-descriptiveness:
  1. Reuse terms from common vocabularies
  2. Enable clients to retrieve the schema
  3. Publish schema mappings for proprietary terms
  4. Metadata
     ‒ Provide provenance metadata
     ‒ Provide licensing metadata
     ‒ Provide dataset-level metadata using voiD

Page 48: Linked Data Tutorial

3.1 Reuse Terms from Common Vocabularies

• Common vocabularies
  ‒ Friend-of-a-Friend (FOAF) for describing people and their social network
  ‒ SIOC for describing forums and blogs
  ‒ SKOS for representing topic taxonomies
  ‒ Organization Ontology for describing the structure of organizations
  ‒ GoodRelations provides terms for describing products and business entities
  ‒ Music Ontology for describing artists, albums, and performances
  ‒ Review Vocabulary provides terms for representing reviews

• Common sources of identifiers (URIs) for real-world objects
  ‒ LinkedGeoData and Geonames: locations
  ‒ GeneID and UniProt: life science identifiers
  ‒ DBpedia: a wide range of things

Page 49: Linked Data Tutorial

3.2 Enable Clients to retrieve the Schema

• Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.

• If we discover the following triple in the data:
  <http://opendata.cz/data/p6/contract/ocz_art_5161> <http://purl.org/procurement/public-contracts#awardDate> "2011-11-11"^^<http://www.w3.org/2001/XMLSchema#date> .

• We resolve the URI and get the definition:

RDFS or OWL definition
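Because pc:awardDate is a hash URI, the fragment is never sent over HTTP: a client requests the document part and then looks for the term in the returned vocabulary. A minimal Python sketch of that first step using urllib.parse.urldefrag follows; the function name schema_lookup_target is hypothetical.

```python
from urllib.parse import urldefrag


def schema_lookup_target(term_uri: str) -> tuple:
    """Split a vocabulary term URI into the document to fetch and the term to find.

    Fragments are not sent in HTTP requests, so for a hash URI the client
    dereferences the part before '#' and then looks up the fragment's term
    in the returned RDFS/OWL description. Slash URIs are fetched as they are.
    """
    document, fragment = urldefrag(term_uri)
    return document, fragment
```

For the triple above, the client would fetch http://purl.org/procurement/public-contracts and look for the awardDate definition in it.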

Page 50: Linked Data Tutorial

3.3 Publish Schema Mappings

pc:Tender a owl:Class ;
    rdfs:subClassOf gr:Offering .

pc:AwardCriterion a owl:Class ;
    owl:equivalentClass loted:AwardCriteria .

• Simple mappings:
  ‒ rdfs:subClassOf, rdfs:subPropertyOf
  ‒ owl:equivalentClass, owl:equivalentProperty

• Complex mappings – R2R [7]
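The simple mappings above can be applied mechanically. This Python sketch adds the rdf:type triples implied by rdfs:subClassOf (one direction only) and owl:equivalentClass (both directions), using the two mappings from the slide. The instance URIs under ex: are hypothetical, and there is no chaining across multiple mappings.

```python
# The two mappings from the slide, as (class, mapping property, class) triples.
MAPPINGS = {
    ("pc:Tender", "rdfs:subClassOf", "gr:Offering"),
    ("pc:AwardCriterion", "owl:equivalentClass", "loted:AwardCriteria"),
}


def apply_class_mappings(data: set, mappings: set) -> set:
    """Add rdf:type triples implied by simple class mappings (single pass)."""
    inferred = set()
    for s, p, o in data:
        if p != "rdf:type":
            continue
        for sub, rel, sup in mappings:
            if rel == "rdfs:subClassOf" and o == sub:
                inferred.add((s, "rdf:type", sup))  # subclass -> superclass only
            elif rel == "owl:equivalentClass":
                if o == sub:
                    inferred.add((s, "rdf:type", sup))
                elif o == sup:
                    inferred.add((s, "rdf:type", sub))  # equivalence works both ways
    return data | inferred


data = {
    ("ex:tender1", "rdf:type", "pc:Tender"),
    ("ex:crit1", "rdf:type", "loted:AwardCriteria"),
    ("ex:offer1", "rdf:type", "gr:Offering"),
}
result = apply_class_mappings(data, MAPPINGS)
```

A client that only knows GoodRelations now finds ex:tender1 as a gr:Offering, while a generic gr:Offering is not assumed to be a tender.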

Page 51: Linked Data Tutorial

3.4 Metadata

• Licenses
• Data provenance
• Dataset description

Page 52: Linked Data Tutorial

Consuming Linked Data

Page 53: Linked Data Tutorial

Overview

• URI -> Description: Pubby

• Keyword -> Description: Sig.ma

• SPARQL query language [7]: SQL for RDF databases

Page 54: Linked Data Tutorial

SPARQL Example

• Contracts of the given supplier
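The query itself is not reproduced here. As an illustration of what a SPARQL engine does with a basic graph pattern, the Python sketch below matches a list of triple patterns (terms starting with ? are variables) against an in-memory set of triples. The ex:supplier property, the ex: URIs, and the data are hypothetical; FILTER, OPTIONAL, and the rest of SPARQL are not covered.

```python
# Roughly: SELECT ?contract ?date WHERE { ?contract ex:supplier ex:acme .
#                                         ?contract pc:awardDate ?date }
def match(patterns, triples, binding=None):
    """Yield every variable binding under which all patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if pat.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


DATA = {
    ("ex:c1", "ex:supplier", "ex:acme"),
    ("ex:c1", "pc:awardDate", "2011-11-11"),
    ("ex:c2", "ex:supplier", "ex:other"),
    ("ex:c2", "pc:awardDate", "2011-12-01"),
}
query = [("?contract", "ex:supplier", "ex:acme"), ("?contract", "pc:awardDate", "?date")]
results = list(match(query, DATA))
```

The shared ?contract variable acts as a join, much like a foreign key in SQL.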

Page 55: Linked Data Tutorial

SPARQL Example - Result

Page 56: Linked Data Tutorial

Issues of the Simple Consuming Scenarios

• How to aggregate the data if links are missing or the data models (ontologies) differ?

• How to deal with data quality? Everybody can say whatever they want!

• Solution: we are developing an infrastructure for cleaning, linking, and aggregating Linked Data
  ‒ reusing existing technologies, such as Silk

Page 57: Linked Data Tutorial

ODCleanStore

• Cleaning the data
  ‒ custom cleaners

• Linking the data
  ‒ Silk

• Graphical user interface

• Smart data consuming
  ‒ data aggregation (due to links and ontology mappings)
  ‒ conflict resolution
  ‒ data provenance
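Conflict resolution can take many forms. As a sketch of one simple policy, not necessarily the one ODCleanStore uses, the Python function below takes quads (a triple plus its source graph), keeps for every subject and predicate the value asserted by the most sources, and records those sources as provenance. The ex: and src: names and the values are hypothetical.

```python
from collections import defaultdict


def resolve_conflicts(quads):
    """quads: (subject, predicate, object, source) tuples.

    Returns {(subject, predicate): (chosen value, sorted supporting sources)}.
    """
    support = defaultdict(lambda: defaultdict(set))
    for s, p, o, source in quads:
        support[(s, p)][o].add(source)
    resolved = {}
    for key, candidates in support.items():
        # sorted() first so that ties go to the smallest value, deterministically.
        best = max(sorted(candidates), key=lambda v: len(candidates[v]))
        resolved[key] = (best, sorted(candidates[best]))
    return resolved


quads = [
    ("ex:c1", "ex:price", "1000", "src:a"),
    ("ex:c1", "ex:price", "1000", "src:b"),
    ("ex:c1", "ex:price", "1200", "src:c"),
]
resolved = resolve_conflicts(quads)
```

Majority voting is only one option; trust scores per source or recency are common alternatives.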

Page 58: Linked Data Tutorial

Motivational Scenario – Recall

[Diagram: data sources: basic data, employees, departments, public contracts, budget, and expenses on the WWW page of the institution; Business Register; ÚFIS; Buyer's Profile; ISVZUS; gov.cz]

Data Consumer: Show me suppliers of the public contracts for the Ministry of Finance (MF) in the Liberec region. Show the data on Google Maps on my iPhone. For every public contract, I am looking for the aggregate of all payments made by MF, with links to the budget and the responsible person.

• Where can I get the data about public contracts, responsible persons, expenses, and budget of MF?

• How should I aggregate and link the data?

• How can I view the data on a map?

Page 59: Linked Data Tutorial

Goal

[Diagram: ODCleanStore and the data sources of the motivational scenario: basic data, employees, departments, public contracts, budget, and expenses on the WWW page of the institution; Business Register; ÚFIS; Buyer's Profile; ISVZUS; gov.cz]

Page 60: Linked Data Tutorial

Conclusions

Page 61: Linked Data Tutorial

Linked Data vs. Open Data

• Open data – 3 stars!
  ‒ 4th star: a single, flexible data model (RDF) is missing
  ‒ 5th star: links are missing

• Open data are raw data that are freely available on the Web to
  ‒ everyone
  ‒ anytime
  ‒ for whatever purpose
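The star count is cumulative: each star assumes the previous ones. A small Python sketch of Tim Berners-Lee's five-star scheme follows; the Dataset field names are hypothetical, and the criteria paraphrase the scheme (open license on the Web, machine-readable structure, non-proprietary format, URIs and RDF, links to other data).

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_license: bool     # 1 star
    machine_readable: bool        # 2 stars
    non_proprietary_format: bool  # 3 stars
    uses_uris_and_rdf: bool       # 4 stars
    links_to_other_data: bool     # 5 stars


def stars(d: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [d.on_web_open_license, d.machine_readable, d.non_proprietary_format,
                d.uses_uris_and_rdf, d.links_to_other_data]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


csv_on_web = Dataset(True, True, True, False, False)
```

For example, a CSV file published on the Web under an open license meets the first three criteria and gets three stars, which is where the slide places typical open data.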

Page 62: Linked Data Tutorial

Conclusions and Take-Away Message: The Power of Linked Data (5-Star Data)

• Web-scale data publishing with web-based discovery mechanisms

• Distributed annotation – make comments about observations, data series, points on the map

• Easy to reuse
  ‒ huge potential when connecting to the cloud and linking the data
  ‒ the benefits grow as the amount of data published as Linked Data increases

• Integration on the data level

• Easy to extend (new data properties as required, no need to plan them up front)

• Easy to merge – no name clashes!
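The "no name clashes" point can be illustrated in a few lines of Python: when two graphs identify things with the same global URIs, merging them is a plain set union, and statements about the same URI simply meet. The example.org URIs and properties are hypothetical. Graphs containing blank nodes would need those nodes renamed apart before a union; this sketch ignores them.

```python
def merge(*graphs: set) -> set:
    """Merge graphs that use global URIs: a plain set union, no renaming needed."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


MF = "http://example.org/ministry/MF"  # hypothetical URIs and properties
budget = {(MF, "http://example.org/vocab/budget", "http://example.org/budget/2011")}
contracts = {
    (MF, "http://example.org/vocab/awarded", "http://example.org/contract/1"),
    (MF, "http://example.org/vocab/budget", "http://example.org/budget/2011"),  # duplicate
}
merged = merge(budget, contracts)
```

The duplicate statement collapses into one, and both sources now describe the same MF resource.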

Page 63: Linked Data Tutorial

Future Steps

• If you have managed to get interesting data, try to publish it as Linked Data!
  ‒ We can help you with the whole lifecycle: creating, publishing, and maintaining the data
  ‒ Just create RDF data, and we will publish it for you
  ‒ Just let us know (send us the data), and we can publish it
  ‒ Publish the data in the same way, but use global identifiers according to the LD principles

• When the infrastructure (ODCleanStore) is ready, you can just send us the RDF data using a web service, and we will do all the rest: clean, link, and provide aggregated views.

Page 64: Linked Data Tutorial

Thank You!

Page 65: Linked Data Tutorial

References

• Textbook: Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/

• [2] http://www.w3.org/TR/rdf-primer/

• [3] http://www.w3.org/TR/REC-rdf-syntax/

• [4] http://www.w3.org/TeamSubmission/turtle/

• [5] http://www.w3.org/TR/rdfa-syntax/

• [6] http://www4.wiwiss.fu-berlin.de/bizer/silk/

• [7] http://www.w3.org/TR/rdf-sparql-query/

• [8] http://static.lod2.eu/isslod/Bizer-LOD2-IndianSummerSchool.pdf