Finding and consuming (Linked) Open Data

Christophe Guéret (@cgueret)

March 8, 2012

http://latc-project.eu

http://www.vu.nl

http://ehumanities.nl

The next two hours

Open Data

What is it? Why open data?

How to find Open Data

How to consume it

Hands-on session

Linked Data & Linked Open Data

What is it? How does it relate to Open Data?

How to get Linked Data

Ways to consume it

Hands-on session

http://www.flickr.com/photos/-jvl-/4983920242

Open Data

A piece of content or data is open if anyone is free to use, reuse, and redistribute it subject only, at most, to the requirement to attribute and share-alike.

http://opendefinition.org/

Why open data?

Data has more value than applications

Data is used more when it is easier to use

Credit: Dorothea Salo, http://www.slideshare.net/cavlec/rdf-rda-and-other-tlas

Open Data for Public institutions

Improve transparency

Active citizenship and data journalism

Create new opportunities

Develop need-focused applications almost for free

See all AppsforX challenges (Amsterdam, Nederland, ...)

http://opendatachallenge.org/

Let businesses sell services around the data

Improve efficiency

Help share data within institutions

Open Data for Researchers

Consider data as an asset

Like papers, it can be referenced

Like papers, open access for increased usage

Better science

Reproducibility of experiments

Cross usage of data sets in different studies

Improve transparency (and decrease fraud?)

Data workflow

Search for the relevant data sets

Do data integration and clean up

Visualise and/or analyse the data

Re-publish integrated and curated data

Three ways to search for data

Generic search engine with specific target

Use keywords or keywords + file type

Browse data archives

Focused around particular topic(s)

Explored by facets and keywords

Use data portals

Yellow pages for data archives, faceted search

Hubs for both data and applications

Using a search engine

Data archive: Dryad

Data archive: Easy

Data portal: Overheid.nl

Data portal: Publicdata.eu

Data portal: Kasabi

Data catalogs

Data integration

Unify the different data in a single format

XLS + PDF + CSV => CSV

Integrate the data

Connect the bits and pieces

Curate the data

Fix errors in the data

Process the data in preparation for its usage

Stemming, removal of stop words, ...

Normalisation of values
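The unification and normalisation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the column names and the stop-word list are hypothetical, the rows are assumed to have already been extracted from their XLS and CSV sources, and stemming is left out.

```python
import csv
import io

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "in"}


def normalise_value(value: str) -> str:
    """Trim, lower-case and collapse internal whitespace."""
    return " ".join(value.strip().lower().split())


def remove_stop_words(text: str) -> str:
    """Drop stop words from an already normalised string."""
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


def to_csv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Write rows from any source as one CSV document with fixed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({c: normalise_value(row.get(c, "")) for c in columns})
    return buffer.getvalue()


# Hypothetical rows, as if already read from an XLS file and a CSV file.
rows_from_xls = [{"city": "  Amsterdam ", "country": "The  Netherlands"}]
rows_from_csv = [{"city": "PARIS", "country": "France"}]

unified = to_csv(rows_from_xls + rows_from_csv, ["city", "country"])
print(unified)
```

Even this small case needs decisions about casing, whitespace and column names, and every data consumer has to repeat that work.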

Use Linked Data to save time there!

Visualise data: DataMarket

Visualise data: Google explorer

Visualise data: Microsoft explorer

Visualise data: WolframAlpha

Publish processed data

How?

Send to a data archive

Publish on web sites

Why?

Re-usability

Community process (if I do it, others will do it)

Scientific process

Hands-on session

In 2001, what were the council election results in the county of Warwickshire (UK)?

How has the literacy rate in Tanzania evolved since 1988?

Can you make this plot of unemployment rates using the Google Public data explorer?

Linked Data

http://www.flickr.com/photos/erikcharlton/3337465138

Linked Data & Linked Open Data

What is the problem?

Frank and Christophe publish some open data

Roi wants to combine and enrich it

Marvel icons: mermer, DeviantArt

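Frank's data (in Dutch):

Kennissen | Stad
Christophe | Amsterdam
Peter | Barcelona
David | Parijs

Christophe's data (in French):

Ville | Pays
Barcelone | Espagne
Paris | France
Amsterdam | Pays-Bas

Frank, Christophe and Roi are each connected through the Web (WWW)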

What is the problem?

Data integration issue

Kennissen, Stad, Ville, Pays?

Paris = Parijs?

Amsterdam = Amsterdam?

A lot of work for the data consumer

Frank's table (Kennissen, Stad) + Christophe's table (Ville, Pays) = ?
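A minimal sketch of what Roi runs into when combining the two tables. It assumes, as Roi would have to guess, that the Stad and Ville columns both hold city names and can be matched by exact string equality; the function name is made up for this illustration.

```python
# The two tables from the slides, as published by Frank and Christophe.
frank = [
    {"Kennissen": "Christophe", "Stad": "Amsterdam"},
    {"Kennissen": "Peter", "Stad": "Barcelona"},
    {"Kennissen": "David", "Stad": "Parijs"},
]
christophe = [
    {"Ville": "Barcelone", "Pays": "Espagne"},
    {"Ville": "Paris", "Pays": "France"},
    {"Ville": "Amsterdam", "Pays": "Pays-Bas"},
]


def naive_join(people, cities):
    """Join on exact string equality of Stad and Ville (Roi's guess)."""
    country_of = {row["Ville"]: row["Pays"] for row in cities}
    return [
        (row["Kennissen"], row["Stad"], country_of.get(row["Stad"]))
        for row in people
    ]


for person, city, country in naive_join(frank, christophe):
    print(person, city, country)
```

Only the Amsterdam row finds a match. "Barcelona" and "Parijs" do not equal "Barcelone" and "Paris" as strings, so two of the three rows come back without a country.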

Why is this so problematic?

Uneven balance of information

Christophe and Frank have more of it than Roi

Solution: share more information

Amsterdam = Amsterdam?

Replace "Amsterdam" by "Amsterdam, Netherlands"

Kennissen, Stad, Ville, Pays?

Provide a description of the meaning of the columns as a separate document

Paris = Parijs?

Use English names instead of local ones

But is that enough?

There could still be several places called "Amsterdam, Netherlands"

Keep adding precision until 100% certain of uniqueness

Documentation of columns is one more thing to consume to use the data

It's hard to enforce the usage of a single language to name things

Linked Data idea

Data integration at the data level

Define things in the data set

Use unambiguous identifiers for the things

Associate descriptions to the identifiers

Connect things together

Example: thing 1 (name is Christophe, ...) "works in" thing 2 (name in French is Paris, name in Dutch is Parijs, ...)

Linked Data and the Web

Proposal: use the Web as a platform

Identifiers = URIs

Descriptions = de-referenced documents

ex:Christophe ex:worksIn dbpedia:Amsterdam

This is a triple (the whole statement)

This is a resource (each node, e.g. dbpedia:Amsterdam)

Use of compact URIs

dbpedia = http://dbpedia.org/resource/

ex = http://example.org/

What is at dbpedia:Amsterdam?
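To make the compact URI notation concrete, here is a small sketch that expands the triple from the slide into full URIs. The prefix table is the one given above; the `expand` helper is a hypothetical name introduced for the example.

```python
# Prefix table from the slide.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "ex": "http://example.org/",
}


def expand(curie: str) -> str:
    """Expand a compact URI such as 'ex:worksIn' into a full URI."""
    prefix, local_name = curie.split(":", 1)
    return PREFIXES[prefix] + local_name


# A triple is (subject, predicate, object); each part is a resource.
triple = tuple(
    expand(curie)
    for curie in ("ex:Christophe", "ex:worksIn", "dbpedia:Amsterdam")
)
print(triple)
```

The expanded object, http://dbpedia.org/resource/Amsterdam, is an ordinary Web address, which is what makes it possible to look up its description.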

Benefits of Linked Data

Data model of triples and resources:

Everything is defined as described things and relations

Copes easily with heterogeneous descriptions

Easy to cross-reference things between data sets

The network contains both the data and its description

Use the Web and other open standards (RDF, SPARQL, ...)

Frank publishes his data

Kennissen | Stad
Christophe | Amsterdam
Peter | Barcelona
David | Parijs

As a graph:

ex:Christophe rdf:type ex:Acquaintance .
ex:Peter rdf:type ex:Acquaintance .
ex:David rdf:type ex:Acquaintance .
ex:Christophe ex:worksIn dbpedia:Amsterdam .
ex:Peter ex:worksIn dbpedia:Barcelona .
ex:David ex:worksIn dbpedia:Paris .

Christophe re-uses part of Frank's data to publish his data

Ville | Pays
Barcelone | Espagne
Paris | France
Amsterdam | Pays-Bas

Added to the graph:

dbpedia:Amsterdam ex:isIn dbpedia:Netherlands .
dbpedia:Barcelona ex:isIn dbpedia:Spain .
dbpedia:Paris ex:isIn dbpedia:France .

Roi adds some more information

Added to the graph:

dbpedia:Netherlands ex:isIn dbpedia:Europe .
dbpedia:Spain ex:isIn dbpedia:Europe .
dbpedia:France ex:isIn dbpedia:Europe .
ex:Acquaintance rdfs:label "Conocido"@es .
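The three contributions can be combined without any mapping step because they share identifiers. The sketch below models each publisher's data as a set of (subject, predicate, object) tuples. The pairing of people with cities and of cities with countries is a reading of the slide figures, the label triple is left out for brevity, and `country_of_work` is a hypothetical helper.

```python
frank = {
    ("ex:Christophe", "rdf:type", "ex:Acquaintance"),
    ("ex:Peter", "rdf:type", "ex:Acquaintance"),
    ("ex:David", "rdf:type", "ex:Acquaintance"),
    ("ex:Christophe", "ex:worksIn", "dbpedia:Amsterdam"),
    ("ex:Peter", "ex:worksIn", "dbpedia:Barcelona"),
    ("ex:David", "ex:worksIn", "dbpedia:Paris"),
}
christophe = {
    ("dbpedia:Amsterdam", "ex:isIn", "dbpedia:Netherlands"),
    ("dbpedia:Barcelona", "ex:isIn", "dbpedia:Spain"),
    ("dbpedia:Paris", "ex:isIn", "dbpedia:France"),
}
roi = {
    ("dbpedia:Netherlands", "ex:isIn", "dbpedia:Europe"),
    ("dbpedia:Spain", "ex:isIn", "dbpedia:Europe"),
    ("dbpedia:France", "ex:isIn", "dbpedia:Europe"),
}

# Merging graphs is a plain set union: shared identifiers line up by themselves.
merged = frank | christophe | roi


def country_of_work(graph, person):
    """Follow ex:worksIn and then ex:isIn from a person to a country."""
    for subject, predicate, city in graph:
        if subject == person and predicate == "ex:worksIn":
            for subject2, predicate2, country in graph:
                if subject2 == city and predicate2 == "ex:isIn":
                    return country
    return None


print(country_of_work(merged, "ex:David"))
```

Compared with joining the two tables on city names, nothing has to be guessed here: dbpedia:Paris is the same resource in Frank's and in Christophe's data.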

Reasoning with Semantics

Bonus!

dbpedia:Amsterdam ex:isIn dbpedia:Netherlands .
dbpedia:Netherlands ex:isIn dbpedia:Europe .

+

ex:isIn rdf:type owl:TransitiveProperty .

=

dbpedia:Amsterdam ex:isIn dbpedia:Europe .

Example usage

Materialize implicit information

Check for consistency
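Materialising the implicit information for a transitive property can be sketched as a small fixed-point loop. This is not how a production reasoner works internally; it is a naive illustration, and the function name is made up for the example.

```python
def materialise_transitive(triples, predicate):
    """Add every triple implied by treating `predicate` as transitive."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot of the current links for this predicate.
        links = [(s, o) for s, p, o in closed if p == predicate]
        for start, middle in links:
            for middle2, end in links:
                inferred = (start, predicate, end)
                if middle == middle2 and inferred not in closed:
                    closed.add(inferred)
                    changed = True
    return closed


facts = {
    ("dbpedia:Amsterdam", "ex:isIn", "dbpedia:Netherlands"),
    ("dbpedia:Netherlands", "ex:isIn", "dbpedia:Europe"),
}
inferred_facts = materialise_transitive(facts, "ex:isIn")
print(("dbpedia:Amsterdam", "ex:isIn", "dbpedia:Europe") in inferred_facts)
```

The loop repeats until no new triple is added, so chains longer than two hops are covered as well.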

Linked Data vs Linked Open Data

Linked Data doesn't imply Open Data!

It is possible to apply Linked Data principles to closed data

Open Data doesn't imply Linked Data

Much open data is not yet published as Linked Data

Linked Data + Open Data = Linked Open Data

Global, web-scale data space of open data

Rough estimate of size

295 data sets, 31B facts in LOD Cloud

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Everyone can enrich the cloud

(Figure: the combined graph built from Frank's, Christophe's and Roi's triples)

Get Linked Open Data

Linked Data is a graph database on the Web

It can be consumed in two ways

As documents on the Web

Open the resources and ask for RDF content to get a graph

As a database

Query the data with SPARQL (equivalent of SQL)
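"Asking for RDF content" is usually done with HTTP content negotiation: the client sends an Accept header naming an RDF media type. The sketch below only builds such a request with Python's standard library. It assumes the server offers Turtle (text/turtle) for the resource, which is not guaranteed for every site, and `rdf_request` is a hypothetical helper name.

```python
from urllib.request import Request


def rdf_request(resource_uri: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request that asks for an RDF serialisation of a resource."""
    return Request(resource_uri, headers={"Accept": media_type})


request = rdf_request("http://dbpedia.org/resource/Amsterdam")
print(request.full_url, request.get_header("Accept"))

# Sending the request needs network access, for example:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       turtle_text = response.read().decode("utf-8")
```

The same URI opened in a browser would typically return an HTML page. The Accept header is what selects the machine-readable description instead.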

Search for RDF documents

Look for the RDF export

Sindice Web data inspector

Hands-on session

Get the RDF of a BestBuy product

Get RDF out of Rotten Tomatoes

Use-case: building a social network of musicians

Goal

Make a network

Nodes = artists

Edges = play(ed) in the same band

Use Freebase as data source
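The target network can be sketched independently of the data source. Assuming a list of (band, artist) memberships is already available, every pair of artists sharing a band becomes an edge. The band and artist names below are hypothetical placeholders, not Freebase data.

```python
from itertools import combinations

# Hypothetical memberships: (band, artist).
memberships = [
    ("Band A", "Alice"),
    ("Band A", "Bob"),
    ("Band B", "Bob"),
    ("Band B", "Carol"),
    ("Band A", "Dave"),
]


def band_mate_edges(memberships):
    """Return one undirected edge per pair of artists sharing a band."""
    members_of = {}
    for band, artist in memberships:
        members_of.setdefault(band, set()).add(artist)
    edges = set()
    for members in members_of.values():
        # Sorting makes (a, b) and (b, a) collapse into a single edge.
        edges.update(combinations(sorted(members), 2))
    return edges


edges = band_mate_edges(memberships)
print(sorted(edges))
```

Ordering each pair alphabetically avoids listing the same edge twice.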

Getting the data

First option:

Get all the pages for all the artists as RDF

Merge them

Filter the data to keep only the desired relations

Second option:

Extract a sub-graph out of the data graph of Freebase
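For comparison, the first option can be sketched as a merge followed by a filter. The sketch assumes each artist page has already been downloaded and parsed into a set of (subject, predicate, object) tuples. The `ex:` identifiers and the `ex:unrelated` predicate are hypothetical; the `fb:` predicates are the ones used in the SPARQL query.

```python
WANTED = {
    "fb:music.group_membership.group",
    "fb:music.group_membership.member",
    "fb:type.object.name",
}


def merge_and_filter(pages, wanted_predicates):
    """Merge per-page triple sets and keep only the desired relations."""
    merged = set().union(*pages)
    return {triple for triple in merged if triple[1] in wanted_predicates}


# Hypothetical parsed pages, one set of triples per artist page.
page_one = {
    ("ex:membership1", "fb:music.group_membership.group", "ex:bandA"),
    ("ex:membership1", "fb:music.group_membership.member", "ex:alice"),
    ("ex:alice", "ex:unrelated", "ex:something"),
}
page_two = {
    ("ex:alice", "fb:type.object.name", "Alice"),
    ("ex:alice", "ex:unrelated", "ex:something"),
}

kept = merge_and_filter([page_one, page_two], WANTED)
print(len(kept))
```

This works, but every page has to be fetched in full before most of it is thrown away, which is the motivation for extracting a sub-graph with a query instead.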

SPARQL query

PREFIX fb:

SELECT DISTINCT ?name1 ?name2 WHERE {
  # First membership: a member of some group, with its name
  ?g1 fb:music.group_membership.group ?group .
  ?g1 fb:music.group_membership.member ?member1 .
  ?member1 fb:type.object.name ?name1 .

  # Second membership in the same group
  ?g2 fb:music.group_membership.group ?group .
  ?g2 fb:music.group_membership.member ?member2 .
  ?member2 fb:type.object.name ?name2 .

  # Two different people, English names only, each pair listed once
  FILTER ((?g1 != ?g2) && (?member1 != ?member2))
  FILTER ((lang(?name1) = "en") && (lang(?name2) = "en"))
  FILTER (str(?name1) < str(?name2))
}

Result

Use factforge.net

Contains a copy of the data from Freebase

Understands SPARQL queries

Results: http://bit.ly/music_sn
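Once the query has run, its results still need to be turned into edges. The sketch below assumes the endpoint can return the standard SPARQL JSON results format, which was not checked for factforge.net. The sample response is hand-written with hypothetical names and is not real output.

```python
import json

# Hand-written sample in the SPARQL JSON results format (hypothetical names).
sample_response = """
{
  "head": {"vars": ["name1", "name2"]},
  "results": {"bindings": [
    {"name1": {"type": "literal", "xml:lang": "en", "value": "Alice"},
     "name2": {"type": "literal", "xml:lang": "en", "value": "Bob"}},
    {"name1": {"type": "literal", "xml:lang": "en", "value": "Bob"},
     "name2": {"type": "literal", "xml:lang": "en", "value": "Carol"}}
  ]}
}
"""


def edges_from_results(text):
    """Turn each result row (?name1, ?name2) into one network edge."""
    data = json.loads(text)
    return [
        (row["name1"]["value"], row["name2"]["value"])
        for row in data["results"]["bindings"]
    ]


network_edges = edges_from_results(sample_response)
print(network_edges)
```

Each binding row corresponds to one pair of band mates, so the list can be loaded directly into a graph tool.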

Hot line for Linked (Open) Data

Christophe Guéret

http://www.few.vu.nl/~cgueret

@cgueret

Rinke Hoekstra

http://www.rinkehoekstra.nl/

@rinkehoekstra
