Methodology for Linked Data Generation and Publication. Daniel Vila-Suero, dvila@fi.upm.es
Introduction
• Different methods and life-cycles are available:
  • LOD2
  • Datalift
  • W3C Linked Data Cookbook
  • W3C Best Practices for Linked Data
  • Guidelines for Multilingual Linked Data
  • "datos.bne.es: an insight into Library Linked Data", Daniel Vila-Suero, Asunción Gómez-Pérez, Library Hi Tech journal, 2013
OEG LD Projects
• Libraries: Biblioteca Nacional (National Library of Spain), http://datos.bne.es
• Geographic: IGN, http://geo.linkeddata.es/; OTALEX
• Meteorology: AEMET, http://aemet.linkeddata.es/
• Travel: Grupo Prisa, http://webenemasuno.linkeddata.es/
Linked Data lifecycles
• A methodological approach to Linked Data publication and management.
• Essentially described as a series of steps or activities, with their associated tasks, technologies and methods.
• Several approaches have been proposed (Hausenblas et al., Hyland et al., Villazón-Terrazas et al., etc.), with many similarities; see http://www.w3.org/2011/gld/wiki/GLD_Life_cycle
• However, these lifecycles should be understood as sets of practices that are currently applied, not as a one-size-fits-all formula [1].
[1] Publishing Linked Data - There is no One-Size-Fits-All Formula. http://mccarthy.dia.fi.upm.es/doc/edf.pdf
LD Lifecycles: Hyland et al.
LD Lifecycles: Hausenblas et al.
http://linked-data-life-cycles.info/
LD Lifecycles: Datalift approach
• French national project: http://datalift.org/
• Activities:
  • Raw RDF conversion
  • Selection of vocabularies
  • Conversion according to a schema
  • Publication
  • Interlinking
  • Exploitation
LD Lifecycles: LOD2 Approach
• Extraction
• Storage
• Authoring
• Interlinking
• Enrichment
• Quality
• Evolution
• Exploration
• More info at: http://lod2.eu/
LD Lifecycles: Villazón-Terrazas et al.
Guidelines for Multilingual (ML) Linked Data
• Set of main activities:
  • Specification
  • Modelling
  • RDF Generation
  • Data curation
  • Linking
  • Publication
• Each activity is composed of several tasks
Method and lifecycle
Specification: Intro
• The goal is to:
  • Specify and analyse the data sources
  • Produce documentation that will be used within the next activities
Specification: Tasks
1. Identification and analysis of data sources
2. URI/IRI design
3. Analysis and definition of licensing and provenance information
1. Specification
1. Analyze the data sources: What is the structure of your data? In which format is it? What types of entities are described in the data?
2. URI design: How will you name your resources?
  • Several guidelines are available:
    • Linked Data Patterns (Dodds and Davis): http://patterns.dataincubator.org/book/
    • UK Cabinet Office: http://www.cabinetoffice.gov.uk/media/308995/public_sector_uri.pdf
    • Style Guidelines for Naming and Labelling Ontologies in the Multilingual Web: http://bit.ly/xJwA9g
1. Specification
3. Define/describe the provenance information: How will you express and track the sources and aggregations of resources?
• Different vocabularies available: OPMV, W3C PROV-O, OAI-ORE, DCMI-PROV, etc.
• Good starting point: PROV Model Primer
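As a minimal sketch of what a PROV-O description can look like, the Python below emits Turtle stating that a generated dataset was derived from a source by a generation activity. It assumes PROV-O is the chosen vocabulary; the dataset, source and activity URIs are hypothetical placeholders, and only the prov: terms (prov:Entity, prov:Activity, prov:wasDerivedFrom, prov:wasGeneratedBy, prov:used, prov:endedAtTime) come from the W3C ontology.

```python
from datetime import datetime, timezone

PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n\n"
)


def provenance_turtle(dataset: str, source: str, activity: str, ended: datetime) -> str:
    """Return Turtle stating that `dataset` was derived from `source`
    by `activity`, which ended at `ended` (a timezone-aware datetime)."""
    # xsd:dateTime lexical form in UTC, e.g. 2013-06-01T10:00:00Z
    stamp = ended.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    body = [
        f"<{dataset}> a prov:Entity ;",
        f"    prov:wasDerivedFrom <{source}> ;",
        f"    prov:wasGeneratedBy <{activity}> .",
        f"<{source}> a prov:Entity .",
        f"<{activity}> a prov:Activity ;",
        f"    prov:used <{source}> ;",
        f'    prov:endedAtTime "{stamp}"^^xsd:dateTime .',
    ]
    return PREFIXES + "\n".join(body) + "\n"


if __name__ == "__main__":
    # All three URIs are hypothetical placeholders.
    print(provenance_turtle(
        "http://example.org/dataset/authors",
        "http://example.org/source/authority-records",
        "http://example.org/activity/rdf-generation-2013-06",
        datetime(2013, 6, 1, 10, 0, tzinfo=timezone.utc),
    ))
```

The same pattern extends to agents (prov:Agent, prov:wasAttributedTo) when the responsible institution must also be recorded.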
Specification
Identification and analysis of data sources
INPUT:
• Data sources
• Associated documentation
OUTPUT:
• Documentation of data sources (type of data, formats, data model, language, identifiers, etc.)
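Part of this documentation can be gathered mechanically. A minimal sketch, assuming the source (or an export of it) is available as CSV; the profile_csv helper and the sample rows are hypothetical. It reports fill rates, sample values and columns whose values are all distinct, which are candidates for the identifiers used later in URI design.

```python
import csv
import io
from collections import Counter


def profile_csv(text: str, sample_size: int = 3) -> dict:
    """Summarise a tabular source: number of rows and, per column, how many
    cells are filled, a few distinct sample values, and whether every row has
    a distinct value (a hint that the column could serve as a local identifier)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {"rows": len(rows), "columns": {}}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row.get(column)]
        counts = Counter(values)
        report["columns"][column] = {
            "filled": len(values),
            "samples": list(counts)[:sample_size],
            "unique": len(values) == len(rows) and len(counts) == len(values),
        }
    return report


# Hypothetical sample standing in for an export of authority records.
SAMPLE = """id,name,birth_year,lang
a001,Ana Pérez,1950,es
a002,John Smith,,en
a003,María López,1962,es
"""

if __name__ == "__main__":
    for column, info in profile_csv(SAMPLE)["columns"].items():
        print(column, info)
```

Formats such as MARC21 or XML would need their own readers, but the resulting report (columns or fields, fill rates, candidate identifiers) has the same shape.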
Specification
Identification and analysis of data sources
• Documentation of data sources:
• Type of data: Authors, Publications, Subjects, etc.
• Format: MARC21, XML, Excel, TSV, CSV, RDF
• Data model: MARC21, ad hoc relational database (RDB), XSD schema
Specification
Identification and analysis of data sources
• Documentation of data sources:
• Languages: English
• Identifiers: labels of terms, MARC21 control field 001
• Licensing and provenance information:
Specification
URI/IRI design
INPUT:
• Documentation of data sources
OUTPUT:
• Documentation of URI/IRI patterns and namespaces
URI Forms
• Publication of RDF should at least:
  • Have a dereferenceable URI that returns RDF (preferably Turtle)
  • Have a dereferenceable URI that returns a human-readable representation (HTML)
• Two URI strategies:
  • Hash (#{id}):
    • A request for http://example.org/vocabulary#Person is sent without the fragment, i.e. as a request for http://example.org/vocabulary
    • The complete vocabulary is returned, and the client is responsible for finding #Person in it
  • Slash (/) with HTTP 303:
    • A request for http://example.org/vocabulary/Person
    • Returns HTTP 303 (See Other) with the location of a document describing the resource, selected by content negotiation
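The difference between the two strategies can be sketched in a few lines of Python. This is an illustration, not a server: request_target shows what a client actually asks for with a hash URI, and see_other returns the status and location a server could send for a slash URI. The .ttl/.html document suffixes and the simplified Accept parsing (q-values only, no wildcards) are assumptions.

```python
from urllib.parse import urldefrag

# Hypothetical mapping from media type to the suffix of the describing document.
DOCUMENTS = {"text/turtle": ".ttl", "text/html": ".html"}


def request_target(uri: str) -> str:
    """Hash strategy: the fragment is never sent to the server, so the client
    fetches the whole document and must locate e.g. '#Person' in it itself."""
    document, _fragment = urldefrag(uri)
    return document


def negotiate(accept: str, default: str = "text/html") -> str:
    """Pick the supported media type with the highest q-value in an Accept
    header (simplified: wildcards and parameters other than q are ignored)."""
    best, best_q = default, 0.0
    for part in accept.split(","):
        media, *params = [piece.strip() for piece in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media in DOCUMENTS and q > best_q:
            best, best_q = media, q
    return best


def see_other(resource: str, accept: str) -> tuple:
    """Slash strategy: answer a request for the resource itself with
    303 See Other and the location of a describing document."""
    return 303, resource + DOCUMENTS[negotiate(accept)]


if __name__ == "__main__":
    print(request_target("http://example.org/vocabulary#Person"))
    print(see_other("http://example.org/vocabulary/Person", "text/turtle"))
```

Other document layouts are equally valid; DBpedia, for instance, redirects its /resource/ URIs to separate /page/ and /data/ documents.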
Specification
URI/IRI design
- Namespace: http://www.datos.bne.es/
- Prefix: bne:
- Patterns:
  - Canonical: http://datos.bne.es/resource/{id}
  - Persons: http://datos.bne.es/autor/{id}
  - Works: http://datos.bne.es/obra/{id}
  - Expressions: http://datos.bne.es/versión/{id}
  - Manifestations: http://datos.bne.es/edición/{id}
  - Subjects: http://datos.bne.es/edición/{id}
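A minimal sketch of minting URIs from these patterns, assuming local identifiers come from the source records (XX123 below is a made-up identifier). Because some patterns contain non-ASCII characters (versión, edición), they are IRIs; iri_to_uri shows the percent-encoded URI form that clients and tools without IRI support will see.

```python
from urllib.parse import quote

# Instance patterns as listed above; Subjects is omitted because the slide
# gives it the same pattern as Manifestations.
PATTERNS = {
    "canonical": "http://datos.bne.es/resource/{id}",
    "person": "http://datos.bne.es/autor/{id}",
    "work": "http://datos.bne.es/obra/{id}",
    "expression": "http://datos.bne.es/versión/{id}",
    "manifestation": "http://datos.bne.es/edición/{id}",
}


def mint_iri(kind: str, local_id: str) -> str:
    """Fill a pattern with a local identifier, escaping it so that
    characters such as '/', '#' or spaces cannot alter the URI structure."""
    return PATTERNS[kind].format(id=quote(local_id, safe=""))


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8
    (in the spirit of RFC 3987, section 3.1); reserved delimiters and
    existing %-escapes are left untouched."""
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


if __name__ == "__main__":
    iri = mint_iri("expression", "XX123")  # hypothetical identifier
    print(iri)
    print(iri_to_uri(iri))
```

Keeping identifiers opaque and escaping them at minting time means the patterns stay stable even if source identifiers contain unexpected characters.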
Specification
URI/IRI design
- Vocabulary namespace: http://www.datos.bne.es/vocab
- Prefix: bnevoc:
- Patterns:
  - Canonical: http://datos.bne.es/vocab/{id}
URI design = naming
Some general URI design guidelines
Naming: Preliminary guidelines for a multilingual scenario
Some tools are not prepared for opaque URIs (Pubby)…
Semantic Web Journal reviewer on the datos.bne.es paper*:
"It is pity that local names of chosen IFLA-FRBR properties are cryptic codes … but authors of this paper are not to blame about that"
* http://datos.bne.es/resource/XX1718747
* http://www.semantic-web-journal.net/content/datosbnees-library-linked-data-dataset
Some others are better prepared (Puelia)…
frbr:C1005 a rdfs:Class ; rdfs:label "Person"@en, "Persona"@es .
Display labels are configurable using a Turtle config file
* http://datos.bne.es/frontend/persons
The label is not selected based on the user's locale
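Selecting the label by locale is not hard to add. A minimal sketch, assuming the labels of a term are grouped by language tag and the user's preferences are known (e.g., parsed from an Accept-Language header); the matching rule (exact tag, then primary subtag, then a fallback language) is a simplification loosely in the spirit of RFC 4647 lookup, not a full implementation.

```python
from typing import Dict, List, Optional


def pick_label(labels: Dict[str, str], preferred: List[str], fallback: str = "en") -> Optional[str]:
    """Choose a display label for the user's language preferences.

    `labels` maps language tags to label text, e.g. {"en": "Person", "es": "Persona"}
    for frbr:C1005; `preferred` lists the user's tags, most preferred first.
    For each tag, an exact match wins, then its primary subtag ("es" for "es-ES");
    otherwise `fallback` is tried, and finally any label at all."""
    by_tag = {tag.lower(): text for tag, text in labels.items()}
    for tag in [*preferred, fallback]:
        wanted = tag.lower()
        for candidate in (wanted, wanted.split("-")[0]):
            if candidate in by_tag:
                return by_tag[candidate]
    return next(iter(labels.values()), None)


if __name__ == "__main__":
    labels = {"en": "Person", "es": "Persona"}
    print(pick_label(labels, ["es-ES", "en"]))
    print(pick_label(labels, ["fr-FR"]))
```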
Specification
Licensing and provenance information
INPUT:
• Documentation of data sources
OUTPUT:
• Documentation of provenance and licensing terms
ODRL (Open Digital Rights Language)
Exercise:
• Define the URI patterns for the following elements:
- Person
- Book
- Person is author of Book
- Person first name
- Person last name
- Book title
- Book is part of Collection
- Collection has part Book
How open is the LOD cloud?
[1] Rodriguez-Doncel, Victor et al. (2013). Rights declaration in Linked Data. In Proc. of the 3rd Int. Workshop on Consuming Linked Data, O. Hartig et al. (Eds.), CEUR vol. 1034.
Modelling: Intro
• The goal is to:
  • Design and implement a vocabulary for describing the data sources in RDF
  • Produce a valid, LD-compliant vocabulary that facilitates the understanding and consumption of the data
2. Modelling
2. Create vocabulary terms (classes, properties and relationships): if you are unable to find suitable terms, you need to create them:
  • From an existing conceptual model
  • From scratch
  • By specializing existing terms (subPropertyOf, subClassOf, domain, range, etc.). For example:
    myVocabulary:Archivist rdfs:subClassOf foaf:Person .
• Desktop tools:
  • NeOn Toolkit
  • Protégé
  • TopBraid Composer Community Edition
• Online tools:
  • Metadata Registry
  • Neologism (DERI)
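Specializing an existing term means that consumers applying RDFS reasoning can still treat your instances as instances of the reused class. A small sketch of that inference with plain Python sets, using the myVocabulary:Archivist example above and FOAF's own foaf:Person rdfs:subClassOf foaf:Agent axiom; ex:alice is a hypothetical instance, and only RDFS rules rdfs9 and rdfs11 are implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def superclasses(sub_class_of: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Transitive closure of rdfs:subClassOf (RDFS entailment rule rdfs11)."""
    direct = defaultdict(set)
    for sub, sup in sub_class_of:
        direct[sub].add(sup)
    closure = {}
    for cls in list(direct):
        seen, stack = set(), list(direct[cls])
        while stack:  # depth-first walk; `seen` also guards against cycles
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(direct.get(current, ()))
        closure[cls] = seen
    return closure


def inferred_types(types: Dict[str, Set[str]],
                   sub_class_of: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Rule rdfs9: if x rdf:type C and C rdfs:subClassOf D, then x rdf:type D."""
    closure = superclasses(sub_class_of)
    return {
        instance: set(classes).union(*(closure.get(c, set()) for c in classes))
        for instance, classes in types.items()
    }


SUBCLASS_OF = {
    ("myVocabulary:Archivist", "foaf:Person"),  # the specialization from the slide
    ("foaf:Person", "foaf:Agent"),              # as declared in FOAF
}

if __name__ == "__main__":
    print(inferred_types({"ex:alice": {"myVocabulary:Archivist"}}, SUBCLASS_OF))
```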
Modelling: Tasks
1. Analysis and selection of domain vocabularies
2. Development of vocabulary
3. Vocabulary for representing licensing and provenance information
Modelling
Analysis and selection of domain vocabularies
INPUT:
• Documentation of data sources (data models, type of data)
OUTPUT:
• A selection of standard/widely-used vocabularies
LOV (Linked Open Vocabularies)
2. Modelling
1. Search for suitable vocabularies to model the data sources: which properties and classes will you use to describe the RDF data?
  • But how can I find suitable vocabularies?
    • Go to thedatahub.org to search for similar datasets and see which vocabularies they use
    • Given a SPARQL endpoint, use vocab-express
    • Given a set of keywords describing entities in your source data, go to LOV (http://lov.okfn.org/) and use its search utility
    • Open Metadata Registry: http://metadataregistry.org
    • Ask the community: [email protected], public-[email protected], [email protected]
  • What makes a vocabulary suitable? Is your LD vocabulary 5-star?
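vocab-express is one way to see which vocabularies a dataset uses. Independently of that tool, the same idea can be sketched in a few lines: given the distinct property and class URIs of a similar dataset (e.g., the result of SELECT DISTINCT ?p WHERE { ?s ?p ?o } on its endpoint), group them by namespace. The namespace-splitting rule (cut at the last '#' or '/') is a heuristic, and the input list is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def namespace_of(term: str) -> str:
    """Heuristic: a term's namespace is everything up to its last '#' or '/'."""
    cut = max(term.rfind("#"), term.rfind("/"))
    return term[: cut + 1]


def vocabulary_usage(terms: List[str]) -> List[Tuple[str, int]]:
    """Count how many of the given property/class URIs belong to each namespace."""
    return Counter(namespace_of(t) for t in terms).most_common()


# Hypothetical input, e.g. the results of: SELECT DISTINCT ?p WHERE { ?s ?p ?o }
TERMS = [
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/creator",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2000/01/rdf-schema#label",
]

if __name__ == "__main__":
    for namespace, count in vocabulary_usage(TERMS):
        print(count, namespace)
```

The namespaces found this way can then be looked up in LOV to check documentation, maintenance and reuse before adopting them.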
Modelling
Analysis and selection of domain vocabularies
NIF (NLP Interchange Format)
IFLA
BIBO
Dublin Core
Use http://lov.okfn.org/
Modelling
Development of vocabulary
INPUT:
• Documentation of data sources (data model, type of data)
• Selection of standard/widely-used vocabularies
OUTPUT:
• Well-documented and linked vocabulary (with mappings to widely-used vocabularies)
Modelling
Development of vocabulary
• Integrate all reused vocabularies (possibly using subClassOf and subPropertyOf)
• Document the vocabulary using an ontology editor
• Validate the vocabulary, for example with OOPS! (OntOlogy Pitfall Scanner!): http://oeg-lia3.dia.fi.upm.es/oops/
Exercise:
• Create a small vocabulary in Protege with the following model:
- Person
- Book
- Person is author of Book
- Person first name
- Person last name
- Book title
- Book is part of Collection
- Collection has part Book
Exercise:
• Reuse and align with existing vocabularies if possible
• Validate using
Modelling
Vocabulary for licensing and provenance
INPUT:
• Documentation of data sources (licensing and provenance)
OUTPUT:
• Selection of standard vocabularies
ODRL (Open Digital Rights Language)
PROV (W3C Provenance Ontology)
Guideline: Licensing models & mechanisms
1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3. Is a standard license available?
  3a. Yes: use the URI of the standard license, e.g., CC0
  3b. No: use a rights declaration language, e.g., ODRL
ODRL (Open Digital Rights Language)
DCAT (Data Catalog Vocabulary)
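The decision steps of this guideline can be sketched as a small Python function that emits the rights statements in Turtle. Assumptions: the dataset URI is the one already used in its VoID/DCAT description (step 1); the lookup table only knows two Creative Commons licenses; the /policy URI and the single odrl:use permission are hypothetical placeholders for a real ODRL policy; and pointing dct:license at an ODRL policy is one modelling choice among several.

```python
# Official Creative Commons license URIs; the short keys are hypothetical.
STANDARD_LICENSES = {
    "CC0": "http://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "http://creativecommons.org/licenses/by/4.0/",
}

PREFIXES = (
    "@prefix dc:   <http://purl.org/dc/elements/1.1/> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
    "@prefix odrl: <http://www.w3.org/ns/odrl/2/> .\n\n"
)


def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def rights_turtle(dataset: str, license_key: str, rights_note: str) -> str:
    """Rights statements for a dataset already described with VoID/DCAT (step 1)."""
    lines = [f"<{dataset}> dc:rights {turtle_string(rights_note)} ."]  # step 2
    if license_key in STANDARD_LICENSES:  # step 3a: reuse the standard license URI
        lines.append(f"<{dataset}> dct:license <{STANDARD_LICENSES[license_key]}> .")
    else:  # step 3b: declare the terms in ODRL; the policy URI is hypothetical
        policy = dataset.rstrip("/") + "/policy"
        lines += [
            f"<{dataset}> dct:license <{policy}> .",
            f"<{policy}> a odrl:Set ;",
            f"    odrl:permission [ odrl:target <{dataset}> ; odrl:action odrl:use ] .",
        ]
    return PREFIXES + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rights_turtle("http://example.org/dataset/books", "CC0", "Public domain dedication"))
```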
RDF Generation: Intro
• The goal is to:
  • Define the method, process and technologies to generate RDF
  • Ideally, produce a set of explicit mappings from the data sources to RDF
RDF Generation: Tasks
1. Selection, extension or development of technologies for RDF generation
2. Mapping of data sources to RDF
3. Transformation of data sources to RDF
RDF Generation
Technologies for RDF Generation
INPUT:
• Documentation of data sources (formats)
OUTPUT:
• Configuration of technologies (framework)
Examples: OpenRefine, Jena Toolkit, any23, Morph (R2RML engine)
3. RDF Generation
1. Transform the data source to RDF:
  • For that we need to know the model and format of the data source.
  • A growing number of RDFizers are available:
    • RDB: the new W3C Recommendations R2RML and Direct Mapping
    • CSV, TSV, tabular data: Google Refine, NOR2O
    • XML: GRDDL, XSLT, etc.
    • …
  • Mapping source data to RDF is not always easy: existing tools have low usability
2. Data cleansing:
  • New tools for data quality, for example LDIF (Linked Data Integration Framework)
RDF Generation
Technologies for RDF Generation
RDF Generation
Mapping of data sources to RDF
INPUT:
• Vocabulary
• Data sources
• Configuration of technologies
OUTPUT:
• Ideally, machine-processable mappings
• RDF dataset
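Even without a dedicated mapping language, the idea of an explicit, machine-processable mapping can be kept by separating the mapping (data) from the engine (code). A minimal sketch assuming CSV input with an id column; the MAPPING structure is a made-up format, not R2RML, and the example.org subject pattern and sample rows are hypothetical. The FOAF terms used (foaf:Person, foaf:givenName, foaf:familyName) are real.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A hypothetical, declarative mapping (not R2RML): how to mint the subject,
# which class to assert and which column feeds which property.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "properties": {
        "first_name": "http://xmlns.com/foaf/0.1/givenName",
        "last_name": "http://xmlns.com/foaf/0.1/familyName",
    },
}


def nt_literal(value: str) -> str:
    """N-Triples string literal with the mandatory escapes."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(text: str, mapping: Dict) -> List[str]:
    """Apply `mapping` to every CSV row and return N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = "<" + mapping["subject"].format(id=quote(row["id"], safe="")) + ">"
        triples.append(f"{subject} <{RDF_TYPE}> <{mapping['class']}> .")
        for column, prop in mapping["properties"].items():
            if row.get(column):  # skip empty cells rather than emitting "" literals
                triples.append(f"{subject} <{prop}> {nt_literal(row[column])} .")
    return triples


SAMPLE = "id,first_name,last_name\n1,Ana,Pérez\n2,John,\n"

if __name__ == "__main__":
    print("\n".join(to_ntriples(SAMPLE, MAPPING)))
```

For relational sources, the same separation is what R2RML standardizes: the mapping becomes an RDF document that any compliant engine can execute.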
Data curation: Intro
• Cross-cutting activity
• The goal is to:
  • Ensure the quality of the LD dataset
  • Enable data curation at the data-source level
Data curation: Tasks
1. Data sources curation
2. RDF data curation
Data curation
Data sources curation
INPUT:
• Data sources
• Reports of issues from previous activities
OUTPUT:
• Documentation of issues in data sources
• Fixes to data sources
Data curation
RDF data curation
INPUT:
• RDF dataset
• Vocabulary
OUTPUT:
• Documentation of issues
• Fixes to RDF dataset
RDFUnit: a test-driven data debugging framework
Vapour: http://validator.linkeddata.org/vapour
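RDFUnit expresses data quality checks as test cases run against the dataset. The test-driven idea itself can be sketched without it: each test is a function over the triples that returns a list of failures. This is not RDFUnit's API; the two tests are hypothetical examples, and the regular expression only handles simple single-line N-Triples without blank nodes.

```python
import re
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"

# Simplified N-Triples line: <subject> <predicate> object .  (no blank nodes)
LINE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def parse(lines: List[str]) -> List[Triple]:
    return [m.groups() for m in map(LINE.match, lines) if m]


def persons_have_label(triples: List[Triple]) -> List[str]:
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return [f"{s}: foaf:Person without rdfs:label" for s in sorted(persons - labelled)]


def no_empty_literals(triples: List[Triple]) -> List[str]:
    return [f"{s} <{p}>: empty literal" for s, p, o in triples if o.startswith('""')]


TESTS: List[Callable[[List[Triple]], List[str]]] = [persons_have_label, no_empty_literals]


def run_tests(lines: List[str]) -> List[str]:
    """Run every test and collect the failures, in test order."""
    triples = parse(lines)
    return [failure for test in TESTS for failure in test(triples)]


DATA = [
    '<http://example.org/person/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
    '<http://example.org/person/1> <http://www.w3.org/2000/01/rdf-schema#label> "Ana Pérez" .',
    '<http://example.org/person/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
    '<http://example.org/person/3> <http://www.w3.org/2000/01/rdf-schema#label> "" .',
]

if __name__ == "__main__":
    for failure in run_tests(DATA):
        print(failure)
```

Failures found this way feed both outputs of the activity: the documentation of issues and, where the cause lies upstream, fixes to the data sources.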
Exercise:
• Validate 5 different URIs from existing Linked Data projects using http://validator.linkeddata.org/vapour
• If you don't know any dataset, search the Datahub for URIs to check
• Analyse and explain the results
Linking: Intro
• The goal is to:
  • Maximize the connectivity to external datasets
  • Facilitate data discovery and enrichment
Linking: Tasks
1. Target datasets discovery and selection
2. Link discovery
3. Links validation
4. Linking
1. Identify datasets that may be suitable as linking targets:
  • Still difficult; most people think of DBpedia
  • Use thedatahub.org to find similar datasets
  • Use Sindice to search for entities in your dataset
2. Discover relationships between data items of your dataset and items of the datasets identified in the previous step:
  • The best-known semi-automatic tools are Silk and LIMES
  • They still require a high level of configuration
3. Validate the relationships discovered in the previous step:
  • Still difficult; maybe it should be included in editorial/cataloguing workflows?
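Silk and LIMES are configured with link specifications that combine similarity measures and thresholds. The core of such a specification can be sketched with the standard library: normalise labels, compare them with a string similarity and keep pairs above a threshold as tentative owl:sameAs links. This is a naive all-pairs illustration, not how those tools work internally; the 0.9 threshold, the normalisation and the example URIs are assumptions, and the output still needs the validation step.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label: str) -> str:
    """Lower-case, drop accents and collapse whitespace ("Pérez,  Ana" -> "perez, ana")."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def candidate_links(source: Dict[str, str], target: Dict[str, str],
                    threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Compare every source label with every target label (naive O(n*m);
    dedicated tools avoid comparing all pairs) and keep pairs above `threshold`."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, t_uri, round(score, 3)))
    return links


def as_ntriples(links: List[Tuple[str, str, float]]) -> List[str]:
    """Serialise tentative links as owl:sameAs statements, still to be validated."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t, _ in links]


if __name__ == "__main__":
    source = {"http://example.org/person/1": "Pérez, Ana"}
    target = {"http://example.org/other/A": "Perez,  Ana",
              "http://example.org/other/B": "Smith, John"}
    print("\n".join(as_ntriples(candidate_links(source, target))))
```

Keeping the score alongside each pair makes the validation step easier: reviewers can start with the links closest to the threshold.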
Linking
Target datasets discovery and selection
INPUT:
• RDF dataset
• Vocabulary
OUTPUT:
• Selection of target datasets
• Commonalities with target datasets (data and vocabulary level)
Linking
Link discovery
INPUT:
• RDF dataset
• Selection of target datasets
OUTPUT:
• Linking task specifications
• Tentative linksets
Tools: OpenRefine, Silk, LIMES
Linking
Link validation
INPUT:
• Tentative linksets
OUTPUT:
• Validated linksets (including provenance and licensing information)
Tool: sameAsValidator
Publication: Intro
• The goal is to:
  • Make the RDF dataset available following LD best practices
  • Facilitate dataset discovery and consumption
Publication: Tasks
1. Dataset and vocabulary publication on the Web
2. Metadata definition and publication
Publication
Dataset and vocabulary publication on the Web
INPUT:
• RDF dataset
• Validated linksets
• Vocabulary
OUTPUT:
• Linked dataset available and accessible on the Web
• Vocabulary available and accessible on the Web (as LD)
Technologies: SPARQL; Linked Data API implementations (Puelia, Elda); LDP implementations
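Consumers often reach a published dataset through its SPARQL endpoint. A minimal sketch of a client following the SPARQL 1.1 Protocol (query via GET in the query parameter, results requested as application/sparql-results+json); the endpoint URL in the example is hypothetical, and select needs network access.

```python
import json
import urllib.request
from urllib.parse import urlencode


def sparql_get_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol, query via GET: the query goes in the `query` parameter."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


def select(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Run a SELECT query and return the JSON result bindings (requires network access)."""
    request = urllib.request.Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Hypothetical endpoint; only the URL is built here, nothing is sent.
    print(sparql_get_url("http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
```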
Publication
Dataset and vocabulary publication on the Web
Example of LD vocabulary:
GND Ontology:
http://d-nb.info/standards/elementset/gnd
Publication
Metadata definition and publication
INPUT:
• All previous assets and documentation
OUTPUT:
• Dataset registered in relevant catalogues
• Metadata available, accessible and discoverable
Vocabularies: DCAT, VoID
Publication
Metadata definition and publication
DCAT: http://www.w3.org/TR/vocab-dcat/
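Example:
@prefix :     <http://example.org/> .  # hypothetical namespace for this example
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:catalog
    a dcat:Catalog ;
    dct:title "Imaginary Catalog" ;
    rdfs:label "Imaginary Catalog" ;
    foaf:homepage <http://example.org/catalog> ;
    dct:publisher :transparency-office ;
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
    dcat:dataset :dataset-001 , :dataset-002 , :dataset-003 .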
Exercise:
• Create a DCAT description, in Turtle syntax, of the Book, Authors and Collections catalog:
  • Book dataset
  • Authors dataset
  • Collections dataset
  • If possible, use different licenses, languages and provenance.
Conclusions
• Data curation is key to the success of LD
• Documentation of data sources and issues
• Language issues have to be taken into account during the whole process
• Metadata description is key to enabling reuse and discovery
• Vocabularies have to be documented and published following LD best practices