Linked Energy Data Generation

Linked Energy Data Generation Tutorial. Filip Radulovic, María Poveda Villalón, Raúl García-Castro. {fradulovic,mpoveda,rgarcia}@fi.upm.es. ETSI Informaticos, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain. Twitter: @LD4SC. 02.10.2014, Sustainable Places 2014, Nice, France


Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France

Transcript of Linked Energy Data Generation

Page 1: Linked Energy Data Generation

Linked Energy Data Generation Tutorial

Filip Radulovic, María Poveda Villalón, Raúl García-Castro

{fradulovic,mpoveda,rgarcia}@fi.upm.es

ETSI Informaticos

Universidad Politécnica de Madrid

Campus de Montegancedo s/n

28660 Boadilla del Monte, Madrid, Spain

Twitter: @LD4SC

02.10.2014. Sustainable Places 2014, Nice, France

Page 2: Linked Energy Data Generation

License

• This work is licensed under the Creative Commons Attribution – Non Commercial – Share Alike License

• You are free:
  • to Share: to copy, distribute and transmit the work
  • to Remix: to adapt the work

• Under the following conditions:
  • Attribution: you must attribute the work by inserting
    • “[source http://www.oeg-upm.net/]” at the footer of each reused slide
    • a credits slide stating: “These slides are partially based on “Linked Energy Data Generation” by F. Radulovic, M. Poveda-Villalón, R. García-Castro”
  • Non-commercial
  • Share-Alike

Page 3: Linked Energy Data Generation

Table of Contents

1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions

Page 4: Linked Energy Data Generation

Classic Web

CIA WorldFactBook, Wikipedia

Data exposed to the Web via HTML

Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig

Page 5: Linked Energy Data Generation

Classic Web

• Typical web page markup consists of:
  • Rendering information (e.g., font size and colour)
  • Hyper-links to related content

• Semantic content is accessible to humans but not (easily) to computers…

Page 6: Linked Energy Data Generation

Classic Web

Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Information from single pages can be found via search engines


Page 7: Linked Energy Data Generation

Classic Web

CIA WorldFactBook, MovieDB

What about complex queries over multiple pages / data sources?

“Show me a picture of the tallest building in the country with the highest CO2 emission rate in 2013”

Impossible

Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig

Page 9: Linked Energy Data Generation

What do we actually want?

• Use the Web like a single global database

• Move from a Web of documents to a Web of Data

Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data” - Olaf Hartig

[Figure: Wikipedia and CIA WorldFactBook as data sources. Image: Shanghai Tower, 2013-8-3, CC BY-SA 3.0]

Page 10: Linked Energy Data Generation

Linked Data enables such Web of Data

Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig

• Global Identifier: URI (Uniform Resource Identifier), which identifies a resource on the Internet
• Data Model: RDF (Resource Description Framework), the standard model for data interchange on the Web
• Access Mechanism: HTTP
• Connection: Typed Links

[Figure: example graph connecting Wikipedia and CIA WorldFactBook data. Labels in the figure: http://cia.../China, http://...wikipedia.../data/shangaiTower, 10000…, 2013, http://.../co2emission, http://.../co2emissionPerYear, http://.../year, http://.../location, http://.../depiction, http://…#sameAs. Image: Shanghai Tower, 2013-8-3, CC BY-SA 3.0]
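As a rough illustration of the RDF data model named on this slide, the sketch below represents statements as (subject, predicate, object) tuples and prints them in N-Triples syntax. All http://example.org/ URIs and the literal value are hypothetical placeholders (the URIs on the slide are truncated), and the hand-written serializer only stands in for a real RDF library.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) tuples.
# All http://example.org/ URIs are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    ("http://example.org/factbook/China",
     "http://example.org/vocab/co2emission",
     '"10000"'),  # literal value, made up for illustration
    ("http://example.org/wiki/ShanghaiTower",
     "http://example.org/vocab/location",
     "http://example.org/wiki/China"),
    ("http://example.org/wiki/China",
     OWL_SAME_AS,  # typed link between the two data sources
     "http://example.org/factbook/China"),
]


def to_ntriples(statements):
    """Serialize tuples as N-Triples lines: URIs in <>, literals kept as given."""
    lines = []
    for s, p, o in statements:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

The last tuple is the kind of typed link (here owl:sameAs) that connects descriptions of the same thing published by different sources.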

Page 11: Linked Energy Data Generation

The four principles (Tim Berners Lee, 2006)

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
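Principles 2 and 3 are commonly met through HTTP content negotiation, which the slide does not mention and is assumed here: the client looks up the URI and asks for an RDF representation in the Accept header. The sketch below only prepares such a request with the standard library. The example.org URI is a hypothetical placeholder and nothing is sent over the network.

```python
import urllib.request


def build_lookup_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI, used only to show the shape of the request.
request = build_lookup_request("http://example.org/resource/ShanghaiTower")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(request).read()
```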

Page 12: Linked Energy Data Generation

Semantic Web definition

“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.”

Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.html

Page 13: Linked Energy Data Generation

Benefits + Cases of success

• Provide semantics: meaningful data & common understanding
• Interoperability
• Reasoning power
  • Infer more data
  • Find mistakes in the original data?
• Enrich your data (with what is already out there)
• Search engines are indexing some schemas
• Increase visibility
• Multilingual information

Page 14: Linked Energy Data Generation

In this tutorial

“D4.1 Requirements and guidelines for energy data generation” from the READY4SmartCities project, available at http://goo.gl/IWDmYy

Page 15: Linked Energy Data Generation

Table of Contents

1. Introduction
2. Data preparation
  1. Select data source
  2. Obtain access to data source
  3. Analyse licensing of the data source
  4. Analyse data source
3. Ontology development
4. Data generation
5. Discussion and Conclusions

Page 16: Linked Energy Data Generation

Select data source

• Selecting the data source that will be transformed into Linked Data

• Steps
  1. To define the requirements
  2. To select one or several data sources

• Alternatives:
  • Data set from your own organization
  • Data sources not owned by your organization (external data sources)

Page 17: Linked Energy Data Generation

Select data source – LCC example

• Limitation to external data sources (search)

1. Requirements
  • Real-world scenario in the energy domain
  • Available for use
  • Available in machine-processable format (the more structured the data are, the better)
  • Can be linked with generic entities (e.g., location)

2. Leeds City Council – energy consumption (http://data.gov.uk/dataset/council-energy-consumption)

Page 19: Linked Energy Data Generation

Obtain access to data source

• Data access means
  • technical means to retrieve the data
  • legal rights to use the data

• In some cases, the data source might not be accessible

• Steps
  1. To identify the person to contact
  2. To request the access
  3. To obtain access and to retrieve the data

• Access alternatives: files, programming interface, database, data streams, etc.

Page 20: Linked Energy Data Generation

Obtain access to data source – LCC example

• Data set already available for download

• Available in a CSV file


Page 22: Linked Energy Data Generation

Analysing licensing of the data source

• Licenses specify the legal terms under which a data set can be used and exploited

• Steps
  1. To identify the publisher
  2. To find the applicable license
    • Web page, data set metadata, data itself
    • Contact the publisher
  3. To read the license and determine legal terms

• Tips
  • Analysis should be performed upon all available copies of the data
  • Ensure compatible licenses between several data sources

Page 23: Linked Energy Data Generation

Analyse licensing – LCC example


Page 25: Linked Energy Data Generation

Analyse data source

• Getting insight into data structure and organization

• Steps
  1. To analyse the characteristics of the data
    • Data values, data ranges, etc.
  2. To obtain the schema of the data
    • Description of concepts and their relationships

• Data format alternatives:
  • Structured data
  • Unstructured data

• Tip: Use a standard modeling language for the data schema (e.g., UML)

Page 26: Linked Energy Data Generation

Analyse data source – LCC example

• Electricity, gas and oil consumptions as decimal values
• 1-year intervals: 2010/11, 2011/12, 2012/13
• Different types of council sites (mostly buildings)
• Full address provided (street, city, district)
• Correspondence with people from LCC open data
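The first analysis step (data values, data ranges) can be approximated with a small profiling script. This is a minimal sketch using only the standard library. The column names and sample rows are hypothetical and do not reproduce the actual LCC file.

```python
import csv
import io

# Hypothetical sample; the real LCC file has different columns and values.
SAMPLE = """site,year,electricity_kwh,gas_kwh
Example Library,2010/11,1200.5,300.0
Example Library,2011/12,1100.0,
Example Pool,2010/11,9800.25,4500.75
"""


def profile(csv_text):
    """Return, per column, the distinct-value count and numeric range (if any)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] != ""]
        try:
            numbers = [float(v) for v in values]
            value_range = (min(numbers), max(numbers))
        except ValueError:
            value_range = None  # non-numeric (or entirely empty) column
        report[column] = {"distinct": len(set(values)), "range": value_range}
    return report


for column, stats in profile(SAMPLE).items():
    print(column, stats)
```

Columns whose range is None hold non-numeric values (site names, year intervals) and are candidates for concepts or identifiers in the schema, while numeric columns suggest measured values.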

Page 28: Linked Energy Data Generation

Ontology development - Preparation

• RDF – Resource Description Framework
  • Data model
  • (subject-predicate-object)

• Resource naming strategy
  • For terms
    • Pattern: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#myterm
    • Example: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitiveValue
  • For individuals
    • Pattern: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/myIndividual
    • Example: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/LeisureCentreWetJohnCharlesCentreforSport

• RDF syntaxes
  • RDF/XML, Turtle (ttl), N3, N-Quads
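The two naming patterns can be applied mechanically. The sketch below is one possible way to do so. The rule for building the local name (drop everything that is not a letter or digit) and the input label are assumptions inferred from the example URI, not something the slides state.

```python
import re

ONTOLOGY_NS = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RESOURCE_NS = "http://smartcity.linkeddata.es/lcc/resource/"


def local_name(label):
    """Drop every character that is not a letter or digit (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", label)


def term_uri(term):
    """Pattern for ontology terms: <ontology namespace>#<term>."""
    return ONTOLOGY_NS + term


def individual_uri(class_name, label):
    """Pattern for individuals: <resource namespace>/<class>/<local name>."""
    return f"{RESOURCE_NS}{class_name}/{local_name(label)}"


print(term_uri("hasQuantitiveValue"))
# The label below is a guess at the source value behind the example URI.
print(individual_uri("LeisureCentre",
                     "Leisure Centre Wet John Charles Centre for Sport"))
```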

Page 29: Linked Energy Data Generation

Ontology development

Each activity in the following slides is described by:
• Activity definition taken from [1]
• Focus of each activity
• Existing tools to carry out the activity
• Tips, alternatives and references

[1] Suárez-Figueroa, M.C. PhD Thesis: NeOn Methodology for Building Ontology Networks: Specification, Scheduling and Reuse. Spain. June 2010.

Page 30: Linked Energy Data Generation

Ontology development

Ontology Requirements: refers to the activity of collecting the requirements that the ontology should fulfil (for example, reasons to build the ontology, identification of target groups and intended uses). (NeOn)

Proposed references:
• NeOn Guidelines for non-functional requirements
• Competency Questions technique [1]

Tools: mind map, text editor, etc.

[1] Gruninger, M., Fox, M. S. The role of competency questions in enterprise engineering. In Proceedings of the IFIP WG5.7 Workshop on Benchmarking - Theory and Practice, Trondheim, Norway, 1994.

Page 31: Linked Energy Data Generation

Ontology development – LCC example

LCC example (Data from….)

Non-functional requirements specified:
• The ontology will try to adopt concepts and design patterns in other ontologies where possible
• The ontology should be implemented in OWL 2 DL

Page 32: Linked Energy Data Generation

Ontology development

Ontology term extraction: to extract a glossary of terms that may be developed.

Focus:
• Extract terminology from Competency Questions (NeOn)
• Extract terminology directly from the data
  • Expert advice or done by experts

Tools for terminology extraction:
• Identify nouns, verbs, etc.
  • Tools: Freeling for free text

Complete the list with synonyms

Page 33: Linked Energy Data Generation

Ontology development – LCC example

Terms extracted (with synonyms):
• Site, place
• Address
• PostCode
• Electricity
• Consumption, utilization
• years, time

Page 34: Linked Energy Data Generation

Ontology development

Ontology conceptualization: refers to the activity of organizing and structuring the information (data, knowledge, etc.), obtained during the acquisition process, into meaningful models at the knowledge level and according to the ontology requirements specification document. (NeOn)

Focus, drafting (optional):
• Identify main domains and top concepts
• Establish relations between concepts and domains

Focus, detailed model:
• Establish hierarchies
• Establish specific relationships among defined elements, rules, axioms, etc.

Tools: drawing tools, including paper and pencil

Tip: Do not try to define everything. You might change your mind during the implementation.

Page 35: Linked Energy Data Generation

Ontology development – LCC example

Page 36: Linked Energy Data Generation

Ontology development

Ontology search: refers to the activity of finding candidate ontologies or ontology modules to be reused. (NeOn)

Focus:
• Terms already used in LOD
• Save time and resources
• Increase interoperability

Search tools:
• General purpose:
  • LOV: http://lov.okfn.org
  • LOD2Stats: http://stats.lod2.eu/vocabularies
  • Google
  • Others: ODP Portal http://ontologydesignpatterns.org
• Domain based:
  • Smart cities: http://smartcity.linkeddata.es/

Tips:
• Use domain terms and synonyms
• Do not spend too much time trying to find terms for everything. You might need to create them.

Page 37: Linked Energy Data Generation

Ontology development – LCC example

Terms and synonyms

Page 38: Linked Energy Data Generation

Ontology development

Ontology Selection: refers to the activity of choosing the most suitable ontologies or ontology modules among those available in an ontology repository or library, for a concrete domain of interest and associated tasks. (NeOn)

Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven

Evaluation tools:
• OOPS! – OntOlogy Pitfall Scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)
• Vapour http://validator.linkeddata.org/vapour (to be included in OOPS!)

Also to be considered:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Use in Linked Data (LOD2Stats, Sindice, etc.)

Further reference: NeOn Guidelines

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.

Page 39: Linked Energy Data Generation

Ontology development – LCC example

• Domain coverage
  • Schema.org covers public places and provides some additional terms and properties that can be used (e.g., PostalAddress and City)
  • It is also a widely known and accepted vocabulary, which favours interoperability

• Closer semantics
  • The ero:FinalEnergy class from the Energy Resource ontology and the ssn:Property class from the SSN ontology are used to represent the specific indicator to which the consumption relates

Page 40: Linked Energy Data Generation

Ontology development

Ontology Integration: refers to the activity of including one ontology in another ontology. (NeOn)

Focus:
• How much information should I reuse?
• How to reuse the elements or vocabularies? Preliminary analysis [1]
  • Should I import another ontology?
  • Should I reference other ontology element URIs?
    • ... replicating manually the URI?
    • ... merging ontologies?
• How to link them?

Techniques:
• Import the ontology as a whole
• Reuse some parts of the ontology (or ontology module)
• Reuse statements

Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
  • Plug-ins: Ontology Module Extraction and Partition
• Text editors for manual approach

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. The Landscape of Ontology Reuse in Linked Data. 1st Ontology Engineering in a Data-driven World (OEDW 2012) Workshop at the 18th International Conference on Knowledge Engineering and Knowledge Management. Galway, Ireland, 9th October 2012. http://www.slideshare.net/MariaPovedaVillalon/mpoveda-oedw2012v1

Page 41: Linked Energy Data Generation

Ontology development

Ontology Enrichment: refers to the activity of extending an ontology with new conceptual structures (e.g., concepts, roles and axioms). (NeOn)

Focus:
• How should I create terms according to ontological foundations and Linked Data principles?

Ontology development:
• Ontology Development 101: A Guide to Creating Your First Ontology [1]
• Ontology Engineering Patterns http://www.w3.org/2001/sw/BestPractices/
• Extracting ontology conceptualization, formalization techniques from existing methodologies

Recommendation:
• Link to existing entities
• Provide human readable documentation
• Keep the semantics of the reused elements

Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.

[1] Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.

Page 42: Linked Energy Data Generation

Ontology development – LCC example

Page 43: Linked Energy Data Generation

Ontology development

Ontology Evaluation: refers to the activity of checking the technical quality of an ontology against a frame of reference. (NeOn)

Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven

Evaluation tools related to Linked Data principles:
• OOPS! – OntOlogy Pitfall Scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)

Evaluation tools/techniques for other aspects:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Application based (queries)
• Syntax issues: validators

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.

Page 44: Linked Energy Data Generation

Ontology development – LCC example

Minor issues, mostly lack of annotations in reused terms.

Page 45: Linked Energy Data Generation

Table of Contents

1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
  1. Data transformation
  2. Data linking
5. Discussion and Conclusions

Page 46: Linked Energy Data Generation

Data transformation

• Transformation of the data to RDF

• Steps
  1. To select the RDF serialization
    • RDF/XML, Turtle, N-Triples, JSON-LD
  2. To select a tool
  3. To transform the data
  4. To evaluate the obtained RDF data
    • Syntax evaluation
    • Accuracy
    • Usage
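To make step 3 concrete, the sketch below turns one record into Turtle by hand. It is only an illustration of what a transformation tool produces, not the tool-based route used in the LCC example. The class ec:ElectricityConsumption, the property ec:inPeriod, the subject URI layout and the input field names are hypothetical. Only the namespaces and the term hasQuantitiveValue appear in the slides.

```python
import re
from decimal import Decimal

RESOURCE_NS = "http://smartcity.linkeddata.es/lcc/resource/"
PREFIXES = (
    "@prefix ec: <http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def row_to_turtle(row):
    """Turn one record into Turtle (class and ec:inPeriod are hypothetical terms)."""
    site = re.sub(r"[^A-Za-z0-9]", "", row["site"])
    period = row["period"].replace("/", "-")
    value = Decimal(row["electricity"])  # fails early on non-numeric input
    subject = f"<{RESOURCE_NS}ElectricityConsumption/{site}{period}>"
    return (
        f"{subject} a ec:ElectricityConsumption ;\n"
        f'    ec:inPeriod "{row["period"]}" ;\n'
        f'    ec:hasQuantitiveValue "{value}"^^xsd:decimal .\n'
    )


# Hypothetical record; field names and values are made up for illustration.
record = {"site": "Example Library", "period": "2010/11", "electricity": "1200.5"}
print(PREFIXES + "\n" + row_to_turtle(record))
```

Parsing the value with Decimal before writing it is a small accuracy check in the spirit of step 4: a malformed number stops the transformation instead of ending up in the RDF output.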

Page 47: Linked Energy Data Generation

Data transformation - Tools

Database to RDF:
• morph-RDB
• D2R Server
• TopBraid Composer

Data streams to RDF:
• morph-streams
• D2R Server

Spreadsheets to RDF:
• TopBraid Composer
• Excel2RDF
• RDF123
• XLWrap
• OpenRefine

XML to RDF:
• XML2RDF
• TopBraid Composer
• OpenRefine (GoogleRefine, LODRefine)

Page 48: Linked Energy Data Generation

Data transformation – LCC example

1. Turtle syntax

2. OpenRefine + RDF extension


Page 49: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine creating project


Page 50: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine adding columns


Page 51: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine adding columns

Page 52: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine column transformations

Page 53: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 54: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 55: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 56: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 57: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 58: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 59: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF extension

Page 60: Linked Energy Data Generation

Data transformation – LCC example: OpenRefine RDF generation

Page 61: Linked Energy Data Generation

Data transformation – LCC example: Evaluation

• Syntax evaluation

• Consistency with the ontologies

• Usage evaluation by running SPARQL queries
  • show all electricity consumptions and related time periods for all council sites related to culture
  • show all energy consumptions and related time periods of council sites from Wakefield district
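The second usage query can be mimicked over an in-memory list of triples to show what such a check does. This is a toy sketch: every prefixed name and value below is made up, and a real usage evaluation would run SPARQL against the generated RDF rather than this hand-written matcher.

```python
# Toy triples; every name and value is made up for illustration.
TRIPLES = [
    ("site:A", "ex:district", "Wakefield"),
    ("site:B", "ex:district", "Leeds"),
    ("cons:1", "ex:site", "site:A"),
    ("cons:1", "ex:period", "2011/12"),
    ("cons:1", "ex:value", 1200.5),
    ("cons:2", "ex:site", "site:B"),
    ("cons:2", "ex:period", "2011/12"),
    ("cons:2", "ex:value", 800.0),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects of triples matching (?s, predicate, obj)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def consumptions_in_district(district):
    """Join two patterns: ?c ex:site ?s and ?s ex:district <district>."""
    results = []
    for site in subjects("ex:district", district):
        for cons in subjects("ex:site", site):
            results.append((site,
                            objects(cons, "ex:period")[0],
                            objects(cons, "ex:value")[0]))
    return results


print(consumptions_in_district("Wakefield"))
```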

Page 63: Linked Energy Data Generation

Data linking

• Ensuring that data are not just “isolated islands”

• Steps
  1. To identify classes whose instances can be the subject of linking
  2. To identify data sets that may contain instances for the previously-identified classes
  3. To select the tools for performing the task
  4. To use the tool in order to obtain links

• Tools: LN2R, LD mapper, Silk, LIMES, RDF-AI, Serimi, OpenRefine
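The idea behind step 4 can be sketched with plain label matching: compare the label of a local instance with candidate labels from the target data set and emit an owl:sameAs link when they are similar enough. The dedicated tools listed above do considerably more. The similarity measure (difflib from the standard library), the threshold, the local example.org URIs and the candidate list are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Candidate targets (label -> URI), listed by hand for illustration only.
CANDIDATES = {
    "Leeds": "http://dbpedia.org/resource/Leeds",
    "Wakefield": "http://dbpedia.org/resource/Wakefield",
}


def best_match(label, candidates, threshold=0.85):
    """Return the URI of the most similar candidate label, or None."""
    scored = [
        (SequenceMatcher(None, label.lower(), name.lower()).ratio(), uri)
        for name, uri in candidates.items()
    ]
    score, uri = max(scored)
    return uri if score >= threshold else None


def link(local_uri, label):
    """Build an owl:sameAs triple if a sufficiently similar candidate exists."""
    target = best_match(label.strip(), CANDIDATES)
    return (local_uri, OWL_SAME_AS, target) if target else None


print(link("http://example.org/resource/City/Leeds", "LEEDS "))
print(link("http://example.org/resource/City/Unknown", "Atlantis"))
```

A high threshold keeps false links out at the cost of missing some matches. Links produced this way would still need manual review before publication.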

Page 64: Linked Energy Data Generation

Data linking – LCC example

1. Classes: City, District
2. Data sets: DBpedia
3. Tool: OpenRefine

Page 65: Linked Energy Data Generation

Data linking – LCC example: OpenRefine reconciliation

Page 66: Linked Energy Data Generation

Data linking – LCC example: OpenRefine reconciliation

Page 67: Linked Energy Data Generation

Data linking – LCC example: OpenRefine reconciliation

Page 68: Linked Energy Data Generation

Data linking – LCC example: OpenRefine reconciliation

Page 69: Linked Energy Data Generation

Data linking – LCC example: OpenRefine reconciliation

Page 71: Linked Energy Data Generation

Discussion and Conclusions

• The guidelines are based on requirements from smart city stakeholders

• They address a broad scope of scenarios
  • Different data formats (databases, CSV, Excel, XML, etc.)
  • Update frequencies (static and dynamic data)
  • Legal and licensing issues

• They introduce a complete example

Radulovic, F., García-Castro, R., Poveda-Villalón, M., Weise, M., Tryferdis, T.: D4.1: Requirements and guidelines for energy data generation. Technical report, READY4SmartCities Consortium, May 2014

Page 72: Linked Energy Data Generation

More information


Page 73: Linked Energy Data Generation

Linked Data is just data


Page 74: Linked Energy Data Generation

Benefits of linking data

• Original data + geolocation: total electric consumption
• Original data + geolocation + population: total electric consumption in locations with population > 20.000

Page 75: Linked Energy Data Generation

Benefits of reasoning


Total electric consumption in cultural buildings

Page 76: Linked Energy Data Generation

Discussion and Conclusions


Page 77: Linked Energy Data Generation

Discussion and Conclusions – Future work

• Development of services for facilitating the usage of Linked Data technology

• Support in adopting Linked Data technology

• Guidelines for publication and exploitation of Linked Data

• Summer school for 2015
• Other training?
