Web of Data and its Status on Persian Web Data Space

Ali Khalili, PhD, Knowledge Representation & Reasoning Group, Vrije Universiteit Amsterdam

Why do we need the Data Web?

Problem: Try to search for these things on the current Web:

Apartments for rent in District 2 of Tehran that are close to a cultural, sports and recreational complex

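[Diagram: two separate web sites, rahnama.com (Hamshahri Classifieds) and www.avval.ir (Ketab-e Avval, "The First Book"), each served by its own Web Server that renders a database (DB) as HTML. One site knows about real estate offers in Tehran; the other has data about sport centres in Tehran. RDF is added to the picture in the final step.]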

Why do we need the Data Web? (cont.)

• Researchers working on Semantic Web topics in the Middle East.
• Side effects of some drugs with a specific chemical compound prescribed for certain diseases.
• Who are the mayors of central European towns located at an elevation above 1000 m?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants.
• …

Information is available on the Web, but opaque to current search.

Evolution of the Web

https://mcgratha.wordpress.com/

1998: Semantic Web Road map

2006: emergence of Data Web

Semantic Web

Linked Data

• A set of best practices for publishing data on the Web.
• Follows 4 simple principles:

1. Use URIs as names (identifiers) for conceptual things.

2. Use HTTP URIs so that users can look up (dereference) those names.

3. When someone looks up a URI, provide useful information, using open standards such as RDF and SPARQL.

4. Include links to other URIs, so that users can discover more things.
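Principles 2 and 3 can be illustrated with a small sketch: a client looks up an HTTP URI and asks for RDF rather than HTML through the Accept header. The helper name build_lookup_request is hypothetical, and the assumption that the server answers text/turtle requests is not taken from the slides. The request is only prepared here, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    # Principle 2: names are HTTP URIs, so they can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP(S) URIs")
    # Principle 3: ask for a standard RDF serialization via content negotiation.
    return Request(uri, headers={"Accept": media_type})


request = build_lookup_request("http://dbpedia.org/resource/Tehran")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; whether RDF comes back depends on the server.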

Linked Data Principles

WWW World

5-Star Open Data

★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
★★★★ use Linked Data format (URIs to identify things, RDF to represent data)
★★★★★ link your data to other people’s data to provide context

http://5stardata.info/
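As a minimal sketch of how the five levels build on each other, the function below counts stars cumulatively: a level only counts when every lower level is also met. The function name and its boolean parameters are hypothetical, chosen only for this illustration.

```python
def open_data_stars(open_license: bool, structured: bool, open_format: bool,
                    uses_rdf: bool, linked: bool) -> int:
    """Count stars cumulatively: stop at the first level that is not satisfied."""
    stars = 0
    for satisfied in (open_license, structured, open_format, uses_rdf, linked):
        if not satisfied:
            break
        stars += 1
    return stars


# An Excel sheet published under an open license: structured, but proprietary format.
print(open_data_stars(True, True, False, False, False))
```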

Linked Open Data Cloud

http://lod-cloud.net/

Statistics

http://lodlaundromat.org/

http://stats.lod2.eu/

more than 3426 datasets

Linked (Open) Data Lifecycle

Stages (shown as a cycle): Extraction, Storage/Querying, Authoring, Interlinking, Enrichment, Quality Analysis, Evolution, Exploration

http://stack.linkeddata.org/

Extraction

from Semi-structured sources

Resource Property Value
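The Resource / Property / Value labels correspond to the three parts of an RDF triple. As a rough sketch, assuming a semi-structured source such as an infobox is available as key-value pairs, each pair can be turned into one triple. The infobox values below are made up for illustration, and the helper names are hypothetical; this is not the DBpedia extraction framework itself.

```python
def infobox_to_triples(resource: str, infobox: dict) -> list:
    """Turn each key/value pair of an infobox into a (resource, property, value) triple."""
    property_namespace = "http://dbpedia.org/property/"
    return [(resource, property_namespace + key, value) for key, value in infobox.items()]


def to_ntriples(triple: tuple) -> str:
    """Serialize one triple with a plain literal value as an N-Triples line."""
    subject, predicate, value = triple
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


# Hypothetical infobox content, for illustration only.
infobox = {"country": "Iran", "timezone": "IRST"}
triples = infobox_to_triples("http://dbpedia.org/resource/Tehran", infobox)
for triple in triples:
    print(to_ntriples(triple))
```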

DBpedia

Multi-lingual
• 38.3m things in 125 languages
• 23.8m of these are localized descriptions of things that also exist in the English DBpedia
• English DBpedia describes 4.58m things (68,091,260 statements)

Multi-domain
1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases

http://dbpedia.org/resource/Tehran

Persian DBpedia?

Persian DBpedia (mapping Wiki)

from Semi-structured sources

• Ad-hoc
  • DBpedia extraction framework
• Generic
  • OpenRefine

from Unstructured sources

…After leaving Apple, Jobs took a few of its members with him to found NeXT, a computer platform development company based in Redwood City, specializing in state-of-the-art computers for higher-education and business markets. In addition, Jobs helped to initiate the development of the visual effects industry when he funded the spinout of the computer graphics division of George Lucas's company Lucasfilm in 1986. The new company, Pixar, would eventually produce the first fully computer-animated film, Toy Story…

NLP, Text mining, Annotation

Named Entity Recognition

Relation Extraction (e.g. foundedBy)

Named Entity Recognition

http://spotlight.dbpedia.org

http://bioportal.bioontology.org/annotator

Visualizing Linked Data

Contextual visualizations, e.g. Zemanta https://wordpress.org/plugins/zemanta

Text Analytics using Linked Data

conTEXT http://context.aksw.org

NLP Interchange Format (NIF)

Interoperability between NLP tools, language resources and annotations

http://nlp2rdf.org

Named Entity Recognition & Disambiguation (NERD) http://nerd.eurecom.fr/

The need to build natural language processing tools connected to the Web of Data:
• Tokenizer
• Stemmer
• POS tagger
• Named Entity Recognition
• Persian WordNet
• Relation Extraction

from Structured sources

Triplification by Materialization

Triplification by SPARQL-to-SQL-Rewriting

Triplification

• Relational Database to RDF: R2RML (RDB to RDF Mapping Language) http://www.w3.org/TR/r2rml/
• D2R Server: accessing databases with SPARQL and as Linked Data http://d2rq.org/
• Sparqlify: defining RDF views on relational databases http://sparqlify.org/
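To make triplification by materialization concrete, here is a minimal sketch using an in-memory SQLite table: each row becomes a subject URI and each non-key column becomes a property. The namespace http://example.org/, the table and its contents are hypothetical. The tools listed above use declarative mappings rather than this hard-coded convention.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for minted URIs


def materialize(conn: sqlite3.Connection, table: str, key_column: str) -> list:
    """Dump every row of `table` as (subject, property, value) triples."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"  # one URI per row
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO university VALUES (1, 'Example University', 'Tehran')")
triples = materialize(conn, "university", "id")
for triple in triples:
    print(triple)
```

The SPARQL-to-SQL rewriting approach keeps the data in the database and translates each query instead of dumping triples up front.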

Storage & Querying

Relational Databases vs. Triple Stores

• A relational database's (e.g. MySQL, PostgreSQL, Oracle) natural representation is a collection of interlinked tables.

• A triple store's (e.g. Sesame, AllegroGraph) natural representation is a multi-relational network, or graph. (Neo4j, often listed alongside them, is a property graph database rather than an RDF triple store.)

• Relational databases tend to not maintain public access points (some might provide Web APIs).

• Triple stores commonly expose public access points called SPARQL endpoints.

• Relational database users tend to not publish their schemas.

• Triple store users tend to reuse and extend public schemas called ontologies.

Existing Triple Stores

• Native triple stores: 4Store, AllegroGraph, BigData, Jena TDB, Sesame, Stardog, OWLIM and uRiKa
• RDBMS-backed triple stores: Jena SDB, IBM DB2 and OpenLink Virtuoso
• NoSQL triple stores: CumulusRDF

HDT - Your binary format for RDF

• HDT (Header, Dictionary, Triples)
  • a compact data structure and binary serialization format for RDF
  • keeps big datasets compressed to save space while maintaining search and browse operations without prior decompression
• Linked Data Fragments
  • Streaming approach
  • Query the Web of Data at Web scale

http://www.rdfhdt.org

http://linkeddatafragments.org/
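The Dictionary part of HDT can be illustrated with a simplified sketch: every distinct term is stored once and replaced by an integer ID, so repeated URIs cost only a number per occurrence. This shows only the general idea; the actual HDT format organizes its dictionary and triples sections differently and adds further compression. The ex: terms are hypothetical.

```python
def dictionary_encode(triples: list) -> tuple:
    """Map each distinct term to an integer ID and rewrite triples as ID tuples."""
    term_to_id = {}
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id) + 1  # IDs start at 1
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return term_to_id, encoded


def decode(term_to_id: dict, encoded: list) -> list:
    """Rebuild the original triples from the dictionary and the ID tuples."""
    id_to_term = {term_id: term for term, term_id in term_to_id.items()}
    return [tuple(id_to_term[term_id] for term_id in ids) for ids in encoded]


triples = [
    ("ex:Tehran", "ex:country", "ex:Iran"),
    ("ex:Shiraz", "ex:country", "ex:Iran"),
]
dictionary, encoded = dictionary_encode(triples)
print(encoded)
```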

SPARQL – SQL for the Linked Data

What can be done with SPARQL that can't be done with SQL?

• SPARQL queries are considerably better aligned with users’ mental models of a domain.

• SPARQL allows the conceptual data model to be fully explored through queries.

- example:workPhone rdfs:subPropertyOf example:phone
- example:cellPhone rdfs:subPropertyOf example:phone
- example:homePhone rdfs:subPropertyOf example:phone
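A small sketch of what these three statements buy: asking for example:phone can also return values stored under its sub-properties. The code emulates this in plain Python over hypothetical data. In SPARQL the same effect would typically come from an inference-enabled store or a pattern such as ?p rdfs:subPropertyOf* example:phone.

```python
# The schema statements from the slide, as a child -> parent map.
SUBPROPERTY_OF = {
    "example:workPhone": "example:phone",
    "example:cellPhone": "example:phone",
    "example:homePhone": "example:phone",
}


def is_subproperty(prop, target: str) -> bool:
    """True if `prop` equals `target` or reaches it by following subPropertyOf links."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def values_of(triples: list, subject: str, prop: str) -> list:
    """Values of `prop` for `subject`, including values of its sub-properties."""
    return [o for s, p, o in triples if s == subject and is_subproperty(p, prop)]


# Hypothetical instance data.
triples = [
    ("example:alice", "example:workPhone", "+31-20-0000001"),
    ("example:alice", "example:cellPhone", "+31-6-00000002"),
    ("example:alice", "example:email", "alice@example.org"),
]
phones = values_of(triples, "example:alice", "example:phone")
print(phones)
```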

• Queries that have to traverse a chain of connections are particularly complex in SQL while very simple in SPARQL.
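To illustrate what traversing a chain of connections means, the sketch below follows one property repeatedly over an in-memory list of triples. In SPARQL 1.1 this corresponds to a property path such as ex:locatedIn+, while SQL needs a self-join per hop or a recursive query. The ex: data is hypothetical.

```python
from collections import deque


def reachable(triples: list, start: str, prop: str) -> list:
    """Nodes reachable from `start` via one or more `prop` edges, in breadth-first order."""
    edges = {}
    for s, p, o in triples:
        if p == prop:
            edges.setdefault(s, []).append(o)
    seen, order, queue = set(), [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour not in seen:  # guards against cycles
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


triples = [
    ("ex:Tehran", "ex:locatedIn", "ex:TehranProvince"),
    ("ex:TehranProvince", "ex:locatedIn", "ex:Iran"),
    ("ex:Iran", "ex:locatedIn", "ex:Asia"),
    ("ex:Tehran", "ex:twinnedWith", "ex:SomeCity"),
]
print(reachable(triples, "ex:Tehran", "ex:locatedIn"))
```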

• In addition to SELECT, INSERT and DELETE, SPARQL supports ASK queries.

• SPARQL includes syntax (the SERVICE keyword) to query two or more data sources within a single query.

• …
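The two features above can be sketched as query strings. Both queries are illustrative: the chosen resource, class and remote endpoint are assumptions, and whether a given endpoint allows federated SERVICE calls depends on its configuration. The helper query_form is hypothetical and only reports which query form a string uses.

```python
# An ASK query returns a single true/false answer instead of rows.
ASK_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
ASK { dbr:Tehran a dbo:City }
""".strip()

# A federated query: the SERVICE block is evaluated at a second endpoint.
FEDERATED_QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbr:  <http://dbpedia.org/resource/>
SELECT ?same ?label WHERE {
  dbr:Tehran owl:sameAs ?same .
  SERVICE <https://query.wikidata.org/sparql> {
    ?same rdfs:label ?label .
    FILTER (lang(?label) = "fa")
  }
}
""".strip()


def query_form(query: str) -> str:
    """Return the query form (ASK, SELECT, ...) by skipping the PREFIX lines."""
    for line in query.splitlines():
        words = line.split()
        if words and words[0].upper() != "PREFIX":
            return words[0].upper()
    return ""


print(query_form(ASK_QUERY), query_form(FEDERATED_QUERY))
```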

Hands-on

Write a SPARQL query to return a list of all universities in Iran sorted by name.

DBpedia Endpoints:
• http://dbpedia.org/sparql
• http://live.dbpedia.org/sparql
• http://dbpedia-live.openlinksw.com/sparql
• http://lod.openlinksw.com/sparql
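One possible answer to the exercise, as a sketch rather than the reference solution: it assumes universities are typed dbo:University and linked to Iran through dbo:country, which may not hold for every resource. The Python wrapper only builds the GET URL defined by the SPARQL protocol and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?university ?name WHERE {
  ?university a dbo:University ;
              dbo:country dbr:Iran ;
              rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
ORDER BY ?name
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

The same query text can be pasted directly into the web form of any of the endpoints listed above.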

SPARQL Query Interface

http://yasgui.org/

Authoring

Semantic Wikis

• Semantic (Text) Wikis: authoring of semantically annotated text

• Semantic Data Wikis: direct authoring of structured information (i.e. RDF, RDF Schema, OWL)

http://semantic-mediawiki.org/

http://aksw.org/Projects/OntoWiki

OntoWiki use case: Catalogus Professorum

The Catalogus Professorum Lipsiensis – Semantics-based Collaboration and Exploration for Historians

Semantic Content Annotation

WYSIWYG - What You See Is What You Get

WYSIWYM - What You See Is What You Mean

http://rdface.aksw.org

LD-R Framework

http://ld-r.org

Interlinking

• The degree to which entities that represent the same concepts are linked to each other.
• “Connecting things that are somehow related”
• Methods
  • Automatic, Semi-automatic, Manual
  • Universal, Domain-specific

Example: <http://dbpedia.org/resource/VU_University_Amsterdam> owl:sameAs <https://www.wikidata.org/entity/Q1065414>

Interlinking Methods

• Ontology Matching
  • establish links between the ontologies underlying two data sources.
• Instance Matching (Link Discovery)
  • discover links between instances contained in two data sources.
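Instance matching can be sketched naively as comparing labels from two sources and emitting an owl:sameAs link when they are similar enough. The example uses difflib from the Python standard library, a threshold of 0.9 chosen arbitrarily, and hypothetical URIs and labels. Dedicated link discovery frameworks use much richer link specifications and avoid comparing every pair.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after trimming and lowercasing both labels."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Compare every source label with every target label (naive, quadratic)."""
    links = []
    for source_uri, source_label in source.items():
        for target_uri, target_label in target.items():
            if similarity(source_label, target_label) >= threshold:
                links.append((source_uri, "owl:sameAs", target_uri))
    return links


# Hypothetical instances from two data sources.
source = {"a:Tehran": "Tehran", "a:Shiraz": "Shiraz"}
target = {"b:1": "tehran ", "b:2": "Amsterdam"}
links = discover_links(source, target)
print(links)
```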

SILK Framework

http://silk-framework.com/

Interlinking Semi-automatic

LIMES Framework

http://aksw.org/Projects/LIMES

Interlinking Semi-automatic

Exploration

• Search
• Browse
• Visualize

Search for Linked Data

http://swoogle.umbc.edu/

http://lov.okfn.org/

http://lotus.lodlaundromat.org

http://demo.ld-r.org/lotus

Data hub (http://datahub.io): search for data, register published datasets, create and manage groups of datasets…

Search in Linked Data

http://www.wolframalpha.com/

Question Answering Systems

Search for/in Linked Data

https://www.google.com/cse/

http://schema.org/

http://bl.ocks.org/danbri/1c121ea8bd2189cf411c

Browsing Linked Data

Loupe (http://loupe.linkeddata.es/): discovering the type of data contained in a dataset, its structure, and the vocabularies used…

Linked Data Reactor (http://ld-r.org): a component-based framework to view, browse and edit Linked Data

Exhibit http://www.simile-widgets.org/exhibit/

Visualizing Linked Data

Using existing visualisations for structured data, e.g. Sgvizler http://dev.data2000.no/sgvizler/

Using graph-based visualizations, e.g. RelFinder http://www.visualdataweb.org/relfinder/relfinder.php

Enrichment

ORE (Ontology Repair and Enrichment)

http://ore.aksw.org/

Quality Analysis

Linked Data Quality Dimensions

Quality Assessment for Linked Data: A Survey

Luzzu

http://eis.iai.uni-bonn.de/Projects/Luzzu

Quality Analysis Tools

RDFUnit

http://aksw.org/Projects/RDFUnit

Evolution

EvoPat – Pattern based KB Evolution

• Inspired by Software Refactoring
• Agile Knowledge Engineering
• Basic & compound evolution patterns

http://link.springer.com/chapter/10.1007%2F978-3-642-17746-0_41

References

• http://slidewiki.org/deck/11936_semantic-data-web-lecture-series
• Introduction to linked data and its lifecycle on the web
• http://euclid-project.eu/
• http://videolectures.net/wims2011_auer_interlinked/
• https://vimeo.com/76257120
• http://www.slideshare.net/slidarko/evolving-the-web-into-a-giant-global-database-3880018
• http://www.dataversity.net/introduction-to-triplestores/
• http://www.topquadrant.com/2014/05/05/comparing-sparql-with-sql/