The importance of the Web for the Semantic Web

Post on 08-May-2015


Talk delivered during the Makolab Semantic Day 2013.

The importance of the Web for the Semantic Web

Alexandre Monnin, PhD

Associate researcher @Inria

Senior Open data Adviser for Etalab

Chair of the « PhiloWeb » community group (W3C)

Organiser of the « Les rencontres du web de données » Meetup

Twitter: @aamonnz / @PhiloWeb, Website: web-and-philosophy.org

“semantic WEB” and not “SEMANTIC web”

[C. Welty, ISWC 2007]

Above all: the Semantic Web is deeply entrenched in the Web

Why the Web?

Leslie Carr, « The Fundamentals of the Web, the Importance of Web Science »

Maybe it is a « temporary glitch? » (Leslie Carr)

A fragile reality, relying on specific architectural principles, that gave birth over the years to many innovations that may threaten its very existence.

If the Semantic Web (or Web of data) has any future, it must be aware of its roots and preserve what made the Web so incredibly successful on a previously unseen scale.

I. Naming/identifying

The basics

Kieron O’Hara, « The Web as an ethical space »

Three components of the architecture of the Web

• identification (URI) & « addressability » (URL)

http://www.inria.fr

http://ns.inria.fr/fabien.gandon#me

ldap://[2001:db8::7]/c=GB?objectClass?one

• communication / protocol (HTTP)

GET /centre/sophia HTTP/1.1

Host: www.inria.fr

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de-de) AppleWebKit/523.10.3 (KHTML, like Gecko) Version/3.0.4 Safari/523.10

Accept-Encoding: gzip

Accept: text/html,application/xhtml+xml,application/xml

Accept-Language: en,en-us;q=0.8,fr;q=0.5,fr-fr;q=0.3

Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7

Referer: http://fabien.info/

• representation languages (HTML / RDF)

Fabien works at <a href="http://www.inria.fr">Inria</a>

<http://www.inria.fr> foaf:member data:fabien

Three functions

• identification of resources (URI)

• access to representations (HTTP URI)

• encoding of representations (HTML, RDF, etc.)

URIs (universal syntax)

Because the Web had to link to other, competing systems: WAIS, Gopher, Prospero… Interoperability and openness gave it a decisive advantage from its inception (Google “Gopher”!).

“I originally called these things “Universal Document Identifiers” (UDIs) even before we started using them for concepts. The IETF were a bit put off, thinking it was too much hubris to call them “universal.” Now I realize that I should have held firm and said “but they are,” as any alternative system of naming you can make out there, I can map it to the character set we use in URIs and I can invent a new scheme for it. So we can map any scheme to URIs. We’d already mapped Gopher, FTP, and these sorts of things. Now, we’ve got HTTP and there will be lots of other schemes. So in a sense URIs are universal, as we’re saying anything—any name that you come across—can be mapped into this space.” (TimBL)
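The "universal syntax" point can be illustrated with a minimal sketch: the same generic parser splits the three URIs shown on the earlier slide, whatever their scheme. The helper name `describe` is an assumption made for this example; `urlsplit` is the Python standard-library function.

```python
from urllib.parse import urlsplit

# URIs taken from the slides; each uses a different scheme or feature.
EXAMPLES = [
    "http://www.inria.fr",
    "http://ns.inria.fr/fabien.gandon#me",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
]


def describe(uri):
    """Split a URI into the generic components shared by all schemes."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

Calling `describe(uri)` for each entry of `EXAMPLES` yields the same five fields every time; only the scheme decides how they are to be interpreted.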

URIs: not « just » a universal syntax

URIs are also what… links… (yes, links!)… are made of:

<a href="http://example.com/">lorem ipsum</a>

This remains true of the Semantic Web as well

The original Web: typed links…

RDF: every piece of information is decomposed into triples

subject (entity / node) – relation (attribute / arc) – object (value / node)

e.g.: Slides.html has Alexandre as its author and the Web as its theme

Slides.html has author Alex

Slides.html has theme Web
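As a rough sketch of the decomposition into triples, the slide's example can be written as plain Python tuples. The names `Triple`, `GRAPH` and `objects` are assumptions made for illustration, and the strings are placeholders rather than real RDF vocabulary terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # entity / node
    predicate: str  # relation / attribute / arc
    obj: str        # value / node


# Illustrative placeholders, not real vocabulary terms.
GRAPH = {
    Triple("Slides.html", "author", "Alexandre"),
    Triple("Slides.html", "theme", "Web"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(
        t.obj for t in graph
        if t.subject == subject and t.predicate == predicate
    )
```

For instance, `objects(GRAPH, "Slides.html", "author")` looks up the author arc of the slides, and an unknown predicate simply returns an empty list.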

« From ADA to AAA »

• ADA (Web) = Anyone can designate anything

“philosophy may be necessary to explain what happens when the legal system hits the Web. When you make a web-page you can link to anything, you can write anything about it. But when a lawyer comes along and reserves the right to charge you to link to their page, then in a way it’s a philosophical question, as you have to tie linking to the way the protocol is defined over a name as just a reference, something that has never been controlled over the millennia. Systems where you control names haven’t worked so far, and so you need the philosophy to show how these protocols are ground out in history and in concepts for using names that lawyers” (TimBL)

« From ADA to AAA »

• AAA (Linked data) = Anyone can say anything about anything

Because we can designate anything (green lines), we can then link any two things (red lines)
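A small sketch of what AAA could look like in practice, assuming two independent publishers that make statements about the same URI. All names and statements here (`source_a`, `source_b`, the `ex:` terms, the example.org URI) are hypothetical.

```python
# Hypothetical URIs and statements, for illustration only.
INRIA = "http://www.inria.fr"

source_a = {(INRIA, "ex:locatedIn", "ex:France")}
source_b = {
    (INRIA, "ex:kind", "ex:ResearchInstitute"),
    ("http://example.org/alice#me", "ex:admires", INRIA),
}


def merge(*graphs):
    """Merge independently published graphs: a plain set union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def statements_about(graph, uri):
    """All triples that mention `uri` as subject or as object."""
    return {t for t in graph if t[0] == uri or t[2] == uri}
```

Neither publisher needs the other's permission: because both can designate the same URI, `merge(source_a, source_b)` is enough to bring their statements together.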

II. What’s being named?

Resources

Document | Properties | Correlate

UDI Papers (1992) | logical name, not a physical address, so that moving documents does not impinge on the durability of such names (some details should be obfuscated) | object or document, unit of retrieval rather than unit of storage; might identify a query formulated through a service, a question rather than a document

URI, RFC 1630 (1994) | cf. above. Distinct from a file name, which is local; should remain opaque, devoid of the details attached to the technicalities of its implementation | accessible objects, if URIs are also URLs

URL, RFC 1738 (1994) | non-physical address | resources (not defined), identified in an abstract way (by contrast, accessible contents for RFC 1739)

URN, RFC 1737 (1994) | name, identifier | stable resource, not accessible

IRL, RFC 1736 (1995) | address (URL), identifier (URN), description (URC) | resource – networked or non-networked

URC, IETF drafts | meta-information, list of identifiers – appearances database | document

One URI never = one « page »

[Diagram labels: electronic documents; rendering service; computers; servicing client; application; other encoding formats; RPC; psychophysical equivalents; client / server; content negotiation (conneg); HTTP]
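Content negotiation is one reason a URI never equals one « page »: the same URI can yield different representations depending on what the client asks for. The following is a simplified sketch, not a full implementation of HTTP's rules (wildcards such as `*/*` and media-type parameters other than `q` are ignored). The function names, the sample `REPRESENTATIONS` and their media types are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q-value) pairs, best first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when absent
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header: str, available: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the representation the client prefers among those available."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type, available[media_type]
    return None


# One hypothetical resource, two representations of it.
REPRESENTATIONS = {
    "text/html": "<p>Fabien works at Inria</p>",
    "text/turtle": "<http://www.inria.fr> foaf:member data:fabien .",
}
```

A browser sending `Accept: text/html` and a data client sending `Accept: text/turtle` would thus dereference the same URI and receive different documents.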

A forerunner: system 33 (1991-1993)

httpRange-14

HTTP code | Result | Indication

200 (OK) | Representation | Information resource or non-information resource

303 (See Other) | URI | Any kind of resource

4XX, 5XX (error) | Error message | Nothing can be inferred
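The table can be read as a small decision rule on the response status. The sketch below follows the table as given on the slide rather than any normative text; the function name `infer_from_status` and the returned strings are assumptions.

```python
def infer_from_status(status: int) -> str:
    """Map an HTTP status code to the reading given in the slide's table."""
    if status == 200:
        return "representation returned: information or non-information resource"
    if status == 303:
        return "URI returned: any kind of resource"
    if 400 <= status < 600:
        return "error message: nothing can be inferred"
    # Other codes (e.g. other redirects) are outside the table.
    return "not covered by the table"
```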

They did not talk about it They talked about it

resource

state of the resource

the representational state of the resource (whence the acronym « REST »!)

Actually, this explains why there were no links on the Web before an actor like Google appeared. Links are really pointers to resources, embedded in the representations of other resources (and, as such, these pointers might not dereference, and therefore might not link two relata).

Wait! How about REST?


"Resources are angels, URIs pins" "Naming is printing money"

(Larry Masinter)

Resources are « shadows »: not a bug but one of the Web’s greatest features

“7.1.2 Manipulating Shadows. Defining resource such that a URI identifies a concept rather than a document leaves us with another question: how does a user access, manipulate, or transfer a concept such that they can get something useful when a hypertext link is selected? REST answers that question by defining the things that are manipulated to be representations of the identified resource, rather than the resource itself. An origin server maintains a mapping from resource identifiers to the set of representations corresponding to each resource. A resource is therefore manipulated by transferring representations through the generic interface defined by the resource identifier.” (Roy Fielding)

Can objects be mere « shadows » ?

Not « mere » shadows, but still, that compares well with what some philosophers have to say about objects:

“the presence of an object inherently involves its absence. The reason is simply the standard one: in order for a subject to take an object as an object, there must be a separation between them – enough separation to make room for intrinsic abstraction, of detachment, of stabilization. So it is essentially an ontological theorem of this metaphysics that no object, for any given subject, will be wholly there, in the sense of being fully effectively accessible. Or, to put it more carefully: in order to be present ontologically – i.e., in order to be materially present – an object must also be (at least partially) absent metaphysically, in the sense of being partly out of effective reach.” (Brian Cantwell Smith)

Just as an object is never entirely present, a resource is never accessed as such; only its « representations » are – slices of trajectories. Many philosophers thus argue that objects are not already there, waiting to be picked up or designated. Rather, what we designate are regularities, patterns that need to be tended to and maintained, and that call for such care.

It comes with a price

The trajectory drawn by these regularities corresponds to Justin Erenkrantz’s characterization of a resource as a “network continuation”. The price is higher than expected, since identifying a resource requires one to "maintain a mapping from resource identifiers to the set of representations corresponding to each resource". The cost is so high that, eventually, everything will be 404. The 404 status guarantees that no higher authority is responsible for making sure that every URI dereferences. It is as much a design principle of the Web as any other.
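The cost of maintaining that mapping can be made concrete with a toy origin server, sketched here under stated assumptions: `MAPPING`, `dereference` and `stop_maintaining` are hypothetical names, and the stored representation is invented for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical server state: URI path -> {media type: representation}.
MAPPING: Dict[str, Dict[str, str]] = {
    "/centre/sophia": {"text/html": "<h1>Sophia</h1>"},
}


def dereference(path: str, media_type: str = "text/html") -> Tuple[int, Optional[str]]:
    """Return (status, body); 404 when no mapping is maintained for the URI."""
    representations = MAPPING.get(path)
    if representations is None:
        return 404, None
    body = representations.get(media_type)
    if body is None:
        return 406, None  # resource known, but no acceptable representation
    return 200, body


def stop_maintaining(path: str) -> None:
    """Simulate the owner no longer paying the cost of keeping the mapping."""
    MAPPING.pop(path, None)
```

Nothing outside the server repairs the entry once `stop_maintaining` has been called: the URI still identifies its resource, but dereferencing it now returns 404.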

[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Successive versions: May 2007, April 2008, September 2008, March 2009, September 2010, September 2011]

The Web as ontology: on what there is, on a global scale

From ADA and AAA to a shared world: Wikipedia and DBpedia

Conclusion:

• “The Web may fragment if the engineering isn’t right” (Kieron O’Hara)

• Just as the Web is an application built on the Internet, not the Internet itself, applications built on the Web are not the Web itself. They might depart from its principles, yet they build on its success.

• “The Web spreads the conditions of its initial creation” (L. Carr). Then, as an open platform, it also spreads potential threats to these conditions.

Why should we care for the Web?

• While many important players are all trying to impose their own rules, keeping data behind closed walls, silos, proprietary platforms, we can see one of them going against the grain, towards more and more openness, building a platform designed to nurture open innovation: Valve’s Steam in the field of video games.

• “So rather than having this curated store we’re going to say, “OK if we are thinking about this correctly, it really should be sort of a network API.” There should be this publishing model – and yes you have to worry about viruses and malware and stuff like that – but essentially anybody should be able to publish anything through Steam.” (Gabe Newell)

• Not unlike the Web…

Thank you very much!