1 Introduction to KR and Semantic Web Many slides are based on tutorial by Ivan Herman (W3C)...

Post on 31-Mar-2015


1

Introduction to KR

and

Semantic Web

Many slides are based on tutorial by Ivan Herman (W3C) reproduced here with kind permission. Additionally, many of the intro slides are based on tutorial by Sean Bechhofer, Ian Horrocks and Peter F. Patel-Schneider, with additions by Suresh Manandhar and Dimitar Kazakov.

2

History of the Semantic Web

The Web was “invented” by Tim Berners-Lee (amongst others), a physicist working at CERN

TBL’s original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web:

TBL (and others) have since been working towards realising this vision, which has become known as the Semantic Web

E.g., article in May 2001 issue of Scientific American…

“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”

3

Realising the complete “vision” is too hard for now (probably)

But we can make a start by adding semantic annotation to web resources

Scientific American, May 2001:

4

Where we are Today: the Syntactic Web

[Hendler & Miller 02]

5

The Syntactic Web is…

A hypermedia, a digital library: a library of documents (called web pages) interconnected by a hypermedia of links

A database, an application platform: a common portal to applications accessible through web pages, and presenting their results as web pages

A platform for multimedia: BBC Radio 4 anywhere in the world! Terminator 3 trailers!

A naming scheme: unique identity for those documents

A place where computers do the presentation (easy) and people do the linking and interpreting (hard).

Why not get computers to do more of the hard work?

[Goble 03]

6

Hard Work using the Syntactic Web…

Find images of Peter Patel-Schneider, Frank van Harmelen and Alan Rector…

Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois

7

Impossible (?) using the Syntactic Web…

Complex queries involving background knowledge: find information about “animals that use sonar but are not either bats or dolphins”, e.g., Barn Owl

Locating information in data repositories:

Travel enquiries

Prices of goods and services

Results of human genome experiments

Finding and using “web services”: visualise surface interactions between two proteins

Delegating complex tasks to web “agents”: book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English

8

What is the Problem?

Consider a typical web page:

Markup consists of:

– rendering information (e.g., font size and colour)

– Hyper-links to related content

Semantic content is accessible to humans but not (easily) to computers…

9

What information can we see…

WWW2002

The eleventh international world wide web conference

Sheraton waikiki hotel

Honolulu, hawaii, USA

7-11 may 2002

1 location 5 days learn interact

Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire

Register now

On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event …

Speakers confirmed

Tim berners-lee: Tim is the well known inventor of the Web, …

Ian Foster: Ian is the pioneer of the Grid, the next generation internet …

10

What information can a machine see…

WWW2002

The eleventh international world wide web conference

Sheraton waikiki hotel

Honolulu, hawaii, USA

7-11 may 2002

1 location 5 days learn interact

Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire

Register now

On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event …

Speakers confirmed

Tim berners-lee: Tim is the well known inventor of the Web, …

Ian Foster: Ian is the pioneer of the Grid, the next generation internet …

11

Solution: XML markup with “meaningful” tags?

<name>WWW2002 The eleventh international world wide webcon</name>
<location>Sheraton waikiki hotel Honolulu, hawaii, USA</location>
<date>7-11 may 2002</date>
<slogan>1 location 5 days learn interact</slogan>
<participants>Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed</introduction>
<speaker>Tim berners-lee</speaker>
<bio>Tim is the well known inventor of the Web,</bio>
…

12

But What About…

<conf>WWW2002 The eleventh international world wide webcon</conf>
<place>Sheraton waikiki hotel Honolulu, hawaii, USA</place>
<date>7-11 may 2002</date>
<slogan>1 location 5 days learn interact</slogan>
<participants>Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed</introduction>
<speaker>Tim berners-lee</speaker>
<bio>Tim is the well known inventor of the Web,…

13

Machine sees…

<name>WWW2002 The eleventh international world wide webc</name>
<location>Sheraton waikiki hotel Honolulu, hawaii, USA</location>
<date>7-11 may 2002</date>
<slogan>1 location 5 days learn interact</slogan>
<participants>Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed</introduction>
<speaker>Tim berners-lee</speaker>
<bio>Tim is the well known inventor of the W</bio>
<speaker>Ian Foster</speaker>
<bio>Ian is the pioneer of the Grid, the ne</bio>

14

Need to Add “Semantics”

External agreement on meaning of annotations

E.g., Dublin Core: agree on the meaning of a set of annotation tags

Problems with this approach:

Inflexible

Limited number of things can be expressed

Use Ontologies to specify meaning of annotations

Ontologies provide a vocabulary of terms

New terms can be formed by combining and extending existing ones

Meaning (semantics) of such terms is formally specified

Can also specify relationships between terms in multiple ontologies

15

An ontology is an engineering artifact:

It is constituted by a specific vocabulary used to describe a certain reality, plus

a set of explicit assumptions regarding the intended meaning of the vocabulary.

Thus, an ontology describes a formal specification of a certain domain:

Shared understanding of a domain of interest

Formal and machine manipulable model of a domain of interest

“An explicit specification of a conceptualisation” [Gruber93]

Ontology in Computer Science

16

Structure of an Ontology

Ontologies typically have two distinct components:

Names for important concepts in the domain

Elephant is a concept whose members are a kind of animal

Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants

Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years

Background knowledge/constraints on the domain

Adult_Elephants weigh at least 2,000 kg

All Elephants are either African_Elephants or Indian_Elephants

No individual can be both a Herbivore and a Carnivore
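
The two components can be mimicked in a few lines of Python. This is only an illustrative sketch: the `Individual` record, its fields and the helper names are hypothetical, and a real ontology would be written in an ontology language and checked by a reasoner rather than by hand-written tests.

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    # Hypothetical record for one element of the domain.
    name: str
    concepts: set = field(default_factory=set)  # asserted concept names
    age_years: int = 0
    weight_kg: float = 0.0

def is_adult_elephant(x):
    # Defined concept: exactly those elephants older than 20 years.
    return "Elephant" in x.concepts and x.age_years > 20

def violations(x):
    """Return the background constraints that individual x breaks."""
    problems = []
    if is_adult_elephant(x) and x.weight_kg < 2000:
        problems.append("Adult_Elephants weigh at least 2,000 kg")
    if "Elephant" in x.concepts and not (
        {"African_Elephant", "Indian_Elephant"} & x.concepts
    ):
        problems.append(
            "All Elephants are either African_Elephants or Indian_Elephants"
        )
    if {"Herbivore", "Carnivore"} <= x.concepts:
        problems.append("No individual can be both a Herbivore and a Carnivore")
    return problems

# An invented individual that breaks the weight constraint.
clyde = Individual(
    "clyde", {"Elephant", "African_Elephant", "Herbivore"},
    age_years=25, weight_kg=1500.0,
)
print(violations(clyde))
```

The concept names play the role of the vocabulary; `violations` plays the role of the background knowledge that rules out impossible descriptions.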

18

A Semantic Web — First Steps

Extend existing rendering markup with semantic markup

Metadata annotations that describe content/function of web accessible resources

Use formal Ontologies to provide vocabulary for annotations

“Formal specification” is accessible to machines

A prerequisite is a standard web ontology language

Need to agree common syntax before we can share semantics

Syntactic web based on standards such as HTTP and HTML

Make web resources more accessible to automated processes

19

Ontology Design and Deployment

Given key role of ontologies in the Semantic Web, it will be essential to provide tools and services to help users:

Design and maintain high quality ontologies, e.g.:

Meaningful — all named classes can have instances

Correct — captured intuitions of domain experts

Minimally redundant — no unintended synonyms

Richly axiomatised — (sufficiently) detailed descriptions

Store (large numbers) of instances of ontology classes, e.g.:

Annotations within web pages

Answer queries over ontology classes and instances, e.g.:

Find more general/specific classes

Retrieve annotations/pages matching a given description

Integrate and align multiple ontologies

20

Let's have a go at building a semantic website

21

HTML and XML

HTML:

<H1>Book</H1>
<UL>
<LI>Title: AI introduction
<LI>Author: Frank Russell
<LI>ISBN: 554-15197912-554X
</UL>

XML:

<book>
<title>AI introduction</title>
<author>Frank Russell</author>
<ISBN>554-15197912-554X</ISBN>
</book>

22

XML documents are trees over textbox nodes

<book>
<title>...</title>
<author>...</author>
<isbn>...</isbn>
</book>

• XML Schema: grammar to describe legal trees

• Problem: No agreed semantics

[Tree diagram: a book node with child nodes title, author and isbn]
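
A short sketch using Python's standard `xml.etree.ElementTree` module shows both points: a parser recovers the tree, but it has no idea what the tags mean. The second document, with French tag names, is a made-up variant added only for illustration.

```python
import xml.etree.ElementTree as ET

doc = (
    "<book><title>AI introduction</title>"
    "<author>Frank Russell</author>"
    "<isbn>554-15197912-554X</isbn></book>"
)
root = ET.fromstring(doc)

def outline(node, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))

# Hypothetical second source: same fact, different but equally sensible tags.
other = ET.fromstring("<livre><titre>AI introduction</titre></livre>")

# A program that looks for <title> succeeds on one document and fails on the
# other: the syntax is shared, the meaning of the tags is not.
print(root.findtext("title"))
print(other.findtext("title"))
```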

23

Sharing data across the web

Requires

Shared Syntax: XML provides this

Shared Semantics: XML does not provide this

24

But this is a difficult task

Web documents also contain unstructured pages

Words are ambiguous

bank – river bank or financial institution (context dependent, so cannot do dictionary lookup)

Sentences are mostly unambiguous, but not always:

I saw a man with a telescope

But the web is a collection of documents

25

But hang on

The web is not just a collection of documents

The web is a lot more:

It’s a collection of links between documents

It’s a collection of structured documents

Hence, we can gain a lot by attaching shared semantics to:

the links

the structure within documents

At least, this is a start!!

26

But how do we do it?

27

Example: The Music site of the BBC

27

29

Site editors roam the Web for new facts

may discover further links while roaming

They update the site manually

And the site soon gets out of date

How to build such a site 1.

30

Editors roam the Web for new data published on Web sites

“Scrape” the sites with a program to extract the information

I.e., write some code to incorporate the new data

Easily gets out of date again…

Content changes

How to build such a site 2.

31

Editors roam the Web for new data via API-s

Understand those… input, output arguments, datatypes used, etc.

Write some code to incorporate the new data

Easily gets out of date again…

Format changes

How to build such a site 3.

32

Use external, public datasets

Wikipedia, MusicBrainz, …

They are available as data, not API-s or hidden on a Web site

data can be extracted using, e.g., HTTP requests or standard queries

standardised formats/semantics

The choice of the BBC

33

Use the Web of Data as a Content Management System

Use the community at large as content editors

Get others to do the work

In short…

34

And this is no secret…

35

There are more and more data on the Web

government data, health related data, general knowledge, company information, flight information, restaurants,…

More and more applications rely on the availability of that data

Data on the Web

36

But… data are often in isolation, “silos”

Photo credit “nepatterson”, Flickr

37

A “Web” where

documents are available for download on the Internet

but there would be no hyperlinks among them

Imagine…

38

And the problem is real…

39

We need a proper infrastructure for a real Web of Data

data is available on the Web

accessible via standard Web technologies

data are interlinked over the Web

ie, data can be integrated over the Web

This is where Semantic Web technologies come in

We want machines to do the integration.

But this requires machines to understand data

Data on the Web is not enough…

40

Let's have a go, i.e. connect the silos

Photo credit “kxlly”, Flickr

41

We will use a simplistic example to introduce the main Semantic Web concepts

In what follows…

42

Map the various data onto an abstract semantically grounded data representation

Merge the resulting representations

Start making queries on the whole!

queries not possible on the individual data sets

The rough structure of data integration

43

We start with a book...

44

A simplified bookstore data (dataset “A”)

ISBN         Author  Title             Publisher  Year
0006511409X  id_xyz  The Glass Palace  id_qpr     2000

ID      Name           Homepage
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

ID      Publisher’s name  City
id_qpr  Harper Collins    London

45

1st: export your data as a set of relations

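[Graph of dataset “A”, listed as relations:]

http://…isbn/000651409X  a:title  The Glass Palace
http://…isbn/000651409X  a:year  2000
http://…isbn/000651409X  a:author  (author node)
(author node)  a:name  Ghosh, Amitav
(author node)  a:homepage  http://www.amitavghosh.com
http://…isbn/000651409X  a:publisher  (publisher node)
(publisher node)  a:p_name  Harper Collins
(publisher node)  a:city  London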

46

Relations form a graph

the nodes refer to the “real” data or contain some literal

how the graph is represented in machine is immaterial for now

Some notes on exporting the data
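
As a rough sketch of this export step, the three tables of dataset “A” can be turned into a set of (subject, relation, object) triples in plain Python. The base URI `http://example.org/isbn/` is a placeholder, since the slides elide the real one; the `_:` prefix for unnamed nodes and the function name are likewise choices made only for this illustration. The ISBN is written as it appears in the URI on the slide.

```python
# Placeholder namespace (assumption): the slides only show "http://…isbn/…".
BASE = "http://example.org/isbn/"

books = [
    {"isbn": "000651409X", "author": "id_xyz", "title": "The Glass Palace",
     "publisher": "id_qpr", "year": "2000"},
]
authors = {
    "id_xyz": {"name": "Ghosh, Amitav",
               "homepage": "http://www.amitavghosh.com"},
}
publishers = {
    "id_qpr": {"p_name": "Harper Collins", "city": "London"},
}

def export_dataset_a(books, authors, publishers):
    """Turn the relational rows into a set of (subject, relation, object) triples."""
    triples = set()
    for row in books:
        book = BASE + row["isbn"]
        author = "_:" + row["author"]        # unnamed node for the author
        publisher = "_:" + row["publisher"]  # unnamed node for the publisher
        triples |= {
            (book, "a:title", row["title"]),
            (book, "a:year", row["year"]),
            (book, "a:author", author),
            (book, "a:publisher", publisher),
            (author, "a:name", authors[row["author"]]["name"]),
            (author, "a:homepage", authors[row["author"]]["homepage"]),
            (publisher, "a:p_name", publishers[row["publisher"]]["p_name"]),
            (publisher, "a:city", publishers[row["publisher"]]["city"]),
        }
    return triples

triples_a = export_dataset_a(books, authors, publishers)
for triple in sorted(triples_a):
    print(triple)
```

A Python set is used here only because it is the simplest container; as the slide says, how the graph is stored is immaterial.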

47

Same book in French…

48

Another bookstore data (dataset “F”)

      A                    B                      C           D
 1    ID                   Titre                  Traducteur  Original
 2    ISBN 2020286682      Le Palais des Miroirs  $A12$       ISBN 0-00-6511409-X
 3
 4
 5
 6    ID                   Auteur
 7    ISBN 0-00-6511409-X  $A11$
 8
 9
10    Nom
11    Ghosh, Amitav
12    Besse, Christianne

49

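2nd: export your second set of data

[Graph of dataset “F”, listed as relations:]

http://…isbn/2020386682  f:titre  Le palais des miroirs
http://…isbn/2020386682  f:original  http://…isbn/000651409X
http://…isbn/2020386682  f:traducteur  (translator node)
(translator node)  f:nom  Besse, Christianne
http://…isbn/000651409X  f:auteur  (author node)
(author node)  f:nom  Ghosh, Amitav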

50

3rd: start merging your data

[Figure: the graphs of datasets “F” and “A” shown side by side, not yet connected; the node http://…isbn/000651409X appears in both]

51

3rd: start merging your data (cont)

[Figure: the same two graphs, with the node http://…isbn/000651409X marked in both]

Same URI!

52

3rd: start merging your data

[Figure: the merged graph. A single node http://…isbn/000651409X now carries a:title, a:year, a:author and a:publisher from dataset “A” together with f:auteur from dataset “F”, and is the target of f:original from http://…isbn/2020386682]

53

User of data “F” can now ask queries like:

“give me the title of the original”

well, … « donnes-moi le titre de l’original »

This information is not in the dataset “F”…

…but can be retrieved by merging with dataset “A”!

Start making queries…
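
The merge and the query can be sketched with plain Python sets of triples. The `http://example.org/isbn/` URIs are placeholders for the elided ones on the slides, the `_:` node names and helper functions are invented for this illustration, and only part of each dataset is reproduced.

```python
# Placeholder URIs (assumption): the slides only show "http://…isbn/…".
ORIGINAL = "http://example.org/isbn/000651409X"
TRANSLATION = "http://example.org/isbn/2020386682"

dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:year", "2000"),
    (ORIGINAL, "a:author", "_:a_author"),
    ("_:a_author", "a:name", "Ghosh, Amitav"),
    ("_:a_author", "a:homepage", "http://www.amitavghosh.com"),
}
dataset_f = {
    (TRANSLATION, "f:titre", "Le palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
    (ORIGINAL, "f:auteur", "_:f_auteur"),
    ("_:f_auteur", "f:nom", "Ghosh, Amitav"),
}

def objects(graph, subject, relation):
    """All objects o such that (subject, relation, o) is in the graph."""
    return {o for (s, r, o) in graph if s == subject and r == relation}

def title_of_original(graph, book):
    """Follow f:original from the book, then read a:title."""
    return {
        title
        for original in objects(graph, book, "f:original")
        for title in objects(graph, original, "a:title")
    }

# Merging is a set union: the shared URI makes the two graphs join up.
merged = dataset_a | dataset_f

print(title_of_original(dataset_f, TRANSLATION))  # nothing found in "F" alone
print(title_of_original(merged, TRANSLATION))
```

No matching logic is needed for the book itself: both datasets already use the same URI for it, so the union is enough.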

54

We “feel” that a:author and f:auteur should be the same

But an automatic merge does not know that!

Let us add some extra information to the merged data:

a:author same as f:auteur

both identify a “Person”

a term that a community may have already defined:

a “Person” is uniquely identified by his/her name and, say, homepage

it can be used as a “category” for certain type of resources

However, more can be achieved…

55

3rd revisited: use the extra knowledge

[Figure: the merged graph after adding the extra statements. There is now a single “Ghosh, Amitav” node, reached from http://…isbn/000651409X by both a:author and f:auteur, and r:type relations point to http://…foaf/Person]

56

User of dataset “F” can now query:

“donnes-moi la page d’accueil de l’auteur de l’original”

well… “give me the home page of the original’s ‘auteur’”

The information is not in datasets “F” or “A”…

…but was made available by:

merging datasets “A” and “F”

adding three simple extra statements as an extra “glue”

Start making richer queries!
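
One way to picture the “glue” is as a small post-processing step over the merged triples. The sketch below is a simplification and not the mechanism the slides rely on: it assumes that f:nom may also be treated as equivalent to a:name, and it identifies a person by name alone. The URIs, node names and helper functions are placeholders.

```python
# Placeholder URIs (assumption): the slides only show "http://…isbn/…".
ORIGINAL = "http://example.org/isbn/000651409X"
TRANSLATION = "http://example.org/isbn/2020386682"

merged = {
    (TRANSLATION, "f:original", ORIGINAL),
    (ORIGINAL, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
    (ORIGINAL, "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
}

# Hypothetical glue: pairs of relations declared to mean the same thing.
EQUIVALENT = [("a:author", "f:auteur"), ("a:name", "f:nom")]

def apply_glue(graph):
    g = set(graph)
    # 1. Equivalent relations: every triple also holds under the other name.
    for p, q in EQUIVALENT:
        g |= {(s, q, o) for (s, r, o) in g if r == p}
        g |= {(s, p, o) for (s, r, o) in g if r == q}
    # 2. Nodes with the same name are taken to be the same Person
    #    (simplifying assumption; the slide also mentions the homepage).
    canonical, rename = {}, {}
    for s, r, o in sorted(g):
        if r == "a:name":
            rename[s] = canonical.setdefault(o, s)
    return {(rename.get(s, s), r, rename.get(o, o)) for (s, r, o) in g}

def objects(graph, subject, relation):
    return {o for (s, r, o) in graph if s == subject and r == relation}

def homepage_of_original_auteur(graph, book):
    """Follow f:original, then f:auteur, then a:homepage."""
    return {
        home
        for original in objects(graph, book, "f:original")
        for person in objects(graph, original, "f:auteur")
        for home in objects(graph, person, "a:homepage")
    }

print(homepage_of_original_auteur(merged, TRANSLATION))  # empty before the glue
print(homepage_of_original_auteur(apply_glue(merged), TRANSLATION))
```

Without the glue the f:auteur node and the a:author node stay separate, so the homepage is unreachable from the French vocabulary.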

57

Using, e.g., the “Person”, the dataset can be combined with other sources

For example, data in Wikipedia can be extracted using dedicated tools

e.g., the “dbpedia” project can already extract the “infobox” information from Wikipedia (check out the dbpedia ontology)

Combine with different datasets

58

Merge with Wikipedia data

[Figure: the previous graph extended with the node http://dbpedia.org/../Amitav_Ghosh, attached to the existing data through r:type, foaf:name and w:reference relations]

59

Merge with Wikipedia data

[Figure: as before, with http://dbpedia.org/../Amitav_Ghosh now also linked by w:author_of to http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome and http://dbpedia.org/../The_Glass_Palace; a w:isbn relation is also shown]

60

Merge with Wikipedia data

[Figure: as before, with a w:born_in relation to http://dbpedia.org/../Kolkata, which carries w:long and w:lat]

61

It may look like it but, in fact, it should not be…

What happened via automatic means is done every day by Web users!

The difference: a bit of extra rigour so that machines could do this, too

Is that surprising?

62

We could add extra knowledge to the merged datasets

e.g., a full classification of various types of library data

geographical information

etc.

This is where ontologies, extra rules, etc, come in

ontologies/rule sets can be relatively simple and small, or huge, or anything in between…

Even more powerful queries can be asked as a result

It could become even more powerful

63

What did we do?

[Figure: data in various formats is mapped or exposed (“Map, Expose, …”) as data represented in abstract format, which applications then use (“Manipulate, Query, …”)]

64

What did we do? (alternate view)

[Figure: layered diagram with the labels: Web of Data Applications, Browser Applications, Stand Alone Applications; Inferencing; Query and Update; Common “Graph” Format & Common Vocabularies; “Bridges”; Data on the Web]

65

… the graph representation is independent of the exact structures

… a change in local database schemas, XHTML structures, etc, does not affect the whole

“schema independence”

… new data, new connections can be added seamlessly

The abstraction pays off because…

66

Through URI-s we can link any data to any data

The “network effect” is extended to the (Web) data

“Mashups on steroids” become possible

The network effect

67

Ontology Languages for the Semantic Web

68

Ontology Languages

Wide variety of languages for “Explicit Specification”

Graphical notations: Semantic networks; Topic Maps (see http://www.topicmaps.org/); UML; RDF

Logic based: Description Logics (e.g., OIL, DAML+OIL, OWL); Rules (e.g., RuleML, LP/Prolog); First Order Logic (e.g., KIF); Conceptual graphs; (Syntactically) higher order logics (e.g., LBase); Non-classical logics (e.g., Flogic, Non-Mon, modalities)

Probabilistic/fuzzy

Degree of formality varies widely

Increased formality makes languages more amenable to machine processing (e.g., automated reasoning)

69

Objects/Instances/Individuals: elements of the domain of discourse; equivalent to constants in FOL

Types/Classes/Concepts: sets of objects sharing certain characteristics; equivalent to unary predicates in FOL

Relations/Properties/Roles: sets of pairs (tuples) of objects; equivalent to binary predicates in FOL

Such languages are/can be: well understood; formally specified; (relatively) easy to use; amenable to machine processing

Many languages use “object oriented” model based on:
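
The three building blocks map directly onto ordinary Python values. This is only a toy model with invented individuals and class names: individuals are constants, classes are sets of individuals, and relations are sets of pairs.

```python
# Individuals: elements of the domain (constants in FOL). Names are invented.
ghosh, besse, glass_palace = "ghosh", "besse", "glass_palace"

# Classes: sets of individuals (unary predicates).
Person = {ghosh, besse}
Book = {glass_palace}

# Relations: sets of pairs of individuals (binary predicates).
author_of = {(ghosh, glass_palace)}

def holds(cls, x):
    """Unary predicate: is individual x a member of the class?"""
    return x in cls

def related(rel, x, y):
    """Binary predicate: does the relation hold between x and y?"""
    return (x, y) in rel

# A simple query: every individual that is the author of some Book.
authors = {x for (x, y) in author_of if y in Book}
print(authors)
```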

70

Web “Schema” Languages

Existing Web languages extended to facilitate content description

XML → XML Schema (XMLS)

RDF → RDF Schema (RDFS)

XMLS not an ontology language

Changes format of DTDs (document schemas) to be XML

Adds an extensible type hierarchy

Integers, Strings, etc.

Can define sub-types, e.g., positive integers

RDFS is recognisable as an ontology language

Classes and properties

Sub/super-classes (and properties)

Range and domain (of properties)

71

RDF and RDFS

RDF stands for Resource Description Framework

It is a W3C Recommendation (http://www.w3.org/RDF)

RDF is a graphical formalism (+ XML syntax + semantics)

for representing metadata

for describing the semantics of information in a machine-accessible way

RDFS extends RDF with “schema vocabulary”, e.g.:

Class, Property

type, subClassOf, subPropertyOf

range, domain
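
To give a feel for what the schema vocabulary buys, here is a minimal sketch of four RDFS-style inferences (subclass transitivity, type propagation along subClassOf, and typing via domain and range) over plain Python triples. The `ex:` names are invented, and the loop is a naive fixed-point computation rather than a real RDFS reasoner.

```python
schema = {
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:Work"),
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
}
data = {
    ("ex:glass_palace", "rdf:type", "ex:Novel"),
    ("ex:glass_palace", "ex:author", "ex:ghosh"),
}

def rdfs_closure(triples):
    """Repeat the four rules until no new triple appears."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            if p == "rdfs:subClassOf":
                # subClassOf is transitive
                new |= {(s, p, o2) for (s2, p2, o2) in g
                        if p2 == "rdfs:subClassOf" and s2 == o}
                # instances of the subclass are instances of the superclass
                new |= {(x, "rdf:type", o) for (x, p2, c) in g
                        if p2 == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # the subject of property s has type o
                new |= {(x, "rdf:type", o) for (x, p2, _) in g if p2 == s}
            elif p == "rdfs:range":
                # the object of property s has type o
                new |= {(y, "rdf:type", o) for (_, p2, y) in g if p2 == s}
        if new <= g:
            return g
        g |= new

closure = rdfs_closure(schema | data)
for triple in sorted(closure - (schema | data)):
    print(triple)
```

None of the printed triples was stated explicitly; they follow from the schema alone.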

72

URIs

URI = Uniform Resource Identifier

"The generic set of all names/addresses that are short cuts referring to resources"

URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages)

In RDF, URIs typically look like “normal” URLs, often with fragment identifiers to point at specific parts of a document:

http://www.somedomain.com/some/path/to/file#fragmentID
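
The parts of such a URI can be pulled apart with Python's standard `urllib.parse` module; the sketch below uses the example URI from the slide.

```python
from urllib.parse import urldefrag, urlsplit

uri = "http://www.somedomain.com/some/path/to/file#fragmentID"

# Split off the fragment identifier from the document address.
document, fragment = urldefrag(uri)
print(document)
print(fragment)

# The remaining components of the URI.
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path)
```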

73

Web Ontology Language Requirements

Desirable features identified for Web Ontology Language:

Extends existing Web standards, such as XML, RDF, RDFS

Easy to understand and use: should be based on familiar KR idioms

Formally specified

Of “adequate” expressive power

Possible to provide automated reasoning support

74

From RDFS to OWL

Two languages developed to satisfy above requirements

OIL: developed by group of (largely) European researchers (several from EU OntoKnowledge project)

DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML programme)

Efforts merged to produce DAML+OIL

Development was carried out by “Joint EU/US Committee on Agent Markup Languages”

Extends (“DL subset” of) RDF

DAML+OIL submitted to W3C as basis for standardisation

Web-Ontology (WebOnt) Working Group formed

WebOnt group developed OWL language based on DAML+OIL

OWL language became a W3C Recommendation in February 2004

75

OWL Axioms

Axioms (mostly) reducible to inclusion (⊑)

C ≡ D iff both C ⊑ D and D ⊑ C
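
Read over a single interpretation, where each concept is just the set of its instances, the reduction is ordinary set reasoning. The sketch below uses invented instance names and checks one fixed interpretation only; a reasoner has to establish the inclusion for every possible interpretation.

```python
def subsumed(c, d):
    """C ⊑ D in this interpretation: every instance of C is an instance of D."""
    return c <= d

def equivalent(c, d):
    """C ≡ D, expressed as two inclusions."""
    return subsumed(c, d) and subsumed(d, c)

# Invented extensions for two concepts.
elephant = {"clyde", "dumbo"}
adult_elephant = {"clyde"}

print(subsumed(adult_elephant, elephant))
print(equivalent(adult_elephant, elephant))
```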

76

Semantic Web Wedding Cake

77

The Semantic Web provides technologies to make such integration possible!

Hopefully you get a full picture at the end …

So where is the Semantic Web?

78

Books – Semantic Web + OWL

79

Books – Semantic Web + OWL

80

Papers – Description Logics

A Description Logic Primer

Markus Krötzsch, Frantisek Simancik, Ian Horrocks

This paper provides a self-contained first introduction to description logics (DLs).

http://arxiv.org/abs/1201.4089

Attributive concept descriptions with complements

Manfred Schmidt-Schauß, Gert Smolka

The description logic described in this paper is closely related to the one taught in ARIN.

http://dx.doi.org/10.1016/0004-3702(91)90078-X