06 gioca-ontologies


Description: OWL representation of Cultural Inheritance


Page 1: 06 gioca-ontologies

A.A. 2010-2011, Information Technology and Arts Organizations

Chapter 6: Metadata and ontologies for digital cultural heritage documentation

Information Technology and Arts Organizations

Page 2: 06 gioca-ontologies


Syllabus (3/3)

5. Databases
   1. Entities, attributes and relations
   2. Primary key and foreign key, data domain, query language (SQL)
   3. Examples using Access DBMS
   4. Spatial access to digital content: GIS and GPS
   5. GIS examples using ESRI ArcView

6. Metadata and ontologies for digital cultural heritage documentation
   1. XML, RDF
   2. Dublin Core
   3. Semantic Web
   4. OWL, ontologies
   5. Cidoc-CRM

Page 3: 06 gioca-ontologies


Motivations

• Digital data are stored in files and databases

• Data representation matters: if common conventions are adopted, different applications can cooperate, communicate and process data to provide advanced services → Interoperability

• The Internet is an ideal channel for spreading digital information about art and culture

• Many standards exist → it is difficult to reach a high level of interoperability

• Information can be written in many ways (different languages, synonyms, ...)

→ META-DATA means “data about data”

Page 4: 06 gioca-ontologies


Looking for a film budget...

326.000 results ☹

1 result ☺

WEB 2.0 → SPARQL

WEB 1.0 → SQL

Page 5: 06 gioca-ontologies


A cultural search engine...

http://e-culture.multimedian.nl/demo/session/search

Page 6: 06 gioca-ontologies


MultimediaN N9C Eculture project

Page 7: 06 gioca-ontologies


A step toward ontologies...

Text

during these few lectures I’ve understood the importance of new technological devices in the arts: in particular way, they could really help the public in better understand the context and the history of a piece of art. The concepts I’ve learned make me think about new opportunities and new systems which can be implemented using basic devices and means of communication. I would like to learn more about ICTs in the fields of music and live performances, if there are some important steps forward in this direction, because they are the scenarios in which I’m more interested in. I’m also interested in archives and in query formulation. I personally have some difficulties with the computer language and with its working mechanisms. I also have some problems with abbreviations: I would like to know literary the meaning of these acronyms in order to better understand their applicability and functions. In these sessions I’ve found interesting the fact that even if technology seems very complicated, in reality it is a sum of different small and simple components which can be all be leaded by a fundamental rationale.

• No structure
• For a computer it is just a sequence of chars

<feedback>
  <description>
    This survey takes less than ten minutes to be completed. The first section is composed of 10 multiple choices questions followed by 5 open questions. Thank you for your feedback.
  </description>
  <course>Please insert here the name of the module you are going to evaluate</course>
  <student>
    <name>Put here your name</name>
    <ID>What we can use for this?</ID>
  </student>
</feedback>

XML (eXtensible Markup Language)
• TAG
• Structured data
• TAGs can be nested
• Nesting does not represent any “relationship”
• It can be represented by a tree
• TAG names are free
• No “common meaning” is associated with each tag
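The point that nested tags form a tree can be sketched with Python's standard XML parser. This is only an illustration: the document is a shortened copy of the feedback example, and the helper name tag_paths is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feedback example from this slide.
FEEDBACK_XML = """
<feedback>
  <description>This survey takes less than ten minutes to be completed.</description>
  <course>Please insert here the name of the module you are going to evaluate</course>
  <student>
    <name>Put here your name</name>
    <ID>What we can use for this?</ID>
  </student>
</feedback>
"""

def tag_paths(element, prefix=""):
    """Return the path of every tag in the tree, in document order."""
    path = f"{prefix}/{element.tag}"
    paths = [path]
    for child in element:
        paths.extend(tag_paths(child, path))
    return paths

root = ET.fromstring(FEEDBACK_XML)
paths = tag_paths(root)
for p in paths:
    print(p)
```

The parser recovers the nesting, but the tag names remain arbitrary labels to it: nothing in the tree says what "student" or "ID" means.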

Page 8: 06 gioca-ontologies


...second step...

RDF (Resource Description Framework)

• Triples (subject-predicate-object)
• Statements
• We can relate different resources
• It can be represented by a graph
• Everything is uniquely identified (URI)
• Namespaces → vocabularies

<rdf:RDF>
  <rdf:Description rdf:about="subject">
    <predicate rdf:resource="object" />
    <predicate>literalvalue</predicate>
  </rdf:Description>
  ...
  <rdf:Description .... />
</rdf:RDF>

DC (Dublin Core)

• Just 15 elements
• A DC resource can be represented using RDF/XML
• It can be seen as a namespace for resource descriptions
• It can be used to describe a single resource
• To describe a complex domain we need something different

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Oxford">
    <dc:title>Oxford</dc:title>
    <dc:coverage>Oxfordshire</dc:coverage>
    <dc:publisher rdf:resource="http://en.wikipedia.org" />
  </rdf:Description>
</rdf:RDF>
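As a rough sketch of how a program could read the statements out of such a description, the snippet below parses a cleaned-up copy of the Oxford example with Python's standard library. It only handles this flat rdf:Description pattern and is not a general RDF/XML parser; the function name statements is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cleaned-up copy of the Oxford example.
DC_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Oxford">
    <dc:title>Oxford</dc:title>
    <dc:coverage>Oxfordshire</dc:coverage>
    <dc:publisher rdf:resource="http://en.wikipedia.org" />
  </rdf:Description>
</rdf:RDF>"""

def statements(rdf_xml):
    """Return (subject, predicate, object) triples from flat rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores a qualified tag as "{namespace}name";
            # dropping the braces gives the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # The object is a URI if rdf:resource is present, otherwise a literal.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples

triples = statements(DC_XML)
for triple in triples:
    print(triple)
```

The first two objects are literals ("Oxford", "Oxfordshire"), while the publisher is itself a resource identified by a URI.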

[Callouts on the example: URI, Literal, Namespace, Statement]

ISO Standard 15836:2009

Page 9: 06 gioca-ontologies


...third step!

RDFS (RDF Schema)
• RDF + meaning given to “special resources”
• Concept of Class
• Predicate is also known as Property

OWL (Web Ontology Language)
• Can be written in RDF
• Adds “new properties”
  • intersectionOf, unionOf, complementOf
  • someValuesFrom, allValuesFrom
• Cardinality of a property
• Different types of property
  • Symmetric
  • Functional
  • InverseFunctional

CIDOC-CRM (Conceptual Reference Model)

“The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.”

An ontology of 80 classes and 132 properties

ISO Standard 21127:2006

Page 10: 06 gioca-ontologies


The Semantic Web project

http://en.wikipedia.org/wiki/Semantic_Web

http://en.wikipedia.org/wiki/DBpedia

Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The Semantic Web". Scientific American Magazine. http://www.sciam.com/article.cfm?id=the-semantic-web&print=true. Retrieved March 26, 2008.

“Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully”

Page 11: 06 gioca-ontologies


XML: gives a structure to data (syntax)

<TagNameA attrName1="AttrValueX" … attrNameN="AttrValueY">Text</TagNameA>

OR

<TagNameB attrName1="AttrValueX" … attrNameN="AttrValueY" />

Example:

<root>
  <author>
    <name>Luca</name>
    <email>[email protected]</email>
  </author>
  <author2 name="Luca" email="[email protected]"/>
</root>

• Tags can be nested → the first tag opened is the last to be closed → the structure of an XML document can be represented by a tree

• A well-formed document has only one root element

• At the beginning of the document (before the root element) there is a line that declares the language, the version, the encoding and other characteristics, e.g. <?xml version="1.0" encoding="UTF-8"?>
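The well-formedness rules above can be checked mechanically. The following is a minimal sketch using Python's standard parser; the helper name is_well_formed and the sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One root element, tags closed in reverse order of opening.
good = '<?xml version="1.0" encoding="UTF-8"?><root><author><name>Luca</name></author></root>'

# Two root elements: rejected.
two_roots = "<author/><author2/>"

# Tags closed in the wrong order: rejected.
bad_nesting = "<root><author><name>Luca</author></name></root>"

for sample in (good, two_roots, bad_nesting):
    print(is_well_formed(sample))
```

Well-formedness only concerns syntax: a document can pass this check and still carry no agreed meaning.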

Page 12: 06 gioca-ontologies


Semantics in XML

Same meaning but different structure

<Works>
  <Work1>Monnalisa</Work1>
  <Work2>Last Supper</Work2>
</Works>

<Works Work1="Monnalisa" Work2="Last Supper"/>

Same structure but different meaning

<Works>
  <Work1>Monnalisa</Work1>
  <Work2>Last Supper</Work2>
</Works>

<Europe>
  <Nation1>Italy</Nation1>
  <Nation2>France</Nation2>
</Europe>
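A small sketch can make the two comparisons concrete. The helpers shape and values are hypothetical names introduced here: the first looks only at nesting, the second only at the data carried by the document.

```python
import xml.etree.ElementTree as ET

def shape(element):
    """Nesting structure only: tag names and text are ignored."""
    return [shape(child) for child in element]

def values(element):
    """Data values only, whether stored as attributes or as child elements."""
    from_attributes = list(element.attrib.values())
    from_children = [child.text for child in element]
    return sorted(from_attributes + from_children)

works_nested = ET.fromstring(
    "<Works><Work1>Monnalisa</Work1><Work2>Last Supper</Work2></Works>"
)
works_flat = ET.fromstring('<Works Work1="Monnalisa" Work2="Last Supper"/>')
europe = ET.fromstring(
    "<Europe><Nation1>Italy</Nation1><Nation2>France</Nation2></Europe>"
)

# Same meaning, different structure.
print(values(works_nested) == values(works_flat), shape(works_nested) == shape(works_flat))

# Same structure, different meaning.
print(shape(works_nested) == shape(europe), values(works_nested) == values(europe))
```

In both cases the parser sees only structure and strings; deciding that two documents "mean the same" is left to the reader, which is the gap RDF tries to close.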

Page 13: 06 gioca-ontologies


The next step...RDF

• Resource: everything we want to identify. Identification is done by using a URI (Uniform Resource Identifier):

a URL (called namespace or prefix) + suffix

• Statement: a triple of the form subject – predicate – object

Subject and predicate are resources, i.e. they are identified by a URI

Object can be a resource or a primitive type, e.g. a number or a string

Example

URI
http://dbpedia.org/resource/ (Namespace) + Rio_de_Janeiro (Suffix) → URI: http://dbpedia.org/resource/Rio_de_Janeiro
http://dbpedia.org/property/ (Namespace) + populationTotal (Suffix) → URI: http://dbpedia.org/property/populationTotal
http://dbpedia.org/ontology/ (Namespace) + birthPlace (Suffix) → URI: http://dbpedia.org/ontology/birthPlace
http://dbpedia.org/resource/ (Namespace) + Paulo_Coelho (Suffix) → URI: http://dbpedia.org/resource/Paulo_Coelho

STATEMENTS
http://dbpedia.org/resource/Rio_de_Janeiro - http://dbpedia.org/property/populationTotal - 6093472

http://dbpedia.org/resource/Paulo_Coelho - http://dbpedia.org/ontology/birthPlace - http://dbpedia.org/resource/Rio_de_Janeiro
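One way to picture these definitions is to hold statements as plain tuples. The sketch below is only illustrative: the helper names uri and is_resource are made up, and the birthPlace statement is written with the person as subject, which is an assumption about how that property is meant to be read.

```python
RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"
ONTOLOGY = "http://dbpedia.org/ontology/"

def uri(namespace, suffix):
    """A URI is built as namespace + suffix."""
    return namespace + suffix

def is_resource(term):
    """Resources are identified by a URI; anything else is a primitive value."""
    return isinstance(term, str) and term.startswith("http://")

# Each statement is a (subject, predicate, object) triple.
statements = [
    (uri(RESOURCE, "Rio_de_Janeiro"), uri(PROPERTY, "populationTotal"), 6093472),
    # Assumption: the person is the subject of birthPlace.
    (uri(RESOURCE, "Paulo_Coelho"), uri(ONTOLOGY, "birthPlace"), uri(RESOURCE, "Rio_de_Janeiro")),
]

for subject, predicate, obj in statements:
    kind = "resource" if is_resource(obj) else "primitive value"
    print(subject, predicate, obj, f"(object is a {kind})")
```

Subjects and predicates are always resources, while the object position accepts either a resource or a value such as the population figure.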

Page 14: 06 gioca-ontologies


RDF graph

• A set of RDF statements can be used to describe a domain. This set is called an RDF knowledge base

• An RDF knowledge base can be represented by a labelled graph: each node represents a resource, i.e. a subject or an object, and each edge represents a predicate
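The graph view can be sketched by turning a list of statements into nodes and labelled edges. The prefixed names below are shorthand chosen for this illustration, and build_graph and nodes are hypothetical helper names.

```python
from collections import defaultdict

# Statements written with short prefixed names (the prefixes are illustrative).
STATEMENTS = [
    ("dbpedia:Rio_de_Janeiro", "dbpprop:populationTotal", 6093472),
    ("dbpedia:Paulo_Coelho", "dbpedia-owl:birthPlace", "dbpedia:Rio_de_Janeiro"),
]

def build_graph(statements):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)

def nodes(statements):
    """Every subject and object is a node; predicates only label edges."""
    found = set()
    for subject, _, obj in statements:
        found.update([subject, obj])
    return found

graph = build_graph(STATEMENTS)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Because Rio_de_Janeiro appears as a subject in one statement and as an object in another, the two statements join into a single connected graph.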

[Figure: two example graphs in which a subject node is linked to an object node by an edge labelled with the predicate. Other labels on the slide: NAMESPACES, Drupal 7 data model]

Page 15: 06 gioca-ontologies


RDF/XML statement

• An RDF statement can be expressed by using the XML syntax
• To make an RDF statement more concise, a namespace can be abbreviated by using this convention: @prefix namespace: URL

Examples:
@prefix dbpedia-owl: http://dbpedia.org/ontology/
@prefix dbpprop: http://dbpedia.org/property/
@prefix dbpedia: http://dbpedia.org/resource/

http://dbpedia.org/resource/Paulo_Coelho - http://dbpedia.org/ontology/birthPlace - http://dbpedia.org/resource/Rio_de_Janeiro

becomes:

dbpedia:Paulo_Coelho – dbpedia-owl:birthPlace – dbpedia:Rio_de_Janeiro

RDF/XML:

<rdf:Description rdf:about="http://dbpedia.org/resource/Paulo_Coelho">
  <dbpedia-owl:birthPlace rdf:resource="http://dbpedia.org/resource/Rio_de_Janeiro" />
</rdf:Description>
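The prefix convention amounts to a simple lookup table. As a hedged sketch, using the three prefixes listed on this slide and two made-up helper names (expand and compact):

```python
# Prefix table taken from the examples on this slide.
PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
    "dbpedia": "http://dbpedia.org/resource/",
}

def expand(name):
    """Turn a prefixed name such as 'dbpedia:Rio_de_Janeiro' into a full URI."""
    prefix, _, suffix = name.partition(":")
    return PREFIXES[prefix] + suffix

def compact(uri):
    """Turn a full URI back into a prefixed name when a known namespace matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI unchanged

print(expand("dbpedia-owl:birthPlace"))
print(compact("http://dbpedia.org/resource/Paulo_Coelho"))
```

The short form is only a notation: two statements are the same whenever their expanded URIs are the same, whatever prefix names were chosen.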

Page 16: 06 gioca-ontologies


Dublin Core: a famous “namespace”

The DC was born in Dublin (Ohio) in 1995. It was created by a research group organized by the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA)

Motivations:
• Museums organize and present their resources in different ways
• Even if the structures used to handle information are compatible, data are often hard to interpret because of differences in terminology and semantics
• Efforts to integrate cultural resources across institutions are still limited. Integrating resources owned by different cultural sites could be very helpful for users, who could use a unified interface to search different kinds of resources, available in different formats (from a real object to a digital representation)
• The main obstacle is the structural/semantic incompatibility between the information systems hosted by institutions

It is important to adopt a common, standard data interchange format: a standard to represent information

Page 17: 06 gioca-ontologies


CIMI (Computer Interchange of Museum Information)

CIMI is “a consortium of cultural heritage institutions and organizations working together to remove barriers to sharing our most valuable cultural information.”

The consortium develops relevant standards and encourages open, standards-based approaches to creating and sharing digital information

CIMI worked on the application of Dublin Core to museum resources and supplied guidelines for implementing this standard in the cultural heritage domain

ISO Standard 15836:2009 of February 2009

Page 18: 06 gioca-ontologies


Dublin Core goals

• Simplicity of creation and maintenance

The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment

• Commonly understood semantics

The Dublin Core can help a non-specialist searcher to find her way by supporting a common set of elements, the semantics (meaning) of which are universally understood and supported

• Extensibility

Dublin Core developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs

Dublin Core aims to allow the exchange of information across different environments in order to simplify the discovery of resources

exchange, communicate, collaborate

Page 19: 06 gioca-ontologies


One to one principle

In general Dublin Core metadata describes one manifestation or version of a resource, rather than assuming that manifestations stand in for one another

• Surrogates are described separately from the original object

A jpeg image of the Mona Lisa has much in common with the original painting, but it is not the same as the painting. As such the digital image should be described as itself

The problem of originals and surrogates is important for museums, where the originals are exhibited and the surrogates have to be described accurately but at the same time efficiently

• This principle, in many cases, simplifies the resource description

The author of the original Mona Lisa is the painter, while the author of the photo is the photographer

Page 20: 06 gioca-ontologies


The 15 DC ELEMENTS

Group labels shown on the slide around RESOURCE: RELATIONSHIPS, IDENTIFICATION, INTELLECTUAL PROPERTIES

SUBJECT: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element.

CREATOR: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

TYPE: Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element.

DESCRIPTION: Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.

TITLE: A name given to the resource.

COVERAGE: Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. A jurisdiction may be a named administrative entity or a geographic place to which the resource applies. Recommended best practice is to use a controlled vocabulary such as the Thesaurus of Geographic Names [TGN]. Temporal topic may be a named period, date, or date range. Where appropriate, named places or time periods can be used in preference to numeric identifiers such as sets of coordinates or date ranges.

RELATION: Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. Relationships may be expressed reciprocally (if the resources on both ends of the relationship are being described) or in one direction only, even when there is a refinement available to allow reciprocity. If text strings are used instead of identifying numbers, the reference should be appropriately specific. For instance, a formal bibliographic citation might be used to point users to a particular resource.

SOURCE: The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. In general, include in this area information about a resource that is related intellectually to the described resource but does not fit easily into a Relation element, e.g. Image from page 54 of the 1922 edition of Romeo and Juliet

IDENTIFIER: Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.

LANGUAGE: Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646].

FORMAT: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME].

DATE: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF].

RIGHTS: Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.

PUBLISHER: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

CONTRIBUTOR: Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.

http://dublincore.org/documents/usageguide/elements.shtml
http://dublincore.org/documents/dcmi-terms/

Page 21: 06 gioca-ontologies


AN EXAMPLE OF DUBLIN CORE IN RDF

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description rdf:about="http://www.dlib.org">
    <dc:Title>D-Lib Program</dc:Title>
    <dc:Description>The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing.</dc:Description>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>Research; statistical methods</dc:Subject>
    <dc:Type>World Wide Web Home Page</dc:Type>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
</rdf:RDF>

Page 22: 06 gioca-ontologies


Considerations

• The Dublin Core is a useful standard in the mapping phase across different domains

• It cannot be applied to model complex domains

• It is possible to add attributes called qualifiers to improve the quality and the detail of the information
  – “Dumb-Down Principle”: every application can ignore the qualifiers it cannot interpret

• This data model describes each single resource

• Relationships among resources are not well specified

• DC namespace can be used in RDF/XML statements
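The Dumb-Down Principle can be sketched as follows. The dotted "Element.Qualifier" keys are an assumption about how a qualified record might be stored (the same notation appears in the Title.Subtitle field of the Crimea Conference record later in these slides), and dumb_down is a hypothetical helper name.

```python
def dumb_down(record):
    """Drop qualifiers: every 'Element.Qualifier' key is folded into 'Element'."""
    simple = {}
    for key, value in record.items():
        element = key.split(".")[0]  # keep only the unqualified element name
        simple.setdefault(element, []).append(value)
    return simple

# Qualified record; values taken from the Crimea Conference example.
qualified = {
    "Title": "Protocol of Proceedings of Crimea Conference",
    "Title.Subtitle": "II. Declaration of Liberated Europe",
    "Date": "February 11, 1945",
}

simple = dumb_down(qualified)
for element, element_values in simple.items():
    print(element, element_values)
```

An application that does not understand the Subtitle qualifier still sees two usable Title values, so the qualified record remains readable as simple Dublin Core.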

Page 23: 06 gioca-ontologies


RDF Vocabulary Description Language -- RDF Schema (RDFS)

• It extends RDF to include basic features needed to define ontologies

– Everything is called resource (subject, predicate, object)

– The predicate is also called property

• It makes it possible to give a meaning to “special” resources

• It introduces the concept of Class

• The rdfs:Class resource is the class of all the RDF classes

Tree of Porphyry

The technique of inheritance is the process of merging the differentiae along the path above any category: Living is defined as animate material Substance, and Human is rational sensitive animate material Substance.
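Merging the differentiae along the path can be written as a short walk up the tree. In this sketch the intermediate categories Body and Animal are assumptions taken from the traditional Tree of Porphyry, since the slide names only Substance, Living and Human; the function name definition is also made up.

```python
# child category -> (parent category, differentia that separates the child)
# Assumption: "Body" and "Animal" are the usual intermediate categories.
PARENT = {
    "Body": ("Substance", "material"),
    "Living": ("Body", "animate"),
    "Animal": ("Living", "sensitive"),
    "Human": ("Animal", "rational"),
}

def definition(category):
    """Collect the differentiae on the path from a category up to the root."""
    differentiae = []
    while category in PARENT:
        category, differentia = PARENT[category]
        differentiae.append(differentia)
    # After the loop, 'category' is the root of the tree.
    return " ".join(differentiae + [category])

print(definition("Living"))
print(definition("Human"))
```

Class hierarchies in RDFS work in the same spirit: what is stated for a class also holds for every class below it.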

Page 24: 06 gioca-ontologies


RDFS graph example

[Figure: a graph with a Classes level and an Instances level. Classes: rdfs:Class, WorkOfArt, Artist (edges labelled rdfs:subClassOf). Instances: http://en.wikipedia.org/wiki/David_%28Donatello%29, http://en.wikipedia.org/wiki/Donatello, http://en.wikipedia.org/wiki/David_%28Michelangelo%29, http://en.wikipedia.org/wiki/Michelangelo (edges labelled rdf:type and dc:creator). Callout: URI?]

Page 25: 06 gioca-ontologies


OWL (Web Ontology Language)

• It is integrated with RDF → OWL is directly accessible to web applications

• It allows creating a knowledge base about a domain of interest in terms of:
  – individuals: the basic elements of the domain, e.g. Donatello
  – concepts (classes): describe sets of individuals having similar characteristics, e.g. Artist
  – roles (properties): describe relationships between pairs of individuals, e.g. dc:creator

• RDFS allows modelling:
  – Hierarchy of classes and properties
  – Domain and range of properties

• OWL extends RDFS in terms of:
  – Logical operations on classes
  – Additional property characteristics: Transitive, Symmetric, Functional
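What a property characteristic buys can be sketched with a tiny hand-written inference loop. This is not how an OWL reasoner is implemented; the property names ex:partOf and ex:collaboratedWith and the individuals are hypothetical, and only the Transitive and Symmetric characteristics are covered.

```python
def infer(triples, symmetric=(), transitive=()):
    """Add the statements implied by symmetric and transitive properties."""
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))  # s p o implies o p s
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # s p o and o p x imply s p x
        changed = not new <= facts  # stop once nothing new is derived
        facts |= new
    return facts

# Hypothetical statements, for illustration only.
stated = {
    ("ex:panel", "ex:partOf", "ex:altarpiece"),
    ("ex:altarpiece", "ex:partOf", "ex:chapel"),
    ("ex:artistA", "ex:collaboratedWith", "ex:artistB"),
}

inferred = infer(stated, symmetric={"ex:collaboratedWith"}, transitive={"ex:partOf"})
for triple in sorted(inferred - stated):
    print(triple)
```

Declaring the characteristic once on the property replaces writing every implied statement by hand.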

Page 26: 06 gioca-ontologies


OWL Properties examples

Figures from: A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, Edition 1.2, Matthew Horridge, The University Of Manchester, Copyright © The University Of Manchester, March 13, 2009

Page 27: 06 gioca-ontologies


CIDOC-CRM

• Semantic interoperability in culture can be achieved by an “extensible ontology” and explicit event modeling, that provides shared explanation rather than prescription of a common data structure

• “The intended scope of the CIDOC CRM may be defined as all information required for the scientific documentation of cultural heritage collections, with a view to enabling wide area information exchange and integration of heterogeneous sources”

• “The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.”

• The CIDOC-CRM models actors, events, objects in space and time

The ontology is a language that IT experts and cultural experts can share

Page 28: 06 gioca-ontologies


From DC to CIDOC-CRM

Metadata (Dublin Core)

Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945
Creator: The Premier of the Union of Soviet Socialist Republics
         The Prime Minister of the United Kingdom
         The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan

About…

Document

“The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert…and to ensure that Germany will never again be able to disturb the peace of the world……”

Page 29: 06 gioca-ontologies


[Figure: CIDOC-CRM graph of the example. Entities: E31 Document “Yalta Agreement”; E7 Activity “Crimea Conference”; E65 Creation Event*; E38 Image; E39 Actor (three nodes); E53 Place; E52 Time-Span “February 1945”; E52 Time-Span “11-2-1945”. Properties: P14 performed; P11 participated in; P94 has created; P86 falls within; P7 took place at; P67 is referred to; P81 ongoing throughout; P82 at some time within]

CIDOC-CRM example from Martin Doerr, Steve Stead “The CIDOC CRM, a Standard for the Integration of Cultural Information