Metadata: an introduction Michael Day UKOLN, University of Bath m.day@ukoln.ac.uk Managing Networks: Understanding New Technologies, Birmingham, 13 September 2001

Metadata: an introduction

Michael Day

UKOLN, University of Bath

m.day@ukoln.ac.uk

Managing Networks: Understanding New Technologies, Birmingham, 13 September 2001


Managing Networks, Birmingham, 13 September 2001

Presentation overview

• Defining “metadata”

• Dublin Core:

– Background

– Exercise 1

– Semantics

– Syntax

– Content Rules

– Exercise 2


Metadata (1)

Some definitions:

– “data about data”

– “Internet-age term for structured data about data” - Joint NSF-EU Working Group on Metadata (1998)

– “... Machine understandable information about web resources or other things” - Berners-Lee (W3C)

Functional definition:

– structured data about resources that can be used to help support a wide range of operations


Metadata (2)

These operations may include:

• resource discovery and access

• rights management

• e-commerce

• authentication

• collection management

• preservation


Metadata (3)

Resource discovery metadata:

• Provides support for:

– searching

– location

– retrieval (delivery)

– description

• May help enable:

– Semantic interoperability


Metadata (4)

Where is metadata stored?

• Different models of metadata-resource association:

– embedded within resource

– tightly coupled using protocols or identifiers

– separate database(s)


Metadata formats (1)

Diversity of metadata formats and frameworks

• How many have you heard of?


Metadata formats (1)

Diversity of metadata formats and frameworks, e.g.:

• Dublin Core

• EAD, CIMI, TEI

• PICS, RDF

• MARC

• GILS, FGDC

• ROADS

http://www.ukoln.ac.uk/metadata/glossary/


Metadata formats (2)

SCHEMAS Forum project “Metadata Watch” has already identified:

• Over 200 implementation activities

• Around 90 standardisation activities

• Very different levels of information about the various initiatives


Metadata formats (3)

USMARC:

245 00 Worldnews online $h [computer file].

246 3 World news online

256 Computer online service.

260 Washington, D.C. : $b Worldnews Online, $c [1995-

538 Mode of access: Internet.

500 Title from title frame.

520 “WorldNews OnLine is a service ...”

650 0 Newspapers $x Databases.

856 7 $u http://worldnews.net $2 http
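To make the structure of such a record more concrete, the following is a minimal Python sketch that splits one displayed field into its tag, indicators and subfields. It assumes the display conventions seen on this slide (a three-digit tag, an optional run of one or two indicator digits, and `$x` subfield markers, with unmarked leading text treated as subfield `a`). The function name `parse_marc_line` is hypothetical, and the heuristic would misread a field whose data begins with a short number. It does not read the machine-readable exchange format used by real systems.

```python
import re


def parse_marc_line(line):
    """Split one displayed MARC field into tag, indicators and subfields."""
    tag, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    indicators = ""
    # Assumption: a leading run of one or two digits is the indicator pair.
    match = re.match(r"(\d{1,2})\s+(.*)", rest)
    if match:
        indicators, rest = match.group(1), match.group(2)
    # Assumption: text before the first "$x" marker belongs to subfield "a".
    parts = re.split(r"\s*\$(\w)\s*", rest)
    subfields = []
    if parts[0]:
        subfields.append(("a", parts[0].strip()))
    # re.split with one capture group alternates: text, code, text, code, ...
    for code, value in zip(parts[1::2], parts[2::2]):
        subfields.append((code, value.strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


if __name__ == "__main__":
    print(parse_marc_line("650 0 Newspapers $x Databases."))
    print(parse_marc_line("856 7 $u http://worldnews.net $2 http"))
```

The returned dictionary keeps subfields as an ordered list of pairs, since a field may repeat a subfield code.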


Metadata formats (4)

TEI header:

<teiHeader type="aacr2"><fileDesc><titleStmt>

<title type="245">Rubaiyat of Omar Khayyam : the astronomer poet of Persia / rendered into English verse by Edward Fitzgerald ; with drawings by Florence Lundborg</title>

<title type="gmd">[electronic resource]</title>

<author>Omar Khayyam</author>

<respStmt>

<resp>Conversion to TEI.2-conformant markup:</resp>

<name>University of Virginia Library Electronic Text Center </name>

</respStmt>


Metadata formats (5)

ROADS/IAFA template:

Template-Type: SERVICE

Handle: 871473886-23884

Title: Wellcome Unit for the History of Medicine

URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html

Admin-Email-v1: [email protected]

Publisher-Name-v1: Wellcome Unit for the History of Medicine

Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE

Publisher-City-v1: Oxford
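A template of this kind is simple to process because each line is an attribute name, a colon and a value. The sketch below is one possible way to read such lines into a Python dictionary. The function name `parse_template` is hypothetical, and the sketch assumes one attribute per line with no continuation lines, which may not hold for every ROADS/IAFA record.

```python
def parse_template(text):
    """Read 'Attribute: value' lines into a dictionary."""
    record = {}
    for line in text.splitlines():
        # Split at the first colon only, so URLs in the value stay intact.
        name, separator, value = line.partition(":")
        if not separator or not name.strip():
            continue  # skip blank lines and lines without an attribute name
        record[name.strip()] = value.strip()
    return record


if __name__ == "__main__":
    sample = (
        "Template-Type: SERVICE\n"
        "Title: Wellcome Unit for the History of Medicine\n"
        "URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html\n"
    )
    for attribute, value in parse_template(sample).items():
        print(attribute, "=", value)
```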


A metadata typology

Simple → Rich

Band One (full text indexes):

• Proprietary formats

Band Two (simple structured generic formats):

• Proprietary formats

• Dublin Core

• ROADS

• IAFA/Whois++ templates

Band Three (more complex structure, domain specific; part of larger semantic framework):

• FGDC

• MARC

• TEI headers

• ICPSR

• EAD

• CIMI

Based on: Dempsey and Heery (1998)


Who creates metadata?

Resource creators

• authors

• webmasters

• institutions

Service providers

• search services

• third parties

• commercial publishers

• hand crafted

• robot/database generated


Metadata creation tools

DC-dot: http://www.ukoln.ac.uk/metadata/dcdot/

Nordic Metadata Project Metadata Template: http://www.lub.lu.se/cgi-bin/nmdc.pl

Reggie Metadata Editor: http://metadata.net/dstc/


Aspects of metadata

Syntax

• related to the technical implementation

– e.g. MARC, XML

Semantics

• the basic meaning of elements

Rules for content

• e.g., cataloguing rules


The Dublin Core Metadata Element Set


Dublin Core (1)

What is it?

• 15 element metadata set

• based on international consensus

• Some initial assumptions:

– simple set for untrained creators

– basic set for semantic interoperability or resource discovery

– primarily for Web-based document-like objects

http://www.dublincore.org/


Dublin Core (2)

Dublin Core Metadata Initiative

• Workshop series

– first workshop hosted by OCLC in Dublin, Ohio (1995)

– 9th workshop (DC2001) will be held in October (Tokyo)

• Working Groups

– for DC issues (e.g. Architecture, Registry, Standards, tools, etc.)

– for specific user communities (e.g. Libraries, Education, Government, etc.)

– open e-mail discussion lists


Dublin Core (3)

Dublin Core Metadata Element Set:

• Version 1.0 (RFC 2413, 1998)

• Version 1.1 (1999)

– approved (Z39.85) by the US National Information Standards Organization (NISO) as a Draft American National Standard (July 2001)

Dublin Core Qualifiers:

• DCMI Recommendation (2000)


DC exercise 1

The Dublin Core Metadata Element Set consists of 15 elements, designed for simple resource discovery.

What elements do you think should be part of such a metadata element set?

• Think about the type of resources that need to be described:

– Web pages

– Document-like objects

– Images, sound resources, etc.

– Multimedia resources


Dublin Core semantics


DC semantics (1)

15 element core metadata set:

• Title • Subject • Description • Creator • Publisher • Contributor • Date • Type

• Format • Identifier • Source • Language • Relation • Coverage • Rights
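Because the element set is closed and small, a record can be checked against it mechanically. The following Python sketch lists the fifteen element names from this slide and reports any keys in a record that fall outside them. The names `DC_ELEMENTS` and `unknown_elements` are hypothetical, and the sketch assumes a record is held as a plain dictionary keyed by unqualified element name.

```python
# The fifteen element names, as listed on the slide.
DC_ELEMENTS = (
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def unknown_elements(record):
    """Return the keys of `record` that are not DC element names, sorted."""
    return sorted(set(record) - set(DC_ELEMENTS))


if __name__ == "__main__":
    record = {"Title": "UKOLN", "Author": "UKOLN Information Services Group"}
    # "Author" is not a Dublin Core element; "Creator" is the nearest match.
    print(unknown_elements(record))
```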


DC semantics (2)

An example:

– Name: Description

– Identifier: Description

– Definition: An account of the content of the resource.

– Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.


DC semantics (3)

Qualifiers:

• DC semantics are defined very broadly

• Possible to add qualifiers to some elements:

– Element refinement(s):

  – Relation.IsPartOf

  – Date.Created

– Encoding scheme(s):

  – Subject (scheme=DDC)

  – Date (scheme=ISO8601)
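The two kinds of qualifier can be separated mechanically from the notation used on this slide. The Python sketch below splits a string such as `Relation.IsPartOf` or `Date (scheme=ISO8601)` into element, refinement and encoding scheme. The names `QualifiedName` and `parse_qualified` are hypothetical, and the accepted notation is an assumption based only on the four examples shown here, not on any DCMI encoding specification.

```python
import re
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    element: str
    refinement: Optional[str]
    scheme: Optional[str]


# Element, optional ".Refinement", optional "(scheme=NAME)".
_PATTERN = re.compile(
    r"^(?P<element>\w+)"
    r"(?:\.(?P<refinement>\w+))?"
    r"(?:\s*\(scheme=(?P<scheme>[\w.-]+)\))?$"
)


def parse_qualified(text):
    """Split the slide's qualifier notation into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not in the assumed notation: %r" % text)
    return QualifiedName(match["element"], match["refinement"], match["scheme"])


if __name__ == "__main__":
    for example in ("Relation.IsPartOf", "Date.Created",
                    "Subject (scheme=DDC)", "Date (scheme=ISO8601)"):
        print(parse_qualified(example))
```

Keeping the element name separate from its qualifiers means an application that does not understand a refinement can still fall back on the broad element.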


DC syntax


DC syntax (1)

Can be embedded into HTML Web pages:

• <META> tag

• limited functionality

• the data can be “harvested” by metadata-aware search engines (but not many do this)

• note that this is just one way of implementing the DC element set
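As an illustration of the embedding approach, the following Python sketch turns a dictionary of element names and values into `<meta>` tags using the `DC.` name prefix. The function name `dc_meta_tags` is hypothetical, and the sketch assumes one value per element and unqualified element names.

```python
from html import escape


def dc_meta_tags(record):
    """Render element/value pairs as HTML <meta> tags, one per line."""
    lines = []
    for element, value in record.items():
        # Escape quotes and ampersands so the attribute values stay valid.
        lines.append('<meta name="DC.%s" content="%s">'
                     % (escape(element, quote=True), escape(value, quote=True)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(dc_meta_tags({
        "Title": "UKOLN",
        "Creator": "UKOLN Information Services Group",
    }))
```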


DC syntax (2)

An example of embedding DC metadata in HTML 4.0:

<html><head>

<title>UKOLN Home Page</title>

<meta name="DC.Title" content="UKOLN">

<meta name="DC.Description" content="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">

<meta name="DC.Creator" content="UKOLN Information Services Group">

</head>
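A harvester for metadata embedded in this way could be sketched as follows, using the `html.parser` module from the Python standard library. The class name `DCMetaHarvester` and the function `harvest` are hypothetical, and the sketch assumes that every relevant tag carries a `name` beginning with `DC.` and a `content` attribute, as in the example above. It is not a description of how any particular search engine works.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect (element, content) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.elements.append((name[3:], attributes.get("content") or ""))


def harvest(html_text):
    """Return the DC element/value pairs found in an HTML document."""
    parser = DCMetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    page = (
        "<html><head><title>UKOLN Home Page</title>"
        '<meta name="DC.Title" content="UKOLN">'
        '<meta name="DC.Creator" content="UKOLN Information Services Group">'
        "</head>"
    )
    for element, value in harvest(page):
        print(element, "=", value)
```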


DC content rules


DC content rules

Not part of DCMI:

• No content rules (cataloguing rules) defined as part of Dublin Core Metadata Element Set

May be important where there are expectations of consistent cross-searching across related services, e.g.:

• ROADS Cataloguing Guidelines

• Resource Discovery Network (RDN) Cataloguing Guidelines
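To show what a content rule looks like in practice, the sketch below applies one hypothetical rule: personal names are recorded in inverted form, surname first. The rule and the function name `invert_name` are illustrative assumptions and are not taken from the ROADS or RDN guidelines. The function is deliberately naive: it treats the last word as the surname, so it would mangle corporate names and multi-word surnames, which is the kind of case written guidelines exist to settle.

```python
def invert_name(name):
    """Return 'Surname, Forenames' for a simple personal name."""
    name = " ".join(name.split())  # normalise whitespace
    parts = name.split(" ")
    # Leave single-word names and already-inverted names unchanged.
    if len(parts) < 2 or "," in name:
        return name
    return "%s, %s" % (parts[-1], " ".join(parts[:-1]))


if __name__ == "__main__":
    print(invert_name("Michael Day"))
    print(invert_name("Day, Michael"))
```

Two services that both store `Creator` values but apply different name rules would return inconsistent results under cross-searching, even though their semantics and syntax agree.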


DC exercise 2

Go to the Nordic Metadata Template at:

http://www.lub.lu.se/cgi-bin/nmdc.pl

And try to create some metadata for a Web page that you know reasonably well

• Reflect on:

– Which bits are difficult to fill in

– Which parts relate to semantics, which to content rules (e.g. inverted forms of names)


Acknowledgements

UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

http://www.ukoln.ac.uk/