Dublin Core Metadata Initiative Stuart Weibel OCLC Office of Research Director, Dublin Core Metadata Initiative

Dublin Core Metadata Initiative

Stuart Weibel

OCLC Office of Research

Director, Dublin Core Metadata Initiative

Presentation Outline

• Introduction to Metadata
• Dublin Core Metadata Initiative
• Metadata Registries
• Syntax Alternatives for Web Metadata
• A Few Strategic Applications

Introduction to Metadata

The Web as an Information System

• Search systems are motivated by business models, not user needs
• Index coverage is unpredictable and limited
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing intellectual property rights is hard

Metadata: Part of a Solution

• Structured data about data
  – Organization and management of content
  – Support discovery
  – Direct content in channels
  – Enable automated discovery/manipulation

Internet Commons includes Multiple Communities

[Diagram labels: Internet Commons; Scientific Data; Home Pages; Geo; Library; Museums; Commerce; Whatever...]

Interoperability requires conventions about:

• Semantics
  – The meaning of the elements
• Structure
  – human-readable
  – machine-parseable
• Syntax
  – grammars to convey semantics and structure

Haven’t we done metadata already?

The MARC family of standards is the single most successful resource description standard in the world

What’s wrong with this model on the Web?

• Expensive
  – Complex
  – Professional catalogers required
• Bias towards bibliographic artifacts
  – Fixed resources
  – Incomplete handling of resource evolution and other resource relationships
• Anglo-centric
  – MARC 21 accounts for ¾ of MARC records, but there are other varieties

Dublin Core Metadata Initiative

History of the Dublin Core

• 1994: Simple tags to describe Web pages
• 1995: The Dublin Core is one of many vocabularies needed ("Warwick Framework")
• 1996: The Dublin Core: 13 elements expanded to 15 - appropriate for Text and Images
• 1997: WF needs formal expression in a Resource Description Framework (RDF)
• 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core

Dublin Core Metadata Initiative

• The mission of DCMI is to make it easier to find resources using the Internet through the following activities:
  – Developing metadata standards for discovery across domains (example: the Dublin Core)
  – Defining frameworks for the interoperation of metadata sets
  – Facilitating the development of community or disciplinary specific metadata sets

DCMI Organizational Structure

[Organization chart labels: Board of Trustees; Advisory Board; Directorate (Executive Director, Managing Director); DCMI Subscribers; Liaison; Usage Board; DCMI Activity Areas (Standards Development WGs, Infrastructure WGs, User Support and Education WGs)]

DCMI Activities

• Standards development and maintenance
• Metadata registry and infrastructure
• Technical working groups and periodic workshops
• Tutorial materials and user guides
• Education and training
• Open source software
• Liaisons with other standards or user communities

Unqualified Dublin Core is the Pidgin metadata language

• Metadata is language
• Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains using the Internet.
• Speakers of different languages naturally "pidginize" to communicate

Qualifiers and Domain-specific Extensions

• The Dublin Core architecture supports more sophisticated metadata solutions through the addition of:
  – Qualifiers
  – Domain-specific extensions
  – Application Profiles involving mixed namespaces (more on this later)
• Increased sophistication comes at the cost of some degree of interoperability

Varieties of Qualifiers: Value Encoding Schemes

• Says that the value is
  – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  – a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
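The encoding-scheme idea can be sketched in a few lines of Python. This is only an illustration: the function name `interpret_value` and the scheme labels passed to it are assumptions made for the example, not part of any DCMI specification. A known scheme lets software build a richer object; an unknown one leaves a string that is still usable for discovery.

```python
from datetime import date


def interpret_value(value, scheme=None):
    """Return a richer object when the scheme is understood, else the raw string.

    Hypothetical helper for illustration; the scheme labels are assumptions.
    """
    if scheme == "ISO8601":
        # date.fromisoformat reads YYYY-MM-DD, so "2001-05-02" is May 2.
        return date.fromisoformat(value)
    # Unknown or absent scheme: the value is still a usable string.
    return value


d = interpret_value("2001-05-02", scheme="ISO8601")
print(d.month, d.day)

# A scheme this sketch does not know: the string is passed through unchanged.
print(interpret_value("Languages -- Grammar", scheme="LCSH"))
```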

Varieties of Qualifiers: Element Refinements

• Make the meaning of an element narrower or more specific.
  – a Date Created versus a Date Modified
  – an IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it.

A Grammar of Dublin Core

• http://www.dlib.org/dlib/october00/baker/10baker.html

• By design not as subtle as mother tongues, but easy to learn and useful in practice

• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)

• Simple grammars: sentences (statements) follow a simple fixed pattern...

Resource has property X

• Resource: implied subject
• has: implied verb
• property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)
• X: property value (an appropriate literal)
• [optional qualifier], [optional qualifier]: qualifiers (adjectives)

Resource has Date "2000-06-13"
  – qualifiers: Revised, ISO8601

Resource has Subject "Languages -- Grammar"
  – qualifier: LCSH

Dumb-Down Principle for Qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Qualifiers refine meaning (but may be harder to understand)

• Nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
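A minimal Python sketch of the dumb-down principle follows. It assumes a statement can be modelled as an (element, value, qualifiers) tuple and that the software keeps a set of qualifiers it understands; both the tuple layout and the names `KNOWN_QUALIFIERS` and `dumb_down` are hypothetical, chosen only for this example.

```python
# Hypothetical set of qualifiers this particular piece of software understands.
KNOWN_QUALIFIERS = {"ISO8601"}


def dumb_down(statement, known=KNOWN_QUALIFIERS):
    """Drop qualifiers the software does not recognise; keep element and value."""
    element, value, qualifiers = statement
    kept = [q for q in qualifiers if q in known]
    return (element, value, kept)


# The two example statements, written as (element, value, qualifiers).
statements = [
    ("Date", "2000-06-13", ["Revised", "ISO8601"]),
    ("Subject", "Languages -- Grammar", ["LCSH"]),
]

simplified = [dumb_down(s) for s in statements]
for element, value, qualifiers in simplified:
    print(element, value, qualifiers)
```

In both cases the element and its value survive, which is the point of the principle: the nouns stand on their own when the adjectives are not understood.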

Using DC with other vocabularies

• Specialized application profiles may need to:
  – Use general-purpose Dublin Core elements
  – Use elements from another, more domain-specific standard
  – Narrow standard definitions of DC elements for specific local uses
  – Invent local elements outside the scope of existing standards

What is an Application Profile?

• A metadata schema incorporating a set of elements from one or more metadata element sets

• A set of policies defining how the elements should be applied to the domain of the application

• A set of guidelines that make the policies concerning elements explicit

Multiple Namespace Fragment

xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:co="http://purl.org/rss/1.0/modules/company/"

<dc:publisher>The O'Reilly Network</dc:publisher>
<dc:creator>Rael Dornfest</dc:creator>
<dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
<dc:date>2000-01-01T12:00+00:00</dc:date>
<dc:description>XML is placing increasingly heavy loads on the existing technical infrastructure of the Internet.</dc:description>

<co:name>XML.com</co:name>
<co:market>NASDAQ</co:market>
<co:symbol>XML</co:symbol>
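To show how software might read a mixed-namespace record like the fragment above, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The enclosing `<item>` element is an assumption added so that the fragment parses as a complete document; the slide does not show the surrounding markup. Only a subset of the elements is reproduced.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, using the two namespace URIs declared in the fragment.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "co": "http://purl.org/rss/1.0/modules/company/",
}

# The <item> wrapper is an assumption added so the fragment parses on its own.
DOC = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:co="http://purl.org/rss/1.0/modules/company/">
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest</dc:creator>
  <co:name>XML.com</co:name>
  <co:symbol>XML</co:symbol>
</item>"""

root = ET.fromstring(DOC)

# Look elements up by prefix; ElementTree resolves the prefix through NS.
creator = root.find("dc:creator", NS).text
symbol = root.find("co:symbol", NS).text

# Group child elements by namespace URI. ElementTree stores each tag as
# "{uri}localname", so strip the leading brace and split on the closing one.
by_namespace = {}
for child in root:
    uri, _, local = child.tag[1:].partition("}")
    by_namespace.setdefault(uri, []).append(local)

print(creator, symbol)
print(by_namespace)
```

Because each element carries its namespace URI, a consumer that only knows Dublin Core can pick out the `dc` elements and skip the rest.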

Namespaces and Translation

• Dublin Core has been translated into 26 languages
  – machine-readable tokens are shared by all
  – human-readable labels are defined in different languages
  – translations are distributed, maintained in many countries
  – eventually linked in DCMI registry

One concept identifier, with labels in many languages

dc:creator
  – rdfs:label "Verfasser" [German]
  – rdfs:label "Pencipta" [Indonesian]
  – rdfs:label "Creator" [English]
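The same idea can be expressed as a small lookup table in Python: one shared machine-readable token, several human-readable labels. This is a sketch only; the `LABELS` table, the `label_for` function and the use of two-letter language codes as keys are assumptions for the example, not how any DCMI registry is actually implemented.

```python
# One shared token, with labels keyed by language code
# (de = German, id = Indonesian, en = English).
LABELS = {
    "dc:creator": {"de": "Verfasser", "id": "Pencipta", "en": "Creator"},
}


def label_for(token, lang, fallback="en"):
    """Return the label in the requested language, else the fallback label."""
    labels = LABELS[token]
    return labels.get(lang, labels[fallback])


print(label_for("dc:creator", "de"))
print(label_for("dc:creator", "id"))
# No Finnish label in this toy table, so the English label is returned.
print(label_for("dc:creator", "fi"))
```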

Metadata Registries:

Dictionaries of Metadata terms and Usage

Metadata is language

• Metadata schemas are languages for making statements about resources:
  – Book has Title "Gone with the Wind".
  – Web page has Publisher "Springer Verlag".

• Vocabulary terms (elements) are defined in standards like Dublin Core

• Metadata grammars constrain the statements and data models one can form

Metadata languages are Multilingual

• Metadata is not a spoken language
• The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages

• Standards may have dozens of translations

• Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?

Languages Evolve With Use

• Inevitably, languages resist stability
• People stretch official definitions
• Implementers misunderstand the intended meaning or use of elements
• Implementers coin local terms and extensions
• If the application does not fit the standard, the standard is often "customized" to fit the application

How do we manage this evolution?

• How can we monitor the usage of a language that is:
  – Never spoken?
  – Rarely published in a way that can be harvested?

• How can dictionary editors help a metadata language evolve and grow in response to usage?

• How can this evolution occur across (human) languages?

RDF Schemas (RDFS) -- W3C standard

• A dictionary format for metadata terms:
  – Simple XML format for namespaces, terms and definitions
• Example: "Title" (Dublin Core)
  – Human-readable label and definition:
    • Title: A name given to the resource.
  – Unique, machine-readable identifiers
    • dc:title
• Support for cross-references
  – between multiple language renditions of a namespace
  – between terms in related standards
  – between local adaptations and related standards

Registries can function as dictionaries

• Metadata dictionaries can help metadata vocabularies evolve more like other human languages
  – Not just top-down, like traditional standards
  – Also bottom-up, in response to usage

DCMI Metadata Registry

• Stores official metadata element definitions in a central database or repository

• Managing a namespace (as a standards agency): publish qualifiers as available, with version control
  – Managing translations of the standard in multiple languages
• Eventually:
  – User guide interface
  – Support for standardisation processes (peer review)
  – Downloadable input to software tools for generating, editing, validating DC metadata

Dictionaries as a tool for harmonization

• Knowledge of how other projects are using standards will avoid "reinventing the wheel"

• To help information providers harmonize their schemas for improved access within domains:
  – Between countries (Nordic Metadata Project)
  – Preprint repositories (Open Archives Initiative)
  – Subject gateways (Renardus)
  – Theses and dissertations (NDLTD)
  – Mathematics and physics (MathNet, PhysNet)

A global registry infrastructure?

• RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web

• Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries

• Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?

EOR -- an RDF Toolkit for Schema Infrastructure

• Harvests RDF Schemas
  – Schemas distributed on multiple Web servers
  – Creates huge database of schemas for searching
  – Web interface functions as a "metadata browser"
  – Click on cross-references between linked terms
• Downloadable as open source software
  – http://eor.dublincore.org/

EOR Toolkit

• Integrate RDF components for supporting search services, topic-maps, site-maps, annotation environments and semantic metadata registries

• Base-level functionality of this toolkit includes:
  – Creation, deletion, and management of RDF databases.
  – Ability to infuse RDF instance data into RDF databases.
  – Ability to search RDF databases.
  – Generic interface design capabilities to support RDF applications.
  – Web interface functions as a "metadata browser"
• Open Source: http://eor.dublincore.org

Syntax Alternatives for Web Metadata

Syntax Alternatives: HTML

• Advantages:
  – Simple Mechanism – META tags embedded in content
  – Widely deployed infrastructure (the Web)
  – Public domain tools
• Disadvantages
  – Limited structural richness (won’t easily support hierarchical, tree-structured data or entity distinctions).
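As a rough illustration of the META-tag mechanism, the Python sketch below renders a flat list of (element, value) pairs as HTML META tags. The "DC." name prefix is a common convention for embedding Dublin Core in HTML and is not spelled out on the slide; the function name `dc_meta_tags` and the sample record are assumptions for the example.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML META tags with a "DC." name prefix."""
    lines = []
    for element, value in record:
        # Escape the value so quotes and ampersands cannot break the attribute.
        content = escape(value, quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (element, content))
    return "\n".join(lines)


record = [
    ("title", "Dublin Core Metadata Initiative"),
    ("creator", "Stuart Weibel"),
]
markup = dc_meta_tags(record)
print(markup)
```

The flat name/content pairing also makes the stated disadvantage visible: there is no natural place in a META tag to nest one description inside another.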

Syntax Alternatives: XML

• The standard for networked text and data
• Widespread tool support
  – Parsers (DOM and SAX)
  – Extensibility (namespaces)
  – Type definition (XML Schema)
  – Transformation and Rendering (XSLT)
  – Rich linking semantics (XLink)

XML DTDs

• Works, but…
• DTDs are a stopgap measure
  – Extensibility is problematic
  – Many ways to ‘say’ the same thing (too much flexibility)
  – Interoperability must be pre-coordinated
  – DTDs cannot evolve gracefully
  – Granularity is at the level of the DTD

XML Schemas

• Rich XML-based language for expressing type semantics

• Replaces arcane and limited DTD (origin in SGML)

• Facilities
  – Data typing (both complex and primitive)
  – Constraints
  – Defaults

Syntax Alternatives: RDF

• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML
• Granularity is at the level of the element, not the entire schema as with XML DTDs
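A small Python sketch can make the "entities and properties, expressed in XML" point concrete. It builds an RDF/XML description of one resource with Dublin Core properties, using the standard library's `xml.etree.ElementTree`. The function name `describe` is hypothetical, and the resource and title are simply borrowed from the DCMI homepage for illustration; this is one possible serialization, not a complete treatment of the RDF syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the familiar prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(resource_uri, properties):
    """Serialize (property name, literal value) pairs about one resource as RDF/XML."""
    root = ET.Element("{%s}RDF" % RDF)
    # The resource (entity) is identified by its URI in rdf:about.
    desc = ET.SubElement(
        root, "{%s}Description" % RDF, {"{%s}about" % RDF: resource_uri}
    )
    # Each property is its own namespaced element, so vocabularies can be
    # mixed element by element rather than schema by schema.
    for name, value in properties:
        ET.SubElement(desc, "{%s}%s" % (DC, name)).text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe("http://dublincore.org", [("title", "DCMI Homepage")])
print(rdf_xml)
```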

RDF Components

• RDF Model and Syntax WG
  – Formal data model
  – Syntax for interchange of data
• RDF Schema (RDFS)
  – Type system (schema model)

RDF Schemas

• Declaration of vocabularies
  – properties defined by a particular community
  – characteristics of properties and/or constraints on corresponding values
• Schema Type System - Basic Types
  – Property, Class, SubClassOf, Domain, Range
  – Minimal (but extensible) at this time
  – minimize significant clashes with typing system designed for XML Schema WG

• Expressible in the RDF model and syntax

RDF: In Summary

• RDF Metadata transmission
  – Embedded (e.g. <META>), transmitted with resource (HTTP), or from a trusted 3rd party
• RDF Data Model
  – Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources
• RDF Schema
  – Declare, define, reuse vocabularies

Unresolved Issues Concerning RDF and XML Schemas

• RDF Schemas and XML Schemas have overlapping functionality
  – XML Schemas provide strong data typing, but also support semantic specifications
  – RDF is focused on semantic data model and extensible namespace management
• Resolution of overlap and market acceptance will determine the future of each
• The Semantic Web Activity in the W3C is chartered to address such issues: http://www.w3.org/2001/sw

A Few Strategic Projects

Open Archives Initiative http://www.openarchives.org

• Protocols to support alternative scholarly publishing solutions:

• Federated repositories for:
  – ePrints
  – Libraries
  – Publishers

• OAI archives may contain full text or surrogates (metadata)

• Metadata harvesting protocols
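To give a feel for what a harvesting request looks like, the Python sketch below builds the URL a harvester might send to a repository. The slides do not specify the protocol details: the `verb` and `metadataPrefix` parameter names and the `oai_dc` prefix are taken from the published OAI protocol for metadata harvesting and should be checked against the version in use, and the repository address is a made-up placeholder. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "http://archive.example.org/oai"


def harvest_url(base_url, verb, **params):
    """Build a harvesting request URL from a verb and optional arguments."""
    query = {"verb": verb}
    query.update(params)
    # urlencode keeps the insertion order of the dictionary.
    return base_url + "?" + urlencode(query)


# Ask for all records in the shared Dublin Core format.
url = harvest_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```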

OAI Metadata

OAI archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle.

However, interoperability depends on a shared format for exchanging metadata, and therefore archives should implement the basic Open Archives Metadata Set.

OAI Metadata Solutions

• Adoption of unqualified Dublin Core Element Set as required metadata.
• Support for parallel metadata sets maintained
  – EPMS (e-print community)
  – Others
    • Research library community
    • Museum community

Renardus Project (EU)

• http://www.konbib.nl/coop/reynard
  – National libraries (Netherlands coordinates)
  – NDR: National Digital Resource in UK
  – Die Deutsche Bibliothek

• Goal: integrated access to subject gateways in Europe

• High-level agreement on simple, Dublin-Core-based schema as common denominator

Networked Digital Library of Theses and Dissertations (NDLTD)

• http://www.ndltd.org
• International consortium of projects putting dissertations online
• NDLTD agreement on a small Dublin-Core-based set of metadata elements with extensions to support application-specific needs

• http://www.ndltd.org/standards/metadata/current.html

PRISM: Publishing Requirements for Industry Standard Metadata

• PRISM XML metadata standard for syndicating, aggregating, post-processing and multi-purposing content from magazines, news, catalogs, books and mainstream journals.

• Uses DC and its relation types as the foundation for its metadata

• Adobe, Time, Inc, Getty Images, Conde Nast, Sotheby’s, Interwoven….

• http://www.prismstandard.org

Rich Site Summary (RSS)

http://purl.org/RSS

• Metadata for content syndication (news feeds)

• Used in developing media content portals

• Built on established vocabularies (DC), using RDF syntax

• Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.

For further information....

• "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
  – Critical overview (with expert commentary) on the metadata landscape as it evolves
  – Related database of individual activity reports
• D-Lib Magazine, http://www.dlib.org/dlib/
• Ariadne, http://ariadne.ac.uk
• DCMI Homepage, http://dublincore.org

DC-2001

• DC-2001 in Tokyo
  – October 22-26, 2001
• Three tracks:
  – Technical working group meetings
  – Implementation reports and research papers
  – General introduction and tutorials for non-experts

How to Participate

• Join the DC-General mailing list
• Join a working group
• Create a working group
• Information on lists and working groups is available at http://dublincore.org