IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl...


IFLA/DELOS/NSF Workshop: Standards and Metadata

EVA 2000 Moscow, November 2, 2000

Thomas Baker, GMD
Carl Lagoze, Cornell Univ.

Introductions

• Thomas Baker
– GMD Library, Bonn, Germany
– Dublin Core Executive Committee
– EU DELOS Network of Excellence

• Carl Lagoze
– Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA
– Dublin Core Advisory Committee
– NSF Digital Library Initiative

Workshop Roadmap

• Introduction to Metadata (30 min.)
• Dublin Core Metadata Initiative (60 min.)
Break
• Simplicity and Complexity (45 min.)
• Metadata Infrastructure (45 min.)
Lunch
• Deploying and Using Metadata (90 min.)
• Metadata Landscape (30 min.)

Introduction to Metadata

EVA 2000 Moscow


Haven’t we done metadata already?


What’s wrong with this model?

• Expensive
– Complex (even for its original goal?)
– Professional intervention (assumes single community of expertise)

• Monolithic
– One-size-fits-all approach
– Reflects its centralized system origins

• Bias towards physical artifacts
– Fixed resources
– Incomplete handling of resource evolution and other resource relationships


Internet Commons includes Multiple Communities

[Diagram: the Internet Commons at the center, shared by multiple communities: Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]


Web Challenge to Traditional Cataloging

• Scale

• Permanence

• Authenticity

• Organizational Context

• Variety


State of the Web as an Information System

• Search systems are motivated by advertising
• Index coverage is unpredictable and limited (1/3)
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• What about versions, editions, back issues?
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing Intellectual Property Rights is hard


Metadata: Part of a Solution

• Structured data about data
– helps to impose order on chaos
– enables automated discovery/manipulation

• Variety across several dimensions:
– specialization
– decentralization
– democratization


Metadata Takes Many Forms

• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description

Metadata Challenges

• Accommodate multiple varieties of metadata
• Tension: functionality and simplicity
• Tension: extensibility and interoperability
• Human and machine creation and use
• Community-specific functionality, creation, administration, access


Warwick Framework: Containing Chaos

• Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)

• Supports the specification, collection, encoding, and exchange of modular metadata

• Provides context for metadata efforts (including Dublin Core)
– avoids the “black-hole” of comprehensive element sets
– focuses interoperability issues at package level


Modularization Allows Distributed Management

• Communities of expertise (not software vendors) are responsible for:
– Semantics
– Registration
– Administration
– Access management
– Authority of data
– Sharing and Distribution


Interoperability requires conventions about:

• Semantics
– The meaning of the elements

• Structure
– human-readable
– machine-parseable

• Syntax
– grammars to convey semantics and structure

Dublin Core Metadata Initiative

EVA 2000 Moscow


History of the Dublin Core

• 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?"
• 1995: The Dublin Core: 13 elements, later 15
• 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework")
• 1997: "WF needs formal expression in a Resource Description Framework (RDF)"
• 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core

A pidgin for digital tourists

• Metadata is language.
• Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains.
• Speakers of different languages naturally "pidginize" to communicate
– E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...)

• We are all "tourists" on the global Internet.

EVA 2000 A grammar of Dublin Core

• http://www.dlib.org/dlib/october00/baker/10baker.html

• By design not as subtle as mother tongues, but easy to learn and extremely useful in practice

• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)

• Simple grammars: sentences (statements) follow a simple fixed pattern...


Example Dublin Core statements

• Resource has Title 'Grammar of Dublin Core'.
• Resource has Creator 'Tom Baker'.
• Resource has Subject 'Metadata'.
• Resource has Relation http://foo.org/file.htm.


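[Diagram: the pattern of a Dublin Core statement]

Resource has property X

• Resource: implied subject
• has: implied verb
• property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...), with an [optional qualifier]
• X: property value (an appropriate literal), with an [optional qualifier]
• qualifiers act as adjectives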


The fifteen special nouns (properties)

Creator Title Subject

Contributor Date Description

Publisher Type Format

Coverage Rights Relation

Source Language Identifier


Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]

Resource has Subject "Languages -- Grammar" [qualifier: LCSH]


Dumb-Down Principle for qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Like saying that nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
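
The dumb-down principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Statement` class, its field names, and the `dumb_down` function are hypothetical and not part of any Dublin Core specification or library.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen Dublin Core elements named in this workshop.
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date", "Description",
    "Publisher", "Type", "Format", "Coverage", "Rights", "Relation",
    "Source", "Language", "Identifier",
}


@dataclass(frozen=True)
class Statement:
    """Hypothetical model of 'Resource has <element> <value>' plus qualifiers."""
    element: str                      # one of the fifteen elements
    value: str                        # an appropriate literal
    refinement: Optional[str] = None  # e.g. "Revised"
    scheme: Optional[str] = None      # e.g. "ISO8601" or "LCSH"


def dumb_down(stmt: Statement) -> Statement:
    """Ignore every qualifier and keep only the element and its value."""
    if stmt.element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {stmt.element}")
    return Statement(stmt.element, stmt.value)


qualified = Statement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601")
print(dumb_down(qualified))
```

A consumer that does not recognize "Revised" or "ISO8601" still ends up with a usable statement: Resource has Date "2000-06-13".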


Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]

Resource has Subject "Languages -- Grammar" [qualifier: LCSH]

To test whether qualifiers are "good", cover them with your hand and ask:
-- Does the statement still make sense?
-- Is it still correct?

Element Refinements

• Make the meaning of an element narrower or more specific.
– a Date Created versus a Date Modified
– an IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it.


Value Encoding Schemes

• Says that the value is
– a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
– a string formatted in a standard way (e.g., "2000-05-03" means May 3, not March 5)

• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
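
As a small illustration of why a declared encoding scheme matters, the sketch below parses the date string from the slide as a year-month-day value using Python's standard library. The function name `parse_iso_date` is an assumption made for this example.

```python
from datetime import date


def parse_iso_date(value: str) -> date:
    """Parse a complete YYYY-MM-DD date string."""
    return date.fromisoformat(value)


d = parse_iso_date("2000-05-03")
print(d.month, d.day)  # month 5, day 3: May 3, not March 5
```

Software that does not know the scheme can still treat "2000-05-03" as a plain string for discovery; software that does know it gains an unambiguous date.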


Peer review of proposals for new terms

• DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements)

• Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?)

• Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete

• First qualifiers "recommended" in July 2000
• http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm


Open questions in Dublin Core

• What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching?

• How can DCMI control the evolution of Dublin Core as it is adapted in practice?

• How can an application use DC as a pidgin while describing resources with more complex metadata?

• Can we keep the Core simple?


Search buckets versus description

• Think of DC elements as fuzzy search buckets
– Different types of data appropriate for different buckets: URLs, date strings, word strings, names
– Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets

• Search bucket: for discovering resources
• But general, fuzzy categories may not be sufficient for describing resources
– After searching, display more detailed descriptions on screen


DCMI broadens its mission (Oct 2000)

• The mission of the DCMI is to make it easier to find resources using the Internet through the following activities:
– Developing metadata standards for discovery across domains (example: the Dublin Core)
– Defining frameworks for the interoperation of metadata sets
– Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2


A context for the Core

• If "the Dublin Core" is the core of DCMI, what is the surrounding context?

• If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language?

• How do pidgins relate to more complex models or "application profiles"?

• Do we need pidgins for describing other things, such as "people" and "events"?


Using DC with other vocabularies

• Specialized application profiles [government information, education, mathematics] may need to:
– Use general-purpose Dublin Core elements
– Use elements from another, more domain-specific standard
– Narrow standard definitions of DC elements for specific local uses
– Invent local elements outside the scope of existing standards


Example: adapting DC:Title to local uses

• As defined in the official Dublin Core "namespace":– "Title: A name given to the resource"

• As defined in a UK "application profile":– "Title: A name given to the collection"

•Definition is narrower


Namespaces in translation

• Dublin Core has been translated into 26 languages
– machine-readable tokens are shared by all
– human-readable labels are defined in different languages
– translations are distributed, maintained in many countries


One token - labels in many languages

[Diagram: the single token dc:creator carries an rdfs:label in each language]

• rdfs:label “Verfasser” [Server in Germany]
• rdfs:label “Pencipta” [Server in Jakarta]
• rdfs:label “Creator” [DCMI Server]
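
The idea of one shared token with many labels can be sketched as a lookup table. This is a toy, in-memory stand-in for the distributed servers in the diagram; the `LABELS` table, the language codes, and `label_for` are assumptions made for illustration.

```python
from typing import Dict

# Hypothetical registry: one machine-readable token,
# one human-readable label per language code.
LABELS: Dict[str, Dict[str, str]] = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}


def label_for(token: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback label."""
    labels = LABELS[token]
    return labels.get(lang, labels[fallback])


print(label_for("dc:creator", "de"))
print(label_for("dc:creator", "fi"))  # no Finnish label stored, so the English one is used
```

Software exchanges only the token `dc:creator`; the label is chosen at display time.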


RDF -- a more powerful sentence pattern

• Dublin Core statements:
– Resource has Creator "Tom Baker".
– Resource has Identifier http://foo.org/bar.html.

• Resource Description Framework "triples" - a more powerful way to say the same thing:
– http://foo.org/bar.html has Creator "Tom Baker".

DCMI Re-organization

• Expanded mission
– Core metadata elements for Agents (or Events)?
– Frameworks for integrating multiple standards

• Re-organization model
– Membership organization like W3C or Unicode Consortium?
– Retain open consensus model
– International perspective
– Better training, documentation, outreach


DCMI Open Metadata Registry

• Managing vocabularies defined by the DCMI
– Languages
– Versioning
– Controlled vocabularies

• Foundation for modular, incremental integration and evolution

• Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan

• http://wip.dublincore.org/registry/


Official recognition of the Dublin Core

• CEN Workshop Agreement
– endorse Dublin Core elements as CWA 13874
– provide usage guidelines for European industry

• NISO Z39.85
– National Information Standards Organization, an ANSI affiliate
– Balloting concluded in August 2000


DCMI Activities

• Standards development and maintenance

• Metadata registry
• Technical working groups and periodic workshops
• Tutorial materials and user guides
• Education and training
• Access to software
• Liaisons with other standards or user communities


DC-9 Workshop in Tokyo, 2001

• DC-8 Workshop was at the National Library of Canada (Ottawa)
– emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core

• DC-9 in Tokyo: well-defined tracks
– implementation reports and research papers
– ongoing technical working group meetings
– general introduction and tutorials for non-experts

Simplicity and Complexity

EVA 2000 Moscow

Warwick Framework

• Container/Package approach to metadata
• Rejection of universal ontology
• Recognition of individual community needs
• Provide scope for metadata efforts


Warwick Framework Design

Containers for aggregating Packages of typed metadata sets

[Diagram: a Container holding four Packages]
• Package: Dublin Core
• Package: MARC Metadata
• Package: Indirect Reference (URI)
• Package: Terms and Conditions


Warwick Framework Implementation and Research

• Packaging, linking, storing, and transmitting component/package framework

• Semantic interactions and interoperability among multiple metadata packages/vocabularies


Interoperability among Metadata Vocabularies

[Diagram: ABC core classes at the center, connected to Dublin Core, MARC, INDECS, and IMS]

Harmony Project

• Project Investigators
– Dan Brickley - ILRT, Bristol (U.K.)
– Jane Hunter - DSTC, Brisbane (Australia)
– Carl Lagoze - Computer Science, Cornell (U.S.)

• More Information
– http://www.ilrt.bris.ac.uk/discovery/harmony/


Attribute/Value approaches to metadata…

The playwright of Hamlet was Shakespeare

Hamlet has a creator Shakespeare
• Hamlet: subject
• has a: implied verb
• creator: metadata noun
• Playwright: metadata adjective
• Shakespeare: literal

[Diagram: resource R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”]


…run into problems for richer descriptions…

The playwright of Hamlet was Shakespeare, who was born in Stratford

Hamlet has a creator Shakespeare
Hamlet has a creator [birthplace] Stratford

[Diagram: resource R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]


…because of their failure to model entity distinctions

[Diagram: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
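
The difference between the flat attribute/value record and the entity-aware model can be sketched with plain Python data. The node names R1 and R2 come from the slides; the `objects` helper and the tuple layout are assumptions made for this illustration, not an RDF library.

```python
# Flat attribute/value record: every value hangs off the work itself,
# so the birthplace is formally a property of Hamlet's record.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity-aware model: the creator is a distinct node, R2.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject: str, predicate: str) -> list:
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


creator = objects("R1", "creator")[0]
print(objects(creator, "birthplace"))  # the birthplace belongs to R2
print(objects("R1", "birthplace"))     # the work itself has no birthplace
```

In the second model a query can ask where the creator was born without ever attributing a birthplace to the play.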


Applying a Model-Centric Approach

• Formally define common entities and relationships underlying multiple metadata vocabularies

• Describe them (and their inter-relationships) in a simple logical model

• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.


Applications of the ABC Model

• Guidance for communities developing vocabularies

• Foundation for understanding existing vocabularies

• Basis for mappings among vocabularies using formalisms such as RDF

Harmony/ABC Workshop

• January 27-28 2000 CNI Washington
• Representatives from
– Dublin Core, INDECS, MPEG-7, IFLA
– Archives, Museums, Libraries, Audiovisual

• Result: Importance of processes, events, and states in understanding and describing resources


Conceptual Basis: Evolution of Content over Time

IFLA Entity Model

From Bearman et al., D-Lib Magazine, January 1999.


Events help metadata relationships?

• Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model)

• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

• Clarifying attachment points facilitates mapping across common entities in different vocabularies.


Content, Events, & Descriptions

[Diagram: resources R1, R2, R3, R4; events E1, E2, E3, E4; descriptions desc1 and desc2]

EVA 2000 ABC Event Model


A Simple Example: Live At Lincoln Performance

• Performance at The Lincoln Center for the Performing Arts
• On April 7, 1998 at 8pm Eastern time
• Orchestra is New York Philharmonic
• Musical score – “Concerto for Violin”
• 130 minute MP3 audio recording
• Rights held by Lincoln Center

EVA 2000 Example in ABC Model

Derivation of Multiple Views

[Diagram: an ABC Description in XML, shown together with the views derived from it]
• CIDOC CRM Model
• ID3 tags embedded in MP3
• MPEG-7 description in DDL
• Dublin Core in XML/RDF


Step 1 – Structural Mapping

Event-aware model

Resource-centric model

Structural Mapping Rules

Event attributes transferred to output:
• Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance
• Act/Role -> Agent.Role, e.g. Orchestra
• Event Type -> Relation between input & output, e.g. Performance -> Relation.isPerformanceOf
• Output Description generated from event Type and input Title, e.g. “Performance of Concerto for Violin”
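
The mapping rules above can be sketched as a small function that flattens an event-aware description into a resource-centric record. The dictionary layout of the event, the key names, and the date and time encodings are assumptions made for this illustration; the project itself discusses XSLT and Java for such transformations.

```python
def event_to_resource(event: dict, input_title: str) -> dict:
    """Transfer event attributes onto a flat description of the output resource."""
    etype = event["type"]  # e.g. "Performance"
    record = {
        # Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
        f"Date.{etype}": event["context"]["date"],
        f"Time.{etype}": event["context"]["time"],
        f"Place.{etype}": event["context"]["place"],
        # Event Type -> relation between input and output
        f"Relation.is{etype}Of": input_title,
        # Description generated from event type and input title
        "Description": f"{etype} of {input_title}",
    }
    # Act/Role -> Agent.<Role>
    for act in event["acts"]:
        record[f"Agent.{act['role']}"] = act["agent"]
    return record


performance = {
    "type": "Performance",
    "context": {
        "date": "1998-04-07",
        "time": "20:00",
        "place": "The Lincoln Center for the Performing Arts",
    },
    "acts": [{"role": "Orchestra", "agent": "New York Philharmonic"}],
}

record = event_to_resource(performance, "Concerto for Violin")
print(record["Description"])
```

The event itself disappears from the output; only its attributes survive, attached to the recording.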


Step 2 – Semantic Mapping

EVA 2000 XSLT for Transformations

• Works well for structural and syntactic mapping between metadata descriptions

• Semantic mappings need to be hardcoded• Unsuitable for loosely constrained or

variable input

EVA 2000 A More General Solution

• Flexible semantic mappings require additional knowledge:– Metadata Term Ontology – MetaNet

• Methods for using that context knowledge for mapping– Some combination of procedural language

(Java) and XSLT– Investigating more general mapping rule

language (analogies to compiler technology)


Planned Experimental Context

• CIMI Experiments– Dublin Core for basic resource descriptions– Richer descriptions derived from ABC model– Mapping among descriptions– Understanding relationship between ABC

and CIDOC CRM

• Connecting with Recordkeeping Metadata Issue - SPIRT Project

Metadata Infrastructure

EVA 2000 Moscow

EVA 2000 Metadata is language

• Metadata schemas are languages for making statements about resources:– Book has Title "Gone with the Wind".– Web page has Publisher "Springer

Verlag".

• Vocabulary terms (elements) are defined in standards like Dublin Core

• Metadata grammars constrain the statements and data models one can form


But languages evolve with use

• Inevitably, languages resist stability

• People stretch official definitions
• Implementers misunderstand the intended meaning or use of elements
• Implementers coin local terms and extensions
• If the application does not fit the standard, the standard is often "customized" to fit the application


Metadata languages are "multilingual"

• Metadata is not a spoken language• The words of metadata -- "elements" --

are symbols that stand for concepts expressible in multiple natural languages

• Standards may have dozens of translations

• Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?


What metadata languages lack

• Comprehensive dictionaries – Where can one get an overview of

vocabulary terms used in metadata languages?

• A publication context for implementers– Where can you see how they are using

metadata?

• Standard grammars– How do we understand the principles of

metadata?


Can we manage this evolution?

• How can we (scalably) monitor the usage of a language that is:– Never spoken?– Rarely published in a way that can be

harvested?

• How can dictionary editors help a metadata language evolve and grow in response to usage?

• How can this evolution occur across (human) languages?


RDF Schemas (RDFS) -- W3C standard

• A dictionary format for metadata terms:– Simple XML format for terms and definitions

• Example: "Title" (Dublin Core)– Human-readable label and definition:

• Title: A name given to the resource.

– Unique, machine-readable identifiers• dc:title

• Support for cross-references– between terms in related standards– between local adaptations and related standards

EVA 2000 Print world versus the Web

• Traditional print world– Standards are currently defined and published as

paper documents or Web pages in HTML– Metadata implementors rarely publish their

local extensions and adaptations

• RDF Schemas (RDFS)– Web-based publication format– Explicit cross references from implementation

schemas and the standards on which they are based


EOR -- an RDF Schema Browser

• Harvests RDF Schemas
– Schemas distributed on multiple Web servers
– Creates huge database of schemas for searching
– Web interface functions as a "metadata browser"
– Click on cross-references between linked terms

• Downloadable as open source software
– http://eor.dublincore.org/index.html
– Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and Tod Matola


Hyperlink Metadata Terms over the Web

• Index of metadata terms searchable as one huge database

• Click on cross-references to follow term-to-term links between vocabularies

• Point-to-point, like the Web itself– In 1992, Gopher located the right file within

directory trees (but not points within the file)– HTML enabled point-to-point links between

documents


"Editor" -- a MARC relator -- refines "Contributor"


Follow the link to MARC Relator Terms


...the source of which looks like this:


...or to Contributor [here, in English, French, German]


Or view the schema of MyRDF itself...


...itself an RDF schema like the others


Registries can function as dictionaries

• Historically, dictionaries of English, French, etc: recorded variants, prescribed forms, and helped standardize (national) languages

• Metadata dictionaries can help metadata vocabularies evolve more like other human languages– Not just top-down, like traditional

standards– Also bottom-up, in response to usage


Dictionaries prescribe and describe

• Prescribe definitions and recommend usage

• Describe how terms are actually used– Monitor usage through collecting

examples

• Editors and usage boards must strike a balance between prescription and description.


SCHEMAS Project -- a Thin Registry

• http://www.schemas-forum.org, an EU Project
• Pointers to resources elsewhere (a "thin" registry or portal)
• Short descriptions of metadata standards activities
• Critical commentaries by domain experts
• Promote the publication of schemas (in RDF)
• Goal: help implementors discover how others (e.g. EU Projects) are using standards in order to harmonize usage


DCMI -- a Thick Registry

• A thick registry: stores official metadata element definitions in a central database or repository

• Managing a namespace (as a standards agency): publish qualifiers as available, with version control– Managing translations of the standard in multiple

languages

• Eventually:– User guide interface– Support for standardisation processes (peer review)– Downloadable input to software tools for generating,

editing, validating DC metadata


Dictionaries as a tool for harmonization

• Knowledge of how other projects are using standards will avoid "reinventing the wheel"

• To help information providers harmonize their schemas for improved access within domains:– Between countries (Nordic Metadata Project)– Preprint repositories (Open Archives Initiative)– Subject gateways (Renardus)– Theses and dissertations (NDLTD)– Mathematics and physics (MathNet, PhysNet)


A global registry infrastructure?

• Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web

• Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries

• Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?

EVA 2000 The scope of registries

• Anything "semantic" (terms and definitions) is potentially an RDF schema:– controlled vocabularies– namespaces, application profiles, annotations– the "schema" of the registry itself

• Application constraints can be modelled in XML Schemas– "title is mandatory"; "date must be after 1980"

• Will XML and RDF Schemas merge?

Deploying and Using Metadata

EVA 2000 Moscow


Syntax Alternatives:HTML

• Advantages:– Simple Mechanism – META tags embedded

in content– Widely deployed tools and knowledge

• Disadvantages– Limited structural richness (won’t support

hierarchical,tree-structured data or entity distinctions).

– Limited formalisms (parsing and schema definition)

Dublin Core in HTML

<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
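
To show how little machinery embedded META tags need, here is a minimal sketch that pulls Dublin Core statements out of an HTML fragment using Python's standard `html.parser`. The class name `DCMetaParser` and the tuple layout are assumptions made for this example.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (element, value, scheme) from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if name.upper().startswith("DC."):
            # Strip the "DC." prefix; scheme is None when no qualifier is given.
            self.statements.append((name[3:], a.get("content"), a.get("scheme")))


html = """<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="keywords" content="ignored">"""

parser = DCMetaParser()
parser.feed(html)
parser.close()
print(parser.statements)
```

The flat list of tuples also illustrates the limitation named on the previous slide: there is no place to express nesting or distinct entities.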


Syntax Alternatives:XML

• The standard for networked text and data

• Wide-spread tool support– Parsers (DOM and SAX)– Extensibility (namespaces) – Type definition (XML Schema)– Transformation and Rendering (XSLT)– Rich linking semantics (XLINK)

EVA 2000 XML Schema

• Rich XML-based language for expressing type semantics

• Replaces arcane and limited DTD (origin in SGML)

• Facilities– Data typing (both complex and primitive)– Constraints– Defaults

Dublin Core in XML

<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
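
A record like the one above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and takes the namespace URI exactly as it appears on the slide; the variable names and the dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the slide, not from a published schema.
DC_NS = "http://www.openarchives.org/OAI/dc.xsd"

xml = f"""<metadata xmlns:dc="{DC_NS}">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>"""

root = ET.fromstring(xml)

# ElementTree expands prefixes to "{namespace}localname"; keep the local name.
record = {child.tag.split("}", 1)[1]: child.text for child in root}
print(record["creator"], record["date"])
```

Because the parser resolves the prefix to the namespace URI, a different prefix bound to the same URI would yield the same record.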


Syntax Alternatives:RDF

• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Provides enabling technology for richly-structured metadata
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML

EVA 2000 RDF Components

• Formal data model• Syntax for interchange of data• Schema Type system (schema model)

EVA 2000 RDF Data Model

• Directed labeled graphs• Model elements

– Resource– Property– Value– Statement– Containers

EVA 2000 RDF Model Primitives

ResourceProperty

ValueResource

Statement

RDF Syntax Example

[Diagram: resource URI:R with dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”]

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>


RDF Model Example #2

[Diagram: resource URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “emiller@oclc.org”, and bib:Aff URI:OCLC (“OCLC”)]


RDF Syntax Example #2

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons#">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>emiller@oclc.org</bib:Email>
        <bib:Aff resource="http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>

EVA 2000 RDF Containers

• Permit the aggregation of several values for a property

• Express multiple aggregation semantics– unordered– sequential or priority order– alternative

EVA 2000 RDF Schemas

• Declaration of vocabularies– properties defined by a particular community– characteristics of properties and/or constraints on

corresponding values

• Schema Type System - Basic Types– Property, Class, SubClassOf, Domain, Range– Minimal (but extensible) at this time– minimize significant clashes with typing system

designed for XML Schema WG

• Expressible in the RDF model and syntax


Relationships among vocabularies

dc:Creator

ms:director

marc:100

bib:Author

EVA 2000 Bringing it together

• RDF Metadata transmission– Embedded (e.g. <META>), Transmitted with

resource (HTTP), Trusted 3rd Party (HTTP GET)

• RDF Data Model – Support consistent encoding, exchange and

processing of metadata… critical when aggregating data from multiple sources

• RDF Schema– Declare, define, reuse vocabularies


Open Archives Initiative
http://www.openarchives.org

EVA 2000 What is Interoperability?

• Naming?– Handles– Purls

• Metadata?– MARC– Dublin Core

• Document models?– WebDAV

• Federated searching?– Z39.50?– DASL?

• Services and Protocols?– Dienst

EVA 2000 Partitioning Interoperability

Document Models

Metadata Harvesting

Mediator ServicesLinking, Searching, Summarizing


The World According to OAI

[Diagram: Service Providers (Searching, Current Awareness, Summarization) connected to Data Providers by harvesting]

EVA 2000 UPS Meeting Results

• Establishment of Open Archives Initiative– Loose coalition to experiment with

interoperability solutions

• Santa Fe Convention– Organizational and technical framework to

support metadata harvesting for ePrint archives


Metadata Harvesting is not New

• Harvest Project (1992-1995)
– DARPA-funded
– Mike Schwartz (U. Colorado), Mic Bowman (Penn State), Udi Manber (U. Arizona)

EVA 2000 “Open” Archives

• Political Agenda?– Author self-archiving of E-Prints– “Mission” to reformulate scholarly

publishing framework

• Technical?– Infrastructure to facilitate interoperability

across multiple domains


Other communities of interest

• “Cambridge” digital library federation meetings– research library community has many

materials for which they’d like to ‘expose’ metadata

• San Antonio OAI workshop– librarians, publishers (some), others


Technical Umbrella for Practical Interoperability…

ReferenceLibrariesPublishers

E-PrintArchives

…that can be exploited by different communities

EVA 2000 Acting mission statement

Supply and promote an application independent technical framework – a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them.

Dan Greenstein, Director DLF


What does this REALLY Mean?

• Keep the bar low enough to make widespread adoption possible

• Provide enough back-doors to make true “disruption” possible (e.g., ePrint community):
– refine record notion to mandate full-content connection
– refine metadata to mandate linkage to full-content

EVA 2000 Organizational Stability

• Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)

• Formation of steering committee– first steps towards international

involvement


Framework for Partitioning Tasks

• Steering Committee– policy guidance

• Technical Committee– technical specifications

• Workshops– public dissemination, feedback, community-

building

EVA 2000 Ithaca Technical Meeting

• Input– experiences gained with implementing &

discussing the current SFc specs– emerging interest for the application of

SFc-concepts as a general interoperability framework in a scholarly environment

EVA 2000 Ithaca technical meeting

• Output– guidelines for an in-depth revised technical

spec to be issued early 2001 – stable for experimentation; not definitive– minimize risk for early adopters– maximize chances for future interoperability

across communities


underlying concepts

abstract principles

concrete implementation of principles

Components of OAI Model


service providers

records in an archive

open interface to archives

managed archives (data providers)

OAI Underlying Concepts


metadata harvesting

identifiers

metadata set formats

acceptable use

registration

abstractprinciples

implementationof principle

OAI harvesting protocol

URIs (community schemes)

DC & XML container (parallel sets)

Flow Control (usage restrictions)

(community specific)

Building on Underlying Concepts

EVA 2000 What is a record?

A record in an archive is a metadata-record. The metadata record describes – and can contain an entry point to- full-content.


We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.

Metadata: Interoperability & Extensibility


• Adoption of unqualified Dublin Core Element Set as required metadata.

• Support for parallel metadata sets maintained– EPMS (e-print community)– Others

• Research library community• Museum community

Metadata Solutions

Metadata XML Container

<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http:…">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>

Identifier Issues

• Basic identifier constraints based on URI specifications
– A key for requesting a record from a repository
– Key and metadata format ID uniquely identify a record

• Individual communities may develop URN registration schemes

Identifier Solutions

full-identifier = oai:archive-identifier:record-identifier

Registered URI

Scheme

Archive Identifier:

Registered within OAI

Unique ID within archive:

(syntax is archive-specific)

example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
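The three-part identifier can be taken apart mechanically. The sketch below is one possible reading of the syntax, not a normative parser; the names `OaiIdentifier` and `parse_full_identifier` are invented for this example.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str      # the registered URI scheme, "oai"
    archive_id: str  # archive identifier, registered within OAI
    record_id: str   # unique ID within the archive (archive-specific syntax)


def parse_full_identifier(full_identifier):
    """Split 'oai:archive-identifier:record-identifier' into its parts."""
    # Split on the first two colons only: the record identifier syntax is
    # archive-specific, so it is assumed here that it may itself contain ':'.
    parts = full_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full OAI identifier: {full_identifier!r}")
    return OaiIdentifier(*parts)


parsed = parse_full_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
```

For the slide's example, `parsed.archive_id` is `ncstrl` and `parsed.record_id` is `ncstrl.cornellcs/TR94-1418`.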


Repositories, Identifiers, and Records

Identifier

Datestamp

MF1 MF2 MF3 MF4

<record> <header> … </header> <metadata> …. </metadata> </record>
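One way to picture the relationship between identifiers, metadata formats, and records is a store keyed by the pair (identifier, metadata format). The sketch below assumes an in-memory dictionary; the class `Repository`, its method names, and the stored values are hypothetical, and only the format labels MF1 and MF2 come from the diagram.

```python
class Repository:
    """Toy archive: one identifier, several parallel metadata formats."""

    def __init__(self):
        # (identifier, metadata_format) -> (datestamp, metadata)
        self._records = {}

    def put(self, identifier, metadata_format, datestamp, metadata):
        self._records[(identifier, metadata_format)] = (datestamp, metadata)

    def get_record(self, identifier, metadata_format):
        # The key plus the metadata format ID selects exactly one record.
        datestamp, metadata = self._records[(identifier, metadata_format)]
        return {"identifier": identifier, "datestamp": datestamp, "metadata": metadata}

    def list_metadata_formats(self, identifier):
        return sorted(fmt for (ident, fmt) in self._records if ident == identifier)


repo = Repository()
repo.put("oai:demo:1", "MF1", "1999-12-25", {"title": "Example title"})
repo.put("oai:demo:1", "MF2", "1999-12-25", {"ti": "Example title"})
record = repo.get_record("oai:demo:1", "MF2")
```

The same identifier resolves to different records depending on the format requested, which is how parallel metadata sets can coexist in one archive.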

Selective harvesting

• Recognized need for light-weight facility for selective harvesting
– By Date

• Sets
– A low-cost means of selective harvesting
– NOT a general tool for defining global categories
– Attribution of meanings to sets can be done within communities and in bilateral fashion
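Seen from the data provider's side, selective harvesting amounts to filtering records by datestamp and by set membership. The sketch below assumes an in-memory list of records; the record contents, the set names, and the parameter names `since`, `until`, and `set_name` are invented for illustration and are not taken from the protocol.

```python
from datetime import date

# Hypothetical records; identifiers, dates, and set names are invented.
RECORDS = [
    {"identifier": "oai:demo:1", "datestamp": date(1999, 12, 25), "sets": {"S1"}},
    {"identifier": "oai:demo:2", "datestamp": date(2000, 6, 1), "sets": {"S1", "S2"}},
    {"identifier": "oai:demo:3", "datestamp": date(2000, 10, 15), "sets": {"S2"}},
]


def select_records(records, since=None, until=None, set_name=None):
    """Return identifiers of records matching the date range and the set."""
    selected = []
    for record in records:
        if since is not None and record["datestamp"] < since:
            continue
        if until is not None and record["datestamp"] > until:
            continue
        # Sets are opaque labels here: no meaning is attached to "S1" or "S2".
        if set_name is not None and set_name not in record["sets"]:
            continue
        selected.append(record["identifier"])
    return selected


recent_s2 = select_records(RECORDS, since=date(2000, 1, 1), set_name="S2")
```

The filter treats set names as plain strings, in keeping with the point that sets are a low-cost partitioning device whose meaning is agreed within communities.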

Protocol Solutions

• Normalized and Enhanced Verb Set
– GetRecord
– Identity
– ListIdentifiers
– ListMetadataFormats
– ListRecords
– ListSets

Protocol Solutions

• CGI-script friendly syntax
– baseurl?verb=verbname&argname=argval...
– verbname is the name of the verb
– argname is the name of the attribute
– argval is the value of the attribute

• Example: http://foo/blaz?verb=ListRecords&set=S1
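The request syntax maps directly onto ordinary URL query encoding. The sketch below builds such a request with Python's standard library; the helper name `build_request` is an assumption, while the base URL, verb, and argument come from the slide's example.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **arguments):
    """Compose baseurl?verb=verbname&argname=argval... for a protocol request."""
    # The verb goes first, followed by the arguments in the order given.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


url = build_request("http://foo/blaz", "ListRecords", set="S1")
# url is "http://foo/blaz?verb=ListRecords&set=S1"
```

Because every request is a plain GET with key-value arguments, a data provider can be implemented as a single CGI script.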

Registration Solutions

• Automation through:
– On-line registration of:
  • Archive identifier (uniqueness enforcement)
  • base-url of the archive's OAI protocol implementation
– Identity verb that exposes archive characteristics
– Use of protocol for registration of metadata formats and validity checking

• Registration of service providers is still an open issue
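A registry that enforces unique archive identifiers can be very small. The sketch below is a hypothetical in-memory version; the class `ArchiveRegistry`, its methods, and the example values are assumptions for illustration, not part of any OAI registration service.

```python
class ArchiveRegistry:
    """Toy registry mapping archive identifiers to protocol base URLs."""

    def __init__(self):
        self._base_urls = {}

    def register(self, archive_id, base_url):
        # Uniqueness enforcement: an identifier can be claimed only once.
        if archive_id in self._base_urls:
            raise ValueError(f"archive identifier already registered: {archive_id}")
        self._base_urls[archive_id] = base_url

    def base_url_for(self, archive_id):
        return self._base_urls[archive_id]


registry = ArchiveRegistry()
registry.register("demo", "http://foo/blaz")

try:
    registry.register("demo", "http://example.org/other")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

A service provider could then look up the base URL for the archive named in a full identifier and send protocol requests there.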

Release Schedule

• October 15 – normalized meeting notes distributed to meeting group

• November 1 – beta specification to steering committee and limited distribution

• Early January – stabilization of specification and public meeting

Metadata Landscape

EVA 2000 Moscow

Conferences

• ACM Digital Libraries 2001, San Antonio, June 2001, http://www.dl00.org/

• European Conference on Digital Libraries, Darmstadt, Sep 2001 http://www.ecdl2001.org

• Asian Digital Library Conference, Seoul, December 2000, http://ADL2000.kaist.ac.kr

• Tenth International WWW Conference, Hong Kong, May 2001, http://www10.org


NSF Digital Library Initiative

• Phase I (1994-1998): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies

• Phase II (1999+): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content


Distributed National Electronic Resource (UK)

• A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK

• Uses international standards (e.g., Dublin Core)

• National purchase and licensing agreements for best value to UK education community

• eLib research funding since mid-1990s emphasized incremental improvement of standards and services

Global Info (Germany)

• "The German Digital Library Project"

• Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists

• Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)

European Union

• Fifth Framework Programme, 1998-2002
– several dozen projects with several countries each
– Digital Heritage, Cultural Content
– Interactive Electronic Publishing
– Multimedia Content and Tools

• DELOS Network of Excellence
– http://www.ercim.org/delos/
– Communication within European digital library research community and international networking

MathNet

• German Mathematical Societies index math preprints and home pages of mathematicians
– Encourages use of Dublin-Core-based metadata by distributing free metadata editor; displays hits "with metadata" separately from hits "without metadata"

• International Mathematical Union (IMU) planning international Web service based on German MathNet model

• Seeking international agreement on simple metadata profiles for types of math materials


IMS Global Learning Consortium, Inc.

• Teachers seeking appropriate classroom materials on Web may want to know:
– for which age-group?
– has it already been used successfully in classrooms?
– will it work on my equipment?

• IMS: Rich descriptions of learning resources in a standard record format


Federal Geographic Data Committee

• (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories

• Government, education, and business needs
– Emergency management
– Integrated databases and comprehensive maps
– City planning
– Environmental control


Visual Resources Association

• VRA Core Categories in a two-level model for describing objects such as paintings and buildings

• "Works" described separately from "images" of those works (One-to-One Principle)

• Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software
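The two-level model can be pictured as two record types linked by reference. The sketch below is only an illustration of the One-to-One Principle: the class names and fields are invented and do not reproduce the actual VRA Core Categories.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Description of the object itself, such as a painting or building."""
    title: str
    creator: str


@dataclass
class Image:
    """Description of one visual surrogate of a work."""
    work: Work       # each image record points at exactly one work record
    medium: str      # hypothetical field, e.g. "35mm slide" or "digital file"
    photographer: str


painting = Work(title="Example painting", creator="Example painter")
slide = Image(work=painting, medium="35mm slide", photographer="Example photographer")
scan = Image(work=painting, medium="digital file", photographer="Example technician")
```

Keeping the painter on the work and the photographer on the image avoids conflating the two, at the price of maintaining and linking two kinds of records.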

Nordic Metadata Project

• Cooperation between Scandinavian countries (since circa 1996)

• Pioneered idea of metadata-based distributed index across national boundaries

• NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers

Renardus Project (EU)

• http://www.konbib.nl/coop/reynard
– National libraries (Netherlands coordinates)
– NDR: National Digital Resource in UK
– Die Deutsche Bibliothek

• Goal: integrated access to subject gateways in Europe

• High-level agreement on simple, Dublin-Core-based schema as common denominator


Networked Digital Library of Theses and Dissertations (NDLTD)

• http://www.ndltd.org

• International consortium of projects putting dissertations online

• Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly

• NDLTD agreement on a small Dublin-Core-based set of metadata elements?

CIDOC

• International Council of Museums: object-oriented model (CIDOC) designed for describing multiple entities that may be
– physical (e.g., museum objects)
– conceptual (e.g., works)
– temporal (e.g., historical periods)
– spatial (e.g., places)

• Implies an integrated information space of "encyclopedic" scope

Rich Site Summary (RSS)

• Metadata for content syndication (news feeds)

• Used in developing media content portals

• Built on established vocabularies (DC), uses RDF syntax

• Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.


Moving Picture Experts Group (MPEG)

• MPEG 4: encoding and interacting with audio-visual objects

• MPEG 7: multimedia content description interface for such objects

• MPEG 21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content

More...

• INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions

• DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals

• BSR - Basic Semantic Registry as a universal interlingua of concepts

• GILS - Government Information Locator Service

...and more...

• PDS - Planetary Data System

• IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material

• MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries

• EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)

For further information....

• "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
– Critical overview (with expert commentary) on the metadata landscape as it evolves
– Related database of individual activity reports

• D-Lib Magazine, http://www.dlib.org/dlib/

• Ariadne, http://www.ariadne.ac.uk


Why the Web won

• Tim Berners-Lee's original model was very simple, and it was easy to implement

• Real-world experience with simple HTML led iteratively to better understanding of priorities
– As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple

• Complex standards impose significant costs, especially if legacy data must be converted


Learning from experience

• People are only human: the most perfect language is always subject to interpretation

• By design, metadata languages must allow for innovation and evolution

• Physics and art history, Chinese and Finnish -- different languages will continue in real life

• Likewise, a diversity of metadata languages is inevitable

• Interoperability over "everything" can only be via a simple and general pidgin


thomas.baker@gmd.de