1 The NSDL: A Case Study in Interoperability William Y. Arms Cornell University.

Post on 27-Dec-2015


1

The NSDL: A Case Study in Interoperability

William Y. Arms

Cornell University

2

Acknowledgement and Disclaimer

The NSDL is a program of the National Science Foundation's Directorate for Education and Human Resources, Division of Undergraduate Education.

The NSDL Core Integration is a collaboration between the University Corporation for Atmospheric Research (Dave Fulker), Columbia University (Kate Wittenberg), and Cornell University (Bill Arms).

The ideas discussed in this talk do not represent the official views of the NSF or the Core Integration team.

3

Research Funding: Europe and USA

Europe

A grant is awarded to carry out the research plan specified in the proposal.

USA

A grant is awarded to carry out research in the area described in the proposal, but the work is not expected to follow the precise plan.

4

New Initiatives during a Grant

Program            Activity                    University
Gigabit testbed    Mosaic                      Illinois
CSTR               Lycos                       Carnegie Mellon
DLI-1              Google PageRank             Stanford
DLI-2              Open Archives Initiative    Cornell

Examples of significant activities, partially funded by these programs, that were not envisaged in the original proposals.

5

NSF-funded Research Programs

[Diagram with labeled stages: NSF, Solicitation, Proposals, Research, New ideas (shown twice)]

6

The NSDL Program

NSF's objective

Build a comprehensive digital library for all aspects of science education

NSF's approach

Solicitation encouraged wide diversity of proposals divided into general categories

Best 60+ proposals funded -- more to follow

Grants allow projects flexibility

Result

A splendid set of projects

A challenge in interoperability!

7

NSDL Collections Funded by the NSF (a) Focused collections

8

9

10

11

NSDL Collections Funded by the NSF (b) Aggregates and federations

12

13

14

15

NSDL Service Projects Funded by the NSF

16

17

18

19

NSDL Core Integration Team Funded by the NSF

20

Responsibility without Authority

Core Integration

Budget        $4-6 million
Staff         25-30
Management    Diffuse

How can a small team, without direct management control, create a very large-scale digital library?

21

How Big might the NSDL be?

All branches of science, all levels of education, very broadly defined:

Five year targets

1,000,000 different users

10,000,000 digital objects

10,000 to 100,000 independent sites

22

Collections

The NSDL program funds only a fraction of the relevant collections.

23

Every Collection is Different

24

The Core Integration Task ...

... to provide a coherent set of services across great diversity.

25

A Spectrum of Interoperability

26

Approaches to interoperability

The conventional approach

Wise people develop standards: protocols, formats, etc.

Everybody implements the standards.

This creates an integrated, distributed system.

Unfortunately ...

Standards are expensive to adopt.

Concepts are continually changing.

Systems are continually changing.

Different people have different ideas.

27

Interoperability is about agreements

Technical agreements cover formats, protocols, security systems, etc., so that messages can be exchanged.

Content agreements cover the data and metadata, and include semantic agreements on the interpretation of the messages.

Organizational agreements cover the ground rules for access, for changing collections and services, payment, authentication, etc.

The challenge is to create incentives for independent digital libraries to adopt agreements.

28

Function versus cost of acceptance

[Chart: axes "Function" and "Cost of acceptance"; regions labeled "Many adopters" and "Few adopters"]

29

Example: textual mark-up

[Chart: axes "Function" and "Cost of acceptance"; plotted items: SGML, ASCII, HTML, XML]

30

Example: security

[Chart: axes "Function" and "Cost of acceptance"; plotted items: Public key infrastructure, IP address, Login ID and password]

31

Levels of interoperability

Level         Agreements                                   Example
Federation    Strict use of standards                      AACR, MARC, Z39.50
              (syntax, semantic, and business)
Harvesting    Digital libraries expose metadata;           Open Archives
              simple protocol and registry                 metadata harvesting
Gathering     Digital libraries do not cooperate;          Web crawlers
              services must seek out information           and search engines

32

Metadata Strategy

Metadata is expensive

The NSDL cannot afford to create it manually

33

Metadata Strategy

• Support eight standard formats

• Collect all existing metadata in these formats

• Provide crosswalks to Dublin Core

• Expose records in the metadata repository for others to harvest

• Concentrate on collection-level metadata

• Use automatic generation to augment item-level metadata
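As an illustration of the crosswalk step above, here is a minimal sketch of mapping a record from one source format to unqualified Dublin Core. The fifteen element names are those of the Dublin Core element set; the source field names (`name`, `author`, `keywords`, `abstract`, `url`) and the function `to_dublin_core` are hypothetical, chosen only for this example, and do not describe any of the eight formats the NSDL supports or the Core Integration team's actual crosswalks.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk: source field name -> Dublin Core element.
# These source field names are illustrative assumptions only.
EXAMPLE_CROSSWALK = {
    "name": "title",
    "author": "creator",
    "keywords": "subject",
    "abstract": "description",
    "url": "identifier",
}


def to_dublin_core(record, crosswalk):
    """Translate one source record (a dict) into Dublin Core.

    Every Dublin Core element is repeatable, so each value is a list.
    Source fields with no entry in the crosswalk are dropped, which is
    the usual cost of mapping a rich format onto a simple one.
    """
    dc = {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            continue  # no mapping for this field: information is lost
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc


record = {
    "name": "Plate Tectonics Tutorial",
    "author": ["A. Example", "B. Example"],
    "grade_level": "9-12",  # has no Dublin Core mapping in this crosswalk
    "url": "http://example.org/item/1",
}
dc_record = to_dublin_core(record, EXAMPLE_CROSSWALK)
```

In this sketch the `grade_level` field is silently dropped. That loss is one reason a repository might keep the original record alongside the Dublin Core version rather than replacing it.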

34

The Metadata Repository

[Diagram: Users, Services, Metadata repository, Collections]

The metadata repository is a resource for service providers.

It holds information about every collection and item known to the NSDL.

35

Services Strategy

36

The Metadata Repository as a Resource

Records are exposed through the Open Archives Initiative harvesting protocol.

The Core Integration team will provide some services based on the metadata repository.

The architecture encourages others to build services.
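To make the harvesting idea concrete, here is a rough sketch of how an outside service might build a harvesting request and read the records that come back. It assumes the conventions of version 2.0 of the Open Archives Initiative Protocol for Metadata Harvesting (the `ListRecords` verb, the `oai_dc` metadata prefix, and resumption tokens for paging); the protocol version in use at the time of this talk may have differed. The base URL and the helper names `list_records_url` and `parse_list_records` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL.

    When continuing a partial list, the resumption token is sent as
    the only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append((identifier, titles))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)  # an empty token means the list is complete


# Hypothetical base URL, for illustration only.
first_request = list_records_url("http://example.org/oai")
```

A harvester would fetch `first_request`, pass the response body to `parse_list_records`, and repeat with the returned token until the token is `None`.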

37

Example: Search Service

[Diagram: three Portals; Search and Discovery Services; Metadata repository; Collections; connections labeled SDLIP, OAI, and http]

James Allan, Bruce Croft (University of Massachusetts, Amherst)

38

Research Challenges:

Extending the Architecture to Support Greater Riches

Federations with rich sets of agreements (e.g., MARC, Z39.50)

Rich object models (e.g., interactive, dynamic, continuous time)

Language tools (e.g., thesaurus, gazetteer)

... and Lesser Riches

Web crawling

Automated quality control