Edward A. Fox fox@vt fox.cs.vt CS DLRL Internet TIC

University Electronic Publishing through Digital Libraries: Courseware, Theses and Dissertations Singapore - Dec. 2002 Edward A. Fox [email protected] http://fox.cs.vt.edu CS DLRL Internet TIC NDLTD CITIDEL NSDL … Virginia Tech,


Page 1: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

University Electronic Publishing through Digital Libraries: Courseware, Theses and Dissertations

Singapore - Dec. 2002

Edward A. Fox
[email protected] http://fox.cs.vt.edu

CS DLRL Internet TIC NDLTD CITIDEL NSDL …
Virginia Tech, Blacksburg, VA, USA

Page 2: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Acknowledgements (Selected)

• Sponsors: ACM, Adobe, IBM, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, …

• Faculty/Staff (now): Boots Cassel, Debra Dudley, Lee Giles, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Manuel Perez, Muhammad Zubair, …

• Students: Fernando Das Neves, Marcos Goncalves, Paul Mather, Ryan Richardson, Priya Shivakumar, Hussein Suleman, Wensi Xi, …

• UNESCO Analytical Survey: Leonid Kalinichenko

Page 3: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Outline

• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

Page 4: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Submission: http://etd.vt.edu

• Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org

Page 5: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

The Networked Digital Library of Theses and Dissertations

www.NDLTD.org

Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative

Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities

Page 6: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

[Diagram: NDLTD linked to Grad Program, IT, Ed. (Tech), Library]

Page 7: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Key Ideas: Networked infrastructure

Scalability

Education is the rationale

University collaboration

Workflow, automation

Authors must submit

Maximal access

PDF, SGML, MM, MARC, DC, URNs, federated search

Standards

8th graders vs. grads

Page 8: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

What led to today’s meeting?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET
• 1994 mtg at VT: std: PDF + SGML + multimedia objects
• 1996 funding by SURA, US Dept. of Education (FIPSE)
• 1997 meetings in UK, Germany, ...
• 1998 – 1st symposium – Memphis (20)
• 1999 – 2nd symposium – Blacksburg (70)
• 2000 – 3rd symposium – St. Petersburg (225)
• 2001 – 4th symposium – Caltech (200)
• 2002 – 5th symposium – BYU; 2003 – Berlin; 2004 – Kentucky

Page 9: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

What are the long term goals?

• 400K US students / year getting grad degrees are exposed / involved

• 200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)

• Dramatic increase in knowledge sharing: literature reviews, bibliographies, …

• Services providing lifelong access for students: browse, search, prior searches, citation links

• Hundreds/thousands of downloads / year / work

Page 10: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Convene Local Planning Group

ETD

Page 11: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Build Local ETD Site

Digital Library

Policies

Inspection/Approval

Workshop/Training

ETD

ETD

Page 12: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

NDLTD

Computer Resources

Research

Literature

Student Prepares Thesis/Dissertation

Page 13: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Student Defends & Finalizes ETD

My Thesis

ETD

Page 14: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Student Gets Committee Signatures and Submits ETD

Signed

Grad School

Page 15: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Graduate School Approves ETD, Student is Graduated

Ph.D.

Page 16: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Library Catalogs ETD, Access is Opened to the New Research

WWW

NDLTD

Page 17: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

National / Regional Projects
• Australia
  • U. New South Wales (lead)
  • U. of Melbourne
  • U. of Queensland
  • U. of Sydney
  • Australian National U.
  • Curtin U. of Technology
  • Griffith U.
• Germany
  • Humboldt University (lead)
  • 3 other universities
  • 5 learned societies: Math, Physics, Chemistry, Sociology, Education
  • 1 computing center
  • 2 major libraries
• OhioLINK: 79 colleges/univs
• Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites
• India
• Korea
• Brazil
• UK (British Library, JISC, Edinburgh)
• UNESCO (especially Latin America, Eastern Europe, Africa)

Page 18: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Some Countries

• Australia
• Belgium
• Brazil
• Canada
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• India (Hyderabad)
• Italy
• Korea
• Mexico
• Netherlands
• Norway
• Russia
• Singapore
• S. Africa (Rhodes U.)
• S. Korea
• Spain
• Sudan
• Sweden
• Taiwan
• UK
• USA

Page 19: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Institutional Members
• British Library
• Cinemedia
• Coalition for Networked Information (CNI)
• Committee on Institutional Cooperation (CIC)
• Consorci de Biblioteques Universitàries de Catalunya
• Diplomica.com
• Dissertation.com
• Dissertationen Online (Germany)
• ETDweb, a Division of Answer4.com
• Ibero-American Science & Technology Education Consortium (ISTEC)
• National Documentation Centre (NDC), Greece
• National Library of Portugal (for all universities)
• OCLC Online Computer Library Center
• OhioLINK
• Organization of American States (SEDI/OAS)
• Southeastern Library Network (SOLINET)
• UNESCO (www.unesco.org/webworld/etd)

Page 20: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Access Possibilities

[Diagram: access through Web search engines, library catalog clients, www.theses.org, www.openarchives.org, and 3rd party services (e.g., UMI) to collections at Virginia Tech, National Library of Portugal, CBUC (Spain), OhioLink, MIT, and national projects (AU, GE, …)]

Page 21: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

ETD-MS

• ETD Metadata Standard
• XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs)
• in part conforming to Dublin Core (DC)
• using UNICODE
• (optionally / later using RDF)
• Well specified relationship with MARC

Page 22: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

NDLTD Members and ETD-MS

• NDLTD members will
  • Share metadata for their ETDs
    • Providing that in either ETD-MS
    • Or if they use a version of MARC locally, work to have that eventually shared in either MARC21 or UNIMARC
  • Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions

Page 23: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Some recent additions

• ETD individuals support
  • http://etdindividuals.dlib.vt.edu:9090
• ETD discussion (e-prints)
  • http://ndltdpapers.dlib.vt.edu:9090
• Conference papers and presentations
  • http://www.ndltd.org/WVUproc.htm
• Marcel Dekker book in publication

Page 24: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

What are plans at VT?

• LOCKSS welcomed us
  • Lots of Copies Keeps Stuff Safe
• MARIAN: harvest, crawl/scrape, fed search
• Metadata crosswalks and format converters
• XML schema for ETDs
• Open Digital Libraries: easy to add services!
  • http://oai.dlib.vt.edu/odl
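A metadata crosswalk of the kind listed above can be sketched as a lookup table from MARC fields to Dublin Core elements. This is a minimal illustration only: the dictionary-based record layout, the `MARC_TO_DC` table, the `marc_to_dc` function, and the sample thesis are all assumptions made here, not part of MARIAN or any NDLTD tool. Real crosswalks also handle subfields, indicators, and repeated fields more carefully.

```python
# Simplified, hypothetical crosswalk: MARC tag (plus optional subfield) -> Dublin Core element.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "260c": "date",        # date of publication (subfield c)
    "520": "description",  # summary / abstract
    "650": "subject",      # topical subject heading
}


def marc_to_dc(marc_fields):
    """Convert a {tag: [values]} MARC-like record into {dc_element: [values]}.

    Fields with no mapping are dropped rather than guessed at.
    """
    dc = {}
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue
        dc.setdefault(element, []).extend(values)
    return dc


# Hypothetical sample record, invented for illustration.
marc = {
    "100": ["Doe, Jane"],
    "245": ["A Hypothetical Thesis on Digital Libraries"],
    "260c": ["2002"],
    "650": ["Digital libraries", "Electronic theses"],
    "999": ["local data"],
}
dc = marc_to_dc(marc)
print(dc)
```

Keeping the mapping in a table rather than in code makes it easy to review and to swap for a different target format.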

Page 25: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Union catalog (OCLC)

• OCLC will expand the OAI data provider on TDs

• Will get data from WorldCat

• Will harvest from all who contact them

• Need DC and either ETD-MS or MARC

• Will have a set for ETDs

Page 26: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Union catalog (VTLS, VT)

• VTLS will enhance search/browse service for ETDs
  • Will harvest from OCLC’s set of ETD records
  • Will receive through other mechanisms, too
  • Will work with MARC-21 and ETD-MS
• VT will continue to offer experimental services

Page 27: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

NUDL (www.nudl.org): Int’l Research Support
• Networked University Digital Library
• Partners: Germany, Mexico (Puebla and Monterrey), Brazil
• Problems: Multilingual search, high performance DLs, requirements/usability, …
• Start with ETDs, then expand to other student works, portfolios, data sets, (CS) courseware, ...

Page 28: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Outline

• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

Page 29: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

CS Teaching Center (CSTC)

• Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units.

• Learners benefit from having well-crafted modules that have been reviewed and tested.

• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.

Page 30: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC
Page 31: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Browsing (2)

Page 32: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC
Page 33: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC
Page 34: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC
Page 35: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

JERIC

• Journal of Educational Resources in Computing

• Accessible from www.cstc.org and www.acm.org

• ACM and SIGCSE support

• Refereed and interactive

• Part of ACM Digital Library

Page 36: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Outline

• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

Page 37: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

www.CITIDEL.org
• Computing and Information Technology Interactive Digital Education Library, an NSDL Collection Track project
• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)
• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)

Page 38: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes

Size of Collection    Number of Collections Identified
1-5 items             100-300
6-100 items           50
101-999 items         20-35
1000+ items           10-25

Page 39: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Multi-dimensional Categorization

[Diagram: three categorization dimensions]
• Language: English, Spanish
• Topic: Java, Multimedia, Algorithms
• Quality: Nominated, Editor reviewed, Identified by crawl, Peer reviewed
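One way to picture the categorization on this slide is as a filter over items tagged along the language, topic, and quality dimensions. The sketch below is only an illustration: the item dictionaries, their titles, and the `filter_by_facets` helper are hypothetical and do not describe how CITIDEL itself stores or queries resources.

```python
def filter_by_facets(items, **facets):
    """Return the items whose value matches on every requested dimension."""
    return [
        item
        for item in items
        if all(item.get(dimension) == value for dimension, value in facets.items())
    ]


# Hypothetical catalog entries, invented for illustration.
items = [
    {"title": "Sorting applet", "language": "English", "topic": "Algorithms", "quality": "Peer reviewed"},
    {"title": "Curso de Java", "language": "Spanish", "topic": "Java", "quality": "Editor reviewed"},
    {"title": "Video tutorial", "language": "Spanish", "topic": "Multimedia", "quality": "Identified by crawl"},
]

spanish = filter_by_facets(items, language="Spanish")
spanish_java = filter_by_facets(items, language="Spanish", topic="Java")
print([item["title"] for item in spanish_java])
```

Because each dimension is independent, a user can narrow by any combination of them without the catalog needing a fixed hierarchy.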

Page 40: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

CITIDEL Collection Sources

[Diagram: sources included in CITIDEL as metadata and/or full text. Labels: JERIC; experts’ finding aids; IEEE-CS, …; CSTC; ResearchIndex (NEC’s data; data processed w. R.I.); ACM (SIGCSE proceedings; ACM DL); Borner’s info viz software repository; NCSTRL]

Page 41: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

CITIDEL Collection Building

[Diagram: collection-building activities (Submitting, Searching, Browsing, Classifying, Nominating, Crawling, Composing, Creating) and tools (VIADUCT, GetSmart, Crawlifier), connected by "thru", "aided by", "after", "using", "or thru", and "include after" relations]

Page 42: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

DIGITAL LIBRARY SERVICES

REPOSITORIES

USER PORTALS

Overview of CITIDEL architecture

Page 43: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Union Metadata Repository

OAI Data Provider

Laboratories Repository

Applets Repository

Papers Repository

Syllabi Repository

. . .

Digital Library Services

OAI Data Harvester

Distributed repository structure

Page 44: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Annotations

OAI Data Harvester

EDUCATORS

ADMINISTRATORS LEARNERS

Multilingual Searching

Revising Annotating Filtering Browsing Administering

Filtering Profiles User Profiles

Union Metadata

OAI Data Provider

Remote and Peer Digital Libraries (e.g., NSDL -CIS)

PORTALS

SERVICES

REPOSITORIES

Digital library architecture for local and interoperable CITIDEL services

Page 45: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Outline

• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

Page 46: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Open Archives Initiative

OAI

www.openarchives.org

[email protected]

Page 47: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

The World According to OAI

[Diagram: Service Providers (Discovery, Current Awareness, Preservation) connected to Data Providers by metadata harvesting]

Page 48: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Technical Umbrella for Practical Interoperability…

Reference Libraries

Publishers

E-Print Archives

…that can be exploited by different communities

Museums

Page 49: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Tiered Model of Interoperability

Mediator services

Metadata harvesting

Document models

Page 50: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

OAI – Black Box Perspective

[Diagram, three layers]
Services: Browse, Summarize, Search, Visualize
Metadata: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7
Docs: DO, DO, DO, DO, DO, DO, DO

Page 51: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Aggregation through OAI Harvesting

Archive

Lite Sites

NCSTRL

Eprints

IEEE-CS, ACM, …

Own: History, ResearchIndex, CSTC, …

CITIDEL

Active

Page 52: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Approaches to Open Archives

Build By Discipline

Build By Institution

Author, Category, Interdisciplinary, Year, Language, Query, …

Page 53: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

OAI Perspective

• Rethink your efforts in terms of providers of
  • Data, Services
• Reduced work for data providers
  • Tools available
  • Don’t need to offer services
• Reduced work for service providers
  • Others provide the data
  • Can use tools and systems for OAI, XOAI
• Results
  • More data becoming available
  • To more people
  • Supported by improved services

Page 54: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

[Diagram labels: repository, OAI protocol, harvester, support data, harvesting data, items]

Page 55: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

selective harvesting - datestamps

[Diagram: a repository holding records; harvest within date range]

Page 56: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

selective harvesting - sets

[Diagram: a repository holding records grouped into sets S1 and S2; harvest within set S1]
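The two forms of selective harvesting shown on these slides (by datestamp and by set) can be sketched against an in-memory repository. This is an assumption-laden illustration, not harvester code from the talk: the `Record` class, the `harvest` function, and the sample identifiers are hypothetical. The date range is treated as inclusive at both ends, as the `from` and `until` arguments are in OAI-PMH.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    identifier: str
    datestamp: date
    sets: frozenset = frozenset()


def harvest(records, from_date=None, until_date=None, set_spec=None):
    """Select records inside an inclusive date range and, optionally, one set."""
    selected = []
    for record in records:
        if from_date is not None and record.datestamp < from_date:
            continue
        if until_date is not None and record.datestamp > until_date:
            continue
        if set_spec is not None and set_spec not in record.sets:
            continue
        selected.append(record)
    return selected


# Hypothetical repository contents, invented for illustration.
repository = [
    Record("oai:example:1", date(2000, 5, 1), frozenset({"S1"})),
    Record("oai:example:2", date(2001, 8, 13), frozenset({"S1", "S2"})),
    Record("oai:example:3", date(2002, 1, 20), frozenset({"S2"})),
]

in_2001 = harvest(repository, from_date=date(2001, 1, 1), until_date=date(2001, 12, 31))
in_s1 = harvest(repository, set_spec="S1")
print([r.identifier for r in in_2001], [r.identifier for r in in_s1])
```

A harvester that remembers the datestamp of its last run can pass it as `from_date` next time and fetch only new or changed records.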

Page 57: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

What is an Open Archive?

• Any WWW-based system that can be accessed through the well-defined interface of the Open Archives Protocol for Metadata Harvesting
• … aka OAI-Compliant Repository
• No implications for:
  • Physical storage of data
  • Cost of data
  • Metadata and data formats
  • Access control to server

Page 58: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Sample OAI Record

<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>
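As a rough illustration, the sample record above can be read with Python's standard XML parser. The `parse_record` helper is hypothetical, and the sketch assumes the simplified, namespace-free form shown on the slide; records returned by real repositories carry XML namespaces that the lookups would need to account for.

```python
import xml.etree.ElementTree as ET

# The sample record from the slide, without namespaces.
SAMPLE = """<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>"""


def parse_record(xml_text):
    """Split a record into its header fields and its Dublin Core elements."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    dc = root.find("metadata/dc")
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "metadata": {child.tag: child.text for child in dc},
    }


record = parse_record(SAMPLE)
print(record["identifier"], record["metadata"]["title"])
```

The header carries what a harvester needs for bookkeeping (identifier and datestamp), while the metadata part carries the description itself.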

Page 59: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Sets

• Protocol mechanism to allow for harvesting of sub-collections

• No well-defined semantics – depends completely on local data providers

• May be defined by arrangement between data providers and service providers

• E.g., Subject areas, years, author names, search queries

Page 60: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Protocol for Metadata Harvesting

• Service Requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • GetRecord
  • ListIdentifiers
  • ListRecords
• Metadata Multiplicity
• Date Ranges
• Resumption Tokens
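The service requests listed above are plain HTTP requests whose arguments travel in the query string. The sketch below builds such URLs. The helper names `oai_request_url` and `resume_url` and the base URL `http://example.org/oai` are placeholders invented here; the argument names (`metadataPrefix`, `from`, `until`, `set`, `resumptionToken`) follow the OAI-PMH specification. No network call is made.

```python
from urllib.parse import urlencode

VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "GetRecord",
    "ListIdentifiers",
    "ListRecords",
}


def oai_request_url(base_url, verb, args=None):
    """Build the URL for one protocol request; args is a dict of protocol arguments."""
    if verb not in VERBS:
        raise ValueError("unknown verb: " + verb)
    params = {"verb": verb}
    params.update(args or {})
    return base_url + "?" + urlencode(params)


def resume_url(base_url, verb, token):
    """Build the follow-up request for a partial list, carrying only the resumption token."""
    return oai_request_url(base_url, verb, {"resumptionToken": token})


BASE = "http://example.org/oai"  # hypothetical repository address
first = oai_request_url(
    BASE,
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2001-01-01", "set": "etd"},
)
following = resume_url(BASE, "ListRecords", "abc123")
print(first)
print(following)
```

The arguments are passed as a dictionary because `from` is a reserved word in Python and cannot be used as a keyword argument.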

Page 61: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Example: Union Collection of ETDs (Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)

VIRTUA

Merged Metadata Collection

MARIAN

Virginia Tech ETD Archive

Duisburg ETD Archive

Humboldt ETD Archive

Future: recommender, …

… OAI Data Provider

OAI Service Provider

OAI Harvesting

LEGEND

Page 62: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Example: Details

NDLTD Site / Member

Local DB

OAI Server

Local Search / Browse

Student Entry

NDLTD Central

OAI Harvester

Name Authority Service

(e.g. OCLC)

MARIAN Union Catalog

VTLS Union Catalog

MARC DB

Virtua

Conversion

Alternate MARC Transport (ftp? tapes?)

Librarian Verification / Validation / Enrichment / Maintenance

Page 63: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Open Digital Library (ODL) Hypothesis (Hussein Suleman)

• Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?

Maybe … if

Digital Libraries can be modeled as
• networks of extended Open Archives, where
• each extended Open Archive is a
• source of data and/or a provider of services.
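The hypothesis can be pictured as components that all expose the same record-listing interface, so a union catalog or a search service can sit on top of any archive, including another component. The sketch below is a loose illustration under that assumption: the classes `Archive`, `UnionArchive`, and `SearchService` and the sample records are hypothetical and are not the ODL components or protocols themselves.

```python
class Archive:
    """A source of data: anything that can list its records."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def list_records(self):
        return list(self._records)


class UnionArchive(Archive):
    """An extended archive: harvests other archives and is itself harvestable."""

    def __init__(self, name, sources):
        super().__init__(name, [])
        self.sources = list(sources)

    def harvest(self):
        merged = {}
        for source in self.sources:
            for record in source.list_records():
                merged[record["identifier"]] = record  # de-duplicate by identifier
        self._records = list(merged.values())


class SearchService:
    """A provider of services built on top of any archive."""

    def __init__(self, source):
        self.source = source

    def search(self, term):
        term = term.lower()
        return [
            record["identifier"]
            for record in self.source.list_records()
            if term in record["title"].lower()
        ]


# Hypothetical archives and records, invented for illustration.
site_a = Archive("site-a", [{"identifier": "a:1", "title": "Digital Library Interoperability"}])
site_b = Archive("site-b", [{"identifier": "b:1", "title": "Thesis on Metadata Harvesting"}])

union = UnionArchive("union", [site_a, site_b])
union.harvest()
search = SearchService(union)
print(search.search("metadata"))
```

Because the union is itself an archive, further services can be added without changing the member sites, which is the composability the hypothesis is after.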

Page 64: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Example Architecture (NDLTD)

Humboldt

Duisburg

MIT Filter

MIT

Browse

Union Catalog

Search Recent

User Interface

User Interface

Legend: OAI/ODL archive, OAI/ODL protocol

Virginia Tech

PhysNet

CalTech

Dresden

Page 65: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

ODL Demonstration - FrontPage

Page 66: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

ODL Demonstration - Search

Page 67: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

ODL Demonstration - Browse

Page 68: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Outline

• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

Page 69: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Conclusions

• Digital libraries can help advance education.

• Singapore is invited to engage in NSDL, CITIDEL, NDLTD, and other ventures.

• UNESCO Analytical Survey on Digital Libraries in Education is recommending DLE in each nation.

• Local and national support can
  • stimulate activities, including collaboration
  • promote a sharing culture, especially in research and teaching
  • leverage others’ investments (networking, computing, …)
  • encourage / facilitate learning, innovation and problem solving

Page 70: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

Selected Links
• CITIDEL
  • www.citidel.org
• NCSTRL
  • www.ncstrl.org
• NDLTD
  • www.ndltd.org
• NSDL
  • www.nsdl.org
• Virginia Tech Digital Library Courseware
  • http://ei.cs.vt.edu/~dlib
• Virginia Tech Digital Library Research Laboratory (DLRL)
  • http://www.dlib.vt.edu
  • (5S, 5SL, AmericanSouth.Org, CSTC, ENVISION, MARIAN, NDLTD, NSDL, OAI, ODL)
• Repository Explorer
  • http://purl.org/net/oai_explorer

Page 71: Edward A. Fox fox@vt      fox.cs.vt CS            DLRL           Internet TIC

More Links
• ARC Cross-Archive Search Service
  • http://arc.cs.odu.edu/
• Dublin Core Metadata Initiative
  • www.dublincore.org
• E-Prints DL-in-a-box
  • www.eprints.org
• Open Archives Initiative
  • http://www.openarchives.org
  • http://www.openarchives.org/OAI/openarchivesprotocol.htm
  • http://www.dlib.vt.edu/projects/OAI/
• XML Schema Validator
  • http://www.w3.org/2001/03/webdata/xsv
• XML Tools at W3C
  • http://www.w3.org/XML/#software