1 fox@vt fox.cs.vt/talks/2007/20071005MexicoNDLTD

1 Virtual Day on Digital Theses 5 October 2007 – Mexico Networked Digital Library of Theses and Dissertations (NDLTD) – www.ndltd.org Edward A. Fox 1 , Executive Director Gail McMillan, Secretary Ryan Richardson, PostDoc Venkat Srinivasan, Graduate Research Asst. 1 [email protected] http://fox.cs.vt.edu/talks/2007/20071005MexicoNDLTD.ppt

Page 1

Virtual Day on Digital Theses

5 October 2007 – Mexico

Networked Digital Library of Theses and Dissertations (NDLTD) – www.ndltd.org

Edward A. Fox, Executive Director

Gail McMillan, Secretary

Ryan Richardson, PostDoc

Venkat Srinivasan, Graduate Research Asst.

[email protected]

http://fox.cs.vt.edu/talks/2007/20071005MexicoNDLTD.ppt

Page 2

Acknowledgements (selected)

• Colleagues: Tony Atkins, Lillian Cassel, Debra Dudley, John Eaton, Lourdes Fernandez, Marcos Gonçalves, Ming Luo, Silvia González Marín, Uma Murthy, Doug Oard, Alfredo Sanchez, Craig Scott, Hussein Suleman, Alberto Castro Thompson, …

• Sponsors: Dept. of Education (FIPSE), DFG, Elsevier, Google, IBM, IMLS, Microsoft, NSF (DUE-0121679, IIS-9986089, 0080748, 0086227, 0535057), OCLC, RDEC/ACE, SOLINET, SUN, SURA, VTLS, …

Page 3

Digital Libraries & ETDs

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Benefits: ETD creators develop lifelong skills with DLs. Students, faculty, departments, & universities save money and gain visibility.

Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org


Page 5

Importance of ETDs

• Open access is natural and highly effective. It levels the playing field, making research from every nation and university equally visible.

• Promotes scholarship and understanding since research details are widely shared.

• Quantity of content is comparable to that of the journal publishing enterprise.

• Can leverage “electronic” for flexibility, expressivity, savings, and preservation.

Page 6

Main Points

1. NDLTD was launched in 1996 to help with ETD activities worldwide.

2. It is a member organization, so we urge everyone interested in digital theses to join.

3. Visible results, e.g., ETDs from Mexico accessible from the NDLTD Union Catalog (and then through Scirus, …), show that working together helps everyone.

4. NDLTD helps with training/education, conferences, standards, technologies, research, and leadership.

Page 7

What are we doing?

• Aiding universities to enhance graduate education, publishing, and IPR efforts

• Helping improve the availability and content of theses and dissertations

• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)

Page 8

Digital Library Content

Content Types:

• Articles, Reports, Books

• Text Documents

• Speech, Music

• Video, Audio

• (Aerial) Photos

• Geographic Information

• Models, Simulations

• Software, Programs

• Genome: Human, animal, plant

• Bio Information

• 2D, 3D, VR, CAT

• Images and Graphics

Page 9

http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/

Page 11

Digital Objects (DOs)

• Born digital

– Word processors (e.g., Word)

– XML, LaTeX, BibTeX, and other processors

– Multimedia authoring and capture tools

• Digitized version of “real” object

– Scanners, cameras, MRI, …

– 3D models, datasets, …

• Renderings for presentation, preservation

– PDF/A

– ORE (Object Reuse and Exchange)

Page 12

Metadata Objects (MDOs)

• Dublin Core, and extension to ETD-MS

• RDF

• OAI (Open Archives Initiative) sharing

• MARC

• Crosswalks, mappings

• Ontologies (to aid classification)
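To make the first bullet concrete, here is a minimal sketch of a Dublin Core description of a thesis, serialized in the `oai_dc` container that OAI repositories commonly expose. The two namespace URIs are the standard ones; the thesis itself, every field value, and the helper name `build_dc_record` are invented for illustration and do not come from the talk.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an oai_dc element from (dc_element_name, value) pairs."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical thesis; all values below are assumptions for illustration.
example_fields = [
    ("title", "An Example Thesis on Digital Libraries"),
    ("creator", "Doe, Jane"),
    ("subject", "digital libraries"),
    ("date", "2007-10-05"),
    ("type", "Electronic Thesis or Dissertation"),
    ("language", "es"),
    ("identifier", "http://repository.example.edu/etd/123"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

ETD-MS adds thesis-specific information (such as degree details) on top of elements like these; that extension is not shown here.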

Page 13

LOCKSS

• Lots of copies keep stuff safe

• Initially at Stanford (Vicky Reich)

• Initial focus on lower levels

• Initial content: journals

• Extending to ETDs (Gail McMillan)
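The "lots of copies" idea can be illustrated with a toy audit: fingerprint each copy and flag any copy that disagrees with the majority. This is only a simplified sketch of the intuition, not the actual LOCKSS polling protocol; the site names, file contents, and function names are assumptions made up for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint one copy of a file."""
    return hashlib.sha256(content).hexdigest()


def find_damaged_copies(copies: dict) -> list:
    """Return the sites whose copy disagrees with the majority of copies."""
    digests = {site: digest(data) for site, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority, cannot tell which copy is intact")
    return sorted(site for site, d in digests.items() if d != majority_digest)


# Hypothetical sites holding the same ETD; site_c has a corrupted byte.
copies = {
    "site_a": b"%PDF-1.4 example thesis",
    "site_b": b"%PDF-1.4 example thesis",
    "site_c": b"%PDF-1.4 exampl3 thesis",
}
damaged = find_damaged_copies(copies)
print(damaged)
```

A damaged copy found this way could then be replaced from any site that agrees with the majority, which is why more copies make the content safer.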

Page 14

OAI - Open Archives Initiative

• www.openarchives.org

• Advocacy for interoperability

• Standard for transferring metadata among digital libraries

– Protocol for Metadata Harvesting (PMH)

• Standard for handling compound/complex objects like ETDs

– ORE
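An OAI-PMH request is an ordinary HTTP request with a `verb` plus a few arguments. The sketch below builds such URLs; it assumes the OCLC base URL quoted on the Union Catalog slide of this talk (which may no longer resolve), and the helper name `oai_request_url` is hypothetical. Nothing is fetched over the network.

```python
from urllib.parse import urlencode

# Base URL as quoted elsewhere in this talk; treat it as an example only.
BASE_URL = "http://alcme.oclc.org/ndltd/servlet/OAIHandler"


def oai_request_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    if "resumptionToken" in arguments and len(arguments) > 1:
        # The protocol treats resumptionToken as an exclusive argument.
        raise ValueError("resumptionToken cannot be combined with other arguments")
    params = {"verb": verb}
    for key, value in arguments.items():
        # 'from' is a Python keyword, so callers pass from_ instead.
        params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


list_sets_url = oai_request_url(BASE_URL, "ListSets")
list_records_url = oai_request_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2007-01-01"
)
print(list_sets_url)
print(list_records_url)
```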

Page 15

OAI – Repository Perspective

Required: Protocol

[Diagram: a repository containing digital objects (DOs) and metadata objects (MDOs)]

Page 16

OAI – Black Box Perspective

[Diagram: seven boxes labeled OA 1 through OA 7]

Page 17

The World According to OAI

[Diagram: service providers (discovery, current awareness, preservation) obtain records from data providers through metadata harvesting]
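From the service provider's side, harvesting amounts to requesting a list from a data provider and following resumption tokens until the list is complete. The following is a minimal sketch under stated assumptions: `fetch` stands in for whatever HTTP client is used, and the two-page responses and `oai:example.edu` identifiers are made up so the example runs without a network.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers, following resumption tokens.

    `fetch` is a caller-supplied function that takes a dict of OAI-PMH
    arguments and returns the response XML as a string.
    """
    identifiers = []
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(arguments))
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # an absent or empty token marks the end of the list
        arguments = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
    return identifiers


# Hypothetical data provider that answers in two pages (responses abbreviated).
PAGES = {
    None: """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.edu:etd-1</identifier></header>
<header><identifier>oai:example.edu:etd-2</identifier></header>
<resumptionToken>page2</resumptionToken></ListIdentifiers></OAI-PMH>""",
    "page2": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.edu:etd-3</identifier></header>
<resumptionToken/></ListIdentifiers></OAI-PMH>""",
}


def fake_fetch(arguments):
    return PAGES[arguments.get("resumptionToken")]


harvested = harvest_identifiers(fake_fetch)
print(harvested)
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP details, so the same loop can be pointed at a real repository later.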

Page 18

Software Options

• ETD-db (Virginia Tech; also in Spanish)

– Customized into ADT solution (Australia)

– http://scholar.lib.vt.edu/theses/presentations/ETDdb4Uppsala2007.ppt -> future

• Many local / commercial solutions

• Digital libraries or institutional repositories

– Eprints, Greenstone, Fedora/Fez, …

– DSpace (MIT, HP Labs)

Page 23

Institutional Repositories - 1

• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”

• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA

• www.arl.org/sparc/IR/IR_Guide_v1.pdf

Page 24

Institutional Repositories - 2

• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”

• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html

Page 25

Software Issues

• Be sure:

– Can export metadata using OAI-PMH

– Is a sustainable solution

– Allows open access and preservation

• Request support for

– Flexible workflow management

• Scope: just ETDs <-> institutional memory

• Scope: time coverage -- authoring, reviewing, submission, defense presentation

Page 26

Student Gets Committee Signatures and Submits ETD

Signed

Grad School/Library/IT

Page 27

Library Catalogs ETD, Access is Opened to the New Research

WWW

NDLTD

Page 28

Union catalog: OCLC

• http://alcme.oclc.org/ndltd/servlet/OAIHandler?verb=ListSets (sets of ETDs)

• Is getting data from WorldCat (so, from many sites!).

• Will harvest from all others who contact them.

• Need DC and either ETD-MS or MARC.
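One way a site might check the last requirement is to look at the metadata formats its repository advertises through the OAI-PMH `ListMetadataFormats` verb. The sketch below is illustrative only: the prefix names `oai_etdms` and `marc21` are assumed labels (repositories may name these formats differently), the sample response is abbreviated and invented, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Assumed prefix names; a real repository may label these formats differently.
REQUIRED_PREFIX = "oai_dc"
ALTERNATIVE_PREFIXES = {"oai_etdms", "marc21"}


def offered_prefixes(list_metadata_formats_xml):
    """Extract the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    return {el.text.strip() for el in root.iter(OAI_NS + "metadataPrefix") if el.text}


def ready_for_union_catalog(prefixes):
    """True if DC is offered together with ETD-MS or MARC."""
    return REQUIRED_PREFIX in prefixes and bool(ALTERNATIVE_PREFIXES & set(prefixes))


# Abbreviated, made-up response; real ones also carry schema and namespace URLs.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_etdms</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

prefixes = offered_prefixes(SAMPLE_RESPONSE)
print(sorted(prefixes), ready_for_union_catalog(prefixes))
```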

Page 31

OCLC SRU Interface
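SRU (Search/Retrieve via URL) expresses a search as URL parameters with a CQL query. The slide shows only a screenshot, so the following is a generic sketch of an SRU 1.1 `searchRetrieve` request rather than OCLC's actual endpoint: the base URL is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint; substitute the address of a real SRU server.
url = sru_search_url("http://sru.example.org/ndltd", 'dc.title = "digital libraries"')
print(url)
```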

Page 33

ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

Page 34

VTLS and Content Languages

The VTLS browse/search service has data in many different languages. These include: English, German, Greek, Korean, Portuguese

Page 35

Language = German; hits = 137

Page 38

ETDs: Library Goals

• Improve library services

– Better turn-around time

– Always available

• Reduce work

– Catalog from e-text

– Eliminate handling: mailing to ProQuest, bindery prep, check-out, check-in, reshelving, etc.

• Save space

Page 40

The Concept Map: From learning tool to cross-language knowledge discovery tool

Problem:

• Finding interesting ETDs written in Language1 may be difficult for Language2 speakers, and vice versa.

• NDLTD has > 360,000 ETDs in > 12 languages.

• Many TDs from the Spanish speaking world are not yet in NDLTD, e.g., UNAM in Mexico City has 50,000+ ETDs.

• ETDs exist in many languages, but discovery and summarizing across languages is even more difficult.

Page 41

Cross-language Experiment - 1

English version of ETD by Saraiya

Page 42

Cross-language Experiment - 2

Spanish (automatic) translation of ETD by Saraiya

Page 43

Cmap Study Summary

Using

• NLP tools and a domain-specific ontology

we have been able to automatically produce concept maps for large documents (ETDs).

For the cross-language case, using

• Phrase translations mined from ETD collection

• Off-the-shelf MT tools

we have been able to automatically produce & translate concept maps that allowed users to determine relevance of ETDs better than using machine-translated abstracts alone.

Google will support further R&D.

Page 44

Problems Solved/Solvable

• Plagiarism

• Concern over quality

• Concern about publishers

• Intellectual property rights management

• Handling restricted works

• Pilot -> Recommendation -> Requirement

• Inertia, lack of vision/leadership

Page 45

Appeal

• Join NDLTD

• Move forward (in stages) so all theses and dissertations in Mexico lead to open ETDs.

• Make all metadata accessible through the NDLTD Union Catalog.

• Let NDLTD know how we can help!

• Questions?