Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor

Michael Krot, Data Manager, and David Yakimischak, CTO

krot@jstor, davidyak@jstor

http://www.jstor.org

JSTOR Mission

• JSTOR is a not-for-profit organization with a mission to help the scholarly community take advantage of advances in information technology. This includes: (1) building a reliable and comprehensive archive of core scholarly journals, and (2) dramatically improving access to this scholarly material.

• In pursuing its mission, JSTOR takes a system-wide perspective, seeking benefits for libraries, publishers and scholars

Currently

• Over 1,000 U.S. Participating Sites

• Over 700 International Participating Sites

• Over 200 Participating Publishers

• Over 300 Publications

• Broad coverage of disciplines

• 14 million pages scanned and counting (average 10,000 pages per day)

Monthly Usage

[Chart: Meaningful Accesses per month, Jan-97 through Oct-03; vertical axis from 0 to 20,000,000]

OAI-PMH Project Background

• JSTOR has shared metadata for some applications

• However, we use proprietary data formats and transmission methods

• OAI-PMH had the right characteristics

• But we are rewriting our system

• Gave us a chance to learn new techniques

• Forced the separation of server from data

Purpose of this Presentation

• Overview of JSTOR OAI-PMH System

• Constraints

• Process

• Design

• Sharing our observations

Constraints

• Large amount of data (2.5 million articles)

• Content restricted by subscription

• Authorization System in transition

• Metadata store in transition

• Code must be sharable with others, in Java

• Lots of uncertainty!

Process

• Initial Requirements Gathering

- No existing software for our needs

- Current JSTOR System inadequate

• Unified Process/UML

• Outside advisors (Object Insight)

• Create pluggable parts to handle uncertainty
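
One way to picture "pluggable parts" is a service that depends only on small interfaces, so the metadata store and the authorization system can each be replaced while they are in transition. The following is a minimal sketch under that assumption, not the JSTOR design: every name in it (`RecordStore`, `AccessPolicy`, `HarvestService`, the set and harvester names) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical seam for the metadata store. */
interface RecordStore {
    /** Returns identifiers of all records belonging to the given set. */
    List<String> identifiersInSet(String setSpec);
}

/** Hypothetical seam for the authorization system. */
interface AccessPolicy {
    /** Returns true if the named harvester may see records in the given set. */
    boolean mayHarvest(String harvester, String setSpec);
}

/** Depends only on the two interfaces, so either side can be swapped out. */
class HarvestService {
    private final RecordStore store;
    private final AccessPolicy policy;

    HarvestService(RecordStore store, AccessPolicy policy) {
        this.store = store;
        this.policy = policy;
    }

    /** Lists identifiers for a set, or an empty list if the harvester is not authorized. */
    List<String> listIdentifiers(String harvester, String setSpec) {
        if (!policy.mayHarvest(harvester, setSpec)) {
            return new ArrayList<>();
        }
        return store.identifiersInSet(setSpec);
    }
}

class PluggableDemo {
    /** Builds a service wired to throwaway in-memory stand-ins (made-up data). */
    static HarvestService inMemoryService() {
        RecordStore store = setSpec -> {
            List<String> ids = new ArrayList<>();
            if ("economics".equals(setSpec)) {
                ids.addAll(Arrays.asList("article-1", "article-2"));
            }
            return ids;
        };
        AccessPolicy policy = (harvester, setSpec) -> "library-a".equals(harvester);
        return new HarvestService(store, policy);
    }

    public static void main(String[] args) {
        HarvestService service = inMemoryService();
        System.out.println(service.listIdentifiers("library-a", "economics"));
        System.out.println(service.listIdentifiers("unknown", "economics"));
    }
}
```

Replacing either stand-in with a real implementation would not require touching `HarvestService`, which is the property the bullet above is after.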

Use Case Diagram

Initial Steps

• ‘Retrieve Bibliographic Records’ Use Case

• Use cases gave insight into Search/Auth requirements

- Repository would have to handle increments, counts

- Auth would have to know about harvester sets

• Use Case Analysis using Collaboration Diagram (MVC)
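
The "increments, counts" requirement can be illustrated with a repository that answers date-bounded queries and reports how many records fall in the range. OAI-PMH selective harvesting uses `from` and `until` arguments as inclusive bounds on record datestamps; the sketch below assumes day granularity and an in-memory list. The class and method names (`BibRecord`, `IncrementalRepository`, `changedBetween`, `countBetween`) are hypothetical and are not taken from the JSTOR system.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** One bibliographic record with the date it last changed (hypothetical shape). */
class BibRecord {
    final String identifier;
    final LocalDate datestamp;

    BibRecord(String identifier, LocalDate datestamp) {
        this.identifier = identifier;
        this.datestamp = datestamp;
    }
}

/** In-memory stand-in for a repository that answers incremental queries. */
class IncrementalRepository {
    private final List<BibRecord> records = new ArrayList<>();

    void add(String identifier, LocalDate datestamp) {
        records.add(new BibRecord(identifier, datestamp));
    }

    /** Records whose datestamp lies in [from, until]; a null bound means unbounded. */
    List<BibRecord> changedBetween(LocalDate from, LocalDate until) {
        List<BibRecord> matches = new ArrayList<>();
        for (BibRecord r : records) {
            boolean onOrAfterFrom = from == null || !r.datestamp.isBefore(from);
            boolean onOrBeforeUntil = until == null || !r.datestamp.isAfter(until);
            if (onOrAfterFrom && onOrBeforeUntil) {
                matches.add(r);
            }
        }
        return matches;
    }

    /** Count for the same range, so a caller can size the result before fetching it. */
    int countBetween(LocalDate from, LocalDate until) {
        return changedBetween(from, until).size();
    }
}
```

A harvester that remembers the date of its last run can pass that date as `from` and fetch only what changed since then.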

Retrieve Bibliographic Records Collaboration Diagram

View of Participating Classes

J2EE Design Patterns

Current Issues, Questions

• What's “new to repository” vs. “new to subscription”

• Resumption Tokens

• Compression

• Associating metadata formats with types of objects returned from search

• Development nears completion
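
For readers unfamiliar with resumption tokens: OAI-PMH lets a repository return a long list in chunks, handing the harvester a token to request the next chunk, and an empty token marks the end of a chunked list. The token format is opaque to the harvester and chosen by the repository. The sketch below assumes the simplest possible format, a plain offset into an in-memory list; it is an illustration only, and `ListChunk`, `ChunkedLister`, and `ResumptionDemo` are hypothetical names. A real server would answer an unusable token with the protocol's `badResumptionToken` error rather than a Java exception.

```java
import java.util.ArrayList;
import java.util.List;

/** One chunk of a list response plus the token for the next chunk. */
class ListChunk {
    final List<String> identifiers;
    /** Empty string on the final chunk, mirroring an empty resumptionToken. */
    final String resumptionToken;

    ListChunk(List<String> identifiers, String resumptionToken) {
        this.identifiers = identifiers;
        this.resumptionToken = resumptionToken;
    }
}

/** Serves a fixed list in chunks; the token is just the next offset (an assumption). */
class ChunkedLister {
    private final List<String> all;
    private final int chunkSize;

    ChunkedLister(List<String> all, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.all = new ArrayList<>(all);
        this.chunkSize = chunkSize;
    }

    /** Pass null for the first request, then the token from the previous chunk. */
    ListChunk list(String token) {
        int start = (token == null) ? 0 : parseOffset(token);
        int end = Math.min(start + chunkSize, all.size());
        List<String> slice = new ArrayList<>(all.subList(start, end));
        String next = (end < all.size()) ? Integer.toString(end) : "";
        return new ListChunk(slice, next);
    }

    private int parseOffset(String token) {
        int offset;
        try {
            offset = Integer.parseInt(token);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad resumption token: " + token);
        }
        if (offset < 0 || offset > all.size()) {
            throw new IllegalArgumentException("bad resumption token: " + token);
        }
        return offset;
    }
}

class ResumptionDemo {
    /** Harvester side: keep requesting until the token comes back empty. */
    static List<String> harvestAll(ChunkedLister lister) {
        List<String> collected = new ArrayList<>();
        String token = null;
        do {
            ListChunk chunk = lister.list(token);
            collected.addAll(chunk.identifiers);
            token = chunk.resumptionToken;
        } while (!token.isEmpty());
        return collected;
    }
}
```

An offset token is stateless on the server but gives no protection if the underlying list changes between requests, which is one reason token design is listed as an open issue.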

Conclusion

• Constraints, Process, Sharing our Findings

• Load Testing has been helpful

• Internal Use First

• How and when to introduce externally

• Possibility of sharing code, UML

• Need for a harvester, internally and externally

• Paper Available

• krot@jstor, davidyak@jstor

• Questions?