Moving Electronic Theses from
ETD-db to EPrints: The Best of Both Worlds
Betsy Coles
Technical Manager, Digital Library Systems
California Institute of Technology
Katherine Johnson
CODA Coordinator and Metadata Librarian
California Institute of Technology
CNI Spring 2010 Membership Meeting, Baltimore, MD
April 12, 2010
Background: CODA
The Caltech Collection of Digital Archives (CODA) grew out of a longstanding commitment to scholarly communication and open access
Since its inception CODA has been integral to the Caltech Library’s mission
  Not a separate department or function
  Many staff involved, from all levels
  No special funding – support is from general operating budget
Background: CODA
First repository: Computer Science Technical Reports, using EPrints software from the University of Southampton (April 2001)
Thesis archive, using ETD-db software from Virginia Tech (starting July 2001)
In 2010: largest CODA repository is CaltechAUTHORS, with almost 15,000 non-thesis items
Background: Why ETD-db for Theses?
In 2001, there were not many options!
ETD-db was designed specifically for theses
  Thesis-specific metadata
  Support for thesis workflow and approval process
  Support for withholding theses, or parts of theses, pending journal publication or patent application
  “Notifications” allowing staff to communicate with authors via email
Background: ETD Evolution
Voluntary ETD submission for the first year (2001/2002)
Mandatory submission for PhD students after July 2002
Electronic thesis is now the version of record and we are committed to its preservation
But we still keep and bind a paper copy
Background: From Paper to Bits
Retrospective conversion of older theses began in 2002
Library staff handle both scanning and submitting
No dedicated staff – it’s a “spare time” activity, and many staff participate
We are currently more than halfway through the backlog of pre-2002 theses
Background: Numb3rs
New theses: we add between 180 and 250 per year
Retrospective conversion: we add about 500 per year
As of April 2010
  5,518 electronic theses in collection
  1,433 of these were born digital
Background: Numb3rs
Usage is high; most users come from Google
In March 2010
  19,000+ website visits from 14,000+ unique users
  More than 22,000 document file downloads (.pdf, .doc, and .ps)
  More than 4,300 supplemental file downloads (video, data files, software, etc.)
Problem: Multiple Platforms
As of 2008, we were running ETD-db for theses and EPrints for everything else
  Duplication of effort for software maintenance and system administration
  Staff had to learn two systems
  Users had to search two interfaces
Problem: Resource Crunch
In 2008, we were operating with reduced staff and limited resources (like everyone else)
Since we had no dedicated repository staff or funding, efficiency was crucial
Problem: Need for Flexibility
By 2008, ETD-db software was aging
In contrast, the EPrints platform is in active development; EPrints 3 includes:
  Plugin architecture offering easy customization and extension to support new features and protocols
  Active development team and contributing user community worldwide
  Much more …
Goals: One Platform
We wanted all our repositories on one platform
Greater efficiency would allow more forward development
Goals: Workflow
Need to improve instructions and documentation for students submitting theses and staff processing theses
Better understanding of thesis workflow
No disturbance of delicate and hard-won relationships with other campus organizations involved with theses (Graduate Office, academic departments)
Need to modernize our process for submitting theses to ProQuest/UMI
Goals: Technical
Retain useful features of ETD-db
  Thesis-specific metadata fields and search capabilities (committee, major/minor options, etc.)
  Ability to communicate with authors, via email, from within the system interface
  Special limited-access categories for thesis materials (restricted, withheld), at the file level as well as the record level
  Support for the complex thesis-approval workflow
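The difference between file-level and record-level access can be pictured with a small data model. This is only an illustrative sketch in Python, not the ETD-db or EPrints implementation: the class and method names (Access, ThesisFile, ThesisRecord, public_files) are hypothetical, the category names are taken from the slide, and the rule that a file-level setting overrides the record-level setting is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Access(Enum):
    # Category names come from the slide; their exact semantics are assumed.
    PUBLIC = "public"
    RESTRICTED = "restricted"
    WITHHELD = "withheld"


@dataclass
class ThesisFile:
    name: str
    # None means "inherit whatever the record says" (an assumption).
    access: Optional[Access] = None


@dataclass
class ThesisRecord:
    title: str
    access: Access = Access.PUBLIC
    files: List[ThesisFile] = field(default_factory=list)

    def effective_access(self, f: ThesisFile) -> Access:
        # Assumed rule: a file-level setting overrides the record-level one.
        return f.access if f.access is not None else self.access

    def public_files(self) -> List[str]:
        # Names of the files an anonymous searcher could download.
        return [f.name for f in self.files
                if self.effective_access(f) is Access.PUBLIC]


record = ThesisRecord(
    title="Example thesis",
    files=[ThesisFile("thesis.pdf"),
           ThesisFile("chapter5.pdf", Access.WITHHELD)],
)
print(record.public_files())
```

With a model like this, a single chapter can be withheld pending journal publication or a patent application while the rest of the thesis stays open, which is the case the file-level categories are meant to cover.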
Goals: Technical
Add brand-new features
  New metadata elements including advisor(s), major and minor field, funders, additional dates, references, internal notes, and others
  Ability to store and identify related documents (copyright permissions, signed thesis forms, etc.) in a hidden part of the record
  Expanded ability to track theses through the complex approval process
  Additional automatically generated emails
The Plan: Outsourced Elements
EPrints Services in Southampton would create
  Metadata conversion scripts and metadata and data migration scripts
  New email trigger function in EPrints
  New complex data structure for degree-granting departments
  New functionality to accommodate “hidden” documents, e.g. permissions letters, signed thesis forms
The Plan: In-House Elements
Caltech Library Services staff would do
  Metadata analysis and modifications
  Analysis and customization of the user interface
  Customization of the system “workflow” – the movement of theses through the stages of approval
  Migration of persistent URLs within the Caltech Library’s locally developed PURL system
The Plan: In-House Elements
Local staff would also
  Customize and localize screen text and help text
  Create a new web guide for student submitters
  Write documentation for library staff using the new system
The Plan: Timeline
6-month timespan allocated for migration (March through August 2009)
Informal scheduling process: timeline and tasks list maintained on the library wiki
Schedule did slip by one month, but project was complete by the beginning of the academic year (Sept. 2009)
Process: People
Initial task group
  the coordinator for the ETD-db repository
  the programmer/system administrator for the digital repositories
  one subject liaison librarian with extensive CODA involvement
  the Metadata Group support staff person who processes submitted theses
Process: People
Group expanded gradually as project progressed:
  Other subject librarians reviewed progress, especially interface issues
  Staff were asked to test specific features
  Final testing phase was open to all library staff
  Feedback was received from a wide range of staff, from the University Librarian to circulation desk staff
Process: What & Where
EPrints Services staff in Southampton were able to fit their work into our timeline. Code was often delivered ahead of schedule
Integration of contracted code was done at Caltech
Testing of system components, migration process, and user interface was also done locally
Process: Testing
We used a “staged” process to test the conversion and migration
  Multiple rounds of testing the migration process, with a larger number of records each time and a larger number of testers
Final test was a full “dress rehearsal” of the complete migration process.
We wish we had had time for formal usability testing, but we didn’t
Process: Arrival
The actual migration was done on a weekend
Thesis submission was unavailable for 48 hours, but public search interface was up
Actual conversion process took about 8 hours
The remainder of the weekend was devoted to reviewing and testing the results
CaltechTHESIS was open for business Monday morning, as planned
Outcome: Immediate Feedback
First thesis was submitted by a student within hours
We emailed the submitter:
“Congratulations! You are the first student to have deposited his thesis into the new CaltechTHESIS database. Would you mind giving us some feedback on your experience? Ease of use, problems encountered, confusion?”
Outcome: Immediate Feedback
The student’s reply:
“Thanks! I was wondering when this had changed, realized it must have been recent. I found the submission quite easy, it took me only a couple of minutes for the whole process.”
Outcome: The View from Here
Our experience in the months since we “went live” with CaltechTHESIS has confirmed our initial impression that we now have a modern, flexible system that provides a better user experience and smoother process for both students and library staff.
Outcome: Specifics
Goals met – we now have
  A better and more easily supported technical system
  More efficient thesis processing by library staff
  A better user experience for
    Students submitting theses
    Library staff processing theses
    Searchers worldwide
Outcome: On the Horizon
No, we’re not “done.” To-do list:
  Data cleanup remaining from the conversion (fairly minor), and filling in the new metadata fields added as part of the migration project
  Export plugin for ProQuest/UMI’s XML metadata format (currently being tested)
  ETD-MS format for OAI harvesting (in process)
  User interface tweaks and improvements (a never-ending task!)
Outcome: On the Horizon
More “to-do’s”
  Upgrade to recently-released EPrints v. 3.2
  Upgrade to new faster hardware and Red Hat 5 64-bit Linux operating system
  Implement available EPrints add-ons:
    IRSTATS statistics module
    DROID/PRESERV plugin to support preservation status monitoring
Outcome: On the Horizon
Even more “to-do’s”
  Perhaps most important: complete the task of documenting the migration in technical terms and uploading migration scripts into the EPrints wiki, so that others may make use of what we’ve done.
Lesson: Best of Both Worlds?
We avoided the “buy vs. build” dilemma by contracting out specific parts of the migration development work to experts, while using our own resources where our local skill set could be put to best use and where local control was crucial to success.
Lesson: Best of Both Worlds?
We now have, in EPrints 3, a single, full-featured repository platform for all of our institutional materials, but we haven’t lost any of the valuable functionality of the older system.
The Future
We look forward to beginning our second decade of institutional repository management with a strong and flexible foundation.
Links
CaltechTHESIS – http://thesis.library.caltech.edu
CaltechAUTHORS – http://authors.library.caltech.edu
CODA – http://library.caltech.edu/digital
Thesis workflow planning document – http://library.caltech.edu/etd/System_Independent_Thesis_Workflow.pdf
Web guide for student submitters – http://libguides.caltech.edu/theses
This presentation – http://resolver.caltech.edu/CaltechLIB:2010.001
More Links
EPrints software – http://software.eprints.org
EPrints Services – http://www.eprints.org/services/
ETD-db software – http://scholar.lib.vt.edu/ETD-db/developer/index.shtml