Developing a model for e-prints and open access journal content in UK further and higher education

Alma Swan
Key Perspectives Ltd

Paul Needham
Cranfield University

Steve Probets, Adrienne Muir, Charles Oppenheim, Ann O’Brien, Rachel Hardy and Fytton Rowland
Loughborough University

Sheridan Brown
Key Perspectives Ltd

© Alma Swan, Paul Needham, Steve Probets, Adrienne Muir, Charles Oppenheim, Ann O’Brien, Rachel Hardy, Fytton Rowland and Sheridan Brown 2005

Learned Publishing (2005) 18, 25–40
LEARNED PUBLISHING VOL. 18 NO. 1 JANUARY 2005

ABSTRACT: A study carried out for the UK Joint Information Systems Committee examined models for the provision of access to material in institutional and subject-based archives and in open access journals. Their relative merits were considered, addressing not only technical concerns but also how e-print provision (by authors) can be achieved – an essential factor for an effective e-print delivery service (for users). A ‘harvesting’ model is recommended, where the metadata of articles deposited in distributed archives are harvested, stored and enhanced by a national service. This model has major advantages over the alternatives of a national centralized service or a completely decentralized one. Options for the implementation of a service based on the harvesting model are presented.

Introduction

In this article we describe a delivery, management and access model for e-prints and open access journal content for UK further and higher education commissioned by the Joint Information Systems Committee (JISC). The target content is (i) e-prints – digital copies of academic research articles published in subscription-based journals that are made available online to permit increased access; and (ii) articles published in open access journals. The proposed service would provide immediate and maximal access to scholarly research, supplementing the more limited access provided by subscription-based journals, in turn maximizing the impact of research.

Other benefits accrue from such a system too. It would enable the generation of standardized online CVs for each institution’s researchers and these could be used for evaluation purposes – internally within the institution or for external purposes such as the UK’s national Research Assessment Exercise. A nationally organized service in the UK for the delivery of e-prints and open access journal content to the scholarly community would therefore be an important development.

There are two ways for researchers to provide open access for their work – by publishing their articles in open access journals (or in hybrid journals that will provide open access to individual articles for a publication fee) or by depositing (‘self-archiving’) copies (‘e-prints’) of their subscription-journal articles in open archives (known variously, depending on circumstances, as e-print archives, institutional archives or institutional repositories). Although the terms are often used interchangeably, we use the term ‘institutional archive’ here in preference to ‘institutional repository’. This is in part because the term ‘archive’ is used


in many official names (e.g. Institutional Archives Registry, Open Archives Initiative) and in part because it reflects an activity (authors ‘self-archive’ their work – they do not ‘self-reposit’). Most importantly, though, the term ‘repository’ is now generally coming to denote something more than an e-print archive: an institutional collection of material that contains far more than e-prints, such as grey literature, institution-specific digital collections and so on. Since the remit of our study was to develop a model for the delivery and management of e-print and open access journal content only, the term institutional archive is the most accurate and appropriate.

The JISC commissioned the study as part of an overall programme on open access in the UK and beyond. The brief was to forecast a delivery, access and management model for e-prints and open access journal content within the UK further and higher education communities. The study took place during the same period as the investigation into scholarly scientific publication by the UK House of Commons Select Committee on Science & Technology [1] and the two reported almost simultaneously. They were followed very shortly thereafter by the recommendations of the National Institutes of Health in the USA [2].

An article that summarizes all the findings of our study, including preservation issues, legal issues and some aspects of the costs involved in setting up and running e-print archives, is in press [3] and the full report has been published by the JISC [4]. This present article specifically focuses on the model we devised and the reasons why we chose this one above the other possible options.

The open access material to be delivered

For the purposes of the study (and for this article) an e-print was defined by the JISC as

. . . a digital duplicate of an academic research paper that is made available online as a way of improving access to the paper. E-prints are divided into preprints (papers that are circulated before they have been formally approved for publication), and postprints (papers that have been approved for publication).

It was specified that the model should encompass both e-prints and the content of open access journals (where this can be harvested). Harvestable articles – both e-prints and open access journal articles – are those that are OAI-compliant, that is, their metadata (bibliographic records) are exposed in the form laid down in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [5]. So long as their exposed metadata are OAI-compliant, they can be harvested by OAI service providers whose databases are then searchable by users who are pointed at articles of interest, wherever those articles reside. There are a number of OAI service providers in existence. Perhaps the best known examples are OAIster [6] (University of Michigan) and Citebase [7] (University of Southampton). Users type in a search term (author name, keyword, etc.); the software searches metadata already harvested from all available open access OAI-compliant archives, and returns a list of appropriate articles with links to their full text. Users then access the full text at its original location.
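As a rough illustration of the harvesting step described above, the sketch below builds an OAI-PMH request and extracts titles and links from a response. The `ListRecords` verb and the `oai_dc` metadata prefix are defined by the protocol; the archive address, the function names and the sample record are hypothetical, and no network call is made, so the response is a hand-written stand-in for what a data provider might return.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses that carry unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a service provider would request to harvest an archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return the OAI identifier, title and link of each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "link": rec.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return records


# Hypothetical response from a hypothetical archive (assumption, not real data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.ac.uk:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
          <dc:identifier>http://archive.example.ac.uk/42/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

request_url = list_records_url("http://archive.example.ac.uk/oai")
records = parse_records(SAMPLE_RESPONSE)
```

The service provider stores only the metadata; the `link` field is what lets a user reach the full text at its original location.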

With respect to open access journals, the current situation (Oct 2004) is that the Directory of Open Access Journals (DOAJ [8]) lists 1,277 titles, of which 324 are harvestable at the article level. Most, but not all, of these titles are published by BioMed Central [9] or the SciELO [10] project.

Existing e-print material resides in open archives which can take two forms – centralized, subject-based archives, or distributed archives located at research-based institutions around the world. Two well-known and long-established examples of subject-based archives are arXiv [11], set up in 1991 and covering physics, mathematics and related disciplines; and Cogprints [12], set up in 1997 and covering cognitive sciences (psychology, neuroscience, linguistics and related areas).

Some figures may help to illustrate activity in these two subject-based archives. arXiv currently houses some 300,000 digital items and is accessed approximately 1.5 million times per month by users. Cogprints currently contains around 2,000 items. Almost half of these (985) are postprints (journal articles that are either published or in press); there are also 377 preprints, 283 conference papers, 44 conference posters and 234 book chapters.
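The Cogprints breakdown can be checked with a few lines of arithmetic. The counts below are the ones quoted in the text; the variable names are ours. The itemised categories sum to 1,923, slightly under the rounded "around 2,000", so the postprint share comes out at roughly half on either basis.

```python
# Item counts for Cogprints as quoted in the text.
cogprints = {
    "postprints": 985,
    "preprints": 377,
    "conference papers": 283,
    "conference posters": 44,
    "book chapters": 234,
}

# Sum of the itemised categories and each category's percentage share of it.
total = sum(cogprints.values())
shares = {kind: round(100 * count / total, 1) for kind, count in cogprints.items()}
```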

As well as subject-based e-print archives, distributed broad-based archives have been set up in universities and research institutions around the world. There are now several hundred of these. OAIster is currently harvesting from 351 archives (institutional, subject-based and open access journal archives) and has over 3.5 million records in its database. Not all of these records will be e-prints, however, as OAIster harvests other types of digital object as well (see below) and from archives that do not contain e-prints.

The Institutional Archives Registry [13] currently lists 227 archives housing e-prints (along with other types of digital object in many cases); 33 of these are in the UK. Of the total, 121 are archives that are based at an individual institution and contain research output material (i.e. e-prints) and their content generally reflects the broad scope of scholarly activity at those institutions; that is, it covers many subject areas. Some are sub-institutional or departmental archives, however, which usually cover a single subject or discipline. Of the total number, 32 are archives with e-print content but which are cross-institutional and in general these tend to be subject-based, though not exclusively, since there are some broad-scope archives formed by collaborating institutions. An example of this type is the White Rose Consortium e-Prints Repository, a collaborative project by the universities of York, Sheffield and Leeds.

In May 2004 there were just short of 25,000 articles in the 20 e-print archives harvested by the RDN/e-Prints UK [14] project. One had no articles at all while at the other extreme the open access publisher BioMed Central’s archive contained over 12,000 from the circa 120 journals it publishes. By far the best-populated university-based archive is the University of Southampton’s ECS EPrints service with 8,143 articles. Two other Southampton-based archives also had reasonably high numbers of articles: e-Prints Soton had 758, and Psycprints (a journal archive) had 720.

The types of digital object collected and stored in archives

While some archives concentrate only on e-prints, others may house a considerably varied selection of digital objects, some of which may be very specific to local requirements. For our study, the JISC had specified that it required models for the access and delivery of just two types of digital object – e-prints (preprints and postprints) and open access journal articles. These are marked with an asterisk in the list below. Nevertheless, we kept in mind that other types of object are archived and that in time the inclusion of these other types of object might be deemed desirable. Some examples of the types of digital item that might be stored in archives set up by UK educational institutions are:

• Preprints*
• Postprints*
• All drafts and working papers plus corrigenda (i.e. a trail from first draft to the postprint: this is sometimes referred to as the ‘low threshold’ model)
• Ancillary data from research, e.g. video, audio, large datasets. Some archives accommodate these types of data, which cannot be published in a traditional peer-reviewed print journal, because there is merit in them being made available to other researchers, and in being preserved digitally in a formal way
• Books and monographs
• Non-published digital objects:
  – Teaching materials
  – Collections (music, images, etc.)
  – Research output from specialised subject fields, such as performing arts, where output is usually in the form of performance, video or audio
  – Dissertations and theses
  – Multimedia items
  – Local institutional ‘events’, e.g. performances, lectures, exhibitions
• Open access journal articles*

Although the scope of the study did not include ‘grey literature’, we were certain that it should not be ignored, not least because preprints submitted to journals and then subsequently rejected may end up forming a permanent piece of grey literature (in fact, the DAEDALUS project formally groups preprints with grey literature [15]). Conversely, many e-print archives contain much content in the form of reports (conventionally classed as grey literature) despite many definitions limiting e-prints to preprints and postprints.

Although grey literature was outside the scope of the study, we addressed it for the reasons above. Readers of this article may find a useful introduction to the grey literature landscape in the UK in the MAGiC (Managing Access to Grey Literature Collections) Final Report [16]. In short, the MAGiC project, which was sponsored by the British Library and the Research Support Libraries Programme, was intended to address the paucity of grey literature cataloguing and proposed the creation of a national grey literature service built around the OAI harvesting model, which would incorporate both electronic and hardcopy (legacy) documents.

The form and format of digital objects collected and stored

For the purpose of our study an e-print was defined as an article published in the scholarly literature, given away free by its author. In other words the objects archived are going to be, in the main, standard journal articles, perhaps accompanied in the archive by additional supporting material such as the large datasets generated in some branches of the sciences, or video or audio clips.

In some disciplines, however, research output may frequently take other forms. For instance, in the performing arts output is often in the form of a performance. This presents additional issues for the archiving of research results: video records of performance use large amounts of digital storage space, for example. In the arts and humanities, too, although scholars do publish work in traditional journals, there is also a large volume of output in the form of monographs. These may be archived in the same way as journal articles, but there may be certain differences. Monographs may have multiple authors, each contributing a chapter and possibly from different institutions, perhaps requiring separate deposition and submission policies. In general, too, monographs tend to be much larger documents than journal articles, so there is again a space implication. Finally, it is fairly common for monograph authors to be paid royalties by the publisher and though these are usually small, they nonetheless represent payment, so in these cases this is not ‘giveaway’ literature.

The range of data formats permitted by existing e-print archives varies from one archive to another. Many e-print archives, for example the University of Oxford’s Oxford E-Prints, upload text-based documents only in PDF format, whereas another example, the University of Glasgow’s eprints@Glasgow, accepts (and stores) digital objects in a much wider range of formats.

The global picture

There is something of a global race taking place with respect to achieving open access via national policies on self-archiving. Initiatives are in place in India, Norway, The Netherlands, Germany, Canada, Scotland and France, among others. Australia is an exemplar: the government gave funds of AU$12 m. in Oct 2003 to make ‘Australia’s research information . . . more easily accessible and better managed’ [17]. The country’s major research universities all have institutional archives, and the Department for Education, Science and Training (DEST) has decided that a national linked-up approach is the best way forward. It now supports four projects covering 15 Australian universities, Australian and international libraries, representatives from industry and various international organizations. The Australian Partnership for Sustainable Repositories (APSR) has been set up and is working through the Australian National University’s Centre for Sustainable Digital Collections to develop a national research infrastructure through broad, archive-based architecture which will ensure access continuity and the sustainability of digital collections, and facilitate national co-ordination and international linkages.


The e-print situation in the UK

There are some projects and programmes already in operation related to national delivery of e-prints in the UK. The situation is promising but lacks co-ordination, consisting of a series of linked pilot projects and a number of already established institutional e-print archives and, as yet, no involvement in e-print archiving by the British Library. There are three particularly significant national initiatives operating.

ePrints UK

The ePrints UK [18] project is developing a series of national, discipline-focused services for e-prints from compliant open archive repositories, particularly those provided by UK universities and colleges. The interface will be provided through OCLC and will use ‘name authority’ and ‘citation analysis’ Web services (offered by OCLC and the University of Southampton, respectively) to enhance the metadata harvested from available archives. With respect to the study we are reporting here, it is significant that ePrints UK already also harvests metadata from a number of e-journal repositories, demonstrating that integration of metadata from journal articles and e-prints is a practical and achievable proposition.

SHERPA

The SHERPA [19] project is initiating the development of openly accessible institutional digital collections of research output in a number of universities. Among the issues the project is investigating are intellectual property rights, quality control, other key management issues associated with making the research literature freely available to the research community and technical aspects of such a system, including interoperability between repositories and the digital preservation of e-prints.

FAIR

The Focus on Access to Institutional Resources [20] programme, funded by the JISC, has a mission ‘to evaluate and explore different mechanisms for the disclosure and sharing of content (and the related challenges) to fulfil the vision of a web of resources built by groups with a long term stake in the future of those resources, but made available to the whole community of learning.’ The JISC Information Environment is a virtual location where authors can deposit and share useful content (e.g. research outputs) and it is envisaged that this will join the current collection of JISC-funded content, which has the potential to include externally generated content from publishers and aggregators as well.

Other issues pertaining to a model for a new service in the UK

There were other issues to investigate and consider as we deliberated on the development of a suitable model. These fell into two categories – technical and ‘cultural’ – and are summarized below.

Technical issues

The technical aspects we considered were:

Software

Should a new system run on one of the available software packages or should a new, bespoke package be developed? There are several open-source (free) software packages available for running open archives. The best known are DSpace, developed by MIT, and EPrints, developed at Southampton University. Both EPrints and DSpace offer interoperability via the OAI-PMH. DSpace uses persistent identifiers that, unlike ordinary URLs, do not change when the physical location of the digital item alters. Other OAI-compliant software systems of note are CDSware, developed by CERN; and Fedora, developed jointly by the University of Virginia and Cornell University, with funding from the Andrew W. Mellon Foundation.
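The idea behind a persistent identifier can be illustrated with a small resolver table: the identifier that readers cite never changes, while the location it resolves to is updated when an item moves. This is a toy sketch of the general principle, not a description of how DSpace implements it; the class, the identifier and the URLs are all invented for illustration.

```python
class Resolver:
    """Maps stable identifiers to the current location of each item."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Record where a newly deposited item lives."""
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        """Update the location of an existing item; the identifier is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the current location for a cited identifier."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("hdl:0000/1", "http://old-server.example.org/items/1")
# The archive migrates to new hardware; citations to "hdl:0000/1" still work.
resolver.move("hdl:0000/1", "http://new-server.example.org/items/1")
```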

Preservation policies

Among the various issues that we identified here as having implications for a national e-prints service were: what should happen if an author wishes to withdraw an article; how to handle and track repeated revisions of an article after it is first deposited; and what constitutes the final version and how this can be indicated.

The technical costs and resources involved in establishing archives

From this technical viewpoint, the main costs will arise from the initial outlay on IT equipment and staffing, and from ongoing costs for these. At this stage it was difficult to assess potential costs for a new agency since we had no idea of the structure and operational requirements of such a body. We did, however, determine the real costs of setting up and maintaining archives at four universities, which would provide the JISC with approximate figures for the nationwide cost of establishing e-print archives at all higher and further education institutions. These figures are shown in Table 1. All costs are approximate, due to currency conversions, and are rounded.

Cultural issues

‘Cultural’ includes political and business-related issues in this context. There has been some useful discussion already in the literature suggesting that cultural change will be necessary before self-archiving becomes the norm [21]. Some of the more concrete issues that we needed to take into account were the following.

Institutional attitudes to archives

Although there are a number of institutions in the UK that have set up archives, most research-led establishments have not yet done this. The likelihood of them doing so, and within a reasonable period of time, was pertinent to which model we finally decided upon. There are several advantages to institutions in establishing open access archives. First, open access accelerates and enhances the impact of scholarly research [22]. Second, it enables improved methods of impact measurement and analysis which in turn can generate better scientometric performance indicators for research productivity, usage and impact. Third, it also enables the generation of standardized online CVs for each institution’s researchers and these can be used for internal as well as external (such as the UK’s national Research Assessment Exercise [23]) evaluation purposes. Fourth, it helps to monitor and enable the fulfilment of any research-council funding requirements. Also, at the same time as we reported our study in full, the recommendations of the House of Commons Select Committee on Science & Technology were published [1] and these include the following points:

43. Institutions need an incentive to set up repositories. We recommend that the requirement for universities to disseminate their research as widely as possible be written into their charters. In addition, SHERPA should be funded by DfES to allow it to make grants available to all research institutions for the establishment and maintenance of repositories.

44. Academic authors currently lack sufficient motivation to self-archive in institutional repositories. We recommend that the Research Councils and other Government funders mandate their funded researchers to deposit a copy of all their articles in their institution’s repository within one month of publication or a reasonable period to be agreed following publication, as a condition of their research grant.

If these recommendations are followed up by the government, archives will be set up by every university and research-led institution in the UK.

Table 1 Examples of actual costs incurred for setting up and maintaining institutional archives

University                                Software  Set-up costs (£)  Annual running costs (£)
MIT                                       DSpace    1.3 m.            160,000
Queen’s University (Canada) QSpace        DSpace    22,750            22,250
National University of Ireland, Maynooth  Eprints   17,500            26,250
Nottingham University                     Eprints   3,900             31,250 (includes provision for a triennial upgrade of hardware and software)
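The figures in Table 1 can be put on a common basis by adding the set-up cost to a number of years of running costs. The sketch below does this for a five-year horizon; that horizon is an assumption made here purely for illustration and is not a figure from the study, and the calculation ignores inflation and the approximate nature of the converted amounts.

```python
# (set-up cost, annual running cost) in pounds, taken from Table 1.
ARCHIVE_COSTS = {
    "MIT (DSpace)": (1_300_000, 160_000),
    "Queen's University QSpace (DSpace)": (22_750, 22_250),
    "NUI Maynooth (Eprints)": (17_500, 26_250),
    "Nottingham University (Eprints)": (3_900, 31_250),
}


def cumulative_cost(setup, annual, years):
    """Set-up cost plus running costs over the given number of years."""
    return setup + annual * years


YEARS = 5  # illustrative horizon (assumption, not from the study)
five_year = {
    name: cumulative_cost(setup, annual, YEARS)
    for name, (setup, annual) in ARCHIVE_COSTS.items()
}
```

On this basis the three smaller archives fall within a fairly narrow band despite their different set-up costs, because running costs dominate after a few years.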

The populating of archives

Where archives exist today – and with notable exceptions – in the main they are rather sparsely populated with e-prints. Administrators or champions of existing archives have tackled the problem in a number of ways – sustained advocacy by campaigns, demonstrations, presentations and seminars being the main route. Support – tacit or actual – from the pro-vice-chancellor (PVC) or provost responsible for research policy is crucial.

Author inertia is the main enemy of an e-print archive once it is established. The alternative to the ‘author chooses to comply’ model is to mandate self-archiving. To date, there are a few educational institutions that have gone so far as to mandate that their authors deposit copies of all their research articles in the institutional e-print archive [24]. There are also examples of departmental mandates, one such being the School of Electronics and Computer Science at the University of Southampton, which has produced a policy that could be used by other departments travelling the same route [25]. To allay fears about the process of self-archiving and its legality, Eprints.org has produced a FAQ [26] and a handbook [27] on the subject.

The agreements that authors have with publishers and with archives

Are there any restrictive licensing arrangements with publishers or exclusive agreements with archives that hamper or deter author self-archiving activity? Many publishers have now officially endorsed the practice of authors self-archiving the articles published in their journals in the author’s own institutional archive. At the time of writing, over 70% of the (103) publishers surveyed by Eprints.org have adopted this policy [28], and 92% of the (8,853) journals surveyed are ‘green’ (i.e. they endorse author self-archiving of either the preprints or postprints of articles) [29].

The management costs and resources involved in establishing archives

These include the staff resources required for planning, promoting and training. Hard data were difficult to come by, largely because in institutions where archives have been established these sorts of costs have been absorbed into existing activities and provisions, but it was clear from our discussions that there is a real and non-negligible cost element here.

With this background information in place, we set about determining the candidate models for e-print delivery, management and access.

The models we considered

There are three basic models that can support access to metadata and the associated scholarly digital resources:

• Centralized model – both metadata and the resources themselves are deposited directly in a central archive.
• Distributed model – all metadata and resources remain in their source archives, and metadata are cross-searched ‘on the fly’.
• Harvesting model (a hybrid model) – metadata are harvested into a central searchable archive but also remain distributed among the original archives.

The centralized model

In this model, users (authors) deposit their e-prints in a central archive. This would have a service provision component of its own, providing the interface through which users (readers) search, browse and retrieve the articles they require. The metadata of articles in the archive would also be exposed via OAI-PMH (and any other protocols that play the same role; specifically, in our study, we also included RSS and SRW/SRU as possibilities in this respect) for use by other service providers. The configuration of this model is shown in Figure 1.


The distributed model

Under this model, the service would search all available archives, metadata would be obtained in real time as the user made his/her request, and the user would be pointed at the digital resource (the article) located in its distributed archive. The model is configured as in Figure 2.
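The real-time behaviour of a cross-search can be illustrated with a minimal sketch. It is not taken from the report: the archive names, titles and delays are hypothetical, and the `make_archive` helper merely simulates remote search interfaces. It shows why a service that waits for every archive can respond no faster than the slowest one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cross_search(query, archives):
    """Query every archive in parallel and collect the hits per archive.

    `archives` maps an archive name to a search function; these functions
    are hypothetical stand-ins for real remote search interfaces.
    """
    with ThreadPoolExecutor(max_workers=len(archives)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in archives.items()}
        # Waiting on every result means the slowest archive sets the pace.
        return {name: fut.result() for name, fut in futures.items()}


def make_archive(titles, delay):
    """Simulate a remote archive that answers after `delay` seconds."""
    def search(query):
        time.sleep(delay)
        return [t for t in titles if query.lower() in t.lower()]
    return search


# Hypothetical archives with invented contents and response times.
archives = {
    "fast": make_archive(["Open access and impact"], delay=0.01),
    "slow": make_archive(["Open archives in the UK", "Grey literature"], delay=0.2),
}

start = time.perf_counter()
hits = cross_search("open", archives)
elapsed = time.perf_counter() - start
```

Even with the requests issued in parallel, the elapsed time here is governed by the archive with the 0.2-second delay.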

The ‘harvesting’ model

Under this model, the proposed new UK service (the service provider) would harvest and store metadata from available e-print archives and open access journals (the data providers), using the OAI-PMH. It would have a service provision component of its own, which would provide the interface through which readers would search, browse and retrieve articles. We included an additional element here – the metadata from these articles would also be exposed via OAI-PMH, SRW/SRU and RSS for use by other service providers. Figure 3 illustrates the harvesting model diagrammatically.
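As an illustration of what harvesting involves, the following minimal sketch builds an OAI-PMH `ListRecords` request and parses unqualified Dublin Core records out of a response. It is a sketch under stated assumptions rather than the service's implementation: the base URL, the sample record and the function names are hypothetical, and a production harvester would also need to handle resumption tokens, deleted records and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, from_date=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract header fields and some unqualified DC fields from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        header = rec.find(f"{{{OAI_NS}}}header")
        entry = {
            "identifier": header.findtext(f"{{{OAI_NS}}}identifier"),
            "datestamp": header.findtext(f"{{{OAI_NS}}}datestamp"),
        }
        for field in ("title", "creator", "identifier"):
            entry["dc_" + field] = [el.text for el in rec.iter(f"{{{DC_NS}}}{field}")]
        records.append(entry)
    return records


# A hypothetical response from an invented archive, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.ac.uk:42</identifier>
        <datestamp>2004-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:identifier>http://archive.example.ac.uk/42/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

records = parse_records(SAMPLE)
```

The optional `from` argument restricts a request to records changed since a given datestamp, which is what allows a service provider to re-harvest incrementally rather than fetching everything each time.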

A detailed account of the technical requirements of each of the models appears in an appendix to the full report.4

Each of these models had arguments for and against it. We needed to weigh these up before we settled on which model would be best for a new UK e-prints service to adopt. In considering the relative merits of the models we addressed not only technical concerns but also the cultural issues discussed earlier, and especially how e-print provision (by authors) can be achieved, since without this content provision there can be no effective e-print delivery service (for users). The points for and against each of the candidate models are summarized in Table 2.

Figure 1 The centralized model.

Figure 2 The distributed model.


The model we recommended

For technical and cultural reasons, the study recommended that the centralized model should not be adopted for the proposed UK service. First, this would have been the costliest option. Second, it would have omitted the growing body of content in distributed institutional, subject-based and open access journal archives. Third, the central archiving approach is the ‘wrong way round’ with respect to e-print provision (see below) and would therefore not provide so effective a route to a critical mass of e-print material as the other models.

The distributed model had some distinct advantages over the centralized model in respect of the points above, but operationally it did not allow the kind of quality of service that we believed was possible to attain. In particular, the consistency and quality of metadata were out of the control of the agency that would run the new service, yet the quality of metadata is profoundly important to the overall usefulness and effectiveness of such a service. This model, we felt, was adequate but not optimal.

It was clear to us that the harvesting model had the greatest promise. Not only is it compatible with the technical requirements and capabilities we had identified, and not only does it provide the means to standardize, improve and enhance the metadata and, hence, the service level, but it also provides the most effective means of overcoming one of the major ‘cultural’ obstacles, which is the provision of e-print content in the first place.

One of the critical aspects of our decision was that any model for delivering e-prints must operate in, and help to create, the arena most likely to generate the maximum amount of e-print content-provision by authors. Since this issue is so important, it is worth developing the argument further here.

Two things have a bearing on the level of author self-archiving – archives being available for authors to use and authors actually archiving their articles. From the evidence we looked at – existing archives – it was clear to us that even when archives are available there is still precious little peer-reviewed material being deposited, so it is author behaviour that is at the very root of this matter. How may authors be ‘encouraged’ to self-archive? The evidence shows that while the carrot of increased visibility and impact does prompt a proportion of authors to archive their work, ‘encouragement’ would best also take the form of a stick, with self-archiving made mandatory as a condition of funding or employment.

There are few examples of such mandates in operation as yet (though where they exist, they are working25), but plenty of promise for those to come. A recent study on open access publishing produced clear evidence that authors have, in general and in principle, no objection to self-archiving and will comply with a mandate to do so from their employer or research funder: in total 77% of authors would comply with such a mandate (69% would do so willingly), while only 3% said they would not comply.30,31

Figure 3 The harvesting model.


Table 2 Advantages and disadvantages of alternative models

Centralized model

Advantages:
The agency running the service would:
– have overall administration of the whole process, from article deposition through to the user interface
– be able to standardize the protocols used
– be able to select the archive software that provided the most appropriate set of storage and output capabilities
– be able to manage preservation issues
– be able to impose requirements for the format in which articles are deposited
– be able to develop facilities that maximize search capabilities (categorization of the data, subject classification, etc.)
– be able to establish an overall programme of continuing development and improvement

Disadvantages:
– With all administrative and maintenance functions centralized, it is an expensive option
– It ignores the existence of, and renders useless, already-established institutional and subject-based archives
– Creating a scheme for nationwide author deposition of articles within or across disciplines in one central pan-disciplinary archive, or multiple central disciplinary archives, would be extremely difficult if not impossible, for political or cultural reasons
– It is not reasonable or practical to expect open access journal publishers to submit articles they publish directly into a central archive

Distributed model

Advantages:
– No replication of metadata is required
– The metadata retrieved are always current
– It provides a consistent look and feel for searching and retrieving metadata from heterogeneous sources
– It is relatively cheap to implement compared to a centralized solution

Disadvantages:
– The model does not permit any improvements to be made in the management of e-prints and open access journals
– It does not permit enhancements to the metadata, because these are only grabbed at the time of need (when the user searches)
– As the number of sources to be searched increases, performance decreases: it can only work as fast as the slowest server in the group of archives it is searching
– Query syntax varies across source nodes, and syntax changes over time
– If results are to be returned using relevance ranking, it is difficult to merge results from multiple sets in a meaningful manner
– The institutional and subject-based archives employ software that supports the OAI-PMH. At the time of writing the vast majority of archives do not support Z39.50 or SRW/SRU

Harvesting model

Advantages:
– The OAI-PMH is a standard protocol that is easy to implement
– It is flexible: although the use of unqualified DC is mandated to be OAI-compliant, other richer, more complex metadata schemes may additionally be employed
– The OAI-PMH is designed to allow metadata exchange and the sharing of scholarly knowledge
– The institutional and subject-based archives employ software that supports the OAI-PMH
– Much of the harvesting can be carried out by automatic scheduled tasks, minimizing the need for human intervention
– Once stored in a local database, the metadata can be processed, enhanced and re-exposed both to the original data providers and to other service providers
– It is possible to develop facilities that maximize search capabilities (categorization of the data, subject classification, etc.)
– It can form the basis for an overall programme of continuing development and improvement
– It is a low-cost option which can work equally well for journal articles, e-prints, journal descriptions and collection level descriptions

Disadvantages:
– Unqualified DC, which is mandated as the minimum metadata standard for use by the OAI, is the only metadata scheme in common use as yet. It is a lowest common denominator which lacks semantic richness and limits the possibilities of providing enhancements
– The metadata exposed by the service may not always be the very latest version of that metadata. Changes made to metadata at institutional archives, subject-based archives and open access journals will not be reflected until a subsequent re-harvest
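The advantage that harvested metadata 'can be processed, enhanced and re-exposed' can be made concrete with a small sketch. The two enhancements shown, normalizing creator names and dropping duplicate records, are illustrative assumptions of ours and are not drawn from the report; the record layout and function names are likewise hypothetical.

```python
def normalise_creator(name):
    """Collapse whitespace and put a creator name in 'Surname, Initials' form.

    Assumes the input is either 'Surname, Forenames' or 'Forenames Surname';
    real harvested data would need more careful handling.
    """
    name = " ".join(name.split())
    if "," in name:
        surname, forenames = [part.strip() for part in name.split(",", 1)]
    else:
        *given, surname = name.split(" ")
        forenames = " ".join(given)
    initials = " ".join(
        part[0].upper() + "." for part in forenames.replace(".", " ").split()
    )
    return f"{surname}, {initials}".rstrip(", ")


def enhance(records):
    """Normalise creators and drop duplicates harvested from several archives."""
    seen = set()
    enhanced = []
    for rec in records:
        creators = [normalise_creator(c) for c in rec.get("creator", [])]
        key = (rec["title"].strip().lower(), tuple(sorted(creators)))
        if key in seen:
            continue  # the same article was already harvested elsewhere
        seen.add(key)
        enhanced.append({**rec, "creator": creators})
    return enhanced


# Two hypothetical records describing the same article in different styles.
harvested = [
    {"title": "Open access", "creator": ["Alma  Swan"]},
    {"title": "open access ", "creator": ["Swan, A."]},
]
cleaned = enhance(harvested)
```

Processing of this kind is possible only because the metadata are held locally; under the distributed model the records are fetched at query time and cannot be improved in advance.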


The recent parliamentary recommendations in the UK1 on mandating self-archiving in institutional archives – published as our study was concluding – are therefore perfectly on target to address the issue most critical to open access provision. Scholars will self-archive if told to do so. Employers and research funders have the authority to do the telling, but tell authors to do what, and which authors? Funders can only tell their grant-holders, but they do have the choice of telling them to deposit their articles in the funder’s own archive (if there is one), in some other centralized archive, or in the researcher’s own institutional archive, or all of these.

Employers can do all these too, but since they not only have shared goals with their researchers in respect of dissemination of research findings, but also see additional value in, and uses for, the content of an institutional archive, they are very likely to be eager to see it maximally populated and will insist on authors depositing there, at the very least. Moreover, they can mandate and monitor self-archiving across the board, including by researchers who are not supported by external funding (a large number in many subject areas), and in every scholarly discipline. This is far more effective a route to comprehensive e-print provision than relying on funder mandates alone, and is much more likely to provide e-prints in all disciplines relatively quickly than relying on the eventual establishment of centralized archives in all subject areas. A two-pronged attack from funders and employers (universities and research institutions) mandating self-archiving in institutional archives is the optimal way forward.

Finally, many publishers have formally endorsed the practice of authors of articles published in their journals self-archiving them locally in institutional archives or departmental or personal websites, but will not permit them to self-archive in ‘third party’ archives. A centralized model for the new service would presumably be viewed as belonging in the ‘third party’ category and would therefore suffer from publisher prohibition policies.

Our conclusion was, then, that this scenario is the one most likely to provide the maximum level of self-archived content, a major plank of any model for the provision of e-prints nationwide in the UK.

Implementation of the harvesting model and services based upon it

Harvesting methods

There are several fundamental ways in which metadata might be harvested from OAI-compliant archives:

1. Harvesting from institutional archives, subject-based archives and open access journals is carried out at a national, central level. Then subject-based and other service providers harvest or cross-search subsets from the central national service.

2. Harvesting within subject disciplines is carried out by subject-based service providers. These then act as data providers to national services.

3. Harvesting by resource types – e-prints/OAJs, e-theses, reports literature – is carried out by agencies dedicated to those types. These agencies act as data providers to national services.

The first option is the most realistic and workable at this time, though the second and third options do have some advantages, not least the modular management of services. Neither the subject portals required for option 2 nor the hypothetical agencies required for option 3 are in place, however, so at this time harvesting at national level (option 1) offers the best route to consistency and avoidance of duplication of effort. In fact, option 1 already exists in prototype in the Eprints UK project.32
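Under option 1, subject-based service providers draw subsets from the central national service. A minimal sketch of how a central store might partition its harvested records into such subsets follows; the `subjects` field, the identifiers and the 'unclassified' fallback are assumptions made for illustration only.

```python
from collections import defaultdict


def build_sets(central_records):
    """Group centrally harvested record identifiers by subject.

    Each group could then be offered to a subject-based service provider
    as a subset of the national collection.
    """
    sets = defaultdict(list)
    for rec in central_records:
        # Records without subject metadata fall into a catch-all group.
        for subject in rec.get("subjects") or ["unclassified"]:
            sets[subject].append(rec["identifier"])
    return dict(sets)


# Hypothetical records held by the central service.
central = [
    {"identifier": "oai:a.example.ac.uk:1", "subjects": ["physics"]},
    {"identifier": "oai:b.example.ac.uk:2", "subjects": ["physics", "chemistry"]},
    {"identifier": "oai:c.example.ac.uk:3"},
]
subsets = build_sets(central)
```

A record may belong to more than one subset, so a subject-based provider receives every record relevant to its discipline regardless of which archive it was deposited in.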

Alternative services based upon the broad-level harvesting model (option 1 above)

We see three basic ways in which the harvesting model might be used as a basis for e-print and open access journal content service provision in the UK.


An ePrints UK-type service

ePrints UK, one prototype, harvests metadata from e-print archives and enhances them through web services. Data providers (the universities and other research institutions) can then re-harvest their enhanced records to improve their own local services. Meanwhile, ePrints UK provides access to the whole dataset via its own interface.33

A portal-in-a-browser service

Our view is that this model is simple, elegant and in keeping with the JISC Information Environment architecture. It has been adopted by the European Library project.34 This model differs from the ePrints UK model in that we have stripped out the web services (though these may be added in again at a future date when they have matured) and added a central archive that ‘mops up’ articles deposited by authors who do not have an institutional archive to use. The model is shown in Figure 4. All the protocols used in this model are standard protocols, which are straightforward and inexpensive to implement.

The Google service model

The use of Google to search university archives using DSpace via a search system set up by OCLC is now being piloted.35 If trials prove successful, and if the pilot can be extended to search archives powered by software other than just DSpace, this may provide a complementary strand to the other two models.

Whichever of these detailed service models is ultimately adopted for a national UK open access service, it is clear that the parent-level harvesting model is the one which should form the basis of a UK service. Its advantages heavily outweigh its disadvantages, and overall it presents a superior option to the centralized and distributed models that form the alternatives. The Open Archives Initiative employs a philosophy whose time has come, and the harvesting model has gained worldwide acceptance. It makes it easy to share information about scholarly resources and to offer enhanced resource discovery tools, and its use is becoming widespread elsewhere. In view of this, we recommended that the harvesting model should form the basis of open access service provision in the United Kingdom.

Figure 4 Portal-in-a-browser model.

References

1. UK House of Commons Science and Technology Select Committee Recommendations: http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39903.htm

2. National Institutes of Health recommendations: http://thomas.loc.gov/cgi-bin/cpquery/?&db_id=cp108&r_n=hr636.108&sel=TOC_338641&

3. Rowland, F., Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O’Brien, A. and Hardy, R. Delivery, management and access model for E-prints and open access journals. Serials Review 2004, in press.

4. Swan, A., Needham, P., Probets, S., Muir, A., O’Brien, A., Oppenheim, C., Hardy, R. and Rowland, F. Delivery, management and access model for e-prints and open access journals within further and higher education. 2004. http://www.jisc.ac.uk/journals_work.html

5. Open Archives Initiative Protocol for Metadata Harvesting. http://www.openarchives.org/OAI/openarchivesprotocol.html

6. OAIster. http://oaister.umdl.umich.edu/o/oaister/

7. Citebase Search. http://citebase.eprints.org/cgi-bin/search

8. Directory of Open Access Journals (Lund University Library). http://www.doaj.org

9. BioMed Central. http://www.biomedcentral.com/

10. Scientific Electronic Library Online. http://www.scielo.br/scielo.php/lng_en

11. arXiv.org e-print archive. http://arxiv.org/

12. Cogprints: Cognitive Sciences Eprints Archive. http://cogprints.ecs.soton.ac.uk/

13. Institutional Archives Registry. http://archives.eprints.org/

14. Eprints-UK statistics. http://eprints-uk.rdn.ac.uk/stats/

15. Daedalus project at Glasgow University. http://www.lib.gla.ac.uk/daedalus/

16. Needham, P., Sidwell, K., Bevan, S. and Harrington, J. The MAGiC Project. managing access to grey literature collections. Final report – October 2002. http://www.bl.uk/concord/docs/magic-final.doc

17. McGauran, P. $12 million for managing university information. 2003. http://www.dest.gov.au/Ministers/Media/McGauran/2003/10/mcg002221003.asp

18. Eprints UK. http://www.rdn.ac.uk/projects/eprints-uk/. 2004.

19. SHERPA project. http://www.sherpa.ac.uk/

20. Focus on Access to Institutional Resources (FAIR) Programme. http://www.jisc.ac.uk/index.cfm?name=programme_fair

21. Pinfield, S. Open archives and UK institutions: an overview. D-Lib Magazine 2003:9 (3). Available at http://www.dlib.org/dlib/march03/pinfield/03pinfield.html

22. Harnad, S. and Brody, T. Comparing the impact of open access (OA) vs. non-OA articles in the same journals. D-Lib Magazine, 2004:10 (6). http://www.dlib.org/dlib/june04/harnad/06harnad.html

23. Harnad, S., Carr, L., Brody, T. and Oppenheim, C. Mandated online RAE CVs linked to university eprint archives: enhancing UK research impact and assessment. Ariadne, 2003:35. http://www.ariadne.ac.uk/issue35/harnad

24. Eprints.org. Registry of Departments and Institutions who have adopted an OA self-archiving policy. http://www.eprints.org/signup/fulllist.php

25. Eprints Handbook: Actions for Departments to Achieve Open Access. http://software.eprints.org/handbook/departments.php

26. Self-Archiving FAQ for the Budapest Open Access Initiative (BOAI). http://eprints.org/self-faq

27. OSI Eprints Handbook. http://software.eprints.org/handbook/

28. RoMEO self-archiving policies by publisher. http://romeo.eprints.org/publishers.html

29. RoMEO summary statistics. http://romeo.eprints.org/stats.php

30. Swan, A. and Brown, S. JISC/OSI journal authors survey report. 2004. http://www.jisc.ac.uk/uploaded_documents/JISCOAreport1.pdf

31. Swan, A. and Brown, S. Authors and open access publishing. Learned Publishing, 2004:17 (3), 219-24. http://www.keyperspectives.co.uk/OpenAccessArchive/Authors_and_open_access_publishing.pdf

32. Cliff, P. ePrints UK – architecture: 1.032003, based on project proposal. 2003. http://www.rdn.ac.uk/projects/eprints-uk/docs/technical/architecturev1.032003/

33. Martin, R. ePrints UK: developing a national e-prints archive. Ariadne, 2003:35. http://www.ariadne.ac.uk/issue35/martin

34. van Veen, T. and Oldroyd, B. Search and retrieval in the European Library: a new approach. D-Lib Magazine, 2004:10, Feb. http://www.dlib.org/dlib/february04/vanveen/02vanveen.html

35. Macleod, D. Google launches research archive project. Guardian Unlimited. 2004. http://education.guardian.co.uk/higher/news/story/0,9830,1191090,00.html

Alma Swan and Sheridan Brown
Key Perspectives Ltd
48 Old Coach Road, Playing Place
Truro TR3 6ET, UK
Email: [email protected]

Paul Needham
Information & Library Services
Cranfield University
Cranfield, Bedfordshire MK43 0AL
UK
Email: [email protected]

Steve Probets, Adrienne Muir, Charles Oppenheim, Ann O’Brien, Rachel Hardy and Fytton Rowland
Department of Information Science
Loughborough University
Loughborough, Leicestershire LE11 3TU
UK
