Developing a resource discovery proposition for
scientific datasets at the British Library
EBLIP6 30-06-11
Rachael Kotarski, Content Specialist – Datasets
Elizabeth Newbold, Content and Collections Leader
British Library, Science and Technology
Collections and Content
Strengths of the collection:
• All aspects of science, technology and medicine – including a strong focus on industry and applications of science
• All material is of a high technical standard and ‘research relevance’ is a key factor in selecting material
• International in coverage and scope; material acquired from all of the major STM publishers
• Print monographs and serials - Extensive journal collections including trade magazines and newsletters
• Grey literature (conference proceedings, reports, theses, official publications); Patents and Maps
• National library of the United Kingdom
• Origins of the collections in science as a distinct resource for scientists and engineers date from 1850 and the Patent Office Library
• Science is an integral part of the British Library’s remit
• Serves business & industry, researchers, academics and students through dedicated reading rooms in London and our document supply services based in Boston Spa
Why data?
• Data are a vital part of the scientific record.
• Growing number of mandates and requirements from funders and publishers to make data available:
• In the UK: RCUK funders, Wellcome, CRUK
• Internationally: e.g. Genome Canada, NIH, NSF, DFG, INSERM
• But researchers in areas where this is a new requirement need advice, support and the appropriate tools and resources to ensure they can share, find and reuse data
• But what is/should be/will be the role of libraries in this changing landscape?
• Data as a format is very different from traditional library content, so are libraries equipped with the knowledge, technology and capacity to deal with it?
• How can libraries prepare for this?
We needed to look at the landscape of data and the services that the Library could provide to investigate our potential role further.
What do we mean by data?
By research datasets, we mean scientific information generated by experiments,
observation or computation, which forms an evidence base for the
work of researchers. That information may be stored in any digital form, including
text, numbers, images, video, audio, software, algorithms and models.
Background and timeline
PHASE 1
• Late 2007: Consultancy reports; STM Strategy
• 2008: Content strategy; Dataset content specialist in post
• 2009: Scoping; DataCite metadata working group; Assess suitable Library systems
PHASE 2
• 2010: Low-key data discovery pilot; Promotion of pilot; Survey
• 2011: Extension of pilot; Expanding subject scope; Analysis
PHASE 1: Scoping of the ‘data’ landscape.
• We commissioned Key Perspectives and RAND to look at the datasets available and assess the kinds of services for data that the British Library would be best placed to provide.
• These were worked into the overall STM strategy.
• The Content Strategy for STM 2008-2011 was devised, with specific reference to datasets.
• Recruitment of an STM Datasets Content Specialist
PHASE 2: Low-key pilot.
• To test the approach and gauge user interest and need for such a service.
• Analysis to judge sustainability and use.
PHASE 1 Scoping a role for the Library: Consultancy reports and STM Content Strategy
• Key Perspectives suggested four different
approaches, which RAND explored further,
fleshing out the options proposed by KP based
on ‘supply’ and ‘demand’ characteristics of
datasets.
• Both reports highlighted that providing
discovery for datasets was an important avenue
for the Library to investigate further.
• As a result, the focus on enabling and developing discovery of datasets was worked
into the Library’s STM content strategy 2008-2011. In detail, points included:
• Develop and test selection criteria for reference datasets
• Develop relationships with data stakeholders
• Explore the role of Libraries in developing mechanisms to facilitate longer term access
and persistence
How to test a discovery proposition?
• A service involving a ‘new’ material type would raise
questions about:
• Users
• Selection
• Metadata
• Operational sustainability
• To build the evidence to answer these questions, we:
• incorporated dataset questions into ongoing research for
other projects (UKPMC, RIC, Flooding project, PhD focus
groups, life science case studies)
• sought out similar user research from the literature
• worked internally to draw out suitable processes and
systems
• These would give us theoretical evidence, but to draw
concrete conclusions, we needed to pilot a service.
Options for a pilot
Most importantly, we wanted to use the technical solutions that were already available
in the Library. Options were drawn up for the shape of a discovery service. These were:
• BL webpage-based discovery: This service would be created and based within the
content management system (CMS), Percussion.
  • Similar to CISTI’s Scientific Data Gateway.
• BL Integrated Catalogue: This option would see data resources catalogued into Aleph.
The records would then be surfaced via the Integrated Catalogue and Primo.
  • Similar to TIB Catalogue’s inclusion of data.
• Themed Collection Catalogue: This entails a standalone database for discovery as well
as storage, administration and editing of discovery metadata.
  • Similar to ViFaBiO.
• Primo-based discovery: This option sees metadata indexed in Primo (from Ex Libris).
Metadata can be stored anywhere, provided it can be ‘fed’ to Primo.
  • Similar to Search Oxford Libraries Online.
Collecting evidence: Metrics
• In order to measure the success of the pilot, we needed to
engage our users. We took a survey approach.
• The survey needed to answer questions of user need, and also
gather users’ thoughts on the shape and direction of the pilot.
• We looked at earlier surveys to phrase questions for
comparable results that would still be specific to the pilot.
• We also included profiling questions.
• We also looked at the actual use of the service through
views of each record, and SFX click-through data from
Search Our Catalogue (SoC) to the resource itself.
• We had to keep in mind that we had only included a
limited set of records with limited scope.
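As a rough illustration of the usage metric described above, the share of record views that lead on to the resource could be calculated along these lines. This is a minimal sketch, not the Library's actual reporting method: the `RecordUsage` structure, the function name and all of the counts are hypothetical and are not pilot data.

```python
from dataclasses import dataclass


@dataclass
class RecordUsage:
    """Usage counts for one catalogue record (all values here are invented)."""
    record_id: str
    views: int           # times the record page was viewed
    click_throughs: int  # times a viewer followed the link to the resource


def click_through_rate(usage: list[RecordUsage]) -> float:
    """Share of record views that led to a click-through to the resource."""
    total_views = sum(u.views for u in usage)
    if total_views == 0:
        return 0.0  # no views recorded, so no meaningful rate
    total_clicks = sum(u.click_throughs for u in usage)
    return total_clicks / total_views


# Hypothetical counts, for illustration only; not pilot data.
sample = [
    RecordUsage("dataset-a", views=40, click_throughs=10),
    RecordUsage("dataset-b", views=60, click_throughs=15),
]
rate = click_through_rate(sample)
print(f"{rate:.0%} of record views led to the resource")
```

Dividing by views rather than reporting raw click counts keeps the figure comparable between collections of very different sizes, which matters when only a small set of records is included.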
Promotion of the pilot and survey
• Dataset records that were made public in May 2010 would
only be discoverable by accident, so we:
• created a webpage explaining the pilot, with Adobe Captivate
videos demonstrating how it worked, and example records.
• actively promoted via JISC email lists, British Library
newsletters, Facebook, Twitter and user training sessions
• Survey was released in October, promoted using the same
methods but additionally on the SoC homepage and in user
training sessions.
• Response to the survey was disappointing, possibly due to:
• General lack of use of SoC, which was still in ‘beta’ itself
• Lack of users with a current interest in research datasets
• Limited subject scope restricted the number of potentially
interested users
So how did the pilot answer our questions? Users / Usage
Do researchers need to find data?
• 9% of respondents said they do not currently need to find
data to reuse.
• But 100% of those expect to reuse data in the future.
Where are researchers getting their data?
• Pilot survey: Spread across all sources, but primarily Web
searches and Colleagues.
• Our other surveys and case studies showed comparable
results, although with a stronger bias towards literature
and web searches, and colleagues and collaborators.
What kind of data are they looking for?
• Pilot survey: Non-specific, although respondents would prefer not to
need to search again.
• Our other surveys showed a need for a wide variety of
data types, including non-digital and supplementary data
So how did the pilot answer our questions? USERS contd…
Will researchers use the services to find data?
• Our initial usage stats suggest yes.
• Use has remained stable, and although the number of records viewed has decreased,
the number of those that lead the user to view the dataset remains stable.
So how did the pilot answer our questions? USERS: Comparable usage
How does this compare with their use of other content?
• Compared to other resources accessed from Search Our Catalogue, the ratio of users
who go on to access datasets remains high (when factoring for the number of records
actually available) DESPITE the restricted subject scope of the records available*.
What have we learned?
• There is a role for libraries! Although many concentrate on libraries’ role in storage and
preservation, we can quickly and easily start with how we enable discovery of data.
• It is achievable! The first year of our discovery pilot has been successful in demonstrating
one of the options for enabling discovery of research data.
• Available library systems can handle the discovery of datasets, but work is needed to
ensure staff understand the differences between data and traditional content.
• Many researchers aren’t currently able to define their needs for data, but this will
change: we need to remain engaged to maintain understanding of these changing needs.
• You have to get involvement from a lot of people – the pilot involved people from every
directorate of the Library.
The U.S. National Archives. Public Domain. Via Flickr
Future direction
PHASE 3 will involve:
• Assessing sustainability, particularly time
requirements of maintenance
• Harvesting and simplifying metadata
• Expanding subject scope for wider engagement
• On-going monitoring of usage
• Re-use in other projects: comparing approach
e.g. for subject portals
Links and references
Links:
Search Our Catalogue (soon to be ‘Explore the British Library’): http://search.bl.uk
STM@BL website: http://www.bl.uk/science
RAND report: http://www.rand.org/pubs/technical_reports/TR567.html
Refs:
Sharing research data to improve public health: joint statement of purpose. (2011, January 10). Retrieved from http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm
Funders’ Data Policies. Retrieved 2011-06-23, from http://www.dcc.ac.uk/resources/policy-and-legal/funders-data-policies
Researchers and Discovery Services: Behaviour, Perceptions and Needs. A study commissioned by the Research Information Network. (November 2006). Research Information Network. Retrieved from http://www.rin.ac.uk/system/files/attachments/Researchers-discovery-services-report.pdf
Patterns of information use and exchange: case studies of researchers in the life sciences. (November 2009). Research Information Network. Retrieved from http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf
Cyberinfrastructure Vision for 21st Century Discovery. (March 2007). Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf