Management of Data Collections
Date posted: 18-Oct-2014
Data Collections
Bernadette Duffy and Abraham de Jesus
LIBR 580
Louise Broadley
October 5, 2011
What are Data Collections?
• Data from surveys, opinion polls, climate data
• Numeric data in machine-readable form
• To make use of the data files, users need codebooks and other supporting files
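To illustrate why a data file is hard to use without its codebook, here is a minimal Python sketch. The variable names, codes, labels, and sample rows are all hypothetical and do not come from any real data set; the point is only that the raw file holds codes and the codebook supplies their meaning.

```python
import csv
import io

# Hypothetical codebook: variable name -> {numeric code: label}.
CODEBOOK = {
    "sex": {"1": "Male", "2": "Female"},
    "employed": {"0": "No", "1": "Yes", "9": "Not stated"},
}

# Hypothetical raw data file as it might be distributed: codes only.
RAW_DATA = "id,sex,employed\n1,2,1\n2,1,9\n"


def decode_rows(raw_text, codebook):
    """Replace coded values with codebook labels; leave other columns unchanged."""
    reader = csv.DictReader(io.StringIO(raw_text))
    decoded = []
    for row in reader:
        decoded.append({
            name: codebook.get(name, {}).get(value, value)
            for name, value in row.items()
        })
    return decoded


rows = decode_rows(RAW_DATA, CODEBOOK)
for row in rows:
    print(row)
```

Without the codebook, a value such as 9 in the "employed" column cannot be told apart from a real answer, which is one reason supporting files are usually collected alongside the data itself.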
Data Lifecycle, from DataONE: https://www.dataone.org/content/education
Libraries and Data Collections
• Important in academic and special libraries
• Used by researchers and policy analysts
• Academic libraries starting to get involved in the preservation of research data from own institution
UBC Library Data Services: http://data.library.ubc.ca/
Data suppliers - UBC
• Statistics Canada http://www.statcan.gc.ca/ Canadian Census, labour, health, income, trade
• The Roper Center for Public Opinion Research at the University of Connecticut http://www.ropercenter.uconn.edu/ Opinion polls
• Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan http://data.library.ubc.ca/gen/icpsr.html Social Sciences data
abacus
abacus - data set Part 1
abacus - data set Part 2
Data file
Challenge - Cost
Strategies to reduce cost for subscription data sets
• Collaborative purchase with several departments (UC Berkeley)
• University consortium (UBC, SFU, UVic, UNBC combined to form BC Research Libraries’ Data Services consortium – abacus http://abacus.library.ubc.ca/)
Challenge - Selection
Decisions are based on:
• Collection policy
• Knowledge of what is available
• Understanding user need
• Cost
• Individual patron need
• Whether the data would be useful to multiple users
Challenge - Supporting Access
• Make visible in Library Catalogue
• Convert file formats for use in statistical programs
• Outreach / education in use of data collection and statistical tools
• Workshops on data literacy
• Create a Data Lab
• Become embedded in course requiring use of data collections
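As one possible picture of what "convert file formats" can involve, the sketch below turns a fixed-width text file into CSV, which most statistical programs can import. The column layout and the sample records are hypothetical; in practice the column positions would come from the data set's codebook.

```python
import csv
import io

# Hypothetical layout, as a codebook might list it:
# (variable name, start, end), 0-based with end exclusive.
LAYOUT = [("id", 0, 3), ("age", 3, 5), ("province", 5, 7)]

# Hypothetical fixed-width records: no delimiters, positions carry the meaning.
FIXED_WIDTH = "00134BC\n00258ON\n"


def fixed_width_to_csv(text, layout):
    """Slice each record by column position and write the result as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in text.splitlines():
        writer.writerow([line[start:end].strip() for _, start, end in layout])
    return out.getvalue()


csv_text = fixed_width_to_csv(FIXED_WIDTH, LAYOUT)
print(csv_text)
```

Values are kept as text here, so leading zeros in identifiers survive the conversion; whether to convert them to numbers is a choice left to the person analysing the data.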
Infrastructure
• Data sets can be highly variable in size.
• This creates certain infrastructural challenges for storage, the institution’s systems, and the institution itself.
Storage
• Scalability: “the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth.” (Wikipedia)
• Location: Does your institution expect to host the data produced by researchers at that institution?
Systems Support
• Network: Can the network handle downloading of large datasets?
• Hardware: Can the systems support computation over disparate data sets?
• Software: Do you have statistical programs (like SPSS or R) available for your users?
• Flexibility: Can your system handle the wide variety of data formats, sizes, and uses?
• Example of a good system: http://www.devinfo.info/genderinfo/
UN Gender Info
Institutional Support
• Workflows: Can your data collections be integrated into the larger collections management framework?
• Faculty Partnerships: Will faculty work with the library to create data management plans?
• Mandate: Does your institution consider data collections a priority?
Preservation
• Best practices for data preservation mean that preservation concerns enter in at the earliest point in the data management cycle: creation.
Criteria for Preservation
• Obligation
• Value
• Uniqueness
• Verification
• Other Cultural Reasons
Metadata
• Plagued by a lack of standards.
• No international metadata standard for data sets.
• Needs to give enough context for the data to be understandable.
• No clear citation practice has emerged for data sets.
• Data Documentation Initiative (DDI)
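To make the points about context and citation concrete, the following Python sketch builds a citation string from a small metadata record. The field names, the sample record, and the citation layout are all assumptions made for illustration; they are not DDI element names and do not represent an established citation standard.

```python
# Hypothetical minimal metadata fields a citation would need.
REQUIRED_FIELDS = ["producer", "year", "title", "distributor", "url"]


def format_citation(record):
    """Build a citation string from a metadata record, failing if context is missing."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    citation = "{producer} ({year}). {title}".format(**record)
    if record.get("version"):
        # The version matters because data sets are often revised after release.
        citation += " (Version {version})".format(**record)
    citation += " [data set]. {distributor}. {url}".format(**record)
    return citation


# Hypothetical record; none of these values refer to a real data set.
example_record = {
    "producer": "Example Survey Office",
    "year": "2010",
    "title": "Example Household Survey",
    "version": "2",
    "distributor": "Example Data Archive",
    "url": "https://example.org/dataset/123",
}

citation = format_citation(example_record)
print(citation)
```

The check for missing fields reflects the slide's point: a record that lacks basic context cannot be cited or understood reliably, so it is better to fail early than to produce a partial citation.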
Wrap-Up
• What is a data collection? A collection of the data resulting from research.
• They have unique challenges for selection, access, infrastructure, and preservation.
• Data Curation is an up-and-coming field in librarianship.
• Librarians are uniquely poised to be involved in the recent surge of interest in data.