Assessing the Academic Library's Role in Campus-Wide Research Data Management: A First Step at the...

This article was downloaded by: [69.26.46.21]
On: 16 June 2014, At: 07:10
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Science & Technology Libraries
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/wstl20

Assessing the Academic Library's Role in Campus-Wide Research Data Management: A First Step at the University of Houston
Christie Peters a & Anita Riley Dryden a
a University of Houston Libraries, Houston, Texas
Published online: 08 Dec 2011.

To cite this article: Christie Peters & Anita Riley Dryden (2011) Assessing the Academic Library's Role in Campus-Wide Research Data Management: A First Step at the University of Houston, Science & Technology Libraries, 30:4, 387-403, DOI: 10.1080/0194262X.2011.626340

To link to this article: http://dx.doi.org/10.1080/0194262X.2011.626340

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Science & Technology Libraries, 30:387–403, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 0194-262X print/1541-1109 online
DOI: 10.1080/0194262X.2011.626340

Assessing the Academic Library’s Role in Campus-Wide Research Data Management: A First Step at the University of Houston

CHRISTIE PETERS and ANITA RILEY DRYDEN
University of Houston Libraries, Houston, Texas

In an effort to support the University of Houston’s goal of becoming a Carnegie-designated Tier One research university, several science librarians within the Department of Liaison Services have undertaken a study to assess current data management practices on campus. The goal of this study was to determine if data management needs are being met on campus and how the library might help meet those needs. We found that rather than physical storage capacity, researchers need assistance with funding agencies’ data management requirements, the grant proposal process, finding campus data-related services, publication support, and targeted research assistance attendant to data management.

KEYWORDS dmp, data, data management, data-support services, interviews, NSF, University of Houston

INTRODUCTION

Termed the fourth research methodology by Hey and Hey (2006, 516), eScience, or networked, data-driven science, has been a buzzword for a number of years. The National Science Foundation (NSF) has invested millions of dollars in the development of cyberinfrastructure to enable the scientific and

We would like to thank Robin Dasler for her assistance with the implementation of this pilot study. Previously the Science & Mathematics Librarian at the University of Houston, Robin has since moved on to a position as Information Services Librarian with LAC Group and is currently on contract to the NASA Goddard Library in Maryland.

Address correspondence to Christie Peters, Science and Engineering Librarian, University of Houston Libraries, 114 University Libraries, Houston, TX 77204, USA. E-mail: [email protected]

engineering research necessary to address national and global priorities in areas such as climate change, protection of the natural environment, and predicting and protecting against natural disasters (National Science Foundation, Blue Ribbon Advisory Panel on Cyberinfrastructure 2003, 31). In turn, many universities have developed their own scaled-down cyberinfrastructure to deal with the data management needs of researchers on campus and their collaborators at other institutions throughout the world. An example of this can be found in Purdue University’s Distributed Institutional Repository (DIR).

In Cyberinfrastructure Vision for the 21st Century, NSF acknowledges the role that university-based research librarians are positioned to play in this area. Potentially significant contributions include the development of digital data archiving, curation, and analysis by applying existing library standards for print material to scientific digital data (National Science Foundation, Cyberinfrastructure Council 2007, 25). Reports from the Association of Research Libraries (ARL) have also assessed trends in eScience and their implications for libraries (Association of Research Libraries, Joint Task Force on Library Support for E-Science 2007), the role of research and academic libraries in the stewardship of scientific and engineering digital data (Association of Research Libraries 2006), and the degree to which ARL member institutions are stepping up to the plate in terms of eScience and data-support services (Soehner, Steeves, and Ward 2010).

Librarians and data specialists at universities such as Purdue, Cornell, and Georgia Tech have been developing unique ways of addressing the issues that individual institutions are facing in regard to eScience and data management needs at the university level. For example, Purdue University Libraries created the Distributed Data Curation Center (D2C2) in 2006 as a mechanism to bring researchers together to investigate different ways to manage data sets at Purdue (Mullins 2007). In an effort to identify the needs of researchers with regard to their data, project personnel collaborated with the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign to collect data curation profiles for a number of researchers on the Purdue campus (http://www4.lib.purdue.edu/dcp/history). This process resulted in the creation of a data curation profile toolkit that Purdue Libraries has made available to the public to help other libraries interested in collecting similar data.

Cornell University Libraries (CUL) created the Data Working Group (DaWG) to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation (Steinhart et al. 2008). DaWG recommendations include directives to seek out and cultivate partnerships with other organizations, to assess local needs and develop local infrastructure and related policies, and to cultivate a workforce capable

of addressing the new challenges posed by data curation and cyberinfrastructure development. The library at Georgia Tech created a research data project librarian position to initiate an assessment of data management needs on campus and to evaluate whether or not the library needed to provide services that could meet those needs. It was a modified version of the Georgia Tech assessment that the science team used for this pilot study at the University of Houston Libraries.

In summer 2010, the University of Houston (UH) released its “Plan to Achieve Recognition as a National Research University.” Within this document, a number of strategies to help improve research funding and productivity are described. These include investing in areas of research that align institutional strengths with high levels of external funding opportunities and industry strengths in Houston and Texas (e.g., energy, health sciences), developing core research facilities that align with the university’s research priorities, enhancing the recruitment and retention of top faculty and students, facilitating the acquisition of external research funding, and building interdisciplinary research centers and institutes that enable UH to secure large federal research grants (University of Houston 2010, 4). Reflecting the university’s focus on research, the UH Libraries 2010–2013 strategic directions document states that “library staff deployment will align with the research priorities of the university, adding technical skills and new levels of disciplinary expertise as needed to offer the best service possible to faculty and other researchers” (http://info.lib.uh.edu/p/research-support). As such, a push was made by library administration to find ways to support the university’s vision. This provided the perfect opportunity for a foray into the world of data management initiated by a group of science librarians.

In summer 2010, all members of the library’s Liaison Services Department were asked to develop projects that would support the University of Houston’s push for Tier One status. As no formal assessment had been done on the data management needs of researchers on campus, the project team elected to develop a pilot study aimed at assessing the data management needs of principal investigators working on NSF and National Institutes of Health (NIH) grant-funded projects. In contrast to the case studies cited above, this project was initiated by liaison librarians. While discussions surrounding issues of data-support services are still in their infancy at the University of Houston Libraries, our study has garnered a great deal of internal support.

METHODS

A project team consisting of two science liaisons and the library’s digital and web projects fellow worked with the Office of Contracts and Grants, located within the Division of Research, to obtain a list of all of the NSF and NIH

grant-funded projects for fiscal year 2010. Projects were selected based on the department affiliation of the Principal Investigator (PI), the dollar amount of the grant, and whether the PI was the sole recipient of the grant or part of an interdisciplinary project team. Our goals were to interview PIs of significant grants, to assess individuals in as many science and engineering departments as possible, and to obtain information on data management practices from both individual and group-based projects. The results of this pilot study will inform the nature and scope of future assessments.

Ultimately, fourteen projects were targeted for this study. All of the projects were funded by multiyear grants, and all are still in progress. Nine of the projects were funded by NSF and five by NIH. Together they totaled over five million dollars during fiscal year 2010, represent twelve different departments, and are made up of both individuals and interdisciplinary groups. The diversity within this sample was designed to allow the team to examine data management needs across a wide cross-section of campus researchers. It was the opinion of the project team that individuals with NSF and NIH grant-funded projects would be more receptive to this pilot study than a random sample of researchers due to the mandate imposed by NSF on January 18, 2011 that requires all proposals to include a data management plan, and the similar NIH policy that requires a data-sharing statement for proposals over $500,000. An e-mail solicitation was sent to the PI of each project, and this was followed by a phone call during which the team attempted to schedule an appointment for the interview. Ten of the fourteen PIs agreed to be interviewed. In light of trends that were noticed early on in the study, interviews were also scheduled with one co-PI, one post-doctorate, and one graduate student, each of whom was associated with one of the projects included in the study. A list of the interview subjects identified numerically along with all associated information can be found in Figure 1.

The interview instrument, located in the Appendix, is based on that used by the Georgia Tech Libraries in a similar study, which is in turn based

FIGURE 1 Interviews Conducted.

on the Data Asset Framework, version 1.8, developed jointly by the Digital Curation Centre (DCC) and Joint Information Systems Committee (JISC). Slight changes were made to the instrument to customize the interview for the UH campus. Interview subjects were given a copy of the questions to facilitate understanding of some items that lent themselves to a more visual approach. Members of the project team then guided each subject through the thirty questions of the instrument. In an effort to avoid bias, care was taken to inform subjects that members of the project team would not be offended by candid answers, particularly with regard to opinions about the library’s role in data services. The subjects’ responses, as well as general notes from the team members, were compiled, and items of a more quantitative nature were placed into a spreadsheet to aid in final evaluation of the data.

RESULTS

This was a pilot study conducted with a small number of subjects. As such, it may be difficult or inappropriate to generalize our findings to all researchers on campus. Even so, some definite trends emerged within this group, which we plan on using to inform our next steps. This section is organized according to the interview instrument to allow for a clearer understanding of the results of each particular section.

Project Information

As previously mentioned, we interviewed ten PIs, as well as one co-PI, one post-doctorate, and one graduate student who were associated with one of the projects selected for this study. The eight different departmental affiliations were biology and biochemistry, chemistry, physics, pharmacy, and mathematics, as well as civil, electrical, and mechanical engineering; and all ten projects were still in progress at the time of the interviews. Of the six PIs interviewed who were the sole recipients of funding for the project in question on the UH campus, five had external partners on the project, indicating that the distinction between individual and group projects was largely insignificant. Based on this finding, we will not use this parameter to help in our selection process in subsequent studies.

Data Lifecycle Workflow

In terms of planning prior to data collection, practices varied greatly. The majority of respondents indicated that much of their planning occurs during the grant proposal writing process. Three of the ten PIs discussed their use of LabVIEW, software that allows scientists to develop customizable

measurement, test, and control systems for large data sets, and that supports a wide variety of file types. In this context, designing the research process was synonymous with planning for data collection. All of the researchers provided milestones for their projects, many of which gave the project team a good idea of the different types of data collection that the research entailed.

Data Characteristics

As can be seen in Figure 2, each of the projects selected for this assessment includes multiple data types. With the exception of XML and GIS, all of the data types we suggested were used by at least some of the participants, with images, scanned documents, spreadsheets, and text being used almost across the board. One participant indicated that all of the types of data might apply at particular points in the process and so did not mark any in particular. According to the data, a majority of the projects included in this study are not expected to produce in excess of 500 GB of data, indicating that for many scientists on campus, organization will be of greater concern than storage space. In keeping with this finding, few faculty members currently look to the library for assistance with the technical side of data management. Most of the interview subjects indicated that they are comfortable utilizing departmental and campus IT resources to store and back up their working data, despite the fact that there is no clearly organized information available on campus that documents what particular data-support services are offered and by whom. An expanded study will help us determine if this trend holds for the entire campus or if it simply reflects a trend common among this one small sample of researchers.

We made a number of interesting observations when inquiring about project data characteristics. For instance, in one case, the responses of the PI and co-PI who were working together on the same project did not correspond exactly. The PI generated CAD data, whereas his co-PI did not.

[Bar chart "Data Types Used". Vertical axis: Frequency (0 to 12). Categories: Audio, CAD, Data - Statistical, Data - XML, Database, GIS, Image, Scanned Documents, Spreadsheets, Don't Know, Text, Video, Web. Bar values are not recoverable from this copy.]

FIGURE 2 Data Types Used.

This is not terribly surprising given that the two are working on different aspects of the project, but it is important to note in regard to future assessments. It should not be assumed that the PI of any given project is intimately familiar with the work being done by his or her collaborators. What is more surprising is that the data types generated by the PI/post-doctoral team also did not correspond exactly. The post-doctorate indicated that .dat data was generated by the project, whereas the PI did not; and the PI indicated that spreadsheets were generated, whereas the post-doctorate failed to include this type of data in her response. It is impossible to tell whether or not the PI/graduate student team agree exactly on the types of data generated in their project, as the PI indicated very generally that it was possible that all types of data could ultimately be generated by their project. These disparities indicate that the PI and his or her support staff are not always on the same page about what type of data is generated in any given study. Based on this observation, the project team agrees that efforts should be made to interview multiple members of a project team when performing future data assessments to gather a thorough overview of the data they are generating.
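For readers who plan to interview several members of one project, the comparison described above could be tabulated along the following lines. This is only a sketch under stated assumptions: the function name `compare_reports` is hypothetical, and the shared data types (image, text) are invented for illustration. Only the disagreement over .dat files and spreadsheets comes from the interviews.

```python
def compare_reports(reports):
    """Map each data type that not every interviewee reported to those who did."""
    all_types = set().union(*reports.values())
    agreed = set.intersection(*map(set, reports.values()))
    return {
        dtype: sorted(role for role, types in reports.items() if dtype in types)
        for dtype in sorted(all_types - agreed)
    }


# Hypothetical responses: "image" and "text" are placeholders; only the
# .dat / spreadsheet mismatch reflects what the interviews found.
reports = {
    "PI": {"image", "text", "spreadsheet"},
    "post-doctorate": {"image", "text", ".dat"},
}

disputed = compare_reports(reports)
print(disputed)
```

Any data type that appears in the result was reported by some, but not all, members of the team and may merit a follow-up question.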

Data Management

When asked who was responsible for managing the data associated with a project, all ten PIs indicated that they have that responsibility, with one claiming that data management is his responsibility and his alone. Nine PIs claimed shared responsibility, with six specifying graduate students, two post-doctorates, and several isolated mentions of technicians, a project manager, and a lab team. In regard to data storage, numerous methods were used by everyone interviewed, the most common being storage on a PC hard drive (Figure 3). Only one PI did not specify any particular location, indicating that students were responsible for storing all of the data. This is

[Bar chart "Data Storage Methods". Vertical axis: Frequency (0 to 12). Categories: CD/DVD, USB Drives, Internet-based, Don't Know, Department Server, PC Hard Drive, External Hard Drive, Instrument Hard Drive. Bar values are not recoverable from this copy.]

FIGURE 3 Data Storage Methods.

interesting given that the same PI claimed sole responsibility for managing data associated with the project in the previous question. This is indicative of a general theme that has emerged from this study, namely that there is much confusion over what constitutes data management.

The question concerning data backups was met with particularly high levels of uncertainty and discomfort, so it came as no surprise that responses varied widely among respondents. Only one individual reported that his data is backed up multiple ways and at various times throughout the month, even going so far as to claim that it is also stored in multiple places. Seven indicated that their data is backed up weekly, with two specifying that the backups happen by means of campus IT. One person specified that his data is backed up by the College of Natural Sciences and Mathematics (NSM) IT, but he was uncertain as to how frequently that occurred. The project team was able to confirm the weekly backup schedule used by campus IT, and that the backup systems offered by the NSM IT unit allow users to set their own schedule for backups. The latter fact is interesting given that the one individual who stores his data on the NSM server was uncertain of his backup schedule. Two respondents specified that their data is backed up daily, two monthly, and one admitted that he has no idea whether his data is backed up at all. He confessed that his graduate students and post-doctorates have complete control of the data, even with regard to backups.

Although most of the interviewed PIs revealed some sort of plan for obtaining the raw data generated by graduate students during their time at the university upon their departure, this was not always the case. Surprisingly little thought was given to how the transience of students might impact the consistency of data management practices within a lab. One researcher was quite honest about the fact that students up to that point have not been held accountable for ensuring that their data stays with the research group on their departure.

The majority of respondents indicated that they plan on storing their data indefinitely. Comments to this effect included the need to have data available should a paper be challenged, as well as the opinion that in the absence of storage space concerns, there is simply no reason to get rid of anything. With this being said, further investigation led to the realization that researchers were largely referring to analyzed, often visualized, publication-level data and not data directly generated through the course of the experiment. In most cases, there was no centralized storage of experimental data. Two individuals did state that they only plan on keeping the data associated with their project for one to five years, one specifying that certain key results will be kept indefinitely. Three indicated that they will keep their data for more than ten years.

When asked if they had a data management plan (DMP) for the project in question, seven respondents indicated that they do not; six of them stating that this is because a DMP was not necessary at the time of their

proposal submissions. Other reasons included general lack of information about DMPs and the extra demand on their time. The three individuals who claimed to have a data management plan in place stated that it was just good research practice to do so, that it helped them to stay organized, and that data are simply too valuable not to manage.

Data Organization

As with data backup, there was a great deal of variation when it came to preferred methods of data organization. A common theme among researchers was to suggest that their data organization is largely predetermined by experimental design and research practice, and that additional planning is unnecessary and inefficient. Five respondents indicated that they have no real file or folder naming conventions in place, with students often being left to their own devices in terms of data organization. Only one of the PIs whom we interviewed provides very specific instructions for his students and lab personnel on how to manage the data associated with his projects. Another two claimed that they use industry standards to organize their data without specifying exactly what those standards are. Five individuals claimed to use specific file- and folder-naming protocols, with one additional person claiming to simply use the file names generated by the equipment. Three of the ten PIs discussed their use of LabVIEW. Although none of the interviewed individuals indicated using any formal metadata standards, two mentioned that their data is automatically time-stamped by their equipment, while one additional respondent mentioned that some of his equipment automatically embeds metadata as the data is generated. Even though these responses varied greatly, all thirteen indicated the belief that their data organization methods were sufficient for others in their field, even those individuals who acknowledged no clear method of data organization.
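To make concrete what a specific file-naming protocol might look like, the sketch below builds and parses names of the form project_instrument_runNNN_date.ext. It is an illustration only: none of the interviewees described their conventions in this detail, and the pattern, the function names `build_name` and `parse_name`, and the sample values are all assumptions introduced here.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<instrument>_run<NNN>_<YYYY-MM-DD>.<ext>
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9-]+)_(?P<instrument>[a-z0-9-]+)"
    r"_run(?P<run>\d{3,})_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>\w+)$"
)


def slug(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(project, instrument, run, collected, ext):
    """Build a file name that follows the hypothetical convention above."""
    return (
        f"{slug(project)}_{slug(instrument)}_run{run:03d}"
        f"_{collected.isoformat()}.{ext.lstrip('.')}"
    )


def parse_name(name):
    """Return the embedded fields as a dict, or None if the name does not conform."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


# Sample values are invented for illustration.
name = build_name("Thin Films", "AFM", 3, date(2011, 1, 18), ".dat")
print(name)
print(parse_name(name))
print(parse_name("IMG_0001.tif"))  # an equipment-generated name does not conform
```

A convention of this kind carries the collection date and run number in the name itself, so the files stay interpretable after the student who created them has left the lab.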

Data Use

Toward the end of the interview, we asked respondents to indicate who has current access to their data and with whom they would share their data if they could. Every person interviewed indicated that the researchers, students, and staff working on a project have access to project data. Most of the respondents who indicated that they have made data available to people outside of the research group stated that they do so only upon request. Only published data, not raw or experimental data, was ever shared with the public or project sponsors. The difference in responses of researchers working on individual versus group projects was minimal. The most commonly cited reason for not sharing data was that it is confidential, proprietary, or classified; but intellectual property concerns, possible misinterpretation of the data, and the time or effort required to share it were all cited as reasons as

well. Only one researcher claimed that data would be shared with anyone who asked. One individual who does research in the field of evolutionary biology, an area that is controversial even without the possibility of the misinterpretation of data, claimed that he would share his data with anyone in the scientific community, but probably not the general public. Another researcher pointed out that there was no reason to share raw, unprocessed data with anyone outside of the research group, because it would not make any sense to them. Not a single respondent seemed to consider the possibility that sharing raw data might allow for the independent validation of results or different studies using the same data set.

When asked how they currently share their data, the vast majority of researchers indicated that e-mail was their preferred method. This was followed by external storage devices and collaborative web space. As stated before, the idea of sharing raw data or making raw data openly available to everyone was simply not a consideration by any of the respondents. These results highlight the opportunity that we as librarians have to demonstrate the benefit of sharing data to the researchers with whom we discuss data management issues.

CONCLUSIONS

None of the researchers interviewed for this pilot study are working on the type of projects that Borgman and others (2007, 17) describe as “Big Science,” a term that indicates large-scale, networked projects. The projects taking place on campus that fit this description do not appear to be funded by NSF or NIH, but are supported by units such as the Texas Learning and Computation Center (TLC2) and the Texas Center for Superconductivity (TCSuh), both of which are UH research centers. It turns out that UH research centers currently support a number of large-scale projects. A consequence of this is that the need for infrastructure support did not present itself over the course of this study. A more comprehensive study will help us determine if such a need does in fact exist on the UH campus. In addition to looking at research on campus irrespective of funding agency, the expanded study will include interviews with graduate students, post-doctorates, and lab technicians, since most of the day-to-day management of data, including both collection and analysis, falls to these individuals.

A number of next steps have been identified by the project team that will precede this expanded data management assessment. First, a proposal for the creation of a library Data Working Group is currently being prepared. A number of units within the library have begun to work independently on projects pertaining to data management, and there has been some concern that this will result in mixed messages being relayed to researchers on campus. The Data Working Group will ensure communication between all

groups within the library that have a vested interest in data-support services and consistent messaging when addressing data management concerns on campus. Second, the library will host a gathering of data service providers on campus. The University of Houston is a highly decentralized institution, with individual colleges and research centers operating largely independently of one another. As mentioned in E-Science and Data Support Services: A Study of ARL Member Institutions, highly decentralized campuses tend to move very slowly toward developing needed support systems across their institutions (Soehner, Steeves, and Ward 2010, 13). This gathering will include representatives from the UH Libraries, the Division of Research, campus IT, departmental ITs, and campus research centers. Our pilot study clearly demonstrates that multiple units are providing varying degrees of data management support to faculty on campus, but no one really knows who is offering what service and to whom. In an effort to eliminate duplication of effort and to develop services at the point of need, this information must be collected, organized, and shared among the various service providers. This is clearly an instance when the library can play a facilitative role on campus, if only by initiating the discussion.

In addition to proposing a library Data Working Group, organizing a gathering of data service providers on campus, and expanding the study to include science and engineering researchers regardless of funding agency, the project team plans to broaden this study to include researchers in non-scientific disciplines on campus. Because we feel that it is important for humanities and social sciences liaisons to interview researchers in their respective fields, a series of data management 101 instruction sessions for all liaison librarians regardless of discipline will be offered. This will provide liaison librarians in all subject areas with the knowledge they need to interact intelligently with faculty in their departments on data management topics. The data management assessment instrument will be revised to account for disciplinary differences.

Non-infrastructure-related data service needs that have been identified in this study include help with the grant proposal process in general, especially assistance with funding agency data management requirements, help identifying campus data-related services, publication support, and targeted research assistance attendant to data management. Several interview subjects also expressed a desire for data visualization and manuscript preparation support, as well as a system to help them share their publication data. Fortunately, the UH Libraries are currently well positioned to provide a number of these services. For example, the UH Libraries Digital Services Department has developed a web-based form to assist researchers in the completion of a DMP as required by NSF and is coordinating with the Division of Research and specialized research centers on campus to provide educational opportunities for researchers. In addition to helping researchers generate a DMP, this form provides an opportunity to indicate an interest in storing their publication data in our institutional repository. To date, our repository has been used primarily to host electronic theses and dissertations, so many faculty members are either unaware of or unfamiliar with its features. We hope that the use of the web-based form to generate DMPs may lead to increased usage of this service for a variety of purposes including data storage. A small number of UH researchers have used the form to develop their DMP, and feedback is currently being gathered.

A very interesting consequence of this study has been the formation of unexpected connections between science librarians and scientists in departments with which they do not normally liaise and with faculty who have previously had very little interaction with the library. Many of the faculty who fall within this latter category have been surprised to find that the library can offer research support services other than traditional library instruction. This pilot study has turned into a phenomenal outreach opportunity, not only in terms of assessing data management needs, but in assessing general research needs as well.

Taking on a new role as expansive and multifaceted as data management is a daunting task for librarians. The nature of high-level scientific research and the technology involved in providing data infrastructure seemed to be insurmountable obstacles for our library before we spoke with campus researchers for this project. Fortunately, our initial assessment has provided us with a clearer understanding of the campus research landscape and has shown that University of Houston researchers require support attendant to data management that the library is currently capable of providing without an excessive initial investment of time and resources. We highly recommend that librarians at institutions that do not currently offer any data support services take the first step into the world of data management. This one modest pilot study has led to numerous unforeseen opportunities at the University of Houston Libraries.

REFERENCES

Association of Research Libraries. 2006. ARL workshop on new collaborative relationships: The role of academic libraries in the digital data universe. To stand the test of time: Long-term stewardship of digital data sets in science and engineering, September 26–27. http://www.arl.org/bm~doc/digdatarpt.pdf (accessed June 2, 2011).

Association of Research Libraries, Joint Task Force on Library Support for E-Science. 2007. Agenda for developing e-science in research libraries. http://www.arl.org/bm~doc/ARL_EScience_final.pdf (accessed June 2, 2011).

Borgman, C. L., J. C. Wallis, and N. Enyedy. 2007. Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries 7 (1/2): 17–30.

Hey, T., and J. Hey. 2006. E-science and its implications for the library community. Library Hi Tech 24 (4): 515–528.

Mullins, J. L. 2007. Enabling international access to scientific data sets: Creation of the Distributed Data Curation Center (D2C2). Libraries Research Publications 85. http://docs.lib.purdue.edu/lib_research/85 (accessed May 31, 2011).

National Science Foundation, Blue Ribbon Advisory Panel on Cyberinfrastructure. 2003. Revolutionizing science and engineering through cyberinfrastructure, May 1–2, 2006. http://www.nsf.gov/od/oci/reports/atkins.pdf (accessed June 6, 2011).

National Science Foundation, Cyberinfrastructure Council. 2007. Cyberinfrastructure vision for 21st century discovery. http://www.nsf.gov/od/oci/CI_Vision_March07.pdf (accessed June 6, 2011).

Soehner, C., C. Steeves, and J. Ward. 2010. E-science and data support services: A study of ARL member institutions. http://www.arl.org/bm~doc/escience_report2010.pdf (accessed May 25, 2011).

Steinhart, G., et al. 2008. Digital research data curation: Overview of issues, current activities, and opportunities for the Cornell University Library. E-mail: eCommons@Cornell. http://hdl.handle.net/1813/10903 (accessed May 25, 2011).

University of Houston. 2010. Plan to achieve recognition as a National Research University.

APPENDIX: DATA ASSESSMENT INSTRUMENT

Please limit your responses to a single NSF or NIH grant-related project. Please keep in mind that we are interested in the raw data generated by this project, and not in published output.

Project Information

1. Your name:

2. Indicate your role in this project:
• Principal investigator or co-PI
• Research/academic faculty
• Research staff
• Postdoctoral
• Graduate assistant
• IT specialist
• Other (please specify)

3. Project name (grant title):

4. Briefly describe the project you are using to answer this survey:

5. Briefly describe the data you are using to answer this survey:

6. Indicate the status of this project:
• In planning stage
• In progress
• Completed
• Other (please specify)

7. Identify the departments, schools, or research centers, as well as external partners affiliated with this project:

Data Lifecycle/Workflow

8. Describe any planning you did prior to collecting data (e.g., determining variables to code, controlled language, etc.)

9. Outline the major stages/milestones of your project.

Data Characteristics

10. Choose all of the following formats that best describe your research data (examples of specific file extensions are included):
• Audio (.aif, .iff, .mp3, .wav)
• Computer aided design / CAD (.dwg, .dxf, .pln)
• Data (.csv, .dat)
• Data – Statistical / SAS, SPSS (.sav, .sdq, .spv)
• Data – XML (.xml)
• Database (.db, .mdb, .pdb, .sql)
• Geographic Information Systems / GIS (.gpx, .kml)
• Image (.bmp, .gif, .jpg, .png, .ps, .psd, .svg, .tif)
• Scanned documents (.pdf)
• Spreadsheets (.wks, .xls)
• Text (.doc, .docx, .log, .rtf, .txt)
• Video (.avi, .mov, .mp4)
• Web (.html, .xhtml)
• Don’t know
• Other (please specify)

11. Indicate the approximate amount of data the project is expected to generate:
• 1–500 gigabytes (GB)
• 500–1000 GB
• 1–500 terabytes (TB)
• 500–1000 TB
• 1–500 petabytes (PB)
• >500 PB
• Don’t know
• Other (please specify)

12. If you have multiple data formats, please identify the stage(s) of your project associated with each format.

Data Management

13. Identify who is responsible for managing the data associated with this project (check all that apply):
• PI or co-PI
• IT staff within your school or research center
• Other designated person on project
• Collaborative responsibility
• External project partners
• Third-party data center
• No one
• Don’t know
• Other (please specify)

14. Indicate where the data generated by this project are currently stored (choose all that apply):
• Hard drive of the instrument which generates the data
• PC hard drive
• External hard drive
• Departmental server
• CD/DVD
• USB flash drives
• Internet-based storage (e.g., cloud or grid storage)
• Don’t know
• Other (please specify)

15. Indicate how often backups are made for the data associated with this project (check all that apply):
• Hourly
• Daily
• Weekly
• Monthly
• Annually
• Never
• Don’t know
• Other (please specify)

16. Identify how long you plan on keeping the data associated with this project:
• <1 year
• 1–5 years
• 5–10 years
• >10 years
• Indefinitely
• Don’t know

17. Please explain why you plan on keeping the data for this amount of time:

18. Designate if you have a data management plan or policy:
• Yes (proceed to question 19)
• No (proceed to question 20)
• Don’t know

19. If you do have a data management plan or policy, indicate the reasons why (check all that apply):
• Required by IRB
• Required by funding agency
• Required by school or research center
• Other (please specify)

20. If you do not have a data management plan or policy, indicate the reasons why (check all that apply):
• Lack of information about data management plans
• Not necessary
• Other (please specify)

Data Organization

21. Please describe the methods used to organize the data, including specific metadata standards, naming conventions, etc.

22. Are these methods of data organization sufficient for others in your field?

Data Use

23. Identify all of the following who currently have access to all or part of the raw data associated with this project (check all that apply):
• Researchers, students, and staff working on the project
• Other members of the affiliated school or research center
• Researchers at other institutions
• Project sponsors
• General public
• Don’t know
• Other (please specify)

24. Identify all of the following with whom you would like to share raw project data if you had the ability to do so (check all that apply):
• Researchers, students, and staff working on the project
• Other members of the affiliated school or research center
• Researchers at other institutions
• Project sponsors
• General public
• Don’t know
• Other (please specify)

25. If you do not want to share the raw data associated with this project, please tell us why (check all that apply):
• Confidential, proprietary or classified information
• Intellectual property concerns
• Possible misinterpretation of data
• Time or effort required to make data available
• Lack of appropriate tools for sharing or publishing data
• Other (please specify)

26. Choose all of the following methods you use to share all or part of the raw data associated with this project (check all that apply):
• Collaborative web space (e.g., wiki, blog, Google Docs)
• Data portal or database driven web site
• E-mail
• External storage device (e.g., USB drive, CD/DVD)
• Hard copy or print
• Don’t share data
• Don’t know
• Other (please specify)

Support for Research Data Management

27. What do you see as the libraries’ role (if any) in data services?

28. Would you be willing to participate in a follow-up interview regarding research data management?

29. Please indicate on this table which services might be useful in regard to the management of your research data and from which campus entities you would expect to receive such services (see Figure A1).

30. Please provide any additional comments regarding research data management, potential library data curation services, or this survey.

FIGURE A1 Services that Might Be Useful in Managing Research Data.
