Data and Donuts: How to write a data management plan

How to write a data management plan. C. Tobin Magle, PhD. Sept. 29, 2016, 10:00-11:00 a.m., Morgan Library Computer Classroom 173. *Inspired by content from CU Boulder Research Computing


Page 1: Data and Donuts: How to write a data management plan

How to write a data management plan

C. Tobin Magle, PhD

Sept. 29, 2016

10:00-11:00 a.m.

Morgan Library Computer Classroom 173

*Inspired by content from CU Boulder Research Computing

Page 2: Data and Donuts: How to write a data management plan

What is research data?

• “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings”

- White House Office of Management and Budget

• Reality: anything that is a (digital) product of your research

Page 3: Data and Donuts: How to write a data management plan

What is a data management plan?

A description of how you plan to describe, preserve and share your research data.

Often required by funding agencies

Page 4: Data and Donuts: How to write a data management plan

DMPTool

• Review requirements from different agencies

• https://dmptool.org/guidance

• Create new DMPs based on funding agency templates

• Search public DMPs

Page 5: Data and Donuts: How to write a data management plan

Successful DMPs include

• A data inventory, including type(s) and size

• A strategy for describing the data

• A plan for preserving the data long term

• A method for access to the data

Always make sure to follow funder requirements

Page 6: Data and Donuts: How to write a data management plan

Data inventory

• What type of data are you going to collect?

• What file type will be produced?

• What size will these files be? How many files?

• What other research outputs will be produced?
  • Code/software?
  • Templates/protocols?

Page 7: Data and Donuts: How to write a data management plan

Data inventory

• What type of data are you going to collect? miRNA sequences

• What file type will be produced? FASTQ files

• What size will these files be? How many files? 1 GB per file × 64 strains × 3 replicates = 192 GB (~200 GB)

• What other research outputs will be produced? R scripts for analysis and visualization; data use tutorials
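The size estimate on this slide is simple multiplication. A minimal Python sketch of it, assuming one FASTQ file per strain per replicate and a uniform file size; the function name and the `copies` parameter (for counting backup copies) are hypothetical additions, not part of the talk:

```python
# Hypothetical helper for sizing a data inventory; assumes all files are
# roughly the same size, as in the 1 GB-per-file example.
def estimate_storage_gb(gb_per_file: float, strains: int, replicates: int,
                        copies: int = 1) -> float:
    """Total storage in GB = size per file x strains x replicates x copies."""
    n_files = strains * replicates  # one file per strain per replicate (assumed)
    return gb_per_file * n_files * copies


if __name__ == "__main__":
    print(estimate_storage_gb(1, 64, 3))            # 192
    print(estimate_storage_gb(1, 64, 3, copies=3))  # 576
```

Multiplying by `copies` is a quick way to see how much space a primary copy plus backups will need: 192 GB of raw reads becomes 576 GB with three copies.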

Page 8: Data and Donuts: How to write a data management plan

Data formats

• Avoid proprietary formats

• Know what software can read your data

Proprietary format               Alternative format
Excel (.xls, .xlsx)              Comma-separated values (.csv)
Word (.doc, .docx)               Plain text (.txt)
PowerPoint (.ppt, .pptx)         PDF/A (.pdf)
Photoshop (.psd)                 TIFF (.tif, .tiff)
QuickTime (.mov)                 MPEG-4 (.mp4)
MPEG-4 Protected Audio (.m4p)    MP3 (.mp3)
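To check a project folder against this table, a short script can flag files saved in proprietary formats. This is a minimal Python sketch, assuming file type can be judged by extension alone; the mapping is copied from the table, while the function name and example filenames are hypothetical:

```python
from pathlib import Path

# Proprietary extension -> open alternative, taken from the table above.
ALTERNATIVES = {
    ".xls": ".csv", ".xlsx": ".csv",
    ".doc": ".txt", ".docx": ".txt",
    ".ppt": ".pdf (PDF/A)", ".pptx": ".pdf (PDF/A)",
    ".psd": ".tif",
    ".mov": ".mp4",
    ".m4p": ".mp3",
}


def suggest_formats(filenames):
    """Return {filename: suggested alternative} for files in proprietary formats."""
    suggestions = {}
    for name in filenames:
        ext = Path(name).suffix.lower()  # extension check only; contents not inspected
        if ext in ALTERNATIVES:
            suggestions[name] = ALTERNATIVES[ext]
    return suggestions


if __name__ == "__main__":
    files = ["counts.xlsx", "reads_01.fastq", "protocol.docx", "figure1.tif"]
    for name, alt in suggest_formats(files).items():
        print(f"{name}: consider saving a copy as {alt}")
```

Files already in open formats (here the FASTQ and TIFF files) are simply skipped.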

Page 9: Data and Donuts: How to write a data management plan

Exercise: Data inventory

What kind of data are you going to collect?

What file type will be produced?

What size will these files be? How many files?

What other research outputs will be produced?

Page 10: Data and Donuts: How to write a data management plan

A strategy for describing the data

• Metadata: relevant information for re-creation and reuse

  • Contact info
  • How the data were collected
  • Details about collection
  • Date and location of collection
  • Units

• Can be as simple as a text file
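A text-file README can be generated from the fields listed above. A minimal Python sketch, assuming a simple "Field: value" layout (one line per field); the function names and all values are hypothetical placeholders:

```python
from pathlib import Path


def format_readme(fields: dict) -> str:
    """Render metadata as plain 'Field: value' lines, one per field."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def write_readme(path, fields: dict) -> None:
    """Save the rendered metadata as a UTF-8 text file."""
    Path(path).write_text(format_readme(fields), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder values only; replace with your project's details.
    metadata = {
        "Contact": "Name (email)",
        "How data were collected": "Instrument, protocol, kit",
        "Collection details": "Settings, sample handling",
        "Date of collection": "YYYY-MM-DD",
        "Location of collection": "Site name or coordinates",
        "Units": "Units for each measured variable",
    }
    print(format_readme(metadata))
```

Keeping the fields in a dictionary makes it easy to reuse the same metadata later in a more structured format.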

Page 11: Data and Donuts: How to write a data management plan

Genomics example (README)

This project contains next-generation miRNA sequencing data from 64 mouse strains.

Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit. The libraries were run on an Illumina MiSeq sequencer. The resulting FASTQ files were analyzed in R using Bioconductor.

The data and descriptive metadata will be made available on NCBI in the BioProject (PRJXXXX). The scripts used to analyze the data are available on GitHub (URL). Tutorials for data use will be made available in the Digital Collections of Colorado (handle).

Contact Tobin Magle ([email protected]) for more information. http://orcid.org/0000-0003-3185-7034

Page 12: Data and Donuts: How to write a data management plan

Metadata standards

• Dublin Core: http://dublincore.org/documents/dcmi-terms/
  • Can be applied to anything

• Many discipline-specific metadata standards
  • EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html
  • MIAME: http://fged.org/projects/miame/

• Search for other standards:
  • http://www.dcc.ac.uk/resources/metadata-standards
  • https://biosharing.org/standards/
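Dublin Core elements (title, creator, date, format, identifier, ...) can also be written as XML. A minimal Python sketch using the standard library, assuming the Dublin Core element namespace `http://purl.org/dc/elements/1.1/` and an arbitrary wrapper element named `metadata` (the wrapper name, function name, and field values are hypothetical; a repository may require its own schema):

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize as dc:title, dc:creator, ...


def dublin_core_record(fields):
    """Build an XML string from (element, value) pairs using Dublin Core names.

    A list of pairs (rather than a dict) allows repeated elements,
    e.g. several creators.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record([
        ("title", "miRNA sequencing data from 64 mouse strains"),
        ("creator", "Lastname, Firstname"),  # placeholder
        ("date", "YYYY-MM-DD"),              # placeholder
        ("format", "FASTQ"),
        ("identifier", "PRJXXXX"),           # placeholder, as in the README example
    ]))
```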

Page 13: Data and Donuts: How to write a data management plan

Genomics example (NCBI template)

Page 14: Data and Donuts: How to write a data management plan

Exercise: Describe your data

What do people need to know to reuse your data?

Are there any discipline-specific metadata standards?

What format will you describe your data in (text, XML, tabular)?

What fields will you include (author, date, format, identifier)?

Page 15: Data and Donuts: How to write a data management plan

A plan for preserving the data long term

• What will you do to ensure data are properly stored and preserved?

• Include metadata and other products needed for reuse

• Might change over the course of the project

Page 16: Data and Donuts: How to write a data management plan

Preservation questions

• What will you store?

• Who will be in charge?

• How long will you store it?

• Where will you store it?
  • Multiple copies

Page 17: Data and Donuts: How to write a data management plan

Recommendations for backing up data

• Store in geographically distinct locations

• Automation: Will you remember to do it manually?

• Security: Are you working with protected health information (PHI)?
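One way to avoid relying on memory is a backup script run on a schedule (for example with cron on Linux/macOS or Task Scheduler on Windows). Below is a minimal Python sketch of such a script: it copies every file to a second location and verifies each copy with a SHA-256 checksum. The function names and demo file are hypothetical, the destination is assumed to be a mounted drive in a geographically separate location, and the sketch does not address encryption or other PHI requirements:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup(src_dir, dest_dir):
    """Copy every file under src_dir to dest_dir and verify each copy.

    Returns the relative paths whose checksums did not match (empty on success).
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    failed = []
    for src in src_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(src_dir)
        dest = dest_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        if sha256(src) != sha256(dest):
            failed.append(str(rel))
    return failed


if __name__ == "__main__":
    # Demo with temporary folders; in practice dest would be the off-site copy.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
        Path(src, "reads_01.fastq").write_text("@read1\nACGT\n+\nIIII\n")
        print("Failed:", backup(src, dest))  # Failed: []
```

Checking the copy against the original catches silent copy errors, which a plain copy command would not report.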

Page 18: Data and Donuts: How to write a data management plan

Exercise: Preservation plan

What will you store?

Who will be responsible for the data (person or position)?

How long will you store it?

Where will you store it?

How will you back it up?

Page 19: Data and Donuts: How to write a data management plan

A method to access the data

• Important to funding agencies
  • Reproduce existing research
  • Promote further research

• Must be easily available:
  • No “by request only”
  • Embargoes are “ok”

• Data security: consider privacy and IP issues before sharing

Page 20: Data and Donuts: How to write a data management plan

Data access and sharing best practices

• Non-proprietary formats

• Include metadata

• Proper storage

• Stable identifier

• Licensing: conditions for reuse

Page 21: Data and Donuts: How to write a data management plan

Trusted repositories: store and share

• Discipline-specific repositories
  • Search: http://service.re3data.org/browse/by-subject/

• Generic:
  • Figshare: https://figshare.com/
  • Dryad: http://datadryad.org/

• CSU Digital Repository: http://lib.colostate.edu/digital-collections/

Page 23: Data and Donuts: How to write a data management plan

Stable identifiers

• URLs break

• Stable identifiers are registered permanently in a database that points to the data's current location, so links keep working when URLs change

• Some provide linking capabilities
  • DOI: https://doi.org/10.1109/5.771073
  • Handle: http://hdl.handle.net/10217/177356
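The linking works through resolver services: a bare DOI or Handle becomes a link by prefixing the resolver's address. A minimal Python sketch, assuming DOIs are recognized by their "10." prefix and anything else shaped like "prefix/suffix" is treated as a Handle; the function name is hypothetical, and using https for hdl.handle.net (the slide shows http) is an assumption:

```python
def resolver_url(identifier: str) -> str:
    """Turn a bare DOI or Handle into a resolvable link.

    Assumes DOIs start with '10.' and treats any other 'prefix/suffix'
    string as a Handle; real identifiers may need stricter validation.
    """
    ident = identifier.strip()
    for prefix in ("doi:", "hdl:"):  # strip optional scheme-style prefixes
        if ident.lower().startswith(prefix):
            ident = ident[len(prefix):]
    if ident.startswith("10."):
        return f"https://doi.org/{ident}"
    if "/" in ident:
        return f"https://hdl.handle.net/{ident}"
    raise ValueError(f"Not a recognizable DOI or Handle: {identifier!r}")


if __name__ == "__main__":
    print(resolver_url("10.1109/5.771073"))  # https://doi.org/10.1109/5.771073
    print(resolver_url("hdl:10217/177356"))  # https://hdl.handle.net/10217/177356
```

Note that the Handle prefix 10217 starts with "10" but not "10.", so it is not mistaken for a DOI.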

Page 24: Data and Donuts: How to write a data management plan

Licensing

• State your conditions for reuse
  • Paper citation?

• Disclaimers

• Must justify limitations, describe how you’ll advertise them

• Creative Commons licenses are a good starting point

Page 25: Data and Donuts: How to write a data management plan

Exercise: Access methods

Where will people be able to access the data?

Does your discipline have a repository?

What kind of stable identifier will it have?

What are the conditions for reuse?

Are there any limitations to use of these data? Why?

Page 26: Data and Donuts: How to write a data management plan

Need help?

• Email: [email protected]

• DMPTool: http://dmptool.org/

• Data Management Services website: http://lib.colostate.edu/services/data-management

• Being updated