Coping with Data for WHOI JP Students

Coping With Your Data Carly Strasser California Digital Library [email protected] WHOI 10 April 2014 Tips & Tools

Data management best practices presentation for JP Students at Woods Hole Oceanographic Institution, 12 April 2014.

Transcript of Coping with Data for WHOI JP Students

Page 1: Coping with Data for WHOI JP Students

Coping With Your Data

Carly Strasser California Digital Library [email protected]

WHOI 10 April 2014

Tips & Tools

Page 2: Coping with Data for WHOI JP Students

Roadmap

1. Background

2. Best practices

3. Toolbox

Page 3: Coping with Data for WHOI JP Students

C. Strasser

Page 4: Coping with Data for WHOI JP Students

From Flickr by robertpaulyoung

Scientists are bad at data management.

Page 5: Coping with Data for WHOI JP Students

Many tables

Page 6: Coping with Data for WHOI JP Students

Embedded figures

Page 7: Coping with Data for WHOI JP Students

my spreadsheet

No headings

Page 8: Coping with Data for WHOI JP Students

my spreadsheet

Page 9: Coping with Data for WHOI JP Students

my spreadsheet

Page 10: Coping with Data for WHOI JP Students
Page 11: Coping with Data for WHOI JP Students

?

Page 12: Coping with Data for WHOI JP Students

From Flickr by ransomtech

Didn’t share the data
Didn’t document the data (metadata)
Didn’t document provenance/workflow

Page 13: Coping with Data for WHOI JP Students

From Flickr by ransomtech

NO: Reproducibility, Transparency, Reuse

Page 14: Coping with Data for WHOI JP Students

From Flickr by johntrainor

Why should I care?

Page 15: Coping with Data for WHOI JP Students

Because they care:

From Flickr by Redden-McAllister

Page 16: Coping with Data for WHOI JP Students

the Truth

From sandierpastures.com

You need to know about
•  Data management
•  Metadata
•  Data repositories
•  Data sharing

Page 17: Coping with Data for WHOI JP Students

… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”

Feb 2013

Page 18: Coping with Data for WHOI JP Students

1.  Maximize free public access
2.  Ensure researchers create data management plans
3.  Allow costs for data preservation and access in proposal budgets
4.  Ensure evaluation of data management plan merits
5.  Ensure researchers comply with their data management plans
6.  Promote data deposition into public repositories
7.  Develop approaches for identification and attribution of datasets
8.  Educate folks about data stewardship

From Flickr by Joe Crimmings Photography

Page 19: Coping with Data for WHOI JP Students

data management

From Flickr by Big Swede Guy

Best Practices

Page 20: Coping with Data for WHOI JP Students

From Flickr by Mark Sardella

Plan before data collection

Page 21: Coping with Data for WHOI JP Students

Planning: Design sample naming scheme

•  Create a key (data dictionary)
•  Make sure names are unique
•  Define codes

From Flickr by zebbie
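A minimal sketch of what a sample naming scheme with a key might look like in Python. The site codes, species codes, and ID layout are hypothetical, invented for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical key (data dictionary): each code used in a sample name is defined once.
SITE_CODES = {"NAN": "Nanaimo", "WH": "Woods Hole"}
SPECIES_CODES = {"EAF": "E. affinis"}


def make_sample_id(species: str, site: str, year: int, replicate: int) -> str:
    """Build a sample name from defined codes; reject codes missing from the key."""
    if species not in SPECIES_CODES:
        raise ValueError(f"undefined species code: {species}")
    if site not in SITE_CODES:
        raise ValueError(f"undefined site code: {site}")
    return f"{species}_{site}_{year}_{replicate:02d}"


def find_duplicates(sample_ids: list[str]) -> list[str]:
    """Return the sample names that occur more than once, sorted."""
    counts = Counter(sample_ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Replicate 2 is entered twice on purpose to show the uniqueness check.
ids = [make_sample_id("EAF", "NAN", 2010, r) for r in (1, 2, 2)]
duplicates = find_duplicates(ids)
print(ids)
print("duplicates:", duplicates)
```

Keeping the codes in one place means the key doubles as documentation, and the duplicate check can be run before any new sample is labeled.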

Page 22: Coping with Data for WHOI JP Students

Planning: Design file naming scheme

PhDcomics.com

Page 23: Coping with Data for WHOI JP Students

Planning: Design file naming scheme

Use descriptive file names
•  Unique
•  Reflect contents

Bad:
•  Mydata.xls
•  2001_data.csv
•  best version.txt

Better: Eaffinis_nanaimo_2010_counts.xls
•  Eaffinis: study organism
•  nanaimo: site name
•  2010: year
•  counts: what was measured

*Not for everyone

From R Cook, ESA Best Practices Workshop 2010
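One way to keep file names consistent is to generate them rather than type them. The sketch below assumes the organism_site_year_measured pattern from the "Better" example; the function name build_filename and its validation rule are hypothetical.

```python
import re


def build_filename(organism: str, site: str, year: int, measured: str, ext: str = "csv") -> str:
    """Join the descriptive parts into one name: organism_site_year_measured.ext."""
    parts = [organism, site, str(year), measured]
    for part in parts:
        # Letters and digits only, so the underscore stays an unambiguous separator.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(parts) + "." + ext


name = build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls")
print(name)
```

A part such as "best version" is rejected because of the space, which is the kind of name the "Bad" list warns against.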

Page 24: Coping with Data for WHOI JP Students

Planning: Design file organization

From S. Hampton

Page 25: Coping with Data for WHOI JP Students

Planning: Design file organization

[Folder tree: Biodiversity; Lake; Grassland; Experiments; Field work]

Biodiv_H20_heatExp_2005to2008.csv
Biodiv_H20_predatorExp_2001to2003.csv
…
Biodiv_H20_PlanktonCount_2001toActive.csv
Biodiv_H20_ChlAprofiles_2003.csv
…

Consider…
•  Dependencies?
•  File formats?
•  Time of collection?
•  Order of analysis?

Workflows!

From S. Hampton

Page 26: Coping with Data for WHOI JP Students

Planning: Design your spreadsheet

•  Constrain entries
•  Atomize
•  Break down spreadsheets

From Flickr by Ulleskelf

Page 27: Coping with Data for WHOI JP Students

Planning: Consider a database

A relational database is
•  A set of tables
•  Relationships among the tables
•  A language to specify & query the tables

An RDB provides
•  Scalability: millions+ records
•  Features for sub-setting, querying, sorting
•  Reduced redundancy & entry errors

From Mark Schildhauer
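To make the three ingredients concrete (tables, relationships, a query language), here is a small sketch using Python's built-in sqlite3 module. The table names, column names, and values are invented for illustration and are not from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when asked

# A set of tables, plus a relationship: each observation points at one site.
conn.executescript("""
    CREATE TABLE site (
        site_id   TEXT PRIMARY KEY,
        latitude  REAL,
        longitude REAL
    );
    CREATE TABLE observation (
        obs_id   INTEGER PRIMARY KEY,
        site_id  TEXT NOT NULL REFERENCES site(site_id),
        obs_date TEXT,
        temp_c   REAL
    );
""")
conn.executemany(
    "INSERT INTO site VALUES (?, ?, ?)",
    [("A", 41.5, -70.7), ("B", 41.6, -70.6)],
)
conn.executemany(
    "INSERT INTO observation (site_id, obs_date, temp_c) VALUES (?, ?, ?)",
    [("A", "2010-06-01", 10.0), ("A", "2010-06-08", 12.0), ("B", "2010-06-01", 8.0)],
)

# A language to query the tables: mean temperature per site, with site details joined in.
rows = conn.execute("""
    SELECT s.site_id, s.latitude, AVG(o.temp_c)
    FROM observation AS o
    JOIN site AS s ON o.site_id = s.site_id
    GROUP BY s.site_id
    ORDER BY s.site_id
""").fetchall()
print(rows)
conn.close()
```

Site coordinates are stored once in the site table instead of being retyped on every observation row, which is where the reduced redundancy and fewer entry errors come from.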

Page 28: Coping with Data for WHOI JP Students

Planning: Consider a database

You should invest time in learning databases if
•  your data sets are large or complex

Consider investing time in learning databases if
•  your data are small and humble
•  you ever intend to share your data
•  you are < 30 years old

From Mark Schildhauer

Page 29: Coping with Data for WHOI JP Students

Planning: Pick a data repository

Store your data in a repository
•  Institutional archive
•  Discipline/specialty archive

Ask a librarian

Repos of repos:
•  databib.org
•  re3data.org

From Flickr by torkildr

Page 30: Coping with Data for WHOI JP Students

Planning: Decide on preservation/backup

•  What software? What hardware? What personnel?
•  How often? Set up reminders!
•  Test system

From Flickr by sepa synod

From Flickr by taberandrew

From Flickr by withassociates
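"Test system" can be as simple as checking that a backup copy is byte-identical to the original. The sketch below is one possible approach using Python's standard library; the file names and the helper backup_and_verify are hypothetical, and a real setup would point at actual backup locations rather than a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(source, backup_dir / source.name))
    return sha256_of(source) == sha256_of(copy)


# Demonstration in a temporary directory with an invented data file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "raw_counts.csv"
    original.write_text("site,count\nA,12\n")
    verified = backup_and_verify(original, Path(tmp) / "backup")
    print("backup verified:", verified)
```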

Page 31: Coping with Data for WHOI JP Students

Planning: Write a data management plan!

…document that describes what you will do with your data throughout the research project

From Flickr by Barbies Land

Page 32: Coping with Data for WHOI JP Students

Planning: DMP components

•  What will be collected
•  Methods
•  Standards
•  Metadata
•  Sharing/access
•  Long-term storage

But they all have different requirements and express them in different ways

From Flickr by Barbies Land

Page 33: Coping with Data for WHOI JP Students

Planning: dmptool.org

•  Step-by-step wizard for generating DMP
•  create | edit | re-use | share
•  Free & open to community

Page 34: Coping with Data for WHOI JP Students

During Data Collection & Entry

From Flickr by Julia Manzerova

Page 35: Coping with Data for WHOI JP Students

During collection: Keep raw data raw

Realistically:
•  Archive .csv version of raw data
•  Make a “raw” tab in working data file
•  Do all work on other tabs

Page 36: Coping with Data for WHOI JP Students

During collection: Keep raw data raw

Ideally:
•  Use scripts to process data
•  Save them with data

Raw data as .csv

R script for processing & analysis
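The slide pairs a raw .csv with an R script; the same idea is sketched here in Python as an illustration only. The column names, file names, and the cleaning rule (drop rows with no temperature) are assumptions. The point is that the script reads the raw file and writes a separate output, so the raw file is never edited.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_path: Path) -> int:
    """Read the raw file, drop rows with a missing temperature, write a new file."""
    if out_path.resolve() == raw_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["temp_c"].strip() == "":
                continue  # cleaning happens in the script, never in the raw file
            writer.writerow(row)
            kept += 1
    return kept


# Demonstration with invented values in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    raw_text = "station,temp_c\nS1,10.2\nS2,\nS3,11.0\n"
    raw.write_text(raw_text)
    clean = Path(tmp) / "clean.csv"
    kept = process(raw, clean)
    raw_unchanged = raw.read_text() == raw_text
    print(kept, "rows kept; raw unchanged:", raw_unchanged)
```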

Page 37: Coping with Data for WHOI JP Students

During collection: Document your workflow

Workflow: how you get from the raw data to the final products of your research

Simple workflow: flow chart

[Flow chart boxes: Temperature data; Salinity data; Data import into Excel; Data in spreadsheet; Quality control & data cleaning; “Clean” T & S data; Analysis: mean, SD; Summary statistics; Graph production]

Page 38: Coping with Data for WHOI JP Students

During collection: Document your workflow

Workflow: how you get from the raw data to the final products of your research

Simple workflow: commented script

•  R, SAS, MATLAB…
•  Well-documented code is
   Easier to review
   Easier to share
   Easier to use for repeat analysis
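A commented script can carry the whole workflow in its comments. The sketch below uses Python rather than the R, SAS, or MATLAB named on the slide; the data values and the use of -999 as a missing-value code are invented assumptions. It walks through import, quality control, and the mean and SD analysis.

```python
import csv
import io
import statistics

# Step 1: import. Inline text stands in for a raw .csv file (values are invented).
RAW = """station,temp_c,salinity
S1,10.0,31.0
S2,12.0,32.0
S3,-999,33.0
S4,14.0,31.5
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 2: quality control. -999 is assumed here to be the missing-value code,
# so any row with a missing temperature is dropped before analysis.
MISSING = -999.0
clean = [r for r in rows if float(r["temp_c"]) != MISSING]

# Step 3: analysis. Mean and sample standard deviation for each variable.
summary = {}
for column in ("temp_c", "salinity"):
    values = [float(r[column]) for r in clean]
    summary[column] = (statistics.mean(values), statistics.stdev(values))

# Step 4: report. These numbers would feed the summary table and graphs.
for column, (mean, sd) in summary.items():
    print(f"{column}: mean={mean:.2f} sd={sd:.2f}")
```

Anyone rerunning the script gets the same cleaned data and the same statistics, which is what makes the analysis repeatable.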

Page 39: Coping with Data for WHOI JP Students

During collection: Document your workflow

Fancy schmancy workflows

Resulting output

https://kepler-project.org

Page 40: Coping with Data for WHOI JP Students

During collection: Document your workflow

Workflows enable
•  Reproducibility
•  Transparency
•  Reuse

From Flickr by merlinprincesse

Page 41: Coping with Data for WHOI JP Students

During collection: Constrain data entries

•  Excel lists
•  Data validation
•  Google docs forms

Modified from K. Vanderbilt
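The same constraints that Excel lists and data validation provide can also be checked in a script. This is a rough sketch, not a prescribed method: the allowed site list, the temperature range, and the field names are all hypothetical.

```python
ALLOWED_SITES = {"A", "B", "C"}   # hypothetical controlled list, like an Excel drop-down
TEMP_RANGE_C = (-2.0, 35.0)       # hypothetical plausible range for a numeric check


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one data entry (empty if it passes)."""
    problems = []
    if entry.get("site") not in ALLOWED_SITES:
        problems.append(f"site {entry.get('site')!r} is not in the allowed list")
    try:
        temp = float(entry.get("temp_c"))
    except (TypeError, ValueError):
        problems.append("temp_c is not a number")
    else:
        low, high = TEMP_RANGE_C
        if not low <= temp <= high:
            problems.append(f"temp_c {temp} is outside {low} to {high}")
    return problems


good = validate_entry({"site": "A", "temp_c": "12.5"})
bad = validate_entry({"site": "a", "temp_c": "125"})
print(good)
print(bad)
```

The second entry fails twice: the lowercase site code is not in the list, and 125 is most likely a typo for 12.5.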

Page 42: Coping with Data for WHOI JP Students

During collection: Atomize

One piece of information per cell
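As an illustration of atomizing, the sketch below splits a cell that mixes a number and a unit into two separate fields. The example cells and the function name atomize are hypothetical; real spreadsheets will need their own splitting rules.

```python
import re

# A number, optional whitespace, then a unit made of letters or a percent sign.
_PATTERN = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)\s*")


def atomize(cell: str) -> dict:
    """Split a combined cell such as '12.5 cm' into a value field and a unit field."""
    match = _PATTERN.fullmatch(cell)
    if match is None:
        raise ValueError(f"cannot split cell: {cell!r}")
    return {"value": float(match.group(1)), "unit": match.group(2)}


cells = ["12.5 cm", "3kg", " 40 % "]
atomized = [atomize(c) for c in cells]
print(atomized)
```

Once value and unit sit in their own columns, the values can be sorted, averaged, and checked without any text parsing.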

Page 43: Coping with Data for WHOI JP Students

During collection: Break down spreadsheets

Fake a relational database
•  Create parameter table
•  Create a site table

From doi:10.3334/ORNLDAAC/777

From R Cook, ESA Best Practices Workshop 2010
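Breaking one wide spreadsheet into a site table and an observation table might look like the following sketch. The column names and values are invented; the same pattern would apply to a parameter table.

```python
# One flat sheet: site details are repeated on every row (values are invented).
flat_rows = [
    {"site": "A", "lat": 41.5, "lon": -70.7, "date": "2010-06-01", "count": 12},
    {"site": "A", "lat": 41.5, "lon": -70.7, "date": "2010-06-08", "count": 15},
    {"site": "B", "lat": 41.6, "lon": -70.6, "date": "2010-06-01", "count": 7},
]


def split_tables(rows: list[dict]) -> tuple[dict, list[dict]]:
    """Split flat rows into a site table (one entry per site) and an observation table."""
    sites: dict = {}
    observations = []
    for row in rows:
        site_info = {"lat": row["lat"], "lon": row["lon"]}
        # Store each site once; flag rows whose site details disagree with earlier rows.
        if sites.setdefault(row["site"], site_info) != site_info:
            raise ValueError(f"conflicting details for site {row['site']}")
        observations.append({"site": row["site"], "date": row["date"], "count": row["count"]})
    return sites, observations


sites, observations = split_tables(flat_rows)
print(sites)
print(observations)
```

The site code is the link between the two tables, so coordinates are typed once and a mismatch between rows surfaces as an error instead of going unnoticed.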

Page 44: Coping with Data for WHOI JP Students

Why are you promoting Excel?

During collection Create metadata

Page 45: Coping with Data for WHOI JP Students

During collection: Create metadata

Metadata: data reporting

•  WHO created the data?
•  WHAT is the content of the data set?
•  WHEN was it created?
•  WHERE was it collected?
•  HOW was it developed?
•  WHY was it developed?

From Flickr by /\/\ichael Patric|{
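One lightweight way to record the who, what, when, where, how, and why is a small text record saved next to the data file. The sketch below writes such a record as JSON; this is an informal assumption for illustration, not one of the formal metadata standards, and every field value is a placeholder.

```python
import json

REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def build_metadata(**fields: str) -> str:
    """Return a JSON metadata record, refusing to build one with unanswered questions."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps(fields, indent=2)


# All values below are placeholders, not real project details.
record = build_metadata(
    who="A. Student (placeholder)",
    what="Copepod counts per tow (placeholder)",
    when="2010-06-01 to 2010-08-31",
    where="Hypothetical site A",
    how="Net tows, counted under a microscope (placeholder)",
    why="Example thesis chapter (placeholder)",
)
print(record)
```

The resulting text could be saved alongside the data, for example as a file with the same base name as the data file.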

Page 46: Coping with Data for WHOI JP Students

During collection: Create metadata

Digital context
•  Name of the data set
•  The name(s) of the data file(s) in the data set
•  Date the data set was last modified
•  Example data file records for each data type file
•  Pertinent companion files
•  List of related or ancillary data sets
•  Software (including version number) used to prepare/read the data set
•  Data processing that was performed

Personnel & stakeholders
•  Who collected
•  Who to contact with questions
•  Funders

Scientific context
•  Scientific reason why the data were collected
•  What data were collected
•  What instruments (including model & serial number) were used
•  Environmental conditions during collection
•  Temporal & spatial resolution
•  Standards or calibrations used

Information about parameters
•  How each was measured or produced
•  Units of measure
•  Format used in the data set
•  Precision & accuracy if known

Information about data
•  Definitions of codes used
•  Quality assurance & control measures
•  Known problems that limit data use (e.g. uncertainty, sampling problems)

Page 47: Coping with Data for WHOI JP Students

During collection: Create metadata

What is metadata?

Metadata standards…
•  Provide structure to describe data
   Common terms | definitions | language | structure
•  Come in many flavors
   EML, FGDC, ISO19115, DarwinCore,…
•  Can be met using software tools
   Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)

Page 48: Coping with Data for WHOI JP Students

During collection: Back up daily

Original, Near, Far

From Flickr by lippo

From Flickr by see phar

Page 49: Coping with Data for WHOI JP Students

During collection: Revisit, Review, Revise

Remember that data management plan?

From Flickr by Barbies Land

Page 50: Coping with Data for WHOI JP Students

During collection: Revisit, Review, Revise

Schedule a time each week or month

From Flickr by purplemattfish

Page 51: Coping with Data for WHOI JP Students

From Flickr by dipster1

Toolbox

Page 52: Coping with Data for WHOI JP Students

Write a DMP: dmptool.org

•  Step-by-step wizard for generating DMP
•  create | edit | re-use | share
•  Free & open to community

Page 53: Coping with Data for WHOI JP Students

databib.org

Where should I put my data?

Find a repository

Page 54: Coping with Data for WHOI JP Students

Get help

From Flickr by thewmatt

Page 55: Coping with Data for WHOI JP Students

Toolbox: Get help

DCXL blog: dcxl.cdlib.org

Page 56: Coping with Data for WHOI JP Students

From Flickr by North Carolina Digital Heritage Center

From Flickr by Madison Guy

Get help from your library

Page 57: Coping with Data for WHOI JP Students

[email protected]

Get help from me

Page 58: Coping with Data for WHOI JP Students

From Flickr by Andy Graulund

Make a resolution
•  Triage on current projects
•  Get advisor, lab mates, collaborators on board
•  Do better next time

Page 59: Coping with Data for WHOI JP Students

From Flickr by twm1340

Culture Shift Ahead

Page 60: Coping with Data for WHOI JP Students

science source notebook content access data government knowledge

From Flickr by cdsessums

Page 61: Coping with Data for WHOI JP Students

From Flickr by dotpolka

Doing science is a privilege. Data hoarding is science malpractice.

Manage & share your data!

Page 62: Coping with Data for WHOI JP Students

Website: carlystrasser.net
Email: [email protected]
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser