Agile Curation: 2015 AGU Presentation

Posted on 11-Apr-2017

Agile Data Curation: A Conceptual Framework and Approach for Practitioner Data Management

Presenting Author: Josh Young (1)

Co-Authors: Karl Benedict (2) and Christopher Lenhardt (3)

1. University Corporation for Atmospheric Research (UCAR) Unidata Program Center, Boulder, USA

2. University of New Mexico, Albuquerque, USA

3. Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, USA

Scope

Imagine a project:

• that includes a well-thought-out and documented data management plan,

• and robust implementation of that plan throughout the project and beyond.

• This talk is not for that project; it is for the rest of us.

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: public access & reproducibility

  • Risk of becoming dark data (Heidorn, 2008)

Why care about external access?

• Intangibles for an Investigator

  • Maybe someday I’ll benefit from someone else’s data

  • Maybe I’ll learn something through informal dialogue

  • Most science funding is from public resources and should/could be considered a public trust resource

  • Peer pressure

• Tangibles for an Investigator

  • Increased efficiency

  • My funders require it.

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: greater impact

[Diagram labels: Internal Workflows; Agile Curation; Public-Access Workflows]

Agile Curation:

• Means taking implementable steps to improve data management for external access.

• Philosophically, it attempts to apply lessons from agile software development to data management.

Agile Curation Principles, 2nd Generation

1) Delivery, access, use and citation of research data are the primary measures of success.

2) Maximize the impact of research data through the continuous integration of curation activities.

3) Support unanticipated needs for and uses of research data (and documentation) and develop flexible systems to capture new uses.

Agile Curation Principles, 2nd Generation

4) Make data open and accessible as early in the process as possible.

5) Encourage crowd-sourced / community feedback to improve and enhance the data. Provide basic metadata for data available early in the process even if the data are not finalized.

6) Identify key individuals in a research project who have the requisite motivation, knowledge, or ability to learn, and get out of their way.

Agile Curation Principles, 2nd Generation continued

7) Data creators and data curators should work closely throughout the data life story to ensure the most efficient and streamlined process.

8) Identify the most effective method(s) for maintaining close communication between the data creators and curators involved and use them.

9) Target the steady delivery of incremental improvements to research data discovery, access and use that is consistent with a sustainable level of effort and available funding.

Agile Curation Principles, 2nd Generation continued

10) Start with the basics and only make systems more complex as needed, while maintaining a low bar to entry.

11) Continuous attention to technical excellence and good design enhances agility.

12) Continuously develop a community of data providers, curators and users that participate in the evolution of the research data systems.

What happens next?

• Case Studies documentation:

  • To clarify and/or verify these principles

  • To provide workflow examples that can be adopted or revised for reuse

• Nascent community of interest within the Research Data Alliance

Scope

Imagine a project:

• that includes a well-thought-out data management plan,

• and robust implementation of that plan throughout the project.

• This talk is not for that project; it is for the rest of us.

Unidata is one of the University Corporation for Atmospheric Research (UCAR)'s Community Programs (UCP), and is funded primarily by the National Science Foundation (Grant NSF-1344155).

Questions?

Contact me at: jwyoung@ucar.edu @unidata_josh 303-497-8646

Background

Agile Curation Principles, 1st Generation

1) Access to data is the first goal;

2) Generative value is supported (Zittrain, 2006);

3) Researcher involvement through a participatory framework that aligns data management with scientific research processes (Yarmey and Baker, 2013);

4) Projects will utilize free open-source resources to the greatest extent practical;

5) Community participation increases project capacity;

Josh Young
Based on 2014 poster

Agile Curation Principles, 1st Generation part 2

6) Data management requirements and practices evolve as the research project proceeds;

7) Bright and dedicated individuals can learn appropriate skills and respond to the demands of their particular project, as they proceed;

8) Approaches apply across scales;

9) Consider technical debt;

10) Data evaluation can be conducted through use and feedback;

How we got here

• Idea formulated during discussion of Data Management Lifecycles at GeoData 2014

• Principles drafted for AGU 2014

• Two Research Data Alliance (RDA) Birds of a Feather sessions to explore community experiences