Agile Data Curation: A Conceptual Framework and Approach for Practitioner
Data Management
Presenting Author: Josh Young (1)
Co-Authors: Karl Benedict (2) and Christopher Lenhardt (3)
1. University Corporation for Atmospheric Research (UCAR) Unidata Program Center, Boulder, USA
2. University of New Mexico, Albuquerque, USA
3. Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, USA
Scope
Imagine a project:
• that includes a well-thought-out and documented data management plan,
• and robust implementation of that plan throughout the project and beyond.
• This talk is not for that project; it is for the rest of us.
So why do we care about data management?
• Internal reasons: do good research, write papers, get tenure, win more grants.
• External reasons: public access & reproducibility
• Risk of becoming dark data (Heidorn, 2008)
Why care about external access?
• Intangibles for an Investigator
  • Maybe someday I’ll benefit from someone else’s data
  • Maybe I’ll learn something through informal dialogue
  • Most science funding is from public resources and should/could be considered a public trust resource
  • Peer pressure
• Tangibles for an Investigator
  • Increased efficiency
  • My funders require it.
So why do we care about data management?
• Internal reasons: do good research, write papers, get tenure, win more grants.
• External reasons: greater impact
[Diagram labels: Internal Workflows, Agile Curation, Public-Access Workflows]
Agile Curation:
• Means taking implementable steps to improve data management for external access.
• Philosophically, it attempts to apply lessons from agile software development to data management.
Agile Curation Principles, 2nd Generation
1) Delivery, access, use and citation of research data are the primary measures of success.
2) Maximize the impact of research data through the continuous integration of curation activities.
3) Support unanticipated needs for and uses of research data (and documentation) and develop flexible systems to capture new uses.
Agile Curation Principles, 2nd Generation
4) Make data open and accessible as early in the process as possible.
5) Encourage crowd-sourced / community feedback to improve and enhance the data. Provide basic metadata for data available early in the process even if the data are not finalized.
6) Identify key individuals in a research project that have the requisite motivation, knowledge, or ability to learn and get out of their way.
Agile Curation Principles, 2nd Generation continued
7) Data creators and data curators should work closely throughout the data life story to ensure the most efficient and streamlined process.
8) Identify the most effective method(s) for maintaining close communication between the data creators and curators involved and use them.
9) Target the steady delivery of incremental improvements to research data discovery, access and use that is consistent with a sustainable level of effort and available funding.
Agile Curation Principles, 2nd Generation continued
10) Start with the basics and only make systems more complex as needed, while maintaining a low bar to entry.
11) Continuous attention to technical excellence and good design enhances agility.
12) Continuously develop a community of data providers, curators and users that participate in the evolution of the research data systems.
What happens next?
• Case Studies documentation:
  • To clarify and/or verify these principles
  • To provide workflow examples that can be adopted or revised for reuse
• Nascent community of interest within the Research Data Alliance
Scope
Imagine a project:
• that includes a well-thought-out data management plan,
• and robust implementation of that plan throughout the project.
• This talk is not for that project; it is for the rest of us.
Unidata is one of the University Corporation for Atmospheric Research (UCAR)'s Community Programs (UCP), and is funded primarily by the National
Science Foundation (Grant NSF-1344155).
Background
Agile Curation Principles, 1st Generation
1) Access to data is the first goal;
2) Generative value is supported (Zittrain, 2006);
3) Researcher involvement through a participatory framework that aligns data management with scientific research processes (Yarmey and Baker, 2013);
4) Projects will utilize free open-source resources to the greatest extent practical;
5) Community participation increases project capacity;
Agile Curation Principles, 1st Generation part 2
6) Data management requirements and practices evolve as the research project proceeds;
7) Bright and dedicated individuals can learn appropriate skills and respond to the demands of their particular project, as they proceed;
8) Approaches apply across scales;
9) Consider technical debt;
10) Data evaluation can be conducted through use and feedback.
How we got here
• Idea formulated during discussion of Data Management Lifecycles at GeoData 2014
• Principles drafted for AGU 2014
• Two Research Data Alliance (RDA) Birds of a Feather sessions to explore community experiences