Post on 01-Nov-2014
… because good research needs good data
DCC 101, University of Glamorgan, 21 January 2013
Funded by:
Digital Curation 101
University of Glamorgan
21 January 2013
Michael Day
Digital Curation Centre
UKOLN, University of Bath
m.day@ukoln.ac.uk
http://www.dcc.ac.uk/
Agenda
• Part 1. Introduction to research data management: activities, roles and requirements
• Exercise: Data management quiz
• Part 2. Developing data policies and services
• Exercise: Developing a roadmap
• Part 3. DMP Online tool and guidance
• With thanks to Joy Davidson, Sarah Jones and Kerry Miller (DCC)
Introduction to Research Data Management: activities, roles and requirements
Michael Day and Kerry Miller
Digital Curation Centre
UKOLN, University of Bath
m.day@ukoln.ac.uk
http://www.dcc.ac.uk/
A Quick Introduction
• What is research data management?
• Who is involved and how?
• What skills and support are needed?
What is Research Data Management?
• Caring for,
• Facilitating access to,
• Preserving and
• Adding value to
digital research data throughout its lifecycle.
Typical Activities
• Creation and sharing of data
• File naming and description
• Dealing appropriately with sensitive data
• Data storage
• Appraisal, selection and disposal
• Data licensing
• Data management planning
What are the main drivers?
• National and international policy development
  • The Organisation for Economic Co-operation and Development describes data as a public good that should be made available
  • Research Councils UK in its Code of Good Research Conduct says data should be preserved and accessible for 10 years or more
  • The data management policies of funding bodies are increasingly demanding of institutional commitment and provisions ...
• The needs of
  • Researchers
  • Institutions
Benefits to researchers
• Scholarly communication/access to data
• Re-purposing and re-use of data
• Stimulating new networks/collaborations & new research
• Knowledge transfer to industry
• Verification of research/research integrity
• Re-purposing data for new audiences
• Secure storage for data intensive research
• Availability of data underpinning journal articles
• Increased visibility/citation
Keeping Research Data Safe Factsheet http://www.beagrie.com/KRDS_Factsheet_0910.pdf
The researcher perspective
• Managing and sharing data is simply part of good research:
  • Adhering to disciplinary and/or institutional codes of practice and policies
  • Has been practised since the advent of modern science, but not always consistently; data-intensive research makes it even more critical
• Meeting the specific requirements of funding bodies
• Reputational risks if data management is not handled properly
Institutional drivers
• Safeguarding research integrity
• Increasing number of FOI requests for data
• Adhering to existing codes of research practice and ethics
• Developing new institution-wide strategies, policies and services for data storage and management
• Increased institutional focus on research management (e.g., in response to REF)
• Benchmarking – self-assessing infrastructure and planning for improvement
• More demands but fewer resources to work with
Research codes of practice (1)
• UK Research Integrity Office Code of Practice for Research (2009)
Data management planning is an essential part of research design
Organisations should have in place procedures, resources (including physical space) and administrative support to assist researchers in the accurate and efficient collection of data and its storage in a secure and accessible form [3.12.5]
Research codes of practice (2)
• RCUK Code of Conduct on the Governance of Good Research Conduct (2011)
Primary data and research evidence [should be made] accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 years (in some cases 20 years or longer)
Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation [although deposit within national collections is endorsed]
Research funding bodies
• UK Research Councils
  • Help fund some data archives, e.g.:
    • Archaeology Data Service, European Bioinformatics Institute, the NERC data centres, UK Data Archive
  • Support for JISC (and DCC)
• RCUK Common Principles on Data Policy
  • Recognises that data are a critical output of the research process
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
RCUK Principles (in a nutshell)
• Publicly funded research data should be made openly available
• Data with acknowledged long-term value should be preserved and remain accessible and usable for future research
• Sufficient metadata should be recorded to enable other researchers to find and understand the research to enable re-use; published results should always include information on how to access the supporting data
• Recognition that there may be legal, ethical and commercial constraints
• Recognition that researchers may need privileged use of data for a limited period
• All users of research data should acknowledge their sources
• It is appropriate to use public funds to support the management of research data (MRD)
Funder expectations
• Institutions need to inform themselves about main funder policies (mandates) with respect to research data management
• There is an explicit link between research income and appropriate data management infrastructures
Funder policies
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
EPSRC expectations (1)
• EPSRC policy (2011) expected all institutions receiving grant funding:
  • To develop a roadmap aligning their policies and processes with EPSRC’s expectations by 1st May 2012
  • To be fully compliant with these expectations by 1st May 2015
EPSRC expectations (2)
• Appropriate metadata (including unique IDs) to be made freely available on the Internet within 12 months of data generation
• Data not generated in digital format should be stored in a manner to facilitate it being shared
• Data should be securely preserved for a minimum of 10 years after privileged access expires or the last date access was requested by a third party
• Adequate resources from existing funding streams
• EPSRC will monitor progress and compliance, and reserves the right to impose appropriate sanctions
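The 10-year preservation expectation above can be turned into a simple date calculation. The sketch below is illustrative only: it assumes the retention clock starts from whichever is later, the end of privileged access or the most recent third-party access request, which is one reading of the slide rather than EPSRC's own wording. The function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 10  # minimum period named on the slide


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a target year that is not a leap year
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(privileged_access_ends: date,
                           last_access_request: Optional[date] = None) -> date:
    """Earliest date on which disposal could be considered.

    Assumption: the clock restarts from the latest third-party access
    request if that falls after the end of privileged access.
    """
    start = privileged_access_ends
    if last_access_request is not None and last_access_request > start:
        start = last_access_request
    return add_years(start, RETENTION_YEARS)


# Hypothetical dataset: privileged access ended 1 May 2014,
# and a third party last requested access on 29 February 2016.
no_requests = earliest_disposal_date(date(2014, 5, 1))
with_request = earliest_disposal_date(date(2014, 5, 1), date(2016, 2, 29))
```

Each new access request pushes the date back, so a retention schedule built this way needs to be recalculated whenever a request is logged.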
Implications for researchers
• Increasing number of research councils and funding bodies with data management and sharing requirements
• Potential loss of research income if these mandates are not met
• Need to determine the costs associated with short and longer-term management and curation and to request funds as part of grant
• Responsibility for infrastructure shifting more to HEIs and less to centralised data archives, but institutional infrastructures and services are still emerging
• Need guidance - some good external support
• But also need more local support; often fragmented (need to draw upon existing channels within your institution wherever possible)
Activities, roles, requirements (1)
• Requirements gathering
  • Identifying researchers’ data requirements
  • Developing a shared understanding of what needs to be done (e.g., identifying where data exist, their form and scale, any existing retention requirements)
  • Identifying good practice within the institution (and the opposite)
  • Methods: surveys, focus groups, case studies, joint R&D projects, assessment tools (e.g. DAF)
Activities, roles, requirements (2)
• Identifying motivations and benefits
  • For researchers, support services, the institution
• Identifying risks
  • Data loss (institution, research group, individual)
  • Increased costs (lack of planning, service inefficiency, data loss)
  • Legal compliance (research funder, H&S, ethics, FoI)
  • Reputation (institution, unit, individual)
• Identifying costs
  • Keeping Research Data Safe (KRDS) toolkit
Activities, roles, requirements (3)
• Assessing institutional preparedness
  • Identifying institutional stakeholders, existing data support services, gaps
  • Benchmarking and planning for the future
  • Skills audit
  • DCC CARDIO tool
• Policy development
  • Policies – approval by senior management is just the start; policies need to be embedded in research practice and responsive to changing requirements
• Data management planning
  • DMP Online, DCC How-to Develop a Data Management Plan guide
Activities, roles, requirements (4)
• Implementation and service development
  • Integrating where possible with existing services, e.g. IR, CRIS, VRE, HPC, cloud services, social media, etc.
  • Appraisal, deciding what needs to be kept and for how long
  • Storage choices – no one-size-fits-all solution, e.g. Bristol’s BluePeta petascale storage facility, Bath’s X-Drive approach, cloud approaches
  • Data documentation and metadata – layered approaches: top-level discovery (core metadata, collection/experiment-level?), role of standards like DCMI, CERIF, DDI, etc.
Activities, roles, requirements (5)
• Data issues:
  • Appraisal: selection criteria, retention periods (who decides?)
    • DCC How to appraise and select research data for curation guide
  • Documentation: metadata, schema, semantics
  • Formats: proprietary formats, community standards, etc.
  • Provenance and authenticity
  • Citation (assignment of persistent IDs?)
  • Access (embargo policies?)
  • Licensing
    • DCC How to license research data guide
Who is involved?
• Funding bodies
• Archives / long-term data repositories
• At institutions:
  • Senior management
  • Researcher(s)
  • Research support officers / project staff
  • Lab technicians
  • Librarians / Data Centre staff
  • Faculty ethics committees
  • Institutional legal / IP advisors
  • FOI officer / DPA officer / records manager
  • Computing support
  • Institutional compliance officers
Approaching the Issue
• What data exist and are being created?
• Where are the greatest returns on investment available?
  • Training?
  • Storage?
  • Policy development
• What are the requirements?
• Who needs to be involved?
Making the most of what we’ve got
• Local expertise more widespread than you think
  • Ethics committees
  • Data protection office
  • IT Services
  • Repository Service
• If you need help, ask!
From University of Glasgow’s Data Management micro-site
Data management planning
• A plan to address critical data management issues:
  • What data will be created (format, types) and how?
  • How will the data be documented and described?
  • How will ethics and intellectual property considerations be addressed?
  • What are the plans for data sharing and access?
  • What is the strategy for long-term preservation?
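The five questions above can be treated as a checklist. As a rough illustration, the sketch below stores a draft plan as a dictionary and reports which questions are still unanswered. The section keys and the `unanswered_sections` helper are hypothetical labels invented for this example; they are not a DCC or funder schema.

```python
from typing import Dict, List

# Keys are hypothetical labels; the text paraphrases the slide's questions.
DMP_SECTIONS: Dict[str, str] = {
    "data_creation": "What data will be created (format, types) and how?",
    "documentation": "How will the data be documented and described?",
    "ethics_and_ip": "How will ethics and intellectual property be addressed?",
    "sharing_and_access": "What are the plans for data sharing and access?",
    "preservation": "What is the strategy for long-term preservation?",
}


def unanswered_sections(plan: Dict[str, str]) -> List[str]:
    """Return the keys of checklist sections that are missing or blank in `plan`."""
    return [key for key in DMP_SECTIONS if not plan.get(key, "").strip()]


# A draft written at the bid stage, with only the first question answered.
draft = {
    "data_creation": "Survey responses collected as CSV files",
    "documentation": "   ",  # whitespace only, so it counts as unanswered
}

for key in unanswered_sections(draft):
    print(f"Still to answer: {DMP_SECTIONS[key]}")
```

A check like this only confirms that every question has some answer; judging whether the answer is adequate remains a task for the researcher and support staff.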
Integrating is a tricky business
• Make a sound case for investing in data management training
• Draw upon existing policies and mandates wherever you can
• Spend some time identifying current data holdings, researchers’ practice and future training needs
• Make sure you are putting your effort where it will count
• Don’t reinvent the wheel – augment or adapt existing training and support materials with data management aspects
What the DCC can help with
Delivering support
Customised Data Management Plans – templates / guidance to be added to DMP Online
Training – institutional/disciplinary tailored courses, online resources
Incremental – repackaging existing support to raise awareness and make guidance more meaningful to researchers
Developing strategic institutional RDM framework
Strategy development – getting key people together to discuss/plan for RDM
Policy development – scoping, defining, embedding research data policies
Costing – assist with the development of costing and pricing for RDM services
Risk management – identify risks in RDM practice and recommend mitigations
Institutional data catalogues – recommend options for exposing metadata about your research data via CRIS systems, repositories, or a mix of these
Needs assessment
CARDIO Tool – collaborative assessment & benchmarking of RDM strengths/weaknesses
Data Asset Framework – interviews to scope current RDM practice and recommend improvements
Workflow assessment – methodology for analysing current RDM workflows
Exercise: How are you performing?
• Individually, complete the quick data management quiz (5 mins)
• Compare results, try to learn from those with confidence in those areas in which you consider yourself to be weaker (10 mins)
• Based on your group’s discussions...
  • Write down one practical thing you can do at work in order to edge towards an A.
Part 2: Developing data policies and services
Based on a presentation prepared by Sarah Jones (Digital Curation Centre)
sarah.jones@glasgow.ac.uk
Outline
• Who is responsible for RDM?
• What are the components of a data service?
• Learning lessons from other HEIs
• Developing roadmaps
Who is responsible for RDM?
Research Organisations
Funders
Data centres
Advisory bodies
Support services
Researchers
Publishers
Components of a research data service?
RDM policies
Archive, Preserve & Share
Advocacy (senior mgmt & researcher)
Storage
Back-up
Access
Support staff & services
Research environment & systems
Tools
Metadata and documentation
Data storage – Bristol example
BluePeta at Bristol
• £2m funding to date
• Petascale facility – expandable
• 3 machine rooms – resilience (tape archive 2012)
• Available to all researchers for research data
http://data.bris.ac.uk
1st 5TB free per Data Steward then £400 per TB p.a. for disk storage; tape backup £40 per TB
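To make the charging model concrete, here is a minimal sketch of the annual disk cost for one Data Steward, using only the figures quoted on the slide (first 5 TB free, then £400 per TB per year). The function name is hypothetical. Tape backup is left out because the slide does not say whether the £40 per TB is charged annually or once.

```python
FREE_ALLOWANCE_TB = 5         # first 5 TB free per Data Steward (from the slide)
DISK_RATE_GBP_PER_TB = 400    # per TB per year beyond the free allowance


def annual_disk_cost(total_tb: float) -> float:
    """Annual disk storage charge in GBP for one Data Steward's allocation."""
    chargeable_tb = max(0.0, total_tb - FREE_ALLOWANCE_TB)
    return chargeable_tb * DISK_RATE_GBP_PER_TB


# A hypothetical project holding 12 TB pays for 7 TB: 7 x 400 = 2800
cost_12tb = annual_disk_cost(12)
# Anything at or below the allowance costs nothing
cost_3tb = annual_disk_cost(3)
```

Figures like these can feed directly into the costing section of a grant application.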
Tools – an ‘academic dropbox’
National level negotiation via Janet brokerage?
www.dataflow.ox.ac.uk
Piloted at Lincoln & Edinburgh
http://tiny.cc/owncloud-pilot
Archiving – institutional data repositories
Not intended to replace national, subject or other established data collections
Acknowledgment of hybrid environment
http://datashare.is.ed.ac.uk
www.dspace.cam.ac.uk/
https://databank.ora.ox.ac.uk
Essex-RDR and DataPool at Southampton
Archiving – external data centres
Research funders’ data centres…
List of data centres: http://databib.org
Structured databases
Disciplinary & community initiatives
Data catalogues (metadata)
Develop a research data extension to the CERIF standard
JISC & DCC planning national coordination
Can we learn lessons from overseas?
http://cerif4datasets.wordpress.com
• DataFinder at Oxford
• DDI metadata by ResearchData@Essex
Guidance and training
Collate guidance
www.gla.ac.uk/datamanagement
Online training
http://datalib.edina.ac.uk/mantra
Embed into curriculum via Doctoral Training Centres e.g. Research360@Bath
http://blogs.bath.ac.uk/research360
Disciplinary training (RDMTrain)
www.dcc.ac.uk/training/train-trainer/disciplinary-rdm-training
Early research data policies
www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
“Statement of commitment” Infrastructure policy
“10 commandments”
mutual promises
aspirational
Baseline of RCUK Code + procedures & support
legal compliance style
a section in uni DM policy
useful guide as appendix
Based on Edin. with a few additions
How are others developing policies?
Theme from MRD workshop in Leeds:
High level policy (ratified)
+
User guides, practical support
+
RDM Infrastructure
http://tiny.cc/MRD-policy-workshop
Developing data policies: a trend for 2012
http://tiny.cc/PolicyNews
(news post from Dec 2011)
Policy development
“EPSRC expects all those it funds to have developed a clear roadmap to align their policies and processes with EPSRC’s expectations by 1st May 2012, and to be fully compliant with these expectations by 1st May 2015.”
www.epsrc.ac.uk/about/standards/researchdata/Pages/impact.aspx
What is the EPSRC looking for?
• Know what you hold – publish metadata
• Link publications and data
• Share data wherever possible
• Curate and preserve valuable data
http://tiny.cc/EPSRC-data-policy
The same as other funders (i.e. good research practice) so think broadly when you develop your strategy
Exercise: Developing a roadmap for RDM
Think about the potential components of an RDM service
Based on the strengths/weaknesses you identified in the quiz:
• Draft a list of actions needed at your institution
• Attempt to prioritise your list and pencil in timeframes (consider quick wins!)
• Decide who needs to be involved to make this happen
Part 3: DMP Online tool and guidance
Based on a presentation prepared by Sarah Jones and Joy Davidson (DCC)
sarah.jones@glasgow.ac.uk
Funders have DMP requirements
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Funding body requirements
• Typically a short (c.1-2 pp) statement, covering:
  • What data will be created (format, types, volume, avoidance of duplication)
  • Standards and methodologies to be used (including metadata)
  • How ethics and Intellectual Property will be addressed
  • Plans for data sharing and access
  • Strategy for long-term preservation
DCC support
• Guidance
• Examples
• Tools
What is DMP Online?
• A web-based tool to help researchers write plans
• It features:
  • Templates based on different requirements
  • Tailored guidance (disciplinary, funder etc)
  • Customised exports to a variety of formats
  • Ability to share DMPs with others
• https://dmponline.dcc.ac.uk
Start a plan
Pick relevant funder template
Get a list of their specific questions
Create a plan at the bid stage
...answer the questions based on initial research ideas
Once funded, flesh the plan out (roles, etc)
...answer the questions based on detailed workplan
When project is finished
...answer the questions based on the outputs that are being kept
Institutional customisation
Select desired questions
Add your logo, URL, colours
http://www.dcc.ac.uk/blog/tailoring-dmp-online-for-your-institution
Profile local support, boilerplate text
Links to specific examples
Thinks about why the questions are being asked – what are funders looking for?
Gives examples, local if possible
http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
Top tips
• Encourage researchers to start early - not wait until the last minute!
• The plan will - and should - change over the life of the project.
• Get other support staff involved - ethics, IT, library, RM, DP/FoI
• Update the plan with project updates
• Use the plan as a communication tool - with partners, funding bodies and yourself!
Thank you!
Any questions?
Michael Day, Digital Curation Centre
UKOLN, University of Bath
m.day@ukoln.ac.uk
http://www.dcc.ac.uk/