A centre of expertise in digital information management
www.ukoln.ac.uk
UKOLN is supported by:
Facing the Data Challenge : Institutions, Disciplines, Services & Risks
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
Associate Director, UK Digital Curation Centre
1st DCC Regional Roadshow, Bath November 2010
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
Overview
1. Facing the data challenge : Requirements, Risks, Costs
2. Reviewing Data Support Services : Analysis, Assessment, Priorities
3. Building Capacity & Capability : Skills Audit
4. Developing a Strategic Plan : Actions and Timeframe
Institutional Diversity
http://www.flickr.com/photos/mintchocicecream/7491707/
Facing the Data Challenge
http://www.flickr.com/photos/30435752@N08/2892112112/
SCARP Case studies
• Atmospheric data
• Neuro-imaging
• Tele-health
• Architecture
• Mouse Atlas
http://www.dcc.ac.uk/sites/default/files/documents/publications/SCARP%20SYNTHESIS.pdf
Recommendations:
• JISC
• HE & Research funders
• Publishers & Learned societies
• HEIs and research institutions
• Researchers & scholars
http://www.data-archive.ac.uk/media/203597/datamanagement_socialsciences.pdf
http://opus.bath.ac.uk/20896/1/erim2rep100420mjd10.pdf
• Quick & simple deposit
• Software tools
• Laboratory archive
• Crystallography community engaged
• ‘Embargo’ facility
• Structured foundations
• Discoverable & harvestable
Exercise 1a: Gathering requirements
• What are the researchers’ data requirements?
• What datasets exist already? Standards?
• What are their data priorities? Skills?
• Research methodologies? Plans?
• Equipment and instrumentation? Formats?
• Where are the “pain points”?
• How will you find out? Approaches to use?
• How will you use the information?
Exercise 1b: Motivation, benefits, risks
• What are the RDM drivers and enablers for research staff and post-grad students?
• RDM drivers and enablers for Libraries / IT / Computing Services / Information Services?
• RDM drivers and enablers for the institution?
• What are the barriers? What are the risks?
• How will you articulate the benefits?
• How will you find out? Approaches to use?
• How will you use the information?
Exercise 1c: Costs & sustainability
• What are the costs associated with RDM?
• For the researcher?
• For the institution?
• Direct / indirect costs? Fixed / variable costs?
• What cost data already exists?
• What time horizon are you considering?
• How will you find out? Approaches to use?
• How will you use the information?
Requirements gathering: Approaches and tools
• Survey e.g. Oxford, Parse.Insight
• Focus groups : semi-structured interviews
• Case studies departmental / disciplinary
• Joint R&D projects
• Data champions in departments
• Data Preservation readiness : AIDA tool
• Data audit / assessment : DAF tool
Benefits:
Prioritisation of resources
Capacity development and planning
Efficiency savings – move data to more cost-effective storage
Manage risks associated with data loss
Realise value through improved access & re-use
Scale: Departments, institutions
Dealing with Data Report : Rec 4
• DAF Implementation Guide October 2009
• Collating lessons of pilot studies
• Practical examples of questionnaires and interview frameworks
• DAF online tool autumn 2010
http://www.data-audit.eu/docs/DAF_Implementation_Guide.pdf
http://sudamih.oucs.ox.ac.uk/docs/Use%20of%20the%20DAF.pdf
http://eprints.ucl.ac.uk/15053/1/15053.pdf
Some lessons learned….
“CeRch had four false starts before finding a willing audit partner”
“Pick your moment” …. “Timing is key” (avoid exams, field trips, Boards…)
Plan well in advance!
“Be prepared to badger senior management”
Little documentation/knowledge of what exists: “a nightmare”
Defining the scope and granularity is crucial
Collect as much information as possible in interviews/surveys
Variable openness of staff and their data
Identifying risks
• Data loss (institution, research group, individual)
• Increased costs (lack of planning, service inefficiency, data loss)
• Legal compliance (research funder, H&S, ethics, FoI)
• Reputation (institution, unit, individual)
Benefits Taxonomy: Summary
Dimension 1: Direct / Indirect (costs avoided)
Dimension 2: Near-term / Long-term
Dimension 3: Private / Public
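As a rough illustration, the three dimensions of the benefits taxonomy summarised here could be recorded as a small data structure so that each identified benefit is tagged consistently. This is a minimal sketch, not part of the KRDS report: the class names, the `label` helper and the example classification are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Dimension 1 of the taxonomy."""
    DIRECT = "direct"
    INDIRECT = "indirect (costs avoided)"


class Horizon(Enum):
    """Dimension 2 of the taxonomy."""
    NEAR_TERM = "near-term"
    LONG_TERM = "long-term"


class Beneficiary(Enum):
    """Dimension 3 of the taxonomy."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass(frozen=True)
class Benefit:
    """One benefit, tagged along all three dimensions (hypothetical structure)."""
    description: str
    outcome: Outcome
    horizon: Horizon
    beneficiary: Beneficiary

    def label(self) -> str:
        # Join the three dimension values into one readable tag.
        return " / ".join(
            [self.outcome.value, self.horizon.value, self.beneficiary.value]
        )


# Hypothetical example: how this benefit is classified is an assumption,
# not a classification taken from the report.
example = Benefit(
    description="Data moved to more cost-effective storage",
    outcome=Outcome.INDIRECT,
    horizon=Horizon.NEAR_TERM,
    beneficiary=Beneficiary.PRIVATE,
)
print(example.label())
```

Tagging every benefit this way would make it straightforward to group them by dimension when articulating the case for research data management.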
Keeping Research Data Safe 2 Report: April 2010
KRDS
• Which costs?
• Effect over time?
• Benefits taxonomy
• Repository models
• Case studies
• Key cost variables
• Recommendations
• User Guide, business templates forthcoming 2010
1. Scale, Complexity, Predictive Potential
2. Continuum of Openness
3. Citizen Science
4. Credentials, Incentives, Rewards
5. Institutional Readiness & Response
6. Data Informatics Capacity & Capability
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
• Open Science at Web-Scale
Data Informatics Top 10
1. Leadership
2. Policy
3. Planning
4. Audit
5. Engagement
6. Repositories & Quality assurance
7. Sustainability
8. Access & Re-use
9. Training & skills
10. Community building
Exercise 2: Analysis, Assessment, Priorities
• Institutional stakeholders?
• Data support services?
• Range, scope, coverage?
• Gaps?
• Fitness for purpose?
• Timeliness?
• Resources?
• Skills?
• SWOT
Digital Preservation Policies Study
High-level pointers and guidance
Outline policy model/framework
Mappings to institutional strategies
Exemplars
Report October 2008
State-of-the-Art Report : Models & Tools (Alex Ball, June 2010)
• Data Lifecycles
• Data Policies (UK) incl DMP
• Standards & tools
• Data Asset Framework (DAF)
• DANS Seal of Approval
• Preservation metadata
• Archive management tools
• Cost / benefit tools
Jeff Haywood, RDMF V October 2010 http://www.dcc.ac.uk/sites/default/files/documents/RDMF/RDMF5/Haywood.pdf
Assessing cloud options
3 JISC Reports in 2010 :
• Technical Review
• Cloud computing for research
• Environmental & Organisational issues
• North Carolina universities
• Cyber-infrastructure project
• Data cloud across three campuses
• “regional”
• Policy & practice
• Data types, formats, standards, capture
• Ethics and Intellectual Property
• Access, sharing and re-use
• Short-term storage & data management
• Deposit & long-term preservation
• Adherence and review
Planning
Dealing with Data Report : Rec 9
Checklist questions mapped to funder’s data requirements
Checklist for a Data Management Plan
Slide : Martin Donnelly, DCC
DMP Online v2.0 (coming soon)
• Cleaner interface
• Funder-specific guidance
• Versioning feature
• CSV output
Slide : Martin Donnelly, DCC
http://www.dcc.ac.uk/dmponline
DMPs next steps?
• Embed DMPs in funder policies & research lifecycles as the norm
• Code of Conduct for Research
• Assess & review DMPs (not just the science content of proposals)
• Educate reviewers (DCC guidance for social science in prep)
• Manage compliance of researchers
• Infrastructure to share DMPs
• Integrate in institution research management information system
Data challenges?
1. Data management plans
2. Appraisal: selection criteria
3. Data retention and handover
4. Data documentation: metadata, schema, semantics
5. Data formats: applying standards
6. Instrumentation: proprietary formats
7. Data provenance: authenticity
8. Data citation & versions: persistent IDs
9. Data validation and reproducibility
10. Data access: embargo policy
11. Data licensing
12. Data linking: text, images, software
Exercise 3: Skills Audit
• What skills do you have in house?
• What are your strengths? Core data skills?
• Gaps? Do these matter?
• Can / should they be developed?
• How? Resource implications?
• Other sources of expertise?
• Key partnerships?
• Team science roles?
Data Access & Re-use
“Community Criteria for Interoperability” (Scaling Up Report 2008)
• Domain data format standard: CIF
• Domain data validation standard: CheckCIF
• Metadata schema: eCrystals Application Profile
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• Crystallography Data Commons: TIDCC Data Model in development
• Domain identifier: International Chemical Identifier
• Citation & linking: DOI
http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Embargo & Rights http://ecrystals.chem.soton.ac.uk/rights.html
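To illustrate the "Citation & linking: DOI" criterion, the sketch below splits the eCrystals DOI link shown on this slide into its registrant prefix and item suffix. It is only an illustration using the Python standard library; the helper name `split_doi_url` is hypothetical and no DOI resolver is contacted.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_doi_url(url: str) -> Tuple[str, str]:
    """Split a DOI link into (prefix, suffix).

    Hypothetical helper for illustration: it only parses the string
    and does not check that the DOI actually resolves.
    """
    # The DOI is the URL path without its leading slash.
    doi = urlparse(url).path.lstrip("/")
    # The prefix ends at the first slash; the suffix may contain more slashes.
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI link: {url!r}")
    return prefix, suffix


prefix, suffix = split_doi_url(
    "http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145"
)
print(prefix)  # registrant prefix
print(suffix)  # item suffix chosen by the registrant
```

Keeping the prefix and suffix separate is one way to store a citation independently of the resolver address used to reach it.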
Data Licensing
• Bespoke licences
• Standard licences
• Multiple licensing
• Licence mechanisms
• Forthcoming 2010
Quality Assurance
Trust
Standards
Audit and certification tools
• TRAC
• DRAMBORA
• PLATTER
• NESTOR
• DANS Data Seal of Approval
Sustainability
PREMIS Data Dictionary
OAIS
• Representation Information
• Registry/Repository RRORI
http://www.loc.gov/standards/premis/
Training
• Consortial
• Institutional
• Departmental
• Laboratory
• Project
• Library
• Computing Services
• Research staff / postdocs
• Postgraduate students
“excellent : probably the best course I have been on since starting my role as an Informatics Liaison Officer”
Research Data Management Forum http://www.dcc.ac.uk/data-forum/
1) Roles & Responsibilities
2) Value & Benefits
3) Sensitive Data: Ethics, Security, Trust
4) Economics of Applying & Sustaining digital curation
Optimising organisational support
• Organisational structures
• Library / IT / IS / research support structure
• Where does data management fit?
• Leadership?
• Co-ordination?
• Roles : data librarian, data manager, research support officer, data scientist, data curator...
• New roles?
Exercise 4: Actions and Timeframe
• Vision and Objectives: Are they clear?
• Organisational structures: Fit for purpose?
• Library / IT / IS structure : Is it optimal?
• Roles : who is best placed to take action?
• Responsibility : for each service / activity?
• Priorities : what will you stop doing?
• Resources : Do you need to bid for funding?
• Partnerships : Who do you need to talk to?
• Plan: What? Who? How? When?
Actions and Timeframe
Short-term: 0-12 months
Medium-term: 12-36 months
Long-term: >3 years
• Identify quick wins• What can you do tomorrow?
Take homes
1. Understand the research data requirements of your campus / institutional consumers
2. Agree research data service delivery priorities
3. Define data roles and responsibilities
4. Collaborate and strengthen the data support provided
5. Be pro-active! Engage! Be part of team science!
http://www.pnl.gov/science/images/highlights/computing/biopilotlg.jpg