FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016

www.dans.knaw.nl. DANS is an institute of KNAW and NWO. FAIR Data in Trustworthy Data Repositories: Everybody wants to play FAIR, but how do we put the principles into practice? Peter Doorn, Director DANS; Ingrid Dillo, Deputy Director DANS. EUDAT/OpenAIRE webinar, 12 and 13 December 2016.


Page 1: FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016

www.dans.knaw.nl

DANS is an institute of KNAW and NWO

FAIR Data in Trustworthy Data Repositories:

Everybody wants to play FAIR, but how do we put the principles into practice?

Peter Doorn, Director DANS

Ingrid Dillo, Deputy Director DANS

EUDAT/OpenAIRE webinar, 12 and 13 December 2016

Page 2

Who we are

Watch our videos on YouTube: https://www.youtube.com/user/DANSDataArchiving

Page 3

What you can expect to learn

• General understanding of core requirements for trustworthy data repositories

• General understanding of the FAIR principles

• Introduction to a possible way of operationalizing the FAIR principles

Page 4

DANS and DSA

• 2005: DANS to promote and provide permanent access to digital research information

• Formulate quality guidelines for digital repositories including DANS (TRAC, Nestor)

• 2006: 5 basic principles as basis for 16 DSA guidelines

• 2009: international DSA Board

• Over 60 seals acquired around the globe, but with a focus on Europe

Page 5

The certification landscape

Page 6

DSA and WDS: look-alikes

Commonalities:
• Lightweight, community review

Complementarity:
• Geographical spread
• Disciplinary spread

Page 7

Partnership

Goals:
• Realizing efficiencies
• Simplifying assessment options
• Stimulating more certifications
• Increasing impact on the community

Outcomes:
• Common catalogue of requirements for core repository assessment
• Common procedures for assessment
• Shared testbed for assessment

Page 8

New common requirements

• Context (1)

• Organizational infrastructure (6)

• Digital object management (8)

• Technology (2)

• Additional information and applicant feedback (2)

Page 9

1. Organisational infrastructure

R1. The repository has an explicit mission to provide access to and preserve data in its domain.

R2. The repository maintains all applicable licenses covering data access and use and monitors compliance.

R3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.

R4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.

R5. The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.

R6. The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house, or external, including scientific guidance, if relevant).

Page 10

2. Digital object management (1)

R7. The repository guarantees the integrity and authenticity of the data.

R8. The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.

R9. The repository applies documented processes and procedures in managing archival storage of the data.

R10. The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.

Page 11

2. Digital object management (2)

R11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.

R12. Archiving takes place according to defined workflows from ingest to dissemination.

R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

Page 12

3. Technical infrastructure

R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.

R16. The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.

Page 13
Page 14

New requirements are out now!

http://www.datasealofapproval.org/en/news-and-events/news/2016/11/25/wds-and-dsa-announce-uni-ed-requirements-core-cert/

https://www.icsu-wds.org/news/news-archive/wds-dsa-unified-requirements-for-core-certification-of-trustworthy-data-repositories

Page 15

Back to 2005: DSA principles

The DSA is intended to ensure that:

• The data can be found on the internet
• The data are accessible (clear rights and licenses)
• The data are in a usable format
• The data are reliable
• The data are identified in a unique and persistent way so that they can be referred to

Page 16

FAIR Data Principles

Workshop Leiden 2014: minimal set of community agreed guiding principles to make data more easily discoverable, accessible, appropriately integrated and re-usable, and adequately citable.

FAIR principles:
• Findable
• Accessible
• Interoperable
• Reusable

(both for machines and for people)

Page 17

FAIR principles

In the FAIR approach, data should be:

1. Findable – Easy to find by both humans and computer systems and based on mandatory description of the metadata that allow the discovery of interesting datasets;

2. Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content;

3. Interoperable – Ready to be combined with other datasets by humans as well as computer systems;

4. Reusable – Ready to be used for future research and to be processed further using computational methods.

Source: http://www.dtls.nl/fair-data/

Page 18

Paper in: http://www.nature.com/articles/sdata201618

Page 19

Everybody loves FAIR!

Everybody wants to be FAIR… But what does that mean? How to put the principles into practice?

Page 20

Implementing the FAIR Principles?

See: http://datafairport.org/fair-principles-living-document-menu and https://www.force11.org/group/fairgroup/fairprinciples

Page 21

Different implementations for different aims?

1. FAIR data management: posing requirements for new data creation

2. FAIR data assessment: establishing the profile of existing data

3. FAIR data technologies: transformation tools to make data FAIR

Creation Assessment Transformation

Page 22

Creation: New Horizon 2020 guidelines on FAIR Data Management

Section 2. FAIR data
1. Making data findable, including provisions for metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increase data re-use (through clarifying licenses - 4 questions)

Additional sections:
1. Data summary (6 questions, 5 of which also cover aspects of FAIRness)
3. Allocation of resources (4 questions)
4. Data security (2 questions)
5. Ethical aspects (2 questions)
6. Other issues (2 questions)

Total of 23 + 16 = 39 questions!!

Page 23

This may result in:

FAIR DMP

Page 24

Assess: Resemblance DSA – FAIR principles

DSA Principles (for data repositories)    FAIR Principles (for data sets)
data can be found on the internet         Findable
data are accessible                       Accessible
data are in a usable format               Interoperable
data are reliable                         Reusable
data can be referred to (citable)

The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• reliability (DSA) is a condition for reuse (FAIR)
• FAIR explicitly addresses machine readability
• citability is in FAIR an aspect of findability

Page 25

Combine and operationalize: DSA & FAIR

• Growing demand for quality criteria for research datasets and a way to assess their fitness for use

• Combine the principles of core repository certification and FAIR

• Use the principles as quality criteria:
  • Core certification – digital repositories
  • FAIR – research data (sets)

• Operationalize the principles to make them easily implementable in any trustworthy digital repository

Page 26

How could it work?

• Consider F, A and I as separate dimensions of data quality
  • Interaction effects among dimensions complicate scoring, and so do elements that occur under different FAIR dimensions
• Score datasets on each dimension (from 1 to 5)
• Consider Reusability as the resultant of the other three:
  • Consider R, the average FAIRness, as an indicator of data quality
  • (F + A + I) / 3 = R
• Make scoring as automatic as possible, although not all principles can be established objectively
  • scoring at ingest by data archivists of TDR
  • after reuse by data users (community review)
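
The averaging rule on this slide can be sketched in a few lines of Python. This is only an illustration, assuming integer scores from 1 to 5 on each dimension; the function name `reusability_score` is hypothetical and not part of any DANS tool.

```python
def reusability_score(f: int, a: int, i: int) -> float:
    """Return R = (F + A + I) / 3 for dimension scores on the 1-5 scale."""
    for name, value in (("F", f), ("A", a), ("I", i)):
        # The slide scores each dimension from 1 to 5; reject anything else.
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return (f + a + i) / 3
```

Because R is a plain average, it hides the interaction effects among dimensions that the slide mentions, so it works best as a summary next to the three separate scores.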

Page 27

Each dataset a FAIR profile

Examples:
• Dataset X has FAIR profile F4-A3-I2 → R = 3
  • PID with limited metadata (well findable; 4)
  • Accessible with some restrictions (3)
  • Fairly low interoperability (2)
• Dataset Y has FAIR profile F5-A2-I3 → R = 3.3
  • PID and extensive metadata (very easy to find; 5)
  • Only metadata accessible (2)
  • Average rating on interoperability (3)

Numbers of assessments, reviews and downloads indicated as well
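
The two example profiles can be reproduced with a small parser. This is a minimal sketch, assuming the `F<n>-A<n>-I<n>` notation used on this slide; the names `PROFILE_PATTERN` and `parse_profile` are hypothetical, and rounding R to one decimal is an assumption made to match the slide.

```python
import re

# Matches profiles such as "F4-A3-I2"; each score must be a digit from 1 to 5.
PROFILE_PATTERN = re.compile(r"^F([1-5])-A([1-5])-I([1-5])$")


def parse_profile(profile: str) -> dict:
    """Split a FAIR profile string into its scores and the derived R value."""
    match = PROFILE_PATTERN.match(profile)
    if match is None:
        raise ValueError(f"not a valid FAIR profile: {profile!r}")
    f, a, i = (int(group) for group in match.groups())
    # R is the average of the three dimensions, rounded to one decimal.
    return {"F": f, "A": a, "I": i, "R": round((f + a + i) / 3, 1)}
```

Calling `parse_profile("F4-A3-I2")` gives an R of 3.0 and `parse_profile("F5-A2-I3")` gives 3.3, matching Dataset X and Dataset Y.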

Page 28

By the way, alternative visualisation ideas…

FAIR dice, suggested by Herbert van de Sompel

FAIR letters, suggested by Ian Duncan

FAIR leaves, suggested by Wouter Haak

FAIR clover 1

FAIR clover 2

FAIR sunflower

Page 29

Operationalising F - Findable

Findable - defined by metadata, documentation (and identifier for citation):
1. No PID and no metadata/documentation
2. PID without or with insufficient* metadata
3. Sufficient* metadata without PID
4. PID with sufficient* metadata
   – Information on data provenance
5. PID, rich metadata and additional documentation
   – Additional explanation of how data can be used

* Sufficient = enough metadata to understand what the data is about
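
The five Findable levels can be read as a simple decision rule. The sketch below is one possible reading, not the authors' implementation: the function name and the metadata labels are hypothetical, and two combinations the slide does not cover (no PID with insufficient metadata, no PID with rich metadata) are assigned scores by assumption, as marked in the comments.

```python
METADATA_LEVELS = ("none", "insufficient", "sufficient", "rich")


def findable_score(has_pid: bool, metadata: str) -> int:
    """Map PID presence and a metadata level to an F score from 1 to 5."""
    if metadata not in METADATA_LEVELS:
        raise ValueError(f"unknown metadata level: {metadata!r}")
    if has_pid:
        if metadata == "rich":
            return 5  # F5: PID, rich metadata and additional documentation
        if metadata == "sufficient":
            return 4  # F4: PID with sufficient metadata
        return 2      # F2: PID without or with insufficient metadata
    if metadata in ("sufficient", "rich"):
        # F3: sufficient metadata without PID.
        # ASSUMPTION: rich metadata without a PID is also scored 3.
        return 3
    # F1: no PID and no metadata.
    # ASSUMPTION: insufficient metadata without a PID is also scored 1.
    return 1
```

Deciding whether metadata is "sufficient" or "rich" still needs human judgement, so only the PID check in this rule is easy to automate.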


Page 31

Operationalising A - Accessible

Accessible - defined by presence of a user license [metadata retrievable by identifier: already included under F]:
1. No user license / unclear conditions of reuse / neither metadata nor data are accessible
2. Metadata are accessible (even when the data are not or no longer available)
3. User restrictions apply (of any kind, including privacy, commercial interests, embargo period, etc.)
4. Public Access (after registration)
5. Open Access (unrestricted, CC0 – perhaps also CC BY?)

Note 1: Some people want “Openness” to be separate from Accessible
Note 2: A3 could be seen as an acceptable threshold (e.g. by funders)
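
Because the Accessible levels are ordered, they fit an integer enumeration, which makes the threshold idea in Note 2 easy to express. This is a sketch under assumptions: the member names and the `meets_threshold` helper are hypothetical, and the default threshold of A3 only reflects the suggestion in Note 2.

```python
from enum import IntEnum


class Accessibility(IntEnum):
    """The five Accessible levels from the slide, in ascending order."""
    NO_LICENSE_OR_ACCESS = 1
    METADATA_ONLY = 2
    USER_RESTRICTIONS = 3
    PUBLIC_AFTER_REGISTRATION = 4
    OPEN_ACCESS = 5


# Note 2 suggests A3 as a possible acceptable threshold (e.g. for funders).
DEFAULT_THRESHOLD = Accessibility.USER_RESTRICTIONS


def meets_threshold(score: Accessibility,
                    threshold: Accessibility = DEFAULT_THRESHOLD) -> bool:
    """Return True when the dataset scores at or above the threshold."""
    return score >= threshold
```

For example, `meets_threshold(Accessibility.METADATA_ONLY)` is false, while a dataset under embargo (A3) passes the default threshold.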

Page 32

Operationalising I - Interoperable

Interoperable - defined by the data format:
1. Proprietary, non-open format data
2. Proprietary format, accepted by DSA Certified Trusted Data Repository
3. Non-proprietary, open format (= “preferred” or “archival” format)
4. Data is additionally harmonized/standardized, using standard vocabularies
5. Data is additionally linked to other data to provide context

Note: this is an adaptation of Tim Berners-Lee’s 5-star open data plan!
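
The Interoperable scale depends on format lists that a repository would maintain, so a scoring sketch has to assume them. In the Python below, both format sets are placeholder examples and not DANS's actual lists, the function name is hypothetical, and levels 4 and 5 are treated as cumulative (each "additionally" builds on the previous level), which is an interpretation of the slide.

```python
# ASSUMPTION: illustrative placeholder lists, not an official format registry.
OPEN_FORMATS = {"csv", "txt", "xml"}
ACCEPTED_PROPRIETARY_FORMATS = {"sav", "xls"}


def interoperable_score(extension: str,
                        uses_standard_vocabularies: bool = False,
                        linked_to_other_data: bool = False) -> int:
    """Map a file extension and two flags to an I score from 1 to 5."""
    ext = extension.lower().lstrip(".")
    if ext in OPEN_FORMATS:
        if uses_standard_vocabularies and linked_to_other_data:
            return 5  # I5: additionally linked to other data
        if uses_standard_vocabularies:
            return 4  # I4: additionally harmonized/standardized
        return 3      # I3: non-proprietary, open format
    if ext in ACCEPTED_PROPRIETARY_FORMATS:
        return 2      # I2: proprietary format accepted by the repository
    return 1          # I1: proprietary, non-open format
```

Under this cumulative reading, linked data without standard vocabularies stays at 3; a repository could reasonably choose a different rule.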

Page 33

First we attempted to operationalise R – Reusable as well… but we changed our minds

Reusable – is it a separate dimension? Partly subjective: it depends on what you want to use the data for!

Idea for operationalization                                            Why we rejected it
Clear provenance of data (to facilitate both replication and reuse)    Aspect of F4
Data is in a TDR – unsustained data will not remain usable             Aspect of Repository → Data Seal of Approval
Explanation of how data was or can be used is available                Aspect of F5
Data automatically usable by machines                                  Aspect of I5
Data is reliable (replicable)                                          Can only be known after re-analysis

Page 34

What it would look like in the DANS EASY archive

Page 35

Or in Zenodo, Dataverse, Mendeley Data, figshare, B2SAFE, … (if they comply with the DSA)


Page 37

Towards a FAIR Data Assessment Tool

• Independent website like the DSA-website
• Repositories will link to the FAIR assessment website
• The website will provide:
  – Assessment tool (“questionnaire” with explanation and examples for each criterion)
  – Online database containing:
    o Repository holding the dataset
    o PID (+ basic metadata such as name) of dataset
    o Reviewer info (ID can be withheld – anonymous reviews should be possible)
    o FAIR profile and scores
  – Analytics of FAIR profiles
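
The database fields listed above suggest a simple record structure. The following is a sketch only: the class name, field names and sample values are hypothetical, and an anonymous review is modelled by leaving the reviewer ID empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FairReview:
    """One assessment record, mirroring the fields listed on the slide."""
    repository: str                    # repository holding the dataset
    dataset_pid: str                   # persistent identifier of the dataset
    dataset_name: str                  # basic metadata such as the name
    f: int                             # Findable score, 1-5
    a: int                             # Accessible score, 1-5
    i: int                             # Interoperable score, 1-5
    reviewer_id: Optional[str] = None  # None means an anonymous review

    @property
    def profile(self) -> str:
        """FAIR profile string in the notation used earlier, e.g. F4-A3-I2."""
        return f"F{self.f}-A{self.a}-I{self.i}"

    @property
    def r(self) -> float:
        """Average of F, A and I, rounded to one decimal."""
        return round((self.f + self.a + self.i) / 3, 1)


# Hypothetical sample record; the PID is a made-up placeholder.
review = FairReview("Example repository", "doi:10.0000/example", "Dataset X", 4, 3, 2)
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: a review is stored once and analytics are computed across many such records.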

Page 38

Can FAIR Data Assessment be automatic?

Criterion                      Automatic?   Subjective?   Comments
                               (Y/N/Semi)   (Y/N/Semi)
F1  No PID / No Metadata       Y            N
F2  PID / Insuff. Metadata     S            S             Insufficient metadata is subjective
F3  No PID / Suff. Metadata    S            S             Sufficient metadata is subjective
F4  PID / Sufficient Metadata  S            S             Sufficient metadata is subjective
F5  PID / Rich Metadata        S            S             Rich metadata is subjective
A1  No License / No Access     Y            N
A2  Metadata Accessible        Y            N
A3  User Restrictions          Y            N
A4  Public Access              Y            N
A5  Open Access                Y            N
I1  Proprietary Format         S            N             Depends on list of proprietary formats
I2  Accepted Format            S            S             Depends on list of accepted formats
I3  Archival Format            S            S             Depends on list of archival formats
I4  + Harmonized               N            S             Depends on domain vocabularies
I5  + Linked                   S            N             Depends on semantic methods used

Optional: qualitative assessment / data review
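
The table can be encoded as data to see which criteria a tool could score without a reviewer. This sketch copies the Automatic? and Subjective? columns as given on the slide; the names `CRITERIA`, `fully_automatic` and `needs_human_judgement` are hypothetical.

```python
# Each entry: criterion -> (automatic, subjective), using Y / N / S (semi).
CRITERIA = {
    "F1": ("Y", "N"), "F2": ("S", "S"), "F3": ("S", "S"),
    "F4": ("S", "S"), "F5": ("S", "S"),
    "A1": ("Y", "N"), "A2": ("Y", "N"), "A3": ("Y", "N"),
    "A4": ("Y", "N"), "A5": ("Y", "N"),
    "I1": ("S", "N"), "I2": ("S", "S"), "I3": ("S", "S"),
    "I4": ("N", "S"), "I5": ("S", "N"),
}


def fully_automatic(criteria: dict = CRITERIA) -> list:
    """Criteria the table marks as automatic (Y) in table order."""
    return [name for name, (automatic, _) in criteria.items() if automatic == "Y"]


def needs_human_judgement(criteria: dict = CRITERIA) -> list:
    """Criteria the table marks as at least semi-subjective (not N)."""
    return [name for name, (_, subjective) in criteria.items() if subjective != "N"]
```

Read this way, F1 and all five Accessible levels are fully automatic, while the metadata-dependent Findable levels and I2 to I4 still call for some reviewer input.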

Page 39

Main takeaways

• The core certification requirements and the FAIR principles form a perfect couple for quality assessment of research data and trustworthy data repositories

• DANS is developing an operationalization of these principles:
  • Data archive staff will assess FAIRness of data upon ingest
  • Data users will assess FAIRness upon reuse
  • Scoring mechanism as automatic as possible

• Ideally: all certified trustworthy repositories should contain FAIR data

• But: the FAIR scoring mechanism is applicable in any repository

Page 40

Hands-on FAIR data management: IDCC workshops

12th International Digital Curation Conference, Edinburgh, 20-23 February 2017

How EUDAT Services could support FAIR data (workshop 4)… In this workshop we relate the FAIR principles to EUDAT’s Service Suite. You will experiment with community metadata and with an annotation tool for curating and retrieving data. Community representatives will talk about their experiences...

OpenAIRE services and tools for Open Research Data in H2020 (workshop 6)… The workshop will focus on practical issues of storage and data sharing aspects, on how to select an appropriate data repository, on guidance for efficient data management plans, on the monitoring of contextualized (linked) research…

Essentials 4 Data Support: the Train-the-Trainer version (workshop 9)… You learn how Research Data Netherlands made the popular ‘Essentials 4 Data Support’ training with rich online content, weekly assignments and expert speakers… to practice hands-on writing DMPs and making a data management stakeholder analysis…

For more information and registration see: http://www.dcc.ac.uk/events/idcc17

Page 41

Thank you for listening

www.dans.knaw.nl