Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Enriching Scholarship May 6, 2014 Natsuko Nicholls, UM Libraries Elizabeth Moss, ICPSR


Enriching Scholarship

May 6, 2014

Natsuko Nicholls UM Libraries

Elizabeth Moss ICPSR

US Federal Funding Mandates

NIH (2003): Under the Data Sharing Policy, all funding applications requesting $500,000 or more per year are expected to address data sharing in the application.

NSF (2011): All funding proposals submitted on or after January 18, 2011 must include a “Data Management Plan” describing how the proposal will conform to NSF policy on the dissemination and sharing of research results.

International Mandates

Aug 2011… “expectation that all our funded researchers should maximise access to their research data with as few restrictions as possible … submit a data management and sharing plan as part of the application process”

2007… “Researchers are to retain research data and primary materials; manage storage of research data and primary materials; maintain confidentiality of research data and primary materials”

Journal Mandates

Dec 2013: “We ask you to make available the data underlying the findings in the paper, which would be needed by someone wishing to understand, validate or replicate the work. Our policy has not changed in this regard. What has changed is that we now ask you to say where the data can be found.

As the PLOS data policy applies to all fields in which we publish, we recognize that we’ll need to work closely with authors in some subject areas to ensure adherence to the new policy. Some fields have very well established standards and practices around data, while others are still evolving, and we would like to work with any field that is developing data standards. We are aiming to ensure transparency about data availability.”

Questions

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What’s the common denominator?

Paradigm Shift

The nature of research has become…

More quantitative/data-intensive

More funder-driven

More interdisciplinary/collaborative

More transparent

More complicated in terms of cross-linking

More diverse in terms of citable scholarly outputs

The focus of scholarly communication has changed…

From

Preserve publications

Preserve data

Preserve both (at least separately)

To

Preserve publications and data ‘together’

Preserve the ‘relationships’ among them

Paradigm Shift

Publishing and Archiving: Scholarly Communication

Availability, Citability, Validation

Scholarly Publishing; Data Archiving

Scholarly Publishing that includes ‘Data Publication’

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication: 42

faculty project website: 36

conference presentation: 11

upon request: 11

Source: NSF Engineering Data Management Plan Analysis, N=156

Data Dissemination Methods

Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (prior/post publication)

Institutional repositories (prior/post publication)

Data archive per discipline’s culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

“Data upon request” via email (some/all)

Repository Directory Lists

IR

OpenDOAR (over 2,600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR

NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3data.org (609 repositories listed)

DataCite, re3data.org, and Databib announced a collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories: What to Look for

Subject/Discipline focus

Hosted by…

Access to data: open vs. restricted

Deposit of data: open vs. restricted

Deposit fee

Persistent identifiers (DOI, hdl)

Sustainability & preservation policy

(Non-)Proprietary file formats

Amount of data description/metadata (data package level, file level, data item level)

Associated code/software

More on Persistent IDs

A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for “journal articles”; ISO 26324 since 2012

DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef

Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)

DOI = prefix + suffix

• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1

DOI prefix is unique to each publisher/repository

• ICPSR: 10.3886

• UK Data Service: 10.5255

• Figshare: 10.6084

• PANGAEA: 10.1594

• Dryad: 10.5061

Very similar to ‘handles’ in terms of persistency

• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575

Moving towards “Data with DOI”, just as for scholarly articles
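The prefix + suffix structure described above can be illustrated with a short Python sketch. It assumes only that a DOI splits at its first slash and that the prefix begins with "10."; the names `Doi`, `parse_doi`, `resolver_url`, and `REGISTRANT_BY_PREFIX` are hypothetical helpers invented for this illustration, not part of any DOI tooling.

```python
from typing import NamedTuple

# Prefixes taken from the slide; the mapping itself is illustrative only.
REGISTRANT_BY_PREFIX = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Labels commonly written in front of a DOI (an assumed, non-exhaustive list).
_LEADS = ("doi:", "https://doi.org/", "http://doi.org/",
          "https://dx.doi.org/", "http://dx.doi.org/")


class Doi(NamedTuple):
    prefix: str  # identifies the publisher or repository
    suffix: str  # identifies the individual object within that prefix


def parse_doi(text: str) -> Doi:
    """Split a DOI into prefix and suffix at the first slash."""
    doi = text.strip()
    for lead in _LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, slash, suffix = doi.partition("/")
    if not (slash and suffix and prefix.startswith("10.")):
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi) -> str:
    """Build the doi.org link for a parsed DOI."""
    return f"https://doi.org/{doi.prefix}/{doi.suffix}"


dataset = parse_doi("http://doi.org/10.3886/ICPSR27282.v1")
print(dataset.prefix, REGISTRANT_BY_PREFIX.get(dataset.prefix))
print(resolver_url(dataset))
```

The sketch does not percent-encode the suffix; a suffix containing reserved URL characters would need encoding before being placed in a link.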

Data Repositories

Let’s take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of ‘Data Journals’

As of today: 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, Data Sharing and Publication. Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013


Note: For a full list of data journals that currently exist, see K. Akers’ blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

Launched in Fall 2012

Published on behalf of the Royal Meteorological Society

OA with author-pay model ($1,500 per article)

Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

3TU.Datacentrum

British Atmospheric Data Centre (BADC)

British Oceanographic Data Centre (BODC)

CISL Research Data Archive

CSIRO Data Access Portal

Environmental Information Data Centre (EIDC)

Figshare

IEDA EarthChem

IEDA MGDS

National Center for Atmospheric Research (NCAR), USA

Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research

Research Data Archive (RDA): reference datasets for weather and climate research

National Geoscience Data Centre (NGDC)

NERC Earth Observation Data Centre (NEODC)

NOAA National Climatic Data Center (NCDC)

NOAA National Oceanographic Data Center (NODC)

NOAA National Geophysical Data Center (NGDC)

PANGAEA

Polar Data Centre (PDC)

Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | ‘Data Paper’ | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | ‘Data Paper’ | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | ‘Data Descriptor’ | Yes

Located on U of M Campus

ICPSR: Inter-university Consortium for Political and Social Research (www.icpsr.umich.edu)

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: “publishing” data for 52 years

Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies’ archives are housed at ICPSR and fully integrated with ICPSR’s collection

Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter the online catalog with fielded metadata records to enhance discovery; compare studies side by side using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Links research data to the scholarly literature about it

Aids students, instructors, researchers, and funders in discovering and understanding data use

A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers

Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publishing Group

PLoS

Altmetrics Aggregators

• Altmetric

• ImpactStory

• Plum Analytics

Funders

• NSF

• Sloan Foundation

• MacMillan

• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

“New ways to measure the research impact of emerging products like blog posts, datasets, and software to build a new scholarly reward system that values and encourages web-native scholarship”

Open metrics, with context, using diverse products to provide researchers with a “comprehensive impact report” of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”

“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution, no credit

No access, no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data

+

Excessive human search effort, extensive collection knowledge

=

Too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
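As a rough illustration of the basic format and its five core elements, the following Python sketch assembles a citation string from structured fields. The class name `DataCitation`, its field names, and the exact punctuation are assumptions made for this example; journals and repositories each prescribe their own styles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (illustrative names)."""
    creators: List[str]   # individual(s) or organization that created the dataset
    year: int             # year the dataset was published, not necessarily created
    title: str            # as descriptive as possible
    publisher: str        # organization that provides access to the dataset
    identifier: str       # persistent, unique identifier such as a DOI

    def __post_init__(self) -> None:
        # Every core element is required, so reject empty values early.
        for name in ("creators", "title", "publisher", "identifier"):
            if not getattr(self, name):
                raise ValueError(f"missing core element: {name}")

    def format(self) -> str:
        """Render as: Creator (Year) Title. Publisher. Identifier"""
        names = ", ".join(self.creators)
        return (f"{names} ({self.year}) {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DataCitation(
    creators=["Sidlauskas B"],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one free-text string, is what would let a repository or an altmetrics harvester match citations by identifier instead of by title text.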

How to Cite Data

Additional Elements

Location/Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset

Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format/Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
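The Verifier element can be made concrete with a short Python sketch that computes an MD5 checksum with the standard `hashlib` module and compares two copies of a dataset. The function names `md5_of_stream` and `same_bytes` are hypothetical. The sketch covers MD5 only, which compares raw bytes; a UNF is calculated differently and is not shown here.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # Reading in chunks keeps memory use flat for large data files.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_bytes(first: BinaryIO, second: BinaryIO) -> bool:
    """Report whether two streams carry byte-identical content."""
    return md5_of_stream(first) == md5_of_stream(second)


# In-memory stand-ins for a deposited file and a later download.
deposited = io.BytesIO(b"id,score\n1,0.5\n")
downloaded = io.BytesIO(b"id,score\n1,0.5\n")
print(same_bytes(deposited, downloaded))  # True: the bytes are identical
```

For a file on disk, the same function can be called as `md5_of_stream(open(path, "rb"))` inside a `with` block. A matching checksum shows the bytes are unchanged; it says nothing about two files that hold the same values in different formats.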

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future Of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it’s not cited, it can’t be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others’ in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What’s the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 2: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

NIH (2003) Data Sharing Policy that all funding applications of $500000 or more per year are expected to address data-sharing in their application

NSF (2011) All funding proposals submitted on or after January 18 2011 must include a ldquoData Management Planrdquo describing how the proposal will conform to NSF policy on the dissemination and sharing of research results

US Federal Funding Mandates

International Mandates Aug 2011hellip ldquoexpectation that all our funded researchers should maximise access to their research data with as few restrictions as possible hellip submit a data management and sharing plan as part of the application processrdquo

2007hellip ldquoResearchers are to retain research data and primary materials manage storage of research data and primary materials maintain confidentiality of research data and primary materialsrdquo

Journal Mandates

Dec 2013 ldquoWe ask you to make available the data underlying the findings in the paper which would be needed by someone wishing to understand validate or replicate the work Our policy has not changed in this regard What has changed is that we now ask you to say where the data can be found

As the PLOS data policy applies to all fields in which we publish we recognize that wersquoll need to work closely with authors in some subject areas to ensure adherence to the new policy Some fields have very well established standards and practices around data while others are still evolving and we would like to work with any field that is developing data standards We are aiming to ensure transparency about data availabilityrdquo

Questions

Sharing datamdashhow does it happen

What is data publishing

Is data archiving the same

How can we find data access it and reuse it How can we measure the impact of sharing data

Whatrsquos the common denominator

Paradigm Shift

The nature of research has becomehellip More quantitativedata-intensive

More funder-driven

More interdisciplinarycollaborative

More transparent

More complicated in terms of cross-linking

More diverse in terms of citable scholarly outputs

The focus of scholarly communication

has changedhellip From

Preserve publications

Preserve data

Preserve both (at least separately)

To

Preserve publications and data lsquotogetherrsquo

Preserve the lsquorelationshipsrsquo among them

Paradigm Shift

Publishing and Archiving Scholarly

Communication

Availability Citability Validation

Scholarly Publishing Data Archiving

Scholarly Publishing that includes lsquoData Publicationrsquo

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication

42

faculty project website

36

conference presentation

11

upon request 11

NSF Engineering Data Management Plan Analysis N=156

Data Dissemination Methods Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (priorpost publication)

Institutional repositories (priorpost publication)

Data archive per disciplinersquos culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

ldquoData upon requestrdquo via email (someall)

Repository Directory Lists IR

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3Dataorg (609 repositories listed)

DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories What to Look for SubjectDiscipline focus

Hosted byhellip

Access to data open vs restricted

Deposit of data open vs restricted

Deposit fee

Persistent identifiers (DOI hdl)

Sustainability amp preservation policy

(Non-) Proprietary file formats

Amount of data descriptionmetadata

(data package level file level data item level)

Associated codesoftware

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

“New ways to measure the research impact of emerging products like blog posts, datasets, and software to build a new scholarly reward system that values and encourages web-native scholarship”

Open metrics, with context, using diverse products, to provide researchers with a “comprehensive impact report” of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases: research data is equal to research literature
• Articles linked to underlying data
• Increased data discovery
• Reward for data citation
• Potential for automated tracking

Elsevier Connect

“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”

“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References, with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation
• Abstract
• Acknowledgements
• Charts and Tables
• Appendices
• Discussion
• Footnotes
• Sample
• Methods
• References

Without an explicit citation, the reader must infer or be out of luck

No attribution, no credit

No access, no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + Excessive human search effort and extensive collection knowledge = Too costly, too questionable for a confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + Minimal human search effort = High hit accuracy for the cost and better confidence in impact measures
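The contrast between the two equations can be sketched in code: when a reference carries a DOI, a short pattern match finds it, while an informal mention leaves nothing to match. This is a minimal illustration, not a tool used by the presenters. The pattern `DOI_PATTERN` and the function `find_dois` are hypothetical names, and the pattern is a simplified assumption rather than an official DOI grammar.

```python
import re

# Simplified, assumed pattern for DOIs written as "doi:10.xxxx/suffix"
# or as a doi.org URL. It is not an official DOI grammar.
DOI_PATTERN = re.compile(
    r"(?:doi:\s*|https?://(?:dx\.)?doi\.org/)(10\.\d{4,9}/[^\s<>]+)",
    re.IGNORECASE,
)

def find_dois(text):
    """Return the DOIs cited in a block of reference text, in order."""
    # Strip trailing punctuation that belongs to the sentence, not the DOI.
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]

references = (
    "Sidlauskas B (2007) Data from: Testing for unequal rates of "
    "morphological diversification. Dryad Digital Repository. "
    "doi:10.5061/dryad.20\n"
    "We also used crime data from ICPSR."  # informal mention, no identifier
)
print(find_dois(references))
```

Only the formally cited dataset is found. The informal mention on the second line would still need a person with collection knowledge to track down.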

Basic Data Citation Format

Creator (Year) Title Publisher Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements
• Creator(s): Individual(s) or organization responsible for creating the dataset
• Year: Year the dataset was published, not necessarily created
• Title: Should be as descriptive as possible
• Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
• Identifier: Persistent unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
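As a rough sketch, the five core elements can be held in a small record and joined in the order given above. The class name `DataCitation`, its field names, and the punctuation between elements are assumptions made for illustration. Journal styles differ, and the DataCite and CrossRef formatters mentioned above remain the better route for real submissions.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    creators: list   # individual(s) or organization that created the dataset
    year: int        # year the dataset was published, not necessarily created
    title: str       # as descriptive as possible
    publisher: str   # organization that provides access to the dataset
    identifier: str  # persistent unique identifier, e.g. a DOI

    def format(self):
        # Assumed punctuation: "Creator (Year). Title. Publisher. Identifier"
        names = ", ".join(self.creators)
        return (f"{names} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")

citation = DataCitation(
    creators=["Sidlauskas B"],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets the same record be rendered in more than one journal style.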

How to Cite Data

Additional Elements
• Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset
• Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
• Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
• Format / Material Designator: e.g., database, CD-ROM
• Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
• Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
• Series: Used if the dataset is part of a series of releases (e.g., monthly)
• Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
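The Verifier element can be illustrated with an MD5 checksum: two copies of a dataset are byte-identical when their digests match. The sketch below uses Python's standard `hashlib` module. The function names `md5_checksum` and `same_dataset` and the throwaway CSV files are hypothetical. A UNF is computed differently, from normalized data values rather than raw bytes, and is not shown here.

```python
import hashlib
import os
import tempfile

def md5_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_dataset(path_a, path_b):
    """True when both files have byte-identical content."""
    return md5_checksum(path_a) == md5_checksum(path_b)

# Demonstration with throwaway files standing in for dataset copies.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.csv")
    copy = os.path.join(tmp, "copy.csv")
    altered = os.path.join(tmp, "altered.csv")
    for path, text in [(original, "id,value\n1,0.5\n"),
                       (copy, "id,value\n1,0.5\n"),
                       (altered, "id,value\n1,0.6\n")]:
        with open(path, "w", newline="") as out:
            out.write(text)
    copy_matches = same_dataset(original, copy)
    altered_matches = same_dataset(original, altered)

print(copy_matches, altered_matches)
```

A digest published with the citation lets a later reader confirm that the file they retrieved is the one that was analyzed.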

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
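Principles 4 and 5 ask for identifiers that are machine actionable and that lead to the data. A DOI meets both: it splits into a registrant prefix and a suffix, and it becomes a link when appended to the public doi.org resolver. The sketch below assumes DOIs written in the "doi:" form used in these slides. The helper names `split_doi` and `resolver_url` are hypothetical.

```python
from urllib.parse import quote

def split_doi(doi):
    """Split a DOI such as '10.3886/ICPSR21240' into (prefix, suffix)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

def resolver_url(doi):
    """Build a link through the public doi.org resolver."""
    prefix, suffix = split_doi(doi)
    # Percent-encode unusual characters in the suffix; keep '/' readable.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"

print(split_doi("doi:10.3886/ICPSR21240"))
print(resolver_url("doi:10.3886/ICPSR21240"))
```

Because the link goes through the resolver rather than to a repository address, it keeps working if the repository reorganizes its website, which is the point of Principle 6.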

Make Your Data Count

If it’s not cited, it can’t be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data, and others’, in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What’s the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 4: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Journal Mandates

Dec 2013 ldquoWe ask you to make available the data underlying the findings in the paper which would be needed by someone wishing to understand validate or replicate the work Our policy has not changed in this regard What has changed is that we now ask you to say where the data can be found

As the PLOS data policy applies to all fields in which we publish we recognize that wersquoll need to work closely with authors in some subject areas to ensure adherence to the new policy Some fields have very well established standards and practices around data while others are still evolving and we would like to work with any field that is developing data standards We are aiming to ensure transparency about data availabilityrdquo

Questions

Sharing datamdashhow does it happen

What is data publishing

Is data archiving the same

How can we find data access it and reuse it How can we measure the impact of sharing data

Whatrsquos the common denominator

Paradigm Shift

The nature of research has becomehellip More quantitativedata-intensive

More funder-driven

More interdisciplinarycollaborative

More transparent

More complicated in terms of cross-linking

More diverse in terms of citable scholarly outputs

The focus of scholarly communication

has changedhellip From

Preserve publications

Preserve data

Preserve both (at least separately)

To

Preserve publications and data lsquotogetherrsquo

Preserve the lsquorelationshipsrsquo among them

Paradigm Shift

Publishing and Archiving Scholarly

Communication

Availability Citability Validation

Scholarly Publishing Data Archiving

Scholarly Publishing that includes lsquoData Publicationrsquo

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication

42

faculty project website

36

conference presentation

11

upon request 11

NSF Engineering Data Management Plan Analysis N=156

Data Dissemination Methods Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (priorpost publication)

Institutional repositories (priorpost publication)

Data archive per disciplinersquos culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

ldquoData upon requestrdquo via email (someall)

Repository Directory Lists IR

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3Dataorg (609 repositories listed)

DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories What to Look for SubjectDiscipline focus

Hosted byhellip

Access to data open vs restricted

Deposit of data open vs restricted

Deposit fee

Persistent identifiers (DOI hdl)

Sustainability amp preservation policy

(Non-) Proprietary file formats

Amount of data descriptionmetadata

(data package level file level data item level)

Associated codesoftware

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

• 3TU.Datacentrum
• British Atmospheric Data Centre (BADC)
• British Oceanographic Data Centre (BODC)
• CISL Research Data Archive
• CSIRO Data Access Portal
• Environmental Information Data Centre (EIDC)
• Figshare
• IEDA: EarthChem
• IEDA: MGDS
• National Center for Atmospheric Research (NCAR), USA
  • Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  • Research Data Archive (RDA): reference datasets for weather and climate research
• National Geoscience Data Centre (NGDC)
• NERC Earth Observation Data Centre (NEODC)
• NOAA National Climatic Data Center (NCDC)
• NOAA National Oceanographic Data Center (NODC)
• NOAA National Geophysical Data Center (NGDC)
• PANGAEA
• Polar Data Centre (PDC)
• Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley:
• Geoscience Data Journal

Ubiquity Press:
• Journal of Open Archaeology Data
• Journal of Open Psychology Data
• Open Health Data
• Journal of Open Research Software

Nature:
• Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

Located on the U of M Campus

www.icpsr.umich.edu

ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository

• A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
• Long-term sustainability: "publishing" data for 52 years
• Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
• Awarded the Data Seal of Approval from DANS
• Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
• Data preservation standards are followed to protect data over the long term, guarding against deterioration, accidental loss, and digital obsolescence
• Data are screened for confidentiality and privacy concerns
• Stringent protections are in place for securing and distributing sensitive data
• Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

• ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
• Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
• Search and filter the online catalog with fielded metadata records to enhance discovery; compare side by side using structured variable-level documentation in XML tagged according to the DDI standard
• All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1,188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

• Links research data to the scholarly literature about it
• Aids students, instructors, researchers, and funders in discovering and understanding data use
• A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
• Generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

• Easier to access and analyze much more research data online
• New focus on sharing that research data
• Increasing use of social media to discuss research via tweets, likes, and blog posts
• More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
• Dependent on good citation practice

Publishers:
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS

Altmetrics Aggregators:
• Altmetric
• ImpactStory
• Plum Analytics

Funders:
• NSF
• Sloan Foundation
• MacMillan
• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

• Research data is equal to research literature
• Articles linked to underlying data
• Increased data discovery
• Reward for data citation
• Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

• Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
• More integrated efforts among libraries, publishers, archives, and funders. For example: the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

• Abstract
• Acknowledgements
• Charts and Tables
• Appendices
• Discussion
• Footnotes
• Sample
• Methods
• References

Without an explicit citation, the reader must infer or be out of luck

• No attribution, no credit
• No access, no reuse
• No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + Excessive human search effort, extensive collection knowledge = Too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + Minimal human search effort = High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year). Title. Publisher. Identifier.

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

• Creator(s): Individual(s) or organization responsible for creating the dataset
• Year: Year the dataset was published, not necessarily created
• Title: Should be as descriptive as possible
• Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
• Identifier: Persistent, unique identifier (e.g., a DOI)

Source: "How to Cite Data," http://datapub.cdlib.org/datacitation
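As a rough illustration of the basic format, the five core elements can be assembled into a citation string. This is only a sketch, not the DataCite or CrossRef formatter: the DataCitation class and format_citation function are hypothetical names introduced here, and the punctuation between elements is an assumption. The sample values come from the Dryad example in this deck.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Core elements of a data citation (hypothetical helper type)."""
    creators: list[str]   # individual(s) or organization(s)
    year: int             # year the dataset was published
    title: str
    publisher: str        # organization that provides access
    identifier: str       # persistent identifier, e.g. a DOI


def format_citation(c: DataCitation) -> str:
    """Assemble 'Creator (Year). Title. Publisher. Identifier' as one string."""
    creators = "; ".join(c.creators)
    return f"{creators} ({c.year}). {c.title}. {c.publisher}. {c.identifier}"


example = DataCitation(
    creators=["Sidlauskas, B."],
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)

print(format_citation(example))
```

Keeping the elements in a structured record rather than a free-text string makes it easier to re-render the same citation in a different journal style later.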

Additional Elements

• Location / Availability: The web address of the dataset; essential when the identifier can't be used to reach the dataset
• Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
• Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
• Format / Material Designator: e.g., database, CD-ROM
• Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
• Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
• Series: Used if the dataset is part of a series of releases (e.g., monthly)
• Contributor: e.g., editor, compiler

Source: "How to Cite Data," http://datapub.cdlib.org/datacitation
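To make the "Verifier" element concrete, here is a minimal sketch of comparing two copies of a data file by MD5 checksum using Python's standard hashlib module. The function names and the tiny in-memory CSV samples are hypothetical. A UNF is computed differently (on the data values rather than the raw bytes) and is not shown here.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_content(a: BinaryIO, b: BinaryIO) -> bool:
    """True if both streams produce the same MD5 checksum."""
    return md5_of_stream(a) == md5_of_stream(b)


# Hypothetical sample data standing in for two downloaded copies of a file.
original = b"id,value\n1,0.42\n"
altered = b"id,value\n1,0.43\n"

print(same_content(io.BytesIO(original), io.BytesIO(original)))
print(same_content(io.BytesIO(original), io.BytesIO(altered)))
```

For files on disk, the same function can be called on a file opened with open(path, "rb"). A byte-level checksum flags any difference at all, including a change of line endings or file format, even when the data values are unchanged.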

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

• Without counting data use, there is no accurate way to measure the impact of your shared data
• Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
• Store your data where citations are unique and persistent
• Cite your own data and others' in your publications

Questions Answered

• Sharing data: how does it happen?
• What is data publishing?
• Is data archiving the same?
• How can we find data, access it, and reuse it?
• How can we measure the impact of sharing data?
• What's the common denominator?

Thank you

Natsuko Nicholls
hayashin@umich.edu

Elizabeth Moss
eammoss@umich.edu


Questions

• Sharing data: how does it happen?
• What is data publishing?
• Is data archiving the same?
• How can we find data, access it, and reuse it?
• How can we measure the impact of sharing data?
• What's the common denominator?

Paradigm Shift

The nature of research has become...

• More quantitative/data-intensive
• More funder-driven
• More interdisciplinary/collaborative
• More transparent
• More complicated in terms of cross-linking
• More diverse in terms of citable scholarly outputs

The focus of scholarly communication has changed...

From:
• Preserve publications
• Preserve data
• Preserve both (at least separately)

To:
• Preserve publications and data 'together'
• Preserve the 'relationships' among them

Paradigm Shift

Publishing and Archiving: Scholarly Communication

Availability, Citability, Validation

Scholarly Publishing; Data Archiving

Scholarly Publishing that includes 'Data Publication'

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

• Journal publication: 42
• Faculty project website: 36
• Conference presentation: 11
• Upon request: 11

Source: NSF Engineering Data Management Plan Analysis, N=156

Data Dissemination Methods

• Submitted with journal article
• Appear in journal article upon publication
• Supplemental materials (including codebooks)
• Websites (prior/post publication)
• Institutional repositories (prior/post publication)
• Data archive, per discipline's culture of sharing
• Data repository (may be assigned by journal publishers)
• Data papers in data journals (may be independent of the journal article)
• "Data upon request" via email (some/all)

Repository Directory Lists

IR:
• OpenDOAR (over 2,600 academic open access repositories listed)
• Deep Blue (University of Michigan Library)

DR:
• NIH Data Sharing Repositories (57 repositories)
• Thomson Reuters Data Citation Index (174 repositories)
• Databib (975 repositories listed)
• re3data.org (609 repositories listed)

DataCite, re3data.org, and Databib announced a collaboration towards one service under the auspices of DataCite by 2015.

Disciplinary Data Repositories: What to Look for

• Subject/discipline focus
• Hosted by...
• Access to data: open vs. restricted
• Deposit of data: open vs. restricted
• Deposit fee
• Persistent identifiers (DOI, hdl)
• Sustainability & preservation policy
• (Non-)proprietary file formats
• Amount of data description/metadata (data package level, file level, data item level)
• Associated code/software

More on Persistent IDs

• A DOI is a system for persistently identifying and locating digital objects
• Originally designed and developed for "journal articles"; ISO 26324 since 2012
• DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef
• Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)
• DOI: prefix + suffix
  • e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
• DOI prefix is unique to each publisher/repository
  • ICPSR: 10.3886
  • UK Data Service: 10.5255
  • Figshare: 10.6084
  • PANGAEA: 10.1594
  • Dryad: 10.5061


Page 6: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Paradigm Shift

The nature of research has becomehellip More quantitativedata-intensive

More funder-driven

More interdisciplinarycollaborative

More transparent

More complicated in terms of cross-linking

More diverse in terms of citable scholarly outputs

The focus of scholarly communication

has changedhellip From

Preserve publications

Preserve data

Preserve both (at least separately)

To

Preserve publications and data lsquotogetherrsquo

Preserve the lsquorelationshipsrsquo among them

Paradigm Shift

Publishing and Archiving Scholarly

Communication

Availability Citability Validation

Scholarly Publishing Data Archiving

Scholarly Publishing that includes lsquoData Publicationrsquo

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication

42

faculty project website

36

conference presentation

11

upon request 11

NSF Engineering Data Management Plan Analysis N=156

Data Dissemination Methods

Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (prior/post publication)

Institutional repositories (prior/post publication)

Data archive per discipline's culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

"Data upon request" via email (some/all)

Repository Directory Lists

IR (institutional repositories)

OpenDOAR (over 2,600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR (data repositories)

NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3data.org (609 repositories listed)

DataCite, re3data.org, and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories: What to Look for

Subject/Discipline focus

Hosted by...

Access to data: open vs. restricted

Deposit of data: open vs. restricted

Deposit fee

Persistent identifiers (DOI, hdl)

Sustainability & preservation policy

(Non-)Proprietary file formats

Amount of data description/metadata (data package level, file level, data item level)

Associated code/software

More on Persistent IDs

A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for "journal articles"; ISO 26324 since 2012

DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef

Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)

DOI = prefix + suffix

• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1

DOI prefix is unique to each publisher/repository

• ICPSR: 10.3886

• UK Data Service: 10.5255

• Figshare: 10.6084

• PANGAEA: 10.1594

• Dryad: 10.5061

Very similar to 'handles' in terms of persistency

• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575

Moving towards "Data with DOI", just as with any scholarly article
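
The prefix + suffix structure can be illustrated with a short Python sketch. This is not part of the original slides: the names parse_doi, describe, and KNOWN_PREFIXES are hypothetical, the prefix table holds only the five repositories listed on this slide, and the accepted input forms (bare DOI, "doi:" form, or resolver URL) are an assumption about what a reader is likely to paste in.

```python
from typing import NamedTuple

RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes taken from the slide; any other prefix is reported as "unknown".
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Leading forms this sketch strips before splitting (an assumption).
LEADS = ("https://doi.org/", "http://doi.org/",
         "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


class Doi(NamedTuple):
    prefix: str
    suffix: str


def parse_doi(text: str) -> Doi:
    """Split a DOI into its prefix (registrant) and suffix (item)."""
    value = text.strip()
    for lead in LEADS:
        if value.lower().startswith(lead):
            value = value[len(lead):]
            break
    # The first "/" separates prefix from suffix; the suffix may contain more.
    prefix, sep, suffix = value.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(prefix, suffix)


def describe(text: str) -> dict:
    """Return the parts of a DOI plus a resolvable URL."""
    doi = parse_doi(text)
    return {
        "prefix": doi.prefix,
        "suffix": doi.suffix,
        "registrant": KNOWN_PREFIXES.get(doi.prefix, "unknown"),
        "url": f"{RESOLVER}{doi.prefix}/{doi.suffix}",
    }


if __name__ == "__main__":
    print(describe("http://doi.org/10.3886/ICPSR27282.v1"))
```

Looking up the registrant from the prefix alone is what makes it possible to attribute a cited dataset to its repository without visiting the landing page.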

Data Repositories

Let's take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of 'Data Journals'

As of today: 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013.

Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

Launched in Fall 2012

Published on behalf of the Royal Meteorological Society

OA with author-pay model ($1,500 per article)

Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is

Data Journal Example (continued)

Data centers/repositories approved by Geoscience Data Journal:

3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA/EarthChem
IEDA/MGDS
National Center for Atmospheric Research (NCAR), USA
  Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Journal of Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

Located on U of M Campus

www.icpsr.umich.edu

ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: "publishing" data for 52 years

Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection

Data preservation standards are followed for the long term, guarding data against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison uses structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifier: DOIs from DataCite

ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months)

(non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students, instructors, researchers, and funders to discover and understand data use

A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers

Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publishing Group

PLoS

Altmetrics Aggregators

• Altmetric

• ImpactStory

• Plum Analytics

Funders

• NSF

• Sloan Foundation

• MacMillan

• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products, to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders

For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution: no credit

No access: no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + excessive human search effort, extensive collection knowledge = too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence of impact measures
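
Why a DOI cuts the search effort can be shown with a small Python sketch that scans reference strings for DOIs and tallies them by prefix. It is an illustration added here, not a tool from the talk: the function names find_dois and count_by_prefix are hypothetical, and the regular expression is a common heuristic for modern DOIs rather than a complete grammar.

```python
import re
from collections import Counter
from typing import Iterable, List

# Heuristic: "10." + 4 to 9 digits + "/" + a run of non-space characters.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9})/([^\s\"<>]+)")


def find_dois(text: str) -> List[str]:
    """Return every DOI-like string found in a piece of reference text."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a DOI in a reference.
        suffix = match.group(2).rstrip(".,;)")
        if suffix:
            found.append(f"{match.group(1)}/{suffix}")
    return found


def count_by_prefix(references: Iterable[str]) -> Counter:
    """Count cited DOIs per prefix, i.e. per publisher/repository."""
    counts: Counter = Counter()
    for reference in references:
        for doi in find_dois(reference):
            counts[doi.split("/", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    references = [
        "Deschenes et al. Intensive Community Supervision in Minnesota. "
        "doi:10.3886/ICPSR06849.v1",
        "Sidlauskas, B. (2007). Data from: Testing for unequal rates. "
        "doi:10.5061/dryad.20.",
        "We analyzed data from the General Social Survey.",  # no identifier
    ]
    print(count_by_prefix(references))
```

The third reference mentions a dataset by name only, so nothing is counted for it; finding that use would take the manual searching and collection knowledge described under bad data citation.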

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
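
As a minimal sketch of the basic format, the five core elements can be held in one record and rendered as a citation string. The class name DataCitation and its render method are hypothetical, and the punctuation between elements is an assumption made for illustration; a journal's own style may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """The five core elements of a data citation."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    identifier: str

    def render(self) -> str:
        """Render as 'Creator (Year): Title. Publisher. Identifier'."""
        required = ("creators", "title", "publisher", "identifier")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing core elements: {', '.join(missing)}")
        names = "; ".join(self.creators)
        return (f"{names} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


if __name__ == "__main__":
    citation = DataCitation(
        creators=("Sidlauskas, B.",),
        year=2007,
        title=("Data from: Testing for unequal rates of morphological "
               "diversification in the absence of a detailed phylogeny: "
               "a case study from characiform fishes"),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.render())
```

Refusing to render when an element is missing mirrors the point of the slide: a citation without a publisher or identifier cannot be reliably followed or counted.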

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
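
The Verifier element can be illustrated with an MD5 checksum computed by Python's standard hashlib module. This is a sketch under stated assumptions: the helper names md5_checksum and same_dataset are hypothetical, and the comparison only tells whether two files are byte-for-byte identical. A UNF, the other verifier named above, is computed from the data values and is not implemented here.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a: str, path_b: str) -> bool:
    """True when both files have identical bytes (equal MD5 digests)."""
    return md5_checksum(path_a) == md5_checksum(path_b)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        print(same_dataset(sys.argv[1], sys.argv[2]))
    else:
        print("usage: python verify.py FILE_A FILE_B")
```

Recording the checksum alongside the citation lets a later reader confirm that the file retrieved is the one that was analyzed, which is the fixity information the citation principles ask for.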

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future Of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 7: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

The focus of scholarly communication

has changedhellip From

Preserve publications

Preserve data

Preserve both (at least separately)

To

Preserve publications and data lsquotogetherrsquo

Preserve the lsquorelationshipsrsquo among them

Paradigm Shift

Publishing and Archiving Scholarly

Communication

Availability Citability Validation

Scholarly Publishing Data Archiving

Scholarly Publishing that includes lsquoData Publicationrsquo

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication

42

faculty project website

36

conference presentation

11

upon request 11

NSF Engineering Data Management Plan Analysis N=156

Data Dissemination Methods Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (priorpost publication)

Institutional repositories (priorpost publication)

Data archive per disciplinersquos culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

ldquoData upon requestrdquo via email (someall)

Repository Directory Lists IR

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3Dataorg (609 repositories listed)

DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories What to Look for SubjectDiscipline focus

Hosted byhellip

Access to data open vs restricted

Deposit of data open vs restricted

Deposit fee

Persistent identifiers (DOI hdl)

Sustainability amp preservation policy

(Non-) Proprietary file formats

Amount of data descriptionmetadata

(data package level file level data item level)

Associated codesoftware

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset

Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets

Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets

Format Material Designator eg database CD-ROM

Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)

Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum

Series Used if the dataset is part of series of releases (eg monthly)

Contributor eg editor compiler

Source httpdatapubcdliborgdatacitation

How to Cite Data

Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20

Joint Declaration of Data Citation Principles

1 Future Of Research Communication and E-Scholarship (FORCE11)

2 Committee on Data for Science and Technology (CODATA)

3 Digital Curation Centre (DCC)

Source httpswwwforce11orgdatacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist -- even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 8: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Publishing and Archiving Scholarly Communication

Availability, Citability, Validation

Scholarly Publishing; Data Archiving

Scholarly Publishing that includes 'Data Publication'

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication: 42

faculty project website: 36

conference presentation: 11

upon request: 11

NSF Engineering Data Management Plan Analysis, N=156

Data Dissemination Methods

Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (prior/post publication)

Institutional repositories (prior/post publication)

Data archive, per discipline's culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

"Data upon request" via email (some/all)

Repository Directory Lists

IR (institutional repositories)

OpenDOAR (over 2,600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR (data repositories)

NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3data.org (609 repositories listed)

DataCite, re3data.org, and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories: What to Look for

Subject/Discipline focus

Hosted by...

Access to data: open vs. restricted

Deposit of data: open vs. restricted

Deposit fee

Persistent identifiers (DOI, hdl)

Sustainability & preservation policy

(Non-) Proprietary file formats

Amount of data description/metadata (data package level, file level, data item level)

Associated code/software

More on Persistent IDs

A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for "journal articles"; ISO 26324 since 2012

A DOI can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef

Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)

DOI = prefix + suffix

• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1

DOI prefix is unique to each publisher/repository

• ICPSR: 10.3886

• UK Data Service: 10.5255

• Figshare: 10.6084

• PANGAEA: 10.1594

• Dryad: 10.5061

Very similar to 'handles' in terms of persistency

• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575

Moving towards "Data with DOI", just as with any scholarly article
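The prefix + suffix structure described on this slide can be illustrated with a short Python sketch. It is only an illustration: the function names `split_doi` and `repository_for` are hypothetical, and the lookup table holds just the five prefixes listed on the slide, not a full registry of DOI prefixes.

```python
# Prefix-to-repository table copied from the slide; not an exhaustive registry.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Leading forms a DOI is commonly written with (assumed list, lowercase).
LEADS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def split_doi(doi):
    """Split a DOI (bare, 'doi:'-prefixed, or resolver URL) into (prefix, suffix)."""
    text = doi.strip()
    for lead in LEADS:
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    # The prefix is everything before the first slash; the suffix is the rest.
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def repository_for(doi):
    """Return the repository name for a known prefix, else 'unknown'."""
    prefix, _ = split_doi(doi)
    return KNOWN_PREFIXES.get(prefix, "unknown")


print(split_doi("http://doi.org/10.3886/ICPSR27282.v1"))
print(repository_for("doi:10.5061/dryad.20"))
```

Because the prefix identifies the registrant, a lookup like this can suggest where a cited dataset lives without resolving the DOI, although only the resolver itself is authoritative.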

Data Repositories

Let's take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of 'Data Journals'

As of today: 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013.

Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

Launched in Fall 2012

Published on behalf of Royal Meteorological Society

OA with author-pay model ($1500 per article)

Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand when, why, and how the data was collected and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

3TU.Datacentrum

British Atmospheric Data Centre (BADC)

British Oceanographic Data Centre (BODC)

CISL Research Data Archive

CSIRO Data Access Portal

Environmental Information Data Centre (EIDC)

Figshare

IEDA: EarthChem

IEDA: MGDS

National Center for Atmospheric Research (NCAR), USA: Earth Observing Lab (EOL), observational and supporting data from atmospheric science field experiments and arctic research; Research Data Archive (RDA), reference datasets for weather and climate research

National Geoscience Data Centre (NGDC)

NERC Earth Observation Data Centre (NEODC)

NOAA National Climatic Data Center (NCDC)

NOAA National Oceanographic Data Center (NODC)

NOAA National Geophysical Data Center (NGDC)

PANGAEA

Polar Data Centre (PDC)

Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Columns: Publisher; Journal; OA; Publication fee per article; Publisher hosts data; Approved data centers/repositories recommended for data deposit; What the article is called; DOI

Wiley; Geoscience Data Journal; Yes; $1500; No; Yes; 'Data Paper'; Yes

Ubiquity Press; Open Archaeology Data; Yes; $40; No; Yes; 'Data Paper'; Yes

Nature Publishing Group; Scientific Data; Yes; $700; No; Yes; 'Data Descriptor'; Yes

Located on U of M Campus

ICPSR: Inter-university Consortium for Political and Social Research (www.icpsr.umich.edu)

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: "publishing" data for 52 years

Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection

Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery; side-by-side comparison using structured variable-level documentation in XML, tagged according to the DDI standard

All studies are registered with a unique identifier: DOIs from DataCite

ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title; Archive; Downloads

National Longitudinal Study of Adolescent Health (Add Health), 1994-2008; DSDR; 1,188

General Social Survey, 1972-2012 [Cumulative File]; ICPSR; 737

Chinese Household Income Project, 2002; DSDR; 720

India Human Development Survey (IHDS), 2005; SAMHDA; 445

Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States]; CPES; 407

National Survey on Drug Use and Health, 2012; SAMHDA; 314

Children of Immigrants Longitudinal Study (CILS), 1991-2006; DSDR; 289

National Crime Victimization Survey, 2012; NACJD; 260

National Prisoner Statistics, 1978-2011; NACJD; 249

Historical, Demographic, Economic, and Social Data: The United States, 1790-2002; ICPSR; 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students, instructors, researchers, and funders to discover and understand data use

A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers

• Springer

• Elsevier

• Wiley

• Cambridge Journals

• BMJ Journals

• Nature Publishing Group

• PLoS

Altmetrics Aggregators

• Altmetric

• ImpactStory

• Plum Analytics

Funders

• NSF

• Sloan Foundation

• MacMillan

• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders

For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution: no credit

No access: no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier.

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
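As a rough illustration of the basic format and its five core elements, the following Python sketch assembles a citation string. The class name `DataCitation` and the exact punctuation between elements are assumptions made for this example; journal styles differ, and the DataCite and CrossRef formatters mentioned above remain the better route when a DOI exists.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (hypothetical helper)."""
    creator: str      # individual(s) or organization that created the dataset
    year: int         # year the dataset was published, not necessarily created
    title: str        # as descriptive as possible
    publisher: str    # organization that provides access to the dataset
    identifier: str   # persistent, unique identifier, e.g. a DOI

    def format(self):
        # Creator (Year): Title. Publisher. Identifier
        return (f"{self.creator} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


# Elements taken from the Dryad citation shown in the examples slide.
example = DataCitation(
    creator="Sidlauskas B",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(example.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a citation be re-rendered in another style or matched by an automated tracker.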

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
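The Verifier element can be made concrete with a minimal Python sketch that computes an MD5 checksum using the standard `hashlib` module and compares it with the checksum given in a citation. The function names are hypothetical. Note that this covers only the MD5 case: a UNF is computed from the data values rather than from the file bytes, and it is not implemented here.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_cited_checksum(path, cited_checksum):
    """True if the file's MD5 equals the checksum given in the citation."""
    return md5_of_file(path) == cited_checksum.strip().lower()
```

A byte-level checksum changes if the file is re-saved in another format even when the values are the same, which is one reason a citation might carry a UNF instead.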

How to Cite Data


Page 9: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty

journal publication

42

faculty project website

36

conference presentation

11

upon request 11

NSF Engineering Data Management Plan Analysis N=156

Data Dissemination Methods Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (priorpost publication)

Institutional repositories (priorpost publication)

Data archive per disciplinersquos culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

ldquoData upon requestrdquo via email (someall)

Repository Directory Lists IR

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3Dataorg (609 repositories listed)

DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories What to Look for SubjectDiscipline focus

Hosted byhellip

Access to data open vs restricted

Deposit of data open vs restricted

Deposit fee

Persistent identifiers (DOI hdl)

Sustainability amp preservation policy

(Non-) Proprietary file formats

Amount of data descriptionmetadata

(data package level file level data item level)

Associated codesoftware

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
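As a rough illustration of the Verifier element, an MD5 checksum can be computed with Python's standard hashlib module. The function names md5_checksum and same_dataset are hypothetical. Note that MD5 only confirms that two files are byte-for-byte identical; a UNF is computed from the data values instead, and that calculation is not shown here.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have the same MD5 checksum (byte-identical)."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A citation could then carry the digest returned by md5_checksum alongside the identifier, so a later reader can check that the file retrieved matches the file that was analyzed.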

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
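For a dataset that already has a DOI, a formatted citation can usually be requested from the DOI resolver through content negotiation, which is the mechanism behind the DataCite and CrossRef citation formatter mentioned earlier. The sketch below assumes the resolver honors the text/x-bibliography media type for the DOI in question; the function names are hypothetical, and the request is only built here, not sent.

```python
import urllib.request


def citation_request(doi, style="apa"):
    """Build a request asking the DOI resolver for a formatted citation.

    `style` is a citation style name such as "apa"; which styles are
    available depends on the service, so treat the default as an assumption.
    """
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi, style="apa"):
    """Send the request and return the citation text (needs network access)."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8").strip()


request = citation_request("10.5061/dryad.20")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling fetch_citation("10.5061/dryad.20") would send the request; the exact text returned depends on the metadata the repository registered for that DOI.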

Joint Declaration of Data Citation Principles

1. Future of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 10: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Dissemination Methods

Submitted with journal article

Appear in journal article upon publication

Supplemental materials (including codebooks)

Websites (prior/post publication)

Institutional repositories (prior/post publication)

Data archive, per the discipline's culture of sharing

Data repository (may be assigned by journal publishers)

Data papers in data journals (may be independent of the journal article)

"Data upon request" via email (some/all)

Repository Directory Lists

IR (institutional repositories)

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR (data repositories)

NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3data.org (609 repositories listed)

DataCite, re3data.org, and Databib announced a collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories: What to Look For

Subject/discipline focus

Hosted by...

Access to data: open vs. restricted

Deposit of data: open vs. restricted

Deposit fee

Persistent identifiers (DOI, hdl)

Sustainability and preservation policy

(Non-)proprietary file formats

Amount of data description/metadata (data package level, file level, data item level)

Associated code/software

More on Persistent IDs

A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for "journal articles"; ISO 26324 since 2012

DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef

Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)

DOI = prefix + suffix

e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1

The DOI prefix is unique to each publisher/repository

ICPSR: 10.3886

UK Data Service: 10.5255

Figshare: 10.6084

PANGAEA: 10.1594

Dryad: 10.5061

Very similar to 'handles' in terms of persistency

e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575

Moving towards "Data with DOI", just as for any scholarly article
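The prefix + suffix structure can be illustrated with a small Python sketch that splits a DOI at the first slash and looks up the registrant. The function names are hypothetical, the prefix table holds only the five repositories listed above (written in standard DOI notation), and the validation is deliberately minimal.

```python
# Registrant prefixes listed on the slide, in standard DOI notation.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}


def split_doi(doi):
    """Split a DOI into (prefix, suffix), tolerating common leading forms."""
    cleaned = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, separator, suffix = cleaned.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Return the resolver URL that locates the object for this DOI."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


prefix, suffix = split_doi("http://doi.org/10.3886/ICPSR27282.v1")
print(prefix, suffix, KNOWN_PREFIXES.get(prefix, "unknown registrant"))
```

Because the prefix identifies the registrant, a tool that harvests citations can group cited datasets by repository without resolving each DOI.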

Data Repositories

Let's take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of 'data journals'

As of today, 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013

Note: For a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

Launched in Fall 2012

Published on behalf of the Royal Meteorological Society

OA with author-pay model ($1500 per article)

Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when, why, and how the data was collected, and what the data is

Data Journal Example (continued)

Data centers/repositories approved by Geoscience Data Journal:

3TU.Datacentrum; British Atmospheric Data Centre (BADC); British Oceanographic Data Centre (BODC); CISL Research Data Archive; CSIRO Data Access Portal; Environmental Information Data Centre (EIDC); Figshare; IEDA EarthChem; IEDA MGDS; National Center for Atmospheric Research (NCAR), USA: Earth Observing Lab (EOL), observational and supporting data from atmospheric science field experiments and arctic research, and Research Data Archive (RDA), reference datasets for weather and climate research; National Geoscience Data Centre (NGDC); NERC Earth Observation Data Centre (NEODC); NOAA National Climatic Data Center (NCDC); NOAA National Oceanographic Data Center (NODC); NOAA National Geophysical Data Center (NGDC); PANGAEA; Polar Data Centre (PDC); Zenodo

Data Publisher Examples

Wiley: Geoscience Data Journal

Ubiquity Press: Journal of Open Archaeology Data; Journal of Open Psychology Data; Open Health Data; Journal of Open Research Software

Nature: Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI

Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes

Ubiquity Press | Journal of Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes

Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

ICPSR: Inter-university Consortium for Political and Social Research

Located on the U of M campus

www.icpsr.umich.edu

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: "publishing" data for 52 years

Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection

Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison using structured variable-level documentation in XML, tagged according to the DDI standard

All studies are registered with a unique identifier (DOIs from DataCite)

ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months)

(non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads

National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188

General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737

Chinese Household Income Project, 2002 | DSDR | 720

India Human Development Survey (IHDS), 2005 | SAMHDA | 445

Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407

National Survey on Drug Use and Health, 2012 | SAMHDA | 314

Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289

National Crime Victimization Survey, 2012 | NACJD | 260

National Prisoner Statistics, 1978-2011 | NACJD | 249

Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Links research data to the scholarly literature about it

Aids students, instructors, researchers, and funders in discovering and understanding data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers

Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publishing Group

PLoS

Altmetrics Aggregators

Altmetric

ImpactStory

Plum Analytics

Funders

NSF

Sloan Foundation

MacMillan

EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products, to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges


Page 11: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Repository Directory Lists IR

OpenDOAR (over 2600 academic open access repositories listed)

Deep Blue (University of Michigan Library)

DR NIH Data Sharing Repositories (57 repositories)

Thomson Reuters Data Citation Index (174 repositories)

Databib (975 repositories listed)

re3Dataorg (609 repositories listed)

DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015

Disciplinary Data Repositories What to Look for SubjectDiscipline focus

Hosted byhellip

Access to data open vs restricted

Deposit of data open vs restricted

Deposit fee

Persistent identifiers (DOI hdl)

Sustainability amp preservation policy

(Non-) Proprietary file formats

Amount of data descriptionmetadata

(data package level file level data item level)

Associated codesoftware

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
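As a minimal sketch of the basic format, the five core elements can be joined into one citation string. The function name `format_citation` and the exact separators are assumptions for illustration; journals and repositories each have their own punctuation rules.

```python
def format_citation(creator, year, title, publisher, identifier):
    """Join the five core elements as: Creator (Year) Title. Publisher. Identifier"""
    return f"{creator} ({year}) {title}. {publisher}. {identifier}"


# Element values taken from the Dryad example cited in these slides.
citation = format_citation(
    creator="Sidlauskas B",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological diversification "
        "in the absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation)
```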

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset, essential when the identifier can’t be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
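The Verifier element can be illustrated with an MD5 checksum from Python's standard library. This is a minimal sketch: the helper names and the sample bytes are made up for illustration, and it compares raw bytes only. A UNF is a different kind of fingerprint, computed on the data values rather than the file bytes, and is not computed here.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def same_dataset(first: bytes, second: bytes) -> bool:
    """Treat two byte strings as the same dataset when their checksums match."""
    return md5_checksum(first) == md5_checksum(second)


# Hypothetical file contents standing in for real dataset files.
original = b"id,concentration\n1,0.42\n"
copy = b"id,concentration\n1,0.42\n"
altered = b"id,concentration\n1,0.43\n"

print(same_dataset(original, copy))
print(same_dataset(original, altered))
```

Publishing the checksum alongside the citation lets a later reader confirm that the file they retrieved is byte-for-byte the one that was cited.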

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it’s not cited, it can’t be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others’ in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What’s the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 12: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Disciplinary Data Repositories: What to Look for

• Subject/Discipline focus
• Hosted by...
• Access to data: open vs. restricted
• Deposit of data: open vs. restricted
• Deposit fee
• Persistent identifiers (DOI, hdl)
• Sustainability & preservation policy
• (Non-)Proprietary file formats
• Amount of data description/metadata (data package level, file level, data item level)
• Associated code/software

More on Persistent IDs

A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for “journal articles”; ISO 26324 since 2012

DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef

Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)

DOI = prefix + suffix

• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1

DOI prefix is unique to each publisher/repository

• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061

Very similar to ‘handles’ in terms of persistency

• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575

Moving towards “Data with DOI”, just as with any scholarly article
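The prefix + suffix structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `split_doi` and `KNOWN_PREFIXES` are hypothetical names, the lookup table holds just the five prefixes listed on this slide (written with the dot after "10"), and real resolution goes through the DOI resolver rather than a local table.

```python
# Prefixes from the slide, with the dot after "10" written out.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}


def split_doi(doi):
    """Split a DOI, optionally written as doi:... or a doi.org URL, into (prefix, suffix)."""
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    # The prefix ends at the first slash; everything after it is the suffix.
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("http://doi.org/10.3886/ICPSR27282.v1")
print(prefix, suffix, KNOWN_PREFIXES.get(prefix, "unknown"))
```

Because the prefix identifies the registrant, a harvester could group cited datasets by repository without opening each landing page.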

Data Repositories

Let’s take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of ‘Data Journals’: as of today, 70+ data journals

Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)

Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential

Source: K. Akers and J. Green, “Data Sharing and Publication,” presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013

Note: To see a full list of data journals that currently exist, see K. Akers’ blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

• Launched in Fall 2012
• Published on behalf of the Royal Meteorological Society
• OA with author-pay model ($1500 per article)
• Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

• 3TU.Datacentrum
• British Atmospheric Data Centre (BADC)
• British Oceanographic Data Centre (BODC)
• CISL Research Data Archive
• CSIRO Data Access Portal
• Environmental Information Data Centre (EIDC)
• Figshare
• IEDA EarthChem
• IEDA MGDS
• National Center for Atmospheric Research (NCAR), USA
  - Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  - Research Data Archive (RDA): reference datasets for weather and climate research
• National Geoscience Data Centre (NGDC)
• NERC Earth Observation Data Centre (NEODC)
• NOAA National Climatic Data Center (NCDC)
• NOAA National Oceanographic Data Center (NODC)
• NOAA National Geophysical Data Center (NGDC)
• PANGAEA
• Polar Data Centre (PDC)
• Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley: Geoscience Data Journal

Ubiquity Press: Journal of Open Archaeology Data; Journal of Open Psychology Data; Open Health Data; Journal of Open Research Software

Nature: Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

| Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI |
|---|---|---|---|---|---|---|---|
| Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | ‘Data Paper’ | Yes |
| Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | ‘Data Paper’ | Yes |
| Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | ‘Data Descriptor’ | Yes |

Located on U of M Campus

www.icpsr.umich.edu

ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository

• A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
• Long-term sustainability: “publishing” data for 52 years
• Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files
• Awarded the Data Seal of Approval from DANS
• Federal agencies’ archives are housed at ICPSR and fully integrated with ICPSR’s collection
• Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
• Data are screened for confidentiality and privacy concerns. Stringent protections are in place for securing and distributing sensitive data
• Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

• ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
• Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
• Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison using structured variable-level documentation in XML, tagged according to the DDI standard
• All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

| Title | Archive | Downloads |
|---|---|---|
| National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188 |
| General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737 |
| Chinese Household Income Project, 2002 | DSDR | 720 |
| India Human Development Survey (IHDS), 2005 | SAMHDA | 445 |
| Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407 |
| National Survey on Drug Use and Health, 2012 | SAMHDA | 314 |
| Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289 |
| National Crime Victimization Survey, 2012 | NACJD | 260 |
| National Prisoner Statistics, 1978-2011 | NACJD | 249 |
| Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245 |

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

• Link research data to the scholarly literature about it
• Aid students, instructors, researchers, and funders to discover and understand data use
• A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
• It generates study bibliographies linking each study with the literature about it, and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers

• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS

Altmetrics Aggregators

• Altmetric
• ImpactStory
• Plum Analytics

Funders

• NSF
• Sloan Foundation
• MacMillan
• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

“New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship”

Open metrics with context, using diverse products to provide researchers with a “comprehensive impact report” of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

• Articles linked to underlying data
• Increased data discovery
• Reward for data citation
• Potential for automated tracking

Elsevier Connect

“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”


Page 13: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects

Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012

DOI can be assigned by only DOI registration agencies eg DataCite CrossRef

Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)

DOI prefix + suffix

bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1

DOI prefix is unique to each publisherrepository

bull ICPSR 103886

bull UK Data Service 105255

bull Figshare 106084

bull PANGAEA 101594

bull Dyad 105061

Very similar to lsquohandlesrsquo in terms of persistency

bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575

Moving towards ldquoData with DOIrdquo just as any scholarly articles

Data Repositories

Letrsquos take a closer look at this example

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset; essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation/
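The Verifier element can be checked with a few lines of code. This is a minimal sketch for the MD5 case only, using Python's standard `hashlib`; the helper names `md5_checksum` and `is_same_file` are hypothetical. MD5 compares files byte for byte, whereas a UNF is computed from the data values independently of file format, so the sketch does not cover UNF.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_file(path, cited_checksum):
    """True when the file's MD5 digest equals the checksum in the citation."""
    return md5_checksum(path) == cited_checksum.lower()
```

A reader who downloads a cited file could call `is_same_file` with the checksum printed in the citation to confirm that the copy matches what the author analyzed.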

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and e-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 14: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Repositories

Let's take a closer look at this example

Data Papers: Going beyond Appendices and Supplements

Data Journals

Number of 'data journals'

As of today: 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013

Note: For a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

Launched in Fall 2012

Published on behalf of the Royal Meteorological Society

OA with author-pay model ($1500 per article)

Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs

A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

3TU.Datacentrum

British Atmospheric Data Centre (BADC)

British Oceanographic Data Centre (BODC)

CISL Research Data Archive

CSIRO Data Access Portal

Environmental Information Data Centre (EIDC)

Figshare

IEDA EarthChem

IEDA MGDS

National Center for Atmospheric Research (NCAR), USA: Earth Observing Lab (EOL), observational and supporting data from atmospheric science field experiments and arctic research; Research Data Archive (RDA), reference datasets for weather and climate research

National Geoscience Data Centre (NGDC)

NERC Earth Observation Data Centre (NEODC)

NOAA National Climatic Data Center (NCDC)

NOAA National Oceanographic Data Center (NODC)

NOAA National Geophysical Data Center (NGDC)

PANGAEA

Polar Data Centre (PDC)

Zenodo

Data Publisher Examples

Wiley: Geoscience Data Journal

Ubiquity Press: Journal of Open Archaeology Data; Journal of Open Psychology Data; Open Health Data; Journal of Open Research Software

Nature: Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI

Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes

Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes

Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

ICPSR: Inter-university Consortium for Political and Social Research

Located on U of M Campus

www.icpsr.umich.edu

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: "publishing" data for 52 years

Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection

Data preservation standards are followed to keep data usable long-term, guarding against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

The online catalog can be searched and filtered with fielded metadata records to enhance discovery; side-by-side comparison uses structured variable-level documentation in XML, tagged according to the DDI standard

All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads

National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188

General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737

Chinese Household Income Project, 2002 | DSDR | 720

India Human Development Survey (IHDS), 2005 | SAMHDA | 445

Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407

National Survey on Drug Use and Health, 2012 | SAMHDA | 314

Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289

National Crime Victimization Survey, 2012 | NACJD | 260

National Prisoner Statistics, 1978-2011 | NACJD | 249

Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students, instructors, researchers, and funders to discover and understand data use

A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Much more research data is online, and it is easier to access and analyze

New focus on sharing that research data

Increasing use of social media to discuss research via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR

Dependent on good citation practice

Publishers: Springer, Elsevier, Wiley, Cambridge Journals, BMJ Journals, Nature Publishing Group, PLoS

Altmetrics Aggregators: Altmetric, ImpactStory, Plum Analytics

Funders: NSF, Sloan Foundation, MacMillan, EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Plum Analytics: Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

More aggregator and repository data needs to be exposed for altmetric harvesters like ImpactStory

More integrated efforts are needed among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References, with the DOI

doi:10.3886/ICPSR21240


Page 15: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Papers Going beyond Appendices and Supplements

Data Journals Number of lsquoData Journalsrsquo

As of today 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IRDR)

Data journal article structure

a) IntroOverview

b) Methods

c) Dataset description

d) Reuse potential

Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013

UP

Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals

Data Journal Example Geoscience Data Journal by Wiley

Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)

datasets that have been deposited in approved data centersrepositories and awarded DOIs

A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data

The data paper should allow the reader to understand the when why and how the data was collected and what the data is

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset

Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets

Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets

Format Material Designator eg database CD-ROM

Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)

Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum

Series Used if the dataset is part of series of releases (eg monthly)

Contributor eg editor compiler

Source httpdatapubcdliborgdatacitation

How to Cite Data

Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20

Joint Declaration of Data Citation Principles

1 Future Of Research Communication and E-Scholarship (FORCE11)

2 Committee on Data for Science and Technology (CODATA)

3 Digital Curation Centre (DCC)

Source httpswwwforce11orgdatacitation

Eight Principles 1 Importance--Data should be considered

legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications

2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data

Eight Principles

3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited

4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community

Eight Principles

5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data

6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe

Eight Principles

7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim

Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited

Eight Principles (continued)

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted.

Without counting data use, there is no accurate way to measure the impact of your shared data.

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing.

Store your data where citations are unique and persistent.

Cite your own data, and others', in your publications.

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 16: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Journals

Number of 'Data Journals'

As of today: 70+ data journals

Journal host

a) Authors

b) Journals

c) Publisher data repositories

d) Data repositories (IR/DR)

Data journal article structure

a) Intro/Overview

b) Methods

c) Dataset description

d) Reuse potential

Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013.

Note: For a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals

Data Journal Example: Geoscience Data Journal by Wiley

• Launched in Fall 2012
• Published on behalf of the Royal Meteorological Society
• OA with author-pay model ($1,500 per article)
• Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
• A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
• The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is

Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal

• 3TU.Datacentrum
• British Atmospheric Data Centre (BADC)
• British Oceanographic Data Centre (BODC)
• CISL Research Data Archive
• CSIRO Data Access Portal
• Environmental Information Data Centre (EIDC)
• Figshare
• IEDA EarthChem
• IEDA MGDS
• National Center for Atmospheric Research (NCAR), USA
  - Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  - Research Data Archive (RDA): reference datasets for weather and climate research
• National Geoscience Data Centre (NGDC)
• NERC Earth Observation Data Centre (NEODC)
• NOAA National Climatic Data Center (NCDC)
• NOAA National Oceanographic Data Center (NODC)
• NOAA National Geophysical Data Center (NGDC)
• PANGAEA
• Polar Data Centre (PDC)
• Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley
• Geoscience Data Journal

Ubiquity Press
• Journal of Open Archaeology Data
• Journal of Open Psychology Data
• Open Health Data
• Journal of Open Research Software

Nature
• Scientific Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Journal of Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

Located on U of M Campus

www.icpsr.umich.edu

ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository

• A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
• Long-term sustainability: "publishing" data for 52 years
• Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
• Awarded the Data Seal of Approval from DANS
• Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
• Data preservation standards are followed to keep data for the long term, guarding against deterioration, accidental loss, and digital obsolescence
• Data are screened for confidentiality and privacy concerns
• Stringent protections are in place for securing and distributing sensitive data
• Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

• ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
• Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
• Search and filter the online catalog with fielded metadata records to enhance discovery; compare variables side by side using structured variable-level documentation in XML, tagged according to the DDI standard
• All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1,188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

• Link research data to the scholarly literature about it
• Aid students, instructors, researchers, and funders to discover and understand data use
• A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
• It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

• Easier to access and analyze much more research data online
• New focus on sharing that research data
• Increasing use of social media to discuss via tweets, likes, and blog posts
• More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR
• Dependent on good citation practice

Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS

Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics

Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products, to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

• Research data is equal to research literature
• Articles linked to underlying data
• Increased data discovery
• Reward for data citation
• Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

• Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
• More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution: no credit

No access: no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + Excessive human search effort, extensive collection knowledge = Too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + Minimal human search effort = High hit accuracy for the cost, and better confidence of impact measures
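
The "minimal human search effort" side of this comparison can be sketched as a simple pattern match: when references carry DOIs, a script can find and count dataset citations without any collection knowledge. In the sketch below the reference strings are made-up placeholders (only the ICPSR DOI appears on the slides), the function name is hypothetical, and the regular expression is a simplification assumed for this example, not an official DOI grammar.

```python
import re
from collections import Counter

# Simplified DOI pattern (an assumption; real DOI suffixes can be more varied).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def count_doi_citations(references):
    """Count how often each DOI appears across a list of reference strings."""
    counts = Counter()
    for ref in references:
        for doi in DOI_PATTERN.findall(ref):
            # Trim trailing sentence punctuation; lowercase because DOIs are
            # case-insensitive, so variants of one DOI are counted together.
            counts[doi.rstrip(".,;)").lower()] += 1
    return counts

# Placeholder reference entries standing in for two papers' reference lists.
refs = [
    "[placeholder entry] formal data citation, doi:10.3886/ICPSR21240",
    "[placeholder entry] formal data citation, http://doi.org/10.3886/ICPSR21240.",
    "[placeholder entry] we used the survey data described earlier",  # no DOI: not found
]
print(count_doi_citations(refs))
```

The third entry shows the other side of the comparison: an informal mention leaves nothing for a script to match, so a person has to track it down.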

Basic Data Citation Format

Creator (Year). Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

• Creator(s): Individual(s) or organization responsible for creating the dataset
• Year: Year the dataset was published, not necessarily created
• Title: Should be as descriptive as possible
• Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
• Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
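
The five core elements can be sketched as a small record with a formatting method. This is only an illustration: the class name `DataCitation` and the punctuation between elements are assumptions, and the example values are taken from the Dryad citation shown in the Data Citation Examples slide.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    creators: str    # individual(s) or organization responsible for the dataset
    year: int        # year the dataset was published, not necessarily created
    title: str       # as descriptive as possible
    publisher: str   # organization that provides access to the dataset
    identifier: str  # persistent, unique identifier such as a DOI

    def format(self) -> str:
        """Render the core elements in the order Creator, Year, Title, Publisher, Identifier."""
        return (f"{self.creators} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")

citation = DataCitation(
    creators="Sidlauskas, B.",
    year=2007,
    title="Data from: Testing for unequal rates of morphological diversification "
          "in the absence of a detailed phylogeny: a case study from characiform fishes",
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```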

How to Cite Data

Additional Elements

• Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
• Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
• Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
• Format/Material Designator: e.g., database, CD-ROM
• Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
• Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
• Series: Used if the dataset is part of a series of releases (e.g., monthly)
• Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
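
A minimal sketch of how some of these additional elements might be appended to a basic citation when they are known. The function name, the labels ("Version", "Accessed", "Verifier"), and the placeholder citation are all assumptions for illustration; journals and repositories differ in how they style these elements.

```python
from datetime import date
from typing import Optional

def add_optional_elements(base: str,
                          version: Optional[str] = None,
                          accessed: Optional[date] = None,
                          verifier: Optional[str] = None) -> str:
    """Append whichever additional elements are known to a basic citation string."""
    parts = [base.rstrip()]
    if version:
        parts.append(f"Version {version}.")
    if accessed:
        parts.append(f"Accessed {accessed.isoformat()}.")
    if verifier:
        parts.append(f"Verifier: {verifier}")
    return " ".join(parts)

# Placeholder citation for illustration; it does not refer to a real dataset.
base = "Creator (2014). Example Dataset Title. Example Repository. doi:10.1234/example"
print(add_optional_elements(base, version="2", accessed=date(2014, 5, 6)))
```

Version and access date matter most for dynamic datasets, where the same identifier can point to content that changes over time.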

How to Cite Data

Page 18: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal

3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field

experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo

Data Journal Example (continued)

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository
• A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
• Long-term sustainability: "publishing" data for 52 years
• Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
• Awarded the Data Seal of Approval from DANS
• Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
• Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
• Data are screened for confidentiality and privacy concerns
• Stringent protections are in place for securing and distributing sensitive data
• Physical and virtual data enclaves are available for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse
• ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
• Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
• Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison uses structured variable-level documentation in XML, tagged according to the DDI standard
• All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature
• Links research data to the scholarly literature about it
• Aids students, instructors, researchers, and funders in discovering and understanding data use
• A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
• Generates study bibliographies linking each study with the literature about it, and out to the full text

Linking the Data to the Literature

Altmetrics for research data

• Easier to access and analyze much more research data online
• New focus on sharing that research data
• Increasing use of social media to discuss research via tweets, likes, and blog posts
• More online tools to download, collaborate, and share, such as Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
• Dependent on good citation practice

Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS

Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics

Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases
• Research data is equal to research literature
• Articles linked to underlying data
• Increased data discovery
• Reward for data citation
• Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact
• Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
• More integrated efforts among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation
• Abstract
• Acknowledgements
• Charts and Tables
• Appendices
• Discussion
• Footnotes
• Sample
• Methods
• References

Without an explicit citation, the reader must infer or be out of luck
• No attribution, no credit
• No access, no reuse
• No discernible impact

Examples of Bad Data Citation

Poorly described and cited data
+ Excessive human search effort and extensive collection knowledge
= Too costly and too questionable for a confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI
+ Minimal human search effort
= High hit accuracy for the cost and better confidence in impact measures

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements
• Creator(s): Individual(s) or organization responsible for creating the dataset
• Year: Year the dataset was published, not necessarily created
• Title: Should be as descriptive as possible
• Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
• Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
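The basic format above can be assembled mechanically from the five core elements. The following is a minimal sketch, not a DataCite or CrossRef tool: the class name DatasetCitation, its field names, and the choice of "; " to join multiple creators are all assumptions made for illustration. The sample values come from the Dryad example cited later in these slides, with its DOI punctuation restored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the five core citation elements."""
    creators: List[str]   # individual(s) or organization that created the dataset
    year: int             # year the dataset was published, not created
    title: str            # as descriptive as possible
    publisher: str        # organization that provides access to the dataset
    identifier: str       # persistent unique identifier, e.g. a DOI

    def format(self) -> str:
        # Pattern: Creator (Year): Title. Publisher. Identifier
        names = "; ".join(self.creators)  # separator is an assumption
        return (f"{names} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DatasetCitation(
    creators=["Sidlauskas, B."],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than storing one pre-formatted string, is what would let the same record be rendered in different journal styles.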

How to Cite Data

Additional Elements
• Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
• Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
• Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
• Format / Material Designator: e.g., database, CD-ROM
• Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
• Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
• Series: Used if the dataset is part of a series of releases (e.g., monthly)
• Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
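The Verifier element can be illustrated with an MD5 checksum, one of the two options the slide names. This is a sketch using Python's standard hashlib module; the function names md5_of_file and same_dataset are hypothetical. It compares files byte for byte, so two copies of the same data saved in different file formats would not match. UNF fingerprints are computed differently and are not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have identical bytes, judged by MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A reader who retrieves a cited dataset could compute md5_of_file on the downloaded file and compare the result with the checksum printed in the citation.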

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific time slice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
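As one illustration of the "machine actionable" identifier in the Unique Identification principle, a DOI written in a citation can be turned into a resolvable web address by prefixing the doi.org resolver. The sketch below makes several assumptions: the function name doi_to_url is hypothetical, the regular expression is a loose sanity check rather than a full DOI validator, and the example DOIs are the ones cited in these slides with their punctuation restored.

```python
import re

# Loose shape check: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in a reference list.
KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def doi_to_url(identifier):
    """Turn a DOI as written in a citation into a resolver URL."""
    doi = identifier.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.3886/ICPSR21240"))
print(doi_to_url("10.5061/dryad.20"))
```

Because the identifier carries no server name of its own, the citation stays valid if the repository moves the dataset, provided the DOI record is kept up to date.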

Make Your Data Count

• If it's not cited, it can't be counted
• Without counting data use, there is no accurate way to measure the impact of your shared data
• Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
• Store your data where citations are unique and persistent
• Cite your own data, and others', in your publications

Questions Answered

• Sharing data: how does it happen?
• What is data publishing?
• Is data archiving the same?
• How can we find data, access it, and reuse it?
• How can we measure the impact of sharing data?
• What's the common denominator?

Thank you

Natsuko Nicholls
hayashin@umich.edu

Elizabeth Moss
eammoss@umich.edu

Page 20: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Publisher Examples

Wiley

Geoscience Data Journal

Ubiquity Press

Journal of Open Archaeology Data

Journal of Open Psychology Data

Open Health Data

Journal of Open Research Software

Nature

Scientific Data

Data Journal Examples (to name only a few) Some Feature Comparison

Publisher Journal OA Publication

Fee per Article Publisher

hosts data

Approved data center

repositories recommended

for data deposit

How is the article called

DOI

Wiley Geoscience

Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes

Ubiquity

Press

Open

Archeology

Data

Yes $40 No Yes lsquoData Paperrsquo Yes

Nature

Publishing

Group

Scientific

Data Yes $700 No Yes lsquoData Descriptorrsquo Yes

Located on U of M Campus

wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g. database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g. monthly)

Contributor: e.g. editor, compiler

Source: http://datapub.cdlib.org/datacitation/
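The Verifier element can be made concrete with a short Python sketch that computes an MD5 checksum for a file and compares two files. The function names `md5_checksum` and `same_dataset` are hypothetical, chosen for this example. Note the assumption built in: MD5 compares bytes, so two files holding the same data in different formats will not match. A UNF is computed differently and is not shown here.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files produce the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A reader who retrieves a cited file could compare `md5_checksum(downloaded_file)` against the checksum published in the citation to confirm it is the same file the author analyzed.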


Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and e-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
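Principle 4 asks for an identifier that is machine actionable. For DOIs, one way to picture this is turning the identifier as written in a citation into a resolvable link. The Python sketch below is illustrative only: the function names `normalize_doi` and `doi_url` are hypothetical, and the validation pattern is a simplified assumption, not the full DOI syntax. The resolver address `https://doi.org/` is the public DOI proxy.

```python
import re

# Simplified assumption: "10.", a 4-9 digit registrant code, a slash, a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes commonly written in front of a DOI in reference lists.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip a known prefix and return the bare DOI, or raise ValueError."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_url(raw):
    """Return the resolver URL for a DOI written in any of the known forms."""
    return "https://doi.org/" + normalize_doi(raw)


print(doi_url("doi:10.3886/ICPSR06849.v1"))
```

Because the same bare DOI comes out whichever form the author used, citations to one dataset can be matched and counted automatically, which is what the impact measures discussed in these slides depend on.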

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 21: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Journal Examples (to name only a few): Some Feature Comparison

Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data center repositories recommended for data deposit | What the article is called | DOI

Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes

Ubiquity Press | Open Archeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes

Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes

Located on U of M Campus

www.icpsr.umich.edu

ICPSR: Inter-university Consortium for Political and Social Research

Signs of a Trusted Repository

A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M

Long-term sustainability: "publishing" data for 52 years

Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files

Awarded the Data Seal of Approval from DANS

Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection

Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence

Data are screened for confidentiality and privacy concerns

Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access, Discovery, Context, and Reuse

ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter the online catalog with fielded metadata records to enhance discovery; compare side by side using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp

Open Sharing for DMP Proposals

http://openicpsr.org

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads

National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188

General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737

Chinese Household Income Project, 2002 | DSDR | 720

India Human Development Survey (IHDS), 2005 | SAMHDA | 445

Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407

National Survey on Drug Use and Health, 2012 | SAMHDA | 314

Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289

National Crime Victimization Survey, 2012 | NACJD | 260

National Prisoner Statistics, 1978-2011 | NACJD | 249

Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students, instructors, researchers, and funders to discover and understand data use

A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets, likes, and blog posts

More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR

Dependent on good citation practice

Publishers

Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publishing Group

PLoS

Altmetrics Aggregators

Altmetric

ImpactStory

Plum Analytics

Funders

NSF

Sloan Foundation

MacMillan

EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges




Page 23: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing

over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog

of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully

integrated with ICPSRrsquos collection Data preservation standards followed for data long-term

guarding against deterioration accidental loss and digital obsolescence

Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data

Physical and virtual data enclaves for analyzing restricted-use data

Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw

research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers

Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard

Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard

All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

Replication Datasets

httpwwwicpsrumicheduicpsrwebdepositpraindexjsp

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138
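A DOI in a reference list is useful partly because software can act on it. The sketch below is illustrative only and is not part of the original slides. It builds a request that asks the public DOI resolver for a ready-formatted citation through DOI content negotiation. The function names are hypothetical, the example DOI is the one shown on this slide with its punctuation restored, and the set of citation styles a registration agency supports is an assumption to check against that agency's documentation.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request for a formatted citation of a DOI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the "doi:" label, keep the bare identifier
    # text/x-bibliography is the media type used for formatted citations in
    # DOI content negotiation; the style name is assumed to be supported.
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    req = build_citation_request("doi:10.3886/ICPSR21240")
    print(req.full_url)
    print(req.get_header("Accept"))
```

Calling fetch_citation with the same DOI would return whatever citation text the registration agency produces, so the result depends on the metadata the repository deposited with the DOI.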

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution, no credit

No access, no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact

Examples of Good Data Citation

Formal data citation with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures
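The contrast between the two equations above can be sketched in code. This is a hedged illustration, not the method ICPSR uses to build its bibliography. A citation that carries a DOI can be matched with a simple pattern, while an informal mention gives a program nothing to match on. The pattern is a common approximation of DOI syntax, and the function name and sample sentences are made up for the example.

```python
import re
from typing import List

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text: str) -> List[str]:
    """Return the distinct DOIs found in text, in order of first appearance."""
    found: List[str] = []
    for match in DOI_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical reference-list entry that cites the data formally.
formal = ("Intensive Community Supervision in Minnesota, 1990-1992. "
          "Ann Arbor, MI: ICPSR [distributor], 2000. doi:10.3886/ICPSR06849.v1")

# Hypothetical informal mention with no identifier.
informal = "We analyzed data from a Minnesota supervision experiment."

if __name__ == "__main__":
    print(extract_dois(formal))
    print(extract_dois(informal))
```

The formal entry yields one exact identifier that can be looked up automatically. The informal sentence yields nothing, so a person with collection knowledge would have to work out which study was meant.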

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
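The five core elements can be assembled into a citation string mechanically. The following is a minimal sketch under stated assumptions. The class name and field names are invented for illustration, and the separator punctuation (a colon after the year, periods between the remaining elements) is an assumption, because journal styles differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (hypothetical helper)."""
    creators: List[str]   # individual(s) or organization that created the dataset
    year: int             # year the dataset was published
    title: str            # as descriptive as possible
    publisher: str        # organization that provides access to the dataset
    identifier: str       # persistent, unique identifier such as a DOI

    def __post_init__(self) -> None:
        # Every core element is required, so reject empty values early.
        for name in ("creators", "year", "title", "publisher", "identifier"):
            if not getattr(self, name):
                raise ValueError(f"missing core element: {name}")

    def format(self) -> str:
        """Render as 'Creator (Year): Title. Publisher. Identifier'."""
        creator_text = "; ".join(self.creators)
        return (f"{creator_text} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


if __name__ == "__main__":
    citation = DataCitation(
        creators=["Sidlauskas, B."],
        year=2007,
        title="Data from: Testing for unequal rates of morphological diversification",
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.format())
```

Rejecting a citation with a missing element reflects the point of the slide: a citation without a creator, publisher, or identifier cannot be attributed or traced reliably.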

How to Cite Data

Additional Elements

Location/Availability: The web address of the dataset; essential when the identifier can't be used to reach the dataset

Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format/Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
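The Verifier element can be illustrated with an MD5 checksum. This sketch uses only the Python standard library, and the function names are hypothetical. It compares files byte for byte, so the same data saved in a different file format would not match. A UNF is calculated from the data values rather than the file bytes and is not shown here.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a: str, path_b: str) -> bool:
    """True when both files have identical bytes according to MD5."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A reader who downloads a cited file can compute its checksum and compare it with the one recorded in the citation to confirm that the copy is the version that was analyzed.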

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and e-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 26: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Open Sharing for DMP Proposals

httpopenicpsrorg

Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)

Title Archive Downloads

National Longitudinal Study of Adolescent Health (Add Health) 1994-2008

DSDR 1188

General Social Survey 1972-2012 [Cumulative File] ICPSR 737

Chinese Household Income Project 2002 DSDR 720

India Human Development Survey (IHDS) 2005 SAMHDA 445

Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]

CPES 407

National Survey on Drug Use and Health 2012 SAMHDA 314

Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289

National Crime Victimization Survey 2012 NACJD 260

National Prisoner Statistics 1978-2011 NACJD 249

Historical Demographic Economic and Social Data The United States 1790-2002

ICPSR 245

Who uses these shared data How are they used With what impact

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact

• Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
• More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

• Abstract
• Acknowledgements
• Charts and Tables
• Appendices
• Discussion
• Footnotes
• Sample
• Methods
• References

Without an explicit citation, the reader must infer or be out of luck:
• No attribution, no credit
• No access, no reuse
• No discernible impact

Examples of Bad Data Citation

Poorly described and cited data
+ Excessive human search effort, extensive collection knowledge
= Too costly, too questionable for confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI
+ Minimal human search effort
= High hit accuracy for the cost and better confidence of impact measures
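One way to see why a DOI cuts the search effort: a formally cited dataset can be found by simple pattern matching, while a dataset mentioned only by description cannot. The sketch below is illustrative only. The regular expression is a common heuristic for modern DOIs rather than an official grammar, and the function name and sample text are assumptions made for this example.

```python
import re

# Heuristic pattern (an assumption, not an official DOI grammar):
# "10." + a 4-9 digit registrant prefix + "/" + a suffix without spaces.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    """Return the distinct DOIs found in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that trails the identifier.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical reference-list text: two formal citations, one vague mention.
references = """
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological
diversification. Dryad Digital Repository. doi:10.5061/dryad.20
We used the Minnesota supervision data (doi:10.3886/ICPSR06849.v1).
The analysis draws on a well-known household survey.
"""

print(extract_dois(references))
# ['10.5061/dryad.20', '10.3886/ICPSR06849.v1']
```

The third sentence in the sample is the "bad citation" case: nothing in it can be matched automatically, so a person with collection knowledge would have to identify the survey by hand.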

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

• Creator(s): Individual(s) or organization responsible for creating the dataset
• Year: Year the dataset was published, not necessarily created
• Title: Should be as descriptive as possible
• Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
• Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
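The five core elements map directly onto a small record type. The following is a minimal sketch, assuming the "Creator (Year): Title. Publisher. Identifier" layout; the class and field names are hypothetical and are not part of any DataCite or CrossRef tool.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (names are illustrative)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Refuse to build a citation if any core element is empty.
        missing = [name for name, value in vars(self).items() if not value]
        if missing:
            raise ValueError("missing core elements: " + ", ".join(missing))
        creators = ", ".join(self.creators)
        return (f"{creators} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DataCitation(
    creators=["Sidlauskas B"],
    year=2007,
    title="Data from: Testing for unequal rates of morphological diversification "
          "in the absence of a detailed phylogeny: a case study from characiform fishes",
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Rejecting a record with an empty element is a design choice for this sketch: a citation without a publisher or identifier is exactly the kind that is hard to trace later.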

How to Cite Data

Additional Elements

• Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset
• Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned, dynamic datasets
• Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated, dynamic datasets
• Format / Material Designator: e.g., database, CD-ROM
• Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
• Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
• Series: Used if the dataset is part of a series of releases (e.g., monthly)
• Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
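The Verifier element can be illustrated with an MD5 checksum from the Python standard library. This is a sketch under stated assumptions: the function names and the sample file contents are hypothetical. MD5 here confirms that two files are byte-for-byte identical; a UNF is a different kind of verifier, computed from the data values rather than from the file bytes, and is not shown.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


# Hypothetical demonstration with throwaway files.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder, "cited.csv")
    copy = Path(folder, "downloaded.csv")
    revised = Path(folder, "revised.csv")
    original.write_bytes(b"id,value\n1,0.5\n")
    copy.write_bytes(b"id,value\n1,0.5\n")
    revised.write_bytes(b"id,value\n1,0.6\n")

    matches_copy = same_dataset(original, copy)        # identical bytes
    matches_revised = same_dataset(original, revised)  # one value changed

print(matches_copy, matches_revised)
# True False
```

Recording the digest alongside the version and access date lets a later reader check that the file they retrieved is the one that was analyzed.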

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future Of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it’s not cited, it can’t be counted.

• Without counting data use, there is no accurate way to measure the impact of your shared data
• Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
• Store your data where citations are unique and persistent
• Cite your own data and others’ in your publications

Questions Answered

• Sharing data: how does it happen?
• What is data publishing?
• Is data archiving the same?
• How can we find data, access it, and reuse it?
• How can we measure the impact of sharing data?
• What’s the common denominator?

Thank you

Natsuko Nicholls
hayashin@umich.edu

Elizabeth Moss
eammoss@umich.edu

Page 27: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)

Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245

Who uses these shared data? How are they used? With what impact?

The ICPSR Bibliography of Data-related Literature

• Link research data to the scholarly literature about it
• Aid students, instructors, researchers, and funders to discover and understand data use
• A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
• It generates study bibliographies, linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

• Easier to access and analyze much more research data online
• New focus on sharing that research data
• Increasing use of social media to discuss via tweets, likes, and blog posts
• More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR

Dependent on good citation practice



Page 29: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

The ICPSR Bibliography of Data-related Literature

Link research data to the scholarly literature about it

Aid students instructors researchers and funders to

discover and understand data use

A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

It generates study bibliographies linking each study with the literature about it and out to the full text

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss via tweets likes and blog posts

More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR

Dependent on good citation practice

Publishers Springer

Elsevier

Wiley

Cambridge Journals

BMJ Journals

Nature Publish Group

PLoS

Altmetrics Aggregators bull Altmetric

bull ImpactStory

bull Plum Analytics

Funders bull NSF

bull Sloan Foundation

bull MacMillan

bull EBSCO

The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset

Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets

Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets

Format Material Designator eg database CD-ROM

Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)

Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum

Series Used if the dataset is part of series of releases (eg monthly)

Contributor eg editor compiler

Source httpdatapubcdliborgdatacitation

How to Cite Data

Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20

Joint Declaration of Data Citation Principles

1 Future Of Research Communication and E-Scholarship (FORCE11)

2 Committee on Data for Science and Technology (CODATA)

3 Digital Curation Centre (DCC)

Source httpswwwforce11orgdatacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
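Principle 4 asks for identifiers that are machine actionable. As a minimal sketch of what that can mean in practice, the Python below turns a DOI written in a few common forms into a URL on the https://doi.org/ resolver. The function name `doi_to_url` and the simplified pattern are assumptions made for illustration; the pattern is not a full validator of DOI syntax.

```python
import re

# Optional "doi:" or resolver prefix, then a "10.<registrant>/<suffix>" DOI.
DOI_PATTERN = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def doi_to_url(identifier):
    """Return a resolver URL for a DOI, or raise ValueError if none is found."""
    match = DOI_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not recognized as a DOI: {identifier!r}")
    return "https://doi.org/" + match.group(1)


if __name__ == "__main__":
    print(doi_to_url("doi:10.3886/ICPSR21240"))
```

Identifiers that are not DOIs, such as handles, are rejected by this sketch rather than guessed at.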

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it? How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Linking the Data to the Literature

Altmetrics for research data

Easier to access and analyze much more research data online

New focus on sharing that research data

Increasing use of social media to discuss research via tweets, likes, and blog posts

More online tools to download, collaborate, and share, such as Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR

Dependent on good citation practice

Publishers: Springer, Elsevier, Wiley, Cambridge Journals, BMJ Journals, Nature Publishing Group, PLoS

Altmetrics Aggregators: Altmetric, ImpactStory, Plum Analytics

Funders: NSF, Sloan Foundation, MacMillan, EBSCO

The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.

ImpactStory: Product-level Metric

"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"

Open metrics, with context, using diverse products, to provide researchers with a "comprehensive impact report" of their research output

Source: https://impactstory.org/about

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution, no credit

No access, no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures

Basic Data Citation Format

Creator (Year) Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
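The five core elements map naturally onto a small record type. The sketch below is only an illustration of the basic format above: the class name `DatasetCitation`, its field names, and the exact separators are assumptions, and real journal styles differ. The sample values come from the Dryad citation used elsewhere in this talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """The five core elements of a data citation (hypothetical field names)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Creator (Year) Title. Publisher. Identifier
        names = ", ".join(self.creators)
        return f"{names} ({self.year}) {self.title}. {self.publisher}. {self.identifier}"


if __name__ == "__main__":
    example = DatasetCitation(
        creators=["Sidlauskas B"],
        year=2007,
        title=("Data from: Testing for unequal rates of morphological "
               "diversification in the absence of a detailed phylogeny: "
               "a case study from characiform fishes"),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(example.format())
```

Keeping the elements as separate fields, rather than as one string, is what lets a formatter re-render the same citation in different journal styles.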

How to Cite Data

Page 33: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Impact Story Product-level Metric

ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo

Open metrics with context using diverse products

to provide researchers with a ldquocomprehensive impact reportrdquo of their research output

Source httpsimpactstoryorgabout

Artifact-level Metric

Source httpwwwplumanalyticscommetricshtml

Integration with Web of Science All Databases Research data is equal to research literature

Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking

Elsevier Connect

ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo

ldquoElsevier encourages authors to submit their data sets to

external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th

data linking partnership Elsevier has established rdquo

Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking

Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext

For Better Metrics on Research Data Impact Need more aggregator and repository data to be

exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive

Alfred P Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi103886ICPSR21240

httpwwwflickrcomphotospapertrix38028138

Some Challenges

No Common Practice of Formal Data Citation Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation reader must infer or be out of luck

No attributionmdashno credit

No accessmdashno reuse

No discernible impact

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Examples of Good Data Citation Formal data

Citing with

a DOI

+

Minimal human search effort

=

High hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)

Core Elements

Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)

Source httpdatapubcdliborgdatacitation

How to Cite Data

Additional Elements

Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format/Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
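The Verifier element can be illustrated with a short Python sketch that compares two files by MD5 checksum. The function names and the sample files are hypothetical. The sketch covers only the byte-level MD5 case; a UNF is calculated differently and is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files have the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)


# Hypothetical files standing in for a cited dataset and two later downloads.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "original.csv"
    copy = Path(workdir) / "copy.csv"
    altered = Path(workdir) / "altered.csv"
    original.write_bytes(b"id,value\n1,0.42\n")
    copy.write_bytes(b"id,value\n1,0.42\n")
    altered.write_bytes(b"id,value\n1,0.43\n")
    print(same_bytes(original, copy))     # identical content
    print(same_bytes(original, altered))  # one value changed
```

A checksum published alongside the citation lets a later reader confirm that the file they retrieved is the one that was analyzed.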

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
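A DOI in a citation is only useful if a reader or a machine can follow it. The sketch below, in Python, turns a DOI written in the usual `doi:10.xxxx/suffix` form into a link through the public resolver at https://doi.org/. The function name `doi_to_url` and the simplified validation pattern are assumptions for illustration, not a complete check of DOI syntax.

```python
import re

# Simplified shape check (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is prefixed in citations and links.
KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def doi_to_url(identifier: str) -> str:
    """Strip a known prefix, check the basic DOI shape, return a resolver URL."""
    doi = identifier.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return "https://doi.org/" + doi


# DOIs from the ICPSR and Dryad examples in this deck.
print(doi_to_url("doi:10.3886/ICPSR06849.v1"))
print(doi_to_url("doi:10.5061/dryad.20"))
```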

Joint Declaration of Data Citation Principles

1. Future of Research Communications and e-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific time slice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications
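To make "if it's not cited, it can't be counted" concrete, the Python sketch below tallies dataset DOIs found in reference-list entries. The function name, the simplified DOI pattern, and the sample reference strings are all hypothetical placeholders, not real publications. A dataset mentioned without an identifier is simply invisible to this kind of automated count.

```python
import re
from collections import Counter

# Simplified pattern (an assumption) for a DOI embedded in running text.
DOI_IN_TEXT = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def count_cited_dois(references):
    """Count how often each DOI appears across a list of reference strings."""
    counts = Counter()
    for reference in references:
        for match in DOI_IN_TEXT.findall(reference):
            # Drop trailing punctuation; DOIs are case-insensitive, so lowercase.
            counts[match.rstrip(".,;)").lower()] += 1
    return counts


# Placeholder reference entries, invented for illustration only.
sample_references = [
    "Placeholder Author A (2014). Placeholder dataset title. doi:10.3886/ICPSR21240.",
    "Placeholder Author B (2014). Another entry. https://doi.org/10.3886/icpsr21240",
    "Placeholder Author C (2014). Used survey data from the archive (no identifier given).",
]
print(count_cited_dois(sample_references))
```

The third entry uses the data but contributes nothing to the tally, which is the measurement problem this slide describes.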

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 34: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Artifact-level Metric

Source: http://www.plumanalytics.com/metrics.html

Integration with Web of Science All Databases

Research data is equal to research literature

Articles linked to underlying data

Increased data discovery

Reward for data citation

Potential for automated tracking

Elsevier Connect

"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."

"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."

Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: the Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240

Page 38: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext

For Better Metrics on Research Data Impact

Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory

More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

Formal Citation in the References with the DOI

doi:10.3886/ICPSR21240
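
A DOI such as 10.3886/ICPSR21240 is useful in a reference list partly because it can be turned into a working link. The sketch below shows one way this might be done in Python, assuming the public resolver at https://doi.org/ is used. The function name `doi_to_url` and the check that a DOI begins with "10." are illustrative choices for this example, not part of any citation standard.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.3886/ICPSR21240' or a bare DOI into a resolver URL."""
    doi = doi.strip()
    # Drop an optional 'doi:' label so both spellings are accepted.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; escape other unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.3886/ICPSR21240"))
```

Storing the bare DOI and building the link on demand keeps the citation valid even if the preferred resolver address changes.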

http://www.flickr.com/photos/papertrix/38028138

Some Challenges

No Common Practice of Formal Data Citation

Abstract

Acknowledgements

Charts and Tables

Appendices

Discussion

Footnotes

Sample

Methods

References

Without an explicit citation, the reader must infer or be out of luck

No attribution, no credit

No access, no reuse

No discernible impact

Examples of Bad Data Citation

Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact

Examples of Good Data Citation

Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence of impact measures

Basic Data Citation Format

Creator (Year). Title. Publisher. Identifier. (For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
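
As a rough illustration of the basic format, the following Python sketch assembles a citation string from the five core elements. The `DataCitation` class, its field names, and the period-separated punctuation are assumptions made for this example; they are not a format prescribed by DataCite, CrossRef, or any journal style. The sample values are taken from the Dryad example cited in this presentation.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the five core citation elements."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Layout assumed here: Creator (Year). Title. Publisher. Identifier
        return (
            f"{self.creator} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


citation = DataCitation(
    creator="Sidlauskas B",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a formatter re-render the same citation in different journal styles.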

How to Cite Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
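
The Verifier element can be made concrete with a short sketch. The Python below computes an MD5 checksum with the standard `hashlib` module and compares two files byte for byte. The helper names `md5_checksum` and `same_dataset` and the throwaway file names are hypothetical. A UNF is computed differently (over the data values rather than the file bytes) and is not reproduced here.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)


# Demonstration with two throwaway files standing in for dataset copies.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    copy = Path(tmp) / "copy.csv"
    original.write_bytes(b"id,value\n1,0.5\n")
    copy.write_bytes(b"id,value\n1,0.5\n")
    print(same_dataset(original, copy))
```

A byte-level checksum like this changes if the file is merely re-saved in another format, which is one reason a citation may record the version alongside the verifier.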

How to Cite Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Joint Declaration of Data Citation Principles

1. Future of Research Communication and e-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Make Your Data Count

If it's not cited, it can't be counted

Without counting data use, there is no accurate way to measure the impact of your shared data

Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing

Store your data where citations are unique and persistent

Cite your own data and others' in your publications

Questions Answered

Sharing data: how does it happen?

What is data publishing?

Is data archiving the same?

How can we find data, access it, and reuse it?

How can we measure the impact of sharing data?

What's the common denominator?

Thank you

Natsuko Nicholls

hayashin@umich.edu

Elizabeth Moss

eammoss@umich.edu

Page 43: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Examples of Bad Data Citation Poorly described and cited data

+

Excessive human search effort extensive collection knowledge

=

Too costly too questionable for confident measure of impact

Page 44: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Examples of Good Data Citation

Formal data citing with a DOI

+

Minimal human search effort

=

High hit accuracy for the cost, and better confidence in impact measures

Page 45: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Basic Data Citation Format

Creator (Year): Title. Publisher. Identifier

(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)

Core Elements

Creator(s): Individual(s) or organization responsible for creating the dataset

Year: Year the dataset was published, not necessarily created

Title: Should be as descriptive as possible

Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)

Identifier: Persistent, unique identifier (e.g., a DOI)

Source: http://datapub.cdlib.org/datacitation
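The basic format and core elements above can be sketched as a small formatter. This is a minimal Python illustration, not a prescribed tool: the class name `DatasetCitation`, the function `format_citation`, and the exact punctuation placed between elements are assumptions made for the example. The sample values are taken from the Dryad citation shown elsewhere in this presentation.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """The five core elements of a data citation (names are illustrative)."""
    creator: str     # individual(s) or organization that created the dataset
    year: int        # year the dataset was published, not necessarily created
    title: str       # as descriptive as possible
    publisher: str   # organization that provides access to the dataset
    identifier: str  # persistent, unique identifier such as a DOI


def format_citation(c: DatasetCitation) -> str:
    """Join the core elements in the order Creator (Year): Title. Publisher. Identifier."""
    # The separators used here are an assumption; journal styles differ.
    return f"{c.creator} ({c.year}): {c.title}. {c.publisher}. {c.identifier}"


example = DatasetCitation(
    creator="Sidlauskas B",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)

print(format_citation(example))
```

Keeping the elements as separate fields, rather than as one free-text string, is what lets a formatter re-emit the same citation in different journal styles.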

How to Cite Data

Page 46: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Additional Elements

Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset.

Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets.

Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets.

Format / Material Designator: e.g., database, CD-ROM

Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density).

Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum.

Series: Used if the dataset is part of a series of releases (e.g., monthly)

Contributor: e.g., editor, compiler

Source: http://datapub.cdlib.org/datacitation
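The Verifier element above can be illustrated with an MD5 checksum. The following is a minimal Python sketch using the standard `hashlib` module; the function names `md5_checksum` and `same_dataset` are hypothetical helpers introduced for this example, and the sketch compares files byte for byte only.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have the same bytes, judged by their MD5 digests."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A reader who retrieves a cited file could compare `md5_checksum(downloaded_path)` with the checksum recorded in the citation. An MD5 digest changes whenever the file bytes change, including after a pure format conversion; a UNF is intended to be computed from the data values instead, which needs a dedicated tool and is not shown here.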

How to Cite Data

Page 47: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Data Citation Examples

Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1

Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Page 48: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Joint Declaration of Data Citation Principles

1. Future Of Research Communication and E-Scholarship (FORCE11)

2. Committee on Data for Science and Technology (CODATA)

3. Digital Curation Centre (DCC)

Source: https://www.force11.org/datacitation

Page 49: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Eight Principles

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

Page 50: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Eight Principles

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
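One way to read "machine actionable" in the Unique Identification principle is that software can turn the identifier into a link without human help. The sketch below assumes the identifier is a DOI and uses the public resolver at `https://doi.org/`; the function name `doi_to_url` and its simple validity check are assumptions made for illustration, not part of the principles.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi:10.5061/dryad.20' into a resolver URL."""
    cleaned = doi.strip()
    # Citations often carry a "doi:" label in front of the identifier.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[len("doi:"):].strip()
    # A rough sanity check only: DOIs start with "10." and contain a "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(cleaned, safe="/")


print(doi_to_url("doi:10.5061/dryad.20"))
```

Because the DOI stays the same even if the repository reorganizes its web pages, a citation that stores the DOI rather than a landing-page address keeps working after such a move.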

Page 51: Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing the Underlying Data

Eight Principles

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
