Enriching Scholarship
May 6, 2014
Natsuko Nicholls, UM Libraries
Elizabeth Moss, ICPSR
US Federal Funding Mandates
NIH (2003): The Data Sharing Policy states that all funding applications of $500,000 or more per year are expected to address data sharing in their application.
NSF (2011): All funding proposals submitted on or after January 18, 2011 must include a “Data Management Plan” describing how the proposal will conform to NSF policy on the dissemination and sharing of research results.
International Mandates
Aug 2011... “expectation that all our funded researchers should maximise access to their research data with as few restrictions as possible ... submit a data management and sharing plan as part of the application process”
2007... “Researchers are to retain research data and primary materials, manage storage of research data and primary materials, maintain confidentiality of research data and primary materials”
Journal Mandates
Dec 2013: “We ask you to make available the data underlying the findings in the paper, which would be needed by someone wishing to understand, validate or replicate the work. Our policy has not changed in this regard. What has changed is that we now ask you to say where the data can be found.
As the PLOS data policy applies to all fields in which we publish, we recognize that we’ll need to work closely with authors in some subject areas to ensure adherence to the new policy. Some fields have very well established standards and practices around data, while others are still evolving, and we would like to work with any field that is developing data standards. We are aiming to ensure transparency about data availability.”
Questions
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What’s the common denominator?
Paradigm Shift
The nature of research has become...
More quantitative/data-intensive
More funder-driven
More interdisciplinary/collaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication has changed...
From:
Preserve publications
Preserve data
Preserve both (at least separately)
To:
Preserve publications and data ‘together’
Preserve the ‘relationships’ among them
Paradigm Shift
Publishing and Archiving
Scholarly Communication
Availability, Citability, Validation
Scholarly Publishing, Data Archiving
Scholarly Publishing that includes ‘Data Publication’
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication: 42
faculty project website: 36
conference presentation: 11
upon request: 11
(NSF Engineering Data Management Plan Analysis, N=156)
Data Dissemination Methods
Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (prior/post publication)
Institutional repositories (prior/post publication)
Data archive per discipline’s culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
“Data upon request” via email (some/all)
Repository Directory Lists
IR (institutional repositories)
OpenDOAR (over 2,600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR (data repositories)
NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3data.org (609 repositories listed)
DataCite, re3data.org and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories: What to Look for
Subject/discipline focus
Hosted by...
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-)proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for “journal articles”; ISO 26324 since 2012
DOIs can be assigned only by DOI registration agencies, e.g. DataCite, CrossRef
Assigning DOIs is not free (e.g. costing ~$1 per DOI via CrossRef in 2013)
DOI = prefix + suffix
• e.g. DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
DOI prefix is unique to each publisher/repository
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to ‘handles’ in terms of persistency
• e.g. U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards “Data with DOI”, just as for any scholarly article
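The prefix + suffix structure can be illustrated with a short Python sketch. It is not part of the original slides: the helper names `split_doi` and `describe` are hypothetical, and the prefix table only repeats the five registrants listed above, so it is far from complete.

```python
import re

# Prefixes copied from the slide above; illustrative only, not an exhaustive registry.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Optional "doi:" label or doi.org resolver URL in front of the DOI name.
_DOI_LEAD = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def split_doi(doi):
    """Hypothetical helper: return (prefix, suffix), split at the first slash."""
    bare = _DOI_LEAD.sub("", doi.strip())
    prefix, sep, suffix = bare.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def describe(doi):
    """Hypothetical helper: look the prefix up in the small table above."""
    prefix, suffix = split_doi(doi)
    return KNOWN_PREFIXES.get(prefix, "unknown registrant"), suffix


if __name__ == "__main__":
    print(split_doi("http://doi.org/10.3886/ICPSR27282.v1"))
    print(describe("doi:10.5061/dryad.20"))
```

The prefix identifies who registered the DOI, not where the object currently lives; resolving the full DOI through doi.org is what locates the object.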
Data Repositories
Let’s take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of ‘Data Journals’
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, “Data Sharing and Publication.” Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013.
Note: To see a full list of data journals that currently exist, see K. Akers’ blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why and how the data was collected, and what the data is
Data Journal Example (continued)
Data centers/repositories approved by Geoscience Data Journal:
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA/EarthChem
IEDA/MGDS
National Center for Atmospheric Research (NCAR), USA
• Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
• Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
• Geoscience Data Journal
Ubiquity Press
• Journal of Open Archaeology Data
• Journal of Open Psychology Data
• Open Health Data
• Journal of Open Research Software
Nature
• Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | ‘Data Paper’ | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | ‘Data Paper’ | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | ‘Data Descriptor’ | Yes
ICPSR: Inter-university Consortium for Political and Social Research
Located on U of M Campus
www.icpsr.umich.edu
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: “publishing” data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies’ archives are housed at ICPSR and fully integrated with ICPSR’s collection
Data preservation standards are followed to guard data long-term against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it, and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
“New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship”
Open metrics, with context, using diverse products to provide researchers with a “comprehensive impact report” of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”
“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost, and better confidence of impact measures
How to Cite Data
Basic Data Citation Format
Creator (Year) Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g. Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g. a DOI)
Source: http://datapub.cdlib.org/datacitation
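As a rough illustration of the basic format, the five core elements can be assembled into a citation string with a few lines of Python. This is a sketch only: the class name `DataCitation` and its `format` method are hypothetical, the punctuation follows the Dryad-style example used elsewhere in these slides, and individual journals may require a different style.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the five core elements listed above."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self):
        # Creator (Year) Title. Publisher. Identifier
        return (
            f"{self.creator} ({self.year}) {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = DataCitation(
        creator="Sidlauskas B",
        year=2007,
        title=(
            "Data from: Testing for unequal rates of morphological "
            "diversification in the absence of a detailed phylogeny: "
            "a case study from characiform fishes"
        ),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than storing one finished string, makes it possible to re-render the same citation in whatever style a journal asks for.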
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g. database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g. monthly)
Contributor: e.g. editor, compiler
Source: http://datapub.cdlib.org/datacitation
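The “Verifier” element can be made concrete with a small Python sketch that computes an MD5 checksum using the standard library’s `hashlib`. The function names `md5_of_file` and `same_dataset` are hypothetical. An MD5 checksum compares files byte for byte, so the same data saved in a different file format will not match; a UNF is computed from the data values rather than the file bytes and is not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """Hypothetical check: two files are byte-identical if their digests match."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A reader who retrieves a cited file can recompute the checksum and compare it with the one recorded in the citation to confirm that the file is the one originally used.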
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future Of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it’s not cited, it can’t be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data, and others’, in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What’s the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
NIH (2003) Data Sharing Policy that all funding applications of $500000 or more per year are expected to address data-sharing in their application
NSF (2011) All funding proposals submitted on or after January 18 2011 must include a ldquoData Management Planrdquo describing how the proposal will conform to NSF policy on the dissemination and sharing of research results
US Federal Funding Mandates
International Mandates Aug 2011hellip ldquoexpectation that all our funded researchers should maximise access to their research data with as few restrictions as possible hellip submit a data management and sharing plan as part of the application processrdquo
2007hellip ldquoResearchers are to retain research data and primary materials manage storage of research data and primary materials maintain confidentiality of research data and primary materialsrdquo
Journal Mandates
Dec 2013 ldquoWe ask you to make available the data underlying the findings in the paper which would be needed by someone wishing to understand validate or replicate the work Our policy has not changed in this regard What has changed is that we now ask you to say where the data can be found
As the PLOS data policy applies to all fields in which we publish we recognize that wersquoll need to work closely with authors in some subject areas to ensure adherence to the new policy Some fields have very well established standards and practices around data while others are still evolving and we would like to work with any field that is developing data standards We are aiming to ensure transparency about data availabilityrdquo
Questions
Sharing datamdashhow does it happen
What is data publishing
Is data archiving the same
How can we find data access it and reuse it How can we measure the impact of sharing data
Whatrsquos the common denominator
Paradigm Shift
The nature of research has becomehellip More quantitativedata-intensive
More funder-driven
More interdisciplinarycollaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication
has changedhellip From
Preserve publications
Preserve data
Preserve both (at least separately)
To
Preserve publications and data lsquotogetherrsquo
Preserve the lsquorelationshipsrsquo among them
Paradigm Shift
Publishing and Archiving Scholarly
Communication
Availability Citability Validation
Scholarly Publishing Data Archiving
Scholarly Publishing that includes lsquoData Publicationrsquo
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication
42
faculty project website
36
conference presentation
11
upon request 11
NSF Engineering Data Management Plan Analysis N=156
Data Dissemination Methods Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (priorpost publication)
Institutional repositories (priorpost publication)
Data archive per disciplinersquos culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
ldquoData upon requestrdquo via email (someall)
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories What to Look for SubjectDiscipline focus
Hosted byhellip
Access to data open vs restricted
Deposit of data open vs restricted
Deposit fee
Persistent identifiers (DOI hdl)
Sustainability amp preservation policy
(Non-) Proprietary file formats
Amount of data descriptionmetadata
(data package level file level data item level)
Associated codesoftware
More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012
DOI can be assigned by only DOI registration agencies eg DataCite CrossRef
Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)
DOI prefix + suffix
bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1
DOI prefix is unique to each publisherrepository
bull ICPSR 103886
bull UK Data Service 105255
bull Figshare 106084
bull PANGAEA 101594
bull Dyad 105061
Very similar to lsquohandlesrsquo in terms of persistency
bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575
Moving towards ldquoData with DOIrdquo just as any scholarly articles
Data Repositories
Letrsquos take a closer look at this example
Data Papers Going beyond Appendices and Supplements
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example Geoscience Data Journal by Wiley
Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)
datasets that have been deposited in approved data centersrepositories and awarded DOIs
A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”
“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence in impact measures
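A minimal sketch of why a DOI in the reference list keeps search effort low: a harvester can pick the identifier out of the text with one pattern, while an informal mention leaves nothing to match. The regular expression is a common heuristic rather than a full DOI validator, and the two reference strings are illustrative samples, not real harvested records.

```python
import re

# Heuristic DOI pattern: "10." + registrant digits + "/" + suffix.
# It is an assumption for illustration, not an official validator.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return DOI strings found in text, without trailing punctuation."""
    return [m.group(0).rstrip(".,;)") for m in DOI_PATTERN.finditer(text)]


# Illustrative reference strings (hypothetical harvested text).
informal = "We used survey data obtained from ICPSR."
formal = "Deschenes et al. (2000), doi:10.3886/ICPSR06849.v1."

print(find_dois(informal))  # nothing to match: a human has to search
print(find_dois(formal))    # the identifier is recovered automatically
```

The informal mention yields no match, which is the case that needs collection knowledge and manual searching.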
Basic Data Citation Format
Creator (Year). Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
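The five core elements can be sketched as a small record that renders itself in the basic format. The class name, field names, and the exact punctuation between elements are assumptions made for illustration; journal styles differ, and the DataCite and CrossRef formatters mentioned above are the authoritative tools.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the five core citation elements."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Separator punctuation is an assumption; styles vary by journal.
        return (f"{self.creator} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DataCitation(
    creator="Sidlauskas B",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```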
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
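The Verifier element can be illustrated with an MD5 checksum computed over a file's bytes. This is a sketch under stated assumptions: the function names are hypothetical, and it covers only the MD5 case. A byte-level checksum changes whenever the file format changes, even if the data values are the same, which is one reason a UNF may be preferred; UNF is not implemented here.

```python
import hashlib


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have identical bytes (by MD5 digest)."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A citation would then carry the digest of the exact file analyzed, so a later reader can confirm that a retrieved copy matches it.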
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communication and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist -- even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it’s not cited, it can’t be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others’ in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What’s the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Paradigm Shift
The nature of research has become…
More quantitative/data-intensive
More funder-driven
More interdisciplinary/collaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication has changed…
From
Preserve publications
Preserve data
Preserve both (at least separately)
To
Preserve publications and data ‘together’
Preserve the ‘relationships’ among them
Paradigm Shift
Publishing and Archiving Scholarly Communication
Availability, Citability, Validation
Scholarly Publishing, Data Archiving
Scholarly Publishing that includes ‘Data Publication’
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication: 42
faculty project website: 36
conference presentation: 11
upon request: 11
NSF Engineering Data Management Plan Analysis, N=156
Data Dissemination Methods
Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (prior/post publication)
Institutional repositories (prior/post publication)
Data archive, per discipline’s culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
“Data upon request” via email (some/all)
Repository Directory Lists
IR
• OpenDOAR (over 2,600 academic open access repositories listed)
• Deep Blue (University of Michigan Library)
DR
• NIH Data Sharing Repositories (57 repositories)
• Thomson Reuters Data Citation Index (174 repositories)
• Databib (975 repositories listed)
• re3data.org (609 repositories listed)
DataCite, re3data.org, and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories: What to Look for
Subject/Discipline focus
Hosted by…
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-) Proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for “journal articles”; ISO 26324 since 2012
DOIs can be assigned only by DOI registration agencies, e.g., DataCite, CrossRef
Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)
DOI = prefix + suffix
• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
DOI prefix is unique to each publisher/repository
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to ‘handles’ in terms of persistency
• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards “Data with DOI”, just as for any scholarly article
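The prefix + suffix structure can be sketched in a few lines. The function names are hypothetical, the prefix table holds only the five registrants listed on this slide (written in dotted form, e.g. 10.3886) and is not a registry lookup, and the resolver address https://doi.org/ is the general DOI proxy.

```python
# Prefixes as listed on the slide; illustrative only, not a registry.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Leading forms that may wrap a DOI in a citation or link.
_LEADS = ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(doi):
    """Split a DOI (bare, 'doi:' form, or resolver URL) into (prefix, suffix)."""
    cleaned = doi.strip()
    for lead in _LEADS:
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    # The first slash separates the registrant prefix from the suffix.
    prefix, sep, suffix = cleaned.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def describe(doi):
    """Return the parts of a DOI plus a resolvable URL."""
    prefix, suffix = split_doi(doi)
    return {
        "prefix": prefix,
        "suffix": suffix,
        "registrant": KNOWN_PREFIXES.get(prefix, "unknown"),
        "url": f"https://doi.org/{prefix}/{suffix}",
    }


print(describe("http://doi.org/10.3886/ICPSR27282.v1"))
```

Because the prefix identifies the registrant, a harvester can group dataset citations by repository without opening each record.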
Data Repositories
Let’s take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of ‘Data Journals’
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, “Data Sharing and Publication.” Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013
UP
Note: For a full list of data journals that currently exist, see K. Akers’ blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1,500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is
Data Journal Example (continued)
Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA: EarthChem
IEDA: MGDS
National Center for Atmospheric Research (NCAR), USA
• Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
• Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
• Geoscience Data Journal
Ubiquity Press
• Journal of Open Archaeology Data
• Journal of Open Psychology Data
• Open Health Data
• Journal of Open Research Software
Nature
• Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | ‘Data Paper’ | Yes
Ubiquity Press | Journal of Open Archaeology Data | Yes | $40 | No | Yes | ‘Data Paper’ | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | ‘Data Descriptor’ | Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: “publishing” data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies’ archives are housed at ICPSR and fully integrated with ICPSR’s collection
Data preservation standards are followed to guard data long-term against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
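The counting rule stated for this table (non-anonymous distinct users downloading one or more files) can be sketched as follows. The event layout, user IDs, and sample records are hypothetical and are not ICPSR's actual log format; the sketch only shows that a user who downloads several files from one study counts once, and that anonymous downloads are left out.

```python
from collections import defaultdict


def distinct_user_downloads(events):
    """Rank studies by number of distinct signed-in users.

    events: iterable of (user_id, study, file_name) tuples, where
    user_id is None for an anonymous download (hypothetical layout).
    """
    users_by_study = defaultdict(set)
    for user_id, study, _file_name in events:
        if user_id is None:
            continue  # anonymous downloads are excluded
        users_by_study[study].add(user_id)  # a set counts each user once
    counts = {study: len(users) for study, users in users_by_study.items()}
    # Highest count first; ties broken alphabetically by study title.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample log.
sample = [
    ("u1", "General Social Survey", "data.dta"),
    ("u1", "General Social Survey", "codebook.pdf"),
    ("u2", "General Social Survey", "data.dta"),
    (None, "General Social Survey", "data.dta"),
    ("u1", "National Crime Victimization Survey", "data.dta"),
]
print(distinct_user_downloads(sample))
```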
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss research via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited
4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community
Eight Principles
5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data
6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe
Eight Principles
7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim
Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited
Eight Principles
8 Interoperability and flexibilitymdashData citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities
Make Your Data Count
If itrsquos not cited it canrsquot be counted
Without counting data use there is no accurate way to measure the impact of your shared data
Without a well-formed citation your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and othersrsquo in your publications
Questions Answered
Sharing datamdashhow does it happen
What is data publishing
Is data archiving the same
How can we find data access it and reuse it How can we measure the impact of sharing data
Whatrsquos the common denominator
Thank you
Natsuko Nicholls
hayashinumichedu
Elizabeth Moss
eammossumichedu
Journal Mandates
Dec 2013 ldquoWe ask you to make available the data underlying the findings in the paper which would be needed by someone wishing to understand validate or replicate the work Our policy has not changed in this regard What has changed is that we now ask you to say where the data can be found
As the PLOS data policy applies to all fields in which we publish we recognize that wersquoll need to work closely with authors in some subject areas to ensure adherence to the new policy Some fields have very well established standards and practices around data while others are still evolving and we would like to work with any field that is developing data standards We are aiming to ensure transparency about data availabilityrdquo
Questions
Sharing datamdashhow does it happen
What is data publishing
Is data archiving the same
How can we find data access it and reuse it How can we measure the impact of sharing data
Whatrsquos the common denominator
Paradigm Shift
The nature of research has becomehellip More quantitativedata-intensive
More funder-driven
More interdisciplinarycollaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication
has changedhellip From
Preserve publications
Preserve data
Preserve both (at least separately)
To
Preserve publications and data lsquotogetherrsquo
Preserve the lsquorelationshipsrsquo among them
Paradigm Shift
Publishing and Archiving Scholarly
Communication
Availability Citability Validation
Scholarly Publishing Data Archiving
Scholarly Publishing that includes lsquoData Publicationrsquo
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication
42
faculty project website
36
conference presentation
11
upon request 11
NSF Engineering Data Management Plan Analysis N=156
Data Dissemination Methods Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (priorpost publication)
Institutional repositories (priorpost publication)
Data archive per disciplinersquos culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
ldquoData upon requestrdquo via email (someall)
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories What to Look for SubjectDiscipline focus
Hosted byhellip
Access to data open vs restricted
Deposit of data open vs restricted
Deposit fee
Persistent identifiers (DOI hdl)
Sustainability amp preservation policy
(Non-) Proprietary file formats
Amount of data descriptionmetadata
(data package level file level data item level)
Associated codesoftware
More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012
DOI can be assigned by only DOI registration agencies eg DataCite CrossRef
Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)
DOI prefix + suffix
bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1
DOI prefix is unique to each publisherrepository
bull ICPSR 103886
bull UK Data Service 105255
bull Figshare 106084
bull PANGAEA 101594
bull Dyad 105061
Very similar to lsquohandlesrsquo in terms of persistency
bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575
Moving towards ldquoData with DOIrdquo just as any scholarly articles
Data Repositories
Letrsquos take a closer look at this example
Data Papers Going beyond Appendices and Supplements
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example Geoscience Data Journal by Wiley
Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)
datasets that have been deposited in approved data centersrepositories and awarded DOIs
A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards followed for the long term, guarding data against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; compare variables side by side using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publishing Group
PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + Excessive human search effort, extensive collection knowledge = Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + Minimal human search effort = High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g. Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g. a DOI)
Source: http://datapub.cdlib.org/datacitation
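The five core elements can be assembled mechanically. The following is a minimal sketch in Python, not a formatter from DataCite or CrossRef: the class name `DataCitation`, its field names, and the exact punctuation between elements are assumptions made for illustration. The sample values are those of the Dryad dataset cited in this deck.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Core elements of a dataset citation (class and field names are illustrative)."""
    creators: List[str]   # individual(s) or organization that created the dataset
    year: int             # year the dataset was published, not necessarily created
    title: str            # as descriptive as possible
    publisher: str        # organization that provides access to the dataset
    identifier: str       # persistent unique identifier, e.g. a DOI

    def format(self) -> str:
        """Render the citation as: Creator (Year) Title. Publisher. Identifier"""
        creators = ", ".join(self.creators)
        return (f"{creators} ({self.year}) {self.title}. "
                f"{self.publisher}. {self.identifier}")


# The Dryad dataset cited in this deck, rebuilt from its core elements.
citation = DataCitation(
    creators=["Sidlauskas B"],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one string, is what lets the same record be rendered in different journal styles.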
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g. database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g. monthly)
Contributor: e.g. editor, compiler
Source: http://datapub.cdlib.org/datacitation
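To make the Verifier element concrete, here is a small sketch of checking retrieved data against an MD5 checksum, using Python's standard `hashlib`. The function names and the sample bytes are hypothetical. A UNF is a different kind of fingerprint, computed from normalized data values rather than raw file bytes, and is not shown here.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 digest of a file, reading in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_dataset(cited_checksum: str, retrieved: bytes) -> bool:
    """Check retrieved bytes against the checksum recorded in a citation."""
    return md5_checksum(retrieved) == cited_checksum.lower()


# Hypothetical dataset contents, for illustration only.
original = b"id,concentration\n1,0.42\n"
recorded = md5_checksum(original)  # the value a citation would carry

print(is_same_dataset(recorded, original))                # True
print(is_same_dataset(recorded, original + b"2,0.57\n"))  # False
```

A checksum of this kind catches accidental changes to a file. It does not by itself identify which version was cited, which is why the Version and Access Date elements are still needed.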
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future Of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Paradigm Shift
The nature of research has become...
More quantitative/data-intensive
More funder-driven
More interdisciplinary/collaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication has changed...
From
Preserve publications
Preserve data
Preserve both (at least separately)
To
Preserve publications and data 'together'
Preserve the 'relationships' among them
Paradigm Shift
Publishing and Archiving
Scholarly Communication
Availability, Citability, Validation
Scholarly Publishing, Data Archiving
Scholarly Publishing that includes 'Data Publication'
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication: 42
faculty project website: 36
conference presentation: 11
upon request: 11
NSF Engineering Data Management Plan Analysis, N=156
Data Dissemination Methods
Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (prior/post publication)
Institutional repositories (prior/post publication)
Data archive, per discipline's culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
"Data upon request" via email (some/all)
Repository Directory Lists
IR
OpenDOAR (over 2,600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR
NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3data.org (609 repositories listed)
DataCite, re3data.org, and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories: What to Look for
Subject/Discipline focus
Hosted by...
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-)Proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for "journal articles"; ISO 26324 since 2012
DOIs can be assigned only by DOI registration agencies, e.g. DataCite, CrossRef
Assigning a DOI is not free (e.g. costing ~$1 per DOI via CrossRef in 2013)
DOI: prefix + suffix
• e.g. DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
DOI prefix is unique to each publisher/repository
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to 'handles' in terms of persistency
• e.g. U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards "Data with DOI", just as for any scholarly article
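The prefix + suffix structure can be illustrated with a short Python sketch. This is an assumption-laden example rather than an official DOI parser: the names `split_doi`, `repository_for`, and `KNOWN_PREFIXES` are hypothetical, and the lookup table holds only the five prefixes listed on this slide, written with their usual dot and slash punctuation.

```python
from typing import Optional, Tuple

# Prefixes listed on this slide; a real lookup would query a registration agency.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Common ways a DOI is written before the identifier itself.
_LEADS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash after the resolver part."""
    text = doi.strip()
    for lead in _LEADS:
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    prefix, slash, suffix = text.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def repository_for(doi: str) -> Optional[str]:
    """Name the publisher/repository for a DOI, or None if the prefix is not in the table."""
    prefix, _ = split_doi(doi)
    return KNOWN_PREFIXES.get(prefix)


print(split_doi("http://doi.org/10.3886/ICPSR27282.v1"))
print(repository_for("doi:10.5061/dryad.20"))
```

A handle such as http://hdl.handle.net/2027.42/106575 has a similar prefix/suffix shape but is not a DOI, so `split_doi` rejects it with a `ValueError`.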
Data Repositories
Let's take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of 'Data Journals'
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, "Data Sharing and Publication". Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013
Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited
4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community
Eight Principles
5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data
6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe
Eight Principles
7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim
Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited
Eight Principles
8 Interoperability and flexibilitymdashData citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities
Make Your Data Count
If itrsquos not cited it canrsquot be counted
Without counting data use there is no accurate way to measure the impact of your shared data
Without a well-formed citation your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and othersrsquo in your publications
Questions Answered
Sharing datamdashhow does it happen
What is data publishing
Is data archiving the same
How can we find data access it and reuse it How can we measure the impact of sharing data
Whatrsquos the common denominator
Thank you
Natsuko Nicholls
hayashinumichedu
Elizabeth Moss
eammossumichedu
Paradigm Shift
The nature of research has becomehellip More quantitativedata-intensive
More funder-driven
More interdisciplinarycollaborative
More transparent
More complicated in terms of cross-linking
More diverse in terms of citable scholarly outputs
The focus of scholarly communication
has changedhellip From
Preserve publications
Preserve data
Preserve both (at least separately)
To
Preserve publications and data lsquotogetherrsquo
Preserve the lsquorelationshipsrsquo among them
Paradigm Shift
Publishing and Archiving Scholarly
Communication
Availability Citability Validation
Scholarly Publishing Data Archiving
Scholarly Publishing that includes lsquoData Publicationrsquo
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication
42
faculty project website
36
conference presentation
11
upon request 11
NSF Engineering Data Management Plan Analysis N=156
Data Dissemination Methods Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (priorpost publication)
Institutional repositories (priorpost publication)
Data archive per disciplinersquos culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
ldquoData upon requestrdquo via email (someall)
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories What to Look for SubjectDiscipline focus
Hosted byhellip
Access to data open vs restricted
Deposit of data open vs restricted
Deposit fee
Persistent identifiers (DOI hdl)
Sustainability amp preservation policy
(Non-) Proprietary file formats
Amount of data descriptionmetadata
(data package level file level data item level)
Associated codesoftware
More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012
DOI can be assigned by only DOI registration agencies eg DataCite CrossRef
Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)
DOI prefix + suffix
bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1
DOI prefix is unique to each publisherrepository
bull ICPSR 103886
bull UK Data Service 105255
bull Figshare 106084
bull PANGAEA 101594
bull Dyad 105061
Very similar to lsquohandlesrsquo in terms of persistency
bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575
Moving towards ldquoData with DOIrdquo just as any scholarly articles
Data Repositories
Letrsquos take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of 'data journals': as of today, 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, "Data Sharing and Publication," presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013
Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA EarthChem
IEDA MGDS
National Center for Atmospheric Research (NCAR), USA
Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; compare variables side by side using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss research via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution: no credit
No access: no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly, too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures
Basic Data Citation Format
Creator (Year). Title. Publisher. Identifier.
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
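The five core elements above can be sketched as a small Python record that renders a citation string. This is only an illustration: the class name `DataCitation`, the method `render`, and the exact punctuation between elements are assumptions made for this example, since journals and repositories each have their own styles. The sample values come from the Dryad citation shown elsewhere in this deck.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataCitation:
    """The five core elements of a basic data citation."""
    creator: str     # individual(s) or organization that created the dataset
    year: int        # year the dataset was published
    title: str       # as descriptive as possible
    publisher: str   # organization that provides access to the dataset
    identifier: str  # persistent, unique identifier such as a DOI

    def __post_init__(self) -> None:
        # A citation missing any core element cannot be traced or counted.
        for field in fields(self):
            if not str(getattr(self, field.name)).strip():
                raise ValueError(f"missing core element: {field.name}")

    def render(self) -> str:
        # Assumed layout: Creator (Year). Title. Publisher. Identifier
        return (
            f"{self.creator} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = DataCitation(
        creator="Sidlauskas B",
        year=2007,
        title=(
            "Data from: Testing for unequal rates of morphological "
            "diversification in the absence of a detailed phylogeny: "
            "a case study from characiform fishes"
        ),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.render())
```

Refusing to build a citation when an element is missing mirrors the point of the slide: an incomplete citation is hard to resolve and hard to count.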
How to Cite Data
Additional Elements
Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format/Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
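To make the "Verifier" element concrete, here is a minimal Python sketch that computes an MD5 checksum for a data file and compares two copies. The function names `md5_checksum` and `same_dataset` are hypothetical, chosen for this example. Note that MD5 compares files byte for byte, whereas a UNF is computed from the data values themselves, so this sketch covers only the MD5 case.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def md5_checksum(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a: PathLike, path_b: PathLike) -> bool:
    """True when both files have the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "original.csv"
        copy = Path(folder) / "copy.csv"
        original.write_bytes(b"id,value\n1,3.5\n")
        copy.write_bytes(b"id,value\n1,3.5\n")
        print(md5_checksum(original), same_dataset(original, copy))
```

A reader who retrieves the cited file can recompute the checksum and compare it with the one in the citation to confirm that it is the same file that was originally analyzed.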
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Paradigm Shift
The focus of scholarly communication has changed…
From:
Preserve publications
Preserve data
Preserve both (at least separately)
To:
Preserve publications and data 'together'
Preserve the 'relationships' among them
Publishing and Archiving
Scholarly Communication
Availability, Citability, Validation
Scholarly Publishing
Data Archiving
Scholarly Publishing that includes 'Data Publication'
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication: 42
faculty project website: 36
conference presentation: 11
upon request: 11
NSF Engineering Data Management Plan Analysis, N=156
Data Dissemination Methods
Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (prior/post publication)
Institutional repositories (prior/post publication)
Data archive, per discipline's culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
"Data upon request" via email (some/all)
Repository Directory Lists
IR: OpenDOAR (over 2600 academic open access repositories listed)
Publishing and Archiving Scholarly
Communication
Availability Citability Validation
Scholarly Publishing Data Archiving
Scholarly Publishing that includes lsquoData Publicationrsquo
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication
42
faculty project website
36
conference presentation
11
upon request 11
NSF Engineering Data Management Plan Analysis N=156
Data Dissemination Methods Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (priorpost publication)
Institutional repositories (priorpost publication)
Data archive per disciplinersquos culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
ldquoData upon requestrdquo via email (someall)
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories: What to Look for
Subject/Discipline focus
Hosted by...
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-)Proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for "journal articles"; an ISO standard (ISO 26324) since 2012
DOIs can be assigned only by DOI registration agencies (e.g., DataCite, CrossRef)
Assigning a DOI is not free (e.g., costing ~$1 per DOI via CrossRef in 2013)
DOI = prefix + suffix
• e.g., DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
DOI prefix is unique to each publisher/repository
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to 'handles' in terms of persistency
• e.g., U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards "Data with DOI", just as with any scholarly article
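The prefix + suffix structure of a DOI can be illustrated with a short sketch. This is only an illustration, not part of the original slides: the function names split_doi and repository_for are hypothetical, and the lookup table holds just the five repository prefixes listed on the slide.

```python
from typing import Optional, Tuple

# Prefixes as listed on the slide; any other prefix is simply unknown to this sketch.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Common ways a DOI is written; these leading forms are stripped before splitting.
_LEADS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for lead in _LEADS:
        if lowered.startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    prefix, sep, suffix = cleaned.partition("/")
    # Every DOI prefix begins with "10." and is followed by a non-empty suffix.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return prefix, suffix


def repository_for(doi: str) -> Optional[str]:
    """Return the repository name for a known prefix, or None if unknown."""
    prefix, _suffix = split_doi(doi)
    return KNOWN_PREFIXES.get(prefix)


if __name__ == "__main__":
    print(split_doi("http://doi.org/10.3886/ICPSR27282.v1"))
    print(repository_for("doi:10.5061/dryad.20"))
```

The prefix identifies who registered the DOI, while the suffix is chosen by that publisher or repository, which is why the lookup only needs the part before the first slash.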
Data Repositories
Let's take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of 'Data Journals'
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, "Data Sharing and Publication," presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013
Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA: EarthChem
IEDA: MGDS
National Center for Atmospheric Research (NCAR), USA
  Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
(Fee/article = OA publication fee per article; Hosts data = publisher hosts data; Approved repositories = approved data centers/repositories recommended for data deposit; Article called = what the article is called)
Publisher                Journal                  OA   Fee/article  Hosts data  Approved repositories  Article called     DOI
Wiley                    Geoscience Data Journal  Yes  $1500        No          Yes                    'Data Paper'       Yes
Ubiquity Press           Open Archaeology Data    Yes  $40          No          Yes                    'Data Paper'       Yes
Nature Publishing Group  Scientific Data          Yes  $700         No          Yes                    'Data Descriptor'  Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding data against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; compare side by side using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)
Title                                                                             Archive  Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008          DSDR     1,188
General Social Survey, 1972-2012 [Cumulative File]                                ICPSR    737
Chinese Household Income Project, 2002                                            DSDR     720
India Human Development Survey (IHDS), 2005                                       SAMHDA   445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States]  CPES     407
National Survey on Drug Use and Health, 2012                                      SAMHDA   314
Children of Immigrants Longitudinal Study (CILS), 1991-2006                       DSDR     289
National Crime Victimization Survey, 2012                                         NACJD    260
National Prisoner Statistics, 1978-2011                                           NACJD    249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002  ICPSR    245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
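As a rough illustration of the basic format, the five core elements can be assembled into a citation string. This is a minimal sketch, not a tool from the slides: the DataCitation class and its field names are hypothetical, and the separators (colon after the year, periods between elements) are an assumed rendering that individual journal styles may change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (hypothetical container)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Assumed rendering: Creator (Year): Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}): {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    # Values taken from the Dryad example cited in this deck.
    citation = DataCitation(
        creators=["Sidlauskas B"],
        year=2007,
        title=(
            "Data from: Testing for unequal rates of morphological "
            "diversification in the absence of a detailed phylogeny: "
            "a case study from characiform fishes"
        ),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a formatter re-render the same citation in different journal styles.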
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
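The Verifier element can be made concrete with a small sketch that computes an MD5 checksum and compares it with the value given in a citation. It uses Python's standard hashlib module; the function names md5_checksum and matches_cited_checksum are hypothetical. It covers only MD5, which fingerprints the exact bytes of a file; a UNF is computed differently, from the data values themselves, and is not shown here.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_cited_checksum(path: str, cited_md5: str) -> bool:
    """True if the local file's MD5 equals the checksum given in a citation."""
    return md5_checksum(path) == cited_md5.strip().lower()
```

A match means the local copy is byte-for-byte the file that was cited; any difference, even a re-saved file with identical values, produces a different MD5.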
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communications and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it? How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Data Dissemination Methods Indicated in DMPs Written by UM Engineering Faculty
journal publication
42
faculty project website
36
conference presentation
11
upon request 11
NSF Engineering Data Management Plan Analysis N=156
Data Dissemination Methods Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (priorpost publication)
Institutional repositories (priorpost publication)
Data archive per disciplinersquos culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
ldquoData upon requestrdquo via email (someall)
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories What to Look for SubjectDiscipline focus
Hosted byhellip
Access to data open vs restricted
Deposit of data open vs restricted
Deposit fee
Persistent identifiers (DOI hdl)
Sustainability amp preservation policy
(Non-) Proprietary file formats
Amount of data descriptionmetadata
(data package level file level data item level)
Associated codesoftware
More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012
DOI can be assigned by only DOI registration agencies eg DataCite CrossRef
Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)
DOI prefix + suffix
bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1
DOI prefix is unique to each publisherrepository
bull ICPSR 103886
bull UK Data Service 105255
bull Figshare 106084
bull PANGAEA 101594
bull Dyad 105061
Very similar to lsquohandlesrsquo in terms of persistency
bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575
Moving towards ldquoData with DOIrdquo just as any scholarly articles
Data Repositories
Letrsquos take a closer look at this example
Data Papers Going beyond Appendices and Supplements
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example Geoscience Data Journal by Wiley
Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)
datasets that have been deposited in approved data centersrepositories and awarded DOIs
A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
Data Dissemination Methods
Submitted with journal article
Appear in journal article upon publication
Supplemental materials (including codebooks)
Websites (prior/post publication)
Institutional repositories (prior/post publication)
Data archive per discipline's culture of sharing
Data repository (may be assigned by journal publishers)
Data papers in data journals (may be independent of the journal article)
"Data upon request" via email (some/all)
Repository Directory Lists
IR (institutional repositories)
OpenDOAR (over 2,600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR (data repositories)
NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3data.org (609 repositories listed)
DataCite, re3data.org, and Databib announced a collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories: What to Look For
Subject/discipline focus
Hosted by...
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-)proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for "journal articles"; an ISO standard (ISO 26324) since 2012
DOIs can be assigned only by DOI registration agencies, e.g. DataCite, CrossRef
Assigning a DOI is not free (e.g. costing ~$1 per DOI via CrossRef in 2013)
A DOI consists of a prefix + suffix
• e.g. DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
The DOI prefix is unique to each publisher/repository
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to 'handles' in terms of persistence
• e.g. U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards "Data with DOI", just as for any scholarly article
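The prefix + suffix structure described above can be illustrated with a short Python sketch. This is only an illustration, not part of the original slides: the names `Doi`, `parse_doi`, `resolver_url`, and `KNOWN_PREFIXES` are hypothetical, the prefix table contains only the five repositories listed above, and the resolver address is assumed to be https://doi.org/.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    """A DOI split into its two parts (hypothetical helper type)."""
    prefix: str  # identifies the publisher/repository, e.g. "10.3886"
    suffix: str  # identifies the object within that prefix


# Only the prefixes named on the slide; not a complete registry.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Common ways a DOI is written in citations and links.
_LEADS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def parse_doi(text: str) -> Doi:
    """Strip a resolver/scheme lead-in and split at the first slash."""
    s = text.strip()
    for lead in _LEADS:
        if s.lower().startswith(lead):
            s = s[len(lead):]
            break
    prefix, sep, suffix = s.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi) -> str:
    """Build a resolvable link (assumes the doi.org resolver)."""
    return f"https://doi.org/{doi.prefix}/{doi.suffix}"


if __name__ == "__main__":
    doi = parse_doi("http://doi.org/10.3886/ICPSR27282.v1")
    print(doi.prefix, doi.suffix, KNOWN_PREFIXES.get(doi.prefix, "unknown"))
    print(resolver_url(doi))
```

Splitting at the first slash is enough here because the prefix itself never contains a slash, while the suffix may.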
Data Repositories
Let's take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of 'data journals'
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, "Data Sharing and Publication." Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013
Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA: EarthChem
IEDA: MGDS
National Center for Atmospheric Research (NCAR), USA
• Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
• Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher               | Journal                          | OA  | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley                   | Geoscience Data Journal          | Yes | $1500                       | No                   | Yes                                                              | 'Data Paper'               | Yes
Ubiquity Press          | Journal of Open Archaeology Data | Yes | $40                         | No                   | Yes                                                              | 'Data Paper'               | Yes
Nature Publishing Group | Scientific Data                  | Yes | $700                        | No                   | Yes                                                              | 'Data Descriptor'          | Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding data against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery
Side-by-side comparison using structured variable-level documentation in XML, tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)
Title                                                                                 | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008              | DSDR    | 1188
General Social Survey, 1972-2012 [Cumulative File]                                    | ICPSR   | 737
Chinese Household Income Project, 2002                                                | DSDR    | 720
India Human Development Survey (IHDS), 2005                                           | SAMHDA  | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States]      | CPES    | 407
National Survey on Drug Use and Health, 2012                                          | SAMHDA  | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006                           | DSDR    | 289
National Crime Victimization Survey, 2012                                             | NACJD   | 260
National Prisoner Statistics, 1978-2011                                               | NACJD   | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002      | ICPSR   | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution: no credit
No access: no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year). Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g. Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g. a DOI)
Source: http://datapub.cdlib.org/datacitation
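As a rough illustration of the basic format above, the five core elements can be held in a small record and joined into a citation string. This is a minimal sketch, not the DataCite or CrossRef formatter: the class name `DataCitation`, its field names, and the exact punctuation between elements are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a data citation (hypothetical record type)."""
    creators: List[str]  # individual(s) or organization that created the dataset
    year: int            # year the dataset was published
    title: str           # as descriptive as possible
    publisher: str       # organization that provides access to the dataset
    identifier: str      # persistent unique identifier, e.g. a DOI

    def format(self) -> str:
        """Render 'Creator (Year). Title. Publisher. Identifier'."""
        if not self.creators:
            raise ValueError("a data citation needs at least one creator")
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = DataCitation(
        creators=["Sidlauskas B"],
        year=2007,
        title="Data from: Testing for unequal rates of morphological "
              "diversification in the absence of a detailed phylogeny: "
              "a case study from characiform fishes",
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than as one free-text string, is what lets a formatter re-render the same citation in different journal styles.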
How to Cite Data
Additional Elements
Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format/Material Designator: e.g. database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g. monthly)
Contributor: e.g. editor, compiler
Source: http://datapub.cdlib.org/datacitation
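The "Verifier" element above can be illustrated with an MD5 checksum computed using Python's standard `hashlib` module. This is a sketch only: the function names `md5_hex`, `md5_of_file`, and `matches_citation` are hypothetical. Note that an MD5 checksum compares files byte for byte, whereas a UNF is computed over normalized data values and is not shown here.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of a byte string as hexadecimal text."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_citation(path: str, cited_checksum: str) -> bool:
    """True if the retrieved file has the checksum recorded in the citation."""
    return md5_of_file(path) == cited_checksum.strip().lower()
```

A reader who downloads a cited file could call `matches_citation(path, checksum)` with the checksum from the citation; a mismatch means the retrieved bytes differ from what was originally cited.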
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Repository Directory Lists IR
OpenDOAR (over 2600 academic open access repositories listed)
Deep Blue (University of Michigan Library)
DR NIH Data Sharing Repositories (57 repositories)
Thomson Reuters Data Citation Index (174 repositories)
Databib (975 repositories listed)
re3Dataorg (609 repositories listed)
DataCite re3dataorg and Databib announced collaboration towards one service under the auspices of DataCite by 2015
Disciplinary Data Repositories What to Look for SubjectDiscipline focus
Hosted byhellip
Access to data open vs restricted
Deposit of data open vs restricted
Deposit fee
Persistent identifiers (DOI hdl)
Sustainability amp preservation policy
(Non-) Proprietary file formats
Amount of data descriptionmetadata
(data package level file level data item level)
Associated codesoftware
More on Persistent IDs A DOI is a system for persistently identifying and locating digital objects
Originally designed and developed for ldquojournal articlesrdquo ISO 26324 since 2012
DOI can be assigned by only DOI registration agencies eg DataCite CrossRef
Assigning DOI is not free (eg Costing ~$1 per DOI via CrossRef in 2013)
DOI prefix + suffix
bull eg DOI for a dataset httpdoiorg103886ICPSR27282v1
DOI prefix is unique to each publisherrepository
bull ICPSR 103886
bull UK Data Service 105255
bull Figshare 106084
bull PANGAEA 101594
bull Dyad 105061
Very similar to lsquohandlesrsquo in terms of persistency
bull eg U of M IR Deep Blue eg httphdlhandlenet202742106575
Moving towards ldquoData with DOIrdquo just as any scholarly articles
Data Repositories
Letrsquos take a closer look at this example
Data Papers Going beyond Appendices and Supplements
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example Geoscience Data Journal by Wiley
Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)
datasets that have been deposited in approved data centersrepositories and awarded DOIs
A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited
4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community
Eight Principles
5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data
6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe
Eight Principles
7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim
Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited
Eight Principles
8 Interoperability and flexibilitymdashData citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities
Make Your Data Count
If it's not cited, it can't be counted.
Without counting data use, there is no accurate way to measure the impact of your shared data.
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing.
Store your data where citations are unique and persistent.
Cite your own data and others' in your publications.
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Disciplinary Data Repositories: What to Look for
Subject/discipline focus
Hosted by...
Access to data: open vs. restricted
Deposit of data: open vs. restricted
Deposit fee
Persistent identifiers (DOI, hdl)
Sustainability & preservation policy
(Non-)proprietary file formats
Amount of data description/metadata (data package level, file level, data item level)
Associated code/software
More on Persistent IDs
A DOI is a system for persistently identifying and locating digital objects.
Originally designed and developed for "journal articles"; ISO 26324 since 2012.
A DOI can be assigned only by DOI registration agencies, e.g. DataCite, CrossRef.
Assigning a DOI is not free (e.g. costing ~$1 per DOI via CrossRef in 2013).
DOI: prefix + suffix
• e.g. DOI for a dataset: http://doi.org/10.3886/ICPSR27282.v1
The DOI prefix is unique to each publisher/repository:
• ICPSR: 10.3886
• UK Data Service: 10.5255
• Figshare: 10.6084
• PANGAEA: 10.1594
• Dryad: 10.5061
Very similar to 'handles' in terms of persistency
• e.g. U of M IR Deep Blue: http://hdl.handle.net/2027.42/106575
Moving towards "Data with DOI", just as for any scholarly article.
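The prefix + suffix structure of a DOI can be illustrated with a short Python sketch. It is an illustration only, not part of the DOI system: the names split_doi and resolver_url are hypothetical, the prefix table just copies the repository examples listed above, and building a link through https://doi.org/ is an assumption about how a reader would look the DOI up.

```python
from urllib.parse import quote

# Prefixes copied from the examples above; not an exhaustive registry.
KNOWN_PREFIXES = {
    "10.3886": "ICPSR",
    "10.5255": "UK Data Service",
    "10.6084": "Figshare",
    "10.1594": "PANGAEA",
    "10.5061": "Dryad",
}

# Labels or resolver addresses that may precede the DOI itself.
_LEADS = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def split_doi(doi):
    """Return (prefix, suffix), splitting the DOI at its first slash."""
    cleaned = doi.strip()
    for lead in _LEADS:
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    prefix, slash, suffix = cleaned.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Build a resolver link for a DOI (hypothetical helper)."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    prefix, suffix = split_doi("http://doi.org/10.3886/ICPSR27282.v1")
    # Look the prefix up in the small table of examples.
    print(prefix, suffix, KNOWN_PREFIXES.get(prefix, "unknown"))
    print(resolver_url("doi:10.5061/dryad.20"))
```

Splitting at the first slash keeps any later slashes inside the suffix, which is the part each repository is free to structure as it likes.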
Data Repositories
Let's take a closer look at this example
Data Papers: Going beyond Appendices and Supplements
Data Journals
Number of 'data journals'
As of today: 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IR/DR)
Data journal article structure
a) Intro/Overview
b) Methods
c) Dataset description
d) Reuse potential
Source: K. Akers and J. Green, Data Sharing and Publication. Presented at the Cyberinfrastructure (CI) Days Event, University of Michigan, Ann Arbor, MI, November 13-14, 2013.
Note: To see a full list of data journals that currently exist, see K. Akers' blog post at http://mlibrarydata.wordpress.com/2014/05/09/data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data.
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is.
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA EarthChem
IEDA MGDS
National Center for Atmospheric Research (NCAR), USA:
• Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
• Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Publisher Examples
Wiley:
• Geoscience Data Journal
Ubiquity Press:
• Journal of Open Archaeology Data
• Journal of Open Psychology Data
• Open Health Data
• Journal of Open Research Software
Nature:
• Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding data against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; side-by-side comparison uses structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite. ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss research via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution: no credit
No access: no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+ Excessive human search effort, extensive collection knowledge
= Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+ Minimal human search effort
= High hit accuracy for the cost and better confidence of impact measures
How to Cite Data
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g. Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g. a DOI)
Source: http://datapub.cdlib.org/datacitation
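As a rough illustration of the basic format, the five core elements can be assembled into a citation string with a few lines of Python. This is a sketch under stated assumptions: the class name DataCitation and its render method are hypothetical, and the separators (colon after the year, periods between the remaining elements) are an assumed punctuation; a journal's own style guide takes precedence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a data citation (hypothetical container)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def render(self):
        """Return 'Creator (Year): Title. Publisher. Identifier'."""
        # Refuse to build a citation when a core element is empty.
        missing = [name for name, value in vars(self).items() if not value]
        if missing:
            raise ValueError("missing core element(s): " + ", ".join(missing))
        creators = "; ".join(self.creators)
        return (f"{creators} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")


if __name__ == "__main__":
    citation = DataCitation(
        creators=["Sidlauskas, B."],
        year=2007,
        title=("Data from: Testing for unequal rates of morphological "
               "diversification in the absence of a detailed phylogeny: "
               "a case study from characiform fishes"),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.render())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a citation be re-rendered in different journal styles and harvested by machines.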
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g. database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g. monthly)
Contributor: e.g. editor, compiler
Source: http://datapub.cdlib.org/datacitation
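The Verifier element can be made concrete with a minimal Python sketch that computes an MD5 checksum for a data file and compares two copies. The function names md5_checksum and same_dataset are hypothetical, and only MD5 is shown: a UNF is computed from the normalized data values rather than the raw bytes, so it is not covered here.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have byte-identical content."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A byte-level checksum changes if the same values are saved in a different file format, which is one reason a format-independent verifier such as a UNF is sometimes preferred.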
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Data Repositories
Letrsquos take a closer look at this example
Data Papers Going beyond Appendices and Supplements
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data were collected, and what the data are
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA: EarthChem
IEDA: MGDS
National Center for Atmospheric Research (NCAR), USA: Earth Observing Lab (EOL), observational and supporting data from atmospheric science field experiments and arctic research
National Center for Atmospheric Research (NCAR), USA: Research Data Archive (RDA), reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Publisher Examples
Wiley: Geoscience Data Journal
Ubiquity Press: Journal of Open Archaeology Data, Journal of Open Psychology Data, Open Health Data, Journal of Open Research Software
Nature: Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Journal of Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery
Side-by-side comparison using structured variable-level documentation in XML, tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
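To give a sense of what structured, variable-level metadata makes possible, here is a minimal sketch in Python. It parses a small hand-written fragment that imitates the DDI Codebook layout. The element names (stdyDscr, titl, IDNo, var, labl) follow DDI Codebook conventions, but the fragment omits namespaces and most of the standard's elements, and its title, identifier, and variables are invented for illustration; it is not an ICPSR record.

```python
import xml.etree.ElementTree as ET

# Simplified, invented fragment in the style of a DDI Codebook document.
# Real DDI files use XML namespaces and many more elements.
DDI_FRAGMENT = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Household Survey, 2010</titl>
        <IDNo agency="DOI">10.0000/EXAMPLE</IDNo>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="INCOME"><labl>Household income</labl></var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(DDI_FRAGMENT)

# Study-level metadata: title and persistent identifier.
title = root.findtext("stdyDscr/citation/titlStmt/titl")
identifier = root.findtext("stdyDscr/citation/titlStmt/IDNo")

# Variable-level documentation: map each variable name to its label.
variables = {var.get("name"): var.findtext("labl") for var in root.iter("var")}

print(title, identifier)
print(variables)
```

Because every study describes its variables in the same tagged structure, a catalog can search and compare variables across studies without anyone reading each codebook by hand.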
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products, to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science: All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders
For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
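One reason a DOI lowers the search cost is that it can be matched mechanically. The sketch below is a minimal Python illustration, assuming reference lists are available as plain text. The regular expression is a common heuristic for DOIs rather than an official grammar, and `find_dois` is a hypothetical helper written for this example. The two DOIs are the ones shown in these slides; the surrounding sentence is invented.

```python
import re

# Heuristic pattern: "10.", a 4-9 digit registrant prefix, "/", then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text: str) -> list:
    """Return DOIs found in `text`, stripped of trailing punctuation."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


references = (
    "Data citation: doi:10.3886/ICPSR21240. "
    "See also the related dataset at http://dx.doi.org/10.5061/dryad.20, "
    "and an informal mention of 'the 2006 household survey'."
)

print(find_dois(references))
# → ['10.3886/ICPSR21240', '10.5061/dryad.20']
```

The informal mention at the end of the sample text is not found at all, which is the point of the comparison: without an identifier, a person has to recognize the dataset from its description.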
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation/
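The five core elements map naturally onto a small record. The sketch below is a minimal Python illustration, assuming the ordering "Creator (Year): Title. Publisher. Identifier". The `DataCitation` class and its field names are hypothetical, chosen only for this example, and the sample values are taken from the Dryad citation used in these slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the five core citation elements."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Assumed ordering: Creator (Year): Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}): {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


citation = DataCitation(
    creators=["Sidlauskas, B."],
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)

print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a formatter re-emit the same citation in different journal styles.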
How to Cite Data
Additional Elements
Location/Availability: The web address of the dataset. Essential when the identifier can't be used to reach the dataset.
Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets.
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets.
Format/Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density).
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum.
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation/
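The Verifier element can be illustrated with a short Python sketch that uses only the standard library's `hashlib`. The helper names `md5_checksum` and `same_dataset` are hypothetical. This covers MD5 only: an MD5 checksum compares files byte for byte, whereas a UNF is computed from the data values themselves and is not shown here.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a: Path, path_b: Path) -> bool:
    """True when both files contain identical bytes, as judged by MD5."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A reader who downloads a cited file can then compare `md5_checksum(Path("downloaded_file"))` with the checksum published in the citation. A mismatch means the file differs from the one originally cited, even if only in formatting.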
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future Of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
Data Journals Number of lsquoData Journalsrsquo
As of today 70+ data journals
Journal host
a) Authors
b) Journals
c) Publisher data repositories
d) Data repositories (IRDR)
Data journal article structure
a) IntroOverview
b) Methods
c) Dataset description
d) Reuse potential
Source K Akers and J Green Data Sharing and Publication Presented at the Cyberinfrastructure (CI) Days Event University of Michigan Ann Arbor MI November 13-14 2013
UP
Note To see a full list of data journals that currently exist see K Akersrsquo blog post at httpmlibrarydatawordpresscom20140509data-journals
Data Journal Example Geoscience Data Journal by Wiley
Launched in Fall 2012 Published on behalf of Royal Meteorological Society OA with author-pay model ($1500 per article) Publishes short data papers cross-linked to (and citing)
datasets that have been deposited in approved data centersrepositories and awarded DOIs
A data article describes a dataset giving details of its collection processing file formats etc but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand the when why and how the data was collected and what the data is
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months; non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example, The Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures
Basic Data Citation Format
Creator (Year). Title. Publisher. Identifier.
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
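To make the Creator (Year) Title Publisher Identifier pattern concrete, here is a minimal Python sketch that assembles a citation string from the core elements. The class name DatasetCitation, its field names, and the "; " and ". " separators are assumptions made for illustration; they are not a DataCite or CrossRef format, and a real journal style may order or punctuate the elements differently. The sample values are taken from the Dryad citation quoted in this deck.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Core citation elements; class and field names are illustrative only."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None  # optional additional element

    def format(self) -> str:
        # Join creators, then append the remaining elements in the
        # order Creator (Year), Title, [Version], Publisher, Identifier.
        creator_part = "; ".join(self.creators)
        parts = [f"{creator_part} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        parts.append(self.identifier)
        return ". ".join(parts)


example = DatasetCitation(
    creators=["Sidlauskas B"],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(example.format())
```

Keeping the elements as separate fields, rather than storing one free-text string, is what lets the same record be rendered in different journal styles later.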
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
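The Verifier element can be illustrated with a short Python sketch that computes an MD5 checksum using the standard hashlib module and compares two files. The function names md5_of_bytes, md5_of_file, and files_match are hypothetical. Note the limit of this approach: an MD5 checksum only confirms that two files are byte-for-byte identical, whereas a UNF is computed over normalized data values and is not shown here.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that large data files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a: str, path_b: str) -> bool:
    """True when both files have the same MD5 checksum."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A citation would carry the checksum of the file that was analyzed; a later reader recomputes it on the retrieved copy and compares the two strings.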
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future Of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
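As one illustration of what "machine actionable" can mean for a DOI (principle 4), the Python sketch below turns a DOI written in a citation into a link through the public https://doi.org/ resolver. The function name doi_to_url and the simplified validation pattern are assumptions for this example; the pattern is a rough check, not the full DOI syntax.

```python
import re
from urllib.parse import quote

# Simplified check: "10.", a registrant code of 4-9 digits, "/", then a suffix.
# This is an illustrative pattern, not the complete DOI specification.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(identifier: str) -> str:
    """Convert 'doi:10.xxxx/suffix' or '10.xxxx/suffix' to a resolver URL."""
    doi = identifier.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.3886/ICPSR21240"))
```

Because the identifier follows a predictable form, software such as an altmetrics harvester can find it in a reference list and follow it without human search effort.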
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Data Journal Example: Geoscience Data Journal by Wiley
Launched in Fall 2012
Published on behalf of the Royal Meteorological Society
OA with author-pay model ($1,500 per article)
Publishes short data papers cross-linked to (and citing) datasets that have been deposited in approved data centers/repositories and awarded DOIs
A data article describes a dataset, giving details of its collection, processing, file formats, etc., but does not go into detail of any scientific analysis of the dataset or draw conclusions from that data
The data paper should allow the reader to understand when, why, and how the data was collected, and what the data is
Data Journal Example (continued): Data centers/repositories approved by Geoscience Data Journal
3TU.Datacentrum
British Atmospheric Data Centre (BADC)
British Oceanographic Data Centre (BODC)
CISL Research Data Archive
CSIRO Data Access Portal
Environmental Information Data Centre (EIDC)
Figshare
IEDA: EarthChem
IEDA: MGDS
National Center for Atmospheric Research (NCAR), USA
  Earth Observing Lab (EOL): observational and supporting data from atmospheric science field experiments and arctic research
  Research Data Archive (RDA): reference datasets for weather and climate research
National Geoscience Data Centre (NGDC)
NERC Earth Observation Data Centre (NEODC)
NOAA National Climatic Data Center (NCDC)
NOAA National Oceanographic Data Center (NODC)
NOAA National Geophysical Data Center (NGDC)
PANGAEA
Polar Data Centre (PDC)
Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
  Geoscience Data Journal
Ubiquity Press
  Journal of Open Archaeology Data
  Journal of Open Psychology Data
  Open Health Data
  Journal of Open Research Software
Nature
  Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data centers/repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1,500 | No | Yes | 'Data Paper' | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | 'Data Paper' | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | 'Data Descriptor' | Yes
Data Journal Example (continued) Data centersrepositories approved by Geoscience Data Journal
3TUDatacentrum British Atmospheric Data Centre (BADC) British Oceanographic Data Centre (BODC) CISL Research Data Archive CSIRO Data Access Portal Environmental Information Data Centre (EIDC) Figshare IEDAEarthChem IEDAMGDS National Center for Atmospheric Research (NCAR) USA Earth Observing Lab (EOL) observational and supporting data from atmospheric science field
experiments and arctic research Research Data Archive (RDA) reference datasets for weather and climate research National Geoscience Data Centre (NGDC) NERC Earth Observation Data Centre (NEODC) NOAA National Climatic Data Center (NCDC) NOAA National Oceanographic Data Center (NODC) NOAA National Geophysical Data Center (NGDC) PANGAEA Polar Data Centre (PDC) Zenodo
Data Journal Example (continued)
Data Publisher Examples
Wiley
Geoscience Data Journal
Ubiquity Press
Journal of Open Archaeology Data
Journal of Open Psychology Data
Open Health Data
Journal of Open Research Software
Nature
Scientific Data
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
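The Verifier element can be illustrated with an MD5 checksum from Python's standard library. This is a minimal sketch: the helper names and the tiny CSV samples are made up for illustration. It compares raw bytes only, whereas a UNF is computed from the data values rather than the file bytes, so the two are not interchangeable.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of a dataset's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def same_dataset(a: bytes, b: bytes) -> bool:
    """Two copies are treated as identical when their checksums match."""
    return md5_checksum(a) == md5_checksum(b)


# Hypothetical file contents: one faithful copy and one with a changed value.
original = b"id,concentration\n1,0.5\n2,0.7\n"
copy = b"id,concentration\n1,0.5\n2,0.7\n"
edited = b"id,concentration\n1,0.5\n2,0.8\n"

print(same_dataset(original, copy))    # True
print(same_dataset(original, edited))  # False
```

A checksum published alongside the citation lets a later reader confirm that the file retrieved is the file that was analyzed.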
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communications and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
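Principle 4 asks for an identifier that is machine actionable. As a small sketch of what that can mean for a DOI, the function below turns a cited DOI into a link through the public doi.org resolver. The function name is hypothetical, and the regular expression is a simplified assumption about DOI shape, not the full DOI syntax.

```python
import re

# Simplified shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.3886/ICPSR21240' (or a bare DOI) into a resolver URL."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    if not DOI_PATTERN.match(bare):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{bare}"


print(doi_to_url("doi:10.3886/ICPSR21240"))  # https://doi.org/10.3886/ICPSR21240
```

Because the citation carries a well-formed DOI, software can follow it to the dataset without a person having to search for it.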
Make Your Data Count
If it’s not cited, it can’t be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data, and others’, in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What’s the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Data Journal Example (continued)
Data Publisher Examples
Wiley: Geoscience Data Journal
Ubiquity Press: Journal of Open Archaeology Data, Journal of Open Psychology Data, Open Health Data, Journal of Open Research Software
Nature: Scientific Data
Data Journal Examples (to name only a few): Some Feature Comparison
Publisher | Journal | OA | Publication fee per article | Publisher hosts data | Approved data center repositories recommended for data deposit | What the article is called | DOI
Wiley | Geoscience Data Journal | Yes | $1500 | No | Yes | ‘Data Paper’ | Yes
Ubiquity Press | Open Archaeology Data | Yes | $40 | No | Yes | ‘Data Paper’ | Yes
Nature Publishing Group | Scientific Data | Yes | $700 | No | Yes | ‘Data Descriptor’ | Yes
ICPSR: Inter-university Consortium for Political and Social Research
Located on U of M Campus
www.icpsr.umich.edu
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: “publishing” data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies’ archives are housed at ICPSR and fully integrated with ICPSR’s collection
Data preservation standards followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter the online catalog with fielded metadata records to enhance discovery; compare side by side using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifier: DOIs from DataCite
ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
http://www.icpsr.umich.edu/icpsrweb/deposit/pra/index.jsp
Open Sharing for DMP Proposals
http://openicpsr.org
Top 10 Data Downloads (last six months)
(non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies, linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
ImpactStory: Product-level Metric
“New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship”
Open metrics, with context, using diverse products to provide researchers with a “comprehensive impact report” of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”
“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
Data Journal Examples (to name only a few) Some Feature Comparison
Publisher Journal OA Publication
Fee per Article Publisher
hosts data
Approved data center
repositories recommended
for data deposit
How is the article called
DOI
Wiley Geoscience
Data Journal Yes $1500 No Yes lsquoData Paperrsquo Yes
Ubiquity
Press
Open
Archeology
Data
Yes $40 No Yes lsquoData Paperrsquo Yes
Nature
Publishing
Group
Scientific
Data Yes $700 No Yes lsquoData Descriptorrsquo Yes
Located on U of M Campus
wwwicpsrumichedu ICPSR Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss research via tweets, likes, and blog posts
More online tools to download, collaborate, and share, such as Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
"New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship"
Open metrics, with context, using diverse products to provide researchers with a "comprehensive impact report" of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science "All Databases"
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
[Figure labels: Abstract, Acknowledgements, Charts and Tables, Appendices, Discussion, Footnotes, Sample, Methods, References]
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation/
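The five core elements map naturally onto a small record type. The sketch below is one possible way to assemble them into a citation string and to flag elements that are missing. The class name, method names, and the example values are all hypothetical, and the punctuation between elements is an assumption, since journal styles differ.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The five core elements of a basic data citation."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self):
        # Assumed layout: Creator (Year): Title. Publisher. Identifier
        return (f"{self.creator} ({self.year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")

    def missing_elements(self):
        # Names of core elements that were left blank.
        return [name for name, value in vars(self).items()
                if not str(value).strip()]


# Hypothetical dataset, invented for illustration only.
citation = DataCitation(
    creator="Doe J",
    year=2014,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="doi:10.0000/example",
)
print(citation.format())
# Doe J (2014): Example Survey Data. Example Repository. doi:10.0000/example
```

A check like missing_elements() could be run before submission, since a citation without a creator or identifier is hard to credit or to count.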
How to Cite Data
Additional Elements
Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format/Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation/
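As an illustration of the Verifier element, the sketch below computes an MD5 checksum with Python's standard hashlib module and uses it to compare two files. The function names and the tiny CSV files are hypothetical. Note that an MD5 checksum compares files byte for byte, so the same data saved in a different file format will not match; a UNF is meant to fingerprint the data values themselves and is not shown here.

```python
import hashlib
import os
import tempfile


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when the two files are byte-for-byte identical (per MD5)."""
    return md5_checksum(path_a) == md5_checksum(path_b)


# Demonstration with throwaway files (contents invented for illustration).
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.csv")
copy = os.path.join(workdir, "copy.csv")
revised = os.path.join(workdir, "revised.csv")
for path, content in [(original, b"id,value\n1,2\n"),
                      (copy, b"id,value\n1,2\n"),
                      (revised, b"id,value\n1,3\n")]:
    with open(path, "wb") as handle:
        handle.write(content)

print(same_dataset(original, copy))     # True
print(same_dataset(original, revised))  # False
```

Recording the checksum alongside the citation lets a later reader confirm that the file they retrieved is the one that was analyzed.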
How to Cite Data
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communications and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
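Principle 4 asks for identifiers that are machine actionable. For a DOI, one simple reading of that is "can be turned into a link that resolves". The sketch below normalizes the common ways a DOI is written into a resolvable https://doi.org/ URL. The function name is hypothetical, and the pattern is only a loose sanity check assumed for illustration, not an official definition of DOI syntax. The example uses the ICPSR DOI shown on the slides, written with its usual punctuation.

```python
import re

# Loose sanity check (an assumption): "10.", a numeric prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Leading forms people commonly write: "doi:", or a resolver URL.
PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(raw):
    """Turn 'doi:10.x/y', a bare DOI, or a resolver URL into https://doi.org/..."""
    doi = PREFIXES.sub("", raw.strip())
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.3886/ICPSR21240"))
# https://doi.org/10.3886/ICPSR21240
```

A citation harvester could apply the same normalization before counting, so that differently written references to one dataset are tallied together.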
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data, and others', in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it? How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Located on U of M Campus
www.icpsr.umich.edu
ICPSR: Inter-university Consortium for Political and Social Research
Signs of a Trusted Repository
A unit of ISR, ICPSR is governed by a Council representing over 700 member institutions, including U of M
Long-term sustainability: "publishing" data for 52 years
Largest social science data repository in the US, with a catalog of over 8,000 studies containing thousands of files
Awarded the Data Seal of Approval from DANS
Federal agencies' archives are housed at ICPSR and fully integrated with ICPSR's collection
Data preservation standards are followed for the long term, guarding against deterioration, accidental loss, and digital obsolescence
Data are screened for confidentiality and privacy concerns
Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access, Discovery, Context, and Reuse
ICPSR formats, organizes, and enhances deposited raw research data with meaningful metadata and documentation to make it complete, self-explanatory, and usable for future researchers.
Signs of a Trusted Repository A unit of ISR ICPSR is governed by a Counsel representing
over 700 member institutions including U of M Long-term sustainability ldquopublishingrdquo data for 52 years Largest social science data repository in US with a catalog
of over 8000 studies containing thousands of files Awarded the Data Seal of Approval from DANS Federal agenciesrsquo archives are housed at ICPSR and fully
integrated with ICPSRrsquos collection Data preservation standards followed for data long-term
guarding against deterioration accidental loss and digital obsolescence
Data are screened for confidentiality and privacy concerns Stringent protections are in place for securing and distributing sensitive data
Physical and virtual data enclaves for analyzing restricted-use data
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited
4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community
Eight Principles
5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data
6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe
Eight Principles
7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim
Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited
Eight Principles
8 Interoperability and flexibilitymdashData citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities
Make Your Data Count
If itrsquos not cited it canrsquot be counted
Without counting data use there is no accurate way to measure the impact of your shared data
Without a well-formed citation your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and othersrsquo in your publications
Questions Answered
Sharing datamdashhow does it happen
What is data publishing
Is data archiving the same
How can we find data access it and reuse it How can we measure the impact of sharing data
Whatrsquos the common denominator
Thank you
Natsuko Nicholls
hayashinumichedu
Elizabeth Moss
eammossumichedu
Rich Metadata for Better Access Discovery Context and Reuse ICPSR formats organizes and enhances deposited raw
research data with meaningful metadata and documentation to make it complete self-explanatory and usable for future researchers
Study metadata and codebooks are generated according to the Data Documentation Initiative (DDI) XML standard
Search and filter online catalog with fielded metadata records to enhance discovery side-by-side comparison using structured variable-level documentation in XML tagged according to the DDI standard
All studies are registered with a unique identifiermdashDOIs from DataCite ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
Replication Datasets
httpwwwicpsrumicheduicpsrwebdepositpraindexjsp
Open Sharing for DMP Proposals
httpopenicpsrorg
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title | Archive | Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | DSDR | 1188
General Social Survey, 1972-2012 [Cumulative File] | ICPSR | 737
Chinese Household Income Project, 2002 | DSDR | 720
India Human Development Survey (IHDS), 2005 | SAMHDA | 445
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | CPES | 407
National Survey on Drug Use and Health, 2012 | SAMHDA | 314
Children of Immigrants Longitudinal Study (CILS), 1991-2006 | DSDR | 289
National Crime Victimization Survey, 2012 | NACJD | 260
National Prisoner Statistics, 1978-2011 | NACJD | 249
Historical, Demographic, Economic, and Social Data: The United States, 1790-2002 | ICPSR | 245
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students, instructors, researchers, and funders to discover and understand data use
A searchable database currently containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, DeepBlue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• MacMillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
“New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship”
Open metrics, with context, using diverse products, to provide researchers with a “comprehensive impact report” of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”
“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g., a DOI)
Source: How to Cite Data, http://datapub.cdlib.org/datacitation
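The five core elements can be assembled mechanically into a citation string. The sketch below is illustrative only: the class name, the punctuation between elements, and the sample values (including the placeholder DOI prefix 10.1234) are assumptions, not a format prescribed by the source or by any journal style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Holds the five core elements of a basic data citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Assumed layout: Creator (Year): Title. Publisher. Identifier
        names = ", ".join(self.creators)
        return f"{names} ({self.year}): {self.title}. {self.publisher}. {self.identifier}"


# Hypothetical dataset used purely to show the output shape.
citation = DataCitation(
    creators=["Doe J"],
    year=2014,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="doi:10.1234/example",
)
print(citation.format())
# Doe J (2014): Example Survey Data. Example Repository. doi:10.1234/example
```

Keeping the elements as separate fields, rather than as one free-text string, is what lets a formatter re-render the same citation in different journal styles.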
Additional Elements
Location/Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version/Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format/Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: How to Cite Data, http://datapub.cdlib.org/datacitation
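As an illustration of the Verifier element, the following sketch computes an MD5 checksum for a data file and compares two files. The function names and chunk size are assumptions chosen for this example. Note that MD5 compares files byte for byte, whereas a UNF is computed from the data values themselves, so the same data saved in two file formats would share a UNF but not an MD5.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """Return True when two files are byte-for-byte identical according to MD5."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A researcher who downloads a cited dataset could run `md5_of_file` on the retrieved file and compare the result with the checksum recorded in the citation or the repository record.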
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, “Dams, Poverty, Public Goods and Malaria Incidence in India”, http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
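A DOI in a citation becomes actionable once it is turned into a resolver link. The sketch below normalizes a DOI written with a "doi:" prefix, as a bare DOI, or as a doi.org URL, into an https://doi.org/ link. The function name and the regular expression are assumptions for illustration; the pattern is a loose approximation of common DOI shapes, not a full validator, and it does not URL-encode unusual characters.

```python
import re

# Loose pattern: optional "doi:" or doi.org URL prefix, then "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def doi_to_url(identifier):
    """Return an https://doi.org/ resolver URL for a DOI string, or raise ValueError."""
    match = DOI_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return "https://doi.org/" + match.group(1)


print(doi_to_url("doi:10.3886/ICPSR21240"))
# https://doi.org/10.3886/ICPSR21240
```

Because the DOI, not the landing-page address, is what the citation records, the link keeps working even if the repository reorganizes its website.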
Joint Declaration of Data Citation Principles
1. Future of Research Communication and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Top 10 Data Downloads (last six months) (non-anonymous distinct users downloading one or more files)
Title Archive Downloads
National Longitudinal Study of Adolescent Health (Add Health) 1994-2008
DSDR 1188
General Social Survey 1972-2012 [Cumulative File] ICPSR 737
Chinese Household Income Project 2002 DSDR 720
India Human Development Survey (IHDS) 2005 SAMHDA 445
Collaborative Psychiatric Epidemiology Surveys (CPES) 2001-2003 [United States]
CPES 407
National Survey on Drug Use and Health 2012 SAMHDA 314
Children of Immigrants Longitudinal Study (CILS) 1991-2006 DSDR 289
National Crime Victimization Survey 2012 NACJD 260
National Prisoner Statistics 1978-2011 NACJD 249
Historical Demographic Economic and Social Data The United States 1790-2002
ICPSR 245
Who uses these shared data How are they used With what impact
The ICPSR Bibliography of Data-related Literature
Link research data to the scholarly literature about it
Aid students instructors researchers and funders to
discover and understand data use
A searchable database currently containing over 65000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
It generates study bibliographies linking each study with the literature about it and out to the full text
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example, the Data Conservancy, IEEE, and Portico received an Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year). Title. Publisher. Identifier.
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
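As a rough illustration of the basic format above, the five core elements can be held in a small record and rendered in a fixed order. This is only a sketch: the class name, field names, and the exact punctuation between elements are assumptions made for the example, not a prescribed style, and the sample values come from the Dryad citation used in this deck.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a data citation (names are illustrative)."""
    creators: List[str]
    year: int          # year the dataset was published, not necessarily created
    title: str
    publisher: str     # organization that provides access to the dataset
    identifier: str    # persistent unique identifier, e.g. a DOI

    def render(self) -> str:
        # Creator (Year). Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return (f"{names} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DataCitation(
    creators=["Sidlauskas, B."],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
rendered = citation.render()
print(rendered)
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a repository or harvester re-render the same citation in different journal styles.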
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
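The Verifier element can be illustrated with an MD5 checksum: if two copies of a dataset produce the same digest, they are byte-for-byte the same file for practical purposes. The sketch below uses Python's standard hashlib module; the function name and the three tiny in-memory "datasets" are hypothetical stand-ins for real files. A UNF is computed differently (it normalizes the data values before hashing) and is not shown here.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical sample data standing in for dataset files.
original = io.BytesIO(b"id,concentration\n1,0.52\n2,0.61\n")
retrieved = io.BytesIO(b"id,concentration\n1,0.52\n2,0.61\n")
altered = io.BytesIO(b"id,concentration\n1,0.52\n2,0.16\n")

original_sum = md5_of_stream(original)
matches = original_sum == md5_of_stream(retrieved)   # identical copies
differs = original_sum != md5_of_stream(altered)     # one value changed

print(original_sum, matches, differs)
```

For a file on disk, the same function can be called with `open(path, "rb")`; reading in chunks keeps memory use flat for large datasets.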
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas, B. (2007). Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communication and E-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist -- even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
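Principle 4 asks for an identifier that is machine actionable. One way to read that, sketched here for DOIs only, is that software should be able to take the identifier from a citation and turn it into a resolvable link without human help. The function name and the loose validation pattern below are assumptions for illustration; the pattern is a simple heuristic, not the official DOI syntax definition.

```python
import re

# Heuristic shape of a modern DOI: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(identifier: str) -> str:
    """Turn 'doi:10.xxxx/suffix' or a bare DOI into a doi.org resolver URL."""
    doi = identifier.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return "https://doi.org/" + doi


url = doi_to_url("doi:10.3886/ICPSR21240")
print(url)  # -> https://doi.org/10.3886/ICPSR21240
```

A citation that only names the dataset in running text offers nothing for a function like this to work on, which is why harvesters depend on the identifier being present and well formed.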
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it? How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Linking the Data to the Literature
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets likes and blog posts
More online tools to download collaborate and share like Mendeley Figshare SlideShare Dryad and ResearchGate DeepBlue openICPSR
Dependent on good citation practice
Publishers Springer
Elsevier
Wiley
Cambridge Journals
BMJ Journals
Nature Publish Group
PLoS
Altmetrics Aggregators bull Altmetric
bull ImpactStory
bull Plum Analytics
Funders bull NSF
bull Sloan Foundation
bull MacMillan
bull EBSCO
The Alfred P Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics
Impact Story Product-level Metric
ldquoNew ways to measure the research impact of emerging products like blog posts datasets and software to build a new scholarly reward system that values and encourages web-native scholarshiprdquo
Open metrics with context using diverse products
to provide researchers with a ldquocomprehensive impact reportrdquo of their research output
Source httpsimpactstoryorgabout
Artifact-level Metric
Source httpwwwplumanalyticscommetricshtml
Integration with Web of Science All Databases Research data is equal to research literature
Articles linked to underlying data Increased data discovery Reward for data citation Potential for automated tracking
Elsevier Connect
ldquoElsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect As part of the Article of the Future project this reciprocal linking aims to expand the availability of research data and improve the researcher workflowrdquo
ldquoElsevier encourages authors to submit their data sets to
external repositories But not all authors know how or where to submit their data and not all authors are aware of the possibilities that data linking offers The recent agreement with Dryad Digital Repository marked the 35th
data linking partnership Elsevier has established rdquo
Source httpwwwelseviercomconnectbringing-data-to-life-with-data-linking
Source httpwwwslidesharenetElsevierConnectcolumbia-27feb13v2ext
For Better Metrics on Research Data Impact Need more aggregator and repository data to be
exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries publishers archives and funders For example The Data Conservancy IEEE and Portico receive
Alfred P Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi103886ICPSR21240
httpwwwflickrcomphotospapertrix38028138
Some Challenges
No Common Practice of Formal Data Citation Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation reader must infer or be out of luck
No attributionmdashno credit
No accessmdashno reuse
No discernible impact
Examples of Bad Data Citation Poorly described and cited data
+
Excessive human search effort extensive collection knowledge
=
Too costly too questionable for confident measure of impact
Examples of Good Data Citation Formal data
Citing with
a DOI
+
Minimal human search effort
=
High hit accuracy for the cost and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
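The "machine actionable" part of the Unique Identification principle can be illustrated with a DOI. The following is a minimal Python sketch, not a reference implementation: the function names `normalize_doi` and `doi_to_url` are hypothetical, and the regular expression is a common heuristic for DOI names rather than the full DOI syntax. It assumes only that a DOI can be resolved by appending it to `https://doi.org/`.

```python
import re

# Heuristic check: "10.", a numeric registrant code, a slash, then a suffix.
# This is an assumption for illustration, not the complete DOI grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly write in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a leading 'doi:' or resolver URL and return the bare DOI."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Turn a DOI written in any common form into a resolvable link."""
    return "https://doi.org/" + normalize_doi(raw)


if __name__ == "__main__":
    print(doi_to_url("doi:10.3886/ICPSR06849.v1"))
```

Because the identifier alone is enough to build the link, software can follow a citation to the dataset without a person searching for it.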
Make Your Data Count
If it’s not cited, it can’t be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data, and others’, in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it? How can we measure the impact of sharing data?
What’s the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Altmetrics for research data
Easier to access and analyze much more research data online
New focus on sharing that research data
Increasing use of social media to discuss via tweets, likes, and blog posts
More online tools to download, collaborate, and share, like Mendeley, Figshare, SlideShare, Dryad, ResearchGate, Deep Blue, and openICPSR
Dependent on good citation practice
Publishers
• Springer
• Elsevier
• Wiley
• Cambridge Journals
• BMJ Journals
• Nature Publishing Group
• PLoS
Altmetrics Aggregators
• Altmetric
• ImpactStory
• Plum Analytics
Funders
• NSF
• Sloan Foundation
• Macmillan
• EBSCO
The Alfred P. Sloan Foundation helps fund ImpactStory and is now funding the National Information Standards Organization (NISO) to develop standards and recommended best practices for altmetrics.
ImpactStory: Product-level Metric
“New ways to measure the research impact of emerging products like blog posts, datasets, and software, to build a new scholarly reward system that values and encourages web-native scholarship”
Open metrics, with context, using diverse products, to provide researchers with a “comprehensive impact report” of their research output
Source: https://impactstory.org/about
Artifact-level Metric
Source: http://www.plumanalytics.com/metrics.html
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
“Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.”
“Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established.”
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost, and better confidence of impact measures
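The contrast above can be made concrete with a small Python sketch. It assumes, for illustration only, that a DOI in running text can be recognized by the heuristic pattern `10.<registrant>/<suffix>`; the function name `find_dois` and the sample reference text are hypothetical. A formally cited dataset is found mechanically, while an informal mention ("the Minnesota supervision data from ICPSR") leaves nothing for the program to match.

```python
import re
from typing import List

# Heuristic: "10.", a numeric registrant code, a slash, then non-space text.
# Assumption for illustration; real DOI suffixes can be more varied.
DOI_IN_TEXT = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text: str) -> List[str]:
    """Return the distinct DOIs found in text, in order of first appearance."""
    found: List[str] = []
    for match in DOI_IN_TEXT.finditer(text):
        # Drop sentence punctuation that happens to follow the DOI.
        doi = match.group(0).rstrip(".,;:)")
        if doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    references = (
        "Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. "
        "Intensive Community Supervision in Minnesota, 1990-1992. "
        "doi:10.3886/ICPSR06849.v1\n"
        "Sidlauskas, B. (2007). Dryad Digital Repository. doi:10.5061/dryad.20.\n"
        "We also used the Minnesota supervision data from ICPSR.\n"
    )
    print(find_dois(references))
```

The third sentence in the sample refers to a dataset but yields no hit, which is the "excessive human search effort" case: someone with collection knowledge would have to identify it by hand.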
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation/
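The five core elements map naturally onto a small record type. The Python sketch below is one possible way to assemble them into the basic format; the class name `DataCitation`, its field names, and the choice of "; " to join multiple creators are assumptions made for illustration, and the separators follow the "Creator (Year): Title. Publisher. Identifier" pattern rather than any particular journal style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """The five core elements of a basic data citation (hypothetical type)."""
    creators: List[str]   # individual(s) or organization that created the dataset
    year: int             # year the dataset was published
    title: str            # as descriptive as possible
    publisher: str        # organization that provides access
    identifier: str       # persistent, unique identifier such as a DOI

    def format(self) -> str:
        """Render as: Creator (Year): Title. Publisher. Identifier"""
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}): {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = DataCitation(
        creators=["Sidlauskas, B."],
        year=2007,
        title=(
            "Data from: Testing for unequal rates of morphological "
            "diversification in the absence of a detailed phylogeny: "
            "a case study from characiform fishes"
        ),
        publisher="Dryad Digital Repository",
        identifier="doi:10.5061/dryad.20",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than as one string, is what lets a formatter re-render the same citation in different journal styles.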
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset. Essential when the identifier can’t be used to reach the dataset.
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets.
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets.
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density).
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum.
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation/
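The Verifier element can be sketched in a few lines of Python using the standard library's `hashlib`. The function names `md5_checksum` and `same_dataset` are hypothetical. This covers only the MD5 case: an MD5 checksum confirms that two files are byte-for-byte identical, whereas a UNF is computed differently and is not implemented here.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a: str, path_b: str) -> bool:
    """True when both files have the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A reader who retrieves a cited dataset could compare `md5_checksum(downloaded_file)` with the checksum printed in the citation to confirm that the file is the one originally cited.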
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles 1 Importance--Data should be considered
legitimate citable products of research Data citations should be accorded the same importance in the scholarly record as citations of other research objects such as publications
2Credit and Attribution--Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data recognizing that a single style or mechanism of attribution may not be applicable to all data
Eight Principles
3 EvidencemdashIn scholarly literature whenever and wherever a claim relies upon data the corresponding data should be cited
4 Unique IdentificationmdashA data citation should include a persistent method for identification that is machine actionable globally unique and widely used by a community
Eight Principles
5 AccessmdashData citations should facilitate access to the data themselves and to such associated metadata documentation code and other materials as are necessary for both humans and machines to make informed use of the referenced data
6PersistencemdashUnique identifiers and metadata describing the data and its disposition should persist -- even beyond the lifespan of the data they describe
Eight Principles
7 Specificity and VerifiabilitymdashData citations should facilitate identification of access to and verification of the specific data that support a claim
Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice version andor granular portion of data retrieved subsequently is the same as was originally cited
Eight Principles
8 Interoperability and flexibilitymdashData citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data, and others', in your publications
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it? How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Integration with Web of Science All Databases
Research data is equal to research literature
Articles linked to underlying data
Increased data discovery
Reward for data citation
Potential for automated tracking
Elsevier Connect
"Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow."
"Elsevier encourages authors to submit their data sets to external repositories. But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. The recent agreement with Dryad Digital Repository marked the 35th data linking partnership Elsevier has established."
Source: http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
Source: http://www.slideshare.net/ElsevierConnect/columbia-27feb13v2ext
For Better Metrics on Research Data Impact
Need more aggregator and repository data to be exposed for altmetric harvesters like ImpactStory
More integrated efforts among libraries, publishers, archives, and funders. For example: The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data
+
Excessive human search effort, extensive collection knowledge
=
Too costly, too questionable for confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost, and better confidence of impact measures
Basic Data Citation Format
Creator (Year) Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles.)
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation
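The five core elements can be assembled mechanically. The sketch below is one possible way to do it, not a formatter provided by DataCite or CrossRef: the class name DataCitation, its field names, and the exact punctuation between elements are assumptions made for illustration. The sample values come from the Dryad citation used as an example in these slides.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The five core elements of a data citation (names are illustrative)."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Order follows the basic format: Creator (Year) Title. Publisher. Identifier
        # The separating punctuation is an assumption, not a fixed standard.
        return (f"{self.creator} ({self.year}) {self.title}. "
                f"{self.publisher}. {self.identifier}")


citation = DataCitation(
    creator="Sidlauskas B",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what lets the same record be re-rendered in whatever style a journal requires.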
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset is essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation
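The Verifier element is the easiest one to illustrate in code. The sketch below computes an MD5 checksum with Python's standard hashlib module and compares two files. The function names md5_checksum and same_dataset are made up for this example. Note that MD5 only confirms that two files are byte-for-byte identical; a UNF is computed from the data values themselves and is not implemented here.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(path_a, path_b):
    """True when both files have the same MD5 checksum."""
    return md5_checksum(path_a) == md5_checksum(path_b)
```

A citation that records the checksum lets a later reader run the same calculation on a downloaded copy and confirm it matches the data originally cited.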
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communication and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Formal Citation in the References with the DOI
doi:10.3886/ICPSR21240
http://www.flickr.com/photos/papertrix/38028138
Some Challenges
No Common Practice of Formal Data Citation
Abstract
Acknowledgements
Charts and Tables
Appendices
Discussion
Footnotes
Sample
Methods
References
Without an explicit citation, the reader must infer or be out of luck
No attribution, no credit
No access, no reuse
No discernible impact
Examples of Bad Data Citation
Poorly described and cited data + excessive human search effort and extensive collection knowledge = too costly and too questionable for a confident measure of impact
Examples of Good Data Citation
Formal data citing with a DOI + minimal human search effort = high hit accuracy for the cost and better confidence in impact measures
Basic Data Citation Format
Creator (Year): Title. Publisher. Identifier
(For datasets that have DOIs, DataCite and CrossRef provide a citation formatter that generates a citation in various journal styles.)
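One way to reach a citation formatter programmatically is DOI content negotiation: the request goes to the DOI resolver with an Accept header that names a bibliography style. The sketch below only illustrates the idea. The function names are hypothetical, the "apa" style is an arbitrary choice, and whether a given DOI returns a formatted citation depends on its registration agency.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request asking for a formatted citation.

    The Accept header asks for plain-text bibliography output in the
    given citation style instead of the dataset landing page.
    """
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling fetch_citation("10.5061/dryad.20") would request a citation for the Dryad dataset used as an example in this deck. The result is not shown here because it depends on the live service.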
Core Elements
Creator(s): Individual(s) or organization responsible for creating the dataset
Year: Year the dataset was published, not necessarily created
Title: Should be as descriptive as possible
Publisher: Organization that provides access to the dataset (e.g., Dryad, Zenodo)
Identifier: Persistent, unique identifier (e.g., a DOI)
Source: http://datapub.cdlib.org/datacitation/
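The five core elements can be assembled into a citation string mechanically. The following is a minimal sketch, not a style-guide implementation: the class name, the field names, and the separators (a colon after the year, periods between the remaining elements) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Core elements of a dataset citation (hypothetical helper class)."""

    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        """Render as: Creator (Year): Title. Publisher. Identifier"""
        names = "; ".join(self.creators)
        return f"{names} ({self.year}): {self.title}. {self.publisher}. {self.identifier}"

    def missing_elements(self) -> List[str]:
        """Return the names of core elements that are empty."""
        checks = {
            "creators": bool(self.creators),
            "year": bool(self.year),
            "title": bool(self.title.strip()),
            "publisher": bool(self.publisher.strip()),
            "identifier": bool(self.identifier.strip()),
        }
        return [name for name, present in checks.items() if not present]
```

Running missing_elements() before publishing makes gaps visible, for example a record with no publisher or no persistent identifier.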
How to Cite Data
Additional Elements
Location / Availability: The web address of the dataset; essential when the identifier can't be used to reach the dataset
Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets
Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets
Format / Material Designator: e.g., database, CD-ROM
Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g., concentration, optical density)
Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum
Series: Used if the dataset is part of a series of releases (e.g., monthly)
Contributor: e.g., editor, compiler
Source: http://datapub.cdlib.org/datacitation/
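The verifier element can be illustrated with an MD5 checksum from Python's standard library. This sketch compares two files byte for byte through their digests; the function names are hypothetical. It does not compute a UNF, which is a separate fingerprinting scheme applied to the data values rather than to the file bytes.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a: Path, path_b: Path) -> bool:
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A digest recorded in the citation lets a later reader check that the file they retrieved matches the one that was analyzed.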
Data Citation Examples
Deschenes, Elizabeth Piper, Susan Turner, and Joan Petersilia. Intensive Community Supervision in Minnesota, 1990-1992: A Dual Experiment in Prison Diversion and Enhanced Supervised Release. ICPSR06849-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000. doi:10.3886/ICPSR06849.v1
Esther Duflo; Rohini Pande, 2006, "Dams, Poverty, Public Goods and Malaria Incidence in India", http://hdl.handle.net/1902.1/IOJHHXOOLZ UNF:5:obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version]
Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Joint Declaration of Data Citation Principles
1. Future of Research Communication and e-Scholarship (FORCE11)
2. Committee on Data for Science and Technology (CODATA)
3. Digital Curation Centre (DCC)
Source: https://www.force11.org/datacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
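Principle 4 asks for identifiers that are machine actionable. As a rough illustration, the sketch below turns the common ways a DOI is written into a single resolver URL. The regular expression is a heuristic assumed for this example, not an official DOI grammar, and the function name is hypothetical.

```python
import re
from typing import Optional

# Heuristic: optional "doi:" or resolver prefix, then "10.<registrant>/<suffix>".
_DOI_PATTERN = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def doi_to_url(text: str) -> Optional[str]:
    """Return a resolver URL for a DOI-like string, or None if it does not look like a DOI."""
    match = _DOI_PATTERN.match(text.strip())
    if match is None:
        return None
    return "https://doi.org/" + match.group(1)
```

A string such as "ICPSR 21240" yields None: a study number on its own is not something a machine can resolve, which is the gap the principle is meant to close.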
Make Your Data Count
If it's not cited, it can't be counted
Without counting data use, there is no accurate way to measure the impact of your shared data
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing
Store your data where citations are unique and persistent
Cite your own data and others' in your publications
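The link between citing and counting can be sketched in a few lines: when references carry DOIs, tallying data use becomes a string match rather than a manual search. The function name, the DOI pattern, and the input format (one reference string per entry) are assumptions for illustration, and real bibliometric pipelines are more involved.

```python
import re
from collections import Counter
from typing import Iterable

# Heuristic pattern for a DOI embedded in free text.
_DOI_IN_TEXT = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def count_dataset_citations(references: Iterable[str], dataset_dois: Iterable[str]) -> Counter:
    """Count how many references mention each dataset DOI.

    Keys of the returned Counter are lowercased DOIs. Each reference
    counts at most once per DOI.
    """
    wanted = {doi.lower() for doi in dataset_dois}
    counts: Counter = Counter()
    for reference in references:
        # Strip trailing sentence punctuation that the pattern may have captured.
        found = {
            match.group(0).rstrip(".,;)").lower()
            for match in _DOI_IN_TEXT.finditer(reference)
        }
        for doi in found & wanted:
            counts[doi] += 1
    return counts
```

A reference that only names a study in prose, with no identifier, contributes nothing to the tally.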
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu
Basic Data Citation Format
Creator (Year) Title Publisher Identifier (For datasets that have DOIs DataCite and CrossRef provide a citation formatter to generate a citation in various journal styles)
Core Elements
Creator(s) Individual(s) or organization responsible for creating the dataset Year Year the dataset was published not necessarily created Title Should be as descriptive as possible Publisher Organization that provides access to the dataset (eg Dryad Zenodo) Identifier Persistent unique identifier (eg a DOI)
Source httpdatapubcdliborgdatacitation
How to Cite Data
Additional Elements Location Availability The web address of the dataset is essential when the identifier canrsquot be used to reach the dataset
Version Edition Version of the dataset used in the present publication Needed to reproduce analysis of versioned dynamic datasets
Access Date Date of access for analysis in the present publication Needed to reproduce analysis of continuously updated dynamic datasets
Format Material Designator eg database CD-ROM
Feature Name A description of the subset of the dataset used May be a formal title or a list of variables (eg concentration optical density)
Verifier Used to confirm that two datasets are identical Most commonly a UNF or MD5 checksum
Series Used if the dataset is part of series of releases (eg monthly)
Contributor eg editor compiler
Source httpdatapubcdliborgdatacitation
How to Cite Data
Data Citation Examples Deschenes Elizabeth Piper Susan Turner and Joan Petersilia Intensive Community Supervision in Minnesota 1990-1992 A Dual Experiment in Prison Diversion and Enhanced Supervised Release ICPSR06849-v1 Ann Arbor MI Inter-university Consortium for Political and Social Research [distributor] 2000 doi103886ICPSR06849v1 Esther Duflo Rohini Pande 2006 Dams Poverty Public Goods and Malaria Incidence in India httphdlhandlenet19021IOJHHXOOLZ UNF5obNHHq1gtV400a4T+Xrp9g== Murray Research Archive [Distributor] V2 [Version] Sidlauskas B (2007) Data from Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny a case study from characiform fishes Dryad Digital Repository doi105061dryad20
Joint Declaration of Data Citation Principles
1 Future Of Research Communication and E-Scholarship (FORCE11)
2 Committee on Data for Science and Technology (CODATA)
3 Digital Curation Centre (DCC)
Source httpswwwforce11orgdatacitation
Eight Principles
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
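One way to picture "machine actionable" identification (principle 4) is a function that turns a cited identifier into a web address a program can follow. The sketch below is a minimal illustration: the function name `resolver_url` and the `doi:` / `hdl:` prefix convention it expects are assumptions for this example, while doi.org and hdl.handle.net are the public resolver services for DOIs and Handles.

```python
# Public resolver services, keyed by the prefix used in the citation.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def resolver_url(identifier):
    """Turn 'doi:10.5061/dryad.20' or 'hdl:1902.1/...' into a URL.

    Raises ValueError for an unknown prefix or an empty identifier.
    """
    scheme, sep, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not sep or not value or scheme not in RESOLVERS:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    return RESOLVERS[scheme] + value


for cited in ("doi:10.3886/ICPSR06849.v1", "hdl:1902.1/IOJHHXOOLZ"):
    print(cited, "->", resolver_url(cited))
```

Because the resolver, not the citation, holds the dataset's current location, the citation can stay valid even if the repository moves its files.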
Make Your Data Count
If it's not cited, it can't be counted.
Without counting data use, there is no accurate way to measure the impact of your shared data.
Without a well-formed citation, your data cannot take advantage of the potential of linked scholarly publishing.
Store your data where citations are unique and persistent.
Cite your own data, and others', in your publications.
Questions Answered
Sharing data: how does it happen?
What is data publishing?
Is data archiving the same?
How can we find data, access it, and reuse it?
How can we measure the impact of sharing data?
What's the common denominator?
Thank you
Natsuko Nicholls
hayashin@umich.edu
Elizabeth Moss
eammoss@umich.edu