ICPSR’s Approach to Data Citation and Persistent Identifiers Mary Vardigan Assistant Director, ICPSR Workshop on Persistent Identifiers in the Social Sciences -- Bonn, Germany February 1, 2011

ICPSR’s Approach to Data Citation and Persistent Identifiers

Mary Vardigan, Assistant Director, ICPSR

Workshop on Persistent Identifiers in the Social Sciences -- Bonn, Germany, February 1, 2011

Today’s Presentation

• ICPSR’s use of data citations and persistent identifiers

• Ways that ICPSR encourages good practice

• Issues to be resolved

• Future directions

ICPSR’s Use of Citations

• ICPSR has been providing citations to its data since 1990

• Citations are based on “Cataloging Machine-Readable Data Files” by Sue Dodd, American Library Association, 1982

What Makes Up an ICPSR Citation?

• Content Creator/Principal Investigator

• Title

• Distributor [ICPSR]

• Distribution place and date

• ICPSR study number

• Version number

• Materials designation [Computer file]

• DOI

Example

Schneider, Barbara, and Linda J Waite. The 500 Family Study [1998-2000: United States] [Computer file]. ICPSR04549-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-05-30. doi:10.3886/ICPSR04549
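The citation elements listed above can be assembled mechanically. The following is a minimal sketch, not ICPSR's actual code: the class name, field names, and the rule that study numbers are zero-padded to five digits are assumptions inferred from the single example on this slide.

```python
from dataclasses import dataclass


@dataclass
class StudyCitation:
    """Hypothetical container for the citation elements named on the slide."""
    creators: str        # content creator / principal investigator(s)
    title: str
    study_number: int    # ICPSR study number
    version: int
    place: str           # distribution place
    distributor: str
    date: str            # distribution date, ISO format
    doi_prefix: str = "10.3886"

    def study_id(self) -> str:
        # Assumption: study numbers are zero-padded to five digits.
        return f"ICPSR{self.study_number:05d}"

    def format(self) -> str:
        return (
            f"{self.creators}. {self.title} [Computer file]. "
            f"{self.study_id()}-v{self.version}. {self.place}: "
            f"{self.distributor} [distributor], {self.date}. "
            f"doi:{self.doi_prefix}/{self.study_id()}"
        )


citation = StudyCitation(
    creators="Schneider, Barbara, and Linda J Waite",
    title="The 500 Family Study [1998-2000: United States]",
    study_number=4549,
    version=1,
    place="Ann Arbor, MI",
    distributor="Inter-university Consortium for Political and Social Research",
    date="2008-05-30",
)
print(citation.format())
```

Run as written, this reproduces the example citation above character for character.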

ICPSR’s Use of DOIs

• ICPSR started assigning DOIs in 2008

• DOIs apply at the study or collection level (a study can have multiple datasets)

• DOIs are of the form: doi:10.3886/ICPSR04549

• DOIs resolve to the study homepage (metadata record)
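As an illustration of the DOI form shown above, the sketch below splits a DOI into its prefix and suffix and builds a resolver link. The function names are hypothetical, and the use of https://doi.org/ as the resolver address is an assumption (it is the current public DOI proxy, not something stated on the slide).

```python
import re

# Optional "doi:" label, then a registrant prefix (10.NNNN) and a suffix.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.\d{4,9})/(\S+)$", re.IGNORECASE)


def split_doi(doi):
    """Return (prefix, suffix) for a DOI string, with or without 'doi:'."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi):
    """Build a link that the DOI proxy redirects to the registered URL."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


print(split_doi("doi:10.3886/ICPSR04549"))
print(resolver_url("doi:10.3886/ICPSR04549"))
```

For an ICPSR DOI, the proxy would redirect such a link to the study homepage described in the last bullet.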

How ICPSR Obtains DOIs

• ICPSR uses the CrossRef service, “the official DOI® link registration agency for scholarly and professional publications”

• ICPSR pays a modest annual Publisher Fee (based on publishing revenues) and pays 6 cents per DOI

• To begin assigning DOIs, ICPSR sent CrossRef an XML file in 2008 containing metadata on all 7,000+ ICPSR studies

• ICPSR now registers new DOIs weekly

Weekly Process

• ICPSR runs a script to create XML metadata in CrossRef format:
  – Contributors and their roles
  – Title
  – Publication date
  – Update date
  – Study number
  – DOI
  – URL
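A script of this kind might look roughly like the sketch below. It is illustrative only: the element names are invented for readability and are not the CrossRef deposit schema, and the URL and update date in the sample record are placeholders rather than real ICPSR values.

```python
import xml.etree.ElementTree as ET

# The seven fields come from the slide; the element names are hypothetical.
SIMPLE_FIELDS = ("title", "publication_date", "update_date",
                 "study_number", "doi", "url")


def build_study_record(study):
    """Turn one study's metadata (a dict) into an XML element."""
    record = ET.Element("study")
    contributors = ET.SubElement(record, "contributors")
    for name, role in study["contributors"]:
        person = ET.SubElement(contributors, "contributor", role=role)
        person.text = name
    for field in SIMPLE_FIELDS:
        ET.SubElement(record, field).text = study[field]
    return record


example_study = {
    "contributors": [("Schneider, Barbara", "principal investigator"),
                     ("Waite, Linda J", "principal investigator")],
    "title": "The 500 Family Study [1998-2000: United States]",
    "publication_date": "2008-05-30",
    "update_date": "2008-05-30",                  # placeholder value
    "study_number": "4549",
    "doi": "10.3886/ICPSR04549",
    "url": "https://example.org/studies/4549",    # placeholder URL
}

xml_text = ET.tostring(build_study_record(example_study), encoding="unicode")
print(xml_text)
```

A production script would emit the element names and namespaces that the registration agency's schema requires; only the overall shape carries over.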

Weekly Process, continued

• ICPSR submits XML file to register new DOIs

• CrossRef sends email confirming the file is correct

• At that point, the DOI has an associated URL on the ICPSR Web site

Alternative Process

• Registration could happen in a script-driven manner through an API

• This would happen without human intervention

• ICPSR database could communicate with the CrossRef database with DOIs registered automatically
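The script-driven alternative could be sketched as a loop over studies that do not yet have a registered DOI. Everything here is hypothetical: the record layout, the `doi_registered` flag, and the `submit` callback, which stands in for whatever API call the registration agency actually offers. No network access is attempted.

```python
def register_pending(studies, submit):
    """Register DOIs for studies not yet registered.

    `submit(doi, url)` is a stand-in for the agency's API call and is
    expected to return True on success.
    """
    newly_registered = []
    for study in studies:
        if study.get("doi_registered"):
            continue  # already has a DOI; nothing to do
        doi = f"10.3886/ICPSR{study['study_number']:05d}"
        if submit(doi, study["url"]):
            study["doi_registered"] = True
            newly_registered.append(doi)
    return newly_registered


# Fake submitter so the sketch runs without any external service.
sent = []

def fake_submit(doi, url):
    sent.append((doi, url))
    return True


studies = [
    {"study_number": 4549, "url": "https://example.org/studies/4549",
     "doi_registered": True},
    {"study_number": 28186, "url": "https://example.org/studies/28186",
     "doi_registered": False},
]

print(register_pending(studies, fake_submit))
```

Passing the submitter in as a callback keeps the database logic separate from the agency-specific call, so the same loop could run on a schedule with no human step.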

Requests for DOIs

• Journals are requiring authors to provide persistent identifiers (PIDs) for the data they analyzed in their articles

• Authors are coming to ICPSR for DOIs pre-publication, generally depositing data into the Publication-Related Archive

Encouraging Good Practice

• Bibliography of Data-Related Literature includes 60,000 citations to publications based on ICPSR data

• Two-way linking: Studies link to publications, Bibliography links back to studies

• Widely used DOIs for data would make searching for and harvesting related publications much easier
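To make the harvesting point concrete: if publications cited data by DOI, related literature could be found with a simple pattern match. The sketch below builds a study-to-publications index from reference text. The publication IDs and reference strings are invented for illustration, and the five-digit pattern is an assumption based on the DOI examples in this talk.

```python
import re
from collections import defaultdict

# Assumes ICPSR study-level DOIs of the form 10.3886/ICPSR plus five digits.
ICPSR_DOI = re.compile(r"10\.3886/ICPSR\d{5}", re.IGNORECASE)


def index_publications_by_study(publications):
    """Map each data DOI to the publications whose text mentions it."""
    index = defaultdict(list)
    for pub_id, reference_text in publications.items():
        # DOIs are case-insensitive, so normalize before de-duplicating.
        found = {doi.upper() for doi in ICPSR_DOI.findall(reference_text)}
        for doi in sorted(found):
            index[doi].append(pub_id)
    return dict(index)


# Invented reference snippets, for illustration only.
publications = {
    "pub-A": "Data: The 500 Family Study. doi:10.3886/ICPSR04549.",
    "pub-B": "We use Eurobarometer 72.2 (doi:10.3886/ICPSR28186).",
    "pub-C": "Sources: doi:10.3886/icpsr04549; doi:10.3886/ICPSR28186",
}

links = index_publications_by_study(publications)
print(links)
```

The resulting index gives the study-to-publication direction of the two-way linking; the publication-to-study direction is the list of DOIs found in each text.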

Making Citations and DOIs More Prominent

• ICPSR provides RIS export for data citations into bibliographic citation software

• ICPSR highlights the data citation and DOI in several places
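An RIS export of a data citation might be generated along these lines. This is a sketch under assumptions: the slides do not show which RIS tags ICPSR's export uses, so the tag selection here (including the DATA reference type) is a guess at a reasonable mapping, and the record layout is hypothetical.

```python
def to_ris(record):
    """Render one data citation as an RIS record (tag, two spaces, '- ')."""
    lines = ["TY  - DATA"]                      # reference type: data file
    for author in record["authors"]:
        lines.append(f"AU  - {author}")        # one AU line per author
    lines += [
        f"TI  - {record['title']}",
        f"PY  - {record['year']}",
        f"PB  - {record['publisher']}",
        f"CY  - {record['place']}",
        f"DO  - {record['doi']}",
        "ER  - ",                               # end-of-record marker
    ]
    return "\n".join(lines)


ris_text = to_ris({
    "authors": ["Schneider, Barbara", "Waite, Linda J"],
    "title": "The 500 Family Study [1998-2000: United States]",
    "year": "2008",
    "publisher": "Inter-university Consortium for Political and Social Research",
    "place": "Ann Arbor, MI",
    "doi": "10.3886/ICPSR04549",
})
print(ris_text)
```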

For each study

Working with Vendors to Promote Links to Data

• ICPSR has a project with Thomson Reuters to display data linkages in Web of Knowledge

• Full and summary records in Web of Knowledge will link to related data when appropriate

• ICPSR is providing a periodic data feed of datasets and related publications to TR

• TR is integrating data feeds from others including UK Data Archive

Influencing Journals

• On behalf of the Data-PASS partners, ICPSR wrote to professional associations in sociology, political science, and economics

• Letters urged them to raise the standards for data citations in their journals

• Professional associations are in a position to set standards for their members and for journal editors (including copy editors)

More on Influencing Journals

• Approach was to point to the variety of ways that data were cited in specific journal issues

• The letter stressed the importance of citing data the same way that publications are cited and the value of persistent identifiers

• Organizations discussed the letters at recent national meetings

• American Sociological Review just revised its Notice to Contributors to reflect the importance of data citations and DOIs

Updating Citation Software

• ICPSR worked with EndNote (owned by Thomson Reuters) to ensure that data citations display correctly

• The result is that “Dataset” is now a Reference Type in EndNote.

• Zotero also needs adjustment for datasets

Working with the Community

• ICPSR has joined DataCite as an associate member

• ICPSR has joined ORCID – Open Researcher and Contributor ID. ORCID aims to create a central registry of unique identifiers for individual researchers

• ICPSR is heading up an IASSIST special interest group on data citation (SIGDC)

IASSIST Session

• IASSIST SIGDC has proposed a session as part of a data citation track including DataCite:

• Tracking Data Reuse: Motivations, Methods, and Obstacles -- Heather Piwowar, NESCent, University of British Columbia

• Building Data Citations for Discovery – Hailey Mooney, Michigan State University, and Mark Newton, Purdue University

• ICPSR’s Efforts to Encourage Data Citation -- Elizabeth Moss, Inter-university Consortium for Political and Social Research (ICPSR)

• Reactor Panel from SIGDC

Issues to Resolve

• With the community, address situations when data resources have multiple distributors (and multiple DOIs)

• Implement versioning in DOIs

• Address level of granularity for DOIs

• Move to DataCite

Multiple DOIs for “Same” Data

• Eurobarometer 72.2 (Nuclear Energy, Corruption, Gender Equality, Healthcare, and Civil Protection)
  DOI: doi:10.4232/1.10009
  Principal Investigator: Antonis Papacostas
  Publication Agent: GESIS - Leibniz-Institut für Sozialwissenschaften

• Papacostas, Antonis. Eurobarometer 72.2: Nuclear Energy, Corruption, Gender Equality, Healthcare, and Civil Protection, September-October 2009 [Computer file]. ICPSR28186-v1. Cologne, Germany: GESIS/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-07-19. doi:10.3886/ICPSR28186

From CrossRef’s Publisher Rules:

“CrossRef only registers DOIs for Definitive Works… but not for Duplicative Works, as defined in the CrossRef Glossary.  …CrossRef does not permit multiple DOIs to be assigned to certain closely related versions of a work… Where a CrossRef member has content which is substantially Duplicative of Definitive Works, the member must … retrieve the DOIs of Definitive Works for display in such substantially Duplicative Works and must link from the substantially Duplicative Works to the Definitive Works.”

More on Multiple DOIs

• CrossRef policy is oriented toward publications, not data

• Arrangement between ICPSR and GESIS is clear, but there are other co-distributor relationships

• How much of a problem is this and can we develop a community solution?

• Can we use the DataCite metadata kernel (relationType) to specify relationships?

• Would providing explanatory text and cross-referencing DOIs in archives’ metadata records be useful?
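One way to picture the relationType idea is a pair of reciprocal related-identifier entries, one in each archive's metadata record. The sketch below uses the DataCite property names relatedIdentifier, relatedIdentifierType, and relationType; the relation value IsIdenticalTo comes from later versions of the DataCite schema, and whether it is the right term for co-distributed data is exactly the open question raised above. The function name and dictionary layout are hypothetical.

```python
def cross_reference(doi_a, doi_b, relation="IsIdenticalTo"):
    """Build reciprocal related-identifier entries for two DOIs."""
    def entry(target):
        return {
            "relatedIdentifier": target,
            "relatedIdentifierType": "DOI",
            "relationType": relation,   # assumed value; see lead-in
        }
    # Each record points at the other distributor's DOI.
    return {doi_a: [entry(doi_b)], doi_b: [entry(doi_a)]}


related = cross_reference("10.4232/1.10009", "10.3886/ICPSR28186")
for doi, entries in related.items():
    print(doi, "->", entries)
```

With entries like these, a reader landing on either the GESIS or the ICPSR record for Eurobarometer 72.2 could be pointed to the other DOI.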

Versioning and DOIs

• ICPSR has decided to add version numbers to its DOIs

• ICPSR may not have previous versions online

• User will have to contact ICPSR for access

• So far the number of users requesting older versions has been very small
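The slide does not say what a versioned DOI will look like. Purely as an illustration, the sketch below assumes a ".vN" suffix appended to the study-level DOI, and models the access rule from the bullets: the latest version is online, and older versions require contacting ICPSR. All names here are hypothetical.

```python
import re

# Assumed form: study-level DOI followed by ".v" and a version number.
VERSIONED_DOI = re.compile(r"^(10\.3886/ICPSR\d{5})\.v(\d+)$")


def versioned_doi(study_number, version):
    """Build a versioned DOI under the assumed '.vN' convention."""
    return f"10.3886/ICPSR{study_number:05d}.v{version}"


def access_route(doi, latest_version):
    """Say how a user would reach the version named by a versioned DOI."""
    match = VERSIONED_DOI.match(doi)
    if match is None:
        raise ValueError(f"not a versioned DOI: {doi!r}")
    version = int(match.group(2))
    if version > latest_version:
        raise ValueError(f"version {version} has not been released")
    # Only the latest version is assumed to be online.
    return "study homepage" if version == latest_version else "contact ICPSR"


print(versioned_doi(4549, 1))
print(access_route("10.3886/ICPSR04549.v1", latest_version=2))
print(access_route("10.3886/ICPSR04549.v2", latest_version=2))
```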

Level of Granularity for DOIs

• ICPSR’s current practice is to assign the DOI at the study level

• DOI resolves to the study homepage, which includes Version History detailing changes to all files in the collection

• Assigning dataset-level DOIs is a challenge because ICPSR has over 65,000 datasets

• ICPSR is undertaking a large project to revamp archival management and dataset-level DOIs will be integrated in the new infrastructure

Moving to DataCite for DOIs

• DataCite offers several advantages because of its focus on data

• Metadata kernel more robust and intended to describe data

• Community of trusted data centers is a shared goal

Future Directions

• Address situations when data resources have multiple distributors and multiple DOIs

• Approach other vendors, including Google Scholar, after the TR service is deployed

• Contact other professional associations and journals

• Work with other data producers on providing visible citations and DOIs and encouraging their use

• Continue spreading the word about data citation and persistent identifiers!

Thank you…

Questions?