Enhancing Access to Data in Scholarly Research

Changing Cultures, Building Standards

Linda Beebe, Senior Director, PsycINFO

ICSTI Annual Meeting 2012

About 12 years ago Supplemental Materials emerged with a bang!

Technology allowed us to add almost anything outside the article. . .

And authors and publishers did:
◦ Text (extended methodology sections, bibliographies, survey results, derivations. . .)
◦ Tables and figures
◦ Multimedia
◦ Gene sequences, protein structures, chemical compounds, structures, 3-D images
◦ Computer programs: algorithms, code, executables
◦ Datasets, including raw research data

We had rapid, unplanned growth.

◦ No standards
◦ Very different cultures and practices from one discipline to another
◦ Inconsistent identifiers
◦ Poor metadata
◦ Lack of discovery tools
◦ Abuse of readers and reviewers

NISO-NFAIS Recommended Practices (nearing final publication)

Business Policies & Practices cover selecting, editing, hosting, assuring discoverability, referencing, packaging, maintaining links, providing context, and preserving.

Technical Recommendations emphasize metadata, persistent identifiers, preservation, packaging and exchange.

Bi-directional linking using DOIs, with emphasis on persistent linking reliability.

Flexibility and simplicity to support either a simple approach or the most detailed and granular metadata.

Clear definitions of metadata elements.

Attention to preservation and migration, including saving of objects along the migration chain.

Today the buzz is around raw data.

The following 2 slides from Howard Ratner are a good reminder of the growth.

Borrowed with permission from his talk at the December 2011 STM Innovations meeting.

Ideas generated by the STM Future Lab Committee.

Important Topic #1: API Platforms* (New Access to Content)

[Mind-map slide; labels recovered in reading order]
◦ Curiosity driven R&D
◦ Granularity of content
◦ Semantics
◦ Let the outside world in
◦ Our content your way
◦ Create cross-publisher standards
◦ Common metadata
◦ Full text formats; HTML5; XHTML
◦ API platforms
◦ Third party apps; app store
◦ Linked data; linked open data; RDF
◦ Mobile productivity; multi-device productivity
◦ Seamlessly linked platforms
◦ M-commerce; mobile
◦ Transmedia items
◦ Voice activation

*API platforms for third party developers available at Elsevier, Springer, NPG, IEEE (search). Getting ready for launch: IoPP, T&F, CABI. Many more expected to follow.

Important Topic #2: Research Data (New Presentations for Re-use)

[Mind-map slide; labels recovered in reading order]
◦ Data objects are first class research objects
◦ Make data interactive
◦ Share the actual workflow of the researcher?
◦ Graphics represent data sets; how to open them up?
◦ Actionable data
◦ Data creation
◦ What formats do users want?
◦ Common standards
◦ Authoring tools
◦ How to treat supplemental files to journals?
◦ Guidelines for reuse and sharing, incentives and barriers, editorial policies
◦ Discoverability of data
◦ Big data
◦ Deep linking
◦ Repositories; DataCite
◦ Bibliographic tools
◦ User behaviour
◦ Mendeley; CiteSeer; ColWiz; ReadCUBE
◦ Data journal

Different Cultures & Practices

“Hard sciences” such as Physics and Chemistry have a long history of handling supplemental material and requiring access to data.

Disciplines that study human subjects (psychology, sociology, health sciences) are far less likely to have such practices.

There is growing interest in standards and other support for data deposits and access.

The Divide on Data Deposits

Study of Matter
◦ AAAS: must deposit in approved repository.
◦ ACS: must submit data and deposit.
◦ AGU: must deposit data in approved repository.
◦ ASPB: must submit to journal.

Study of Humans
◦ APA: to date only expected to supply for verification.
◦ APS: no requirements
◦ ASA: no requirements posted
◦ AAA: no requirements posted

In the “softer” sciences, increased quantities of data are scattered on laptops, in file drawers, on the web—all in danger of being lost, even thrown away.

Question: how do we preserve these data and make them available for further research?

Actually, there are many questions

◦ What constitutes data?
◦ What must the author do to it?
◦ Who will maintain it?
◦ What about confidentiality?
◦ How does one cite data?
◦ . . . And many more

What constitutes data?

Webster's: factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

Chaim Zins (2006): statistical observations and other recordings or collections of evidence

NSF: any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor data streams, video, audio, algorithms, software, models and simulations, images, etc.

Altman & King: systematic compilation of measurements for machine reading; must be systematically organized and described

Report on Integration of Data and Publications, October 17, 2011. Susan Reilly, Wouter Schallier, Sabine Schrimpf, Eefke Smit, and Max Wilkinson. Retrieved 10/11/2012 from http://www.stm-assoc.org/2011_12_5_ODE_Report_On_Integration_of_Data_and_Publications.pdf

What must an author do?

Replication standard: sufficient information to enable a third party to replicate the results with no additional information from the author (King 1995). So authors must:
◦ Provide clear metadata.
◦ Code consistently and list coding instructions.
◦ Explain how data were used.
◦ Provide all raw data.
◦ Organize data in a way that can be used by others.

Making data available requires a different workflow and more work, but makes for a better scientist.

Who will maintain the data?

Natural sciences: many options, such as Crystallography, ChemStar, ChemSpider, PubChem, PANGAEA.

Life sciences: Protein DataBank is now one entity with data from former banks in US, Europe, and Japan. Also Dryad, National Biological Information Infrastructure.

Not so many options in Social Sciences.

Two options in Social Sciences

Inter-university Consortium for Political and Social Research (ICPSR)
◦ U of Michigan
◦ Data deposit and management
◦ Publication-Related Archive quickly available, but ICPSR does not process.

Institute for Quantitative Social Science (IQSS) Dataverse Network
◦ Harvard
◦ Maintains dataverses (individual repositories).
◦ Delivers formal persistent citations.

What about confidentiality and attribution?

IQSS Dataverse Network terms and conditions (paraphrased):
◦ Agree not to use materials to obtain information that could ID subjects in any way, produce links that could ID them, or do anything that could constitute invasion of privacy or breach of confidentiality.
◦ Also, will not download or use in any way prohibited by applicable law.
◦ And will always include the bibliographic citation for the data in any publication that references the data.

And how does one cite data?

Like any citation, it must contain basic elements that identify the dataset as unique: Title, Author, Date, Version, Persistent Identifier.

DataCite, the organization that manages DOIs for data, recommends: Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier.

Example: Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855
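The DataCite pattern above can be sketched as a small formatter. This is only an illustration: the class and function names are hypothetical, and Version and ResourceType are treated as optional here simply because the slide's own example omits them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Elements named on the slide; the class itself is a hypothetical helper."""
    creators: List[str]
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # assumed optional (absent in the slide's example)
    resource_type: Optional[str] = None  # assumed optional (absent in the slide's example)


def format_citation(c: DatasetCitation) -> str:
    """Render 'Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier'."""
    head = f"{'; '.join(c.creators)} ({c.publication_year}): {c.title.rstrip('.')}"
    # Keep only the elements that are present, in the order the pattern gives them.
    middle = [part.rstrip(".") for part in (c.version, c.publisher, c.resource_type) if part]
    return ". ".join([head, *middle]) + ". " + c.identifier


example = DatasetCitation(
    creators=["Irino, T", "Tada, R"],
    publication_year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
)
print(format_citation(example))
```

Run on the slide's example, the formatter reproduces the citation shown above (with an ASCII hyphen in the site number).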

Another issue: fixity. . .

How do we know the data have not changed?

Altman & King (2007) advocated the Universal Numeric Fingerprint (UNF), a short fixed-length string of numbers and characters.

Example: UNF: 3: ZNQRI1405389xOBffg?== in which the 3 is the version number and the suffix is the fingerprint. If the fingerprint changes, the set is a new version of the data.
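The idea of a fixity fingerprint can be illustrated with a generic content hash. To be clear, the sketch below is not the UNF algorithm; it is an assumed stand-in that normalizes values, hashes them with SHA-256, and returns a short fixed-length string, so that any change to the data yields a different fingerprint. The function name and the `digits` parameter are hypothetical.

```python
import base64
import hashlib


def fingerprint(rows, digits=7):
    """Return a short, fixed-length fingerprint for a table of values.

    Illustrative only; this is NOT the UNF specification.
    """
    h = hashlib.sha256()
    for row in rows:
        for value in row:
            if isinstance(value, float):
                # Normalize floats to a fixed number of significant digits so
                # that storage-format noise does not change the fingerprint.
                token = f"{value:.{digits - 1}e}"
            else:
                token = str(value)
            h.update(token.encode("utf-8") + b"\x00")  # cell separator
        h.update(b"\n")  # row separator
    # Truncate the digest to 16 bytes and encode it as 24 base64 characters.
    return base64.b64encode(h.digest()[:16]).decode("ascii")


original = [[1, 2.5, "x"], [3, 4.0, "y"]]
revised = [[1, 2.5, "x"], [3, 4.1, "y"]]
print(fingerprint(original) == fingerprint(original))  # same data, same fingerprint
print(fingerprint(original) == fingerprint(revised))   # changed data, new fingerprint
```

A citation that carries such a fingerprint lets a later reader recompute it from the deposited file and confirm that the data are the version that was cited.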

The importance of citations. . .

Just like citing other sources of information: encourages findability, credits the creator, makes any impact trackable.

Promotes more and better science, as it enables reuse and verification of data.

Rewards the data producer, which may encourage others to deposit data.

Some Advocates for a Culture of Data Citation

DataCite: very international, with members around the world (CDL & Purdue are US members; Microsoft & ICPSR are associates)

CODATA: International Council for Science, Committee on Data for Science & Technology

International Association for Social Science Information Services & Technology

Data-PASS: Data Preservation Alliance for the Social Sciences, membership organization of archives and research centers to date

Joint STM-DataCite Statement

◦ Linkability and Citability of Research Data
◦ Responsibilities for researchers, data archives, publishers
◦ Co-responsibility for bi-directional linking between datasets and publications using persistent identifiers
◦ Support for data reuse
◦ Issued in June 2012
◦ Joined by CrossRef in July
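Bi-directional linking means that a dataset points to the article and the article points back to the dataset. A minimal sketch of a check for one-way links follows; the function name, the link table, and the article DOI are hypothetical, and only the PANGAEA DOI comes from the talk.

```python
def missing_backlinks(links):
    """Find links that are not reciprocated.

    `links` maps each persistent identifier to the set of identifiers it
    points to. Returns sorted (source, target) pairs where the target does
    not link back to the source.
    """
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


dataset = "10.1594/PANGAEA.726855"      # dataset DOI cited in the talk
article = "10.9999/example.article.1"   # hypothetical article DOI

links = {
    dataset: {article},  # the data archive records the related article
    article: set(),      # the publisher has not yet recorded the dataset
}
print(missing_backlinks(links))
```

Here the article is flagged because it lacks a link back to the dataset, which is the gap the co-responsibility clause asks archives and publishers to close together.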

Need for collaboration among all major participants in the Research Cycle

◦ Researcher
◦ Institution
◦ Funder
◦ Publisher
◦ Data Manager

Funder/Research/Publishing Connections

Funder mandates for data sharing plans encourage new thinking from some disciplines.

Connection with the publications is needed.

FundRef, a new initiative within CrossRef:
◦ Collaboration between publishers and funders to make connections between grants and resulting publications
◦ Pilot for publishers to create and submit standard metadata with funder name and grant number.
◦ Working group includes several publishers and funders.
◦ http://www.crossref.org/fundref/index.html
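To show why standard funder metadata matters, the sketch below indexes publications by funder name and grant number. The record layout, field names, funders, and DOIs are all hypothetical placeholders chosen for illustration; they are not the FundRef schema.

```python
from collections import defaultdict


def publications_by_grant(records):
    """Group publication DOIs under each (funder name, grant number) pair."""
    index = defaultdict(list)
    for record in records:
        for award in record.get("funding", []):
            key = (award["funder_name"], award["grant_number"])
            index[key].append(record["doi"])
    return dict(index)


# Hypothetical publisher-supplied metadata records.
records = [
    {"doi": "10.9999/example.1",
     "funding": [{"funder_name": "Example Science Foundation", "grant_number": "ESF-001"}]},
    {"doi": "10.9999/example.2",
     "funding": [{"funder_name": "Example Science Foundation", "grant_number": "ESF-001"},
                 {"funder_name": "Example Health Institute", "grant_number": "EHI-42"}]},
    {"doi": "10.9999/example.3"},  # no funding metadata supplied
]

for (funder, grant), dois in publications_by_grant(records).items():
    print(funder, grant, dois)
```

When every publisher supplies the same two fields in the same form, a funder can look up the publications that resulted from a grant without matching free-text acknowledgments.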

ORCID Another Example

Established to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers.

Provides an open and transparent linking mechanism between ORCID, other identifiers, and research objects (pubs, grants, patents, etc.).

Governed by a board representing all stakeholders.

Launching this month. http://about.orcid.org/

And VIVO still another . . .

Designed to facilitate information exchange about research and scholarship.

Funded by NIH, National Center for Research Resources
◦ Initially, 7 academic institutions, but growing
◦ APA has instance: www.vivo.apa.org
◦ Semantic web of information to support interconnectedness and trust, and to support maintenance of research data.

Data standards, changing cultures, and new infrastructures will help us avoid the tumult we’ve experienced with supplemental materials.

In Psychology: a new model for data sharing at APA

Past expectation: psychologists do not withhold data and will share for verification of results.

New expectation: authors must agree to share data.

New Journal Open in Every Regard

Sharing broadly a rare event in the past

Psychologists worried:
◦ Potential nefarious uses: unscrupulous people could twist the data or hector the author who was trying to help.
◦ Well-intentioned but inept secondary analysis: they might get it wrong!
◦ Loss of potential publications for self: I haven’t written all my articles from this data!
◦ But most common fear: loss of academic credit for what may be years of data collection.

A radical change for psychology. . .

Archives of Scientific Psychology is very different from other APA journals in 4 regards:
◦ Authors must submit data to APA or approved repository.
◦ Journal is electronic only.
◦ It is an open access/author pays model.
◦ Authors must submit two methods sections: 1 scientific and 1 in lay language.

Data Collaboration Most Radical Aspect

Authors sign a Collaboration Agreement specifying that others may reuse their data.

Researchers who wish to reuse the data must sign a Collaboration Agreement stating:
1. They will not do anything to reveal identity of subjects.
2. They will not engage in “gotcha” publishing (running analyses to prove the author wrong and publishing the results).
3. They will offer the original data collector co-authorship.

APA’s Goals

Change the paradigm for use and reuse of data in psychological research by assuring full attribution and credit for the original creator of the data.

Contribute to the culture of transparency and prevention of fraud in science.

Maintain APA’s high standards for peer-reviewed literature and contributions to science.

The jury is still out, but manuscripts are coming in.

As all the participants in the scholarly communications process work to enhance access to data, there undoubtedly will be more revolutionary changes.

Thanks for Listening!

By building standards, we can change cultures.

Linda Beebe, Senior Director, PsycINFO

American Psychological Association

www.apa.org/pubs/index.aspx