Journal Metrics Perspective from an Open Access Publisher


Page 1

Journal Metrics Perspective from an Open Access Publisher

Martin Fenner

Technical Lead, Article-Level Metrics, Public Library of Science

Open Metrics

http://pixabay.com/en/ruler-straight-edge-tool-geometry-145940/

Page 2

Usage Stats

Most immediate metric that directly reflects usage

Only useful if data are collected in a standardized way. COUNTER is the standard: it takes HTTP status codes and double-click intervals into account, and excludes robots
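The three COUNTER-style checks named above can be sketched as a small log filter. This is a minimal illustration, not the COUNTER specification: the accepted status codes, the double-click windows, the robot markers, and the `Request` record are all assumptions made for the example and should be checked against the COUNTER Code of Practice release being reported against.

```python
from dataclasses import dataclass

# Assumed values for illustration only; verify against the COUNTER
# Code of Practice before relying on them.
VALID_STATUS = {200, 304}
DOUBLE_CLICK_WINDOW = {"html": 10, "pdf": 30}  # seconds, per format
DEFAULT_WINDOW = 10
ROBOT_MARKERS = ("bot", "crawler", "spider")   # hypothetical, far from complete


@dataclass(frozen=True)
class Request:
    """Hypothetical log record; field names are not from any real log format."""
    timestamp: float   # seconds
    session: str       # e.g. IP address or session cookie
    doi: str
    fmt: str           # "html" or "pdf"
    status: int
    user_agent: str


def count_usage(requests):
    """Return {(doi, fmt): count} after status, robot and double-click filtering."""
    counts = {}
    last_seen = {}  # (session, doi, fmt) -> timestamp of the previous valid request
    for r in sorted(requests, key=lambda req: req.timestamp):
        if r.status not in VALID_STATUS:
            continue  # failed or redirected requests are not usage
        if any(marker in r.user_agent.lower() for marker in ROBOT_MARKERS):
            continue  # exclude known robots
        key = (r.session, r.doi, r.fmt)
        previous = last_seen.get(key)
        last_seen[key] = r.timestamp
        window = DOUBLE_CLICK_WINDOW.get(r.fmt, DEFAULT_WINDOW)
        if previous is not None and r.timestamp - previous <= window:
            continue  # treated as a double click of the previous request
        counts[(r.doi, r.fmt)] = counts.get((r.doi, r.fmt), 0) + 1
    return counts
```

Two publishers that apply different windows or robot lists to the same raw logs will report different totals, which is why the numbers are only comparable when the filtering rules are shared.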

Page 3

http://open-access.net/fileadmin/OAT/OAT14/Tage-Koeln-Traue_keiner_Statistik_Recke_2014.pdf

Page 4

[Figure: PLOS ONE papers partitioned by two thresholds, Scopus Citations ≥ 10 and HTML Views ≥ 2000. Region counts as extracted: 2854, 34113, 3247, 2558]

Usage is different from scholarly citations

Metrics collected August 8, 2012

42,772 PLOS ONE Papers
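The partition behind this slide can be sketched as follows. The two thresholds are taken from the slide; the function names and the sample papers are made up for illustration. The transcript does not preserve which extracted count belongs to which region, so the sketch only checks that the four counts add up to the stated total.

```python
from collections import Counter

CITATION_THRESHOLD = 10   # Scopus citations, from the slide
VIEWS_THRESHOLD = 2000    # HTML views, from the slide


def classify(citations, html_views):
    """Assign one paper to one of the four regions defined by the two thresholds."""
    highly_cited = citations >= CITATION_THRESHOLD
    highly_viewed = html_views >= VIEWS_THRESHOLD
    if highly_cited and highly_viewed:
        return "both"
    if highly_cited:
        return "cited_only"
    if highly_viewed:
        return "viewed_only"
    return "neither"


def region_counts(papers):
    """papers: iterable of (citations, html_views) pairs."""
    return Counter(classify(c, v) for c, v in papers)


# Made-up sample, one paper per region.
sample = [(12, 2500), (15, 300), (0, 4000), (1, 150)]
print(region_counts(sample))

# Region counts as they appear on the slide; the assignment of each count
# to a region is not recoverable from the transcript.
slide_counts = [2854, 34113, 3247, 2558]
assert sum(slide_counts) == 42772  # matches "42,772 PLOS ONE Papers"
```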

Page 5

Citations

http://dx.doi.org/10.1371/journal.pone.0063184

Citations have become a proxy for scholarly impact

Many problems arise from the uncritical use of citation metrics, in particular in the assessment of individual researchers

Page 6

Citations are collected via reference lists

[Slide image: page 9 of the article "Database Citation in Full Text Biomedical Articles" (PLOS ONE, May 2013, Volume 8, Issue 5, e63184), showing the end of the discussion, the sections Outcomes, Supporting Information, Acknowledgments and Author Contributions, and the numbered reference list (11 entries)]

http://dx.doi.org/10.1371/journal.pone.0063184

Page 7

Reference lists have to be collected in a central resource and in a standard format

Page 8

http://www.crossref.org/01company/02history.html

CrossRef's specific mandate is to be the citation linking backbone for all scholarly information in electronic form

Page 9

Limitations

CrossRef citation linking built around references that have DOIs from CrossRef members – usually scholarly articles

CrossRef Cited-By service only available to CrossRef members for their own articles

CrossRef is a non-profit organization with publishers as members – no academic institutions, funders, other stakeholders

Page 10

Alternative citation indexes for non-publisher users

Page 11

Reference lists increasingly contain non-article references

http://dx.doi.org/10.1371/journal.pone.0115253

[Slide image: page 18 of 39 of the article "Scholarly Context Not Found" (PLOS ONE, December 26, 2014), section "Loss of Current and Past Context", together with Fig. 3, "STM articles and URI references per publication year - PMC corpus" (doi:10.1371/journal.pone.0115253.g003)]

Page 12

DataCite provides DOIs for academic institutions and data centers

Page 13

DataCite DOIs are not just for datasets

http://datacite.labs.orcid-eu.org/help/status

Page 14

DataCite citation linking works differently, and is separate from CrossRef

A DOI is not a DOI

http://datacite.labs.orcid-eu.org/help/status
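One way to read "A DOI is not a DOI": two identifiers with the same syntax can be registered through different agencies, and the services available for them (citation linking among them) depend on the agency. A minimal sketch, assuming a small hand-maintained prefix table; the table is illustrative and far from complete, and the function names are hypothetical.

```python
# Illustrative, incomplete table of DOI prefixes and the agency that registers them.
AGENCY_BY_PREFIX = {
    "10.1371": "CrossRef",   # PLOS journals
    "10.5061": "DataCite",   # Dryad
    "10.5281": "DataCite",   # Zenodo
    "10.6084": "DataCite",   # figshare
}

KNOWN_SCHEMES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(doi):
    """Split a DOI, optionally written as a URL, into (prefix, suffix)."""
    doi = doi.strip()
    for scheme in KNOWN_SCHEMES:
        if doi.lower().startswith(scheme):
            doi = doi[len(scheme):]
            break
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def registration_agency(doi):
    """Look up the agency for a DOI in the illustrative table above."""
    prefix, _ = split_doi(doi)
    return AGENCY_BY_PREFIX.get(prefix, "unknown")


print(registration_agency("http://dx.doi.org/10.1371/journal.pone.0063184"))
```

A tool that collects citations has to make this distinction before it decides which service to query for a given reference.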

Page 15

CrossRef and DataCite announce initiative Nov 2014

• Provide comprehensive support for interlinking between articles and data.

• Develop open APIs and open source tools to surface citations and other relationships between publications and data sets.

• Integrate into their services other existing scholarly communications initiatives such as ORCID and FundRef.

• Develop systems, workflows and best practices for using DOIs to reference large, highly granular and dynamic data.

https://www.datacite.org/CrossRefDataCiteinitiative

Page 16

There is more than usage stats and citations

RESEARCH ARTICLE

VIEWED: PLOS HTML, PLOS PDF, PLOS XML, PMC HTML, PMC PDF

SAVED: CiteULike, Mendeley

DISCUSSED: Nature Blogs, ScienceSeeker, Research Blogging, PLOS Comments, Wikipedia, Twitter, Facebook

RECOMMENDED: F1000 Prime

CITED: CrossRef, PMC, Web of Science, Scopus

Increasing Engagement (from VIEWED to CITED)

http://dx.doi.org/10.3789/isqv25no2.2013.04
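The grouping on this slide is a mapping from data sources to five engagement categories. A minimal sketch of rolling per-source counts up into those categories; the source and category names follow the slide, while the function name and the sample counts are made up for illustration.

```python
# Categories in order of increasing engagement, with their sources as listed on the slide.
CATEGORY_SOURCES = {
    "viewed": ["PLOS HTML", "PLOS PDF", "PLOS XML", "PMC HTML", "PMC PDF"],
    "saved": ["CiteULike", "Mendeley"],
    "discussed": ["Nature Blogs", "ScienceSeeker", "Research Blogging",
                  "PLOS Comments", "Wikipedia", "Twitter", "Facebook"],
    "recommended": ["F1000 Prime"],
    "cited": ["CrossRef", "PMC", "Web of Science", "Scopus"],
}

SOURCE_TO_CATEGORY = {
    source: category
    for category, sources in CATEGORY_SOURCES.items()
    for source in sources
}


def totals_by_category(counts_by_source):
    """Sum per-source counts into the five categories; unknown sources are an error."""
    totals = {category: 0 for category in CATEGORY_SOURCES}
    for source, count in counts_by_source.items():
        if source not in SOURCE_TO_CATEGORY:
            raise ValueError(f"unknown source: {source!r}")
        totals[SOURCE_TO_CATEGORY[source]] += count
    return totals


# Made-up counts for a single article.
example = {"PLOS HTML": 100, "PLOS PDF": 20, "Mendeley": 5, "Twitter": 3, "Scopus": 2}
print(totals_by_category(example))
```

Summing within a category hides differences between sources, for example between an HTML view and a PDF download, so per-source counts are worth keeping alongside the totals.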

Page 17

[Chart: x-axis "Days since publication"; date shown: August 27, 2014]

PLOS collects metrics from 22 data sources

http://alm.plos.org/articles/info:doi/10.1371/journal.pone.0105948

Page 18

New metrics not ready for impact assessment

Any metric we use should have good reliability (consistency) and validity.

More work is needed in these areas for novel assessment metrics, such as those based on Mendeley or Twitter.

http://en.wikipedia.org/wiki/Validity_(statistics)#mediaviewer/File:Reliability_and_validity.svg
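One common way to put a number on reliability in the sense of consistency is to collect the same metric twice for the same set of articles and correlate the two collections. The sketch below uses a plain Pearson correlation for that; this choice of statistic is an assumption made for illustration, not a method named in the talk, and the sample counts are made up. It says nothing about validity, which asks whether the metric measures what it is supposed to measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(variance_x * variance_y)


# Made-up counts for five articles, collected on two occasions.
first_collection = [12, 0, 45, 3, 7]
second_collection = [13, 0, 44, 3, 9]
print(round(pearson(first_collection, second_collection), 3))
```

A value close to 1 indicates that the two collections rank and scale the articles almost identically; a low value would suggest the data source itself is unstable.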

Page 19

Work on best practices and standards has started

Alternative Metrics Initiative Phase 1

White Paper

June 6, 2014

http://www.niso.org/topics/tl/altmetrics_initiative/

Phase II of the project starts in early 2015

Page 20

Altmetrics data can be obtained from commercial service providers

Page 21

https://github.com/articlemetrics/alm-report

Open source software to collect and analyze the metrics data

https://github.com/articlemetrics/lagotto

Page 22

CrossRef DOI Event Tracker (DET) Pilot

CrossRef Labs has started a pilot project to collect events around all CrossRef DOIs issued since January 2011.

A DOI Event Tracker (DET) CrossRef working group was formed in May 2014, initiated by members of the Open Access Scholarly Publishers Association (OASPA).

The service is available at http://det.labs.crossref.org and uses the Lagotto open source software.

Page 23

This presentation is made available under a CC-BY 4.0 license.

http://creativecommons.org/licenses/by/4.0/