Everything Counts in Large Amounts: Measuring the Impact of Usage Activity in Open Access Scholarly...

@openaire_eu Everything Counts in Large Amounts: Measuring the Impact of Usage Activity in Open Access Scholarly Environments DI4R December 2017, Brussels Dimitris Pierrakos, Athena Research Center Jochen Schirrwagen, Bielefeld University


Page 1: Everything Counts in Large Amounts: Measuring the Impact of Usage Activity in Open Access Scholarly Environments - #DI4R2017 session

@openaire_eu

Everything Counts in Large Amounts: Measuring the Impact of Usage Activity in Open Access Scholarly Environments

DI4R December 2017, Brussels

Dimitris Pierrakos, Athena Research Center

Jochen Schirrwagen, Bielefeld University

Page 2

● OpenAIRE infrastructure and Usage Statistics Service.

● Usage Data Collection strategies.

● Using Piwik for tracking and analytics.

● Applying COUNTER rules.

● Metrics in the Repository Manager Dashboard.

● Relation to Open Metrics and Next Generation Metrics.

Overview

Page 3

• A pan-European Research Information platform to monitor OA research outcomes from EC and other national funders.

• Research analytics tools to promote new scientific metrics & support evidence-based decision-making.

• Implementation of an OpenAIRE usage statistics service for usage data collected from data providers.

OpenAIRE 2020

Page 4

● Task in OpenAIRE2020 covers:

○ aligning policies and standards for gathering and sharing of usage data -> guidelines

○ considering legal aspects (data protection / data privacy)

○ relating usage statistics to other kinds of metrics

○ collecting and processing of usage data and producing consolidated, standards-based usage statistics

● Task team: Athena Research Center, University of Bielefeld, University of Minho, Jisc IRUS-UK, Couperin + NOADs

Usage Statistics in OpenAIRE

Page 5

● OpenAIRE collects ~21 million documents from 980 compatible data providers

● Currently 32 active data providers participate in usage statistics, plus IRUS-UK

● Usage statistics are deployed under CC0

○ in the OpenAIRE dashboard, portal and API.

Usage Statistics in the OpenAIRE Infrastructure

Page 6

● Tracking of views and downloads / collecting COUNTER reports

○ Push or Pull collection workflows.

● Anonymisation of IP addresses.

● Metadata de-duplication enables accumulation of views and downloads for the same documents.

● COUNTER Code of Practice compatibility.

○ standards-based usage statistics.

○ enables comparability with statistics from other data sources.

Usage Statistics Service Features
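
The accumulation over de-duplicated records can be illustrated with a minimal sketch. It assumes a mapping from provider-specific record identifiers to the identifier of the merged record; the identifiers, the `dedup_map` structure and the function name are hypothetical and are not taken from the OpenAIRE code base.

```python
from collections import Counter


def accumulate_by_deduplicated_record(events, dedup_map):
    """Sum views and downloads per de-duplicated record.

    events: iterable of (record_id, action) pairs, action is "view" or "download".
    dedup_map: hypothetical mapping from a provider-specific record id to the
               id of the merged (de-duplicated) record.
    """
    totals = {}
    for record_id, action in events:
        # Records without a known duplicate keep their own identifier.
        merged_id = dedup_map.get(record_id, record_id)
        totals.setdefault(merged_id, Counter())[action] += 1
    return totals


# Two repositories hold copies of the same document; a third record is unique.
events = [
    ("oai:repoA:123", "view"),
    ("oai:repoB:987", "view"),
    ("oai:repoB:987", "download"),
    ("oai:repoC:555", "view"),
]
dedup_map = {"oai:repoA:123": "dedup::1", "oai:repoB:987": "dedup::1"}

totals = accumulate_by_deduplicated_record(events, dedup_map)
print(dict(totals["dedup::1"]))
```

Keeping the per-source events and merging only at reporting time means the totals can be recomputed if the de-duplication result changes.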

Page 7

• World's leading open-source analytics platform.

• Valuable insights into website traffic and visitor activity.

• Piwik collects and stores PII (personally identifiable information).

• The operator keeps full data ownership and controls who has access.

• Robot filtering plugin.

• Compliant with EU regulations.

• Recommended by privacy organizations such as ULD (Germany) and CNIL (France).

Piwik Analytics platform

Page 8

Feature                                                        | Piwik     | Google Analytics

Number of hits per month                                       | Unlimited | 10 million

Number of user accounts per login                              | Unlimited | 10

Data storage time                                              | Unlimited | 25 months

Number of properties (websites, apps etc.) tracked per account | Unlimited | 50

Custom variables                                               | 5         | 5

Data export                                                    | Unlimited | 5000 rows

Real-time analytics                                            | Piwik offers real-time web analytics in all of its reports. | GA monitors user activity right after it happens, although the period of delay is not explicitly stated.

Piwik Facts

Page 9

[Workflow diagram. Data sources: Repository, CRIS, eJournal, National Statistics Node, Publisher. Collection paths: PUSH (tracked event), PULL (COUNTER report). Processing: IP anonymization, processing scripts. Stores: UsageStatistics-DB, Metadata-Index.]

2-Tiers Collection Workflows for Usage Statistics

Page 10

• An institutional repository is registered in Piwik.

• Server-side tracking: plugins (DSpace) or patches (EPrints) using Piwik’s HTTP API.

• Usage activity is tracked and logged on the Piwik platform in real time.

• Information is transferred offline, using Piwik’s API, to OpenAIRE’s DBs for statistical analysis.

• Statistics are deployed via OpenAIRE’s Portal or SUSHI-Lite API.

Tier-1: Push Usage Statistics Tracking Workflow
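
A server-side tracking call of the kind described above can be sketched as a request URL for Piwik's HTTP Tracking API. The query parameters used here (`idsite`, `rec`, `apiv`, `rand`, `url`, `urlref`, `ua`, `download`, `cvar`) are part of that API. Everything else is an assumption for illustration: the endpoint host, the site ID, and the choice to carry the OAI-PMH identifier in a custom variable named `oaipmhID` are not confirmed by the slides. The sketch only builds the URL and does not send it.

```python
import json
import random
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tracking_url(base_url, site_id, item_url, oai_identifier,
                       user_agent, referrer="", is_download=False):
    """Build a Piwik Tracking API request for one usage event."""
    params = {
        "idsite": site_id,                   # ID of the repository in Piwik
        "rec": 1,                            # required for the request to be recorded
        "apiv": 1,                           # Tracking API version
        "rand": random.randint(0, 10**9),    # cache buster
        "url": item_url,                     # URL of the requested item
        "ua": user_agent,                    # visitor's user agent, forwarded by the server
        # Assumption: the OAI-PMH identifier travels in custom variable slot 1.
        "cvar": json.dumps({"1": ["oaipmhID", oai_identifier]}),
    }
    if referrer:
        params["urlref"] = referrer
    if is_download:
        params["download"] = item_url        # marks the event as a download
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and identifiers, for illustration only.
tracking_url = build_tracking_url(
    base_url="https://analytics.example.org/piwik.php",
    site_id=42,
    item_url="https://repository.example.org/bitstream/1234/paper.pdf",
    oai_identifier="oai:repository.example.org:1234",
    user_agent="Mozilla/5.0",
    is_download=True,
)
query = parse_qs(urlsplit(tracking_url).query)
print(query["idsite"], query["download"])
```

Sending the event from the repository server rather than from JavaScript in the browser means file downloads that never render a page can still be tracked.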

Page 11

Parameter          | Description

idSite             | the ID of the repository

idVisit            | a visitor/session ID (an 8-byte binary string)

visitIP            | the IP address of the visitor (optionally anonymized)

action             | the action performed (view, download, outlink, etc.)

url                | the URL of the requested item

timestamp          | the date & time of the request

OAI-PMH Identifier | the Open Archives Initiative identifier of the item being viewed/downloaded

agent              | the web browser and the operating system of the visitor

referrer           | the URL that linked to the requested item

Tier-1: Piwik Tracking Parameters

Page 12

● Usage events can be considered privacy-sensitive information (user agent, IP address, ...)

● Usage statistics services must comply with data protection laws and regulations for both usage data providers and service providers

○ but the legal situation differs between countries

○ OpenAIRE must comply with the EU General Data Protection Regulation

● Tracking plugins issued by OpenAIRE anonymize usage data already on the client-side

Data Protection Aspects
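
One common way to anonymize an IP address before it leaves the repository is to zero its trailing bytes. The sketch below shows that technique with Python's standard `ipaddress` module. The number of masked bytes is an assumption; the slides do not state which scheme the OpenAIRE plugins apply.

```python
import ipaddress


def anonymize_ip(address, masked_bytes=2):
    """Zero the last `masked_bytes` bytes of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    prefix_length = ip.max_prefixlen - 8 * masked_bytes
    # strict=False lets ip_network clear the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{ip}/{prefix_length}", strict=False)
    return str(network.network_address)


print(anonymize_ip("192.0.2.123"))                  # 192.0.0.0
print(anonymize_ip("192.0.2.123", masked_bytes=1))  # 192.0.2.0
```

Masking two bytes of an IPv6 address removes far less information than it does for IPv4, so a real deployment would likely choose a larger mask for IPv6.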

Page 13

Usage Activity in real time

Page 14

Real time Visitor Map

Page 15

• Applying data processing rules according to COUNTER Code of Practice:

○ i.e. counting requests depending on session duration, tracing double-clicks

• Bot filtering

○ Piwik Bot Plugin

○ COUNTER Robots Working Group

• Link of usage event with metadata record in OpenAIRE

• Accumulate views and counts of de-duplicated records

Cleaning and Consolidation
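
The double-click rule mentioned above can be sketched as follows: repeated requests by the same visitor for the same item within a short window count once. The windows used here (10 seconds for views, 30 seconds for downloads) follow my reading of COUNTER Release 4, which distinguishes HTML from PDF requests; mapping those to "view" and "download" is an assumption, as are the event field names. The sketch is not the OpenAIRE processing script.

```python
from datetime import datetime, timedelta

# Assumed windows: HTML requests treated as views, PDF requests as downloads.
WINDOWS = {"view": timedelta(seconds=10), "download": timedelta(seconds=30)}


def filter_double_clicks(events):
    """Collapse repeated requests inside the double-click window.

    events: list of dicts with keys visitor, item, action, timestamp (datetime).
    Of a run of double-clicks, only the latest request is retained.
    """
    kept = []
    last_index = {}  # (visitor, item, action) -> position in `kept`
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["visitor"], event["item"], event["action"])
        index = last_index.get(key)
        if (index is not None
                and event["timestamp"] - kept[index]["timestamp"] <= WINDOWS[event["action"]]):
            kept[index] = event  # same click run: replace the earlier request
        else:
            last_index[key] = len(kept)
            kept.append(event)
    return kept


t0 = datetime(2017, 12, 1, 10, 0, 0)
raw_events = [
    {"visitor": "v1", "item": "A", "action": "view", "timestamp": t0},
    {"visitor": "v1", "item": "A", "action": "view", "timestamp": t0 + timedelta(seconds=5)},
    {"visitor": "v1", "item": "A", "action": "view", "timestamp": t0 + timedelta(seconds=60)},
    {"visitor": "v1", "item": "A", "action": "download", "timestamp": t0},
    {"visitor": "v1", "item": "A", "action": "download", "timestamp": t0 + timedelta(seconds=20)},
]
cleaned = filter_double_clicks(raw_events)
print(len(cleaned))  # 3: two views and one download remain
```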

Page 16

Repository Pilot Statistics

Page 17

• Gathering of consolidated statistics reports from aggregation services, such as IRUS-UK, using protocols such as SUSHI-Lite.

• Statistics are stored in OpenAIRE’s DB for statistical analysis.

• Statistics are deployed via OpenAIRE’s Portal or SUSHI-Lite API.

Tier-2: Collecting (Pull) Consolidated Usage Statistics Reports
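
Storing pulled, already consolidated counts alongside a source label might look like the sketch below. The table layout, column names and sample figures are hypothetical; the slides only indicate that the usage statistics database keeps a source per entry. SQLite is used purely so that the example is self-contained.

```python
import sqlite3


def store_monthly_counts(conn, rows):
    """Insert or refresh monthly counts harvested from an aggregation service.

    rows: iterable of (source, item_id, month, metric, hits) tuples.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_stats ("
        " source TEXT, item_id TEXT, month TEXT, metric TEXT, hits INTEGER,"
        " PRIMARY KEY (source, item_id, month, metric))"
    )
    # Re-harvesting the same month replaces the earlier figure instead of adding to it.
    conn.executemany("INSERT OR REPLACE INTO usage_stats VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def totals_by_source(conn, item_id, metric):
    """Return {source: total hits} for one item and metric."""
    cursor = conn.execute(
        "SELECT source, SUM(hits) FROM usage_stats"
        " WHERE item_id = ? AND metric = ? GROUP BY source",
        (item_id, metric),
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
store_monthly_counts(conn, [
    ("IRUS-UK", "oai:repo.example.org:1", "2017-10", "downloads", 12),
    ("IRUS-UK", "oai:repo.example.org:1", "2017-11", "downloads", 7),
])
# A later harvest corrects the October figure.
store_monthly_counts(conn, [
    ("IRUS-UK", "oai:repo.example.org:1", "2017-10", "downloads", 15),
])
result = totals_by_source(conn, "oai:repo.example.org:1", "downloads")
print(result)
```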

Page 18

[Database diagram. Recoverable labels: entityId/orid, source.]

OpenAIRE Usage Statistics DB

Page 19

● Four steps to join OpenAIRE Usage Statistics: 1. Download. 2. Configure. 3. Deploy. 4. Validate (by OpenAIRE).

● Or enter SUSHI endpoint to let OpenAIRE collect COUNTER reports

OpenAIRE Repository Manager Dashboard

Page 20

Content Provider Dashboard - Start Page

Page 21

Content Manager’s Datasource selection for Metrics

Page 22

Enable Metrics for selected Datasource

Page 23

Configure Metrics for selected Datasource


Page 24

Summarized Usage Statistics on the content provider level

Page 25

Usage Statistics on the Article Level

Page 26

● Available as beta with the help of IRUS-UK

○ http://beta.services.openaire.eu/usagestats/sushilite/

● Supports COUNTER R4 compatible reports:

○ Article Reports (AR) and Book Reports (BR) using identifiers like openaire, doi, oai-record-id

○ Item Reports (IR)

○ Repository Reports (RR) using identifiers issued by OpenAIRE or OpenDOAR

○ Journal Reports (JR) using identifiers like ISSN

SUSHI-Lite Interface
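
A request to the SUSHI-Lite interface could be assembled roughly as shown below. Only the base URL comes from the slides. The `GetReport/` path, the parameter names (`Report`, `Release`, `RequestorID`, `BeginDate`, `EndDate`, `RepositoryIdentifier`, `ItemIdentifier`, `Granularity`), the report code `RR1` and the identifier format are assumptions modelled on SUSHI-Lite conventions and should be checked against the service documentation. The sketch builds the URL without calling the service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://beta.services.openaire.eu/usagestats/sushilite/"


def build_get_report_url(report, begin_date, end_date,
                         repository_identifier="", item_identifier="",
                         requestor_id="anonymous"):
    """Assemble a report request; all parameter names are assumed, see above."""
    params = {
        "Report": report,               # e.g. a repository, item, article or journal report
        "Release": 4,                   # COUNTER release
        "RequestorID": requestor_id,
        "BeginDate": begin_date,        # YYYY-MM
        "EndDate": end_date,            # YYYY-MM
        "RepositoryIdentifier": repository_identifier,
        "ItemIdentifier": item_identifier,
        "Granularity": "Monthly",
    }
    # Leave out empty filters so the query stays readable.
    params = {name: value for name, value in params.items() if value != ""}
    return BASE_URL + "GetReport/?" + urlencode(params)


# Hypothetical repository identifier, for illustration only.
report_url = build_get_report_url(
    report="RR1",
    begin_date="2017-01",
    end_date="2017-06",
    repository_identifier="opendoar:0000",
)
report_query = parse_qs(urlsplit(report_url).query)
print(report_url)
```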

Page 27

[Two example responses shown side by side: Repository Report, Item Report]

SUSHI response example (JSON)

Page 28

• Quantitative indicators for research

○ Governance

○ Management

○ Assessment

• Dimensions

○ Robust metrics in terms of accuracy and scope;

○ Humble metrics recognizing that quantitative evaluation should support qualitative, expert assessment;

○ Open and Transparent metrics;

○ Diverse metrics by field in order to support the plurality of research and researcher career paths across the system;

○ Reflexive metrics for recognising, anticipating and updating the systemic and potential effects of indicators.

OpenAIRE: A Usage Statistics Hub for Responsible Metrics

Page 29

• Standardization: following COUNTER Code of Practice

○ by update to COUNTER R5

○ by contribution to COUNTER Robots Working Group

• Put usage statistics into context with conventional and alternative metrics and (open) peer review

Considering the HLEG Altmetrics Recommendations

Page 30

● Develop Piwik plugins for other repository platforms (e.g. Fedora, Samvera)

● Promote the service to content provider managers

● Support national usage statistics initiatives to become a node in OpenAIRE Usage Statistics

● Contribute to the Open Metrics concept and vision

● Activities in OpenAIRE-Advance starting in 2018:

○ support LA Referencia to set up a regional usage statistics network and interlink

○ working towards Open Metrics

Next Steps

Page 31

● Standardize usage statistics to enable assessment of research impact

○ Standardize usage statistic metrics across OpenAIRE and EOSC-hub

○ Collaborate with RDA (e.g. Make Data Count BoF working group)

○ Promote common guidelines to and across communities

○ Take EC rules and GDPR regulations into account

● Enable the collection/aggregation of usage stats from content providers

○ Adopt OpenAIRE and EOSC-hub services for collecting user statistics, services in scope:

■ EGI: Accounting System, AppDB

■ EUDAT: DPMT, B2SHARE, B2FIND, B2SAFE

○ Adopt OpenAIRE Usage Statistics Services to collect user stats for all products of science

■ e.g. literature, datasets, software, research objects

■ Integrating with EOSC-hub services for usage statistics/metrics

Collaboration with EOSC-hub

Page 32

● OpenAIRE Usage Statistics Deliverable Report

○ https://doi.org/10.5281/zenodo.1034163

● Repository Tracking Plugins (GitHub)

○ https://github.com/openaire/OpenAIRE-Piwik-DSpace

○ https://github.com/openaire/EPrints-OAPiwik

● SUSHI-Lite API (beta)

○ http://beta.services.openaire.eu/usagestats/sushilite/

References