
Project Number 318772

D2.3 – Instantiation of the OSSMETER metamodels to Represent Real-life OSS Projects

Version 1.1

30 June 2014

Final

Public Distribution

University of L’Aquila

Project Partners: Centrum Wiskunde & Informatica, SOFTEAM, Tecnalia Research and Innovation, The Open Group, University of L’Aquila, UNINOVA, University of Manchester, University of York, Unparallel Innovation

Every effort has been made to ensure that all statements and information contained herein are accurate, however the OSSMETER Project Partners accept no liability for any error or omission in the same.

© 2014 Copyright in this document remains vested in the OSSMETER Project Partners.


Project Partner Contact Information

Centrum Wiskunde & Informatica
Paul Klint
Science Park 123
1098 XG Amsterdam, Netherlands
Tel: +31 20 592 4126
E-mail: [email protected]

SOFTEAM
Alessandra Bagnato
Avenue Victor Hugo 21
75016 Paris, France
Tel: +33 1 30 12 16 60
E-mail: [email protected]

Tecnalia Research and Innovation
Jason Mansell
Parque Tecnologico de Bizkaia 202
48170 Zamudio, Spain
Tel: +34 946 440 400
E-mail: [email protected]

The Open Group
Scott Hansen
Avenue du Parc de Woluwe 56
1160 Brussels, Belgium
Tel: +32 2 675 1136
E-mail: [email protected]

University of L’Aquila
Davide Di Ruscio
Piazza Vincenzo Rivera 1
67100 L’Aquila, Italy
Tel: +39 0862 433735
E-mail: [email protected]

UNINOVA
Pedro Maló
Campus da FCT/UNL, Monte de Caparica
2829-516 Caparica, Portugal
Tel: +351 212 947883
E-mail: [email protected]

University of Manchester
Sophia Ananiadou
Oxford Road
Manchester M13 9PL, United Kingdom
Tel: +44 161 3063098
E-mail: [email protected]

University of York
Dimitris Kolovos
Deramore Lane
York YO10 5GH, United Kingdom
Tel: +44 1904 325167
E-mail: [email protected]

Unparallel Innovation
Nuno Santana
Rua das Lendas Algarvias, Lote 123
8500-794 Portimão, Portugal
Tel: +351 282 485052
E-mail: [email protected]


Contents

1 Introduction
1.1 Structure of the deliverable

2 Related Work
2.1 Ohloh
2.2 FLOSSmole
2.3 SourceForge Research Data Archive (SRDA)
2.4 GHTorrent
2.5 MARKOS
2.6 DOAP

3 OSSMETER Importers
3.1 The Eclipse importer
3.2 The SourceForge importer
3.3 The GitHub importer
3.4 The GoogleCode importer
3.5 The Redmine importer

4 OSSMETER Importers Execution

5 Conclusions

A Appendix


Document Control

Version | Status | Date
0.1 | Document outline | 18 April 2014
0.2 | First draft | 4 May 2014
0.7 | First full draft | 24 May 2014
0.8 | Further editing draft | 30 May 2014
1.0 | For internal review | 19 June 2014
1.1 | QA review | 30 June 2014


Executive Summary

This document presents the software components (hereafter called importers) that have been developed to automatically generate models conforming to the OSSMETER metamodels defined in D2.1 - Domain Analysis of OSS Projects and subsequently refined and validated in D2.2 - Metamodels for Describing OSS projects. The developed metamodels are organized in a hierarchy whose top metamodel defines the forge-agnostic aspects shared by open source projects. This common metamodel is refined and extended by further metamodels, each carrying forge-specific information. We developed forge-specific metamodels for Eclipse, GitHub, SourceForge, GoogleCode and Redmine, and implemented a corresponding importer for each of them. The importers are integrated in the OSSMETER platform to automatically retrieve the metadata of the projects to be monitored. Such metadata consists of repository URLs and further information that enables the application of the analysis tools developed in the other work packages.

Existing work that collects project metadata from different forges is discussed and compared with what OSSMETER does in this respect.

We discuss the execution of the developed importers to collect metadata from different forges. Based on the performed experiments, we draw conclusions on, e.g., how much time the importers require to download the metadata of the projects at hand.

We conclude the report by discussing how we intend to continue the work, with the aim of managing and scheduling the importers in the OSSMETER platform that is discussed in deliverable D5.3 - Component Integration (Interim).


1 Introduction

Deciding whether an open source software (OSS) project meets the required standards for adoption in terms of quality, maturity, activity of development and user support is not a straightforward process, as it involves exploring various sources of information. The decision becomes even more challenging when one needs to discover and compare several OSS projects that offer software of similar functionality. This task involves exploring a variety of sources of information including the project’s source code repositories, communication channels, and bug tracking systems. To deal with such problems, the OSSMETER project is developing an extensible, scalable, cloud-based platform to monitor, analyse, and present the wealth of data available in OSS projects in a systematic and automated manner.

The OSSMETER platform will be able to analyse the lifecycle of open source projects. With the term lifecycle we mean all the activities around an open source software project that aim at increasing its quality (e.g., how frequently new software releases are issued, how active the software developers are). To this end, the sources of information that OSSMETER will consider are manifold, i.e., source code repositories, communication channels and bug tracking systems. Considering such heterogeneous sources of information to support automated measurements of open source software represents a key novelty with respect to existing systems (e.g., SOFAS, Boa, Ohloh, Alitheia-Core). A detailed comparison of such systems with OSSMETER is discussed in deliverable D5.3.

In OSSMETER, metamodels play a key role since they specify the aspects of OSS projects that have to be considered for analysis and comparison purposes. In D2.1 - Domain Analysis of OSS Projects a first version of the OSSMETER metamodels was defined with the aim of creating models that represent different aspects of OSS projects in a homogeneous manner (e.g., types and details of source code repositories, communication channels and bug tracking systems, types of licences). Subsequently, such metamodels were refined and validated in D2.2 - Metamodels for Describing OSS projects. The developed metamodels are organized as a set of forge-specific metamodels, which capture relevant project information such as a software repository’s URL or the number of downloads. The forge-specific metamodels refine a forge-agnostic metamodel that consists of key concepts shared by any software forge.

To support the automatic import of projects to be monitored, a set of metadata importers has been implemented. These importers extract the information defined in the aforementioned metamodels from the various forges and store it in the database, so that it can be used by the platform and the metric providers. In particular, importers are executed before the project metrics to retrieve the information required by the metric providers, such as the addresses of the source repositories, newsgroups, and bug tracking systems.

In this document we present the importers we have developed, advancing the work of WP2 according to the following description of deliverable D2.3 in the OSSMETER DoW:

Deliverable D2.3: The deliverable will provide a model-based infrastructure based on EMF/Ecore to support the specification OSS projects in terms of models conforming to the metamodels in D2.2. The outcome of the deliverable D2.3 will be used by the platform in WP5.

In this activity a number of issues have been identified and managed. In particular, there are two kinds of forges with respect to the way they export project metadata, i.e.:


– forges like GitHub or SourceForge providing open APIs or Web Services that can be used to extract the data in a structured way;

– forges like GoogleCode providing no further information than the project pages in HTML.
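The difference between the two kinds of forges can be illustrated with a minimal Python sketch. It is not taken from the OSSMETER importers: the JSON payload, the HTML page, the element identifiers (`pname`, `repo`) and the output field names are all hypothetical, chosen only to contrast reading a structured response with scraping a project page.

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads; real forges return different, richer structures.
API_RESPONSE = '{"name": "demo", "clone_url": "https://example.org/demo.git"}'
HTML_PAGE = (
    '<html><body><h1 id="pname">demo</h1>'
    '<a class="repo" href="https://example.org/svn/demo">Source</a>'
    "</body></html>"
)


def import_from_api(payload: str) -> dict:
    """Structured case: the forge answers with machine-readable JSON."""
    data = json.loads(payload)
    return {"name": data["name"], "repository": data["clone_url"]}


class ProjectPageParser(HTMLParser):
    """Unstructured case: recover the same fields by scraping HTML."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("id") == "pname":
            self._in_name = True  # the project name is the text of this element
        elif tag == "a" and attrs.get("class") == "repo":
            self.metadata["repository"] = attrs.get("href")

    def handle_data(self, data):
        if self._in_name:
            self.metadata["name"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_name = False


def import_from_html(page: str) -> dict:
    parser = ProjectPageParser()
    parser.feed(page)
    parser.close()
    return parser.metadata
```

Both functions produce the same kind of record, but the HTML variant depends on the page layout and has to be revised whenever the forge changes its markup, which is why forges of the second kind are harder to support.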

Furthermore, since there is a huge amount of data to be managed in efficient and scalable ways, a NoSQL database has been used. In this respect, WP5 identified MongoDB as a candidate database manager. Moreover, to make the development process easier, WP5 has developed a Java POJO generator for MongoDB that uses annotated EMF files to generate all the infrastructure needed to persist, search and read the data from a MongoDB database [5].

It is important to recall that the OSSMETER metamodels, and consequently the corresponding importers, will be continuously refined to address unforeseen requirements and to extend their expressiveness with respect to the information required by new metric providers that will be developed.

1.1 Structure of the deliverable

The deliverable is structured as follows:

– in Section 2 we review the work collecting project metadata from different forges and we identify commonalities and differences with the OSSMETER importers;

– in Section 3 we present the developed OSSMETER importers able to retrieve and collect project metadata from the different forges;

– in Section 4 we discuss the experiments we have done by executing all the importers presented in Section 3;

– in Section 5 we conclude this document.


2 Related Work

In this section we give an overview of representative platforms that collect the metadata of software projects from different forges. The collected information is used for various purposes, e.g., to permit researchers to study the Free/Open Source Software phenomenon, and to enable OSS project discovery and comparison.

2.1 Ohloh

Ohloh [1] is a free but proprietary and closed source system, owned and operated by Black Duck Software. It provides OSS project classification facilities (through user-defined tags), enables OSS project discovery and comparison, and presents source-code and activity-related metrics in an intuitive and understandable manner. Ohloh gathers project metadata from different forges and analyses information related to the source code of OSS projects; it does not take communication channels or bug tracking systems into consideration.

Ohloh is certainly a relevant source of information for OSSMETER, even though there are both legal and technical issues that might compromise the re-use of Ohloh data by OSSMETER. In particular, according to the Black Duck Software: Ohloh Terms of Use (footnote 1):

“[...] Other than as expressly set forth in these Terms, you may not copy, modify, publish, transmit, upload, participate in the transfer or sale of, reproduce, create derivative works based on, distribute, perform, display, or in any way exploit, any of the Content, software, materials, or Sites in whole or in part without written permission from Black Duck. Your permission to use and access the Sites and Content will terminate immediately, without any further action by Black Duck, if you violate any of these Terms. Black Duck may change, suspend, or discontinue the Sites at any time, including the availability of any Content. Black Duck may restrict your access to the Sites or Content or impose limits on such access at any time, without prior notice. [...]”

Moreover, from a technical point of view, no automated mechanism other than the API is allowed to access the site. Unfortunately, the use of the API is restricted to a limit of 1,000 calls per day.

Given these issues, at the moment we are not importing data from Ohloh. As we show in the next sections, we do not need to do so, since the information that we are able to retrieve directly from the forges is sufficient for applying the currently developed project analyses, as discussed in deliverable D5.3 - Component Integration (Interim).

1 http://meta.ohloh.net/terms/

2.2 FLOSSmole

The intention of FLOSSmole [2] (formerly OSSmole) is to provide a high-quality and widely used data set of OSS projects. The OSSmole data model is designed to support data collection, storage and analysis from multiple open source forges. The tool mainly supports the collection of metadata such as a project’s name, the programming language(s) a project is implemented in, supported platform(s), license types, and developer information. This metadata can then be used to construct basic summary reports about the state of the open source community as well as for conducting social network analysis studies.

Forge | Data collection details | Last import
Free Software Foundation | Project metadata are extracted by parsing the HTML pages of the FSF directory and saved to the database. | September 2012
Freshmeat | Every month Freecode’s own RDF file is downloaded, parsed, and loaded into the FLOSSmole database in order to gather information about the managed projects. | March 2014
GitHub | No details available about the data collection process. | February 2013
Google Code | No details available about the data collection process. | November 2012
Launchpad | No details available about the data collection process. | September 2012
Objectweb | Every month data from the OW2 web site are downloaded by creating a list of all the hosted projects, and then by parsing the HTML page of each project. Relevant project information is gathered from these web pages and saved to the database. | March 2014
Rubyforge | Every month (or so) the Rubyforge list of projects is collected and some basic developer information is calculated. | May 2014
Savannah | No details available about the data collection process. | March 2014
SourceForge | From 2004 to 2009, approximately six times per year (every other month), FLOSSmole collected, parsed, and stored metadata about each of the projects on SourceForge. For periods after 2009 FLOSSmole recommends the use of the SourceForge Research Data Archive (SRDA). | June 2009
SourceKibitzer | SourceKibitzer was an initiative to collect metrics about the performance of various open source software products. SourceKibitzer sent FLOSSmole their data on a regular basis from February 2007 through September 2007. | September 2007
Tigris | No details available about the data collection process. | March 2014

Table 1: FLOSSmole data collection

Anyone can download the FLOSSmole raw and summary data. The raw data is provided as multiple text file "data dumps" from the FLOSSmole database. Summary files are compiled periodically and show basic statistics [4], e.g., the number of projects using a particular open source license type, the number of projects in a particular forge by month and year, and the number of projects that are written using each programming language. The data gathered using FLOSSmole can be compared to the data that are extracted to enable the analysis performed by the OSSMETER platform as discussed in D5.3.


According to the database schema available at http://flossmole.org/content/database-schema-0 and to the data collection details discussed at http://flossmole.org/collection_details, the forges that are or were supported by FLOSSmole are: Free Software Foundation, Freshmeat, GitHub, Google Code, Launchpad, Objectweb, Rubyforge, Savannah, SourceForge, SourceKibitzer, and Tigris. Table 1 shows how data are collected from such forges and when the last import operations were performed.

Currently, FLOSSmole does not provide any API that we could use to retrieve project metadata and thus create OSSMETER models. However, as soon as such an API becomes available, we may consider extending the OSSMETER importers. Such an evaluation will also take into account whether the data of the forges we are interested in are still being updated (see Table 1).

2.3 SourceForge Research Data Archive (SRDA)

The SRDA is hosted by the Department of Computer Science & Engineering, University of Notre Dame (footnote 4). The data are provided directly by Dice (the company owning SourceForge.net), which decided to share certain SourceForge data with the University of Notre Dame for the sole purpose of supporting academic and scholarly research on the Free/Open Source Software phenomenon. In particular, on a monthly basis, a complete dump of the databases (minus the data dropped for privacy and security reasons) is shared with Notre Dame. The Notre Dame researchers have built a data warehouse comprised of these monthly dumps, with each dump stored in a separate database. Thus, each monthly dump is a snapshot of the status of all the SourceForge.net projects at that point in time. As of November 2005, the data warehouse was almost 300 GBytes in size and was growing at about 25 GBytes per month. To help researchers determine what data is available, an ER-diagram and the definitions of tables and views in the data warehouse are provided. For each month, the data warehouse includes three major parts:

• the tables supporting the SourceForge.net web site, e.g., the tables user and group;

• the tables used to store the statistics of the whole community, including daily page accesses and downloads;

• the tables with the history information on the other tables.

Dice has given Notre Dame permission to share this data with other academic researchers studying the Free/Open Source Software phenomenon. To get the data, researchers have to send a request to the Notre Dame PI (Greg Madey). Only academic and scholarly researchers are eligible to receive the data, after having completed, signed and returned a questionnaire and an agreement (footnote 5). In particular, a researcher who has been granted access to the data receives a user id and password that provide access to a Web-based form permitting direct SQL queries against the data archive.

As in the case of Ohloh, and also considering the existing integration issues due to the lack of a public API, we are currently importing SourceForge data directly from SourceForge.net. We have postponed to a later phase of the project the decision on whether to exploit the data available in the SRDA upon an agreement with the University of Notre Dame.

4 http://srda.cse.nd.edu/
5 http://www.nd.edu/~oss/Data/Sublicense5.pdf


2.4 GHTorrent

The GHTorrent project [3] uses the GitHub API to collect raw data and to extract, archive and share queryable metadata. GHTorrent monitors the GitHub public event timeline. For each event, it exhaustively retrieves its contents and their dependencies. It then stores the raw JSON responses in a MongoDB database, while also extracting their structure into a MySQL database (footnote 6). The project does not provide any analysis of the data; it is limited to collecting and sharing GitHub data. Every two months, the project releases the data collected during that period as downloadable archives (footnote 7), which are also shared via the BitTorrent protocol. As described in [3], the GitHub API supports two types of queries:

• resource queries retrieve a specific instance of a resource. According to the REST architecture, the URL identifying a static resource remains constant after the resource has been initialized;

• range queries retrieve a list of resources, usually related to a given resource, that typically changes over time. For instance, the query /{user}/followers retrieves the followers of a user, while the query /{user}/{repo}/commits retrieves the commits of a repository.
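The two query types can be sketched in Python as follows. The path templates are the ones quoted above; everything else is an assumption made for illustration: the base URL is a placeholder rather than the real GitHub endpoint, the `fetch` callable stands in for an HTTP client, and the choice to cache resource queries while always re-issuing range queries is one possible reading of the distinction, not a description of how GHTorrent is implemented.

```python
from urllib.parse import quote

BASE_URL = "https://api.example.org"  # placeholder, not the real API endpoint

# Path templates of the two range queries mentioned in the text.
RANGE_QUERIES = {
    "followers": "/{user}/followers",
    "commits": "/{user}/{repo}/commits",
}


def range_query_url(kind: str, **params: str) -> str:
    """Fill a range-query template with URL-escaped parameters."""
    escaped = {key: quote(value, safe="") for key, value in params.items()}
    return BASE_URL + RANGE_QUERIES[kind].format(**escaped)


class QueryClient:
    """Caches resource queries; range queries are always re-issued."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: URL -> decoded response
        self._resources = {}

    def resource(self, url):
        # Assumption: a resource URL keeps identifying the same instance,
        # so the first response can be reused.
        if url not in self._resources:
            self._resources[url] = self._fetch(url)
        return self._resources[url]

    def range(self, url):
        # The list behind a range query changes over time, so no caching.
        return self._fetch(url)
```

A caller would pass its own HTTP function as `fetch`; in a test, a stub that records the requested URLs is enough to observe that repeated resource queries hit the network once while repeated range queries do not share results.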

According to the OSSMETER vision, the data collected by the developed importers mostly consist of the information needed to apply the metric providers, each of which implements a specific analysis. Consequently, the commits of a given project managed by GitHub are not directly represented in the corresponding OSSMETER project model, which contains only static metadata including the URL of the version control system used by that project. A specific metric provider will exploit such metadata and access the repository of the project by means of the URL stored in the model. Using GHTorrent in OSSMETER might therefore be overkill, since OSSMETER requires less data than GHTorrent collects. In this respect, we have developed a GitHub importer that collects only the metadata required by OSSMETER, which is a subset of that collected by GHTorrent. In principle, the data collected with such a specific importer can also be more up to date than the data provided by GHTorrent, which are renewed only every two months.

2.5 MARKOS

The Market for Open Source - An Intelligent Virtual Open Source Project (MARKOS) (footnote 8) will realize the prototype of an automatic service providing an integrated view on the open source software available on the web. The service will provide semantic querying and browsing tools to inspect the structure of the software code, showing the defined software entities (components, classes, libraries, etc.), and their dependencies across different projects. Moreover, it will support its users in the legal analysis of open source software, to identify the reasons for possible license violations.

OSSMETER shares most of the final goal of MARKOS, aiming to support users in deciding whether an open source software (OSS) project meets the required standards for adoption in terms of quality, maturity, activity of development and user support. To this end OSSMETER will implement an extensible platform able to explore a variety of sources of information including the project’s source code repositories, communication channels, and bug tracking systems.

6 http://ghtorrent.org/
7 http://ghtorrent.org/downloads.html
8 http://www.markosproject.eu/

MARKOS has not yet released any API we could rely on to import project metadata. As soon as MARKOS releases its crawlers (i.e., software tools similar to the OSSMETER importers presented in this document), we will investigate to what extent we can integrate with them. However, there is already an established collaboration with MARKOS on the management of licensing aspects, which play a key role in MARKOS but are currently neglected in OSSMETER.

2.6 DOAP

DOAP (Description of a Project) (footnote 9) is a project to create an XML/RDF vocabulary to describe software projects, and in particular open source projects. In addition to developing an RDF schema and examples, the DOAP project aims at providing tool support in all the popular programming languages. In other words, a DOAP descriptor is a machine-readable document that is used to share information about a project. A DOAP descriptor can be used for (footnote 10):

• easy importing of projects into directories

• automated updating of directories

• data exchange between directories

• automatic configuration for resources such as mailing lists, shared repositories and issue trackers

• assisting package maintainers who bundle resources for distributors

9 https://github.com/edumbill/doap/wiki
10 http://oss-watch.ac.uk/resources/doap

Table 2: Overview of the DOAP Vocabulary

RDF Class | RDF Property | Description
Project | homepage : Literal | URL of a project’s homepage
Project | old-homepage : Literal | URL of a project’s past homepage, associated with exactly one project.
Project | name : Literal | Name of the project.
Project | created : Literal | Date when the project was created, in YYYY-MM-DD form, e.g. 2004-04-05
Project | shortdesc : Literal | Short (8 or 9 words) plain text description of a project.
Project | description : Literal | Plain text description of a project, of 2-4 sentences in length.
Project | license : Literal | The URI of an RDF description of the license the software is distributed under, e.g. a SPDX reference
Project | release : Version | A project release.
Project | mailing-list : Literal | Mailing list home page or email address.
Project | category : Literal | A category of project.
Project | repository : Repository | Source code repository.
Project | download-page : Literal | Web page from which the project software can be downloaded.
Project | download-mirror : Literal | Mirror of software download web page.
Project | wiki : Literal | URL of Wiki for collaborative discussion of project.
Project | bug-database : Literal | Bug tracker for a project.
Project | screenshots : Literal | Web page with screenshots of project.
Project | maintainer : Person | Maintainer of a project, a project leader.
Project | developer : Person | Developer of software for the project.
Project | documenter : Person | Contributor of documentation to the project.
Project | translator : Person | Contributor of translations to the project.
Project | tester : Person | A tester or other quality control contributor.
Project | helper : Person | Project contributor.
Project | programming-language : Literal | Programming language a project is implemented in or intended for use with.
Project | os : Literal | Operating system that a project is limited to. Omit this property if the project is not OS-specific.
Project | implements : Specification | A specification that a project implements. Could be a standard, API or legally defined level of conformance.
Project | service-endpoint : Resource | The URI of a web service endpoint where software as a service may be accessed
Project | language : Literal | ISO language code a project has been translated into
Project | vendor : Organization | Vendor organization: commercial, free or otherwise
Project | platform : Literal | Indicator of software platform (non-OS specific), e.g. Java, Firefox, ECMA CLR
Project | audience : Literal | Description of target user base
Project | blog : Resource | URI of a blog related to a project
Version | revision | Revision identifier of a software release.
Version | file-release | URI of download associated with this release
Version | os | Operating system that a version is limited to.
Version | platform | Indicator of software platform (non-OS specific), e.g. Java, Firefox, ECMA CLR
Specification | | A specification of a system’s aspects, technical or otherwise. In particular it is used to represent a specification that a project implements. Could be a standard, API or legally defined level of conformance.
Repository | anon-root : Literal | Repository for anonymous access
Repository | browse : Literal | Web browser interface to repository
Repository | location : Literal | Location of a repository
SVNRepository, BKRepository, CVSRepository, ArchRepository, BazaarBranch, GitBranch, HgRepository, DarcsRepository | Those of the superclass. | These are all specializations of Repository.
Person | surname : Literal | Surname of the given person.
Person | familyName : Literal | Family name of the person.
Person | firstName : Literal | First name of the person.
Person | workInfoHomepage : Literal | Home page of the person.

Listing 1 shows the DOAP document of the gnome-bluetooth project (footnote 11). The main concepts of the DOAP vocabulary, as defined in the corresponding RDF schema (footnote 12), are shown in Table 2 and consist of 12 RDF Classes and 37 RDF properties.

The DOAP concepts are similar to those available in the OSSMETER metamodels. We took the DOAP project into account when implementing such metamodels, although we preferred to define a set of interrelated metamodels instead of a single vocabulary. In particular, in OSSMETER we have a forge-agnostic metamodel (as recalled in the next section and shown in Fig. 3) amenable to extensions and refinements. We have then defined forge-specific metamodels to properly represent the concepts of the SourceForge, Eclipse, GoogleCode, Redmine, and GitHub forges. Moreover, we have also defined metamodels to properly capture the concepts of the different VCS technologies, communication channels, and bug tracking systems.

<?xml version="1.0" encoding="UTF-8"?>
<Project xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns="http://usefulinc.com/ns/doap#">

  <name>GNOME Bluetooth Subsystem</name>
  <homepage rdf:resource="http://usefulinc.com/software/gnome-bluetooth" />
  <created>2002-11-11</created>

  <shortdesc xml:lang="en">
    Support for Bluetooth devices on the GNOME desktop.
  </shortdesc>

  <description xml:lang="en">
    The GNOME Bluetooth Subsystem is a suite of applications and a
    developer platform for managing Bluetooth devices from the
    GNOME desktop.
  </description>

  <mailing-list rdf:resource="http://lists.usefulinc.com/mailman/listinfo/gnome-bluetooth/" />

  <maintainer>
    <foaf:Person>
      <foaf:name>Edd Dumbill</foaf:name>
      <foaf:homepage rdf:resource="http://usefulinc.com/edd" />
    </foaf:Person>
  </maintainer>

  <release>
    <Version>
      <name>unstable</name>
      <created>2003-06-07</created>
      <revision>0.4.1</revision>
    </Version>
  </release>

  <license rdf:resource="http://spdx.org/licenses/GPL-2.0+" />
  <bug-database rdf:resource="http://bugzilla.gnome.org/" />
  <screenshots rdf:resource="http://usefulinc.com/software/gnome-bluetooth/#shots" />
</Project>

Listing 1: The gnome-bluetooth-doap.rdf DOAP document

11 https://github.com/edumbill/doap/blob/master/examples/gnome-bluetooth-doap.rdf
12 https://raw.githubusercontent.com/edumbill/doap/master/schema/doap.rdf

According to the information available on the DOAP web pages, the following sites use DOAP to present project information (footnote 13):

• http://projects.apache.org/ : the Apache Software Foundation uses DOAP files to manage a registry of projects housed at the foundation (see http://projects.apache.org/docs/index.html, which provides a pointer to the list of DOAP files).

• http://pypi.python.org/ : the Python Package Index has DOAP available for each package.

• http://www.dolibarr.org/ : the Dolibarr ERP/CRM has a DOAP to describe Dolibarr software.

• http://git.gnome.org/repositories.doap : the GNOME project uses DOAP profiles for its projects to generate descriptions of the projects in the Git web interface.

• http://pear.php.net/package/Auth_PrefManager2/doap : PEAR, the PHP Extension and Application Repository, has DOAP available for each package.

• https://joinup.ec.europa.eu/asset/adms_foss/description : the Asset Description Metadata Schema for Software, made by the European Commission, uses DOAP as part of the schema.

• http://packages.qa.debian.org/ : the Debian Package Tracking System uses DOAP to provide RDF exports of the available packages.

• http://ontologi.es/cpan-data/dist/RDF-Trine/project : DOAP data is provided for each package in CPAN, the Comprehensive Perl Archive Network.

Given the commonalities between DOAP and the OSSMETER metamodels, developing an importer that extracts data from DOAP documents and imports them into OSSMETER is not complex. Consequently, we could plan to do so if, during the project, the consortium decides to also analyse projects that are managed by one or more of the sites listed above.
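To give an idea of how small such an importer could be, the following Python sketch reads a trimmed copy of the DOAP document of Listing 1 and maps it onto a flat record. It is only an illustration: the function names and the record keys (borrowed from the attribute names used in the tables of Section 3) are assumptions, and no such importer is part of the work described in this document.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" notation.
DOAP = "{http://usefulinc.com/ns/doap#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
FOAF = "{http://xmlns.com/foaf/0.1/}"

# Trimmed copy of Listing 1 (XML declaration omitted).
SAMPLE_DOAP = """
<Project xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns="http://usefulinc.com/ns/doap#">
  <name>GNOME Bluetooth Subsystem</name>
  <homepage rdf:resource="http://usefulinc.com/software/gnome-bluetooth" />
  <mailing-list rdf:resource="http://lists.usefulinc.com/mailman/listinfo/gnome-bluetooth/" />
  <maintainer>
    <foaf:Person>
      <foaf:name>Edd Dumbill</foaf:name>
    </foaf:Person>
  </maintainer>
  <license rdf:resource="http://spdx.org/licenses/GPL-2.0+" />
  <bug-database rdf:resource="http://bugzilla.gnome.org/" />
</Project>
"""


def resources(root, tag):
    """Collect the rdf:resource URLs of every direct child named `tag`."""
    return [node.get(RDF + "resource") for node in root.findall(DOAP + tag)]


def import_doap(xml_text):
    """Map a DOAP <Project> onto a flat record (hypothetical key names)."""
    root = ET.fromstring(xml_text)
    maintainer_names = root.findall(f"{DOAP}maintainer/{FOAF}Person/{FOAF}name")
    return {
        "name": root.findtext(DOAP + "name", default="").strip(),
        "homePage": next(iter(resources(root, "homepage")), None),
        "communicationChannels": resources(root, "mailing-list"),
        "bugTrackingSystems": resources(root, "bug-database"),
        "licenses": resources(root, "license"),
        "persons": [node.text.strip() for node in maintainer_names],
    }


project = import_doap(SAMPLE_DOAP)
print(project["name"], project["licenses"])
```

The record keys would have to be replaced by calls to the Pongo-generated classes in a real importer; the sketch only shows that the DOAP elements line up with the forge-agnostic concepts.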

13 https://github.com/edumbill/doap/wiki/Sites


3 OSSMETER Importers

In the previous deliverables D2.1 and D2.2 we have defined, implemented, and validated a number of metamodels representing the key aspects that have to be considered to enable the analysis and comparison of OSS projects. Such metamodels have been implemented by means of EMF/Ecore and are then used to define the schema of the database that stores models conforming to them. In particular, as detailed in deliverable D5.3, MongoDB is used as the database and Pongo (footnote 14) is the tool that has been implemented in WP5 to manage the persistence of models in MongoDB. Pongo is a template-based POJO generator for MongoDB, built atop the MongoDB Java driver (footnote 15). With Pongo, an engineer can define a model of the data to be stored using a textual modelling language called Emfatic (footnote 16). Pongo then uses this model to generate strongly-typed Java classes, which can be used to work with the database at a more convenient level of abstraction. For each class in the data model, a POJO class is created that extends the core Pongo class and provides support for querying the database in an intuitive manner. Any changes made to objects are cached until a synchronise method is invoked.

Figure 1: The forge metamodel hierarchy: the common meta-data is captured in a forge-agnostic metamodel (Base Forge metamodel), which is specialised for each OSS forge (GitHub, SourceForge, Google Code, Eclipse, and Redmine metamodels)

An overview of the developed metamodels is given in Figure 1. A set of concepts is forge-agnostic and includes information related to the project’s version control system(s), communication channels (e.g. forums, mailing lists), and bug tracking system(s), as well as the licenses of the project and information about the people who contribute in some manner to the project. We capture this information in a forge-agnostic metamodel; each OSS forge then specialises the metamodel to include any extra meta-information it provides, in a similar way to which object-oriented code can be specialised.

14 http://code.google.com/p/pongo/
15 http://docs.mongodb.org/ecosystem/drivers/java/
16 http://www.eclipse.org/epsilon/doc/articles/emfatic/

Figure 2: OSSMETER project importers: a) Importer abstract class; b) Structure of the implementation code

In the next section we describe the importers we have developed to collect project metadata from different forges. Importers have been developed in Java and organised as shown in Fig. 2.b. Each importer consists of a Java class implementing the methods of the abstract class shown in Fig. 2.a. In particular, each importer implements three methods: importAll, importProjects, and importProject. The importAll method retrieves the list of all the projects hosted in the considered forge. For each project, the importProject method is executed; it uses the Pongo-generated code to create and save the project information from the retrieved meta-data. The method importProjects retrieves the meta-data of the set of projects given as input. It also relies on importProject to create the project information and save it in MongoDB through the Pongo-generated code. All projects, irrespective of their forge, are stored in the same MongoDB collection. It is important to note that the importAll method has been implemented mainly for testing purposes. In fact, the OSSMETER platform will manage a list of projects that users are interested in. Consequently, importers will be executed to collect the meta-data of such monitored projects only, instead of importing the meta-data of all the projects managed by the considered forges.
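The relationship between the three methods can be sketched as follows. This is a Python rendering for illustration only, not the Java code of the importers: the two hook methods (list_project_ids, fetch_metadata), the forge label, and the dictionary standing in for the shared MongoDB collection are all assumptions introduced for the example.

```python
from abc import ABC, abstractmethod


class Importer(ABC):
    """Illustrative importer contract: three import methods built on two
    forge-specific hooks (the hooks are an assumption of this sketch)."""

    forge = "unknown"  # hypothetical label used to build storage keys

    def __init__(self, collection):
        # A plain dict stands in for the single shared MongoDB collection.
        self.collection = collection

    @abstractmethod
    def list_project_ids(self):
        """Return the identifiers of all projects hosted by the forge."""

    @abstractmethod
    def fetch_metadata(self, project_id):
        """Return the metadata of one project as a dict."""

    def import_project(self, project_id):
        """Retrieve one project and save it in the shared collection."""
        metadata = self.fetch_metadata(project_id)
        self.collection[f"{self.forge}:{project_id}"] = metadata
        return metadata

    def import_projects(self, project_ids):
        """Import only the given (monitored) projects."""
        return [self.import_project(pid) for pid in project_ids]

    def import_all(self):
        """Import every hosted project; mainly useful for testing."""
        return self.import_projects(self.list_project_ids())


class InMemoryForgeImporter(Importer):
    """Toy forge backed by a dict, used only to exercise the contract."""

    forge = "demo"

    def __init__(self, collection, hosted):
        super().__init__(collection)
        self.hosted = hosted

    def list_project_ids(self):
        return sorted(self.hosted)

    def fetch_metadata(self, project_id):
        return {"shortName": project_id, "description": self.hosted[project_id]}


collection = {}
importer = InMemoryForgeImporter(collection, {"alpha": "first", "beta": "second"})
importer.import_projects(["beta"])   # monitored projects only
monitored_only = sorted(collection)
importer.import_all()                # everything the toy forge hosts
everything = sorted(collection)
```

Keeping import_all as a thin wrapper around import_projects mirrors the point made above: the platform normally calls the importer with the list of monitored projects, and the full crawl is an exceptional case.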

In the following, we describe the importers developed to collect project metadata from Eclipse, SourceForge, GitHub, GoogleCode, and Redmine. Such importers collect the common project metadata represented in the forge-agnostic metamodel (see Fig. 3, footnote 17) plus the additional forge-specific metadata recalled in the next sections. The forge-specific importers have been developed by considering the two different ways in which forges expose their data. In particular, there are

– forges like GitHub or SourceForge that provide open APIs that can be used to extract the data in a structured way.

– forges like GoogleCode that do not provide further information than the project Web pages. In these cases specific HTML parsers are required to collect project meta-data from the Web pages of the considered forge and store them in the platform database.

17 For readability reasons, some auxiliary metaclasses and attributes that are used by the OSSMETER platform to execute metric providers are not shown.

Figure 3: The metamodel capturing concepts common between OSS forges

The adoption of structured APIs is the preferred method, even though forges may apply restrictions that have to be taken into account. For instance, the rate limit for the GitHub APIs is 5,000 requests per hour for authenticated users, or 60 requests per hour for unauthenticated users. As such, our GitHub importer enters an idle state once the rate limit is reached, and it automatically restarts when possible. Because network problems may occur during the importing process, we have implemented mechanisms that permit the importers to restart from where they left off. Without such a mechanism, in case of network disconnections, importers would have to be re-executed from scratch, losing the metadata of the projects already imported.
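Both mechanisms can be illustrated with a short Python sketch. It assumes that the remaining quota and the reset time are read from the X-RateLimit-Remaining and X-RateLimit-Reset response headers of the GitHub API, and that progress is persisted as the identifier of the last imported project in a small JSON file. The function name, the Checkpoint class, and the file layout are hypothetical and do not describe the actual Java implementation.

```python
import json
import os
import tempfile


def seconds_until_allowed(headers, now):
    """Return how long to stay idle, given the rate-limit response headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    # X-RateLimit-Reset is expressed in seconds since the epoch.
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0, reset_at - int(now))


class Checkpoint:
    """Persist the identifier of the last imported project (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return the saved identifier, or 0 when starting from scratch."""
        try:
            with open(self.path, encoding="utf-8") as handle:
                return json.load(handle)["last_id"]
        except FileNotFoundError:
            return 0

    def save(self, last_id):
        # Write to a temporary file first so that an interruption never
        # leaves a half-written checkpoint behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump({"last_id": last_id}, handle)
        os.replace(tmp_path, self.path)


# Quota exhausted, reset 60 seconds after `now`.
wait = seconds_until_allowed(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}, now=1000
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "github-importer.json")
    fresh_start = Checkpoint(path).load()
    Checkpoint(path).save(7426484)
    resumed_from = Checkpoint(path).load()
```

An importer loop would sleep for the returned number of seconds before issuing the next request, and would call save after each project so that a restart continues from resumed_from rather than from the beginning.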

3.1 The Eclipse importer

The Eclipse Foundation (footnote 18) provides a hosting system for projects that contribute to the Eclipse integrated development environment. Unlike forges such as SourceForge (footnote 19) or GitHub (footnote 20), Eclipse prescribes a rigorous set of requirements that a project must meet in order to be hosted in its forge. Furthermore, new projects initially enter an incubation period until they meet the standards to be accepted as a full Eclipse project. Figure 4 presents our specialisation of the forge-agnostic metamodel for the Eclipse forge. The specialisation adds extra meta-information, such as a project description and the project’s homepage on the Eclipse website, as well as information related to the reviewing process of the project, any related documentation articles, and its list of releases. Furthermore, contributors to a project are assigned roles, such as mentor, committer, and leader.

18 www.eclipse.org
19 www.sourceforge.net
20 www.github.com

Figure 4: The specialisation of the common metamodel for the Eclipse forge.

The collection of data from the Eclipse forge has been done by exploiting an API that exposes project meta-data as JSON. Such an API is not public yet and has been provided to us by Eclipse. For each project it is possible to retrieve a JSON document containing most of the meta-data of the considered project. Listing 2 shows a fragment of the JSON document containing the meta-data of the birt project (footnote 21).

{
  projects: {
    birt: {
      title: "Business Intelligence and Reporting Tools (BIRT)",
      description: [
        {
          value: "<p>BIRT is an open source Eclipse-based reporting system that integrates with your Java/Java EE application to produce compelling reports.</p> ",
          summary: "",
          format: "filtered_html",
          ...
        }
      ],
      parent_project: [ ],
      bugzilla: [
        {
          product: "BIRT",
          component: "",
          create_url: "https://bugs.eclipse.org/bugs/enter_bug.cgi?product=BIRT",
          query_url: "https://bugs.eclipse.org/bugs/buglist.cgi?product=BIRT"
        }
      ],
      download_url: [],
      licenses: [],
      mailing_lists: [
        {
          name: "birt-dev",
          email: "[email protected]",
          url: "https://dev.eclipse.org/mailman/listinfo/birt-dev"
        },
        ...
      ],
      other_links: [],
      plan_url: [],
      proposal_url: [],
      tags: [ ],
      website_url: [],
      wiki_url: [],
      scope: [],
      source_repo: [
        {
          type: "github",
          name: "birt",
          path: "https://github.com/eclipse/birt",
          url: "https://github.com/eclipse/birt"
        }
      ],
      state: [],
      build_description: [ ],
      build_doc: [ ],
      build_technologies: [ ],
      forums: [],
      logo: [ ],
      techology_types: [],
      contrib_message: [ ],
      downloads: [ ],
      downloads_message: [ ],
      marketplace: [ ],
      update_sites: [ ],
      related: [ ],
      team_project_sets: [ ],
      documentation: [ ],
      releases: []
    }
  }
}

Listing 2: Fragment of the JSON document with the metadata of the birt project retrieved from Eclipse

21 Business Intelligence and Reporting Tools: http://projects.eclipse.org/projects/birt/

However, as shown in Table 3, not all the information of a given Eclipse project is available via this API. In particular, the committers and the list of supported platforms of a given project are available only from the Web pages of the project. This has required the development of specific HTML parsers to retrieve the metadata that are not available via the API but are necessary to fully represent an Eclipse project. The HTML page of each project is also used to retrieve some URLs of the communication channels provided for a project: whereas the information related to Documentation, Wiki, MailingList, and Forums is available in JSON, the URL of the NNTP server is available only from the HTML pages of the project.

To import all the required Eclipse project metadata from JSON documents alone, we are in contact with the Eclipse Foundation to extend the JSON export so that it also includes the data that are currently available only from the project Web pages.
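The split between JSON-provided and HTML-provided metadata can be made explicit in code. The Python sketch below extracts a few fields from a trimmed copy of the document of Listing 2 and records which attributes still have to come from the HTML pages according to Table 3. The function name, the record keys, and the pendingFromHtml field are assumptions made for this illustration.

```python
# Attributes that Table 3 marks as available only from the HTML pages.
HTML_ONLY = ("persons", "platforms", "communicationChannels (EclipseNewsGroup)")

# Trimmed copy of Listing 2, written as a Python dict.
SAMPLE = {
    "projects": {
        "birt": {
            "title": "Business Intelligence and Reporting Tools (BIRT)",
            "bugzilla": [
                {
                    "product": "BIRT",
                    "query_url": "https://bugs.eclipse.org/bugs/buglist.cgi?product=BIRT",
                }
            ],
            "mailing_lists": [
                {
                    "name": "birt-dev",
                    "url": "https://dev.eclipse.org/mailman/listinfo/birt-dev",
                }
            ],
            "source_repo": [
                {"type": "github", "url": "https://github.com/eclipse/birt"}
            ],
            "licenses": [],
        }
    }
}


def from_eclipse_json(short_name, document):
    """Build a partial record from the JSON export (hypothetical key names)."""
    entry = document["projects"][short_name]
    return {
        "shortName": short_name,
        "name": entry["title"],
        "vcsRepositories": [repo["url"] for repo in entry.get("source_repo", [])],
        "bugTrackingSystems": [bug["query_url"] for bug in entry.get("bugzilla", [])],
        "mailingLists": [ml["url"] for ml in entry.get("mailing_lists", [])],
        # Still to be filled in by an HTML parser.
        "pendingFromHtml": list(HTML_ONLY),
    }


birt = from_eclipse_json("birt", SAMPLE)
print(birt["name"], birt["vcsRepositories"])
```

If the JSON export is extended as discussed above, the pendingFromHtml list would simply become empty and the HTML parsing step could be dropped.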

3.2 The SourceForge importer

Figure 5 presents our specialisation of the forge-agnostic metamodel for SourceForge. The specialisation adds extra meta-information, such as project releases and supported platforms, as shown in the lower part of Table 4.

Eclipse project meta-data                        Source
name                                             JSON
shortName                                        JSON
description                                      JSON
parent                                           JSON
vcsRepositories                                  JSON
communicationChannels (Documentation)            JSON
communicationChannels (Wiki)                     JSON
communicationChannels (MailingList)              JSON
communicationChannels (EclipseNewsGroup)         HTML
bugTrackingSystems                               JSON
persons                                          HTML
licenses                                         JSON
paragraphUrl                                     JSON
descriptionUrl                                   JSON
downloadsUrl                                     JSON
homePage                                         JSON
projectplanUrl                                   JSON
updatesiteUrl                                    JSON
projectStatus                                    JSON
platforms                                        HTML
reviews                                          JSON
articles                                         JSON
releases                                         JSON

Table 3: Main project meta-data imported by the Eclipse importer

Similarly to the previous forge, SourceForge provides APIs for retrieving information about the hosted projects. In particular, given a projectId, the corresponding meta-data, represented as a JSON document, are available at http://sourceforge.net/api/project/name/projectId (see Listing 3). Unfortunately, the API does not provide the means to retrieve the list of all the projects hosted by SourceForge. The workaround we have implemented exploits the Web interface of the SourceForge project directory. In particular, the page http://sourceforge.net/directory/?page=id, where id is an integer identifying a directory page, lists 25 projects. The importAll method of the SourceForge importer iterates over the identifier id and, for each value, parses the corresponding HTML page to retrieve the list of 25 project identifiers on which the importProject method is executed.
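The directory-based workaround can be sketched in Python as follows. The sketch assumes that each directory page links to its projects with hrefs of the form /projects/<identifier>/, which is a guess about the page markup rather than a documented format; the page download is injected as a function so that the example runs without network access. All names are hypothetical.

```python
import re

# Assumed shape of the project links on a directory page.
PROJECT_LINK = re.compile(r'href="/projects/([A-Za-z0-9_.-]+)/?"')


def project_ids_on_page(html):
    """Return the project identifiers linked from one directory page."""
    return PROJECT_LINK.findall(html)


def iterate_directory(fetch_page, max_pages):
    """Yield each project identifier once, walking pages 1..max_pages.

    `fetch_page(page)` stands in for downloading
    http://sourceforge.net/directory/?page=<page>.
    """
    seen = set()
    for page in range(1, max_pages + 1):
        ids = project_ids_on_page(fetch_page(page))
        if not ids:
            break  # an empty page is taken as the end of the directory
        for project_id in ids:
            if project_id not in seen:  # the same project may be listed twice
                seen.add(project_id)
                yield project_id


# Two fake pages; "demo-a" appears on both of them.
FAKE_PAGES = {
    1: '<a href="/projects/clonezilla/">Clonezilla</a>'
       '<a href="/projects/demo-a/">A</a>',
    2: '<a href="/projects/demo-a/">A</a>'
       '<a href="/projects/demo-b/">B</a>',
}

found = list(iterate_directory(lambda page: FAKE_PAGES.get(page, ""), max_pages=10))
print(found)
```

Tracking the identifiers already seen matters because a directory can list the same project on more than one page, so the number of processed entries can exceed the number of distinct projects.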

Figure 5: The specialisation of the common metamodel for the SourceForge forge.

Project: {
    name: "Clonezilla",
    created: "Jul 26, 2004",
    created_timestamp: 1090888156,
    id: 115473,
    shortdesc: "clonezilla",
    description: "Clonezilla is a partition and disk imaging/cloning program similar to True Image®. It saves and restores only used blocks in hard drive. Two types of Clonezilla are available, Clonezilla live and Clonezilla SE (Server Edition). ",
    percentile: 99.9904925573,
    ranking: 54,
    download-page: "http://sourceforge.net/project/showfiles.php?group_id=115473",
    support-page: "http://sourceforge.net/projects/clonezilla/support",
    summary-page: "http://sourceforge.net/projects/clonezilla",
    mailing-list: "http://sourceforge.net/mail/?group_id=115473",
    homepage: "http://clonezilla.org",
    base_url: "http://sourceforge.net",
    licenses: [
      {
        name: "GNU General Public License version 2.0 (GPLv2)",
        url: "http://www.opensource.org/licenses/gpl-license.php"
      }
    ],
    os: [
      ...
    ],
    topics: [
      ...
    ],
    programming-languages: [
      ...
    ],
    audiences: [
      ...
    ],
    translations: [
      ...
    ],
    environments: [
      ...
    ],
    categories: [
      ...
    ],
    trackers: [
      {
        name: "Patches",
        location: "http://sourceforge.net/tracker/?group_id=115473&atid=671652"
      },
      ...
    ],
    maintainers: [
      {
        name: "steven_shiau",
        homepage: "http://sourceforge.net/users/steven_shiau",
        mbox_sha1sum: "f20da2be2c9a5c3e25b1acf962765f4934d97096"
      }
    ],
    developers: [
      {
        name: "thomas_tsai",
        homepage: "http://sourceforge.net/users/thomas_tsai",
        mbox_sha1sum: "62e3a7bfee2d9ee7b681181c5d263915ee0a450e"
      },
      ...
    ],
    ...
  }
}

Listing 3: Fragment of the JSON document with the metadata of the Clonezilla project retrieved from SourceForge

SourceForge project meta-data                    Source
name                                             JSON
shortName                                        JSON
description                                      JSON
parent                                           JSON
vcsRepositories                                  JSON
communicationChannels                            JSON
bugTrackingSystems                               JSON
persons                                          JSON
licenses                                         JSON
private                                          JSON
percentile                                       JSON
ranking                                          JSON
download-page                                    JSON
support-page                                     JSON
summary-page                                     JSON
homepage                                         JSON
os                                               JSON
topics                                           JSON
programmingLanguages                             JSON
audiences                                        JSON
environments                                     JSON
categories                                       JSON
trackers                                         JSON

Table 4: Main project meta-data imported by the SourceForge importer


3.3 The GitHub importer

Figure 6 shows the specialisation of the forge-agnostic metamodel for GitHub. The specialisation adds extra meta-information such as the URLs of the considered repository, its size, and information about the existing forks of a given project.

Figure 6: The specialisation of the common metamodel for the GitHub forge.

GitHub provides complete and advanced APIs that enable the retrieval of all the information about the hosted projects. This allowed us to develop the importer without parsing HTML pages. Unfortunately, there is no way to directly retrieve the list of all the projects hosted by GitHub. However, https://api.github.com/repositories?since=id returns 100 projects having identifiers greater than id. Thus the developed importAll method iterates over the value of id and, for each value, executes the importProject method on a set of 100 projects. The meta-data of a given project are available at https://api.github.com/repos/projectId, where projectId is a string identifying the considered project. Listing 4 shows a fragment of the retrieved JSON document corresponding to the project epsilon of the user eclipse. The main metadata retrieved by the developed GitHub importer are shown in Table 5. Unfortunately, we have not found any source of information about the licenses of projects, which are among the most relevant metadata.
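The iteration over the since parameter amounts to cursor-based paging, which the following Python sketch illustrates. The function fetch_batch stands in for a request to https://api.github.com/repositories?since=id and is replaced here by an in-memory fake with a page size of 2, so that the example runs offline; the function names and the fake data are assumptions.

```python
def iterate_repositories(fetch_batch, since=0, max_batches=None):
    """Yield repository full names, advancing the `since` cursor batch by batch."""
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = fetch_batch(since)
        if not batch:
            return  # no repository with an identifier greater than `since`
        for repository in batch:
            yield repository["full_name"]
        # The next request starts after the last identifier just received.
        since = batch[-1]["id"]
        batches += 1


# In-memory stand-in for the hosted repositories (hypothetical data).
HOSTED = [
    {"id": i, "full_name": f"owner/repo-{i}"} for i in (1, 26, 27, 7426484)
]


def fake_fetch(since, page_size=2):
    """Return up to `page_size` repositories with id greater than `since`."""
    return [repo for repo in HOSTED if repo["id"] > since][:page_size]


names = list(iterate_repositories(fake_fetch))
resumed = list(iterate_repositories(fake_fetch, since=26))
```

Because the cursor is just the last identifier seen, the same value can be stored and passed back as since to resume an interrupted crawl.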

{
  id: 7426484,
  name: "epsilon",
  full_name: "eclipse/epsilon",
  owner: {
    login: "eclipse",
    id: 56974,
    avatar_url: "https://avatars.githubusercontent.com/u/56974?",
    gravatar_id: "c2a46ff510bbe6080941bc84ed1e89d2",
    url: "https://api.github.com/users/eclipse",
    html_url: "https://github.com/eclipse",
    followers_url: "https://api.github.com/users/eclipse/followers",
    following_url: "https://api.github.com/users/eclipse/following{/other_user}",
    gists_url: "https://api.github.com/users/eclipse/gists{/gist_id}",
    starred_url: "https://api.github.com/users/eclipse/starred{/owner}{/repo}",
    subscriptions_url: "https://api.github.com/users/eclipse/subscriptions",
    organizations_url: "https://api.github.com/users/eclipse/orgs",
    repos_url: "https://api.github.com/users/eclipse/repos",
    events_url: "https://api.github.com/users/eclipse/events{/privacy}",
    received_events_url: "https://api.github.com/users/eclipse/received_events",
    type: "Organization",
    site_admin: false
  },
  private: false,
  html_url: "https://github.com/eclipse/epsilon",
  description: "modeling.emft.epsilon project website",
  fork: false,
  url: "https://api.github.com/repos/eclipse/epsilon",
  forks_url: "https://api.github.com/repos/eclipse/epsilon/forks",
  tags_url: "https://api.github.com/repos/eclipse/epsilon/tags",
  git_url: "git://github.com/eclipse/epsilon.git",
  ssh_url: "[email protected]:eclipse/epsilon.git",
  clone_url: "https://github.com/eclipse/epsilon.git",
  svn_url: "https://github.com/eclipse/epsilon",
  homepage: null,
  size: 107994,
  forks_count: 1,
  open_issues_count: 0,
  forks: 1,
  open_issues: 0,
  watchers: 0,
  default_branch: "master",
  network_count: 1,
  subscribers_count: 3
  ...
}

Listing 4: Fragment of the JSON document with the metadata of the repository retrieved from GitHub

GitHub project meta-data                         Source
name                                             JSON
shortName                                        JSON
description                                      JSON
parent                                           JSON
vcsRepositories                                  JSON
communicationChannels                            JSON
bugTrackingSystems                               JSON
persons                                          JSON
licenses                                         -
private                                          JSON
fork                                             JSON
html_url                                         JSON
clone_url                                        JSON
git_url                                          JSON
ssh_url                                          JSON
svn_url                                          JSON
mirror_url                                       JSON
homepage                                         JSON
size                                             JSON
master_branch                                    JSON
full_name                                        JSON
languages                                        JSON

Table 5: Main project meta-data imported by the GitHub importer

To reduce the number of API requests per project as much as possible, some meta-data are not imported; this does not compromise the execution of the metric providers currently developed (see D5.3). In particular, we do not import all the milestones and all the issues of a project, even though we import the number of open issues. Nor do we import the URLs of files such as PDFs, articles, etc. that can be downloaded directly (such elements are represented by means of the metaclasses GitHubDownload and GitHubContent in the metamodel shown in Fig. 6).

3.4 The GoogleCode importer

Figure 7 shows the specialisation of the forge-agnostic metamodel for GoogleCode. The specialisation adds extra meta-information such as the URLs of the wiki, forum, and issue tracker tools associated with each project. It is important to remark that on June 14, 2013 Google discontinued the Issue Tracker API; currently, project meta-data can be retrieved only by parsing the HTML pages. Unfortunately, this is not reliable, especially because many projects have custom pages that affect the way meta-data can be retrieved. As also said in D5.3, because of such difficulties, Google Code will most probably not be supported by OSSMETER. However, the already developed importer is able to collect the project meta-data shown in Table 6.

GoogleCode project meta-data                     Source
name                                             HTML
shortName                                        HTML
description                                      HTML
parent                                           -
vcsRepositories                                  HTML
communicationChannels                            HTML
bugTrackingSystems                               HTML
persons                                          HTML
licenses                                         HTML
downloads                                        HTML
wiki                                             HTML
forum                                            HTML
issueTracker                                     HTML

Table 6: Main project meta-data imported by the GoogleCode importer


Figure 7: The specialisation of the common metamodel for the GoogleCode forge.

Figure 8: GoogleCode Web page


Figure 9: The specialisation of the common metamodel for the Redmine forge.

The implementation of the importAll method exploits the number of projects shown at the upper right-hand side of the Web page available at https://code.google.com/hosting/search?q=&sa=Search (see Fig. 8). Obviously, GoogleCode hosts more than the 996 projects reported there. However, this is the best that can be done because of the lack of a proper API.

3.5 The Redmine importer

The specialisation of the forge-agnostic metamodel that we have developed for Redmine forges is shown in Figure 9. The specialisation adds extra meta-information such as the URLs of the wiki, issue trackers, and versions associated with each project (see Table 7). The importer has been developed by taking advantage of the REST API provided by Redmine. The API provides access and basic CRUD operations (create, update, delete) for different resources such as projects, issues, and versions (http://www.redmine.org/projects/redmine/wiki/Rest_api). The API supports both XML and JSON formats. As for the other importers, we decided to rely on the JSON format. Listing 5 shows the JSON document representing some of the meta-data of the Redmine project managed by the official Redmine forge (http://www.redmine.org/). The implementation of the method importAll uses the API request http://[instUrl]/projects.json, which returns all the projects managed by the Redmine installation available at instUrl.

{
  project: {
    id: 1,
    name: "Redmine",
    identifier: "redmine",
    description: "Redmine is a flexible project management web application written using Ruby on Rails framework.",
    homepage: "",
    status: 1,
    created_on: "2007-09-29T10:03:04Z",
    updated_on: "2009-03-15T11:35:11Z"
  }
}

Listing 5: JSON document with some metadata of the project Redmine retrieved from the official Redmine forge
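As an illustration of how a document like the one in Listing 5 can be turned into typed values, the following Python sketch parses a strict-JSON copy of it (keys quoted) and converts the two timestamps into timezone-aware dates. The function names and the record keys are assumptions made for the example and do not reflect the Pongo-generated classes.

```python
import json
from datetime import datetime, timezone

# Strict-JSON copy of Listing 5 (keys quoted).
RESPONSE = """{"project": {"id": 1, "name": "Redmine", "identifier": "redmine",
  "description": "Redmine is a flexible project management web application written using Ruby on Rails framework.",
  "homepage": "", "status": 1,
  "created_on": "2007-09-29T10:03:04Z", "updated_on": "2009-03-15T11:35:11Z"}}"""


def parse_timestamp(value):
    """Parse the UTC timestamps used in Listing 5, e.g. 2007-09-29T10:03:04Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def from_redmine_json(text):
    """Map the JSON document onto a flat record (hypothetical key names)."""
    project = json.loads(text)["project"]
    return {
        "name": project["name"],
        "shortName": project["identifier"],
        "description": project["description"],
        "homePage": project["homepage"] or None,  # empty string means "not set"
        "created_on": parse_timestamp(project["created_on"]),
        "updated_on": parse_timestamp(project["updated_on"]),
    }


record = from_redmine_json(RESPONSE)
print(record["shortName"], record["created_on"].isoformat())
```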

Redmine project meta-data                        Source
name                                             JSON
shortName                                        JSON
description                                      JSON
parent                                           JSON
vcsRepositories                                  JSON
communicationChannels                            JSON
persons                                          JSON
licenses                                         JSON
created_on                                       JSON
updated_on                                       JSON
downloads                                        JSON
wiki                                             JSON
issueTrackers                                    JSON
versions                                         JSON
queries                                          JSON

Table 7: Main project meta-data imported by the Redmine importer


4 OSSMETER Importers Execution

In this section we discuss some experiments we have performed by executing all the importers presented in the previous section. A dataset consisting of 500,000 OSS projects hosted on Eclipse, SourceForge, and GitHub has already been presented at the 11th Working Conference on Mining Software Repositories (ICSE2014) [6]. For the experiments we have used an HP ProLiant Microserver with an AMD Athlon(tm) II Neo N36L Dual-Core Processor (800 MHz) and 2 GB of RAM, connected to the network of the University of L’Aquila. Each importer has been executed by means of the importAll() method on an empty database. The execution times are shown in Table 8, where we distinguish the number of projects that have been processed from the number that have actually been imported into the database. Such a distinction is necessary because the SourceForge and GoogleCode forges can show a given project many times on different pages of their Web directories. Consequently, for such forges the number of projects actually imported into the database is lower than the number processed.

By considering the total number of imported projects and the time required to import them, Table 8 shows the number of projects that can be imported per minute. Also in this case we distinguish between the average value and the real one. In particular, as mentioned in the previous sections, GitHub, SourceForge, and GoogleCode apply restrictions on the requests that users can make to retrieve information about the hosted projects. GitHub temporarily blocks clients that have reached the allowed number of API requests per hour, whereas SourceForge and GoogleCode reduce their response time. For this reason, the value # imported projects per minute (avg) is lower than the one measured by executing the importers for only one minute on an empty database, i.e., the value # imported projects per minute (real). The case of Eclipse is different: the retrieval of the project list, performed at the beginning of the import, consists of downloading a JSON document of about 7 MB. Consequently, the time required to download such a document affects the value in the last row of Table 8 for Eclipse. Concerning Redmine, we are not aware of any forge with a considerable number of hosted projects. Thus, for the experiments we have considered a local installation of Redmine used by the Model-Driven Engineering group of the University of L’Aquila. That installation hosts 12 projects; consequently, the value of # imported projects per minute (real) is not available, whereas # imported projects per minute (avg) is an approximation that has been calculated by considering the time needed to import the 12 projects.

                                       Eclipse   GitHub                SourceForge           GoogleCode   Redmine
# of processed projects                235       699,987               37,953                937          12
# of imported projects                 235       699,987               26,287                769          12
execution time                         40 min    23,040 min (16 days)  23,040 min (16 days)  38 min       10 sec
# imported projects per minute (avg)   5.875     30.39                 1.14                  24.66        72
# imported projects per minute (real)  8         373                   123                   48           -

Table 8: Execution times of the OSSMETER importers
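The average rates of Table 8 are the number of imported projects divided by the execution time. The short Python sketch below reproduces the computation for three of the forges, using the imported-project counts and execution times of the table; the function name and the data layout are illustrative.

```python
def projects_per_minute(imported, seconds):
    """Average import rate over a whole run."""
    return imported * 60 / seconds


# (imported projects, execution time in seconds), taken from Table 8.
RUNS = {
    "Eclipse": (235, 40 * 60),
    "SourceForge": (26_287, 23_040 * 60),
    "Redmine": (12, 10),
}

rates = {
    forge: round(projects_per_minute(imported, seconds), 3)
    for forge, (imported, seconds) in RUNS.items()
}
print(rates)  # Eclipse 5.875, SourceForge about 1.14, Redmine 72.0
```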


Figure 10: Projects processed and imported: a) absolute values; b) % projects processed / projects imported

By considering the issues previously mentioned, Fig. 10 shows two charts related to the import of GitHub, SourceForge, and GoogleCode projects, collected by executing the GitHub and SourceForge importers for 16 days and the GoogleCode importer for 38 minutes, as shown in Tab. 8. Even though GitHub applies restrictions on its API requests, it permits importing all the projects that are processed. This is the case for neither SourceForge nor GoogleCode. In fact, according to Fig. 10.b, the projects that are actually imported amount to 68% of the processed ones in the case of SourceForge and 85% in the case of GoogleCode.

Figure 11: Projects processed per minute: a) absolute values; b) % average projects processed per minute / real projects processed per minute


Fig. 11 shows the projects that can be imported per minute by considering the values # imported projects per minute (avg) and # imported projects per minute (real). As shown in Fig. 11.b, the values differ substantially, especially in the case of SourceForge. Fig. 11.b does not include Redmine since we do not have the value of # imported projects per minute (real) for it. In fact, because of the limited set of projects hosted in our Redmine forge, the Redmine importer runs for 10 seconds at most. The differences between the values of # imported projects per minute (avg) and # imported projects per minute (real) are due to two main reasons: i) the policies that forges apply to prevent abuse of their services, and ii) the HTML parsing operations that reduce the performance of the whole import process. However, it is important to recall that in the case of SourceForge, HTML parsing operations are performed mainly to retrieve the list of projects to be processed. Consequently, the performance of the importer would improve if the identifiers of the projects to be processed were passed to it directly (see the method importProjects discussed in the previous section and shown in Fig. 2).

In the remainder of this section we estimate the time that would be required to update the local information of the projects analysed by OSSMETER. At the time of writing, Eclipse hosts 235 projects in total and, as previously said, the developed importer takes 40 minutes to collect the meta-data of all of them. This means that the execution of the Eclipse importer can be scheduled to have daily updates of the local information. Concerning GoogleCode, it is impossible to know how many projects are hosted, and the best that can be done is parsing the HTML pages, which unfortunately permits collecting only about 900 projects.

Concerning GitHub and SourceForge the situation is different. In particular, let us assume that the OSSMETER platform has to manage 100,000 projects, 60% from GitHub and 40% from SourceForge. According to the execution times shown in Tab. 9, the time needed to update all the GitHub projects would be ≈33 hours if we consider # imported projects per minute (avg), and less than 3 hours otherwise. The time needed for SourceForge is significantly higher if # imported projects per minute (avg) is considered. Consequently, in the worst case, updating the meta-data of 100,000 projects distributed as previously said would require approximately 16 days. The trend of such total time with respect to the number of projects to be imported is shown in Fig. 12. It is interesting to note that the time needed to update GitHub projects doubles if we double the number of projects to be imported. This is not the case for SourceForge, whose time increases exponentially, as shown in Fig. 12.b.
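The GitHub figures quoted above follow from dividing the number of projects by the import rate. The Python sketch below reproduces them for the 60,000 GitHub projects of the scenario, using the avg and real rates of Table 8; the function name is illustrative.

```python
def update_hours(projects, projects_per_minute):
    """Hours needed to update `projects` projects at a constant import rate."""
    return projects / projects_per_minute / 60


GITHUB_AVG = 30.39   # imported projects per minute (avg), Table 8
GITHUB_REAL = 373    # imported projects per minute (real), Table 8

hours_avg = update_hours(60_000, GITHUB_AVG)
hours_real = update_hours(60_000, GITHUB_REAL)

print(f"{hours_avg:.2f} h ({hours_avg / 24:.2f} days)")  # 32.91 h (1.37 days)
print(f"{hours_real:.2f} h")                             # 2.68 h
```

With a constant rate the estimate is proportional to the number of projects, which is why doubling the GitHub projects doubles the estimated update time.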


| Number of projects | 10,000 | 50,000 | 100,000 | 200,000 | 500,000 | 1,000,000 |
|---|---|---|---|---|---|---|
| 60% from GitHub | 6,000 | 30,000 | 60,000 | 120,000 | 300,000 | 600,000 |
| 40% from SourceForge | 4,000 | 12,000 | 24,000 | 48,000 | 120,000 | 240,000 |
| Time required to update the projects from GitHub [hours (days)], by considering # projects per minute (avg) | 3.29 | 16.46 | 32.91 (1.37) | 65.83 (2.74) | 164.57 (6.86) | 329.15 (13.71) |
| Time required to update the projects from GitHub [hours (days)], by considering # projects per minute (real) | 0.27 | 1.34 | 2.68 | 5.36 | 13.40 | 26.81 (1.69) |
| Time required to update the projects from SourceForge [hours (days)], by considering # projects per minute (avg) | 58.43 (2.43) | 175.30 (7.30) | 350.59 (10.61) | 701.18 (29.22) | 1,752.96 (73.04) | 3,505.92 (146.08) |
| Time required to update the projects from SourceForge [hours (days)], by considering # projects per minute (real) | 0.27 | 1.63 | 3.25 | 6.50 | 16.26 | 32.52 |
| Total time required to update all the projects [hours (days)], by considering # projects per minute (avg) | 61.72 (2.57) | 191.75 (7.99) | 383.51 (15.98) | 767.01 (31.96) | 1,917.53 (79.90) | 3,835.06 (159.79) |
| Total time required to update all the projects [hours (days)], by considering # projects per minute (real) | 0.81 (0.03) | 2.97 (0.12) | 5.93 (0.25) | 11.87 (0.49) | 29.66 (1.24) | 59.33 (2.47) |

Table 9: Estimation of the time required to update projects


a) By considering the value # projects per minute (real) in Tab. 8

b) By considering the value # projects per minute (avg) in Tab. 8

Figure 12: Total time (in hours) required to update projects


5 Conclusions

This document has presented the software components (named importers) that have been implemented to collect project metadata from the Eclipse, SourceForge, GitHub, GoogleCode and Redmine forges. The task of collecting project metadata from existing forges is not completely novel and is supported by other existing projects. In the document we have discussed the aim of the OSSMETER importers and how they differ from similar attempts.

Concrete applications of the importers have been presented by discussing real execution times and by making estimations that take into consideration different numbers of monitored projects, ranging from 10,000 to 1,000,000.

We have already started the integration of the importers in the platform developed in WP5. The metamodels and the corresponding importers will be subject to continuous refinement, especially to address the requirements of new metric providers that might require additional metadata not covered by the current version of the OSSMETER metamodels. Moreover, if needed we will get in touch with MARKOS and SRDA to take advantage of their data about SourceForge projects.


References

[1] Black Duck Software. Ohloh. http://www.ohloh.net, 2014. [Online; accessed 3-May-2014].

[2] Megan Conklin, James Howison, and Kevin Crowston. Collaboration using ossmole: a repository of floss data and analyses. In ACM SIGSOFT Software Engineering Notes, volume 30, pages 1–5. ACM, 2005.

[3] Georgios Gousios. The ghtorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 233–236, Piscataway, NJ, USA, 2013. IEEE Press.

[4] James Howison, Megan S. Conklin, and Kevin Crowston. Flossmole: A collaborative repository for floss research data and analyses. International Journal of Information Technology and Web Engineering, 1(3):17–26, 2006.

[5] University of York. D5.2 - Project Metadata Repository Design and Implementation, September 2013.

[6] James R. Williams, Davide Di Ruscio, Nicholas Drivalos Matragkas, Juri Di Rocco, and Dimitris S. Kolovos. Models of oss project meta-information: a dataset of three forges. In Premkumar T. Devanbu, Sung Kim, and Martin Pinzger, editors, MSR, pages 408–411. ACM, 2014.


A Appendix

Answers to the WP2 issues identified in the first technical review report

Considerable effort has been spent in the metamodel definition, implementation and domain analysis of popular forges. For OSS model creation, the project should consider using information from FLOSSMole, Ohloh, GHTorrent (although this is maybe an overkill) and MARKOS (which is developing a model and tools for this problem). Also, the project should outline the specific innovations brought by OSSMETER which go beyond state-of-the-art OSS tracking directories like Ohloh.

Answer: In Section 2 we have discussed existing platforms that, similarly to the OSSMETER importers, collect project metadata from different sources of information. Moreover, Section 2 of the deliverable D5.3 provides an overview of the features of those platforms and identifies commonalities and differences with the whole OSSMETER platform.

The project should clarify if the project lifecycle analysis is using information produced by WP3, WP4, and if that is the case, which information. The project may use upcoming deliverables to clarify the notion of project life cycle as used by OSSMETER, because that is not clear in the current deliverables, and it is not clear how that information is available in the meta-information from forges.

Answer: In Section 1 we have clarified our meaning of the term lifecycle used in OSSMETER.

The project should define in more detail how often they plan to update metadata information from forges.

Answer: In Section 4 we have presented in detail some experiments we have done by executing the importers to collect data from real forges. Depending on the number of projects to be monitored and on the forges that host them, it is possible to decide how often the project metadata should be updated. For instance, all the Eclipse projects might be updated daily, and in the worst case the 600,000 GitHub projects in a set of 1,000,000 monitored projects can be updated in about 14 days (see Tab. 9).

It would be interesting that the relationship of the proposed model with DOAP, and why it was not used (if that is the case), is discussed in the deliverable about project modeling, or in further deliverables in this WP.

Answer: A dedicated subsection of Section 2 is devoted to an overview of DOAP and to a discussion of how the OSSMETER metamodels differ from it.
