ariw and AuthorClaim: current state Thomas Krichel http://openlib.org/home/krichel prepared for the first retreat for disciplinary repositories Monterey 2009-10-19

ariw and AuthorClaim: current state

Thomas Krichel

http://openlib.org/home/krichel

prepared for the first retreat for disciplinary repositories

Monterey 2009-10-19

introduction

I am here representing two activities

RePEc

Open Library Society (OLS): ariw, AuthorClaim, (RePEc?)

RePEc is more established. OLS may become an umbrella organization for RePEc.

open library society

This is a 501(c)(3) charity set up by Thomas Krichel to support the work on the registration systems.

The purposes of the society are formulated quite generally. The society can support related purposes.

In the next few weeks a formal alignment between RePEc and the society may come about, so as to enable legal representation, or at least support, through the society.

two official projects

ARIW is a registry of academic and research institutions in the world.

It lives at http://ariw.org. The data comes from a similar collection (academ.cc) and from Isidro Aguillo's data, which he uses to build his webometric rankings of academic institutions.

It is not much maintained at this time.

data structure

The data is in AMF, an XML-encoded format.

Each record

contains a unique id for each institution

contains an http URL

contains one or more name variations: full names, abbreviated names, names in different languages

There are country and US state units.

??? records
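
To illustrate the record structure just described, here is a minimal sketch in Python. The element names (`organization`, `homepage`, `name`), the `id` attribute, the sample identifier and the sample URL are assumptions made for illustration. They are not taken from the ARIW data or from the AMF specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical AMF-like record; element and attribute names are illustrative only.
SAMPLE = """
<organization id="ariw:example-0001">
  <homepage>http://www.example.edu/</homepage>
  <name>Example University</name>
  <name>EU</name>
  <name xml:lang="de">Beispiel-Universitaet</name>
</organization>
"""

def parse_institution(xml_text):
    """Return the id, URL and name variations of one institution record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "url": root.findtext("homepage"),
        "names": [n.text for n in root.findall("name")],
    }

record = parse_institution(SAMPLE)
print(record["id"], record["url"], record["names"])
```

A consumer of such data needs only these three things per institution: a stable id to refer to, a URL, and the list of names to search against.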

ariw.org web site

It is designed to be redistributable. The entire site can be downloaded in one tarball.

It will install with one Perl script. You may have to get some modules.

The site is self-documenting. AMF data, XSLT files and scripts are all fully accessible on the site.

The site's code was written by Thomas Krichel.

users

At this time, AuthorClaim is the only official user of ARIW data.

Registrants can claim affiliation to one or more ARIW identified institutions.

Registrants can also propose to add a new institution if they don't find theirs. As a result, an email is sent to the maintainers of ARIW.

AuthorClaim

AuthorClaim is basically an implementation of the principal functions of the RePEc Author Service (RAS) on top of an interdisciplinary document dataset and the ARIW institution dataset.

The most important non-principal function of RAS is citation data processing. There are no plans to integrate that.

ACIS

ACIS is the academic contribution information system. It was written by Ivan V. Kurmanov. The development was funded by the Open Society Institute.

There are about 150 Perl modules in the system, and a large pile of XSLT.

The code is very complicated and very sparsely documented.

basic idea

There are people. There are documents. There are relations between people and documents.

ACIS lets people manage claims of such relations. Authorship is only one kind of claim. The system should really be called ContributorClaim, but that's not a good term from a marketing perspective.

basic process

Users register. They maintain a name variations profile.

The system searches the document data for matches of the name variations.

On initial registration, the searches are conducted while the user waits.

For registered users, the system conducts searches and informs registrants about new potential contributions that may have appeared.
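
The search step can be sketched as follows. This is an illustration only: the slides do not say how ACIS compares names, so exact matching after a simple normalization is an assumption, and the profile and document data are made up.

```python
def normalize(name):
    """Lowercase a name and collapse punctuation and whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())

def find_potential_claims(name_variations, documents):
    """Return ids of documents with an author name matching any variation."""
    wanted = {normalize(v) for v in name_variations}
    hits = []
    for doc in documents:
        if any(normalize(a) in wanted for a in doc["authors"]):
            hits.append(doc["id"])
    return hits

# Hypothetical profile and document data, for illustration only.
profile = ["Jane Q. Doe", "J. Doe", "Doe, Jane"]
documents = [
    {"id": "doc:1", "authors": ["J. Doe", "A. Smith"]},
    {"id": "doc:2", "authors": ["John Doe"]},
    {"id": "doc:3", "authors": ["DOE, JANE"]},
]
potential = find_potential_claims(profile, documents)
print(potential)
```

Every hit is only a potential claim. The registrant still has to accept or refuse it.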

exported data

The personal data is exported in AMF. Some data elements that are not covered by AMF are represented through an acis namespace.

The most frequent is <acis:hasnothingtodowith>.

Why do we want to know about papers that somebody has not written, you may wonder.

important absence

At this time, there are individual pages for registered users.

But there is no way to search for them. More generally there is no user service on the personal data.

There is no intention for the society to provide such a service at this time.

isolated uselessness

The whole idea of AuthorClaim is to serve as an intermediary for others to delegate a boring technicality to.

It is not meant to become a point where authors modify document data.

In particular, it will never ever become a document (or metadata) submission system. This involves expertise that AuthorClaim cannot provide.

AuthorClaim is a complement to, not a substitute for, the systems that feed it with document data.

document data in AuthorClaim

AuthorClaim is rounding up the usual suspects: arXiv, CiteSeer, DBLP, E-LIS, PubMed, RePEc, SPIRES.

Work in fall 2009 should bring in some major institutional repositories.

centralize author registration?

Author registration is a simple factual claim.

Claim verification does not require subject expertise.

It is in principle the same process across different disciplines.

It has been talked about a lot for years, but nothing much appears to have been done.

what does autho

is a central system possible?

On an IR level, registration of authors appears not cost effective.

On a discipline-based r

document data format

In principle ACIS uses AMF document data. De facto, it only uses four data elements:

id

title

author name expressions (multiple)

URL to provider site or other location data.

Such data is in principle not copyrighted, a further advantage of the system.
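
A minimal sketch of such a four-element record, in Python. The field names and the sample record are assumptions for illustration. The sketch only shows that everything beyond the four elements can be dropped before the data reaches the claiming system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DocumentRecord:
    """The four elements listed above; the field names are assumptions."""
    id: str
    title: str
    author_names: List[str] = field(default_factory=list)
    url: str = ""

def from_full_record(full):
    """Keep only the four elements, ignoring anything else in the source record."""
    return DocumentRecord(
        id=full["id"],
        title=full["title"],
        author_names=list(full.get("authors", [])),
        url=full.get("url", ""),
    )

# Hypothetical source record with an extra field that is dropped.
doc = from_full_record({
    "id": "doc:1",
    "title": "An example paper",
    "authors": ["J. Doe", "A. Smith"],
    "url": "http://example.org/doc/1",
    "abstract": "Not needed for claiming.",
})
print(doc)
```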

at this stage

It has not got off to a ballistic start: about 20,000,000 documents... about 30 profiles...

but RePEc too took a long time to really take off.

If there are no user services soon, Thomas Krichel will build one himself, possibly with the help of Jose Manuel Barrueco.

merger with RAS

It is possible that in a couple of years, RAS will be merged into AuthorClaim.

There are three obstacles:

shortIds will have to change

citation processing will have to be included in RAS

profiling of institution choice will have to be implemented.

Currently the main difference with RAS is the volume of document records.

processing of potential claims

Each potential claim has to be manually processed.

For people with common names this may involve processing hundreds of potential claims, esp. since PubMed only recently added authors' first names.

To make that easier, Thomas Krichel worked in the Summer of 2009 on a learning system.

SVM learning

Thomas Krichel has some experience with SVM because he uses it with great success in his current awareness work.

In current awareness, learning aims to predict which papers will be included in the current version of the report. It looks at features of papers included and excluded in past issues of the report.

learning about authors

As soon as an author has accepted at least one document, and refused at least one document, it is possible to sort the suggested documents to bring the ones more likely to be accepted closer to the front.

AuthorClaim now learns about suggested documents

when a new document arrives in the queue

when the user refuses documents

when the user accepts documents

This happens via a daemon. The daemon sorts the entries in a table of suggested documents.
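
The sorting idea can be sketched in a few lines of Python. This is not the ACIS code. The slides do not say which features are used, so title words are an assumption, and the small trainer below (subgradient descent on the hinge loss with a Pegasos-style step size) stands in for whatever SVM implementation the daemon calls. The daemon and the database table are not modelled.

```python
def features(text):
    """Bag-of-words features: the set of lowercase words in a title."""
    return set(text.lower().split())

def train_linear_svm(examples, epochs=50, lam=0.01):
    """Train a linear SVM by subgradient descent on the hinge loss.

    examples: list of (title, label), label +1 (accepted) or -1 (refused).
    Returns a dict mapping word -> weight.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for text, y in examples:
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos-style step size
            x = features(text)
            margin = y * sum(w.get(f, 0.0) for f in x)
            for f in w:                       # shrink weights (regularization)
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                  # hinge loss is active
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w

def rank_suggestions(suggestions, w):
    """Sort suggested titles so the most likely accepted come first."""
    def score(text):
        return sum(w.get(f, 0.0) for f in features(text))
    return sorted(suggestions, key=score, reverse=True)

# Hypothetical history: one accepted and one refused document.
history = [
    ("open access repositories for economics", +1),
    ("protein folding in yeast cells", -1),
]
weights = train_linear_svm(history)
ranked = rank_suggestions(
    ["yeast protein expression", "economics of open access", "a history of maps"],
    weights,
)
print(ranked)
```

With one accepted and one refused title, words from the accepted title get positive weights and words from the refused title get negative ones, so a suggestion sharing words with the accepted title moves to the front.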

known document learning

ACIS also learns about documents that have a known status

for all refused documents, put the most likely to be accepted first

for all accepted documents, put the most likely to be refused first

Learning about known documents is carried out when the user logs out.
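
The two orderings can be sketched as below. The `score` function is assumed to return higher values for documents more likely to be accepted. The identifiers and scores are made up and stand in for the output of a trained model.

```python
def order_known_documents(accepted, refused, score):
    """Order documents of known status as described above.

    score(doc) is assumed to be higher for documents more likely to be accepted.
    """
    # Refused documents: most likely to be accepted first.
    refused_view = sorted(refused, key=score, reverse=True)
    # Accepted documents: most likely to be refused first.
    accepted_view = sorted(accepted, key=score)
    return accepted_view, refused_view

# Hypothetical scores standing in for a trained model.
scores = {"a1": 2.5, "a2": -0.3, "r1": -1.8, "r2": 0.9}
accepted_view, refused_view = order_known_documents(
    ["a1", "a2"], ["r1", "r2"], scores.get
)
print(accepted_view, refused_view)
```

One reading of this ordering is that decisions the model disagrees with come up first, so the user can review them. The slides do not state the purpose.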

competition

ResearcherID is a clone of RAS that has been built by Thomson / ISI. Neither input nor output data are freely available.

CrossRef are rumoured to be working on a system codenamed “CrossReg” to do author registration.

Competition, even if it is successful, will help to make the concept of author registration more widely known.