Ariw and AuthorClaim: current state Thomas Krichel prepared for the first retreat for disciplinary...
ariw and AuthorClaim: current state
Thomas Krichel
http://openlib.org/home/krichel
prepared for the first retreat for disciplinary repositories
Monterey 2009-10-19
introduction
I am here representing two activities:
RePEc
Open Library Society (OLS): ariw, AuthorClaim (RePEc?)
RePEc is more established. OLS may become an umbrella organization for RePEc.
open library society
This is a 501(c)(3) charity set up by Thomas Krichel to support the work on the registration systems.
The purposes of the society are formulated quite generally. The society can support related purposes.
In the next few weeks a formal alignment between RePEc and the society may come along, so as to enable legal representation, or at least support, through the society.
two official projects
ARIW is a registry of academic and research institutions in the world.
It lives at http://ariw.org. The data comes from a similar collection (academ.cc) and from Isidro Aguillo's data, which he uses to build his webometric rankings of academic institutions.
It is not much maintained at this time.
data structure
The data is in AMF, an XML-encoded format.
Each record
contains a unique id for each institution
contains an http URL
contains one or more name variations: full names, abbreviated names, names in different languages
There are country and US state units. ??? records
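As a rough illustration of the record shape just described, here is a minimal Python sketch. It is an in-memory view only and does not reproduce the AMF schema; the class name, field names, id and URL below are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InstitutionRecord:
    """Hypothetical in-memory view of one ARIW record (not the AMF schema)."""
    record_id: str                                   # unique id of the institution
    url: str                                         # http URL of the institution
    full_names: list[str] = field(default_factory=list)
    abbreviated_names: list[str] = field(default_factory=list)
    names_by_language: dict[str, str] = field(default_factory=dict)

    def all_names(self) -> list[str]:
        """Return every known name variation once, in a stable order."""
        seen: dict[str, None] = {}
        for name in [*self.full_names, *self.abbreviated_names,
                     *self.names_by_language.values()]:
            seen.setdefault(name, None)
        return list(seen)


# Made-up example values, for illustration only.
example = InstitutionRecord(
    record_id="example:inst:1",
    url="http://www.example.edu/",
    full_names=["Example University"],
    abbreviated_names=["EU"],
    names_by_language={"es": "Universidad de Ejemplo"},
)
names = example.all_names()
```

Keeping the name variations in separate fields mirrors the three kinds of names listed on the slide; a flat list would also work if the distinction is not needed.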
ariw.org web site
It is designed to be redistributable. The entire site can be downloaded in one tarball.
It will install with one Perl script. You may have to get some modules.
The site is self-documenting. AMF data, xslt data and scripts are all fully accessible on the site.
The site's code was written by Thomas Krichel.
users
At this time, AuthorClaim is the only official user of ARIW data.
Registrants can claim affiliation to one or more ARIW identified institutions.
Registrants can also propose to add a new institution if they don't find theirs. As a result, an email is sent to the maintainers of ARIW.
AuthorClaim
AuthorClaim is basically an implementation of the principal functions of the RePEc Author Service (RAS) on an interdisciplinary document dataset and the ARIW institution dataset.
The most important non-principal function of RAS is citation data processing. There are no plans to integrate that.
ACIS
ACIS is the academic contribution information system. It was written by Ivan V. Kurmanov. The development was funded by the Open Society Institute.
There are about 150 Perl modules in the system, and a large pile of XSLT.
The code is very complicated and very sparsely documented.
basic idea
There are people. There are documents. There are relations between people and documents.
ACIS lets people manage claims of such relations. Authorship is only one kind of claim; the system should really be called ContributorClaim, but that's not a good term from a marketing perspective.
basic process
Users register. They maintain a name variations profile.
The system searches the document data for matches of the name variations.
On initial registration, the searches are conducted while the user waits.
For registered users, the system conducts searches and informs registrants about new potential contributions that may have appeared.
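The search step can be sketched as follows. This is a minimal illustration, assuming exact matching after a simple normalization; the slides do not say how ACIS actually compares names, and the function names, ids and sample names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def find_potential_claims(name_variations, documents):
    """Return ids of documents with an author name matching a name variation.

    `documents` maps a document id to its list of author name expressions.
    """
    wanted = {normalize(v) for v in name_variations}
    matches = []
    for doc_id, author_names in documents.items():
        if any(normalize(a) in wanted for a in author_names):
            matches.append(doc_id)
    return matches


# Made-up profile and document data, for illustration only.
profile = ["Jane Public", "J. Public", "Public, Jane"]
docs = {
    "doc:1": ["Public, Jane", "John Doe"],
    "doc:2": ["John Smith"],
    "doc:3": ["J Public"],
}
potential = find_potential_claims(profile, docs)
```

Every match is only a potential claim: the registrant still has to accept or refuse it.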
exported data
The personal data is exported in AMF. Some data elements that are not covered by
AMF are represented through an acis namespace.
The most frequent is <acis:hasnothingtodowith>.
Why do we want to know about papers that somebody has not written, you may wonder.
important absence
At this time, there are individual pages for registered users.
But there is no way to search for them. More generally there is no user service on the personal data.
There is no intention for the society to provide such a service at this time.
isolated uselessness
The whole idea of AuthorClaim is to serve as an intermediary for others to delegate a boring technicality to.
It is not meant to become a point where authors modify document data.
In particular, it will never ever become a document (or metadata) submission system. This involves expertise that AuthorClaim cannot provide.
AuthorClaim is a complement to, not a substitute for, the systems that feed it with document data.
document data in AuthorClaim
AuthorClaim is rounding up the usual suspects:
arXiv, CiteSeer, DBLP, E-LIS, PubMed, RePEc, SPIRES
Work in the fall of 2009 should bring in some major institutional repositories.
centralize author registration?
Author registration is a simple factual claim.
Claim verification does not require subject expertise.
It is in principle the same process across different disciplines.
It has been talked about a lot for years, but nothing much appears to have been done.
what does autho
is a central system possible?
At the IR level, registration of authors does not appear to be cost-effective.
On an discipline-based r
document data format
In principle ACIS uses AMF document data. De facto, it only uses four data elements:
id
title
author name expressions (multiple)
URL to provider site or other location data.
Such data is in principle not copyrighted, a further advantage of the system.
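To make the four-element view concrete, here is a minimal Python sketch that reduces a richer record to just those elements. The dictionary keys, class name and sample values are assumptions made for the example; they are not AMF element names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical minimal document record with only the four elements."""
    doc_id: str
    title: str
    author_names: tuple[str, ...]   # multiple author name expressions
    url: str                        # provider site or other location


def to_minimal_record(raw: dict) -> DocumentRecord:
    """Keep only the four elements and ignore everything else in `raw`."""
    return DocumentRecord(
        doc_id=raw["id"],
        title=raw["title"],
        author_names=tuple(raw.get("authors", ())),
        url=raw["url"],
    )


# Made-up input record, for illustration only.
raw = {
    "id": "example:paper:42",
    "title": "A made-up title",
    "authors": ["Jane Public", "John Doe"],
    "url": "http://repository.example.org/paper/42",
    "abstract": "Not needed for claim processing.",
}
record = to_minimal_record(raw)
```

Anything beyond these four fields, such as the abstract here, is simply dropped.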
at this stage
It has not been off to a ballistic start: about 20,000,000 documents... about 30 profiles...
but RePEc too took a long time to really take off.
If there are no user services soon, Thomas Krichel will build one himself, possibly with the help of Jose Manuel Barrueco.
merger with RAS
It is possible that in about a couple of years, RAS will be merged into AuthorClaim.
There are three obstacles:
shortIds will have to change
citation processing will have to be included
in RAS, profiling of institution choice will have to be implemented.
Currently the main difference with RAS is the volume of document records.
processing of potential claims
Each potential claim has to be manually processed.
For people with common names this may involve processing hundreds of potential claims, esp. since PubMed only recently added authors' first names.
To make that easier, Thomas Krichel worked in the Summer of 2009 on a learning system.
SVM learning
Thomas Krichel has some experience with SVM because he is using it with great success in his current awareness work.
In current awareness, learning aims to predict which papers will be included in the current version of the report. It looks at features of papers included in and excluded from past issues of the report.
learning about authors
As soon as an author has accepted at least one document and refused at least one document, it is possible to sort the suggested documents to bring the ones more likely to be accepted closer to the front.
AuthorClaim now learns about suggested documents when a new document arrives in the queue, when the user refuses documents, and when the user accepts documents.
This happens via a daemon. The daemon sorts the entries in a table of suggested documents.
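The sorting idea can be sketched with a tiny linear SVM, trained here by plain subgradient descent on the regularized hinge loss. This is an illustration only: the slides do not say which SVM implementation or which features AuthorClaim uses, so the title-word features, the parameters and all names below are assumptions.

```python
from collections import defaultdict


def features(title: str) -> set[str]:
    """Assumed feature set: the lowercase words of the title."""
    return set(title.lower().split())


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Train weights on (title, label) pairs; label is +1 accepted, -1 refused."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for title, label in examples:
            words = features(title)
            margin = label * sum(weights[w] for w in words)
            for w in list(weights):          # L2 regularization: shrink weights
                weights[w] *= 1.0 - lr * lam
            if margin < 1.0:                 # hinge loss is active: move weights
                for w in words:
                    weights[w] += lr * label
    return weights


def score(weights, title: str) -> float:
    """Higher score means more likely to be accepted."""
    return sum(weights.get(w, 0.0) for w in features(title))


def sort_suggestions(weights, suggestions):
    """Return suggestion ids, most likely to be accepted first."""
    return sorted(suggestions, key=lambda d: score(weights, suggestions[d]),
                  reverse=True)


# Made-up decisions and suggestions, for illustration only.
known = [("open access repositories economics", +1),
         ("protein folding dynamics", -1)]
suggested = {"s2": "dynamics of protein structure",
             "s1": "economics of open access"}
model = train_linear_svm(known)
order = sort_suggestions(model, suggested)
```

One accepted and one refused document are enough to train, which matches the condition stated on the slide. In a daemon setting, the training and re-sorting would be rerun whenever a document arrives or a decision is made.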
known document learning
ACIS also learns about documents that have a known status:
for all refused documents, put the most likely to be accepted first;
for all accepted documents, put the most likely to be refused first.
Learning about known documents is carried out when the user logs out.
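The two orderings above can be sketched in a few lines of Python, assuming each known document already has a score where a higher value means "more likely to be accepted". The function name, ids and scores are made up; how ACIS computes or stores such scores is not described in the slides.

```python
def order_known_documents(scores, accepted, refused):
    """Order known documents so the least expected decisions come first.

    `scores` maps a document id to a number; higher means more likely accepted.
    """
    # Refused documents: the most likely to be accepted go first.
    refused_view = sorted(refused, key=lambda d: scores[d], reverse=True)
    # Accepted documents: the most likely to be refused go first.
    accepted_view = sorted(accepted, key=lambda d: scores[d])
    return accepted_view, refused_view


# Made-up scores and decisions, for illustration only.
scores = {"a1": 0.9, "a2": -0.2, "r1": -0.8, "r2": 0.4}
accepted_view, refused_view = order_known_documents(
    scores, accepted=["a1", "a2"], refused=["r1", "r2"]
)
```

One possible reading, not stated in the source, is that this ordering puts decisions the model finds surprising at the top, where a user reviewing past choices would see them first.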
competition
ResearcherID is a clone of the RAS that has been done by Thomson / ISI. Neither input nor output data are freely available.
CrossRef are rumoured to work on a system codenamed “CrossReg” to do author registration.
Competition, even if it is successful, will help to make the concept of author registration more widely known.