Towards OpenURL Quality Metrics: Initial Findings
Adam Chandler, Cornell University Library
2009 American Library Association Annual Conference, Chicago
OpenURL model
OpenURL model (cont.): incoming OpenURL
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
in our knowledge base?
title: Library hi tech
issn: 0737-8831
start date: 19970101
end date:
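The knowledge-base lookup on this slide amounts to a coverage check: does the cited year fall between the holdings start date and the (possibly empty) end date? A minimal Python sketch, assuming holdings dates are stored as YYYYMMDD strings as shown above and that an empty end date means coverage is ongoing; `parse_kb_date` and `in_coverage` are hypothetical names, not part of any link resolver product.

```python
from datetime import date
from typing import Optional


def parse_kb_date(value: str) -> Optional[date]:
    """Parse a YYYYMMDD holdings date; an empty string means open-ended."""
    value = value.strip()
    if not value:
        return None
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))


def in_coverage(citation_year: int, start: str, end: str) -> bool:
    """True if the cited year overlaps the holdings range.

    Assumption: only the year is known (as with rft.date=2009), so whole
    years are compared, and an empty end date means coverage is ongoing.
    """
    start_date = parse_kb_date(start)
    end_date = parse_kb_date(end)
    if start_date is not None and citation_year < start_date.year:
        return False
    if end_date is not None and citation_year > end_date.year:
        return False
    return True
```

With the Library hi tech record above, `in_coverage(2009, "19970101", "")` returns True, while a 1995 citation would fall outside coverage. A wrong start or end date in the knowledge base flips this answer without any error being visible to the user.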
link-to syntax for Emerald
http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
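The link-to template works by substitution: each `#@NAME#` placeholder is replaced with the corresponding value from the incoming OpenURL. A minimal Python sketch, assuming each placeholder maps to a single OpenURL key as listed in `PLACEHOLDER_KEYS` (our guess at the mapping, not WebBridge's documented behavior); `fill_link_to` and `hyphenate_issn` are hypothetical names.

```python
import re

# Hypothetical mapping from template placeholders to OpenURL (KEV) keys.
PLACEHOLDER_KEYS = {
    "ISSN-HYPHEN": "rft.issn",
    "DATE": "rft.date",
    "VOLUME": "rft.volume",
    "ISSUE": "rft.issue",
    "SPAGE": "rft.spage",
}


def hyphenate_issn(issn: str) -> str:
    """Normalize an ISSN to the NNNN-NNNN form."""
    digits = issn.replace("-", "").strip()
    return f"{digits[:4]}-{digits[4:]}"


def fill_link_to(template: str, openurl: dict[str, str]) -> str:
    """Replace each #@NAME# placeholder with the matching OpenURL value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        key = PLACEHOLDER_KEYS.get(name)
        if key is None:
            return match.group(0)  # unknown placeholder: leave it untouched
        value = openurl.get(key, "")
        if name == "ISSN-HYPHEN" and value:
            return hyphenate_issn(value)
        return value  # a missing value becomes an empty string

    return re.sub(r"#@([A-Z-]+)#", substitute, template)
```

With the example citation's ISSN, date, volume, issue and start page, this produces a link ending in `reqidx=0737-8831(2009)27:1L.151`. Note that a missing value silently leaves a gap in the link, which is one way incomplete source metadata turns into a failed link.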
OpenURL is pervasive
Cornell link resolver alone, July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests.
Estimate: 402,000 × 123 (ARL libraries) ≈ 49 million OpenURL requests per year, if each ARL library handles roughly Cornell's volume
Cornell’s top 10 OpenURL sources
1. Web of Knowledge
2. Google Scholar
3. Webfeat (our “Find Articles” service)
4. EBSCOHost
5. OCLC FirstSearch
6. SilverPlatter
7. Weill Cornell Medical Center
8. SciFinder Scholar
9. PubMed
10. Refworks
example OpenURL
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
example OpenURL (1)
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831
example OpenURL (2)
&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
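Reading the metadata out of an OpenURL like this one means splitting the KEV query string into key/value pairs while keeping repeated keys such as `rft.au`. A minimal sketch using Python's standard `urllib.parse`; `parse_openurl` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def parse_openurl(url: str) -> dict[str, list[str]]:
    """Split an OpenURL 1.0 KEV query string into key -> list of values.

    Values come back percent-decoded with '+' turned into spaces.
    Repeated keys (e.g. rft.au) keep every value in order, and blank
    values are kept so that empty elements can be counted too.
    """
    query = urlsplit(url).query
    return parse_qs(query, keep_blank_values=True)
```

Applied to the example, `rft.au` yields `["scholze, f", "windisch, n"]` and `rft_id` decodes to `info:doi/10.1108/07378830910942991/`; grouping records by `rfr_id` is what allows metrics to be tallied per source.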
Literature review
Since the OpenURL standard was introduced some ten years ago, I have been unable to identify any systematic study designed and carried out to benchmark the quality of linking.
Wakimoto, Walker, and Dabbour (2006)
Main finding: users simply expect full text, and when they do not get it, they are disappointed.
Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136
Wakimoto, Walker, and Dabbour (2006)
"Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)
… but finding the cause of the problem is hard
• Wrong start or end date in the local library's holdings knowledge base (see KBART)
• Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
• Wrong link-to syntax in link resolver
• Fragile handling of incoming links by content provider
• Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
• Subscription errors (especially with the start of a new calendar year)
• Syntactically incorrect metadata from the OpenURL origin
Blake and Knudson (2002)
• “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”
Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 219-230.
See: Culling, James (2007). "Link Resolvers and the Serials Supply Chain." UKSG. <http://www.uksg.org/projects/linkfinal> and NISO/UKSG KBART
Blake and Knudson (2002)
• “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.”
Blake and Knudson (2002)
• “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”
Blake and Knudson (2002)
• “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”
Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
Hughes (2004)
• Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community.
Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
Hughes (2004)
• Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation.
Hughes (2004)
• Goals:
  – establish a baseline against which future instances can be compared;
  – provide assistance to data providers;
  – evaluate a set of domain-grounded controlled vocabularies.
Hughes’ approach
• Each metadata record is scored from 0 to 10.
• The score has two weighted parts: a Code Existence Score and an Element Absence Penalty.
• The Code Existence Score is specific to the OLAC community’s use of Dublin Core extensions.
• The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields.
• The absence of a core element results in a penalty of 0.2.
Hughes’ approach
• From this simple approach, an array of metrics is derived:
  – archive diversity;
  – metadata quality;
  – core elements per record;
  – core element usage;
  – code usage;
  – code and element usage;
  – star rating.
• From these metrics a score is computed for each metadata record, each archive, and the community as a whole.
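Hughes' record, archive and community levels map onto OpenURLs if each source (`rfr_id`) is treated as an "archive". A minimal sketch of the Element Absence Penalty alone, assuming a hypothetical core set of five journal-article elements and a starting score of 1.0 (the slides do not say what the 0.2 is deducted from). The Code Existence Score and Hughes' weighting are left out because they are OLAC-specific, so this illustrates the penalty idea and is not Hughes' formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical core set for journal articles (our assumption, not from Hughes)
CORE_ELEMENTS = ("rft.issn", "rft.date", "rft.volume", "rft.issue", "rft.spage")
ABSENCE_PENALTY = 0.2  # deducted per missing core element, as on the slide


def record_score(record: dict[str, str]) -> float:
    """Start at 1.0 (assumed) and subtract the penalty per absent or empty core element."""
    missing = sum(1 for key in CORE_ELEMENTS if not record.get(key))
    return round(max(0.0, 1.0 - ABSENCE_PENALTY * missing), 2)


def score_by_source(records: list[dict[str, str]]) -> dict[str, float]:
    """Mean record score per source (rfr_id), plus an 'ALL' entry over every record."""
    per_source: dict[str, list[float]] = defaultdict(list)
    for record in records:
        per_source[record.get("rfr_id", "unknown")].append(record_score(record))
    summary = {sid: mean(scores) for sid, scores in per_source.items()}
    all_scores = [s for scores in per_source.values() for s in scores]
    summary["ALL"] = mean(all_scores) if all_scores else 0.0
    return summary
```

With five core elements and a 0.2 penalty, a record missing every core element bottoms out at exactly 0, which is one reason to pick the core set and the penalty together.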
Mellon-funded planning grant for L'Année philologique
1. Canonical Citation Linking: http://cwkb.org
   In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library
2. OpenURL Quality
   Is it possible to build a system for evaluating OpenURL quality from a content provider?
Key findings from 2008 Mellon OpenURL quality investigation
Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.
Constant 1: Key elements used by content providers in their link-to targets
• title: 64%
• spage: 64%
• volume: 61%
• issue: 60%
• date: 48%
• aulast: 47%
• issn: 35%
• atitle: 35%
• DOI: 14%
• ISBN: 5%
Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
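Percentages like these come from checking which elements each link-to template references. A minimal sketch of that tally, assuming every template uses the `#@NAME#` placeholder convention seen in the Emerald example and treating variants such as `ISSN-HYPHEN` as the base element `ISSN`; `element_usage` is a hypothetical name.

```python
import re
from collections import Counter

# #@NAME# or #@NAME-VARIANT#; only the base NAME is captured
PLACEHOLDER = re.compile(r"#@([A-Z]+)(?:-[A-Z]+)?#")


def element_usage(templates: list[str]) -> dict[str, int]:
    """Percentage of link-to templates that reference each element at least once."""
    if not templates:
        return {}
    counts: Counter = Counter()
    for template in templates:
        # set(): count an element once per template even if it appears twice
        counts.update(set(PLACEHOLDER.findall(template)))
    total = len(templates)
    return {name: round(100 * n / total) for name, n in counts.most_common()}
```

Counting each element once per template keeps the figure comparable to the slide's: it measures how many targets depend on an element, not how often the element is repeated.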
Constant 2: Frequency of element string patterns for all sources
aulast
if ($element =~ /aulast/) {
    # FirstSearch sources: skip the rft.-prefixed form of the key
    next if $sid =~ /firstsearch/ && $element =~ /rft\.aulast/;
    $patterns{allsids}{$genre}{"aulast"}++;
    $patterns{$sid}{$genre}{"aulast"}++;
    if    ($value =~ /^[A-Za-z]+$/)              { $patterns{$sid}{$genre}{"aulast_simple"}++; }
    elsif ($value =~ /^[A-Za-z]+, .+$/)          { $patterns{$sid}{$genre}{"aulast_comma"}++; }
    elsif ($value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) { $patterns{$sid}{$genre}{"aulast_simpleplusinitial"}++; }
    else                                         { $patterns{$sid}{$genre}{"aulast_other"}++; }
}
Simple flat structure
aulast_other examples
• Ryan S Miller
• Louise D Bryant
• DAVID J MCKENZIE
• %C4%90okovi%C4%87
• Indu B Ahluwalia
• Carreras-Sangr%c3%a0
• Bautista-Casta%C3%B1o
• O%27Shea
• Melissa Ventura Marra
• Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
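Several aulast_other values are percent-encoded UTF-8 or punctuation (`%C4%90okovi%C4%87` is Đoković, `O%27Shea` is O'Shea) rather than malformed names. A sketch of decoding before classifying, so that encoding noise is separated from genuinely problematic values such as full names or several authors in one field; the categories `aulast_multiple` and `aulast_fullname` are hypothetical additions, not part of the original Perl.

```python
import re
from urllib.parse import unquote_plus

LETTERS = r"[^\W\d_]+"  # one run of Unicode letters
NAME = re.compile(rf"{LETTERS}(?:['-]{LETTERS})*")  # letters joined by ' or -


def classify_aulast(raw: str) -> str:
    """Classify a surname value after percent-decoding (hypothetical categories)."""
    value = unquote_plus(raw).strip()
    if ";" in value:
        return "aulast_multiple"   # several authors packed into one field
    if NAME.fullmatch(value):
        return "aulast_simple"     # e.g. Đoković, O'Shea, Carreras-Sangrà
    if " " in value:
        return "aulast_fullname"   # forename(s) and surname together
    return "aulast_other"
```

Under this sketch, most of the examples above move out of the catch-all bucket, leaving a smaller aulast_other set that is worth reporting back to content providers.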
spage
if ($element =~ /spage/) {
    next if $sid =~ /firstsearch/ && $element =~ /rft\.spage/;
    $patterns{allsids}{$genre}{"spage"}++;
    $patterns{$sid}{$genre}{"spage"}++;
    if    ($value =~ /^\d+$/)        { $patterns{$sid}{$genre}{"spage_number"}++; }
    elsif ($value =~ /^\d+-\d+$/)    { $patterns{$sid}{$genre}{"spage_number_number"}++; }
    elsif ($value =~ /[A-Za-z].+\d/) { $patterns{$sid}{$genre}{"spage_string_w_number"}++; }
    else                             { $patterns{$sid}{$genre}{"spage_other"}++; }
}
spage_other examples
• 1033 (6 pages)
• 85(19)
• 575 (11 pages)
• 283...290
• PHYS
• GLRM
• 58,+VI
date
if ($element =~ /date/) {
    next if $sid =~ /firstsearch/ && $element =~ /rft\.date/;
    $patterns{allsids}{$genre}{"date"}++;
    $patterns{$sid}{$genre}{"date"}++;
    if    ($value =~ /^\d{4}$/)             { $patterns{$sid}{$genre}{"date_dddd"}++; }
    elsif ($value =~ /^\d{4}-\d{2}$/)       { $patterns{$sid}{$genre}{"date_dddd-dd"}++; }
    elsif ($value =~ /^\d{4}-\d{2}-\d{2}$/) { $patterns{$sid}{$genre}{"date_dddd-dd-dd"}++; }
    elsif ($value =~ /^\d{4}-\d{4}$/)       { $patterns{$sid}{$genre}{"date_dddd-dddd"}++; }
    elsif ($value =~ /^\d{8}$/)             { $patterns{$sid}{$genre}{"date_dddddddd"}++; }
    else                                    { $patterns{$sid}{$genre}{"date_dateother"}++; }
}
date_other examples
• 1956 July
• %7E1994
• June 5%2C 2002
• JUN 30 05
• 2006%282007%29
• 1922,+April+25th
• %5B%5B1943-06-19%5D%5D
issn_other
if ($element =~ /issn/) {
    next if $sid =~ /firstsearch/ && $element =~ /rft\.issn/;
    $patterns{allsids}{$genre}{"issn"}++;
    $patterns{$sid}{$genre}{"issn"}++;
    if    ($value =~ /^\d+-\d+$/)  { $patterns{$sid}{$genre}{"issn_number_number"}++; }
    elsif ($value =~ /^\d+$/)      { $patterns{$sid}{$genre}{"issn_number"}++; }
    elsif ($value =~ /^\d+X$/)     { $patterns{$sid}{$genre}{"issn_numberX"}++; }
    elsif ($value =~ /^\d+-\d+X$/) { $patterns{$sid}{$genre}{"issn_number_numberX"}++; }
    else {
        $patterns{$sid}{$genre}{"issn_other"}++;
        print "$value\n";  # print unmatched values for manual review
    }
}
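The four Perl blocks share one shape: match the key, skip the `rft.`-prefixed key for FirstSearch sources, bump the element counters, then try the value against an ordered list of patterns. A Python sketch of the same logic written as a table, so that adding a pattern means adding one row rather than another `elsif`; the pattern names and their order are copied from the Perl, while `PATTERNS`, `new_patterns` and `tally` are our own names.

```python
import re
from collections import defaultdict

# element -> (ordered (pattern name, regex) rules, fallback name), as in the Perl
PATTERNS = {
    "aulast": ([("aulast_simple", r"^[A-Za-z]+$"),
                ("aulast_comma", r"^[A-Za-z]+, .+$"),
                ("aulast_simpleplusinitial", r"^[A-Z][a-z]+( [A-Z]\.)+$")],
               "aulast_other"),
    "spage": ([("spage_number", r"^\d+$"),
               ("spage_number_number", r"^\d+-\d+$"),
               ("spage_string_w_number", r"[A-Za-z].+\d")],
              "spage_other"),
    "date": ([("date_dddd", r"^\d{4}$"),
              ("date_dddd-dd", r"^\d{4}-\d{2}$"),
              ("date_dddd-dd-dd", r"^\d{4}-\d{2}-\d{2}$"),
              ("date_dddd-dddd", r"^\d{4}-\d{4}$"),
              ("date_dddddddd", r"^\d{8}$")],
             "date_dateother"),
    "issn": ([("issn_number_number", r"^\d+-\d+$"),
              ("issn_number", r"^\d+$"),
              ("issn_numberX", r"^\d+X$"),
              ("issn_number_numberX", r"^\d+-\d+X$")],
             "issn_other"),
}


def new_patterns():
    """Nested counters: patterns[sid][genre][name] -> int, like the Perl %patterns."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def tally(patterns, sid: str, genre: str, key: str, value: str) -> None:
    """Count one OpenURL key/value pair the way the Perl blocks do."""
    for element, (rules, fallback) in PATTERNS.items():
        if element not in key:
            continue
        # FirstSearch sources: skip the rft.-prefixed form of the key
        if "firstsearch" in sid and f"rft.{element}" in key:
            continue
        patterns["allsids"][genre][element] += 1
        patterns[sid][genre][element] += 1
        # first matching rule wins; otherwise fall back to the catch-all
        name = next((n for n, rx in rules if re.search(rx, value)), fallback)
        patterns[sid][genre][name] += 1
```

Keeping the rules as data also makes it straightforward to print the table itself alongside a report, so readers can see exactly which pattern produced each count.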
issn_other examples
• 0065-2598%28print%29
• 0018-5345+%28ISSN+print%29
• ISSN ISBN 0-9525091-5-6.
• 0021-8375%28print%29%7C1439-0361%28electronic%29
• 1471-2164+%28ISSN+online%29
• 0191-8699%3B0191-8699
• 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
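Most of these issn_other values contain a usable ISSN wrapped in labels, percent-encoding, or a second ISSN. A sketch, not part of the original tool, that decodes the value, pulls out ISSN-shaped tokens, and keeps those whose check digit is valid under the standard ISSN scheme (weights 8 down to 2 over the first seven digits, mod 11, with X standing for 10); `issn_check_digit_ok` and `extract_issns` are hypothetical names. A valid check digit catches typos but cannot show that an ISSN belongs to the cited journal.

```python
import re
from urllib.parse import unquote_plus

# NNNN-NNNN or NNNNNNNN, last character may be X
ISSN_SHAPE = re.compile(r"\b(\d{4})-?(\d{3}[\dXx])\b")


def issn_check_digit_ok(issn: str) -> bool:
    """Validate the ISSN mod-11 check digit (weights 8..2; 'X' stands for 10)."""
    digits = issn.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def extract_issns(raw: str) -> list[str]:
    """Return every distinct, check-digit-valid ISSN found in a messy value."""
    found: list[str] = []
    for first, second in ISSN_SHAPE.findall(unquote_plus(raw)):
        issn = f"{first}-{second.upper()}"
        if issn_check_digit_ok(issn) and issn not in found:
            found.append(issn)
    return found
```

On the examples above, `0021-8375%28print%29%7C1439-0361%28electronic%29` yields both the print and electronic ISSNs, the duplicated `0191-8699` collapses to one, and the ISBN-only value yields nothing.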
How often?
metric          frequency in July–Sep 2008 sample
aulast_other    5476
spage_other     772
date_other      591
issn_other      200
Demo of OQ UI
Element report
Pattern report
Next steps
• add non-Cornell data, from libraries or link resolver vendors (model is agnostic to source)
• confirm and publicize key elements used by target syntaxes
• outreach to content providers
• refine and expand metrics
• more reports
  – longitudinal by source
  – compare frequency of an element’s use across sources
  – compare frequency of an element pattern across sources
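Comparing an element pattern across sources comes down to normalizing pattern counts by how often each source sends the element at all. A sketch over the nested `patterns[sid][genre][name]` counts built by the Perl scripts (represented here as Python dictionaries), computing each source's share of values that fell into an element's catch-all pattern; `other_rate` and the default genre `article` are our assumptions.

```python
def other_rate(patterns: dict, element: str, other_key: str,
               genre: str = "article") -> dict[str, float]:
    """Share of an element's values that fell into its catch-all pattern, per source.

    Sources that never sent the element for this genre are left out, and
    the 'allsids' entry is skipped because it holds only element totals.
    Results are sorted from the worst source to the best.
    """
    rates: dict[str, float] = {}
    for sid, by_genre in patterns.items():
        if sid == "allsids":
            continue
        counts = by_genre.get(genre, {})
        total = counts.get(element, 0)
        if total:
            rates[sid] = counts.get(other_key, 0) / total
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))
```

For example, `other_rate(patterns, "spage", "spage_other")` ranks sources by how often their start pages fail every known pattern; run on successive samples, the same call gives a longitudinal view by source.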
How to stay in the loop
http://openurlquality.blogspot.com/
Adam Chandler
Database Management and Electronic Resources Librarian
Library Technical Services
Cornell University Library
tel: 607-255-5760