Towards OpenURL Quality Metrics: Initial Findings

Towards OpenURL Quality Metrics: Initial Findings. Adam Chandler, Cornell University Library. 2009 American Library Association Annual Conference, Chicago.

description

Presentation on creating a method for benchmarking metadata consistency in OpenURL links. See also: . Delivered at the July 2009 American Library Association conference in Chicago.

Transcript of Towards OpenURL Quality Metrics: Initial Findings

Page 1: Towards OpenURL Quality Metrics: Initial Findings

Towards OpenURL Quality Metrics: Initial Findings

Adam Chandler, Cornell University Library

2009 American Library Association Annual Conference, Chicago

Page 2: Towards OpenURL Quality Metrics: Initial Findings

OpenURL model

Page 3: Towards OpenURL Quality Metrics: Initial Findings

OpenURL model cont.

incoming OpenURL

http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/

in our knowledge base?

title: Library hi tech
issn: 0737-8831
start date: 19970101
end date:

link-to syntax for Emerald

http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
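The step from incoming OpenURL to outgoing link can be sketched as a token substitution. This is a minimal illustration, not the resolver's actual code: the helper name `fill_template` and the mapping from template tokens to `rft.*` keys are assumptions, and the literal characters in the template (including the `L.`) are carried over verbatim from the slide.

```perl
use strict;
use warnings;

# Assumed mapping from link-to tokens to OpenURL keys (for illustration only).
my %token_to_key = (
    'ISSN-HYPHEN' => 'rft.issn',
    'DATE'        => 'rft.date',
    'VOLUME'      => 'rft.volume',
    'ISSUE'       => 'rft.issue',
    'SPAGE'       => 'rft.spage',
);

# Hypothetical helper: replace each #@TOKEN# with the matching OpenURL value.
# Unknown tokens and missing values become empty strings.
sub fill_template {
    my ($template, $values) = @_;
    $template =~ s{#\@([A-Z-]+)#}{
        my $key = $token_to_key{$1};
        (defined($key) && defined($values->{$key})) ? $values->{$key} : ''
    }ge;
    return $template;
}

# Values taken from the incoming OpenURL on this slide.
my %values = (
    'rft.issn'   => '0737-8831',
    'rft.date'   => '2009',
    'rft.volume' => '27',
    'rft.issue'  => '1',
    'rft.spage'  => '151',
);

my $template = 'http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#';

print fill_template($template, \%values), "\n";
```

If any of the five values is missing or malformed in the incoming OpenURL, the outgoing link is built anyway and fails at the content provider, which is why the quality of each element matters.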

Page 4: Towards OpenURL Quality Metrics: Initial Findings

OpenURL is pervasive

Cornell link resolver alone: July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests.

Estimate: 402,000 × 123 (ARL libraries) ≈ 49 million requests per year

Page 5: Towards OpenURL Quality Metrics: Initial Findings

Cornell’s top 10 OpenURL sources

1. Web of Knowledge
2. Google Scholar
3. Webfeat (our “Find Articles” service)
4. EBSCOHost
5. OCLC FirstSearch
6. SilverPlatter
7. Weill Cornell Medical Center
8. SciFinder Scholar
9. PubMed
10. Refworks

Page 6: Towards OpenURL Quality Metrics: Initial Findings

example OpenURL

http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
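To make the structure of this URL easier to see, here is a minimal sketch that splits an OpenURL query string into key/value pairs. The function name `parse_openurl` is hypothetical, and the sample URL is a shortened copy of the one above. Repeated keys such as `rft.au` are kept as lists.

```perl
use strict;
use warnings;

# Split the query string of an OpenURL into a hash of key => [values].
sub parse_openurl {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)$/s;
    my %kev;
    for my $pair (split /&/, $query // '') {
        next unless length $pair;                        # skip the empty pair after "?&"
        my ($key, $value) = split /=/, $pair, 2;
        $value //= '';
        $value =~ tr/+/ /;                               # "+" encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # percent-decoding (bytes)
        push @{ $kev{$key} }, $value;
    }
    return \%kev;
}

# Shortened copy of the example OpenURL.
my $openurl = 'http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004'
            . '&rft.aulast=merk&rft.date=2009&rft.issn=0737-8831&rft.spage=151'
            . '&rft.stitle=libr+hi+tech&rft.au=scholze,+f&rft.au=windisch,+n'
            . '&rft_id=info:doi/10.1108%2f07378830910942991/';

my $kev = parse_openurl($openurl);
for my $key (sort keys %$kev) {
    print "$key: ", join(' | ', @{ $kev->{$key} }), "\n";
}
```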

Page 7: Towards OpenURL Quality Metrics: Initial Findings

example OpenURL (1)

http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831

Page 8: Towards OpenURL Quality Metrics: Initial Findings

example OpenURL (2)

&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/

Page 9: Towards OpenURL Quality Metrics: Initial Findings

Literature review

In the roughly ten years since the OpenURL standard was introduced, I can identify no systematic study designed and carried out to benchmark the quality of linking.

Page 10: Towards OpenURL Quality Metrics: Initial Findings

Wakimoto, Walker, and Dabbour (2006)

Main finding: Users just expect full-text. When they do not get it they are disappointed.

Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

Page 11: Towards OpenURL Quality Metrics: Initial Findings

Wakimoto, Walker, and Dabbour (2006)

"Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)

Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

Page 12: Towards OpenURL Quality Metrics: Initial Findings

… but finding the cause of the problem is hard

• Wrong start or end date in the local library's holdings knowledge base (see KBART)
• Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
• Wrong link-to syntax in the link resolver
• Fragile handling of incoming links by the content provider
• Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
• Subscription errors (especially at the start of a new calendar year)
• Syntactically incorrect metadata from the OpenURL origin

Page 13: Towards OpenURL Quality Metrics: Initial Findings

Blake and Knudson (2002)

• “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 219-230.

See: Culling, James (2007). "Link Resolvers and the Serials Supply Chain." UKSG. <http://www.uksg.org/projects/linkfinal> and NISO/UKSG KBART

Page 14: Towards OpenURL Quality Metrics: Initial Findings

Blake and Knudson (2002)

• “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 219-230.

Page 15: Towards OpenURL Quality Metrics: Initial Findings

Blake and Knudson (2002)

• “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 219-230.

Page 16: Towards OpenURL Quality Metrics: Initial Findings

Blake and Knudson (2002)

• “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

Page 17: Towards OpenURL Quality Metrics: Initial Findings

Hughes (2004)

• Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community.

Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.

Page 18: Towards OpenURL Quality Metrics: Initial Findings

Hughes (2004)

• Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation.

Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.

Page 19: Towards OpenURL Quality Metrics: Initial Findings

Hughes (2004)

• Goals:
  – establish a baseline against which future instances can be compared;
  – provide assistance to data providers;
  – evaluate a set of domain-grounded controlled vocabularies.

Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.

Page 20: Towards OpenURL Quality Metrics: Initial Findings

Hughes’ approach

• Each metadata record is scored from 0 to 10.
• There are two parts, a "Code Existence Score" and an "Element Absence Penalty," with weighting.
• The Code Existence Score is specific to the OLAC community's use of Dublin Core extensions.
• The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields.
• The absence of a core element incurs a penalty of 0.2.

Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
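The Element Absence Penalty described above can be sketched roughly as follows. This is only an illustration of the penalty idea, not Hughes' implementation: the list of core elements, the starting score of 10, and the clamp at 0 are all assumptions, and the Code Existence Score and the weighting between the two parts are left out entirely.

```perl
use strict;
use warnings;

# Assumption: a hypothetical list of core elements; the slide does not list them.
my @core_elements = qw(title creator date subject identifier);

# Element Absence Penalty only: 0.2 per missing core element (from the slide).
# Assumptions: the score starts at 10 and is clamped at 0.
sub absence_penalty_score {
    my ($record) = @_;
    my $missing = grep { !defined $record->{$_} || $record->{$_} eq '' } @core_elements;
    my $score = 10 - 0.2 * $missing;
    return $score < 0 ? 0 : $score;
}

# A record with two of the five assumed core elements missing.
my %record = (
    title      => 'Library hi tech',
    date       => '2009',
    identifier => '0737-8831',
);

printf "%.1f\n", absence_penalty_score(\%record);
```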

Page 21: Towards OpenURL Quality Metrics: Initial Findings

Hughes’ approach

• From this simple approach, an array of metrics is derived:
  – archive diversity;
  – metadata quality;
  – core elements per record;
  – core element usage;
  – code usage;
  – code and element usage;
  – star rating.
• From these metrics a score is computed for each metadata record, each archive, and the community as a whole.

Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.

Page 22: Towards OpenURL Quality Metrics: Initial Findings

Mellon-funded planning grant for L'Année philologique

1. Canonical Citation Linking: http://cwkb.org
In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library

2. OpenURL Quality
Is it possible to build a system for evaluating OpenURL quality from a content provider?

Page 23: Towards OpenURL Quality Metrics: Initial Findings

Key findings from 2008 Mellon OpenURL quality investigation

Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.

Page 24: Towards OpenURL Quality Metrics: Initial Findings

Constant 1: Key elements used by content providers in their link-to targets

title - 64%
spage - 64%
volume - 61%
issue - 60%
date - 48%
aulast - 47%
issn - 35%
atitle - 35%
DOI - 14%
ISBN - 5%

Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.

Page 25: Towards OpenURL Quality Metrics: Initial Findings

Constant 2: Frequency of element string patterns for all sources

Page 26: Towards OpenURL Quality Metrics: Initial Findings

aulast

if ($element =~ /aulast/) {
    # For FirstSearch, skip the rft.-prefixed form of the element
    if ($sid =~ /firstsearch/) {
        if ($element =~ /rft\.aulast/) { next; }
    }
    # Count every occurrence, overall and per source
    $patterns{allsids}{$genre}{"aulast"}++;
    $patterns{$sid}{$genre}{"aulast"}++;
    # Classify the value by its string pattern
    if ($value =~ /^[A-Za-z]+$/) {
        $patterns{$sid}{$genre}{"aulast_simple"}++;
    } elsif ($value =~ /^[A-Za-z]+, .+$/) {
        $patterns{$sid}{$genre}{"aulast_comma"}++;
    } elsif ($value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) {
        $patterns{$sid}{$genre}{"aulast_simpleplusinitial"}++;
    } else {
        $patterns{$sid}{$genre}{"aulast_other"}++;
    }
}

Page 27: Towards OpenURL Quality Metrics: Initial Findings

Simple flat structure

Page 28: Towards OpenURL Quality Metrics: Initial Findings

aulast_other examples

• Ryan S Miller
• Louise D Bryant
• DAVID J MCKENZIE
• %C4%90okovi%C4%87
• Indu B Ahluwalia
• Carreras-Sangr%c3%a0
• Bautista-Casta%C3%B1o
• O%27Shea
• Melissa Ventura Marra
• Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
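Several of these values land in aulast_other only because they are percent-encoded or contain non-ASCII letters, apostrophes, or hyphens. The sketch below decodes a few of them and applies the three aulast patterns from the previous slide. The helper names and the relaxed pattern `[\p{L}'-]+` are assumptions added for illustration; they are not part of the analysis script.

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Percent-decode a value and interpret the resulting bytes as UTF-8.
sub decode_value {
    my ($raw) = @_;
    (my $bytes = $raw) =~ tr/+/ /;
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

# The three aulast patterns from the slide, applied in the same order.
sub classify_aulast {
    my ($value) = @_;
    return 'aulast_simple'            if $value =~ /^[A-Za-z]+$/;
    return 'aulast_comma'             if $value =~ /^[A-Za-z]+, .+$/;
    return 'aulast_simpleplusinitial' if $value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/;
    return 'aulast_other';
}

# Assumed relaxed pattern: any Unicode letters plus apostrophe and hyphen.
sub looks_like_surname {
    my ($value) = @_;
    return $value =~ /^[\p{L}'-]+$/ ? 1 : 0;
}

for my $raw ('merk', 'O%27Shea', '%C4%90okovi%C4%87', 'Bautista-Casta%C3%B1o', 'Ryan S Miller') {
    my $decoded = decode_value($raw);
    printf "%-24s %-14s relaxed match: %s\n",
        $decoded, classify_aulast($decoded), looks_like_surname($decoded) ? 'yes' : 'no';
}
```

Under these assumptions, the decoded surnames with diacritics, apostrophes, or hyphens still fall into aulast_other but pass the relaxed pattern, while a full name such as "Ryan S Miller" fails both.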

Page 29: Towards OpenURL Quality Metrics: Initial Findings

spage

if ($element =~ /spage/) {
    # For FirstSearch, skip the rft.-prefixed form of the element
    if ($sid =~ /firstsearch/) {
        if ($element =~ /rft\.spage/) { next; }
    }
    # Count every occurrence, overall and per source
    $patterns{allsids}{$genre}{"spage"}++;
    $patterns{$sid}{$genre}{"spage"}++;
    # Classify the value by its string pattern
    if ($value =~ /^\d+$/) {
        $patterns{$sid}{$genre}{"spage_number"}++;
    } elsif ($value =~ /^\d+-\d+$/) {
        $patterns{$sid}{$genre}{"spage_number_number"}++;
    } elsif ($value =~ /[A-Za-z].+\d/) {
        $patterns{$sid}{$genre}{"spage_string_w_number"}++;
    } else {
        $patterns{$sid}{$genre}{"spage_other"}++;
    }
}

Page 30: Towards OpenURL Quality Metrics: Initial Findings

spage_other examples

• 1033 (6 pages)
• 85(19)
• 575 (11 pages)
• 283...290
• PHYS
• GLRM
• 58,+VI

Page 31: Towards OpenURL Quality Metrics: Initial Findings

date

if ($element =~ /date/) {
    # For FirstSearch, skip the rft.-prefixed form of the element
    if ($sid =~ /firstsearch/) {
        if ($element =~ /rft\.date/) { next; }
    }
    # Count every occurrence, overall and per source
    $patterns{allsids}{$genre}{"date"}++;
    $patterns{$sid}{$genre}{"date"}++;
    # Classify the value by its string pattern
    if ($value =~ /^\d{4}$/) {
        $patterns{$sid}{$genre}{"date_dddd"}++;
    } elsif ($value =~ /^\d{4}-\d{2}$/) {
        $patterns{$sid}{$genre}{"date_dddd-dd"}++;
    } elsif ($value =~ /^\d{4}-\d{2}-\d{2}$/) {
        $patterns{$sid}{$genre}{"date_dddd-dd-dd"}++;
    } elsif ($value =~ /^\d{4}-\d{4}$/) {
        $patterns{$sid}{$genre}{"date_dddd-dddd"}++;
    } elsif ($value =~ /^\d{8}$/) {
        $patterns{$sid}{$genre}{"date_dddddddd"}++;
    } else {
        $patterns{$sid}{$genre}{"date_dateother"}++;
    }
}

Page 32: Towards OpenURL Quality Metrics: Initial Findings

date_other examples

• 1956 July
• %7E1994
• June 5%2C 2002
• JUN 30 05
• 2006%282007%29
• 1922,+April+25th
• %5B%5B1943-06-19%5D%5D
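Most of these values still contain a usable year once they are decoded. As a rough sketch of how a receiver might salvage one, the code below decodes each value and pulls out the first standalone four-digit number. The helper names are hypothetical and this recovery step is an assumption for illustration, not something the analysis script does; a value such as "2006(2007)" remains ambiguous even after decoding.

```perl
use strict;
use warnings;

# Percent-decode a value (the samples here are plain ASCII once decoded).
sub decode_value {
    my ($raw) = @_;
    (my $text = $raw) =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Return the first run of exactly four digits, or undef if there is none.
sub first_year {
    my ($raw) = @_;
    my $text = decode_value($raw);
    return $text =~ /(?<!\d)(\d{4})(?!\d)/ ? $1 : undef;
}

my @samples = (
    '1956 July', '%7E1994', 'June 5%2C 2002', 'JUN 30 05',
    '2006%282007%29', '1922,+April+25th', '%5B%5B1943-06-19%5D%5D',
);

for my $raw (@samples) {
    my $year = first_year($raw);
    printf "%-26s -> %s\n", $raw, defined $year ? $year : '(no four-digit year)';
}
```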

Page 33: Towards OpenURL Quality Metrics: Initial Findings

issn_other

if ($element =~ /issn/) {
    # For FirstSearch, skip the rft.-prefixed form of the element
    if ($sid =~ /firstsearch/) {
        if ($element =~ /rft\.issn/) { next; }
    }
    # Count every occurrence, overall and per source
    $patterns{allsids}{$genre}{"issn"}++;
    $patterns{$sid}{$genre}{"issn"}++;
    # Classify the value by its string pattern
    if ($value =~ /^\d+-\d+$/) {
        $patterns{$sid}{$genre}{"issn_number_number"}++;
    } elsif ($value =~ /^\d+$/) {
        $patterns{$sid}{$genre}{"issn_number"}++;
    } elsif ($value =~ /^\d+X$/) {
        $patterns{$sid}{$genre}{"issn_numberX"}++;
    } elsif ($value =~ /^\d+-\d+X$/) {
        $patterns{$sid}{$genre}{"issn_number_numberX"}++;
    } else {
        $patterns{$sid}{$genre}{"issn_other"}++;
        print "$value\n";    # list the unmatched values
    }
}

Page 34: Towards OpenURL Quality Metrics: Initial Findings

issn_other examples

• 0065-2598%28print%29
• 0018-5345+%28ISSN+print%29
• ISSN ISBN 0-9525091-5-6.
• 0021-8375%28print%29%7C1439-0361%28electronic%29
• 1471-2164+%28ISSN+online%29
• 0191-8699%3B0191-8699
• 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
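Many of these values contain a well-formed ISSN wrapped in extra text. The sketch below, with hypothetical helper names, decodes a value, extracts every ISSN-shaped substring, and verifies each one with the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulo 11). It illustrates the difference between a structural check and a semantic one; the analysis script itself only tests the string pattern.

```perl
use strict;
use warnings;

# Pull ISSN-shaped substrings (dddd-dddd or dddd-dddX) out of an encoded value.
# Call in list context to get all matches.
sub extract_issns {
    my ($raw) = @_;
    (my $text = $raw) =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text =~ /(?<![\d-])(\d{4}-\d{3}[\dXx])(?![\d-])/g;
}

# Standard ISSN check digit: weights 8..2 on the first seven digits, modulo 11.
sub issn_checksum_ok {
    my ($issn) = @_;
    my @chars = split //, uc($issn =~ s/-//r);
    my $sum = 0;
    $sum += $chars[$_] * (8 - $_) for 0 .. 6;
    my $check = (11 - $sum % 11) % 11;
    return $chars[7] eq ($check == 10 ? 'X' : $check) ? 1 : 0;
}

my @samples = (
    '0065-2598%28print%29',
    '0021-8375%28print%29%7C1439-0361%28electronic%29',
    'ISSN ISBN 0-9525091-5-6.',
    '0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311',
);

for my $raw (@samples) {
    my @issns = extract_issns($raw);
    my @report = map { "$_ (" . (issn_checksum_ok($_) ? 'valid' : 'bad check digit') . ')' } @issns;
    print "$raw\n    ", (@report ? join(', ', @report) : 'no ISSN found'), "\n";
}
```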

Page 35: Towards OpenURL Quality Metrics: Initial Findings

How often?

metric          frequency in July-Sep 2008 sample
aulast_other    5476
spage_other     772
date_other      591
issn_other      200

Page 36: Towards OpenURL Quality Metrics: Initial Findings
Page 37: Towards OpenURL Quality Metrics: Initial Findings

Demo of OQ UI

Page 38: Towards OpenURL Quality Metrics: Initial Findings

Element report

Page 39: Towards OpenURL Quality Metrics: Initial Findings

Element report

Page 40: Towards OpenURL Quality Metrics: Initial Findings

Pattern report

Page 41: Towards OpenURL Quality Metrics: Initial Findings

Pattern report

Page 42: Towards OpenURL Quality Metrics: Initial Findings

Pattern report

Page 43: Towards OpenURL Quality Metrics: Initial Findings

Next steps

• add non-Cornell data, from libraries or link resolver vendors (model is agnostic to source)
• confirm and publicize key elements used by target syntaxes
• outreach to content providers
• refine and expand metrics
• more reports
  – longitudinal by source
  – compare frequency of an element’s use across sources
  – compare frequency of an element pattern across sources
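A cross-source pattern comparison of the kind listed above could look roughly like this. The source names and records are made up purely for illustration, the genre level of the counting hash is dropped for brevity, and `pattern_share` is a hypothetical helper; only the spage patterns come from the earlier slide.

```perl
use strict;
use warnings;

# Made-up records for illustration only: [source id, spage value].
my @records = (
    [ 'source_a', '151' ],
    [ 'source_a', '283...290' ],
    [ 'source_b', '12' ],
    [ 'source_b', '85-99' ],
    [ 'source_b', 'PHYS' ],
);

# Tally element and pattern counts per source, as in the spage slide.
my %patterns;
for my $record (@records) {
    my ($sid, $value) = @$record;
    my $pattern = $value =~ /^\d+$/          ? 'spage_number'
                : $value =~ /^\d+-\d+$/      ? 'spage_number_number'
                : $value =~ /[A-Za-z].+\d/   ? 'spage_string_w_number'
                :                              'spage_other';
    $patterns{$sid}{spage}++;
    $patterns{$sid}{$pattern}++;
}

# Hypothetical helper: share of a source's spage values that match a pattern.
sub pattern_share {
    my ($sid, $pattern) = @_;
    my $total = $patterns{$sid}{spage} or return 0;
    return ($patterns{$sid}{$pattern} // 0) / $total;
}

for my $sid (sort keys %patterns) {
    printf "%-10s spage_other: %.0f%%\n", $sid, 100 * pattern_share($sid, 'spage_other');
}
```

Reporting a share rather than a raw count lets sources with very different traffic volumes be compared on the same scale.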

Page 44: Towards OpenURL Quality Metrics: Initial Findings

How to stay in the loop

http://openurlquality.blogspot.com/

Adam Chandler
Database Management and Electronic Resources Librarian
Library Technical Services
Cornell University Library
tel: 607-255-5760
email: [email protected]