William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess

The role of custom transaction log analysis in informing the design and implementation of a locally developed open source metasearch application

William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess

University of Illinois at Urbana-Champaign

October 4, 2009, LITA National Forum


OVERVIEW

• Background
o Easy Search
o IMLS / NSDL Grant funded
o Search assistance
o Transaction Log design

• Transaction Log Analysis
o Methodology
o Summary

• System changes
o Direct links | Author Redo | Expand/Limit


BACKGROUND

Illinois Library Gateway and Transaction Logs


University of Illinois Library Gateway

• Gateway Portal introduced in September 2007
o Guide users to appropriate information resources
o Recommender system
o Integration of resources
o Help with search strategy formulation and refinement

• Custom Engineering Library portlet with fielded search approach

• Powered by metasearch system suite (Easy Search)

• Metasearch over 70 targets


Easy Search Features

• Recommender system
• Transaction logs: 2.3 million user search arguments, 2.5 million clickthroughs
• Analysis of search arguments, pattern checking
• Result displays influenced by search arguments
• AJAX driven display
• Links into the native interfaces at the point of completed search
• NISO MXG support


Research Focus

• IMLS and NSDL Grants
• Focus on two things:
1. Design and develop search assistance techniques
2. Better refer users to relevant information resources


Research Questions

• Will users find recommender approach useful?

• Can we characterize user information seeking behaviors?

• Can we capture user information seeking behavior well enough to provide quality search assistance?

• Can useful refinement and navigation services be introduced within the Gateway?

• What will search sessions look like: web search sessions or OPAC sessions?


Search Assistance Technologies

• Based on deep transaction log analysis

• Goal is to develop interactive Information Retrieval (IIR): contextual suggestions & links

• Improve search strategy refinement by providing navigational assistance

• Develop dynamic system to suggest relevant information resources to users


Implementation of Search Assistance

Fall semester 2009 is the first semester with the complete set of search assistance features implemented.

Search assistance features were suggested by a deep transaction log analysis, which has also led to a more robust transaction log format that lets us better understand user behavior.

This semester we have been monitoring use of the search assistance features.

The remainder of the presentation reports what we have learned.


SEARCH ASSISTANCE Functions and Examples


Search Assistance Functions

• Stopword removal
• Spelling suggestions
• Direct link prompts for frequently entered terms, pathfinder topics. Partial term matches
• Pattern matching for author search prompts
• Suggested limiting to phrase and title word and phrase searches
• Dark target searches in background
• Direct links to journal title matches
• Pattern matching for link to Journal Article Locator (full text article finder)
• Context sensitive arrangement of results
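The first two functions in the list can be sketched in a few lines of Python. This is only an illustration, not the Easy Search code (which the slides do not show): the stopword list, the vocabulary, and the function names are all hypothetical, and the spelling step uses the standard library's difflib as a stand-in for whatever suggestion source the gateway actually uses.

```python
import difflib

# Hypothetical stopword list; Boolean operators are deliberately left out
# so that AND / OR / NOT in a query survive this step.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to"}

# Hypothetical vocabulary of known-good terms.
VOCABULARY = ["abraham", "lincoln", "semiconductor", "engineering", "library"]


def remove_stopwords(query):
    """Lowercase the query and drop stopwords, keeping word order."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


def suggest_spelling(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary term, or None if the word is
    already known or nothing is similar enough."""
    if word in vocabulary:
        return None
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

For example, remove_stopwords("The history of Abraham Lincoln") keeps only "history", "abraham" and "lincoln", and suggest_spelling("lincon") proposes "lincoln".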


Example - Author Search Patterns

• Robert A. Smith
• Smith, Robert A.
• Smith r. a.
• Smith RA
• Smith, RA
• R. Alan Smith
• Robert Smith
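One way to recognize these seven forms is a small set of regular expressions. The sketch below is an assumption about how such pattern matching could work, not the patterns Easy Search uses; the names NAME, DOTTED, BARE and looks_like_author are hypothetical. It is case sensitive and ignores hyphenated or multi-part surnames, and the "Robert Smith" form will also fire on topical queries such as "Abraham Lincoln", which is tolerable when the outcome is only a prompt.

```python
import re

NAME = r"[A-Z][a-z]+"                          # Smith, Robert
DOTTED = r"[A-Za-z]\.(?: ?[A-Za-z]\.){0,2}"    # r. a.  or  R.A.
BARE = r"[A-Z]{1,3}"                           # RA

AUTHOR_PATTERNS = [
    # Smith, Robert A.   /   Smith, Robert
    re.compile("^" + NAME + ", " + NAME + r"(?: [A-Z]\.?)?$"),
    # Smith r. a.   /   Smith RA   /   Smith, RA
    re.compile("^" + NAME + ",? (?:" + DOTTED + "|" + BARE + ")$"),
    # Robert A. Smith   /   Robert Smith
    re.compile("^" + NAME + r"(?: [A-Z]\.?)? " + NAME + "$"),
    # R. Alan Smith
    re.compile(r"^[A-Z]\. " + NAME + " " + NAME + "$"),
]


def looks_like_author(query):
    """True if the query matches one of the author-name shapes."""
    text = " ".join(query.split())  # collapse repeated whitespace
    return any(pattern.match(text) for pattern in AUTHOR_PATTERNS)
```

A query that matches would trigger the "Search as Author" prompt; a query such as "gaas" or "solar cells" matches none of the patterns.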


DLF 2008 Fall Forum


TRANSACTION LOG

Design


Transaction Log

• Example of an entry from a standard web server log:

 2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp nopval=1&project=native&selection=gen&selection=opac&nopval=1&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search 80 - 128.174.36.95 Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0 
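To show how much post-processing even one such entry needs, here is a minimal Python sketch that splits the line and decodes the query string. The field names are an assumption: the entry appears to follow the usual W3C extended layout (date, time, server IP, method, URI stem, URI query, port, user name, client IP, user agent, status codes), but the slide does not include the log's field header.

```python
from urllib.parse import parse_qs

# Assumed field order; the slide does not show the log's "#Fields" header.
FIELDS = [
    "date", "time", "s_ip", "method", "uri_stem", "uri_query", "port",
    "username", "c_ip", "user_agent", "status", "substatus", "win32_status",
]

EXAMPLE_LINE = (
    "2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp "
    "nopval=1&project=native&selection=gen&selection=opac&nopval=1"
    "&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search "
    "80 - 128.174.36.95 "
    "Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)"
    "+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0"
)


def parse_log_line(line):
    """Split one space-delimited log entry and decode its query string."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(parts))
    entry = dict(zip(FIELDS, parts))
    # parse_qs turns '+' into spaces and collects repeated keys into lists.
    entry["params"] = parse_qs(entry["uri_query"])
    return entry


entry = parse_log_line(EXAMPLE_LINE)
print(entry["params"]["keyword"], entry["params"]["selection"])
```

Even after parsing, the entry says nothing about which earlier search it follows or where the user went next.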


Problems

• Web Server logs require extensive post-processing to determine the relationships between user actions.

• Web Server logs don't reveal when we refer a user to a vendor database.

 

• Our Approach: Deposit the log in a relational database that can reveal the full dimensions of a client interaction with the server, and develop a solution to log client exits from the UIUC gateway to an outside provider. We must also log search suggestions made by the server-side program.


Our Log

• Began as third-party open-source application called Statcountex, written in ASP.  (http://2enetworx.com/dev/projects/statcountex.asp)  

 

• Modified (heavily) to interact with the main Easy Search processing routine.

 

• We also added the session-tracking functionality built into ASP.


Log Events/ Relationships

 


Example of table searchstats

 


Tracking User Sessions

• The search result page checks the client browser to see if the client has been there before; if not, it writes a new cookie and logs a new sessionid.

• If the client has a current cookie, the server looks up the client's previous search and enters it in the SearchStats table (under column previoussearch). This makes post-processing simpler: all new sessions begin on rows where previoussearch is null; all searches where previoussearch is not null are follow-up searches.


Logging User Actions

• Each search submitted generates a unique SearchStatID and  row in the SearchStats table.

• Fields captured: referer, IP, sessionid, previoussearch, date/time, catid, useragent.

• previoussearch comes from cookie; "suggest" field records server provision of any assistive prompts after query processing
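The session logic and the SearchStats fields can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not the production ASP code: the column types, the keyword column, the logged_at name, and the choice to store the previous search's ID (rather than its text) in previoussearch are all guesses, and the cookie is modelled as a plain sessionid value.

```python
import sqlite3
import uuid


def open_log():
    """Create an in-memory log with a SearchStats table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE SearchStats (
               searchstatid   INTEGER PRIMARY KEY AUTOINCREMENT,
               sessionid      TEXT NOT NULL,
               previoussearch INTEGER,  -- NULL marks the first search of a session
               keyword        TEXT,     -- assumed column for the search argument
               suggest        TEXT,     -- assistive prompts offered, if any
               referer TEXT, ip TEXT, logged_at TEXT, catid TEXT, useragent TEXT
           )"""
    )
    return conn


def log_search(conn, cookie_sessionid, keyword, suggest=None):
    """Log one search; returns (sessionid, searchstatid)."""
    if cookie_sessionid is None:
        # No cookie: start a new session with no previous search.
        sessionid, previous = uuid.uuid4().hex, None
    else:
        sessionid = cookie_sessionid
        previous = conn.execute(
            "SELECT MAX(searchstatid) FROM SearchStats WHERE sessionid = ?",
            (sessionid,),
        ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO SearchStats (sessionid, previoussearch, keyword, suggest) "
        "VALUES (?, ?, ?, ?)",
        (sessionid, previous, keyword, suggest),
    )
    return sessionid, cur.lastrowid


def session_counts(conn):
    """Return (new sessions, follow-up searches)."""
    new = conn.execute(
        "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NULL"
    ).fetchone()[0]
    follow_ups = conn.execute(
        "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NOT NULL"
    ).fetchone()[0]
    return new, follow_ups
```

With this layout, the split between new sessions and follow-up searches is a single WHERE clause rather than a reconstruction from raw server logs.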

   


Logging User Actions: Clicks

• Result page links are dynamic and refer to the primary key of SearchStats

Actual href: http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp?ID=243989&acse--http://www.library.uiuc.edu/proxy/go.php?url=http://search.ebscohost.com/login.aspx?direct=true&db=aph&bquery=(gaas)&type=1&site=ehost-live


Clicks (continued)

The URL to results actually contains 3 URLs:

• A separate logging file writes the SearchStatID, the name of the resource clicked, and time information to a separate table, Clickstream, in the log database. Clickstream has a foreign key of searchstatid. The file then redirects the user.

• User passes through EZproxy;

• User arrives at results in vendor interface.
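The three nested URLs can be pulled apart from the example href shown on the previous slide. The sketch below is a reading of that one example, not a documented format: it assumes the token between "&" and "--" ("acse" here) is a short code for the resource clicked, and that the proxy always receives the vendor address in a "url=" parameter.

```python
EXAMPLE_HREF = (
    "http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp"
    "?ID=243989&acse--http://www.library.uiuc.edu/proxy/go.php"
    "?url=http://search.ebscohost.com/login.aspx"
    "?direct=true&db=aph&bquery=(gaas)&type=1&site=ehost-live"
)


def split_tracking_href(href):
    """Split a result-page link into logger, proxy and vendor parts."""
    logger_url, _, rest = href.partition("?")
    id_part, _, rest = rest.partition("&")          # "ID=243989"
    resource, _, proxied = rest.partition("--")     # "acse" (assumed resource code)
    proxy_url, _, vendor_url = proxied.partition("?url=")
    return {
        "logger_url": logger_url,                   # 1. writes the Clickstream row
        "searchstatid": int(id_part.partition("=")[2]),
        "resource": resource,
        "proxy_url": proxy_url,                     # 2. proxy hop
        "vendor_url": vendor_url,                   # 3. results in vendor interface
    }


parts = split_tracking_href(EXAMPLE_HREF)
print(parts["searchstatid"], parts["resource"], parts["vendor_url"])
```

The searchstatid recovered here is the value that would be written to Clickstream as the foreign key back to SearchStats.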


TRANSACTION ANALYSIS AND LOG COMPARISONS

Search failures and system improvements


User Studies

• Markey’s two papers on End-User Studies - JASIST 2007
o 32 studies
o Need for new OPAC studies
o Library Portal/Gateway studies needed

• Spink and Jansen findings on Web searches
o Short search sessions
o Average search: 2.3 words
o “Advanced features” not being utilized
o Users typically look at first page of results only


 

Search Arguments

• 10.4% Booleans (12.3% AND, 0.2% OR, 0.1% NOT)
• 8.1% Commas
• 0.1% Parentheses
• 4.2% Quotes
• 20.5% Prepositions
• 9.8% Spell Suggests (31.5% are clicked)
• 0.7% +
• 58.4% Follow-ups
o 12% are Author
• 2.3% Author redo link (12% clicked)
• 0.9% from phrase/title links
• 3% show Direct suggests (64% clicked)

2008 – 2009 Searches


2008-2009 Easy Search -- 3.758 Words per Query

Words    Number of Searches    %
1        33,054                12.3
2        70,719                26.3
3        56,584                21.1
4        44,251                16.5
5        21,730                8.1
6        12,430                4.6
7        7,598                 2.8
8        5,212                 1.9
> 8      16,786                6.3
Total    268,454               100
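A table like this can be produced from the logged search arguments with a short tally. The sketch below assumes whitespace-delimited words and a list of query strings as input; the function name and the "> 8" bucket label are illustrative, and the slides do not say how words were actually counted.

```python
from collections import Counter


def words_per_query(queries, cap=8):
    """Return (histogram of word counts, mean words per query).

    Queries longer than `cap` words are pooled in a single "> cap" bucket;
    empty queries are skipped.
    """
    histogram = Counter()
    total_words = 0
    for query in queries:
        n = len(query.split())
        if n == 0:
            continue
        total_words += n
        histogram[n if n <= cap else "> %d" % cap] += 1
    total_queries = sum(histogram.values())
    mean = total_words / total_queries if total_queries else 0.0
    return histogram, mean


sample = ["gaas", "abraham lincoln", "solar cells", "a b c d e f g h i"]
histogram, mean = words_per_query(sample)
print(dict(histogram), mean)
```

Note that the mean is computed from the actual word counts, not from the capped buckets, so long queries still contribute their full length.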


Clickthroughs 2008 v. 2009

 

Resource                   2008     2009
Academic Search Premier    30.2%    19.1%
Voyager Online Catalog     24.9%    32.3%
Scopus                     8.5%     7.2%
ISI Web of Knowledge       7.1%     7.8%
Info Trac                  6.8%     5.6%
CARLI Statewide Catalog    7.1%     4.1%
Springer E-Books           3.1%     3.1%
E-Resource List            2.7%     3.6%
Google Books               1.7%     2.4%
Amazon                     1.1%     1.7%
Google Scholar             .9%      .9%

Direct Links: 1.8%


Search Assistance Response

• Spelling suggestions offered for 12.6% of searches
• Spell suggestion clicked 34% of times offered

• Direct links offered for 3% of searches
• Direct links clicked 64% of times offered
• Success

• "Dark Targets" offered for 27% of searches
• Dark Targets clicked 8%

• Reduce matches by Title or Phrase searching: 20%
• Reduce prompt clicked 5% of times offered

  


Search Assistance Responses

• "Search as Author" prompt offered for 2.3% of searches
• Clicked 12% of times offered

• Link to Journal & Article Locator service: 2% of searches
• Clicked 2% of times offered
• Not very successful
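Both figures reported for each prompt (how often it was offered, and how often it was clicked when offered) fall out of the same pass over the log. The sketch below assumes each search record carries two sets, the prompts offered (from the "suggest" field) and the prompts the user clicked; that record shape and the prompt labels are hypothetical.

```python
def prompt_rates(searches, prompt):
    """Return (% of searches where prompt was offered,
               % of those offers that were clicked)."""
    total = len(searches)
    offered = [s for s in searches if prompt in s["suggest"]]
    clicked = [s for s in offered if prompt in s["clicked"]]
    offered_pct = 100 * len(offered) / total if total else 0.0
    clicked_pct = 100 * len(clicked) / len(offered) if offered else 0.0
    return offered_pct, clicked_pct


# Tiny hypothetical log: the spelling prompt is offered twice, clicked once.
log = [
    {"suggest": {"spell"}, "clicked": {"spell"}},
    {"suggest": {"spell", "author"}, "clicked": set()},
    {"suggest": set(), "clicked": set()},
    {"suggest": {"direct"}, "clicked": {"direct"}},
]
print(prompt_rates(log, "spell"))
```

The second number uses offers, not all searches, as its denominator, which matches the "of times offered" wording on the slides.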

   


What We Have Learned

• Broad continuum of searches being performed: topical, specific item
• Users expect sophisticated parsing (mental model)
• Spell suggestions important
• Must accommodate specific item search
• Author search and fielded search
• Used as Reference tool
• Search assistance being utilized


Reacting to Logs with Design

• Numbers provide a real-time measure of search enhancement success

• Poor performance can influence changes in prompt language, presentation, triggering event

• Example: Known-item Searching


User Behavior - Known Items

A sample of 3,000 log entries from the single-entry box gateway, taken in semester 1, 2007, was analyzed in detail. In this sample, fully 49.4% of the searches were “known-item”, “known-person/organization” or specific item searches as opposed to topical searches. These searches were for specific book, journal, or article titles or a specific author name. Of the 49.4% specific item searches:
• 7.4% were author/title;
• 28.9% were author;
• 40.5% were book/monographic searches;
• 6.8% were index/abstract title;
• 5.7% were for a specific journal article; and
• 11.8% were for a specific journal title.

Overall, 17.96% of the searches contained a name or an organization, although clearly some of these are topical searches.


Reacting to Known-Item Searches

Search Assistance introduced:
• Search as Title, Author, Phrase: A fielded approach reduces the number of clicks between a user and a Known Item
• Exact Journal title match, Exact A & I title match & Direct Link prompts introduced
• Assistance methods rooted in observations of logged user behavior


The "Common Query" database

• Calculated the most frequently searched terms
• Noticed entries like "ebsco", "IEEE"
• These "directional" searches were apparently unsuccessful for users
• Developed a database of links to UIUC resources, with user-entered search arguments as its vocabulary
• Very successful (users follow direct links 64% of times offered; excellent feedback)
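The two steps described here, finding the most frequent search arguments and then mapping them to direct links, can be sketched as follows. The normalization rule, the function names, and the example.org URLs are placeholders; the real database's contents and matching rules (including the partial term matches mentioned earlier) are not shown in the slides.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so 'IEEE ' and 'ieee' match."""
    return " ".join(query.lower().split())


def most_common_queries(queries, n=10):
    """Return the n most frequent normalized search arguments."""
    return Counter(normalize(q) for q in queries).most_common(n)


# Hypothetical "Common Query" table: user-entered terms are the keys.
DIRECT_LINKS = {
    "ebsco": "https://example.org/ebsco",
    "ieee": "https://example.org/ieee-xplore",
}


def direct_link(query, links=DIRECT_LINKS):
    """Return a direct link for an exact normalized match, else None."""
    return links.get(normalize(query))


print(most_common_queries(["IEEE", "ieee ", "ebsco", "gaas"], n=1))
print(direct_link(" IEEE "))
```

Keying the table on what users actually type, rather than on official resource names, is the design choice the slide describes.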


THE FUTURE

With prototype examples


Future

• Guided search module: “Help Getting Started”
o encyclopedias, dissertations, e-books, popular journal articles, etc.
• Tailored (vertical) search modules
o NSDL STEM Education Site
o Library and Information Science
• Faceted result displays
• Return first 10 articles from selected targets to user
• Merging of results
• Agent approach (software agents that may e.g. return answers rather than citations)
• Portability
o University of Illinois Springfield
