The role of custom transaction log analysis in informing the design and
implementation of a locally developed open source metasearch
application
William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess
University of Illinois at Urbana-Champaign
October 4, 2009
LITA National Forum
OVERVIEW
• Background
  o Easy Search
  o IMLS / NSDL grant funded
  o Search assistance
  o Transaction log design
• Transaction Log Analysis
  o Methodology
  o Summary
• System changes
  o Direct links | Author Redo | Expand/Limit
BACKGROUND
Illinois Library Gateway and Transaction Logs
University of Illinois Library Gateway
• Gateway Portal introduced in September 2007
  o Guide users to appropriate information resources
  o Recommender system
  o Integration of resources
  o Help with search strategy formulation and refinement
• Custom Engineering Library portlet with fielded search approach
• Powered by metasearch system suite (Easy Search)
• Metasearch over 70 targets
Easy Search Features
• Recommender system
• Transaction logs: 2.3 million user search arguments, 2.5 million clickthroughs
• Analysis of search arguments, pattern checking
• Result displays influenced by search arguments
• AJAX-driven display
• Links into the native interfaces at the point of completed search
• NISO MXG support
Research Focus
• IMLS and NSDL Grants
• Focus on two things:
  1. Design and develop search assistance techniques
  2. Better refer users to relevant information resources
Research Questions
• Will users find recommender approach useful?
• Can we characterize user information seeking behaviors?
• Can we capture user information seeking behavior well enough to provide quality search assistance?
• Can useful refinement and navigation services be introduced within the Gateway?
• What will search sessions look like: web search sessions or OPAC sessions?
Search Assistance Technologies
• Based on deep transaction log analysis
• Goal is to develop interactive Information Retrieval (IIR): contextual suggestions & links
• Improve search strategy refinement by providing navigational assistance
• Develop dynamic system to suggest relevant information resources to users
Implementation of Search Assistance
Fall semester 2009 is the first semester with the complete set of search assistance features implemented.
The search assistance features were suggested by a deep transaction log analysis, which has also led to a more robust transaction log format that lets us better understand user behavior.
This semester we have been monitoring use of the search assistance features.
The remainder of the presentation will report what we have learned.
SEARCH ASSISTANCE
Functions and Examples
Search Assistance Functions
• Stopword removal
• Spelling suggestions
• Direct link prompts for frequently entered terms, pathfinder topics. Partial term matches
• Pattern matching for author search prompts
• Suggested limiting to phrase and title word and phrase searches
• Dark target searches in background
• Direct links to journal title matches
• Pattern matching for link to Journal Article Locator (full text article finder)
• Context sensitive arrangement of results
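Two of the functions listed above, stopword removal and spelling suggestions, can be illustrated with a minimal Python sketch. The stopword list, the vocabulary, the similarity cutoff, and the function names are all assumptions made for illustration; the slides do not describe how Easy Search implements either step.

```python
import difflib

# Hypothetical stopword list and vocabulary, for illustration only.
STOPWORDS = {"a", "an", "of", "the", "in", "on", "for", "to"}
VOCABULARY = ["lincoln", "abraham", "engineering", "semiconductor", "library"]


def remove_stopwords(query):
    """Drop stopwords, keeping the remaining terms in their original order."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


def spelling_suggestion(term, vocabulary=VOCABULARY):
    """Return the closest vocabulary term, or None if the term is already
    known or nothing in the vocabulary is similar enough."""
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None


terms = remove_stopwords("the life of Abraham Lincon")
suggestions = {t: spelling_suggestion(t) for t in terms}
```

In a real system the vocabulary would more plausibly come from an index or from previously successful queries than from a fixed list.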
Example - Author Search Patterns
• Robert A. Smith
• Smith, Robert A.
• Smith r. a.
• Smith RA
• Smith, RA
• R. Alan Smith
• Robert Smith
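A minimal sketch of shape-based author detection for name forms like those listed above. The regular expressions and the function name are assumptions, not the rules Easy Search actually uses. A bare two-word name such as "Robert Smith" cannot be told apart from a two-word topical query by shape alone, so this sketch does not flag it.

```python
import re

# Hypothetical patterns; the actual Easy Search rules are not given in the slides.
WORD = r"[A-Za-z][A-Za-z'\-]+"            # a name word of two or more characters
INIT = r"(?:[A-Z]\.?|[a-z]\.)"            # an initial: uppercase, or lowercase with a period

AUTHOR_PATTERNS = [
    # Smith, Robert A.   (surname, comma, forename, optional initial)
    re.compile(rf"^{WORD},\s*{WORD}(?:\s+{INIT})?$"),
    # Smith r. a. / Smith RA / Smith, RA   (surname followed by one to three initials)
    re.compile(rf"^{WORD},?\s+(?:{INIT}\s*){{1,3}}$"),
    # Robert A. Smith   (forename, middle initial, surname)
    re.compile(rf"^{WORD}\s+{INIT}\s+{WORD}$"),
    # R. Alan Smith   (leading initial, middle name, surname)
    re.compile(rf"^[A-Za-z]\.\s*{WORD}\s+{WORD}$"),
]


def looks_like_author(query):
    """Return True when the query has the shape of a personal name."""
    normalized = " ".join(query.split())   # collapse runs of whitespace
    return any(p.match(normalized) for p in AUTHOR_PATTERNS)
```

A heuristic like this will produce some false positives (for example, "vitamin D" has the shape of a surname plus an initial), which is tolerable when the result only triggers an optional "Search as Author" prompt.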
DLF 2008 Fall Forum
TRANSACTION LOG
Design
Transaction Log
• Example of an entry from a standard web server log:
2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp nopval=1&project=native&selection=gen&selection=opac&nopval=1&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search 80 - 128.174.36.95 Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0
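A sketch of what can be recovered from such an entry with ordinary parsing. The column names are an assumption (they follow the usual field order of the IIS W3C extended log format; the slide does not label the columns), and `parse_log_line` is a hypothetical helper.

```python
from urllib.parse import parse_qs

# Assumed column names, in the order the values appear in the entry.
FIELDS = [
    "date", "time", "server_ip", "method", "uri_stem", "uri_query", "port",
    "username", "client_ip", "user_agent", "status", "substatus", "win32_status",
]

LOG_LINE = (
    "2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp "
    "nopval=1&project=native&selection=gen&selection=opac&nopval=1"
    "&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search "
    "80 - 128.174.36.95 "
    "Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)"
    "+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0"
)


def parse_log_line(line):
    """Split a space-delimited log entry and decode its query string."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    entry = dict(zip(FIELDS, values))
    # parse_qs decodes '+' as a space and collects repeated keys into lists.
    entry["params"] = parse_qs(entry["uri_query"])
    return entry


entry = parse_log_line(LOG_LINE)
print(entry["params"]["keyword"])    # the user's search argument
```

The entry yields the search argument and the client address, but nothing that ties it to the same user's earlier searches or later clicks.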
Problems
• Web Server logs require extensive post-processing to determine the relationships between user actions.
• Web Server logs don't reveal when we refer a user to a vendor database.
• Our approach: Deposit the log in a relational database that can reveal the full dimensions of a client interaction with the server, and develop a solution to log client exits from the UIUC gateway to an outside provider. Must also log search suggestions made by the server-side program.
Our Log
• Began as third-party open-source application called Statcountex, written in ASP. (http://2enetworx.com/dev/projects/statcountex.asp)
• Modified (heavily) to interact with the main Easy Search processing routine.
• We also added the session-tracking functionality built into ASP.
Log Events/ Relationships
Example of table searchstats
Tracking User Sessions
• Search result page checks client browser to see if client has been there before; if not, writes new cookie and logs new sessionid.
• If the client has a current cookie, the server looks up the client's previous search and enters it in the SearchStats table (under column previoussearch). This makes post-processing simpler: all new sessions begin on rows where previoussearch is null; all searches where previoussearch is not null are follow-up searches.
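The previoussearch convention can be sketched with an in-memory SQLite table. This is an illustration under assumptions, not the ASP implementation: the `keyword` column, the `log_search` helper, and the sample session ids are hypothetical, and the real SearchStats table has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE SearchStats (
           SearchStatID   INTEGER PRIMARY KEY,
           sessionid      TEXT,
           keyword        TEXT,   -- assumed column holding the search argument
           previoussearch TEXT    -- NULL on the first search of a session
       )"""
)


def log_search(conn, sessionid, keyword):
    """Insert one search row, copying the session's latest keyword into previoussearch."""
    row = conn.execute(
        "SELECT keyword FROM SearchStats WHERE sessionid = ? "
        "ORDER BY SearchStatID DESC LIMIT 1",
        (sessionid,),
    ).fetchone()
    previous = row[0] if row else None
    cur = conn.execute(
        "INSERT INTO SearchStats (sessionid, keyword, previoussearch) VALUES (?, ?, ?)",
        (sessionid, keyword, previous),
    )
    return cur.lastrowid


log_search(conn, "s1", "gaas")
log_search(conn, "s1", "gallium arsenide")   # follow-up within session s1
log_search(conn, "s2", "ieee")

# New sessions and follow-ups fall out of a simple NULL test.
new_sessions = conn.execute(
    "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NULL"
).fetchone()[0]
follow_ups = conn.execute(
    "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NOT NULL"
).fetchone()[0]
```

Storing the previous search on the row itself means session boundaries can be found without reconstructing them from timestamps and IP addresses afterwards.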
Logging User Actions
• Each search submitted generates a unique SearchStatID and row in the SearchStats table.
• Fields captured: referer, IP, sessionid, previoussearch, date/time, catid, useragent.
• previoussearch comes from cookie; "suggest" field records server provision of any assistive prompts after query processing
Logging User Actions: Clicks
• Result page links are dynamic & refer to the primary key of SearchStats
Actual href: http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp?ID=243989&acse--http://www.library.uiuc.edu/proxy/go.php?url=http://search.ebscohost.com/login.aspx?direct=true&db=aph&bquery=(gaas)&type=1&site=ehost-live
Clicks (continued)
URL to results actually has 3 URLs:
• Separate logging file writes SearchStatID, name of resource clicked, & time information to separate table Clickstream in log database. Clickstream has a foreign key of searchstatid. File redirects user.
• User passes through EZproxy;
• User arrives at results in vendor interface.
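The log-then-redirect step can be sketched as follows, using the href shown on the previous slide. The reading of that href is an assumption: `ID` is taken to be the SearchStatID, `acse` a resource code, and `--` the separator before the target URL. The helper names and the Clickstream column names other than searchstatid are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

HREF = (
    "http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp?ID=243989&acse--"
    "http://www.library.uiuc.edu/proxy/go.php?url="
    "http://search.ebscohost.com/login.aspx?direct=true&db=aph&bquery=(gaas)"
    "&type=1&site=ehost-live"
)


def split_logged_href(href):
    """Split a result-page href into (SearchStatID, resource code, target URL)."""
    log_part, sep, target = href.partition("--")   # assumed separator
    if not sep:
        raise ValueError("no '--' separator found in href")
    query = log_part.split("?", 1)[1]              # "ID=243989&acse"
    id_part, _, resource = query.partition("&")
    return int(id_part.split("=", 1)[1]), resource, target


def record_click(conn, href):
    """Write one Clickstream row, then return the URL to redirect the user to."""
    searchstat_id, resource, target = split_logged_href(href)
    conn.execute(
        "INSERT INTO Clickstream (searchstatid, resource, clicked_at) VALUES (?, ?, ?)",
        (searchstat_id, resource, datetime.now(timezone.utc).isoformat()),
    )
    return target


conn = sqlite3.connect(":memory:")
# In the real log database searchstatid is a foreign key into SearchStats;
# that table is omitted here to keep the sketch short.
conn.execute(
    "CREATE TABLE Clickstream (searchstatid INTEGER, resource TEXT, clicked_at TEXT)"
)
target = record_click(conn, HREF)   # the proxy URL that wraps the vendor URL
```

Because every result link passes through the logging script first, exits to vendor databases become visible in the log even though the vendor's server is outside the library's control.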
TRANSACTION ANALYSIS AND LOG COMPARISONS
Search failures and system improvements
User Studies
• Markey’s two papers on End-User Studies - JASIST 2007
  o 32 studies
  o Need for new OPAC studies
  o Library Portal/Gateway studies needed
• Spink and Jansen findings on Web searches
  o Short search sessions
  o Average search: 2.3 words
  o “Advanced features” not being utilized
  o Users typically look at first page of results only
Search Arguments
• 10.4% Booleans (12.3% AND, 0.2% OR, 0.1% NOT)
• 8.1% Commas
• 0.1% Parentheses
• 4.2% Quotes
• 20.5% Prepositions
• 9.8% Spell Suggests (31.5% are clicked)
• 0.7% +
• 58.4% Follow-ups
  o 12% are Author
• 2.3% Author redo link (12% clicked)
• 0.9% from phrase/title links
• 3% show Direct suggests (64% clicked)
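Percentages like the syntactic ones above (Booleans, commas, parentheses, quotes, prepositions, +) can be produced by tagging each logged search argument with the features it contains and counting. The sketch below is one way to do that; the preposition list, the case-insensitive treatment of Boolean operators, and the function names are assumptions, not the rules used in the analysis.

```python
from collections import Counter

BOOLEANS = {"and", "or", "not"}
PREPOSITIONS = {"of", "in", "on", "for", "to", "with", "by", "from", "at"}  # hypothetical list


def query_features(query):
    """Return the set of syntactic features present in one search argument."""
    tokens = {t.strip('",()+').lower() for t in query.split()}
    features = set()
    if tokens & BOOLEANS:
        features.add("boolean")
    if "," in query:
        features.add("comma")
    if "(" in query or ")" in query:
        features.add("parentheses")
    if '"' in query:
        features.add("quotes")
    if tokens & PREPOSITIONS:
        features.add("preposition")
    if "+" in query:
        features.add("plus")
    return features


def feature_percentages(queries):
    """Percentage of queries showing each feature; a query may count toward several."""
    counts = Counter(f for q in queries for f in query_features(q))
    return {f: 100 * n / len(queries) for f, n in counts.items()}


sample = ["solar AND wind", "Smith, Robert", '"climate change"', "history of jazz"]
percentages = feature_percentages(sample)
```

Since one query can show several features, the percentages are not expected to sum to 100.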
2008 – 2009 Searches
2008-2009 Easy Search -- 3.758 Words per Query
Words   Number of Searches   %
1       33,054               12.3
2       70,719               26.3
3       56,584               21.1
4       44,251               16.5
5       21,730               8.1
6       12,430               4.6
7       7,598                2.8
8       5,212                1.9
> 8     16,786               6.3
Total   268,454              100
Clickthroughs 2008 v. 2009
Resource                   2008    2009
Academic Search Premier    30.2%   19.1%
Voyager Online Catalog     24.9%   32.3%
Scopus                     8.5%    7.2%
ISI Web of Knowledge       7.1%    7.8%
Info Trac                  6.8%    5.6%
CARLI Statewide Catalog    7.1%    4.1%
Springer E-Books           3.1%    3.1%
E-Resource List            2.7%    3.6%
Google Books               1.7%    2.4%
Amazon                     1.1%    1.7%
Google Scholar             0.9%    0.9%
Direct Links: 1.8%
Search Assistance Response
• Spelling suggestions offered for 12.6% of searches
• Spell suggestion clicked 34% of times offered
• Direct links offered for 3% of searches
• Direct links clicked 64% of times offered
• Success
• "Dark Targets" offered for 27% of searches
• Dark Targets clicked 8%
• Reduce matches by Title or Phrase searching: 20%
• Reduce prompt clicked 5% of times offered
Search Assistance Responses
• "Search as Author" prompt offered for 2.3% of searches
• Clicked 12% of times offered
• Link to Journal & Article Locator service: 2% of searches
• Clicked 2% of times offered
• Not very successful
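Each prompt is reported with two figures that use different denominators: the offer rate is taken over all searches, while the click rate is taken only over the searches where the prompt was offered. A minimal sketch of that arithmetic, assuming each logged search is represented as a dict with hypothetical `offered` and `clicked` sets:

```python
def prompt_rates(searches, prompt):
    """Return (offer rate, click rate) for one prompt type, both as percentages.

    Offer rate: share of all searches where the prompt was shown.
    Click rate: share of those offers that the user clicked.
    """
    offered = [s for s in searches if prompt in s["offered"]]
    clicked = [s for s in offered if prompt in s["clicked"]]
    offer_rate = 100 * len(offered) / len(searches) if searches else 0.0
    click_rate = 100 * len(clicked) / len(offered) if offered else 0.0
    return offer_rate, click_rate


# Made-up records, for illustration only.
searches = [
    {"offered": {"spell"}, "clicked": {"spell"}},
    {"offered": {"spell", "author"}, "clicked": set()},
    {"offered": set(), "clicked": set()},
    {"offered": {"direct"}, "clicked": {"direct"}},
]
spell_offer, spell_click = prompt_rates(searches, "spell")
```

Keeping the two rates separate makes it possible to tell a prompt that is rarely triggered but well used from one that is shown often and ignored.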
What We Have Learned
• Broad continuum of searches being performed: topical, specific item
• Users expect sophisticated parsing – mental model
• Spell suggestions important
• Must accommodate specific item search
• Author search and fielded search
• Used as Reference tool
• Search assistance being utilized
Reacting to Logs with Design
• Numbers provide a real-time measure of search enhancement success
• Poor performance can influence changes in prompt language, presentation, triggering event
• Example: Known-item Searching
User Behavior - Known Items
A sample of 3,000 log entries from the single-entry box gateway taken in semester 1, 2007, was analyzed in detail. In this sample, fully 49.4% of the searches were “known-item”, “known-person/organization” or specific item searches as opposed to topical searches. These searches were for specific book, journal, or article titles or a specific author name. Of the 49.4% specific item searches:
• 7.4% of the 49.4% were author/title;
• 28.9% were author;
• 40.5% were book/monographic searches;
• 6.8% were index/abstract title;
• 5.7% were for specific journal article; and
• 11.8% were for specific journal title.
Overall, 17.96% of the searches contained a name or an organization, although clearly some of these are topical searches.
Reacting to Known-Item Searches
Search Assistance introduced:
• Search as Title, Author, Phrase: A fielded approach reduces the number of clicks between a user and a Known Item
• Exact Journal title match, Exact A & I title match & Direct Link prompts introduced
• Assistance methods rooted in observations of logged user behavior
The "Common Query" database
• Calculated the most frequently searched terms
• Noticed entries like "ebsco", "IEEE"
• These "directional" searches were apparently unsuccessful for users
• Developed a database of links to UIUC resources, with user-entered search arguments as its vocabulary
• Very successful (users follow direct links 64% of times offered; excellent feedback)
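The two steps described above, counting the most frequent search arguments and then mapping user-entered terms to resource links, can be sketched in a few lines of Python. The normalization rule, the function names, and the placeholder URLs are assumptions; the actual database contents are not given in the slides.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so variants of a term count together."""
    return " ".join(query.lower().split())


def most_frequent_queries(queries, n=10):
    """Return the n most common normalized search arguments with their counts."""
    return Counter(normalize(q) for q in queries).most_common(n)


# Hypothetical entries; the URLs are placeholders, not real UIUC links.
DIRECT_LINKS = {
    "ebsco": "https://example.org/ebsco",
    "ieee": "https://example.org/ieee-xplore",
}


def direct_link(query, links=DIRECT_LINKS):
    """Return a direct link when the whole query matches a known term, else None."""
    return links.get(normalize(query))


logged = ["IEEE", "ieee ", "ebsco", "solar cells"]
top = most_frequent_queries(logged, n=1)
```

Using the users' own search arguments as the lookup vocabulary means the table answers the terms people actually type, rather than the official names of the resources.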
THE FUTURE
with prototype examples
Future
• Guided search module – “Help Getting Started”
  o encyclopedias, dissertations, e-books, popular journal articles, etc.
• Tailored (vertical) search modules
  o NSDL STEM Education Site
  o Library and Information Science
• Faceted result displays
• Return first 10 articles from selected targets to user
• Merging of results
• Agent approach (software agents that may e.g. return answers rather than citations)
• Portability
  o University of Illinois Springfield