Reading Cyber Tracks: Analyzing Log Files and Search Logs
Darlene Fichter, Data Library Coordinator, U of S Library, January 29, 2004


  • Slide 1
  • Reading Cyber Tracks: Analyzing Log Files and Search Logs. Darlene Fichter, Data Library Coordinator, U of S Library. January 29, 2004.
  • Slide 2
  • Overview: Log files (How can log files help? Getting up close and personal with log files; 7 things my server logs told me; Error logs). Search logs (Content synopsis; Site search performance; Intranet usability; Best bets).
  • Slide 3
  • Why super heroes read log files: They give you the macro picture. They are a rich source of information about user behaviors, link choices, and typical paths through a web site. They point out trouble spots and help inform redesigns.
  • Slide 4
  • Detractors say log files are the least useful type of data for understanding users: they don't measure outcomes, and we don't know the intent of the visitor. "Hits are meaningless": true, but not if you're estimating service capacity and performance. Logs are imprecise and incomplete.
  • Slide 5
  • Server logs can tell you: Who is using your site? Who never uses your site? Where do they enter? What route do they follow? What do they use? How long do they stay?
  • Slide 6
  • Big picture: Average number of visits per day on weekdays: 6254. Average number of hits per day on weekdays: 110437. Average number of visits per weekend: 8157. Average number of hits per weekend: 113500. Most active day of the week: Mon. Least active day of the week: Sat. Most active date: December 04 2003. Number of hits on most active date: 173527. Least active date: December 25 2003. Number of hits on least active date: 38078. Most active hour of the day: 14:00-14:59. Least active hour of the day: 01:00-01:59.
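
Figures like "most active hour" and "most active day" are tallies over request timestamps. The following is a minimal Python sketch. It assumes the timestamps have already been pulled out of an access log in Apache's 27/Jan/2004:03:08:11 -0600 style. The function busiest and the sample timestamps are hypothetical. Hours are reported in the server's local time, as logged.

```python
from collections import Counter
from datetime import datetime

# Apache access-log timestamp layout, e.g. "27/Jan/2004:03:08:11 -0600"
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"

def busiest(timestamps):
    """Return (most active hour, most active weekday) for a list of timestamp strings."""
    parsed = [datetime.strptime(ts, APACHE_TIME) for ts in timestamps]
    hours = Counter(t.hour for t in parsed)           # 0-23, local time as logged
    days = Counter(t.strftime("%a") for t in parsed)  # "Mon", "Tue", ...
    return hours.most_common(1)[0][0], days.most_common(1)[0][0]

# Hypothetical sample: three hits on Monday, Dec 1 2003, and one early Tuesday
sample = [
    "01/Dec/2003:14:05:00 -0600",
    "01/Dec/2003:14:40:12 -0600",
    "01/Dec/2003:15:01:59 -0600",
    "02/Dec/2003:01:30:00 -0600",
]
print(busiest(sample))  # (14, 'Mon')
```
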
  • Slide 7
  • Page duration: How long do most people spend on a page? An inordinately long time could mean the page is very confusing, that it is very worthwhile, or that the visitor went for coffee. Skip averages and look for the mode or median.
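
Logs don't record time on page directly. A common estimate is the gap between one request and the same visitor's next request, so the last page of a visit gets no estimate. This minimal Python sketch shows that estimate and why a single "went for coffee" outlier drags the mean but not the median or mode. The function page_durations and all numbers are hypothetical.

```python
import statistics

def page_durations(request_times):
    """Estimate time on each page as the gap (seconds) until the visitor's next request.
    The last page of a visit has no following request, so it gets no estimate."""
    return [nxt - cur for cur, nxt in zip(request_times, request_times[1:])]

# Hypothetical request times (seconds since the visit began) for one visitor
print(page_durations([0, 20, 45, 70]))  # [20, 25, 25]

# Hypothetical durations for one page across several visits; one visitor went for coffee
durations = [20, 25, 25, 30, 35, 25, 1800]
print(statistics.mean(durations))    # 280, inflated by a single outlier
print(statistics.median(durations))  # 25
print(statistics.mode(durations))    # 25
```
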
  • Slide 8
  • Exit pages: The point where someone leaves your site may offer some interesting clues. Exits from related links are fine, as are exits from the Find an Article page that lists databases. A caveat to keep in mind: use of the back button may not show up when pages are loaded from the browser cache.
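
Finding exit pages means splitting the log into visits first. This minimal Python sketch makes two assumptions, both of them only approximations. First, each client IP address is one visitor, although proxies and shared machines break this. Second, 30 idle minutes ends a visit. The function exit_pages and the sample data are hypothetical, and image and stylesheet requests are assumed to be filtered out beforehand.

```python
from collections import Counter

SESSION_GAP = 30 * 60  # assumption: 30 idle minutes ends a visit

def exit_pages(hits):
    """hits: (client_ip, unix_time, path) tuples sorted by time.
    Returns a Counter of the last page requested in each visit."""
    last_seen = {}  # ip -> (time of last hit, path of last hit)
    exits = Counter()
    for ip, t, path in hits:
        if ip in last_seen and t - last_seen[ip][0] > SESSION_GAP:
            exits[last_seen[ip][1]] += 1  # the previous visit ended on that page
        last_seen[ip] = (t, path)
    for _, path in last_seen.values():    # close visits still open at end of log
        exits[path] += 1
    return exits

hits = [
    ("10.0.0.1", 0, "/"), ("10.0.0.1", 60, "/databases"),
    ("10.0.0.2", 90, "/"), ("10.0.0.2", 120, "/ill/form"),
    ("10.0.0.1", 4000, "/"), ("10.0.0.1", 4030, "/hours"),  # a second visit
]
print(exit_pages(hits))
```
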
  • Slide 9
  • Forms: What is the completion rate for forms? How many people abandon the ILL (interlibrary loan) process partway through?
  • Slide 10
  • Forms: Does your system for marking required fields work, or are people presented with error upon error on submission? Are employees entering bogus responses in form fields to circumvent bad design?
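
A completion rate compares the clients who loaded a form with the clients who reached its confirmation page. This minimal Python sketch counts distinct client IPs rather than raw hits, so someone who reloads the form is not counted twice. The paths /ill/request and /ill/confirm are hypothetical placeholders for your own form URLs, and treating an IP as one person is an approximation.

```python
def completion_rate(hits, start_path, done_path):
    """hits: (client_ip, path) pairs. Fraction of clients that loaded start_path
    and also reached done_path. Clients that never loaded the form are ignored."""
    started = {ip for ip, path in hits if path == start_path}
    finished = {ip for ip, path in hits if path == done_path} & started
    return len(finished) / len(started) if started else 0.0

hits = [("a", "/ill/request"), ("a", "/ill/confirm"),
        ("b", "/ill/request"), ("b", "/ill/request"),
        ("c", "/ill/request"), ("d", "/hours")]
rate = completion_rate(hits, "/ill/request", "/ill/confirm")
print(f"{rate:.0%} completed, {1 - rate:.0%} abandoned")  # 33% completed, 67% abandoned
```
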
  • Slide 11
  • What can you measure? It depends on what is recorded in the log file. A web server access log is an ASCII file that records each request. Two common web server log file types: Common, and Combined (more data).
  • Slide 12
  • Example: Apache combined log format
  • Slide 13
  • Who and when? IP address or hostname; identity or login (seldom used); username (recorded when the visitor logged in); date and time.
  • Slide 14
  • What did they ask for? Did it work? Method; path; protocol (HTTP) and version; status code.
  • Slide 15
  • Status codes: In general, 200s are successful requests by a client, 300s report server redirects, 400s are used for client errors, and 500s are used for server errors. The best known is 404 (Not Found).
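
Since the class of a status code is its first digit, a summary is an integer division away. This is a minimal Python sketch. The function status_summary and the sample codes are hypothetical.

```python
from collections import Counter

CLASSES = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}

def status_summary(codes):
    """Tally HTTP status codes by class (their first digit)."""
    return Counter(CLASSES.get(code // 100, "other") for code in codes)

codes = [200, 200, 304, 404, 404, 500, 200]
summary = status_summary(codes)
print(summary["success"], summary["client error"], codes.count(404))  # 3 2 2
```
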
  • Slide 16
  • Bytes transferred; referring site (the page immediately before this request).
  • Slide 17
  • User agent: browser and OS. In this example the browser is Mozilla and the OS is Windows NT.
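
The fields walked through above (host, identity, username, date, request, status, bytes, referrer, user agent) line up with Apache's combined format string, %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i". This minimal Python sketch parses one line with a regular expression. The sample line is made up for illustration. Lines with unusual request strings, or with escaped quotes inside the user agent, will not match and return None.

```python
import re

# Apache combined log format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line isn't in combined format."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])  # "-" = no body
    return fields

# Hypothetical log line for illustration
sample = ('192.0.2.10 - - [27/Jan/2004:03:08:11 -0600] "GET /data?top HTTP/1.1" 200 5120 '
          '"http://library.usask.ca/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
print(parse_line(sample)["path"])  # /data?top
```
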
  • Slide 18
  • Log analysis software produces summary tables, charts, and graphs. Popular ones are WebTrends (commercial; Windows, Unix), Analog (free; Unix, Windows), and Wusage (free; Unix, Windows). Many more are listed in the Yahoo directory (Log Analysis Tools > Titles).
  • Slide 19
  • Sample: Top domains chart
  • Slide 20
  • Sample: Summary of top files requested. Meaningful filenames, rather than id=1232, help make this report understandable.
  • Slide 21
  • What your logs can tell you, if you listen: specific areas where logs are useful, with specific examples.
  • Slide 22
  • How visible are your links and menus? Are you tuning your site? Is the new button or label working? Is anyone clicking on the special announcement information? Run a special report to see which links on your home page are used the most.
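
One way to run that report yourself is to count requests whose referring page is the home page. This minimal Python sketch assumes (referer, path) pairs have already been parsed out of the access log. Images and stylesheets embedded in the home page also carry it as the referrer, so they are skipped. The suffix list, the function homepage_clicks, and the paths /databases and /hours are hypothetical. If your home page is also reachable as something like index.html, count that URL too.

```python
from collections import Counter

# Embedded images, stylesheets and scripts also have the home page as referer;
# skip them (this suffix list is an assumption, adjust for your site).
ASSETS = (".gif", ".jpg", ".png", ".css", ".js")

def homepage_clicks(requests, home_url):
    """requests: (referer, path) pairs. Count the links followed from home_url."""
    return Counter(
        path for referer, path in requests
        if referer == home_url and not path.lower().endswith(ASSETS)
    )

home = "http://library.usask.ca/"
requests = [
    (home, "/databases"), (home, "/databases"), (home, "/hours"),
    (home, "/images/spacer.gif"), ("-", "/"),
]
print(homepage_clicks(requests, home).most_common())  # [('/databases', 2), ('/hours', 1)]
```
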
  • Slide 23
  • Redesign of E-Journal page: Subject browse was #2.
  • Slide 24
  • Redesign of U of S home page: Help was removed. Home page clickthroughs: http://www.usask.ca/analog/homepage/
  • Slide 25
  • Redesign of U of S home page: 1. Departments 2. Search 3. PAWS 4. Students 5. Admissions
  • Slide 26
  • Redesign of Health Sciences Library page: Home page clickthroughs were used to set the priority order.
  • Slide 27
  • Before and after: Does the new top menu work? ClickTracks [www.clicktracks.com] displays all the links on the page and the % of visitors who click on each one.
  • Slide 28
  • Digging for evidence: Are people able to get from here to there? A specific example: evaluating a site-wide menu. The goal was to make the case that generic terms were more effective than brand names. The team's response was polite nods.
  • Slide 29
  • Log files to the rescue: We looked up how many people actually selected this area from the home page using the brand name label rather than the generic term. This was possible because the two links had different syntax. Tip: add tracking code to the end of a link, e.g. http://library.usask.ca/data?top. Log file: - [27/Jan/2004:03:08:11 -0600] "GET /data?top
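
Once one link carries a tag such as ?top, counting which link was used is a matter of splitting each requested URL into path and query string. This minimal Python sketch assumes the requested paths are already extracted from the log. The function label_counts and the sample paths are hypothetical. Requests for the page without any tag are grouped as "(untagged)".

```python
from collections import Counter
from urllib.parse import urlsplit

def label_counts(paths, target="/data"):
    """Split requests for one page by the tracking tag appended to its link."""
    counts = Counter()
    for p in paths:
        parts = urlsplit(p)
        if parts.path == target:
            counts[parts.query or "(untagged)"] += 1
    return counts

paths = ["/data?top", "/data", "/data?top", "/hours"]
print(label_counts(paths))  # Counter({'top': 2, '(untagged)': 1})
```
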
  • Slide 30
  • What we discovered: A quick glance at the log file revealed that, in the prior two days, 200 accesses came from the brand name label and 1000 accesses came from the generic term in a less prominent location.
  • Slide 31
  • Where do you post announcements? Sometimes you need to get everyone's attention: a branch closure, or "pay fines now in order to convocate." Not everyone enters your site at the home page, so find the entry pages.
  • Slide 32
  • Top entry pages
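
An entry page is the first page of a visit. This minimal Python sketch treats each client IP as one visitor and ends a visit after 30 idle minutes. Both are common approximations rather than anything the log records. It also assumes image and stylesheet requests were filtered out beforehand. The function entry_pages and the sample hits are hypothetical.

```python
from collections import Counter

SESSION_GAP = 30 * 60  # assumption: a visit ends after 30 idle minutes

def entry_pages(hits):
    """hits: (client_ip, unix_time, path) tuples sorted by time.
    Returns a Counter of the first page requested in each visit."""
    last_time = {}
    entries = Counter()
    for ip, t, path in hits:
        if ip not in last_time or t - last_time[ip] > SESSION_GAP:
            entries[path] += 1  # first hit of a new visit
        last_time[ip] = t
    return entries

hits = [("a", 0, "/"), ("a", 30, "/hours"), ("b", 40, "/ill/form"),
        ("a", 5000, "/databases")]  # "a" returns after more than 30 minutes
print(entry_pages(hits).most_common())
```
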
  • Slide 33
  • What's hot and what's not? What areas or pages are popular? How is that changing over time? Popular may be good, but not always: custom 404 pages are often #1 on a site with link rot, and high use may mean people are lost if your site doesn't have a followed (visited) link colour.
  • Slide 34
  • Link rot? http://www.bio.cornell.edu/stats/01/07/default_01_b.htm
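
To fix link rot you need both the missing URL and the page that links to it. This minimal Python sketch counts 404 responses per (referring page, missing path) pair. It assumes status, path and referrer have already been parsed from the access log. The function broken_links and the sample entries are hypothetical. A referrer of "-" means none was sent, as with a bookmark or typed URL.

```python
from collections import Counter

def broken_links(entries):
    """entries: (status, path, referer) tuples. Count 404s per (referring page, missing path)."""
    return Counter((ref, path) for status, path, ref in entries if status == 404)

subjects = "http://library.usask.ca/subjects.html"
entries = [
    (404, "/old/guide.html", subjects),
    (404, "/old/guide.html", subjects),
    (200, "/hours", "-"),
    (404, "/typo", "-"),
]
print(broken_links(entries).most_common(1))
```
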
  • Slide 35
  • Top directories
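
A top-directories report groups requests by their first path segment. This is a minimal Python sketch. The function top_directories and the sample paths are hypothetical. Files at the top level of the site, and the bare "/", are counted under "/".

```python
from collections import Counter
from urllib.parse import urlsplit

def top_directories(paths, n=5):
    """Tally requests by first path segment, e.g. /data/census.html -> /data/."""
    counts = Counter()
    for p in paths:
        parts = urlsplit(p).path.split("/")  # "/data/x.html" -> ["", "data", "x.html"]
        counts[f"/{parts[1]}/" if len(parts) > 2 else "/"] += 1
    return counts.most_common(n)

paths = ["/data/census.html", "/data/", "/northwest/contents.html",
         "/index.html", "/data/?top"]
print(top_directories(paths))
```
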
  • Slide 36
  • Popularity questions: What's popular but shouldn't be? Overdependence on site search may signal weaknesses in site navigation. What should be popular but isn't? If you expect high usage and it's not happening, recheck links, labels, and position. Is the link to the underutilized area prominent? Is it in plain language or jargon?
  • Slide 37
  • Does anyone care? Are we posting new announcements and no one reads them (ever)? Are the only hits from search engines spidering the site? What should we add more of?
  • Slide 38
  • Is a feature used? After a debate, quick links and audience menus were added to the site.
  • Slide 39
  • Quick links: very popular, ranked #3 and #5.
  • Slide 40
  • Audience menus: Over time, use of the student option on the audience menus has increased.
  • Slide 41
  • Getting down to the details: When can you move to CSS layouts? When can you downgrade support for Netscape 4.78?
  • Slide 42
  • What web browsers do you need to support?
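
Browser share comes from the user agent field. This minimal Python sketch uses crude, order-sensitive substring rules, which are an assumption and not a complete parser. Opera is checked before MSIE because Opera often also claims MSIE. Old Netscape 4.x identifies as Mozilla/4.x without "compatible; MSIE", so it is checked last. The rules, the function names, and the sample agents are hypothetical.

```python
from collections import Counter

# Crude, order-sensitive substring rules (an assumption; real UA parsing is messier)
RULES = [
    ("Opera", "Opera"),              # Opera often also claims "MSIE"
    ("MSIE", "Internet Explorer"),
    ("Safari", "Safari"),
    ("Netscape", "Netscape 6/7"),    # e.g. "... Gecko/20030624 Netscape/7.1"
    ("Gecko", "Mozilla/Gecko"),
    ("Mozilla/4.", "Netscape 4.x"),  # what is left of Mozilla/4.x is old Netscape
]

def browser_family(agent):
    for needle, family in RULES:
        if needle in agent:
            return family
    return "Other"

def browser_share(agents):
    """Fraction of requests per browser family."""
    counts = Counter(browser_family(a) for a in agents)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()}

agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.78 [en] (Windows NT 5.0; U)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20031007",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
]
print(browser_share(agents))
```
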
  • Slide 43
  • Cross-platform testing table
  • Slide 44
  • Retrace someone's footsteps: What page referred them to the library site? No referrer? Then it was a bookmark, a typed-in URL (or a robot). What path did they follow? Sometimes you can even tell which link they clicked. What might they have typed in a search box? Where did they leave?
  • Slide 45
  • Log analysis tools: top paths (http://www.bio.cornell.edu/stats/01/07/default_01_b.htm). A sad tale.
  • Slide 46
  • Paths: An even sadder tale, or a programmer doing debugging?
  • Slide 47
  • Follow the top paths: Pay attention to where they stopped and restarted. A jump between two areas that have no direct link between them may indicate they used their back button.
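
A top-paths report counts the most common sequences of pages within a visit. This minimal Python sketch again treats each client IP as a visitor and ends a visit after 30 idle minutes. Both are approximations, and pages served from the browser cache (back button) never appear in the log at all. The function top_paths and the sample hits are hypothetical.

```python
from collections import Counter

SESSION_GAP = 30 * 60  # assumption: 30 idle minutes ends a visit

def top_paths(hits, n=3):
    """hits: (client_ip, unix_time, page) tuples sorted by time.
    Returns the n most common page sequences, one sequence per visit."""
    open_visits = {}  # ip -> pages in the visit currently open
    last_time = {}
    finished = []
    for ip, t, page in hits:
        if ip in open_visits and t - last_time[ip] > SESSION_GAP:
            finished.append(tuple(open_visits.pop(ip)))  # close the old visit
        open_visits.setdefault(ip, []).append(page)
        last_time[ip] = t
    finished.extend(tuple(pages) for pages in open_visits.values())
    return Counter(finished).most_common(n)

hits = [
    ("a", 0, "/"), ("a", 20, "/databases"),
    ("b", 30, "/"), ("b", 50, "/databases"),
    ("c", 60, "/"), ("c", 90, "/hours"),
    ("a", 4000, "/"), ("a", 4010, "/databases"),  # a's second visit
]
print(top_paths(hits, 1))  # [(('/', '/databases'), 3)]
```
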
  • Slide 48
  • Error logs: Usually well used by development teams, so we'll only touch on a few points.
  • Slide 49
  • Error log captures: date, error level, client IP address or hostname, and the error message or path to the requested file. For example:
    [Wed Jan 28 00:15:26 2004] [error] [client 24.69.255.237] File does not exist: /data/www/northwest/images/spacer.gif, referer: http://library.usask.ca/northwest/contents.html
    [Wed Jan 28 00:16:30 2004] [error] [client 66.77.73.89] File does not exist: /data/www/education/chldawrd.html
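
Error log lines can be tallied the same way as access log lines. This minimal Python sketch is written against the format of the two sample lines above. Other Apache versions lay out error lines differently, so treat the pattern as an assumption. It counts "File does not exist" errors per missing file. The function missing_files is hypothetical.

```python
import re
from collections import Counter

# Matches error lines shaped like the examples above; the referer part is optional
ERROR_LINE = re.compile(
    r'\[(?P<date>[^\]]+)\] \[(?P<level>\w+)\] \[client (?P<client>[^\]]+)\] '
    r'(?P<message>.*?)(?:, referer: (?P<referer>\S+))?$'
)
PREFIX = "File does not exist: "

def missing_files(lines):
    """Count 'File does not exist' errors per missing file."""
    counts = Counter()
    for line in lines:
        m = ERROR_LINE.match(line)
        if m and m.group("message").startswith(PREFIX):
            counts[m.group("message")[len(PREFIX):]] += 1
    return counts

lines = [
    "[Wed Jan 28 00:15:26 2004] [error] [client 24.69.255.237] File does not exist: "
    "/data/www/northwest/images/spacer.gif, referer: http://library.usask.ca/northwest/contents.html",
    "[Wed Jan 28 00:16:30 2004] [error] [client 66.77.73.89] File does not exist: "
    "/data/www/education/chldawrd.html",
]
print(missing_files(lines))
```
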
  • Slide 50
  • Error logs also record some types of authentication failures. Authentication problems may indicate a need to add directions (for example, that usernames are case sensitive) or to implement a password reminder feature.
  • Slide 51
  • Redesign or launch of a new service: Watch your log files in real time, or every few seconds. For example, on a UNIX server, run tail -f followed by the path to the error_log file:
    tail -f /usr/local/apache/logs/error_log
  • Slide 52