Topic 11: Web Site Analysis
Marshall Breeding
Director for Innovative Technologies and Research
Vanderbilt University
http://staffweb.library.vanderbilt.edu/breeding
Redefining Libraries: Web 2.0 and other Challenges
May 2007 Xiamen, China

Theme
For many libraries, the number of visitors to their Web site and electronic resources exceeds the number who visit their physical premises. It's vital for libraries to understand how these remote visitors approach the Web site, not only to measure use but to improve the resources themselves. Marshall Breeding will present a number of practical techniques that libraries can use to better understand the use of their Web-based resources.

Theme
Topics will include the basics of analyzing the server logs of the library's Web site, transaction logs from the OPAC, the complexities of measuring use of subscription-based electronic resources, and techniques for enhancing applications to better record how they are used.

Understanding remote users
Vital to providing relevant library services
More patrons may use library resources remotely through the Web than from physical library facilities

Understanding remote users
Must work harder to ensure that Web-based services meet patron needs
Move beyond hit counters and raw statistics to more sophisticated analysis and assessment

Analysis goals
Improve usability
Web site diagnostics
Understand user needs
Content selection decisions
Improve quality of service
Marketing
Budget justification
Strategy to increase interest and activity

Data sources for tracking remote use
Web server logs
Application logs
Remote tracking data (Google Analytics)
Vendor-provided use statistics (e-resources)

Enterprise approach to analytics
Multiplicity of resources to track
– Web servers
– OPACs
– E-resources
– Databases
– Repositories
Important to track the flow of use among all the library's Web-based resources

Enterprise approach to analytics
Beyond the library: study flow to and from higher-level Web sites and portals (University -> Courseware -> Library)

Web server logs
Web servers are routinely configured to record detailed information about each request. Common elements include:
– File requested
– Date / time stamp
– Status code
– Request method (GET, POST, HEAD)
– Referrer (where the user came from)
– User agent (browser and platform data)

Example Web log
Raw data for analysis process
2006-06-20 05:01:43 129.59.150.105 GET /index.pl - 80 - c-69-250-131-199.hsd1.md.comcast.net Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) http://www.google.com/search?hl=en&lr=&safe=off&q=september+11+television+archive 200 0 0 11752
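
A minimal sketch of how a log entry like the one above might be split into named fields. The field names and their order are assumptions inferred from the sample entry, not taken from the server's configuration; many servers declare the actual layout in a header line of the log, which should be preferred when available. The meaning of the final number is not stated in the slide, so it is left unlabeled here.

```python
from typing import Dict

# Assumed field order, inferred from the sample entry only.
FIELDS = [
    "date", "time", "server_ip", "method", "uri_stem", "uri_query",
    "port", "username", "client_host", "user_agent", "referrer",
    "status", "substatus", "win32_status", "last_value",
]


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one space-delimited log entry into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # In this format, spaces inside the user agent are written as "+".
    record["user_agent"] = record["user_agent"].replace("+", " ")
    return record


SAMPLE = (
    "2006-06-20 05:01:43 129.59.150.105 GET /index.pl - 80 - "
    "c-69-250-131-199.hsd1.md.comcast.net "
    "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) "
    "http://www.google.com/search?hl=en&lr=&safe=off"
    "&q=september+11+television+archive "
    "200 0 0 11752"
)

record = parse_log_line(SAMPLE)
print(record["date"], record["method"], record["uri_stem"], record["status"])
```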

Exploiting referral data
The query string component of the referrer can be parsed to reveal search terms and other interesting information
http://www.google.com/search?hl=en&lr=&safe=off&q=september+11+television+archive
– User typed "september 11 television archive" in Google to find our site
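
The parsing step described above can be sketched with Python's standard `urllib.parse` module. The function name `search_terms` is hypothetical, and the assumption that the search phrase lives in the `q` parameter holds for the Google referrer shown here but not necessarily for other search engines.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_terms(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search phrase carried in a referrer URL, if any."""
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param)  # parse_qs decodes "+" as a space
    return values[0] if values else None


REFERRER = (
    "http://www.google.com/search?hl=en&lr=&safe=off"
    "&q=september+11+television+archive"
)

terms = search_terms(REFERRER)
print(terms)  # september 11 television archive
```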

Exploiting referral data
Important to study how users get to your site
[Example: TV News public Web queries vs. OpenWeb]

Analysis methodology
Go beyond simply counting pages
Identify sessions
Categorize users
Determine use patterns
Measure interest
– Time spent on Web site
– Bounce rate
– Page overlay analysis
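
As an illustration of identifying sessions and measuring bounce rate and time on site, the sketch below groups requests by client and starts a new session after a period of inactivity. The 30-minute cutoff, the function names, and the sample requests are all assumptions made for the example; the slides do not prescribe a particular method.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Request = Tuple[str, datetime, str]  # (client, timestamp, page)
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff


def build_sessions(requests: List[Request]) -> List[List[Request]]:
    """Group requests per client, splitting on gaps longer than SESSION_GAP."""
    by_client: Dict[str, List[Request]] = defaultdict(list)
    for req in requests:
        by_client[req[0]].append(req)
    sessions: List[List[Request]] = []
    for client_requests in by_client.values():
        client_requests.sort(key=lambda r: r[1])
        current = [client_requests[0]]
        for req in client_requests[1:]:
            if req[1] - current[-1][1] > SESSION_GAP:
                sessions.append(current)
                current = [req]
            else:
                current.append(req)
        sessions.append(current)
    return sessions


def bounce_rate(sessions: List[List[Request]]) -> float:
    """Fraction of sessions that consist of a single page request."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)


def session_duration(session: List[Request]) -> timedelta:
    """Time between the first and last request of a session."""
    return session[-1][1] - session[0][1]


# Hypothetical requests for illustration only.
day = datetime(2006, 6, 20)
requests = [
    ("client-a", day.replace(hour=9, minute=0), "/index.pl"),
    ("client-a", day.replace(hour=9, minute=5), "/search.pl"),
    ("client-a", day.replace(hour=11, minute=0), "/index.pl"),
    ("client-b", day.replace(hour=9, minute=10), "/index.pl"),
]

sessions = build_sessions(requests)
print(len(sessions), bounce_rate(sessions))
```

A single-page session has a duration of zero under this definition, which is one reason time-on-site figures derived from logs are only estimates.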

Move from measurement to impact
Establish site goals
Benchmark current use
Implement goal-oriented improvements
Measure impact
Repeat as needed
(Example: enhancement of TV News OpenWeb)

Appropriate data filtering
Requests from indexing bots (crawlers) can skew statistics
Count user requests and bot requests separately
Performance monitors
Link checkers
Monitoring crawler activity is an important component of SEO and Web site discoverability strategies.
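
One simple way to count user and bot requests separately is to test the user-agent string for tell-tale substrings, as sketched below. The marker list and the sample user agents are illustrative assumptions; real crawlers vary widely, and a production filter would need a maintained list.

```python
from typing import Iterable, List, Tuple

# Illustrative markers only; not an exhaustive or authoritative list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent: str) -> bool:
    """Guess whether a request came from an automated agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_requests(user_agents: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (human_requests, bot_requests) so each can be counted."""
    humans: List[str] = []
    bots: List[str] = []
    for ua in user_agents:
        (bots if is_bot(ua) else humans).append(ua)
    return humans, bots


# Hypothetical user agents for illustration.
AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
]

humans, bots = split_requests(AGENTS)
print(len(humans), len(bots))
```

Keeping the bot requests rather than discarding them preserves the data needed to monitor crawler activity.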

Resource Discovery
How do users get to your site?
Track performance of the Web site relative to major search engines
SEO – Search engine optimization
Few users begin with library Web sites
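
To see how users reach a site, the referrer field of each log entry can be tallied by host. This is a rough sketch: the function name, the "(direct or unknown)" label, and the sample referrers are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def referrer_hosts(referrers: Iterable[str]) -> Counter:
    """Count requests by the host name of the referring page."""
    counts: Counter = Counter()
    for ref in referrers:
        # A "-" or empty value means the log recorded no referrer.
        host = urlsplit(ref).hostname if ref and ref != "-" else None
        counts[host or "(direct or unknown)"] += 1
    return counts


# Hypothetical referrer values for illustration.
REFERRERS = [
    "http://www.google.com/search?q=tv+news",
    "http://www.google.com/search?q=archive",
    "http://search.yahoo.com/search?p=news",
    "-",
]

counts = referrer_hosts(REFERRERS)
for host, n in counts.most_common():
    print(host, n)
```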

TV News OpenWeb project
Dramatic increase in Web site activity and loan requests through systematic and controlled exposure of metadata to Google and other search engines
SEO (Search Engine Optimization) strategy
Helped the Archive become financially self-sufficient.

Examples of Web reporting and analysis tools

Selected utilities
Analog – free, open source
NetTracker – enterprise level Web analysis application
Google utilities
– Sitemap – process for submitting Web pages for optimized indexing by Google with some assessment capabilities
– Analytics – sophisticated approach for measuring Web site performance

Analog
Free Open Source application
Basic Web statistics application
Includes fairly full set of static metrics
Command line utility – generates Web report
Windows, Unix, Linux, etc.

NetTracker
Unica Corporation
Enterprise level Web analytics
http://www.sane.com/

NetTracker Executive Dashboard

NetTracker Bandwidth Trends

NetTracker Content

NetTracker Keyword Summary

NetTracker Referrers

NetTracker Pages Viewed

Google SiteMaps
XML specification for systematically submitting URLs that represent a Web site
Makes indexing more efficient but does not affect PageRank
SiteMap interface provides utilities for monitoring how the site has been indexed with some analytical information on terms used to find your Web site.
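
A minimal sketch of generating such an XML file with Python's standard library. It assumes the namespace published by the sitemaps.org protocol; the page URLs are hypothetical, and optional elements such as last-modified dates are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Namespace of the sitemaps.org protocol (assumed here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a sitemap document listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration.
PAGES = [
    "http://library.example.edu/",
    "http://library.example.edu/catalog",
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```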

Google SiteMaps Top Searches

Google SiteMaps Page Analysis

Google Analytics
Available at no cost from Google
Must receive invitation code
Slanted toward e-commerce
"Conversion University" – training on how to optimize Web site for high conversion rates.
Allows Webmasters to establish site goals and measure performance

Google Analytics main

Google Analytics overview

Google Analytics Browser Versions

Google Analytics Top Content

Google Analytics Entrance-Bounce Rates

Google Analytics Navigational Analysis

Google Analytics Goal tracking

Application-level reporting and analysis
Content management systems and other dynamically driven Web environments can provide additional usage information.
Can offer additional information beyond raw Web logs
More capabilities for identifying use based on user categories
Reporting can be built into the business logic of the application
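
As a rough illustration of building reporting into application logic, the sketch below records one event per user action and tallies events by any attribute. The class names, attribute names, and sample values are hypothetical and do not describe how any particular application is implemented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class UsageEvent:
    """One recorded action; all attribute names are hypothetical."""
    user_category: str  # e.g. "faculty", "student", "public"
    institution: str
    action: str         # e.g. "search", "loan_request"
    detail: str         # e.g. the query terms entered


class UsageLog:
    """In-memory event store; a real application would persist events."""

    def __init__(self) -> None:
        self.events: List[UsageEvent] = []

    def record(self, event: UsageEvent) -> None:
        self.events.append(event)

    def count_by(self, attribute: str) -> Counter:
        """Tally events by one attribute, such as 'user_category'."""
        return Counter(getattr(e, attribute) for e in self.events)


log = UsageLog()
log.record(UsageEvent("faculty", "Example University", "search", "election 2004"))
log.record(UsageEvent("public", "none", "search", "september 11"))
log.record(UsageEvent("faculty", "Example University", "loan_request", "item 123"))

by_category = log.count_by("user_category")
by_action = log.count_by("action")
print(by_category, by_action)
```

Because the application knows who the user is and what they were trying to do, reports like these are not recoverable from raw Web server logs alone.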

Examples from the TV News Web Site
Reports of use by user category and institution
Statistics on resource use
Data on search types, query terms, etc.
Ability to track all aspects of business activity

Other sources of use data
ILS OPAC logs
Proxy server logs and reports
Link resolver logs and reports

Limitations
Can't know the intent of the user
User success can only be estimated
Difficult to obtain trends by user type
More aggressive reporting might intrude on privacy
Few libraries require the level of user authentication needed to determine use by type of patron

Additional Information
Breeding, Marshall. Strategies for Measuring and Implementing E-use. ALA TechSource. May-June 2002. 79 pages.
Breeding, Marshall. "Analyzing Web server logs to improve a site's usage." Computers in Libraries. Information Today. Medford, CT. October 2005.

Group Exercise
Devise a strategy in which you can follow a more user-centered approach to the ongoing development of your library's Web site through monitoring and analysis of use data.