Automated Benchmarking Of UK Museum Web Sites

With An Introduction to UKOLN and UK Web Focus

Brian Kelly

UK Web Focus

UKOLN

University of Bath

Bath, BA2 7AY

UKOLN is supported by:

Email: [email protected]

URL: http://www.ukoln.ac.uk/

2

Contents

• About UKOLN
• UKOLN’s WebWatch Work For UK HEIs
• Benchmarking UK Museum Web Sites
• Comparison With “6 Of The Best”
• Limitations Of Approach
• Where To From Here?

3

UKOLN

UKOLN:
• National focus of expertise in digital information management
• Based at University of Bath
• Funded by JISC (HE and FE sector) and Resource: The Council for Museums, Archives and Libraries, together with project funding (e.g. EU and JISC)
• About 25 FTEs
• Carries out applied research (e.g. in metadata) and software development, and provides policy and advisory services

4

UKOLN’s Dissemination Work

UKOLN carries out dissemination activities, including work by UKOLN’s Policy and Advice Team:

Interoperability Focus
• Close links with Resource and Museums community (member of CIMI Executive Committee)
• Involved in e-GIF standards work
• See <http://www.ukoln.ac.uk/interop-focus/>

Collection Description Focus
• Funded by JISC, RSLP and British Library
• Coordination work on collection description methods, schemas & tools with goal of ensuring consistency across projects, disciplines, institutions and sectors
• See <http://www.ukoln.ac.uk/cd-focus/>

Bibliographic Management

UK Web Focus - myself

5

UK Web Focus

UK Web Focus:
• Funded by JISC to provide advice on Web developments
• Organises events (e.g. annual Institutional Web Management Workshop), writes articles (e.g. regular columns in Ariadne e-journal), gives talks, etc.

• A member of UKOLN’s Policy and Advice Team (which also includes Interoperability Focus, Collection Description Focus and Public Library Networking Focus)

• Managed the original WebWatch project and continues to publish results of WebWatch surveys

6

Community Building

An important part of my work is community building within UK HE / FE Web management communities:

• An annual 3 day workshop which provides an opportunity for Web managers to:
    • update their technical skills and approaches to managerial and strategic thinking
    • discuss and share problems and solutions with peers
• Active participation in JISCMail mailing lists, e.g.:
    • web-support: “My home page doesn’t look right in Netscape 4. Can anyone help?”
    • website-info-mgt: “A Web site has stolen text and images from my Web site. What should I do?” “How should I impose a consistent look-and-feel across all departmental Web sites?”

• Comparing approaches across community and sharing best practices

7

WebWatch Project

WebWatch project:
• Initially funded for 1 year in 1997 by BLRIC to develop and use automated robot software to analyse Web developments across various UK communities

• Once funding finished the work continued, but made use of (mainly) freely available Web services to analyse various features of Web site communities

• Supports community-building work across UK HE/FE Web managers (sharing, not flaming)

• See <http://www.ukoln.ac.uk/web-focus/webwatch/>

8

WebWatch Surveys

Search Engines Used To Index UK HE Web Sites:
• ht://Dig most popular and growing in popularity, followed by an MS solution
• Interest in licensed Ultraseek/Inktomi solution
• Interest in externally hosted indexers (e.g. Google)
• Surprising number of institutions with no search facility
• See <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>

Nos. of Links
• Cambridge has most (231,000 links to all servers)
• Sheffield has the most to a single server (46,000)
• See <http://www.ariadne.ac.uk/issue23/web-watch/>

Nos. Of Web Servers
• Cambridge has most (200+)
• See <http://www.ariadne.ac.uk/issue25/web-watch/>

9

Update On Search Engines

Sept 1999: ht://Dig: 25; Excite: 19; Microsoft: 12; Harvest: 8; Ultraseek: 7; SWISH: 5; Other: 23; None: 59

Today: ht://Dig: 48; Microsoft: 17; Ultraseek/Inktomi: 12; Google: 11; Excite: 5; Webinator: 5; Others: 22; None: 29

NOTE: The growth in popularity of ht://Dig, the unexpected appearance of the Google externally-hosted service and the move from SWISH and Harvest would not have been noticed without the snapshots. The discussion of surveys informed decision-making.
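The value of keeping snapshots can be illustrated with a minimal sketch that compares the two sets of counts. The counts are those quoted on this slide; the merging of "Ultraseek" with "Ultraseek/Inktomi" and of "Other" with "Others" is an assumption made for illustration, as is treating a category missing from one snapshot as zero (it may simply have been folded into "Others").

```python
# Survey counts as quoted in the two snapshots.
SEPT_1999 = {"ht://Dig": 25, "Excite": 19, "Microsoft": 12, "Harvest": 8,
             "Ultraseek": 7, "SWISH": 5, "Other": 23, "None": 59}
TODAY = {"ht://Dig": 48, "Microsoft": 17, "Ultraseek/Inktomi": 12, "Google": 11,
         "Excite": 5, "Webinator": 5, "Others": 22, "None": 29}

# Assumption: these label pairs describe the same category in both surveys.
ALIASES = {"Ultraseek": "Ultraseek/Inktomi", "Others": "Other"}


def normalise(snapshot):
    """Map alternative labels onto a single name per category."""
    return {ALIASES.get(name, name): count for name, count in snapshot.items()}


def compare(before, after):
    """Return the change in count for every category seen in either snapshot."""
    before, after = normalise(before), normalise(after)
    names = sorted(set(before) | set(after))
    # Assumption: a category absent from a snapshot counts as zero.
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


changes = compare(SEPT_1999, TODAY)
for name, delta in sorted(changes.items(), key=lambda item: -item[1]):
    print(f"{name:20} {delta:+d}")
```

Sorting by change puts the largest growth at the top and the largest decline at the bottom, which is the kind of trend a single survey cannot show.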

10

WebWatch Activities

As well as these metrics a number of observations of features have been carried out:

404 Error Page
• The appearance of and functionality provided by the institution’s 404 error page

Appearance of Main Entry Point
• The appearance of the institution’s entry point, and identifying main types (menu-style vs news) and use of technologies (Java, DHTML, etc.)

A “rolling demo” has been provided of these features allowing interested parties to quickly get a feel of the approaches taken within the community

These have proved very popular – see <http://www.ukoln.ac.uk/web-focus/site-rolling-demos/>

11

Benchmarking

WebWatch approach of monitoring UK HE Web sites can be extended into a benchmarking exercise:

• Making comparisons with peers
• Checking compliance with standards
• Checking compliance with community or funders’ guidelines (e.g. e-GIF guidelines)

This has advantages for organisations:
• Observing best practices and learning from them
• Ditto for bad practices
• Community building

and some potential disadvantages:
• Establishment of league tables
• Inappropriate comparisons
• Penalty clauses for failure to comply with standards

12

Benchmarking Museum Web Sites

WebWatch approach to benchmarking has been applied to a small number of UK Museum Web sites.

Small selection chosen in order to:
• Keep resource requirements to a minimum
• Validate methodology
• Gauge interest in this approach

Selected resources were:
• Sample of museum Web sites
• Guardian’s six best museum Web sites

If methodology is felt to be valid and there is sufficient interest the approach could be taken more widely across the museum community

Details of survey available from <http://www.ukoln.ac.uk/web-focus/events/conferences/museums-2001/>

13

Benchmarking Activity

Choosing the sample:
• mda list of UK Museum Web sites used as master source <http://www.mda.org.uk/vlmp/>
• Web sites beginning with letter “A” were chosen <http://www.mda.org.uk/vlmp/#A>
• Andrew Carnegie Birthplace Museum removed from sample as Web site was unavailable

The 13 Selected Museum Web Sites:
• Abbot Hall Art Gallery
• Aberdeen Art Gallery & Museums
• AccessArt
• Aerospace Museum
• Allhallows Museum
• Althorp House
• Amberley Museum
• American Museum in Britain
• Armagh Planetarium
• Arnolfini Gallery
• Ashmolean Museum of Art & Archaeology
• Astley Hall Museum and Art Gallery
• Avoncroft Museum of Historic Buildings

14

Approaches

Approaches taken:
• Use of freely-available Web sites which provide analysis capabilities
• Page of “live links” provided enabling all users to reproduce findings
• Complement this with manual inspection

Benefits of this approach:
• Openness, reproducibility and objectivity of survey

http://www.netmechanic.com/toolbox/html-code.htm

15

Domain Names

Findings
• 11 museums (92%) have an entry point which is the domain name and 2 (8%) have an entry point which is one level beneath the domain name
• 6 (46%) have a .co.uk domain; 3 (23%) have .org.uk; 2 (15%) have .com; 1 (8%) has .org; 1 (8%) has .ac.uk

Discussion
• Most of the museums have a short, memorable URL
• The variety of top level domains may be confusing for end users
• How will the new .museum domain be deployed? Is there an opportunity for a major advertising campaign?

Reminder – findings are for a small, non-random sample
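The two measurements on this slide (how far the entry point sits beneath the domain name, and which domain suffix is used) could be taken automatically along the following lines. This is a minimal sketch: the URLs are placeholders rather than the surveyed sites, and the list of suffixes is simply the set reported in the findings.

```python
from collections import Counter
from urllib.parse import urlparse

# Suffixes reported in the findings; anything else is counted as "other".
SUFFIXES = (".co.uk", ".org.uk", ".ac.uk", ".com", ".org")


def entry_point_depth(url):
    """Number of directory levels between the domain name and the entry point.

    The final path segment is treated as a file name, so "/index.html" is
    depth 0 and "/museum/" is depth 1. A path such as "/museum" (no trailing
    slash) is ambiguous and is counted as a file here.
    """
    segments = urlparse(url).path.split("/")
    return len(segments[1:-1])


def domain_suffix(url):
    """Return the first listed suffix that the host name ends with."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


# Placeholder entry points (assumptions, not the surveyed URLs).
SAMPLE = [
    "http://www.example.co.uk/",
    "http://www.example.org.uk/index.html",
    "http://www.example.ac.uk/museum/",
]

suffix_counts = Counter(domain_suffix(url) for url in SAMPLE)
for url in SAMPLE:
    print(url, entry_point_depth(url), domain_suffix(url))
```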

16

Server Software

Netcraft used to analyse Web server software

Findings
• 7 hosted on a Unix platform (4 on Linux, 2 on Solaris and 1 on BSD)
• 6 hosted on a Microsoft platform (4 on NT 4 or Windows 98, 2 on Windows 2000)

Issues
• Security, scalability, ease-of-use, ….

http://www.netcraft.com/

17

Standards Compliance

Entry point examined for compliance with HTML and CSS standards using the NetMechanic and W3C Validator Web-based tools:

Findings
• 0 pages were HTML compliant (according to W3C)
• Of the 5 sites which contained a CSS style sheet, 0 had errors (according to W3C)
• 3 pages were HTML compliant (according to NetMechanic)

Issues
• HTML-compliance is important for ensuring wide accessibility and for repurposing content

18

Accessibility

Entry point examined for compliance with W3C WAI guidelines for accessibility using the Bobby Web-based tool:

Findings
• Only 2 pages had no WAI Priority 1 error

Issues
• Compliance with accessibility standards is important for ensuring access to resources for people with disabilities

• Compliance with accessibility standards may be an organisational requirement

• Compliance with accessibility standards may be a legal requirement
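One Priority 1 checkpoint in the WAI guidelines is that images should have a text equivalent. The sketch below shows how that single check might be automated; it is an illustration only and does not reproduce how Bobby works. The sample HTML is invented.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.images += 1
        attributes = dict(attrs)
        # An empty alt="" is accepted here, since it is the usual way to
        # mark a purely decorative image; only a missing attribute is flagged.
        if "alt" not in attributes:
            self.missing_alt.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in the page that lack an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.missing_alt


# Invented entry point for illustration.
PAGE = (
    '<html><body>'
    '<img src="logo.gif">'
    '<img src="map.gif" alt="Map of the site">'
    '<img src="spacer.gif" alt="">'
    '</body></html>'
)

print(images_missing_alt(PAGE))
```

A check like this covers only one checkpoint; many accessibility issues still need manual inspection.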

19

Size Of Entry Point Using Bobby

Findings (Bobby)
• Largest entry point initially appeared to be 159 Kb
• On further analysis of framed sites the largest entry point was found to be 236.91 Kb
• The smallest appeared to be 1 Kb – but this was a FRAMES page (and not the individual linked pages)
• On further analysis of framed sites the smallest entry point was found to be 15.45 Kb

Issues
• Bobby flagged pages which used frames but further manual analysis and calculations were needed

20

Size Of Entry Point Using NetMechanic

Findings (NetMechanic)
• Largest entry point initially appeared to be 237,107 b (231 Kb)
• The smallest appeared to be 16,045 b (15.7 Kb)

Issues
• NetMechanic flagged pages which used frames but further manual analysis and calculations were needed

Bobby and NetMechanic identified the same largest and smallest sites – but this is not always the case

21

Comments On Size Measurements

Use of tools to analyse size of Web pages has indicated several issues:

• Need for manual inspection of results (normally outliers) in order to spot invalid comparisons
• Different ways of treating:
    • Redirects
    • Frames
    • User-agent negotiation, etc.
  and inconsistencies in handling:
    • robot exclusion protocol
    • external files (e.g. CSS and JavaScript), etc.
  may result in inconsistent findings
• Changes in content of page (e.g. inclusion of news items, personalised interfaces, etc.)
• Output generated for viewing on Web, not further processing
• Current need to manually sum sub-parts
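The effect of differing definitions of "page size", and the summing of sub-parts, can be sketched as follows. This is an assumption-laden illustration: the page, the URLs and the byte counts are invented, the sizes of linked files are supplied in a lookup table rather than fetched, and framed pages are listed but not followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects the absolute URLs of resources referenced by a page, by kind."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = {"image": [], "css": [], "script": [], "frame": []}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self._add("image", a["src"])
        elif tag == "script" and a.get("src"):
            self._add("script", a["src"])
        elif tag == "link" and a.get("href") and "stylesheet" in (a.get("rel") or "").lower():
            self._add("css", a["href"])
        elif tag in ("frame", "iframe") and a.get("src"):
            self._add("frame", a["src"])

    def _add(self, kind, reference):
        self.resources[kind].append(urljoin(self.base_url, reference))


def page_size(url, html, sizes, kinds=("image",)):
    """Size in bytes of the HTML plus the chosen kinds of linked resource.

    `sizes` maps resource URL to size in bytes. Each resource is counted
    once, however many times the page refers to it.
    """
    collector = ResourceCollector(url)
    collector.feed(html)
    total = len(html.encode("utf-8"))
    for kind in kinds:
        for resource in set(collector.resources[kind]):
            total += sizes[resource]
    return total


# Invented page and sizes, for illustration only.
URL = "http://www.example.org.uk/"
HTML = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="menu.js"></script></head>'
    '<body><img src="logo.gif" alt="Logo"><img src="logo.gif" alt="Logo"></body></html>'
)
SIZES = {
    "http://www.example.org.uk/style.css": 2000,
    "http://www.example.org.uk/menu.js": 5000,
    "http://www.example.org.uk/logo.gif": 12000,
}

html_only = page_size(URL, HTML, SIZES, kinds=())
html_and_images = page_size(URL, HTML, SIZES, kinds=("image",))
everything = page_size(URL, HTML, SIZES, kinds=("image", "css", "script"))
print(html_only, html_and_images, everything)
```

The same page yields three different sizes depending on which sub-parts are included, which is why two tools can disagree without either being wrong.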

22

Link Popularity

The number of links to the Web site was found using LinkPopularity (which has an interface to AltaVista):

Findings
• The most linked-to Web site had 2,731 links
• The least linked-to Web site had 45 links

Issues
• Links can drive traffic to your Web site
• Links can be used by citation-based search engines (such as Google) to boost the ranking of your site (many links to your page means Google will give it a higher ranking than a similar page with fewer links)
• Snapshots of link popularity can help gauge effectiveness of publicity campaigns

23

Search Engine Coverage / Size Of Web Site

AltaVista and Netscape’s What’s Related tool were used to measure the size of the museum Web sites (i.e. the numbers of pages they had indexed):

Findings
• Most no. of pages indexed by AV was 2,037 pages
• Most no. of pages indexed by NS was 1,919 pages
• Least no. of pages indexed by AV was 0 pages
• Least no. of pages indexed by NS was 0 pages

Issues
• The nos. of pages indexed should be ≥ 0 and ≤ nos. of pages on Web site
• If significantly fewer pages are indexed than exist, this may show a Web site which is not search-friendly (e.g. use of frames, splash screens, etc.)

24

Search Facility

Information on the museum’s search engine was found:

Findings
• 10 sites have no search facility
• 3 have a search facility:
    • 1 uses the FreeFind externally-hosted search engine
    • 1 uses a Microsoft search engine
    • 1 uses a Perl script (to search an online catalogue)
• 1 search facility not working (over 1 month period)

Issues
• Users expect to be provided with search facilities
• It can take < 30 minutes (and little technical expertise) to make an externally hosted search engine available, suitable for simple static Web sites (but not many people know this)

25

404 Error Page

Information on the 404 error page was found:

Findings
• 10 sites use the default 404 error message
• 3 have a lightly branded error message, but with little additional functionality

Issues
• The 404 error page is (sadly) likely to be widely accessed
• It is desirable that it:
    • Reflects the Web site’s look-and-feel
    • Provides functionality to assist a user who is ‘lost’:
        • Provides access to a search facility / site map
        • Provides contact details

• The 404 page can also be context-sensitive (e.g. different pages for users following a local link / remote link / no link)
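A context-sensitive 404 page could decide which case applies from the HTTP Referer header. The sketch below shows one possible approach; the host names and the wording of the messages are invented for illustration.

```python
from urllib.parse import urlparse


def classify_referrer(referer, site_host):
    """Return "local link", "remote link" or "no link" for a failed request."""
    if not referer:
        # No Referer header: typically a typed or bookmarked URL.
        return "no link"
    host = (urlparse(referer).hostname or "").lower()
    return "local link" if host == site_host.lower() else "remote link"


# Hypothetical wording for each case.
MESSAGES = {
    "local link": "Sorry, one of our own links is broken. We have been notified.",
    "remote link": "The site you came from has an out-of-date link to us.",
    "no link": "Please check the address, or use our search page or site map.",
}


def error_message(referer, site_host):
    """Pick the 404 message appropriate to where the visitor came from."""
    return MESSAGES[classify_referrer(referer, site_host)]


print(error_message("http://portal.example.com/links.html", "www.example.org.uk"))
```

The local-link case is the one a Web manager can fix directly, so it is a natural candidate for automatic notification.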

26

27

Robots.txt

Information on the Web site’s robots.txt file was found:

Findings
• 12 sites have no robots.txt file
• 1 site has a simple robots.txt file

Issues
• robots.txt file can be used to control indexing of your Web site e.g. stop robots from indexing:
    • Pre-release versions of pages
    • Test areas
    • …
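As an illustration, a simple robots.txt file that keeps robots out of test and pre-release areas can be checked with Python's standard `urllib.robotparser` module. The directory names, host name and robot name below are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# A simple robots.txt; the directory names are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /pre-release/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url, user_agent="ExampleBot"):
    """True if a well-behaved robot is allowed to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_index("http://www.example.org.uk/index.html"))
print(may_index("http://www.example.org.uk/test/new-design.html"))
```

The file is advisory: it controls well-behaved robots only, so it is not a substitute for access control on material that must stay private.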

28

Other Surveys

Additional surveys were carried out:

Cacheability Of Entry Point
• Cacheability Engine used <http://www.mnot.net/cacheability/>
• 11 entry points were cacheable and 2 were not

What’s Related To Web Site
• Netscape’s What's Related? facility <http://home.netscape.com/escapes/related/> used to record:
    • Popularity, nos. of pages and nos. of links
    • Relationships with other sites
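Whether an entry point is cacheable depends largely on the HTTP response headers it is served with. The following is a deliberately simplified heuristic, written from the point of view of a shared cache; it is an assumption-based sketch and not the algorithm used by the Cacheability Engine.

```python
def is_cacheable(headers):
    """Rough guess at whether a shared cache could usefully store a response.

    Simplifications (assumptions): "no-cache" and "private" are treated as
    "not cacheable", and any freshness information or validator is treated
    as sufficient.
    """
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower() for d in h.get("cache-control", "").split(",") if d.strip()]

    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return False
    if "no-cache" in h.get("pragma", "").lower():
        return False

    has_freshness = "expires" in h or any(
        d.startswith(("max-age=", "s-maxage=")) for d in directives
    )
    has_validator = "last-modified" in h or "etag" in h
    return has_freshness or has_validator


# Invented header sets for illustration.
print(is_cacheable({"Last-Modified": "Mon, 15 Oct 2001 10:00:00 GMT"}))
print(is_cacheable({"Cache-Control": "no-store"}))
```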

29

Six of the Best: Museums

The Guardian’s Online supplement (18 Oct 2001) published its list of the six best museum Web sites:

• The Hermitage in St Petersburg at <http://www.hermitagemuseum.org/>
• Metropolitan Museum at <http://www.metmuseum.org/>
• SCRAN at <http://www.scran.ac.uk>
• Tate Modern at <http://www.tate.org.uk/modern/>
• The Louvre at <http://www.louvre.fr/>
• Design Museum at <http://www.designmuseum.org/>

30

Comparisons

Automated Surveys
• 3 had a search facility
• Nos. of links to sites ranged from 723 to 18,366
• All surveyed entry points had P1 accessibility errors
• All surveyed entry points had HTML errors

Observations
• 3 were providing a search facility
• Most were providing a simple robots.txt file
• Some of the 404 error messages were slightly better

31

Accessible to Browsers

How do the Web sites look in different browsers?

The Lynx text browser and an emulation of the Mosaic browser were used in order to investigate how the Web sites would look to:

• Users of old browsers
• Users of browsers with no JavaScript support
• Users of text browsers (or an indexing robot)

32

Mosaic

33

Lynx

34

Limitations Of Survey

Limitations of this type of benchmarking approach include:

• Lack of standards
• Limitations of the tools
• Resources needed to carry out surveys
• Scoping of Museum sites and invalid comparisons
• Automated approach fails to address content issues which require a manual approach

35

Limitations - Standards

There is a lack of standards to support benchmarking work (or conflicting standards). For example:

Size of a page

How do you measure the size of the museum’s entry point? You need this in order to make comparisons and if, say, you have guidelines on the maximum file size.

Problems
• What do you measure (HTML file, inline images, external CSS and JavaScript files, …)?
• Changes in file content (e.g. user-agent negotiation, news content, frames and refresh elements, etc.)
• How do you handle the robot exclusion protocol (REP)?

NOTE: Bobby and NetMechanic work differently: the former only measures HTML and images, the latter obeys the REP

36

Limitations - Tools

Issues:
• Auditing tools tend to make implicit definitions (e.g. measuring size of a page). Different results may be obtained when using different tools for same purpose (or if vendor changes its definition)
• Use of Web-based auditing services:
    • Talk has described use of (mainly free) Web-based services
    • The providers may change their policy
    • Use of the URL interface to pass parameters (rather than direct use of the form on the Web page) may not be allowed
• Use of desktop auditing tools:
    • Use of desktop tools avoids the problems of change control of Web-based services
    • However it means that it may be difficult for others to reproduce findings

37

Limitations - Resources

It can be time-consuming to:
• Maintain URL of entry point to museum Web sites (need to have close links with provider of central portal)

• Manage the input to the variety of Web-based services

• Process the output from the Web-based services (current need to initiate inquiry, wait for results and manually copy and paste results)

38

Limitations – Scope of Web Site

Scope
• What is a museum Web site?
• What is not part of a museum Web site?
• It can be difficult to answer these questions.
• There are no standard ways to define a “Web site” other than by use of domain names and directory structures
• Even directory structures can be inadequate if they are not used correctly

Comparisons
• It may not be sensible to make comparisons between museums of different types and sizes

39

Limitations – Automated Only

Use of an automated approach:
• Would not (easily) address content issues
• Has been supplemented with manual observations (e.g. home page, 404 page & search engine page)

However:
• An automated approach can be more objective and reproducible
• An automated approach should be less resource-intensive (once software has been set up to maintain links to resources, survey sites and process results)
• An automated approach could be used in conjunction with a manual survey (of a representative sample set of resources)

40

Beyond A Pilot

Despite the limitations which have been described, would a comprehensive and systematic benchmark of UK Museum Web sites be of benefit?

• Can we address the resource issues?
• Is the lack of standards being addressed?
• Can we find someone to do the work?
• Should the focus be developmental?
• Can the work be extended to provide notification of problems (e.g. search engine not working)?

What may happen if we don’t do this?

Might we find that funders set up inappropriate or flawed performance indicators?

41

A Model For Implementation

The benchmarking process could be made less time-consuming if a more flexible model for managing the data were used

At present we seem to have an HTML page with links to museum Web sites

Unfortunately HTML pages are difficult to repurpose

A better model is to store links in a neutral database, and to generate pages for viewing by end users and for input into benchmarking Web services

[Diagram: a database generating a page for viewing and a page for input to Web services]

The database could also be reused for other purposes e.g. checking links and email notifications of problems
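The proposed model can be sketched with an in-memory SQLite database from which both outputs are generated. Everything specific here is an assumption made for illustration: the table layout, the placeholder entry-point URLs, and the address and `url` parameter of the auditing service.

```python
import sqlite3
from html import escape
from urllib.parse import quote

# Museum names come from the sample; the URLs are placeholders.
MUSEUMS = [
    ("Abbot Hall Art Gallery", "http://abbot-hall.example.org/"),
    ("Aberdeen Art Gallery & Museums", "http://aberdeen.example.org/"),
]

# Hypothetical auditing service which takes the page to check as a parameter.
SERVICE_TEMPLATE = "http://audit.example.com/check?url={url}"


def build_database(rows):
    """Store the links once, in a format independent of any output."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE museum (name TEXT NOT NULL, entry_point TEXT NOT NULL)")
    db.executemany("INSERT INTO museum VALUES (?, ?)", rows)
    return db


def viewing_page(db):
    """Generate an HTML list of links for end users."""
    rows = db.execute("SELECT name, entry_point FROM museum ORDER BY name")
    items = [f'<li><a href="{escape(url)}">{escape(name)}</a></li>' for name, url in rows]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def service_requests(db):
    """Generate one request URL per museum for input to the auditing service."""
    rows = db.execute("SELECT entry_point FROM museum ORDER BY name")
    return [SERVICE_TEMPLATE.format(url=quote(url, safe="")) for (url,) in rows]


db = build_database(MUSEUMS)
print(viewing_page(db))
print("\n".join(service_requests(db)))
```

Because the links are held in one place, a change to an entry point is made once and both generated pages pick it up.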

42

Towards “Web Services”

Background
• Web initially implemented for provision of information
• CGI allowed users to input data and provided integration with backend applications
• Techniques described use URL as input to auditing service. However this provides limited functionality and is susceptible to vagaries of marketplace

Future
• “Web Services” will support machine integration by providing a standard messaging infrastructure which uses HTTP protocol
• XML output (e.g. EARL) will provide a neutral format for benchmarking output, and can describe benchmarking environment (EARL is RDF)
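To give a feel for what a neutral, machine-readable benchmarking record might contain, the sketch below writes one result and its environment as XML. The element names are invented for illustration and are not the EARL vocabulary; the values are made up.

```python
import xml.etree.ElementTree as ET


def result_to_xml(site_url, tool, test, outcome, date):
    """Serialise one benchmarking result, with the environment it was run in.

    The element and attribute names are hypothetical, not taken from EARL.
    """
    root = ET.Element("benchmark")
    ET.SubElement(root, "site").text = site_url
    environment = ET.SubElement(root, "environment")
    ET.SubElement(environment, "tool").text = tool
    ET.SubElement(environment, "date").text = date
    result = ET.SubElement(root, "result", {"test": test})
    result.text = outcome
    return ET.tostring(root, encoding="unicode")


# Made-up values for illustration.
record = result_to_xml(
    site_url="http://www.example.org.uk/",
    tool="Bobby",
    test="WAI Priority 1",
    outcome="fail",
    date="2001-11-01",
)
print(record)
```

Recording the tool and date alongside the outcome matters because, as noted earlier, different tools and different days can give different answers for the same page.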

43

Need For Standard Definitions

• There is a need for standard definitions of terminology such as Web page, visit, unique visit, session, etc. in order to ensure that meaningful and objective comparisons can be made
• The market place is addressing current deficiencies within Web Advertising and Web Auditing communities (and there are financial incentives for this to be solved)
• With the growth in e-governments internationally and governments setting targets (X% of government work to be carried out electronically by 2005)

44

Doing The Work

If there is further interest, who should do the work?

[Diagram: “Benchmarking Work” at the centre, with branches labelled Who, What and Why. Labels shown: Funding body; Auditing body; Other central body; Volunteer; Researcher; Other(s); Part of current remit; New remit; Research interest; Provides benefits to community; Student project; Dissemination; Maintain central database; Software development; Producing reports]

45

What Next?

To summarise:
• Approach to the automated benchmarking of a small set of museum Web sites has been shown
• Implications of the findings have been discussed
• There are limitations of the methodology

It is suggested that:
• Despite the limitations benchmarking of museum Web sites can be beneficial:
    • Community building
    • Learning from successes and mistakes

• There may be advantages in carrying out this work within the community

46

Questions

Any questions?

Questions For You
• Would further work be useful?
• Who would do the work?
• Is there a need for a portal for use by the community of museum Web managers as well as for end users?
• Is anyone interested in joint work in this area (possibilities of a paper for a conference - e.g. Museums and the Web 2002 conf. - proposals needed by 30 Nov)?