J. WILLARD MARRIOTT LIBRARY

June 3, 2010
Western CONTENTdm Users Group

Creating a Path through the Labyrinth
Using Sitemaps to Enhance Discoverability

Sandra McIntyre, Mountain West Digital Library
Anne Morrow, University of Utah
Patrick OBrien, University of Utah (volunteer)
Lisa Chaufty, University of Utah




The Importance of Discoverability

Why is this a priority?

• For our digital collections
  – The J. Willard Marriott Library hosts over 100 outstanding digital collections, containing over 1 million digital items.
• For USpace, one of our large collections
  – Our mission is to collect, preserve, and provide access to the intellectual capital of the University of Utah, to reflect the University’s excellence, and to share that work with others.

Sharing

• Share what?
  – Unique materials, such as theses and dissertations
  – Above all, the work of our faculty
• Q: Why do faculty submit to institutional repositories?
• A: One primary motivator for faculty IR contribution is to make sure other scholars can find and cite their work.
• A: Faculty will access and contribute to an IR if they see significant input activity.
• How are people going to find what we want to share?

De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-17. http://www.oclc.org/reports/pdfs/Percept_all.pdf

De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-18. http://www.oclc.org/reports/pdfs/Percept_all.pdf

In the beginning…

Today…

User discovery

• Library Website
• OAI harvesters (e.g., Mountain West Digital Library)
• WorldCat
• Search engines
  – Google
  – Google Scholar
  – Google Images
  – Yahoo
  – Bing
  – Baidu
  – And more…

Focus on search engine discoverability

• Googlebots
  – Crawl websites and harvest links
  – Used by Google to build its search index
• We want to lend the bots a hand
  – What can we do to improve bot crawling of the digital library?
• Understanding the relationship between Googlebots and CONTENTdm

Cross-departmental collaboration

• Search Engine Optimization (SEO) A-Team
  – Collection Managers
    • Lisa Chaufty, Coordinator of the Institutional Repository
  – Mountain West Digital Library
    • Sandra McIntyre, MWDL’s Program Director
  – IT Division
    • Kenning Arlitsch, Associate Director for IT Services
    • Systems Development
    • Application Development
    • Anne Morrow, Digital Initiatives Librarian
  – Search Engine Marketing (SEM) expertise
    • Patrick OBrien, Volunteer Advisor


CONTENTdm and Google

Google and CONTENTdm’s OAI

• Big change in mid-2008: Google announced it would no longer crawl Open Archives Initiative (OAI) streams
• Before then, Google would crawl OAI metadata for digital collections and index it
• Many digital collections have been slowly “disappearing” from Google since then

Web Crawlers – how they work

Index and store data

Dynamic pages

• CONTENTdm constructs pages in HTML on the fly
  – Header
  – Record retrieved from database and formatted
  – Footer


Dynamic page

• Have to tell crawler how to assemble it (with URL)
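To make "with URL" concrete, here is a minimal Python sketch. It assumes item pages follow the document.php?CISOROOT=…&CISOPTR=… pattern that appears in this deck's own item URL; the helper name item_url and the BASE constant are illustrative and not part of CONTENTdm.

```python
from urllib.parse import urlencode

# Assumption: item pages are served by document.php on this host,
# as in the DardHunter example URL shown in this deck.
BASE = "http://content.lib.utah.edu/cdm4/document.php"


def item_url(collection: str, pointer: int) -> str:
    """Build the URL that tells CONTENTdm which record to assemble."""
    # safe="/" keeps the leading slash of the collection alias readable.
    query = urlencode({"CISOROOT": "/" + collection, "CISOPTR": pointer}, safe="/")
    return f"{BASE}?{query}"


print(item_url("DardHunter", 1919))
```

The collection alias and the item pointer are the only two values a crawler needs; everything else on the page is assembled by the server.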


CONTENTdm: More challenges for crawlers

• Compound objects use frameset that makes it harder to get to the content of the page

• Table layout, JavaScript, and CSS clutter the code; they handle styling and page management, not the structure of the text

• Except for splash page, there is no hierarchy of linking that gives any context – “islands” of separate pages

• Usable content is often far down in the code on the delivered page


In the works

• OCLC is working with Google and others to enhance visibility of resources in WorldCat

• Expect more of a linkage between WorldCat and search engines


Google Sitemaps

Google Sitemaps

• Instructions to Google’s web crawler: Crawl these URLs to get my content
• One Sitemap for each collection
• Not the same as a “site map” (contents page for website)

“Here is a list of the URLs of the dynamic pages that I want you to crawl, one for each item.”

Google Sitemap – example

http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml
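For readers who have not seen one, a Sitemap is a small XML file in the sitemaps.org format: a urlset element holding one url/loc entry per page. The following is a minimal Python sketch of building such a file, not the script used for the example above. The function name build_sitemap is illustrative, and the second item pointer (1920) is a made-up value.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the protocol at http://www.sitemaps.org
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap document (as a string) with one <url> per item URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" as "&amp;", which the protocol requires.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical item URLs for one collection (pointer 1920 is invented).
items = [
    "http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919",
    "http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1920",
]
print(build_sitemap(items))
```

The protocol caps a single Sitemap at 50,000 URLs, so a large collection needs several numbered files.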


Sitemap Index

• XML file listing all the Sitemaps on your server

“Here is a list of all the Sitemap files.”

Sitemap Index – example

http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml
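A Sitemap Index uses the same sitemaps.org format, with a sitemapindex element holding one sitemap/loc entry per Sitemap file. A minimal Python sketch follows. The function name build_sitemap_index is illustrative, and the second file name (sitemap_ir-main-002.xml) is an assumed continuation of the numbering, not a file known to exist.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a Sitemap Index document listing every Sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


sitemaps = [
    "http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml",
    # Hypothetical second file, for illustration only.
    "http://content.lib.utah.edu/sitemaps/sitemap_ir-main-002.xml",
]
print(build_sitemap_index(sitemaps))
```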

Implementing Google Sitemaps

• Create Sitemaps, one for each collection, and Sitemap Index.
• Register with Google Webmaster Tools.
• Inform Google about the location of your Sitemap Index.
  – In Webmaster Tools
  – In the robots.txt file on the server
• Monitor crawler results.

Step 1: Create Sitemaps and Index

• According to the protocol at http://www.sitemaps.org:
  – Create a Sitemap file for each collection.
  – Create a Sitemap Index file.
• See Terry Reese’s “makemap” script at http://digitalcollections.library.oregonstate.edu/php/makemap.txt

Step 2: Webmaster Tools Registration

• Register (free) with Google Webmaster Tools at http://www.google.com/webmasters/tools
• You will need a Google account
• Follow the directions to prove your control over the site by adding a <meta> tag to the home page


Step 3: Inform Google

• Step 3A: Submit the address of the Sitemap Index file on Webmaster Tools.

Step 3: Inform Google

• Step 3B: Modify the robots.txt file at the root of your CONTENTdm server to specify the location of the Sitemap Index.
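The robots.txt entry is a single "Sitemap:" line giving the full URL of the Sitemap Index. The sketch below shows what that line might look like and uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later) to confirm that a crawler would see it. The robots.txt content is a shortened illustration, and the helper name declared_sitemaps is made up.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one Disallow rule plus the Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml
"""


def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt document."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
```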

Step 4: Monitor crawler results

• Monitor crawler results on Webmaster Tools.
  – Top search queries
  – Links to your site
  – Keywords
  – Internal links
  – Crawl errors
  – Crawl stats
  – HTML suggestions


Using and Maintaining Sitemaps

• Re-generating (updating) Sitemaps and Sitemap Index frequently

• Checking the crawler stats in Google Webmaster Tools and initiating changes as needed

• Noticing the impact in Google searches


Initial Progress

Initial approach to SEO

• Generated Sitemaps and Sitemap Index file: Applied a variation of Terry Reese’s makemap.php script
• Edited robots.txt file
• Registered with Webmaster Tools
• Started observing crawler statistics


The Next Phase

• Enlisted expertise of Patrick OBrien
  – Sitemaps
    • Diagnose crawl error reports
    • Make recommendations
  – Proposing strategy for the future of SEO


Short-term Opportunities

Know your customers and what they value.

Faculty
• Publication Page Views
• Publication Downloads
• Requests for Information
• Publication Citations

Collection Donors
• Digital Collection Pages Indexed
• Digital Collection Page Views
• Digital Collection Visitors
• Requests for More Info
• Physical Collection Visitors
• Reproductions Ordered

[Figure: each group’s metrics plotted on a Value axis running up to High]

Why can’t the public find our content?

Public

• Are you worthy enough for their customer (i.e., Index)?
• How much will their customer value the introduction (i.e., Visibility)?

CONTENTdm


Are you worthy enough for their customer?

• Can they trust you with their customer?

• Is your content worth an investment of their resources?

Check the Crawl Errors

• User Not Authorized (401 errors)
• Page Forbidden (403 errors)
• Network Unreachable (5xx errors)
• Page Not Found (404 errors)
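When a crawl report runs to hundreds of rows, a tally by status code shows which problem to tackle first. This is a minimal Python sketch using the standard reason phrases from http.HTTPStatus (401 Unauthorized, 403 Forbidden, 404 Not Found). The function names and the sample list of codes are hypothetical; in practice the codes would come from the crawl error report or the server logs.

```python
from collections import Counter
from http import HTTPStatus


def describe(code):
    """Label a status code with its standard reason phrase."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        return f"{code} (unrecognized status)"


def summarize_crawl_errors(status_codes):
    """Count 4xx and 5xx responses; successful responses are ignored."""
    counts = Counter(describe(c) for c in status_codes if c >= 400)
    return dict(counts)


# Hypothetical sample of response codes seen by a crawler.
sample = [200, 401, 403, 403, 404, 503]
print(summarize_crawl_errors(sample))
```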

Address errors and don’t leave users stranded!

Low Trust Example: 403 Error

Eliminate sitemap and robots.txt conflicts

Robots.txt

User-agent: *
Disallow: /dmscripts/
Disallow: /cdm4/admin/
Disallow: /cdm4/client/
Disallow: /cdm4/cqr/
Disallow: /cdm4/images/
Disallow: /cdm4/includes/
Disallow: /cdm4/jscripts/
Disallow: /cdm-diagnostics/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /u/

Sitemap

http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith

Conflict: the Sitemap lists a URL under /cgi-bin/, which robots.txt disallows.
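A conflict like this one can be caught before Google reports it. The following is a minimal Python sketch that runs every Sitemap URL through the robots.txt rules with the standard urllib.robotparser module. The function name blocked_urls is illustrative, and the "Googlebot" agent string is an assumption about which crawler to test against.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rules shown on this slide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dmscripts/
Disallow: /cdm4/admin/
Disallow: /cdm4/client/
Disallow: /cdm4/cqr/
Disallow: /cdm4/images/
Disallow: /cdm4/includes/
Disallow: /cdm4/jscripts/
Disallow: /cdm-diagnostics/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /u/
"""


def blocked_urls(robots_txt, sitemap_urls, agent="Googlebot"):
    """Return the Sitemap URLs that robots.txt forbids the agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(agent, url)]


sitemap_urls = [
    "http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith",
    "http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919",
]
for url in blocked_urls(ROBOTS_TXT, sitemap_urls):
    print("Blocked by robots.txt:", url)
```

Only the /cgi-bin/ URL is reported; the document.php URL matches none of the Disallow prefixes.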

Provide sitemaps linking context with simple URLs

http://content.lib.utah.edu/
http://content.lib.utah.edu/cdm4/az.php#D
http://content.lib.utah.edu/cdm4/az_details.php?id=44
http://content.lib.utah.edu/cdm4/browse.php?CISOROOT=/DardHunter
http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919

Increase Page Crawl efficiency

A Papermaking Pilgrimage to Japan, Korea and China

Are you worthy enough for their customer?

• Can they trust you with their customer?
  – Check the Crawl Errors in Google Webmaster Tools
  – Address errors and don’t leave their customers stranded!
• Is your content worth an investment of their resources?
  – Eliminate sitemap & robots.txt conflicts
  – Provide sitemaps linking context with simple URLs
  – Increase Page Crawl efficiency

How much will their customer value the introduction (i.e., Visibility)?

• Is your content relevant?
• Is your content credible?
• Is your content accessible?


Summary

Recommendations for CONTENTdm managers

• Assemble the right players: your SEO team
• Set your priorities
• Create a linking strategy from home page, to index of collections, to collection “splash” page, and to item pages
• Create Sitemaps and Sitemap Index file
• Set up a regular process to update Sitemaps
• Eliminate any conflicts between robots.txt and Sitemaps
• Get involved with CONTENTdm’s new Web templates
• Set up to monitor results!


Stay tuned!

• We will share the results and tools

What’s your experience? Let us know!

• Sandra McIntyre
• Anne Morrow
• Lisa Chaufty
• Patrick OBrien