J. WILLARD MARRIOTT LIBRARY
June 3, 2010 Western CONTENTdm Users Group
Creating a Path through the Labyrinth
Using Sitemaps to Enhance Discoverability
Sandra McIntyre, Mountain West Digital Library
Anne Morrow, University of Utah
Patrick OBrien, University of Utah (volunteer)
Lisa Chaufty, University of Utah
Why is this a priority?
• For our digital collections
  – The J. Willard Marriott Library hosts over 100 outstanding digital collections, containing over 1 million digital items.
• For USpace, one of our large collections
  – Our mission is to collect, preserve, and provide access to the intellectual capital of the University of Utah, to reflect the University’s excellence, and to share that work with others.
Sharing
• Share what?
  – Unique materials, such as theses and dissertations
  – Above all, the work of our faculty
• Q: Why do faculty submit to institutional repositories?
• A: One primary motivator for faculty IR contribution is to make sure other scholars can find and cite their work.
• A: Faculty will access and contribute to an IR if they see significant input activity.
• How are people going to find what we want to share?
De Rosa, C., et al. Perceptions of Libraries and Information Resources. OCLC Membership Report, pp. 1-17 and 1-18. http://www.oclc.org/reports/pdfs/Percept_all.pdf
User discovery
• Library Website
• OAI harvesters (e.g., Mountain West Digital Library)
• WorldCat
• Search engines
  – Google
  – Google Scholar
  – Google Images
  – Yahoo
  – Bing
  – Baidu
  – And more…
Focus on search engine discoverability
• Googlebot
  – Crawls websites and harvests links
  – Used by Google to build its search index
• We want to lend the bots a hand
  – What can we do to improve bot crawling of the digital library?
• Understanding the relationship between Googlebot and CONTENTdm
Cross-departmental collaboration
• Search Engine Optimization (SEO) A-Team
  – Collection Managers
    • Lisa Chaufty, Coordinator of the Institutional Repository
  – Mountain West Digital Library
    • Sandra McIntyre, MWDL’s Program Director
  – IT Division
    • Kenning Arlitsch, Associate Director for IT Services
    • Systems Development
    • Application Development
    • Anne Morrow, Digital Initiatives Librarian
  – Search Engine Marketing (SEM) expertise
    • Patrick OBrien, Volunteer Advisor
Google and CONTENTdm’s OAI
• Big change in mid-2008: Google announced it would no longer crawl Open Archives Initiative (OAI) streams
• Before then, Google would crawl OAI metadata for digital collections and index it
• Many digital collections have been slowly “disappearing” from Google since then
Dynamic pages
• CONTENTdm constructs pages in HTML on the fly
  – Header
  – Record retrieved from database and formatted
  – Footer
CONTENTdm: More challenges for crawlers
• Compound objects use a frameset, which makes it harder to get to the content of the page
• Table layout, JavaScript, and CSS clutter up the code and are used only for styling and management, not for the structure of the text
• Except for the splash page, there is no hierarchy of linking that gives any context: pages are “islands” separate from one another
• Usable content is often far down in the code on the delivered page
In the works
• OCLC is working with Google and others to enhance visibility of resources in WorldCat
• Expect more of a linkage between WorldCat and search engines
Google Sitemaps
• Instructions to Google’s web crawler: Crawl these URLs to get my content
• One Sitemap for each collection
• Not the same as a “site map” (contents page for a website)
“Here is a list of the URLs of the dynamic pages that I want you to crawl, one for each item.”
Google Sitemap – example
http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml
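To make the idea of "one URL per item" concrete, here is a minimal sketch of how a per-collection Sitemap could be generated. It is not the script used at the Marriott Library. The item URL pattern is borrowed from the `document.php` link that appears elsewhere in this deck, and the second item pointer (`1920`) is a made-up value for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, collection, pointers):
    """Return Sitemap XML (as a string) with one <url> entry per item."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for ptr in pointers:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # ElementTree escapes "&" as "&amp;" when serializing, as XML requires.
        loc.text = (
            f"{base_url}/cdm4/document.php?CISOROOT=/{collection}&CISOPTR={ptr}"
        )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical item pointers for the DardHunter collection.
sitemap_xml = build_sitemap("http://content.lib.utah.edu", "DardHunter", [1919, 1920])
print(sitemap_xml)
```

In practice the list of item pointers would come from the CONTENTdm database or API rather than a hard-coded list.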
Sitemap Index
• XML file listing all the Sitemaps on your server
“Here is a list of all the Sitemap files.”
Sitemap Index – example
http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml
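A Sitemap Index has the same shape as a Sitemap, but lists Sitemap files instead of item pages. The sketch below assumes the per-collection Sitemaps already exist; the second filename is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return Sitemap Index XML (as a string) listing each Sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


index_xml = build_sitemap_index([
    "http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml",
    # Hypothetical second Sitemap, for illustration only.
    "http://content.lib.utah.edu/sitemaps/sitemap_DardHunter-001.xml",
])
print(index_xml)
```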
Implementing Google Sitemaps
• Create Sitemaps, one for each collection, and Sitemap Index.
• Register with Google Webmaster Tools.
• Inform Google about the location of your Sitemap Index.
  – In Webmaster Tools
  – In the robots.txt file on the server
• Monitor crawler results.
Step 1: Create Sitemaps and Index
• According to the protocol at http://www.sitemaps.org:
  – Create a Sitemap file for each collection.
  – Create a Sitemap Index file.
• See Terry Reese’s “makemap” script at http://digitalcollections.library.oregonstate.edu/php/makemap.txt
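One detail of the sitemaps.org protocol that matters for large collections: a single Sitemap file may list at most 50,000 URLs, so a big collection has to be split across several files. The sketch below shows one way to do the split. The numbered filename pattern is an assumption modeled on the `sitemap_ir-main-001.xml` example, and the 120,000 item URLs are generated purely for illustration.

```python
MAX_URLS_PER_SITEMAP = 50000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_filenames(collection, urls, size=MAX_URLS_PER_SITEMAP):
    """Return one numbered filename per chunk (naming pattern is assumed)."""
    chunks = chunk_urls(urls, size)
    return [f"sitemap_{collection}-{n:03d}.xml" for n in range(1, len(chunks) + 1)]


# Illustrative data: 120,000 made-up item URLs for one collection.
urls = [
    f"http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/ir-main&CISOPTR={i}"
    for i in range(120000)
]
names = sitemap_filenames("ir-main", urls)
print(names)
```

Each chunk would then be written out as its own Sitemap file and listed in the Sitemap Index.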
Step 2: Webmaster Tools Registration
• Register (free) with Google Webmaster Tools at http://www.google.com/webmasters/tools
• You will need a Google account
• Follow the directions to prove your control over the site by adding a <meta> tag to the home page
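Before asking Google to verify the site, it can help to confirm that the `<meta>` tag really is present in the delivered home page. The sketch below parses an HTML string with Python's standard library. The tag name `google-site-verification` is an assumption about what Webmaster Tools issues (use whatever name your instructions show), and the token and sample page are made up.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Record the content attribute of the first <meta> tag with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == self.name and self.content is None:
            self.content = attr_map.get("content")


def find_meta_content(html, name="google-site-verification"):
    """Return the matching tag's content value, or None if the tag is absent."""
    finder = MetaTagFinder(name)
    finder.feed(html)
    return finder.content


# Made-up home page with a placeholder verification token.
SAMPLE_HOME_PAGE = """
<html><head>
<title>Digital Collections</title>
<meta name="google-site-verification" content="EXAMPLE-TOKEN" />
</head><body></body></html>
"""

print(find_meta_content(SAMPLE_HOME_PAGE))
```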
Step 3: Inform Google
• Step 3A: Submit the address of the Sitemap Index file on Webmaster Tools.
Step 3: Inform Google
• Step 3B: Modify the robots.txt file at the root of your CONTENTdm server to specify the location of the Sitemap Index.
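The robots.txt change amounts to one added line, `Sitemap:` followed by the full URL of the Sitemap Index. As a rough sketch, assuming the file's contents are handled as a string, the edit could be scripted as below. The short robots.txt shown is an abbreviated stand-in, not the library's actual file.

```python
def add_sitemap_directive(robots_txt, sitemap_index_url):
    """Append a 'Sitemap:' line to robots.txt text unless it is already there."""
    directive = f"Sitemap: {sitemap_index_url}"
    lines = robots_txt.splitlines()
    if directive in (line.strip() for line in lines):
        return robots_txt  # already present, leave the file untouched
    return "\n".join(lines + ["", directive]) + "\n"


# Abbreviated stand-in for an existing robots.txt file.
existing = "User-agent: *\nDisallow: /dmscripts/\n"
updated = add_sitemap_directive(
    existing, "http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml"
)
print(updated)
```

Running the function a second time on its own output changes nothing, so it is safe to include in a regular update job.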
Step 4: Monitor crawler results
• Monitor crawler results on Webmaster Tools.
  – Top search queries
  – Links to your site
  – Keywords
  – Internal links
  – Crawl errors
  – Crawl stats
  – HTML suggestions
Using and Maintaining Sitemaps
• Re-generating (updating) Sitemaps and Sitemap Index frequently
• Checking the crawler stats in Google Webmaster Tools and initiating changes as needed
• Noticing the impact in Google searches
Initial approach to SEO
• Generated Sitemaps and Sitemap Index file: Applied a variation of Terry Reese’s makemap.php script
• Edited robots.txt file
• Registered with Webmaster Tools
• Started observing crawler statistics
The Next Phase
• Enlisted expertise of Patrick OBrien
  – Sitemaps
    • Diagnose crawl error reports
    • Make recommendations
  – Proposing strategy for the future of SEO
Know your customers and what they value.
• Faculty
  – Publication Page Views
  – Publication Downloads
  – Requests for Information
  – Publication Citations
• Collection Donors
  – Digital Collection Pages Indexed
  – Digital Collection Page Views
  – Digital Collection Visitors
  – Requests for More Info
  – Physical Collection Visitors
  – Reproductions Ordered
[Diagram axis labels for each list: Value, High]
Why can’t the public find our content?
[Diagram labels: Public, CONTENTdm, and four “?” marks]
• Are you worthy enough for their customer (i.e., Index)?
• How much will their customer value the introduction (i.e., Visibility)?
Are you worthy enough for their customer?
• Can they trust you with their customer?
• Is your content worth an investment of their resources?
Check the Crawl Errors
• Authorization Required (401 errors)
• Page Forbidden (403 errors)
• Page Not Found (404 errors)
• Network Unreachable (5xx errors)
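If you export crawl results, or run your own check of the URLs in a Sitemap, the responses can be tallied by these status-code groups. The sketch below is one simple way to do that. The `(url, status)` pairs are made-up sample data, not real crawl results from the library's server.

```python
from collections import Counter


def crawl_error_category(status):
    """Map an HTTP status code to a crawl-error group, or None if not an error group."""
    if status in (401, 403, 404):
        return str(status)
    if 500 <= status <= 599:
        return "5xx"
    return None


def summarize_crawl_errors(results):
    """Count crawl errors per group from an iterable of (url, status) pairs."""
    counts = Counter()
    for _url, status in results:
        category = crawl_error_category(status)
        if category is not None:
            counts[category] += 1
    return counts


# Made-up sample of crawl results for illustration.
sample_results = [
    ("http://content.lib.utah.edu/a", 200),
    ("http://content.lib.utah.edu/b", 403),
    ("http://content.lib.utah.edu/c", 404),
    ("http://content.lib.utah.edu/d", 404),
    ("http://content.lib.utah.edu/e", 503),
]
summary = summarize_crawl_errors(sample_results)
print(summary)
```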
Address errors and don’t leave users stranded!
Low Trust Example: 403 Error
Eliminate sitemap and robots.txt conflicts
Robots.txt:
User-agent: *
Disallow: /dmscripts/
Disallow: /cdm4/admin/
Disallow: /cdm4/client/
Disallow: /cdm4/cqr/
Disallow: /cdm4/images/
Disallow: /cdm4/includes/
Disallow: /cdm4/jscripts/
Disallow: /cdm-diagnostics/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /u/
Sitemap:
http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith
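The conflict on this slide is that the Sitemap asks crawlers to fetch a `/cgi-bin/` URL while robots.txt forbids everything under `/cgi-bin/`. A check like this can be automated with Python's standard `urllib.robotparser` module. The sketch below uses a shortened copy of the rules above and two URLs from this deck; it is an illustration, not the procedure the team actually followed.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt rules shown on the slide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dmscripts/
Disallow: /cdm4/admin/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /u/
"""

# URLs that a Sitemap might list, both taken from this deck.
SITEMAP_URLS = [
    "http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith",
    "http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919",
]


def find_conflicts(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots.txt forbids the given crawler to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = find_conflicts(ROBOTS_TXT, SITEMAP_URLS)
for url in conflicts:
    print("Blocked by robots.txt:", url)
```

Any URL reported here should either be removed from the Sitemap or unblocked in robots.txt.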
Provide sitemaps linking context with simple URLs
http://content.lib.utah.edu/
http://content.lib.utah.edu/cdm4/az.php#D
http://content.lib.utah.edu/cdm4/az_details.php?id=44
http://content.lib.utah.edu/cdm4/browse.php?CISOROOT=/DardHunter
http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919
Increase Page Crawl efficiency
A Papermaking Pilgrimage to Japan, Korea and China
Are you worthy enough for their customer?
• Can they trust you with their customer?
  – Check the Crawl Errors in Google Webmaster Tools
  – Address errors and don’t leave their customers stranded!
• Is your content worth an investment of their resources?
  – Eliminate sitemap & robots.txt conflicts
  – Provide sitemaps linking context with simple URLs
  – Increase Page Crawl efficiency
How much will their customer value the introduction (i.e., Visibility)?
• Is your content relevant?
• Is your content credible?
• Is your content accessible?
Recommendations for CONTENTdm managers
• Assemble the right players: your SEO team
• Set your priorities
• Create a linking strategy from home page, to index of collections, to collection “splash” page, and to item pages
• Create Sitemaps and Sitemap Index file
• Set up a regular process to update Sitemaps
• Eliminate any conflicts between robots.txt and Sitemaps
• Get involved with CONTENTdm’s new Web templates
• Set up to monitor results!
What’s your experience? Let us know!
• Sandra McIntyre: [email protected]
• Anne Morrow: [email protected]
• Lisa Chaufty: [email protected]
• Patrick OBrien: (805) [email protected]