Crawl Optimisation - #Pubcon 2015



Crawl Optimisation

Presented by:

Barry Adams

Polemic Digital


About Barry Adams

• Dutchman in Northern Ireland

• Founder of Polemic Digital

• Senior editor for StateofDigital.com

• Twitter ranter: @badams

• Lecturer & educator


What is Crawl Optimisation?

Ensuring search engine spiders waste as little time as possible and spend their crawl on the right URLs on your site.


Why is Crawl Optimisation important?

If you waste crawl budget, the right pages are unlikely to be crawled & indexed.


Crawl Sources

• Site crawl

• XML Sitemaps

• Inbound links

• DNS records

• Domain registrations

• Browsing data


Identifying Crawl Waste


Crawl Waste

• Bogus URLs in the XML Sitemap


Optimise XML Sitemaps

• Ensure your sitemap contains final URLs only

• Minimise 301 redirects and other non-200 status codes

• Use multiple sitemaps to identify crawl waste in GSC
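
For illustration, a sitemap index can split a site's URLs into separate sitemaps per section, so Google Search Console reports each sitemap's indexation separately and crawl waste is easier to pinpoint (the filenames below are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One sitemap per site section; GSC shows submitted vs indexed counts per file -->
  <sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-categories.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-articles.xml</loc></sitemap>
</sitemapindex>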


Crawl Waste

• Paginated Listings

• Especially when combined with faceted navigation


Optimise Paginated Listings

• List more items on a single page

• Implement rel=prev/next

• Block sorting parameters in robots.txt

– Disallow: /*?order=*

• Add “rel=nofollow” to sorting and facet links
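
As a sketch, rel=prev/next is implemented with link elements in the <head> of each page in the paginated series; the URLs below are hypothetical (page 2 of a category listing):

<link rel="prev" href="https://www.example.com/category?page=1">
<link rel="next" href="https://www.example.com/category?page=3">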


Crawl Waste

• Internal Site Search Results


Block Internal Site Search Pages

• Block in robots.txt

User-agent: *

Disallow: /SearchResults.aspx

Disallow: /*query=*

Disallow: /*s=*


Crawl Waste

• Internal redirects


Minimise Internal Redirects

• Find redirects with Screaming Frog

• Internal links should all point to 200 OK URLs

• Flat site structure
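
For example, if /old-page/ 301-redirects to /new-page/, internal links should point straight at the destination rather than at the redirecting URL (paths are hypothetical):

<!-- Spends a crawl on the redirect hop: -->
<a href="https://www.example.com/old-page/">Read more</a>

<!-- Links directly to the 200 OK URL: -->
<a href="https://www.example.com/new-page/">Read more</a>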


Crawl Waste

• Canonicalised Pages


Use Canonicals Wisely

• “rel=canonical” is primarily for index issues

– It is not a fix for crawl waste

– Search engines need to see the canonical tag before they can act on it

– Ergo, pages need to be crawled before rel=canonical has any effect

– Ditto with meta noindex tags
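
As a reminder of what those tags look like, both sit in the <head> of the page being de-duplicated, which is exactly why that page must be crawled before either can take effect (URLs are hypothetical):

<link rel="canonical" href="https://www.example.com/product/blue-widget/">
<meta name="robots" content="noindex, follow">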


Crawl Waste

• Slow loading pages


Optimise Load Speed

• Time to First Byte

• Lightweight pages

• Caching

• Compression
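
As an illustration, a well-configured response for a cacheable page or asset might carry compression and caching headers along these lines (values are examples, not recommendations):

HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: public, max-age=86400
Vary: Accept-Encoding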


Crawl Optimisation Summarised

• Don’t let search engines do the hard work

• Tools at your disposal:

– DeepCrawl

– Google Search Console

– Screaming Frog SEO Spider

– WebPageTest.org

• Solutions:

– XML Sitemaps

– robots.txt

– rel=nofollow

– rel=prev/next

– Load speed