Crawl Optimisation - #Pubcon 2015

#pubcon @badams Crawl Optimisation Presented by: Barry Adams Polemic Digital

Transcript of Crawl Optimisation - #Pubcon 2015

Page 1: Crawl Optimisation - #Pubcon 2015

#pubcon@badams

Crawl Optimisation

Presented by:

Barry Adams

Polemic Digital

Page 2: Crawl Optimisation - #Pubcon 2015

About Barry Adams

• Dutchman in Northern Ireland

• Founder of Polemic Digital

• Senior editor for StateofDigital.com

• Twitter ranter: @badams

• Lecturer & educator

Page 3: Crawl Optimisation - #Pubcon 2015

What is Crawl Optimisation?

Ensuring search engine spiders waste as little time as possible crawling the wrong URLs on your site.

Page 4: Crawl Optimisation - #Pubcon 2015

Why is Crawl Optimisation important?

If you waste crawl budget, the right pages are less likely to be crawled & indexed.

Page 5: Crawl Optimisation - #Pubcon 2015

Crawl Sources

• Site crawl

• XML Sitemaps

• Inbound links

• DNS records

• Domain registrations

• Browsing data

Page 6: Crawl Optimisation - #Pubcon 2015

Identifying Crawl Waste

Page 7: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Bogus URLs in XML Sitemap

Page 8: Crawl Optimisation - #Pubcon 2015

Optimise XML Sitemaps

• Ensure your sitemap contains final URLs only

• Minimise 301 redirects and other non-200 status codes

• Use multiple sitemaps to identify crawl waste in GSC
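
The first two points above can be checked mechanically. What follows is a minimal sketch, not a tool from the talk: it reads the `<loc>` entries of a sitemap and lists every URL that does not answer 200. The function names, the example sitemap and the `FAKE_STATUSES` table are all hypothetical; in practice `get_status` would be replaced by a real HTTP request.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values of a standard XML sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def non_final_urls(urls, get_status):
    """Return (url, status) pairs for every URL that does not answer 200.

    `get_status` is any callable mapping a URL to an HTTP status code.
    """
    problems = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            problems.append((url, status))
    return problems


# Hypothetical sitemap and status codes, standing in for a live site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
  <url><loc>https://example.com/missing</loc></url>
</urlset>"""

FAKE_STATUSES = {
    "https://example.com/": 200,
    "https://example.com/old-page": 301,
    "https://example.com/missing": 404,
}

if __name__ == "__main__":
    for url, status in non_final_urls(sitemap_urls(EXAMPLE_SITEMAP), FAKE_STATUSES.get):
        print(status, url)
```

Every URL this reports is a candidate for removal from the sitemap or replacement with its final destination.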

Page 9: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Paginated Listings

• Especially when combined with faceted navigation

Page 10: Crawl Optimisation - #Pubcon 2015

Optimise Paginated Listings

• List more items on a single page

• Implement rel=prev/next

• Block sorting parameters in robots.txt

– Disallow: /*?order=*

• Add rel=nofollow to internal links that point to sorting or facet URLs
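
As an illustration of the rel=prev/next point, here is a small sketch that builds the `<link>` elements for one page of a paginated listing. The URL scheme is an assumption made for the example (page 1 at the bare listing URL, later pages at `?page=N`), and `pagination_links` is a hypothetical helper name.

```python
def pagination_links(base_url, page, last_page):
    """Return the rel=prev/next <link> elements for `page` of `last_page`."""

    def page_url(n):
        # Assumption: page 1 is the bare listing URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        links.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return links


if __name__ == "__main__":
    for tag in pagination_links("https://example.com/shoes", 2, 3):
        print(tag)
```

The first page gets only a `next` link and the last page only a `prev` link, so the sequence has a clear start and end.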

Page 11: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Internal Site Search Results

Page 12: Crawl Optimisation - #Pubcon 2015

Block Internal Site Search Pages

• Block in robots.txt

User-agent: *
Disallow: /SearchResults.aspx
Disallow: /*query=*
Disallow: /*s=*
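
Wildcard rules are worth testing before they go live, because a short pattern can match more than intended. The sketch below is a simplified matcher for `*` and `$` in Disallow paths. It is an approximation written for this example: it ignores Allow rules and rule precedence, and the function names and test paths are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal parts and let each * match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# The three rules from the slide.
SEARCH_RULES = ["/SearchResults.aspx", "/*query=*", "/*s=*"]

if __name__ == "__main__":
    for path in ["/search?query=shoes", "/?s=shoes", "/products?colours=red", "/about"]:
        print(path, is_blocked(path, SEARCH_RULES))
```

Under this matcher, `/*s=*` also blocks `/products?colours=red`, because `colours=` ends in `s=`. If that is not intended, a narrower pair such as `/*?s=` and `/*&s=` (a suggestion for this example, not part of the slides) limits the match to a parameter actually named `s`.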

Page 13: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Internal redirects

Page 14: Crawl Optimisation - #Pubcon 2015

Minimise Internal Redirects

• Find redirects with Screaming Frog

• Internal links should all return 200 OK

• Flat site structure
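
Once a crawler export lists each URL with its status code and redirect target, finding the links to update is a short exercise. This sketch assumes that data is available as a plain dictionary; the `responses` table, the paths and the function names are hypothetical, and no real crawler export format is implied.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(url, responses, max_hops=10):
    """Follow redirects and return (final_url, final_status, hops).

    `responses` maps a URL to (status, redirect_target_or_None).
    A final_status of None means the chain was too long or looped.
    """
    hops = 0
    while hops < max_hops:
        status, location = responses.get(url, (None, None))
        if status not in REDIRECT_CODES or not location:
            return url, status, hops
        url = location
        hops += 1
    return url, None, hops


def links_to_fix(internal_links, responses):
    """Map each redirecting internal link to the 200 URL it should point at."""
    fixes = {}
    for link in internal_links:
        final_url, status, hops = resolve(link, responses)
        if hops > 0 and status == 200:
            fixes[link] = final_url
    return fixes


# Hypothetical crawl data: /old redirects twice before reaching /new.
RESPONSES = {
    "/home": (200, None),
    "/old": (301, "/older"),
    "/older": (301, "/new"),
    "/new": (200, None),
}

if __name__ == "__main__":
    print(links_to_fix(["/home", "/old"], RESPONSES))
```

Each entry in the result is an internal link that can be rewritten to its final destination, saving the crawler one or more extra requests.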

Page 15: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Canonicalised Pages

Page 16: Crawl Optimisation - #Pubcon 2015

Use Canonicals Wisely

• “rel=canonical” is primarily for indexing issues

– It is not a fix for crawl waste

– Search engines need to see the canonical tag before they can act on it

– Ergo, pages need to be crawled before rel=canonical has any effect

– Ditto with meta noindex tags
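
The reason both hints cost crawl budget is that they live inside the page's HTML. The sketch below makes that concrete: the canonical URL and the noindex flag only become known after the document has been downloaded and parsed. `IndexHintParser` and `index_hints` are hypothetical names, and the parser covers only the simple cases shown.

```python
from html.parser import HTMLParser


class IndexHintParser(HTMLParser):
    """Collect the rel=canonical URL and the meta robots noindex flag."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.noindex = "noindex" in (attrs.get("content") or "").lower()


def index_hints(html):
    """Return (canonical_url_or_None, noindex_flag) for an HTML document."""
    parser = IndexHintParser()
    parser.feed(html)
    return parser.canonical, parser.noindex


# Hypothetical page: a sorted listing that canonicalises to the main listing.
EXAMPLE_PAGE = """<html><head>
<link rel="canonical" href="https://example.com/shoes">
<meta name="robots" content="noindex, follow">
</head><body></body></html>"""

if __name__ == "__main__":
    print(index_hints(EXAMPLE_PAGE))
```

A URL blocked in robots.txt is never fetched, so any canonical or noindex tag on it goes unseen. The two mechanisms solve different problems and do not substitute for each other.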

Page 17: Crawl Optimisation - #Pubcon 2015

Crawl Waste

• Slow loading pages

Page 18: Crawl Optimisation - #Pubcon 2015

Optimise Load Speed

• Time to First Byte

• Lightweight pages

• Caching

• Compression
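
To get a feel for what the compression point can save, this sketch compares the raw and gzip-compressed size of an HTML body. It only estimates the effect offline with Python's standard `gzip` module; the sample markup and the `compression_report` helper are hypothetical, and real savings depend on the page and the server configuration.

```python
import gzip


def compression_report(body, level=6):
    """Return the original size, gzip size and size ratio for `body` (bytes)."""
    compressed = gzip.compress(body, compresslevel=level)
    ratio = len(compressed) / len(body) if body else 1.0
    return {
        "original_bytes": len(body),
        "gzip_bytes": len(compressed),
        "ratio": ratio,
    }


if __name__ == "__main__":
    # Hypothetical listing page: repetitive markup compresses very well.
    sample = b"<li class='product'>Product name</li>\n" * 500
    print(compression_report(sample))
```

Repetitive markup such as product listings tends to shrink substantially, which means fewer bytes for the crawler to download per page.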

Page 19: Crawl Optimisation - #Pubcon 2015

Crawl Optimisation Summarised

• Don’t let search engines do the hard work

• Tools at your disposal:

– DeepCrawl

– Google Search Console

– Screaming Frog SEO Crawler

– WebPageTest.org

• Solutions:

– XML Sitemaps

– robots.txt

– rel=nofollow

– rel=prev/next

– Load speed