Advanced Technical SEO - Index Bloat & Discovery: from Facets to Javascript Frameworks - SMX Munich...


Hi! Good Afternoon.

Ari Nahmani, CEO

Team / Clients

index bloat

crawl budget

web-tech > googlebot

discoverability

Today’s Session
• Technical SEO issues around e-commerce / large site architecture
• Preventing index bloat & preserving crawl budget as a core methodology
• Current solutions & upcoming threats (JS, AJAX, new frameworks, pre-rendering)

Index Bloat Prevention

A bloated index = indexed URLs > “unique pages”

On an ecommerce site: a bloated index = indexed URLs > sum(CAT + PDP + Static), i.e. category pages, product detail pages, and static pages.

On a ‘content’ site: a bloated index = indexed URLs > sum(Articles + Static)
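The definition above can be sketched as a quick check. This is a minimal illustration, not a tool from the talk: the function name `index_bloat` and all of the counts are made-up assumptions.

```python
def index_bloat(indexed_urls, unique_page_counts):
    """Compare the indexed URL count against the sum of unique page types.

    indexed_urls: number of URLs the search engine reports as indexed.
    unique_page_counts: dict such as {"CAT": ..., "PDP": ..., "Static": ...}.
    Returns (is_bloated, ratio).
    """
    unique_pages = sum(unique_page_counts.values())
    if unique_pages <= 0:
        raise ValueError("need at least one unique page")
    return indexed_urls > unique_pages, indexed_urls / unique_pages


# Made-up numbers for a hypothetical ecommerce site.
bloated, ratio = index_bloat(120_000, {"CAT": 400, "PDP": 25_000, "Static": 60})
print(bloated, round(ratio, 2))
```

A ratio well above 1 suggests that parameter, sort, or duplicate URLs are being indexed on top of the real pages.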

cannibalization

Index Bloat Prevention: Cannibalization

Index Bloat Prevention: Sorts & Facets

Index Bloat Prevention: Sorts & Filters

http://www.site.com/guys/tees/?prefn1=bvAverageRating&prefn2=colorGroup&prefv3=LG&srule=sortingNewArrival&prefv1=4&prefv2=RED&prefn3=size

<link rel="canonical" href="http://www.site.com/guys/tees/" />

• Basic Solution: Strip out the unnecessary parameters
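A minimal sketch of the basic solution, assuming every query parameter on a category URL is a sort or facet that can be dropped. The helper names `strip_all_params` and `canonical_tag` are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def strip_all_params(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url):
    """Build the canonical link element for the stripped URL."""
    return '<link rel="canonical" href="%s" />' % escape(strip_all_params(url), quote=True)


facet_url = (
    "http://www.site.com/guys/tees/?prefn1=bvAverageRating&prefn2=colorGroup"
    "&prefv3=LG&srule=sortingNewArrival&prefv1=4&prefv2=RED&prefn3=size"
)
print(canonical_tag(facet_url))
```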

Solution: Filtering Out All Facet Params
• PROS:
– Avoids diluted / dupe URLs (request, not directive)
• CONS:
– If you want/need specific parameters indexed and exposed (size, color), you need properly coded canonical tag logic; otherwise it is a recipe for major leaks and confusion.
– Considerations w/ pagination & view-all page
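When some facets should stay indexable, the canonical logic needs an allowlist and a fixed parameter order. The sketch below assumes simple `?color=...&size=...` style parameters; the allowlist `INDEXABLE_PARAMS` and the function name `canonical_url` are illustrative assumptions, and paired `prefn`/`prefv` parameters would need extra handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed allowlist: only these facets may appear in a canonical URL,
# always in this order, so one page never gets two canonical spellings.
INDEXABLE_PARAMS = ("color", "size")


def canonical_url(url, keep=INDEXABLE_PARAMS):
    """Keep allowlisted parameters in a stable order and drop everything else."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    pairs.sort(key=lambda kv: keep.index(kv[0]))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))


print(canonical_url("http://www.site.com/guys/tees/?size=large&srule=sortingNewArrival&color=green"))
```

Sorting the kept parameters matters as much as filtering: without it, `?size=large&color=green` and `?color=green&size=large` would each declare themselves canonical.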

Crawl Budget: Facet Parameter URLs

JS / AJAX Indexation

Index Bloat VS Discovery: JS + AJAX

Index Bloat Prevention: JS + AJAX

AJAX Refinement V1 = NO URL CHANGE

AJAX Refinement V1: NO URL CHANGE, but an inactive, different href= URL exists

AJAX Facet Refinements V1 (NO URL CHANGE)
• PROS:
– Theoretically no parameters exposed to bloat the index
• CONS:
– Users can’t share refined / filtered content with friends, no accurate bookmarking. (Terrible UX)
– Googlebot will still crawl hidden href= or other JS framework links like Angular: ng-href= (check canonical logic!!)
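One way to spot such hidden links is to list every `href` and `ng-href` value in the delivered HTML, whether or not the link looks active to users. A rough sketch using only the standard library; the class name `LinkAudit`, the helper `crawlable_links`, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (tag, attribute, value) for link-like attributes on any element."""

    LINK_ATTRS = ("href", "ng-href")

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.LINK_ATTRS and value:
                self.found.append((tag, name, value))


def crawlable_links(html):
    parser = LinkAudit()
    parser.feed(html)
    return parser.found


sample = (
    '<a href="/guys/tees/?color=red" class="facet">Red</a>'
    '<span ng-href="/guys/tees/?size=LG">LG</span>'
    '<button onclick="refine()">Blue</button>'
)
for tag, attr, value in crawlable_links(sample):
    print(tag, attr, value)
```

Each URL found this way is worth checking against the canonical logic, since a crawler may request it even though users never see the address change.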

Index Bloat Prevention: JS + AJAX

AJAX Refinement V2 = HTML5 history.pushState()

http://www.site.com/guys/tees/?color=green&size=large

Consistent URL Signals - Navigation

Ideal consistency: Navigation URLs = pushState() URLs = Canonical URLs = XML Sitemap URLs

Consistent URL Signals - Navigation

Ideal consistency: Navigation URLs = pushState() URLs ≠ Canonical URLs = XML Sitemap URLs

Index Bloat Prevention: JS + AJAX

Google preferred the pushState URL version; we had to reinforce it (via normal inline href='', canonical, XML sitemap)
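A consistency check across the four signals can be sketched as a set comparison. The function name `url_signal_mismatches`, the signal names, and the sample URLs are assumptions; in practice the lists would come from a crawl, the rendered page, and the sitemap file.

```python
def url_signal_mismatches(signals):
    """Map each signal name to the URLs it is missing relative to all signals.

    signals: dict of signal name -> iterable of URLs.
    An empty result means every signal lists exactly the same URLs.
    """
    sets = {name: set(urls) for name, urls in signals.items()}
    everything = set().union(*sets.values())
    return {
        name: sorted(everything - urls)
        for name, urls in sets.items()
        if everything - urls
    }


signals = {
    "navigation": ["/guys/tees/?color=green&size=large"],
    "pushstate": ["/guys/tees/?color=green&size=large"],
    "canonical": ["/guys/tees/"],
    "sitemap": ["/guys/tees/"],
}
for name, missing in url_signal_mismatches(signals).items():
    print(name, "is missing", missing)
```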

AJAX Facet Refinements V2 (PushState URL Change)
• PROS:
– Users can now share / bookmark the correct content
– Added to browser history
• CONS:
– Still need to have a consistent canonical structure due to Googlebot crawling pushState()
– Different hidden URL structure via AJAX facets may require further unpredictable canonicalization logic / further dev work

Indexing AJAX & JS Frameworks

What method exists that we know still works?

Indexing AJAX & JS Frameworks

HTML SNAPSHOT

<head><meta name="fragment" content="!">

Google / Bing crawl with: _escaped_fragment_=
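For testing, it helps to compute the URL a crawler would request under this scheme. The sketch below follows the AJAX crawling scheme mapping as I understand it: a page that opts in via the meta tag gets `_escaped_fragment_=` appended, and a `#!state` fragment is moved into that parameter with `%`, `#`, `&`, `+`, and spaces escaped. The function name `escaped_fragment_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url):
    """Return the _escaped_fragment_ URL a crawler would fetch for this page."""
    parts = urlsplit(url)
    state = ""
    if parts.fragment.startswith("!"):
        # Leave "=" and a few other characters readable; "&", "#", "%", "+"
        # and spaces are not in the safe list, so quote() escapes them.
        state = quote(parts.fragment[1:], safe="=/:,;!$'()*@?")
    token = "_escaped_fragment_=" + state
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://www.site.com/guys/tees/"))
print(escaped_fragment_url("http://www.site.com/guys/tees/#!color=green&size=large"))
```

Fetching the resulting URL directly shows exactly what HTML the snapshot setup returns to bots.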

Indexing AJAX & JS: HTML Snapshot

Indexing AJAX & JS: How To Decide?
• HTML SNAPSHOT (_escaped_fragment_=)
• ‘Dumbed down’ HTML Template
• Pre-Rendered (to bots)
• 3rd Party Service (prerender.io)
• Server side (PhantomJS / headless browser)
• Trust Googlebot
• Progressive Enhancement
• Pre or Realtime Rendered (to users & bots)
• VALIDATE!

Indexing AJAX & JS: HTML Snapshot
• Upon crawl of a URL with _escaped_fragment_=, serve a ‘dumbed down’ HTML version of the page.
• Not pre-rendered, rather simplified.
• For example, on ecommerce → a view-all category listing with no dynamic facets. Amazing results from our clients.
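The routing decision described above could look roughly like this as a plain WSGI app. It is only a sketch: `SNAPSHOT_HTML` and `APP_SHELL_HTML` are placeholder pages, and a real site would render the simplified view-all template instead of returning a constant.

```python
from urllib.parse import parse_qs

# Placeholder pages, assumed for illustration only.
SNAPSHOT_HTML = b"<html><body><ul><li><a href='/p/1'>Tee 1</a></li></ul></body></html>"
APP_SHELL_HTML = b"<html><body><div id='app'></div><script src='/app.js'></script></body></html>"


def app(environ, start_response):
    """Serve the simplified HTML when the crawler asks with _escaped_fragment_=."""
    # keep_blank_values is needed because the parameter usually has no value.
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    body = SNAPSHOT_HTML if "_escaped_fragment_" in query else APP_SHELL_HTML
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

It can be tried locally with `wsgiref.simple_server.make_server("", 8000, app)` from the standard library.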

Indexing AJAX & JS: Pre-rendering
Upon crawl of a URL with _escaped_fragment_=:
1. prerender.io – middleware via reverse proxy that serves a pre-rendered, cached HTML page to bots
OR
2. Server side – the server pre-renders the JS into cached HTML pages to serve to bots, or does it in real time (headless browser).
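Both options share one shape: detect the bot request, serve cached HTML if it is fresh, otherwise render and cache. The sketch below shows only that shape. `PrerenderCache`, `respond`, and the injected `render` callable are hypothetical names; `render` stands in for whatever drives the headless browser or calls the third-party service, and this is not prerender.io's actual API.

```python
import time


class PrerenderCache:
    """Cache rendered HTML per URL and re-render once an entry is older than the TTL."""

    def __init__(self, render, ttl_seconds=3600, clock=time.time):
        self._render = render      # callable: url -> rendered HTML string
        self._ttl = ttl_seconds
        self._clock = clock
        self._pages = {}           # url -> (rendered_at, html)

    def get(self, url):
        now = self._clock()
        hit = self._pages.get(url)
        if hit and now - hit[0] < self._ttl:
            return hit[1]
        html = self._render(url)
        self._pages[url] = (now, html)
        return html


def respond(url, query_string, cache, app_shell):
    """Bots using _escaped_fragment_= get rendered HTML; everyone else gets the JS app."""
    if "_escaped_fragment_=" in query_string:
        return cache.get(url)
    return app_shell
```

Pre-rendering ahead of time amounts to calling `cache.get(url)` for every known URL before bots arrive; real-time rendering is the same call made on the first bot request.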

Indexing AJAX & JS: Prerender.io

Indexing AJAX & JS: BromBone

Indexing AJAX & JS: Server Prerender

Indexing AJAX & JS: Server Side

bit.ly/javascriptseo

Indexing AJAX & JS: Trust Googlebot

read these first…

Testing JS Indexation: Jscrawlability.com

Validation & Testing: Discovery vs Bloat

Testing: Fetch & Render JS / AJAX

Testing: Slice and Dice the Index

Advanced Site Operators

site:yoursite.com -inurl:cat.jsp -inurl:prod.jsp -inurl:store.jsp

site:yoursite.com inurl:size inurl:cat.jsp -inurl:cid

site:yoursite.com inurl:pdp intext:"write a review"
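To run many of these slices, the query strings can be generated rather than typed. A small sketch; the helper name `site_query` and its parameters are made up, and it only builds the query text.

```python
def site_query(domain, include=(), exclude=(), intext=None):
    """Build a site: query with inurl: / -inurl: filters and an optional intext: phrase."""
    parts = ["site:" + domain]
    parts += ["inurl:" + pattern for pattern in include]
    parts += ["-inurl:" + pattern for pattern in exclude]
    if intext:
        parts.append('intext:"%s"' % intext)
    return " ".join(parts)


# Everything indexed that is NOT a known category, product or store template.
print(site_query("yoursite.com", exclude=["cat.jsp", "prod.jsp", "store.jsp"]))
# Category URLs that expose a size facet.
print(site_query("yoursite.com", include=["size", "cat.jsp"], exclude=["cid"]))
```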

Testing: Automate Bloat + Discovery Check

Testing: Search Analytics for Bloat / Discovery

Testing: Go To The Source: Server Logs!
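A rough sketch of a log check, assuming access logs in the common "combined" format: it counts Googlebot requests to parameter URLs versus clean URLs. The regex, the function name `googlebot_hits`, and the sample log lines are assumptions for illustration, and the user-agent match does not verify that a request really came from Google.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests, split into parameter URLs and clean URLs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        kind = "parameter" if "?" in match.group("path") else "clean"
        counts[kind] += 1
    return counts


# Made-up sample lines, not real log data.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    '66.249.66.1 - - [01/Jan/2016:10:00:00 +0000] "GET /guys/tees/?color=green HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT,
    '66.249.66.1 - - [01/Jan/2016:10:00:05 +0000] "GET /guys/tees/ HTTP/1.1" 200 5300 "-" "%s"' % GOOGLEBOT,
    '203.0.113.9 - - [01/Jan/2016:10:00:09 +0000] "GET /guys/tees/?size=LG HTTP/1.1" 200 5100 "-" "Mozilla/5.0"',
]
print(googlebot_hits(sample_log))
```

A high share of "parameter" hits points to crawl budget being spent on facet and sort URLs.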

Summing It Up
• Index Bloat, Crawl Budget, & Testing: Large sites are prone to serious index bloat and wasted crawl budget. This needs diligent testing and an OCD-like attention to detail with the basics. Test often & automate!

• JS/AJAX: pushState(), JS frameworks, and AJAX present both discovery and bloat challenges. Know the options: short-term fixes like HTML snapshot (G+B), and long-term redesigns with modern frameworks w/ built-in server-side rendering.

Dankeschön! Questions?

Ari Nahmani, CEO / @AriNahmani

