Transcript of getting_rid_of_duplicate_content_iss-ben_dangelo.ppt
Slide 1
Getting Rid of Duplicate Content Issues Once and For All
Ben D’Angelo, Software Engineer
PubCon, Las Vegas, November 13, 2008
Slide 2
What are “duplicate content issues”?
Multiple disjoint situations!
• Duplicate content within your site or sites
Multiple URLs pointing to the same page, similar pages
Different countries (same language)
• Duplicate content across other sites
Syndicated content
Scraped content
Slide 3
Guiding principle
One URL for one piece of content
Why?
• Users don’t like duplicates in results
• Saves resources in our index—more room for other pages from your site!
• Saves resources on your server
Slide 4
Sources of duplicates within your sites
• Multiple URLs pointing to the same page
www vs non-www
Session ids, URL parameters
Printable versions of pages
CNAMEs
• Similar content on different pages
Manufacturer’s databases
Different countries
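Session IDs and tracking parameters are a common way one page ends up with many URLs. As a minimal sketch of how a site might normalize its own URLs before linking or sitemapping (the parameter names in IGNORED_PARAMS are illustrative assumptions, not any official list):

```python
# Sketch of URL canonicalization for the session-id / tracking-parameter
# case. The parameter names in IGNORED_PARAMS are illustrative assumptions.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "ref"}

def canonicalize(url):
    """Lowercase scheme/host, strip tracking params, drop fragments."""
    parts = urlsplit(url)
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query)
         if k.lower() not in IGNORED_PARAMS]
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )

# Both variants collapse to http://example.com/page?id=42
print(canonicalize("HTTP://Example.com/page?sessionid=abc&id=42"))
print(canonicalize("http://example.com/page?id=42#section"))
```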
Slide 5
How does Google handle this?
• Many systems for de-duping URLs at various stages in our crawl/index pipeline
General idea: cluster pages, choose the “best” representative
• Different filters are used for different types of duplicate content
• Goal: serve one version of the content in search results
• Generally just a filter: it will not destroy your site
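The “cluster pages, choose the best representative” idea can be illustrated with a toy sketch. This is not Google’s actual pipeline; the exact-text fingerprint and the “shortest URL wins” rule are simplifying assumptions made for the example:

```python
# Toy illustration of "cluster pages, choose the best representative".
# NOT Google's actual pipeline -- the fingerprint function and the choice
# of representative (shortest URL) are simplifying assumptions.
import hashlib
from collections import defaultdict

def fingerprint(text):
    # Normalize whitespace and case so trivially different copies collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def dedupe(pages):
    """pages: dict mapping URL -> page text. Returns one URL per cluster."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    return sorted(min(urls, key=len) for urls in clusters.values())

pages = {
    "http://example.com/a": "Hello   World",
    "http://example.com/a?print=1": "hello world",
    "http://example.com/b": "Something else",
}
print(dedupe(pages))  # ['http://example.com/a', 'http://example.com/b']
```

Real de-duplication has to handle near duplicates, not just exact ones, which is why the slides say different filters are used for different duplicate types.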
Slide 6
What can you do about your site?
• For exact dupes: 301 redirects
Tracking URLs
www vs non-www (also Google Webmaster Tools)
• Near duplicates: noindex / robots.txt
Printable pages
Clones of other sites
• Domains by country
Different languages are not duplicate content
Use unique content specific to the country
Use different TLDs (also Google Webmaster Tools) for geo-targeting
• URL parameters
Put data which does not affect the substance of a page in a cookie
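For the noindex / robots.txt option on printable pages, a hedged sketch — it assumes printable versions live under a /print/ path, which is an assumption about your URL scheme, not a rule:

```
# robots.txt sketch -- keep printable duplicates out of the crawl.
# Assumes printable versions live under /print/ (adjust to your site).
User-agent: *
Disallow: /print/
```

Alternatively, a robots meta tag with content="noindex" on each printable page lets the page be crawled but keeps it out of the index.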
Slide 7
What can you do about your site?
Choose www or non-www as preferred
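Alongside the Webmaster Tools preferred-domain setting, the www / non-www choice can be enforced server-side with a 301. A sketch for Apache with mod_rewrite, where example.com is a placeholder for your own domain:

```
# .htaccess sketch: 301-redirect non-www to www (requires mod_rewrite;
# example.com is a placeholder for your own domain)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The same idea works in reverse if you prefer the non-www form; the point is to pick one and redirect the other.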
Slide 8
What can you do about your site?
Slide 9
What can you do about another site?
• Include original absolute URL in syndicated content
• Syndicate different content
• If you use syndicated content, manage your expectations
• Don’t worry about scrapers or proxies too much; they generally don’t affect your rankings
If you are concerned, file a:
• DMCA request (http://www.google.com/dmca.html)
• Spam report (https://www.google.com/webmasters/tools/spamreport)
Slide 10
Best practices for Google
• Avoid duplicate URLs / sites
• Generate unique, compelling content for users
• Don’t be overly concerned with duplicate content
• Let us know about any issues at the Webmaster Help Forum
Slide 11
Useful links
Webmaster Central http://google.com/webmasters/
• Webmaster Central Blog
http://googlewebmastercentral.blogspot.com/
• Webmaster Help Center
http://www.google.com/support/webmasters/
• Webmaster Discussion Group
http://groups.google.com/group/Google_Webmaster_Help
Slide 12
Thank You!