What is duplicate content?
Duplicate content is not as complex a subject as some would have you believe. I define it simply as two (or more) pieces of the same content that appear on different pages on the Internet. Internal duplicate content, meaning duplicates within a single website, can arise for many reasons, such as:
- Products in multiple categories on an eCommerce website
- Printer friendly or other format web pages
- Technical website issues such as URL parameters or session IDs
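To make the last cause concrete, here is a minimal Python sketch of how session IDs and other parameters turn one page into several URLs, and how such URLs can be grouped. The parameter names in IGNORED_PARAMS and the example URLs are assumptions for illustration only; the parameters that matter on a real site will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that change the URL but not the content.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Drop ignored query parameters and sort the rest so equivalent URLs match."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Return only the groups where several URLs normalize to the same address."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Running group_duplicates over a list of crawled URLs gives a rough picture of which addresses are likely to be serving the same content. It only compares URLs, not page content, so treat the result as a starting point.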
Why does this matter?
Duplicate content can confuse search engines in several ways:
- They don't know which version of the content to show in the search results
- They need to decide if the duplicate content is accidental or malicious (trying to manipulate the search results)
- They don't know whether to give search metrics (such as domain and page authority) to just one page or to split them between the multiple versions
This has two implications for webmasters:
1) Search engines rarely show multiple versions of the same content in search results, so they are forced to decide which version is 'best'. As a result, the version you consider best may not be shown, and the other versions of your content are essentially wasted (their 'power' is not utilised).
2) If your duplicate content problem is large enough, search engines could deem it worthy of a penalty or, as Google put it in their webmaster post on the subject:
"...make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results."
How do we solve these problems?
Luckily, there are several ways to solve these problems, and two are particularly useful:
- 301 redirects - a 301 redirect permanently sends all visitors from one page to another, in this instance from the duplicate pages to the original. Dealing with duplicate content this way leaves search engines only one page to rank: the one you would like people to see. You also get the added benefit of having the combined 'power' of all the other pages pushed into one, giving you a better chance to rank well for the content.
- Rel="canonical" tag - this tag tells search engines that another page is the original. It gives similar benefits to a 301 redirect but, unlike a 301 redirect, users can still access the page in question. This is ideal for content you want users to access, such as print-friendly versions of a page. It is easily implemented by putting the tag below (with the original URL inserted) in the HTML head of the duplicate page:
<link href="http://www.yourdomain.com/original-page/" rel="canonical" />
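To show how the two approaches differ in practice, here is a rough Python sketch of the decision a website could make for each requested path. The paths, the REDIRECTS and CANONICALS mappings and the respond function are all hypothetical; most sites would configure this in their web server or CMS instead of in application code.

```python
from html import escape

SITE = "http://www.yourdomain.com"

# Hypothetical duplicates that users never need to see: send them on with a 301.
REDIRECTS = {
    "/sale/red-shoes/": "/shoes/red-shoes/",
    "/new-in/red-shoes/": "/shoes/red-shoes/",
}

# Hypothetical alternate versions users should still reach: mark with rel="canonical".
CANONICALS = {
    "/shoes/red-shoes/print/": "/shoes/red-shoes/",
}


def canonical_tag(original_path: str) -> str:
    """Build the canonical link element pointing at the original page."""
    href = escape(SITE + original_path, quote=True)
    return f'<link href="{href}" rel="canonical" />'


def respond(path: str):
    """Return (status, headers, head_html) for a requested path."""
    if path in REDIRECTS:
        # Permanent redirect: visitors and search engines land on the original.
        return 301, {"Location": SITE + REDIRECTS[path]}, ""
    if path in CANONICALS:
        # Page stays accessible, but its head names the original.
        return 200, {}, canonical_tag(CANONICALS[path])
    return 200, {}, ""
```

Here respond("/sale/red-shoes/") returns a 301 with a Location header for the original page, while respond("/shoes/red-shoes/print/") returns a normal 200 response whose head contains the canonical tag.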
Although these are the two most common ways to deal with duplicate content, the 'noindex, follow' tag, the preferred domain tool in Google Webmaster Tools and the parameter handling tool in Google Webmaster Tools are also viable options.
I hope this helps with your issues. If you have any questions, let us know!