Duplicate content – what does it actually mean and why can it be so harmful?
When taking on new clients, we nearly always complete a full SEO audit. This helps us find the issues a website might have and helps our team develop a strategy. One of the most frequent issues we see is duplicate website content – a factor which can cause serious harm to organic search rankings and visibility.
To explain the harm duplicate content can cause, we’ll use the analogy of a satnav. You’re starting a journey to a destination you’ve never visited before, so you enter a postcode into the satnav. The satnav gives you two different destinations for the same postcode, but which is the right one?
This is the issue Google and other search engines have with duplicate content – they can’t identify which are the right results, as they’re seeing the same content in multiple places.
I’ve been careful to avoid duplicate content, so why are crawls and audits still identifying duplicate content issues on my site?
It’s important to know that duplicate content is not always a piece of copied text. Sometimes technical errors can cause Google to read pages or content types as duplicate content. These types of technical problems can be fixed with dedicated development and SEO activity.
Here are some examples of technical duplicate content:
- Many websites offer printer-friendly copies of pages, which contain the same content as the originals. This is completely fine to do, but you must remember to keep the printer-friendly version out of the index (for example with a noindex tag) so search engines understand which version to show in rankings.
- URL parameters (such as click tracking and some analytics codes) can make the same page available at several URLs, which Google can read as duplicate content.
- Session IDs can be registered as duplicate content. This happens when a website assigns each visitor a different session ID and stores it in the URL, so every visit produces a new URL for the same page.
- Discussion forums can generate both regular and stripped-down pages which are targeted at mobile devices.
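To make the URL parameter and session ID examples above more concrete, here is a minimal Python sketch of how an audit script might spot URLs that differ only by such parameters. The parameter names in `IGNORED_PARAMS` and the `example.com` URLs are assumptions for illustration; your own site may use different names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of parameters that do not change what the page shows.
# Adjust this set to match the tracking and session parameters on your site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sid", "sessionid"}


def normalise_url(url):
    """Return the URL with tracking/session parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Group crawled URLs that normalise to the same address."""
    groups = {}
    for url in urls:
        groups.setdefault(normalise_url(url), []).append(url)
    return groups


crawled = [
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/shoes?sid=abc123",
    "https://example.com/shoes?colour=red",
]
for address, variants in group_duplicates(crawled).items():
    print(address, len(variants))
```

Any group with more than one variant is a candidate for one of the fixes listed below. Parameters that do change the content, such as `colour` in this example, are deliberately kept.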
To combat technical duplicate content errors, the following steps would usually be taken:
- Canonicalization: SEO best practice tells us that whenever the same content is found at more than one URL, it should be ‘canonicalized’ for search engines. This can be done by placing the “rel=canonical” tag on the “duplicate” page and pointing it at the “original” page; this tag passes roughly the same amount of ranking power as a 301 redirect would.
- 301 Redirect: This method can have a positive impact on rankings. By setting up a 301 redirect from the “duplicate” page to the “original” one, search engines see that multiple pages with the potential to rank are redirected to a single page that also has the potential to rank, which creates stronger relevancy.
- Noindex, follow: Simply apply the meta robots tag with the values “noindex, follow” to the duplicate pages you don’t want included in a search engine’s index. This tells search engine bots that they may follow the links on those pages, but must not include the pages themselves in the index.
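As a rough illustration of the canonical and noindex options above, the Python sketch below reads a page’s HTML and reports which of the two signals it carries. The sample markup, the `example.com` address and the helper names are assumptions made for the example; it uses only the standard library and is not a full audit tool.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the rel=canonical link and meta robots directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        name = (attrs.get("name") or "").lower()
        if tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")
        elif tag == "meta" and name == "robots":
            content = attrs.get("content") or ""
            self.robots = [item.strip().lower() for item in content.split(",")]


def duplicate_signals(html):
    """Return the canonical URL (if any) and whether the page is noindexed."""
    parser = HeadSignals()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": "noindex" in parser.robots}


# Hypothetical printer-friendly page that points back to the original.
PRINT_PAGE = """
<html><head>
<link rel="canonical" href="https://example.com/guide">
<meta name="robots" content="noindex, follow">
</head><body>Printer-friendly copy</body></html>
"""

print(duplicate_signals(PRINT_PAGE))
```

A duplicate page that returns neither a canonical URL nor a noindex directive, and is not redirected, is one that search engines are left to interpret on their own.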
I think someone has stolen or deliberately duplicated my content. What should I do?
Although in most cases duplicate content is a technical problem confined to one website, there are times when content is deliberately duplicated to try to trick the search engine rankings and get more traffic.
When you discover that somebody has copied your content, it is important to have it removed as soon as possible. However, this isn’t always an easy fix.
What’s a ‘scraper site’ and should I be worried about this?
There are many myths about the implications of duplicate content; a common one is that “scrapers” will hurt your site by targeting and stealing content from one or more of its pages. Nothing written in this blog applies to scraper sites! So what is a scraper site?
A scraper site is a website that copies content from other websites using web scraping. The purpose of these websites is to earn revenue through methods such as advertising and selling data; they are NOT a threat to your ranking power. A scraper site will typically have:
- Quality posts with no links
- No RSS feed available
- Quality posts with a very low number of subscribers
- No comments on any of the posts
- No “About Us” page or information about the business
- No contact form or email address
Our SEO team recently found a number of websites that had stolen content from one of our clients’ websites. After spending time investigating what turned out to be a number of different scraper sites, we eventually walked away from the task with a clear mind, knowing that the website was in no danger. If you have ever seen the analytics for a big blog, you will know that many websites get scraped ten times before you’ve had your Weetabix. Do you think they panic? No, they take no notice and finish their breakfast. There can be, however, real duplicate threats on the net.
When a website (that isn’t a scraper) has deliberately taken your content to try and use as its own, there can be issues for your website:
- Website owners may suffer drops in both rankings and traffic
- Search Engines will provide less relevant results
To remove offending websites, follow the below steps:
- Find the email address of the offender. If this isn’t possible, do a “WHOIS” search; this will give you the contact information of the owner of the website (also make a note of who is hosting the website).
- Use the “Wayback Machine” to find past views of your website; this will help prove that your content is the original content.
- Provide a link to the Google Cache which shows that the Google Spiders discovered your content earlier than the offending website.
- Take screenshots of the duplicated content on the offending website (if it is a whole website that has been duplicated, also save the source code to compare it with yours).
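When comparing saved source code as suggested in the last step, a similarity score can help show how much of the text matches. The following is a minimal Python sketch using the standard library’s `difflib`; the tag stripping is deliberately crude, the sample pages are invented, and the score is only an indication, not proof of copying.

```python
import re
from difflib import SequenceMatcher


def visible_words(html):
    """Very rough text extraction: drop tags, lowercase, split into words.

    This is an assumption-laden shortcut; it does not remove script or
    style contents, so clean the saved source first if those are large.
    """
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(original_html, suspect_html):
    """Return a score between 0.0 (nothing shared) and 1.0 (same wording)."""
    matcher = SequenceMatcher(
        None, visible_words(original_html), visible_words(suspect_html), autojunk=False
    )
    return matcher.ratio()


# Invented sample pages: same wording wrapped in different markup.
mine = "<p>Duplicate content can harm organic search rankings.</p>"
theirs = "<div><span>Duplicate content can harm organic search rankings.</span></div>"

print(round(similarity(mine, theirs), 2))
```

Comparing words rather than raw HTML means a copy that has simply been dropped into a different page template still scores highly.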
Contact the offending website:
- Send a polite email to the owner of the website asking them to remove the duplicate content
- Check the website to see if the content has been removed.
- If so, great. If not then send a “Cease and Desist” order.
- If this works, great. If not - then take it to the next step.
- Contact the host of the offending website – send the “Cease and Desist” order and the evidence that you have gathered.
File a DMCA complaint (against an authority site only) - contact Google and other search engines such as Bing and Yahoo explaining the situation and request that they remove the offending site from their indexes.
If you notice that your content has been duplicated by an authoritative site, for example the BBC, then don’t take action to get it removed. Take it as a compliment and contact them to ask for an attribution link because, let’s face it, you will never outrank them. They should link to you.