I confess: when I’m carrying out a technical audit on a website, I basically act like I’m running a police investigation. I know that there will be mysteries to solve, and it’s my job to find the clues that will lead me in the right direction.
And with the right tools in hand, I’ll usually sniff something out when I get to the strange occurrences of rel=canonical.
In a nutshell, rel=canonical is a way to clean up duplicate URLs on a website. I know what you’re thinking: it would be much easier if duplicate content just didn’t exist at all. That would make my job all sunshine, rainbows and flowers rather than the sweat and tears it generally involves, but this is the real world and duplicate content is sometimes unavoidable.
This is especially true of ecommerce sites, which pose some of the most complex mysteries for SEO forces all across the nation. The way that many of these sites present information or products to users means that some pretty wacky things can happen to the URL – all designed to provide the most relevant results to users through the use of parameters.
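To see how parameters multiply URLs, here’s a minimal Python sketch that collapses parameter variants of a listing page into a single key. It assumes a hypothetical set of parameters (sort, view, sessionid and two utm_ tracking tags) that change how a page is presented or tracked but not which products it shows; which parameters actually matter varies from site to site, so treat the list as a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change how a listing is sorted, displayed or
# tracked, but not which products it contains. Replace with your own.
NON_CONTENT_PARAMS = {"sort", "view", "sessionid", "utm_source", "utm_medium"}


def normalise(url: str) -> str:
    """Drop non-content parameters and sort the rest, so variants that load
    the same content map to the same key. Any #fragment is dropped too."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Return only the normalised keys that more than one URL maps to."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(normalise(url), []).append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


urls = [
    "https://www.example.com/dresses?colour=green",
    "https://www.example.com/dresses?colour=green&sort=price",
    "https://www.example.com/dresses?sort=newest&colour=green&sessionid=abc123",
    "https://www.example.com/dresses?colour=red",
]
for key, variants in group_variants(urls).items():
    print(key, "is loaded by", len(variants), "URLs")
```

Each group it prints is a candidate for a single preferred URL, which is exactly where rel=canonical comes in.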
Moz has a great guide on canonicalisation, which I’d urge you to read if you’re new to the concept; the purpose of this blog post is to guide you through proper implementation rather than to give a full explanation of what it is.
Alternatively, you could shimmy on over to Matt Cutts’ blog; in 2009 he wrote a post called “Learn about the Canonical Link Element in 5 minutes”, which is just as relevant today as it was back then.
Make sure to revisit this post once you’re familiar with the topic, as you’ll find it much more valuable then!
If I’ve captured the attention of your inner geek, sit back as I share some of my recommendations for rel=canonical best practice. The reality is that I’ve seen lots of cases recently where issues have gone undetected for far too long, and I want you to be able to check that you’re not being taken for a ride by your own website.
The first thing you’re going to need to do is identify the culprits that are causing duplicate content. My preferred sidekick for this job is the ever-dependable Screaming Frog SEO Spider.
Once you have performed a crawl, you should be able to use the overview report on the right-hand side of the tool to get a quick insight into where issues might be occurring. Is it showing results for duplicate page titles, URIs or meta descriptions? If so, these may indicate duplicate pages which all share the same content and metadata. Use this as a starting point for deeper investigation by manually visiting each version and checking its source code.
Scroll down to the ‘Directives’ folder to see what the tool is picking up in terms of canonicalisation for more quick hints. However, it’s the main ‘Directives’ tab in the top navigation that lets you really start drilling down into individual issues. At this point you may start to spot strange occurrences that require a bit of manual investigation. Or a lot.
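If you’d rather slice an exported crawl yourself, a sketch like the one below groups URLs that share a title or meta description. It assumes each row is a dictionary with hypothetical url, title and description keys; map these to whatever columns your export actually uses (for example after loading it with Python’s csv.DictReader).

```python
from collections import defaultdict


def find_duplicates(rows, field):
    """Group crawled URLs that share the same non-empty value for `field`."""
    by_value = defaultdict(list)
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            by_value[value].append(row["url"])
    # Keep only values shared by more than one URL.
    return {value: urls for value, urls in by_value.items() if len(urls) > 1}


# Hypothetical crawl rows; the field names are placeholders for your export's columns.
rows = [
    {"url": "https://www.example.com/red-dresses",
     "title": "Red Dresses | Example", "description": "Shop our red dresses."},
    {"url": "https://www.example.com/red-dresses?sort=price",
     "title": "Red Dresses | Example", "description": "Shop our red dresses."},
    {"url": "https://www.example.com/green-dresses",
     "title": "Green Dresses | Example", "description": "Shop our green dresses."},
]

for field in ("title", "description"):
    for value, urls in find_duplicates(rows, field).items():
        print(f"Duplicate {field} {value!r}:")
        for url in urls:
            print("   ", url)
```

Matching titles don’t prove the pages are duplicates, so the output is a list of suspects to check by hand, not a verdict.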
But then it does help to know what you’re actually looking for. Here are the common reasons why multiple URLs can load the same content:
When these issues occur, it’s important to choose a preferred URL for indexation by search engines. This is where the rel=canonical link comes in.
As a side note, there are other ways you can do this, including using 301 redirects, indicating how search engines should handle dynamic parameters, and so on, but that deserves a post of its own; it’s something I’ll come back to in the near future.
The Google Webmaster Central blog has a great summary of rel=canonical:
“Including a rel=canonical link in your webpage is a strong hint to search engines about your preferred version to index among duplicate pages on the web. It’s supported by several search engines, including Yahoo!, Bing, and Google. The rel=canonical link consolidates indexing properties from the duplicates, like their inbound links, as well as specifies which URL you’d like displayed in search results.”
The whole purpose of indicating a preferred URL with the rel=canonical link element is so that search engines are more likely to show users your chosen URL structure as opposed to any duplicates. It is important to remember that rel=canonical elements can be ignored, especially when there are conflicting instructions, making accurate implementation all the more important.
Check out this example from the Google Webmaster Central blog; it sums up correct implementation pretty well:
Suppose you want https://blog.example.com/dresses/green-dresses-are-awesome/ to be the preferred URL, even though a variety of URLs can access this content. You can indicate this to search engines as follows:
Mark up the canonical page and any other variants with a rel=”canonical” link element.
Add a <link> element with the attribute rel=”canonical” to the <head> section of these pages:
<link rel="canonical" href="https://blog.example.com/dresses/green-dresses-are-awesome/" />
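To check that the canonical page and its variants all agree, you could fetch each version and compare the href it declares with the preferred URL from the Google example. This Python sketch uses the standard library’s html.parser on hypothetical <head> snippets rather than live pages; in practice you’d feed it the HTML you’ve downloaded for each variant.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated tokens, in any case.
        rel = (attrs.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.hrefs.append(attrs.get("href"))


def canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


PREFERRED = "https://blog.example.com/dresses/green-dresses-are-awesome/"

# Hypothetical variants of the same post, each with its <head> markup.
pages = {
    PREFERRED: f'<head><link rel="canonical" href="{PREFERRED}" /></head>',
    PREFERRED + "?utm_source=newsletter":
        f'<head><link rel="canonical" href="{PREFERRED}" /></head>',
    "https://blog.example.com/green-dresses-are-awesome.html":
        "<head><title>No canonical here</title></head>",
}

for url, html in pages.items():
    found = canonicals(html)
    status = "OK" if found == [PREFERRED] else f"CHECK (found {found})"
    print(f"{status}: {url}")
```

Note that the comparison is exact, so a missing trailing slash counts as a mismatch; that’s deliberate, because the two forms are different URLs.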
Whilst the concept of rel=canonical is easy enough to understand, it’s the implementation that can cause strange occurrences that require investigation (and probably a headache or two along the way).
Webmasters and SEOs make some common mistakes when it comes to rel=canonical, but there are already some excellent blog posts and guides out there which may prove immensely helpful for you. Start off with 5 common mistakes with rel=canonical from the Webmaster Central Blog, and then read through Yoast’s rel=canonical: what it is and how (not) to use it.
To help you avoid the common mistakes, I’ve put together a helpful list of 7 things you should remember when implementing rel=canonical. You can refer back to this blog post, or grab the PDF version here: PDF of rel=canonical guide
When more than one rel=canonical is specified on a page, search engines are likely to ignore all of them! This can happen with some SEO plugins that insert a default rel=canonical link alongside one you’ve added yourself, so be sure to understand what plugins you have installed and how they behave.
It’s possible to insert a relative URL into the <link> tag, but this almost certainly won’t do what you want it to. A relative URL contains only a path that is “relative” to the current page, so a value such as www.example.com/red-dresses (with no protocol) is treated as a path on the current site rather than as a separate address. To avoid this, use an absolute URL and add in the lot, including http:// (or https://).
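A quick way to catch duplicated tags is simply to count them. In this Python sketch (standard library only), the sample markup is hypothetical: a plugin-style default tag pointing at the homepage, followed by one added manually.

```python
from html.parser import HTMLParser


class CanonicalCounter(HTMLParser):
    """Records the href of every rel=canonical link it meets."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.hrefs.append(attrs.get("href"))


def check_single_canonical(html):
    """Return 'ok' for exactly one canonical, otherwise describe the problem."""
    counter = CanonicalCounter()
    counter.feed(html)
    counter.close()
    n = len(counter.hrefs)
    if n == 0:
        return "missing"
    if n == 1:
        return "ok"
    return f"{n} canonical tags found: {counter.hrefs}"


plugin_and_manual = """
<head>
  <link rel="canonical" href="https://www.example.com/" />
  <link rel="canonical" href="https://www.example.com/red-dresses" />
</head>
"""
print(check_single_canonical(plugin_and_manual))
```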
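You can see the effect with Python’s urllib.parse.urljoin, which applies standard relative-URL resolution. This is only an approximation of what a crawler does, and the page and href values are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_canonical(page_url, href):
    """Return the absolute URL `href` resolves to from `page_url`,
    plus whether `href` was already absolute (scheme and host present)."""
    parts = urlsplit(href)
    is_absolute = bool(parts.scheme and parts.netloc)
    return urljoin(page_url, href), is_absolute


page = "https://www.example.com/dresses/red-dresses"
for href in (
    "https://www.example.com/red-dresses",  # absolute: unambiguous
    "/red-dresses",                         # root-relative: depends on the current host
    "www.example.com/red-dresses",          # no protocol: treated as a relative path
):
    resolved, absolute = resolve_canonical(page, href)
    print(f"{href!r:40} -> {resolved}  ({'absolute' if absolute else 'relative'})")
```

The last case is the trap: without the protocol, the “domain” is just the first folder of a path on the current site.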
If you have a paginated series, you risk some content not being indexed if you specify page one as the preferred URL on every page. Put it this way: are pages two, three and beyond really duplicates of page one? It’s highly unlikely.
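Here’s a sketch of a check for that pattern. It assumes pagination uses a hypothetical ?page=N parameter and that page one carries no parameter at all; adapt page_number() if your site paginates differently (for example with /page/2/ paths).

```python
from urllib.parse import urlsplit, parse_qs


def page_number(url: str, param: str = "page") -> int:
    """Read the page number from a hypothetical ?page=N parameter; no parameter means page 1."""
    values = parse_qs(urlsplit(url).query).get(param)
    return int(values[0]) if values else 1


def canonicalised_to_page_one(page_url: str, canonical_url: str, param: str = "page") -> bool:
    """True when a deeper page in a series declares page one of the same listing as canonical."""
    same_listing = urlsplit(page_url).path == urlsplit(canonical_url).path
    return (
        same_listing
        and page_number(page_url, param) > 1
        and page_number(canonical_url, param) == 1
    )


checks = [
    ("https://www.example.com/dresses?page=3", "https://www.example.com/dresses"),         # flagged
    ("https://www.example.com/dresses?page=3", "https://www.example.com/dresses?page=3"),  # self-referencing: fine
    ("https://www.example.com/dresses", "https://www.example.com/dresses"),                # page one itself: fine
]
for page_url, canonical_url in checks:
    print(canonicalised_to_page_one(page_url, canonical_url), page_url)
```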
Rel=canonical designations in the <body> are disregarded, so it’s best to include the tag as early as possible in the <head>.
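To spot canonicals that have slipped out of the <head>, a parser can keep track of where it is in the document. This Python sketch is deliberately simple: it trusts the explicit <head> and <body> tags and doesn’t model how browsers implicitly close the <head> when they meet an element that isn’t allowed there, so treat a clean result as a hint rather than proof. The sample page is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalLocator(HTMLParser):
    """Records whether each rel=canonical link sits inside <head> or not."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # (href, "head" or "outside head")

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False
        elif tag == "link":
            attrs = dict(attrs)
            if "canonical" in (attrs.get("rel") or "").lower().split():
                where = "head" if self.in_head else "outside head"
                self.found.append((attrs.get("href"), where))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


html = """
<html>
<head><title>Red dresses</title></head>
<body>
  <link rel="canonical" href="https://www.example.com/red-dresses" />
</body>
</html>
"""
locator = CanonicalLocator()
locator.feed(html)
locator.close()
for href, where in locator.found:
    print(where, href)
```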
If your site can load on both http and https versions, check that you don’t have an automatically generated self-referencing rel=canonical on each. This could mean that https://www.example.com/red-dresses and http://www.example.com/red-dresses each declare themselves as the preferred version.
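Given a crawl of both protocols, a check along these lines flags pages whose http and https versions each name themselves as canonical. The input format, a dictionary mapping each crawled URL to the canonical it declares, is an assumption about how you’ve stored your crawl data, and the URLs are hypothetical.

```python
from urllib.parse import urlsplit


def scheme_conflicts(canonical_map):
    """canonical_map: {crawled URL: declared canonical URL}.
    Returns (http, https) pairs where both versions are self-canonical."""
    conflicts = []
    for url, canonical in canonical_map.items():
        parts = urlsplit(url)
        # Start from self-canonical https pages and look for an http twin.
        if parts.scheme != "https" or canonical != url:
            continue
        http_twin = parts._replace(scheme="http").geturl()
        if canonical_map.get(http_twin) == http_twin:
            conflicts.append((http_twin, url))
    return conflicts


crawl = {
    "https://www.example.com/red-dresses": "https://www.example.com/red-dresses",
    "http://www.example.com/red-dresses": "http://www.example.com/red-dresses",
    "https://www.example.com/green-dresses": "https://www.example.com/green-dresses",
    "http://www.example.com/green-dresses": "https://www.example.com/green-dresses",
}
for http_url, https_url in scheme_conflicts(crawl):
    print("Both self-canonical:", http_url, "and", https_url)
```

Here the green dresses page is fine, because its http version points at the https one; only the red dresses pair is flagged.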
It’s fairly obvious that you want the search engines to index URLs that provide actual value and a positive experience to users…
It helps if you pick a preference for use across the site to minimise the chances of referencing a URL in this way; ensure it is included in all internal links and within the rel=canonical tag element.
This is something I learned from the Yoast blog post referenced above. He has put it quite eloquently, so I’ve included it here for your reference:
“If you share a URL on Facebook that has a canonical pointing elsewhere, Facebook will share the details from the canonical URL. In fact, if you add a like button on a page that has a canonical pointing elsewhere, it will show the like count for the canonical URL, not for the current URL. Twitter works in the same way.”
Now it’s your turn to get on the case and investigate whether your own site has any of these issues with rel=canonical. I’d love to hear if you uncover any hidden culprits, and I’m also happy to put on my investigator hat to answer any questions you may have on the topic – please leave me a comment below or get in touch through Twitter.
Hopefully we can then utter a collective “case closed”, and move our focus to other technical issues instead!