Canonical URLs: Telling Search Engines Which URL to Use

Most pages on a website can be reached through more than one URL. This is extremely useful in certain situations, such as when you want to view your page before DNS changes for the domain have propagated, but it can sometimes cause undesired effects. Luckily, most search engines, including Google and Bing, understand that a page can have multiple working URLs, so they are usually able to determine which URL is most likely the canonical URL. Simply put, the canonical URL is the preferred URL for a page.

Examples of URLs for a Single Page

Depending on the setup of a particular website, it may be possible to visit the same page by using the following example URLs:

  • primarydomain.com
  • www.primarydomain.com
  • 10.0.0.2/~username/
  • primarydomain.com/index.php
  • www.primarydomain.com/index.php
  • 10.0.0.2/~username/index.php

If the domain is an addon domain, it may be possible to view the same page through these hypothetical URLs:

  • addondomain.com
  • www.addondomain.com
  • primarydomain.com/addondomain.com
  • addondomain.primarydomain.com
  • 10.0.0.2/~username/addondomain.com
  • addondomain.com/index.php
  • www.addondomain.com/index.php
  • primarydomain.com/addondomain.com/index.php
  • addondomain.primarydomain.com/index.php
  • 10.0.0.2/~username/addondomain.com/index.php
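
To illustrate how several of these variants point at one page, here is a minimal Python sketch that collapses the www and index.php variants onto a single preferred URL. The preferred host, the http:// scheme, and the index file names are assumptions made for the example, and the canonicalize function is hypothetical. Temporary IP-based URLs and addon-domain subfolder URLs keep their host, because mapping them back to a domain depends on your hosting setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred host; substitute whichever form you prefer.
PREFERRED_HOST = "primarydomain.com"
# Assumed directory index file names; your server may use others.
INDEX_FILES = ("index.php", "index.html")


def canonicalize(url: str, preferred_host: str = PREFERRED_HOST) -> str:
    """Collapse www/non-www and index-file variants onto one URL."""
    # The example URLs above have no scheme, so assume http:// when missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Note: using .hostname drops any port or credentials in this sketch.
    host = (parts.hostname or "").lower()
    # Treat www.<preferred> and <preferred> as the same site.
    if host in (preferred_host, "www." + preferred_host):
        host = preferred_host
    path = parts.path or "/"
    # Drop a trailing index file such as index.php.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

For example, canonicalize("www.primarydomain.com/index.php") returns http://primarydomain.com/, the same result as canonicalize("primarydomain.com").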

How Search Engines Guess the Canonical (Preferred) URL

It is important to note that even though several URLs exist for the same file, most search engines (as well as your visitors) will never encounter some of them or even know that they exist. For example, unless you share it with someone else, the temporary URL that your host gives you, which contains your IP address and username, is known only to you and will not be listed in any search results for your domain name. Otherwise, search engines are able to find URLs in several ways:

  • They find a link to your page on a web page they already know about.
  • They find the link or URL in a site map or an RSS feed.
  • The link or URL is submitted to them directly, usually via their website.
  • Someone visits your page while using that search engine's browser toolbar.
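
Because a site map is one of the places search engines read URLs from, it can help to see exactly which URLs yours exposes. The following is a rough Python sketch that lists the URLs in a site map written in the standard sitemaps.org XML format. The sample site map content and the sitemap_urls function name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical site map content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://primarydomain.com/</loc></url>
  <url><loc>http://www.primarydomain.com/index.php</loc></url>
</urlset>
"""


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a sitemaps.org-style site map."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

In this sample, the site map lists two different URLs for the same home page, which is the kind of mixed signal you would want to clean up.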

Once search engines find out about the page, they compare it with other pages that appear to be identical or almost exact matches in order to spot duplicates. If they spot a duplicate page, they then try to figure out which URL should be the canonical URL. Although they keep their exact algorithms secret, there are some things that they are known to check:

  • Which URL is most commonly used to link to the site
  • Which URL is used in the site map and RSS feeds
  • Whether a canonical URL is specified with a rel="canonical" link element in the head of site pages (often loosely called a canonical meta tag)
  • Whether the URL redirects to another URL
  • For Google specifically, whether a canonical URL is specified in Google Webmaster Tools (since renamed Google Search Console)
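
As an illustration of the canonical link element mentioned in the list above, here is a small Python sketch that reads a page's HTML and reports the canonical URL it declares, if any. The find_canonical function, the CanonicalLinkParser class, and the sample HTML are hypothetical and only meant to show what the declaration looks like; they do not reflect how any search engine actually processes pages.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        # Attributes without a value arrive as None, so normalize to "".
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page markup, for illustration only.
SAMPLE_PAGE = (
    '<html><head><link rel="canonical" href="http://primarydomain.com/">'
    "</head><body>Home</body></html>"
)
```

Calling find_canonical(SAMPLE_PAGE) returns http://primarydomain.com/, while a page without the element yields None.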

Resources

Here are some useful resources from Google and Bing regarding canonical URLs: