Canonicalization can be a challenging concept to understand (and hard to pronounce: “ca-non-ick-cull-eye-zay-shun”), but it’s essential to creating an optimized website.
The fundamental problems that canonicalization can fix stem from multiple uses for a single piece of writing–a paragraph or, more often, an entire page of content–that appears in multiple locations on one website or on multiple websites.
For search engines, this presents a conundrum: Which version of this content should they show to searchers? SEOs refer to this issue as duplicate content.
To provide the best user experience, search engines rarely show multiple duplicate pieces of content, and thus are forced to choose which version is most likely to be the original (or best).
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when the same content is reachable at different URLs, links that are intended to go to one page get split up among those URLs, and the page's popularity is split up along with them. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem. The following lists show the most common canonicalization errors that can be produced when using the default settings on the two most common web servers:
Each of these URLs spreads out the value of inbound links to the homepage. This means that if links to the homepage point to these various URLs, the major search engines credit each URL separately rather than combining the credit.
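The idea of collapsing several homepage URLs into one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name canonical_url, the preferred host www.example.com, and the list of default document names are all hypothetical choices, not settings taken from any particular web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the default document names we want to hide.
# Real servers may be configured with different names.
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonical_url(url, preferred_host="www.example.com"):
    """Map common duplicate forms of a URL onto one preferred form."""
    parts = urlsplit(url)

    # Treat the bare domain and the "www" domain as the same site.
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # An empty path and "/" name the same resource.
    path = parts.path or "/"

    # Drop a trailing default document such as /index.html.
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    # Keep the query string, discard any fragment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://example.com",
    "http://example.com/index.html",
    "http://www.example.com/",
    "http://www.example.com/index.html",
]
# All four variants collapse to a single URL.
unique = {canonical_url(v) for v in variants}
```

A mapping like this is the same decision a server redirect makes: every variant is sent to one address so that links are credited to a single URL.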
Luckily for SEOs, web developers developed methods for redirection so that URLs can be changed and combined. Two primary types of server redirects exist:
Though the difference appears to be merely semantic, the actual results are dramatic. Google does not pass link juice (ranking power) equally between normal links and server redirects.
Canonicalization is not limited to the inclusion of alphanumeric characters. It also covers trailing forward slashes in URLs. If a web surfer goes to http://www.google.com they will automatically get redirected to http://www.google.com/ (notice the trailing forward slash). This happens because technically the latter is the correct format for the URL. Although this is a problem that is largely solved by the search engines already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. By doing this, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway here is that whenever possible, it is better to internally link to the version with the trailing slash.
One common canonicalization mistake is accidentally creating an infinite loop between http://www.example.com and http://www.example.com/index.html. The solution to this common glitch is discussed in this post about redirecting an index file to your domain without looping.
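The linked post is not reproduced here, so the following is only a rough sketch of the general idea, not that post's solution. The loop typically appears when the server internally serves index.html for a request to "/" and the redirect rule then fires on that internal path. One way to avoid it is to base the decision on the path the visitor actually asked for. The names redirect_for and follow_redirects are hypothetical.

```python
def redirect_for(requested_path):
    """Decide on a redirect using the path the client originally requested.

    Returns the new location for a 301 redirect, or None when the page
    should simply be served.
    """
    suffix = "/index.html"
    if requested_path.lower().endswith(suffix):
        # Keep everything up to and including the final slash.
        return requested_path[: len(requested_path) - len("index.html")]
    return None


def follow_redirects(path, max_hops=5):
    """Follow redirects from a starting path and report every hop."""
    hops = [path]
    while True:
        target = redirect_for(hops[-1])
        if target is None:
            return hops
        if len(hops) > max_hops:
            raise RuntimeError("redirect loop detected")
        hops.append(target)


# A request for the index file is redirected once, then served.
chain = follow_redirects("/index.html")
```

Because the rule never fires on "/" itself, a request for the domain root is served directly and the chain ends after at most one hop.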
Another option for dealing with duplicate content is to utilize the rel=canonical tag. The rel=canonical tag passes the same amount of link juice (ranking power) as a 301 redirect, and often takes much less development time to implement.
The tag is part of the HTML head of a web page. It is a link element rather than a meta tag, and the element itself isn’t new; like nofollow, it simply uses a new value for the rel attribute. For example:
<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
This tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/canonical-version-of-page/ and that all of the links and content metrics the engines apply should actually be credited toward the provided URL.
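To check which URL a page declares as canonical, the tag can be read back out of the HTML. The sketch below uses Python's standard html.parser module; the class name CanonicalFinder and the helper find_canonical are hypothetical names chosen for this example, and it makes no claim about how any search engine actually processes the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    "<html><head>"
    '<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />'
    "</head><body>Duplicate copy of the page</body></html>"
)
declared = find_canonical(page)
```

Running a check like this across a set of duplicate URLs is a quick way to confirm that they all point to the same canonical version.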