The Parse.ly dashboard relies heavily on a canonical URL value provided on each page. This value serves as the unique identifier for that content and as the key around which multiple URLs can be grouped for tracking.
The Parse.ly canonical URL is the URL considered to be the source of truth for the metadata about a particular post or page. That value is specified as the url property in the JSON-LD tag or the parsely-link value if using repeated meta tags. When Parse.ly receives a page view from a URL, the canonical value is checked to see if metadata on that URL already exists in Parse.ly’s system. If not, then the URL is crawled and the metadata stored, with the URL serving as a new identifier in Parse.ly’s data.
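As a rough illustration of the two ways to declare the canonical value, the Python sketch below renders a JSON-LD tag with a url property and a repeated meta tag named parsely-link. The helper names, the NewsArticle type, and the headline field are assumptions chosen for the example; your templates will likely carry more metadata than this.

```python
import json
from html import escape


def build_jsonld_tag(canonical_url: str, headline: str) -> str:
    """Render a JSON-LD script tag whose url property holds the canonical URL.

    Hypothetical helper: the schema type and the headline field are
    illustrative assumptions, not a complete metadata set.
    """
    metadata = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": canonical_url,
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(metadata).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


def build_meta_tag(canonical_url: str) -> str:
    """Render the repeated-meta-tag form of the same canonical value."""
    return f'<meta name="parsely-link" content="{escape(canonical_url, quote=True)}" />'


if __name__ == "__main__":
    print(build_jsonld_tag("https://example.com/article", "Example headline"))
    print(build_meta_tag("https://example.com/article"))
```

Whichever form you use, the point is that the same URL string appears on every page that should be tracked as that piece of content.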
It’s important to know that the Parse.ly canonical URL is not the same value as the rel canonical element also set on most pages. The Parse.ly canonical tag refers to the url value provided in the JSON-LD or repeated meta tags, not the value of the rel canonical element. It is possible to set a different URL as the rel canonical for SEO purposes than the value of the Parse.ly canonical.
As noted below, it is possible to track a group of URLs as a single entry in Parse.ly; this happens most often for photo galleries or across channels. Parse.ly, though, will only retrieve content from the page designated as the canonical URL. For example, you might have the same piece of content located at example.com/article and m.example.com/article. Setting the same Parse.ly canonical URL in the metadata on both pages will ensure that both pages are tracked as a single entity in the Parse.ly dashboard. The same scenario would be true if you decided to serve http and https versions of the same article – again, set the same URL value in the metadata on each page to make sure they track as a single page.
The canonical URL allows Parse.ly to aggregate data across all URLs that share a common canonical URL. For more details about how this works, read about how the Parse.ly Crawler works.
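The grouping idea can be pictured with a small Python sketch. It is only a conceptual model, not Parse.ly’s implementation: the function name, the shape of the inputs, and the fallback of treating a URL with no declared canonical as its own entry are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_by_canonical(
    page_views: Iterable[str], canonical_for: Dict[str, str]
) -> Counter:
    """Count page views per canonical URL.

    page_views: the URL of each recorded view (hypothetical input).
    canonical_for: maps a page URL to the canonical URL declared in its
    metadata. URLs missing from the map are counted under themselves,
    which is an assumption of this sketch.
    """
    totals: Counter = Counter()
    for url in page_views:
        totals[canonical_for.get(url, url)] += 1
    return totals


if __name__ == "__main__":
    canonical = "https://example.com/article"
    declared = {
        "https://example.com/article": canonical,
        "https://m.example.com/article": canonical,
        "http://example.com/article": canonical,
    }
    views = [
        "https://example.com/article",
        "https://m.example.com/article",
        "http://example.com/article",
    ]
    print(aggregate_by_canonical(views, declared))
```

In this model, the desktop, mobile, and http views all land under one entry because each page declares the same canonical value. If one of them declared a slightly different string, it would show up as a second entry.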
One of the most common mistakes is omitting or incorrectly specifying the canonical URL. Any variations in the canonical URL will result in duplicate posts and skew your data in the Parse.ly dashboard. Commonly seen errors here include:
- URLs with and without
- different URLs for website vs AMP pages
Note that it is possible to specify an ID value in your metadata rather than a canonical URL. When provided, a post ID takes precedence over the canonical URL value, and we group your articles by that ID instead. We discourage using ID values where possible, because grouping articles by canonical URL is a simpler and more reliable implementation.
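The precedence rule described above can be sketched in a few lines of Python. The PageMetadata class, its field names, and the grouping_key function are hypothetical and exist only to illustrate the rule; they do not reflect Parse.ly’s internal data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PageMetadata:
    """Hypothetical container for the two values a page may declare."""

    canonical_url: str
    post_id: Optional[str] = None


def grouping_key(meta: PageMetadata) -> Tuple[str, str]:
    """Return the key a page is grouped under.

    A post ID, when present, takes precedence over the canonical URL.
    The key is tagged with its kind so an ID can never collide with a URL.
    """
    if meta.post_id:
        return ("id", meta.post_id)
    return ("url", meta.canonical_url)


if __name__ == "__main__":
    with_id = PageMetadata("https://example.com/article", post_id="12345")
    without_id = PageMetadata("https://example.com/article")
    print(grouping_key(with_id))
    print(grouping_key(without_id))
```

One consequence worth noticing in this model: two pages that share a canonical URL but declare different IDs would be grouped separately, which is part of why relying on the canonical URL alone tends to be simpler.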
Canonical URLs and aliases
Parse.ly doesn’t just track individual URLs; it groups together URLs that refer to the same post. This grouping makes tracking your content simpler and more accurate. Every page has a Parse.ly canonical URL, but it can also have additional URLs that should be tracked with the canonical. Parse.ly refers to those URLs as aliases; these pages should be considered logically equivalent to the canonical URL.
Here’s an example of an article about the changing demographics of society that appeared on the web across many URLs:
The URL structure itself reveals that all of the URLs essentially are from the same article. The first six URLs are portions of the entire article, and then there are three alternative formats – printable, single page and mobile versions. To make sure they’re all grouped together, each page should have the Parse.ly canonical URL set to the article URL (http://www.theatlantic.com/magazine/archive/2011/11/all-the-single-ladies/8654/, in this case).
Metadata provided on alias URLs will be ignored; the Parse.ly canonical URL provides metadata for each page. For more details, read about how the Parse.ly Crawler works.
It’s true that most traffic for this article typically ends up at the article URL, but tracking the post across all its incarnations is important, especially when you consider social media and search channels. For example, a search engine might get a hit on the third page of a multi-page article for certain keywords. Your Twitter audience might choose to tweet the AMP version of the article or a mobile URL.
Last updated: August 16, 2023