Metadata Help
What forms of metadata does Parse.ly accept? What should I include in my metadata so that it appears in the Parse.ly dashboard?
We accept the three formats listed in our metadata documentation.
There should be a value for each field, as shown in the example on that page. Complete metadata lets you get the most value and data out of the Parse.ly dashboard.
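As a rough illustration of checking completeness before publishing, the Python sketch below reports which fields are missing or empty in a metadata dictionary. The field names in EXPECTED_FIELDS are assumptions modeled on schema.org NewsArticle properties; the authoritative list of fields is in the metadata documentation.

```python
# Assumed field names for illustration only; check the Parse.ly metadata
# documentation for the exact fields your chosen format uses.
EXPECTED_FIELDS = ("headline", "url", "thumbnailUrl", "datePublished",
                   "articleSection", "creator", "keywords")


def missing_fields(metadata: dict) -> list[str]:
    """Return the expected fields that are absent or empty in `metadata`."""
    missing = []
    for field in EXPECTED_FIELDS:
        value = metadata.get(field)
        # Treat an absent key, None, an empty string, or an empty list as "no value".
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing


post = {
    "headline": "Example headline",
    "url": "https://example.com/article",
    "articleSection": "Sports",
    "creator": ["Jane Doe"],
    "keywords": [],
}
print(missing_fields(post))  # ['thumbnailUrl', 'datePublished', 'keywords']
```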
What are the most common metadata mistakes?
The Parse.ly Canonical URL
This is the URL that Parse.ly considers the source of truth for the metadata about a particular post or page. It is specified as the “url” property in a JSON-LD tag.
A single piece of content may have multiple URLs associated with it, but Parse.ly will only retrieve content from the one designated as the canonical URL. For example, you might have the URL example.com/article on your desktop site, while the mobile version is at m.example.com/article. That’s fine, as long as the URL in the metadata is the same text string for both. Similarly, you might serve two different versions of an article: one with an “http” URL, and one with “https.” Again, no problem; just make sure they both have the same Parse.ly canonical URL in their metadata.
The canonical URL allows us to aggregate data across all URLs that share a common canonical URL. For more details, read about how the Parse.ly Crawler works.
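The roll-up can be pictured with a toy example: pageviews recorded on three different URLs all count toward one post because each page declares the same canonical URL. This is a conceptual Python sketch with made-up records, not a description of Parse.ly's internal pipeline.

```python
from collections import defaultdict

# Made-up pageview records: the URL the reader visited, plus the canonical
# URL declared in that page's metadata.
pageviews = [
    {"visited": "https://example.com/article", "canonical": "https://example.com/article"},
    {"visited": "https://m.example.com/article", "canonical": "https://example.com/article"},
    {"visited": "http://example.com/article", "canonical": "https://example.com/article"},
    {"visited": "https://example.com/other", "canonical": "https://example.com/other"},
]


def views_by_canonical(records):
    """Count pageviews per canonical URL, ignoring which variant was visited."""
    totals = defaultdict(int)
    for record in records:
        totals[record["canonical"]] += 1
    return dict(totals)


print(views_by_canonical(pageviews))
# {'https://example.com/article': 3, 'https://example.com/other': 1}
```

If the mobile page had instead declared https://m.example.com/article as its canonical URL, the same article would be counted as two separate posts.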
One of the most common mistakes is omitting or incorrectly specifying the canonical URL. Any variation in the canonical URL, such as:
- http vs. https
- URLs with and without a trailing slash (/)
- different URLs for website vs. AMP pages
will result in duplicate posts and skew your data in the Parse.ly dashboard.
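One way to avoid these variations is to generate the canonical URL from a single helper used by every template (desktop, mobile, AMP). The sketch below assumes a hypothetical site where the canonical form uses https, mobile pages live on an “m.” subdomain, AMP pages append /amp to the path, and canonical paths carry no trailing slash; adapt these rules to your own URL scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map URL variants of one article onto a single canonical string.

    Hypothetical site conventions assumed here:
      * the canonical scheme is https
      * mobile pages use an "m." subdomain of the desktop host
      * AMP pages add a trailing "/amp" path segment
      * canonical paths have no trailing slash (except the site root)
    Query strings, fragments, and any port are dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and excludes the port
    if host.startswith("m."):
        host = host[2:]
    path = parts.path.rstrip("/")
    if path.endswith("/amp"):
        path = path[: -len("/amp")]
    return urlunsplit(("https", host, path or "/", "", ""))


for variant in ("http://example.com/article",
                "https://example.com/article/",
                "https://m.example.com/article",
                "https://example.com/article/amp"):
    print(canonical_url(variant))  # https://example.com/article (all four)
```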
Note that the criteria Parse.ly uses to identify the canonical URL differ from the common usage of the term: we rarely rely on the value of the <link rel="canonical"> tag.
Metadata
Invalid metadata as a result of small errors is another common problem. Our documentation outlines your metadata formatting options (we recommend JSON-LD). A single error in the metadata tag such as:
- a missing quotation mark
- an unescaped special character
- a field name in the wrong case
- a relative URL
may prevent an article from registering its metadata properly, causing it to show up incorrectly as an index or no-metas page in your dashboard. In particular, escape any double quotes inside your metadata values; in JSON-LD, write them as \".
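Several of these errors can be caught before a page ships. The Python sketch below parses a JSON-LD string with the standard json module (which rejects missing or unescaped quotation marks), flags field names that differ from an expected name only by case, and flags relative URLs. KNOWN_FIELDS and URL_FIELDS are illustrative assumptions, not an official Parse.ly schema.

```python
import json
from urllib.parse import urlsplit

# Assumed field names for illustration (schema.org NewsArticle style).
KNOWN_FIELDS = {"@context", "@type", "headline", "url", "thumbnailUrl",
                "datePublished", "articleSection", "creator", "keywords"}
URL_FIELDS = ("url", "thumbnailUrl")


def check_jsonld(text: str) -> list[str]:
    """Return a list of problems found in a JSON-LD metadata string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Missing or unescaped quotation marks usually surface here.
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    # Map lowercase spellings back to the expected spelling.
    expected = {name.lower(): name for name in KNOWN_FIELDS}
    for key in data:
        if key not in KNOWN_FIELDS and key.lower() in expected:
            problems.append(f"field {key!r} should be spelled {expected[key.lower()]!r}")
    for field in URL_FIELDS:
        value = data.get(field)
        # A URL without a scheme (e.g. "/article") is relative.
        if isinstance(value, str) and not urlsplit(value).scheme:
            problems.append(f"{field} is relative: {value!r}")
    return problems


meta = {"headline": 'Coach says "we will win"', "url": "https://example.com/article"}
print(check_jsonld(json.dumps(meta)))  # [] because json.dumps escapes the inner quotes
print(check_jsonld('{"headline": "Coach says "we will win""}'))  # reports invalid JSON
```

Generating the tag with json.dumps escapes embedded double quotes automatically. It does not escape the sequence </, so values that might contain </script> need extra handling before being placed inside a <script> tag.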
Article Section Value
You can list only one value for an article’s section, though you can list up to 100 values in the tags/keywords field, formatted as an array of strings. You can also list multiple authors, again as an array of strings. These values are case-sensitive, which is important to remember if you’re pulling data from our API. Examples are available in our metadata documentation.
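These rules can be expressed as a small pre-publish check. The sketch assumes the JSON-LD names articleSection, keywords, and creator for section, tags, and authors; confirm them against the metadata documentation.

```python
MAX_TAGS = 100  # tag limit stated in this article


def check_taxonomy(metadata: dict) -> list[str]:
    """Check section, tags, and authors against the rules above.

    Field names are assumed JSON-LD names used for illustration.
    """
    problems = []
    # Exactly one section, given as a plain string (not a list).
    if not isinstance(metadata.get("articleSection"), str):
        problems.append("articleSection must be a single string")
    # Tags and authors must be arrays whose items are all strings.
    for field in ("keywords", "creator"):
        values = metadata.get(field, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"{field} must be an array of strings")
    keywords = metadata.get("keywords", [])
    if isinstance(keywords, list) and len(keywords) > MAX_TAGS:
        problems.append(f"keywords has {len(keywords)} values; the limit is {MAX_TAGS}")
    return problems


print(check_taxonomy({"articleSection": ["Sports", "News"],
                      "keywords": "sports",
                      "creator": ["Jane Doe", "John Roe"]}))
```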
Many publishers want to track subsections in the tags field. The best way to do that is to separate the section/subsections with a colon in a single tag; for example “sports:football” or “sports:basketball:wnba”.
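A hypothetical helper for building such tags keeps the separator and casing consistent across pages. Lowercasing each level is a choice made in this sketch so that the same subsection always produces the same case-sensitive tag; it is not a documented Parse.ly requirement.

```python
def subsection_tag(*levels: str) -> str:
    """Join section/subsection names into one colon-separated tag."""
    cleaned = [level.strip().lower() for level in levels]
    # Empty levels or embedded colons would make the hierarchy ambiguous.
    if not cleaned or any(not level or ":" in level for level in cleaned):
        raise ValueError("each level must be non-empty and contain no colon")
    return ":".join(cleaned)


def tag_levels(tag: str) -> list[str]:
    """Split a colon-separated tag back into its levels."""
    return tag.split(":")


tag = subsection_tag("Sports", "Basketball", "WNBA")
print(tag)              # sports:basketball:wnba
print(tag_levels(tag))  # ['sports', 'basketball', 'wnba']
```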
Publication Date and UTC Time
An article’s publication date should be listed in ISO 8601 format, in UTC with no offset; for example: "pub_date": "2013-08-15T13:00:00Z" (the trailing Z denotes UTC). We’ll display these dates and times in your dashboard using your local timezone.
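If your CMS stores publication times in a local timezone, convert them before writing the metadata. A minimal sketch using only the Python standard library, assuming the stored datetime carries its UTC offset (here a fixed UTC-4, as in US Eastern Daylight Time in August):

```python
from datetime import datetime, timedelta, timezone


def to_pub_date(local: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO 8601 with a trailing 'Z'."""
    if local.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the UTC offset is known")
    utc = local.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


eastern_daylight = timezone(timedelta(hours=-4))
published = datetime(2013, 8, 15, 9, 0, tzinfo=eastern_daylight)
print(to_pub_date(published))  # 2013-08-15T13:00:00Z
```

Rejecting naive datetimes is deliberate: without an offset there is no reliable way to know which UTC instant was meant.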
Last updated: September 25, 2024