In a recrawl, we send our web crawler back to article URLs that already have metadata in our system so that the metadata on those records is updated. This is useful when, for example, you’ve configured your metadata incorrectly, or when you’ve added richer metadata and would like it reflected retroactively in your dashboard and API results.
A recrawl applies new metadata only to data from three days before the recrawl onward. To correct older historical data, you will need a rebuild of your historical data. More information is available in the documentation on rebuilds.
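To make the window concrete: a recrawl run on December 2 would update data from November 29 onward, and anything earlier would need a rebuild. The Python sketch below encodes that arithmetic. The function names are hypothetical, and treating the window as whole calendar days with the boundary day included (ignoring time zones) is an assumption, not documented Parse.ly behavior.

```python
from datetime import date, timedelta

# From the text: a recrawl reaches back three days before the recrawl.
# Whole calendar days, boundary included, no time zones: an assumption.
RECRAWL_LOOKBACK_DAYS = 3


def earliest_updated_day(recrawl_day: date) -> date:
    """Hypothetical helper: first day whose data a recrawl is expected to update."""
    return recrawl_day - timedelta(days=RECRAWL_LOOKBACK_DAYS)


def needs_rebuild(data_day: date, recrawl_day: date) -> bool:
    """Hypothetical helper: True if data_day falls before the recrawl window."""
    return data_day < earliest_updated_day(recrawl_day)


if __name__ == "__main__":
    recrawl = date(2022, 12, 2)
    print(earliest_updated_day(recrawl))  # 2022-11-29
```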
There are three ways in which existing metadata can be updated via the recrawl process:
Recrawl a single URL from the dashboard
A local site administrator can submit individual URLs for recrawling through a form in the Parse.ly dashboard. The form is accessible from the API Settings page in the admin.
Trigger a new article crawl from the Parse.ly API
Normally, articles are crawled during the first 24 hours after they are published, when their metadata is most likely to change. You can also notify our servers when a post is updated. To do that, submit a POST request to the following address, appending the URL of the post to the end of it. Replace <API_KEY> with your respective Site ID and <API_SECRET> with the token accessible to account administrators via API Settings.
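A minimal Python sketch of that request, using only the standard library. Here `endpoint` is a placeholder for the recrawl address with `<API_KEY>` and `<API_SECRET>` already substituted; the function names, the example endpoint, and percent-encoding the post URL before appending it are all assumptions, not documented Parse.ly behavior.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_recrawl_request(endpoint: str, post_url: str) -> Request:
    """Build (without sending) a POST request asking for post_url to be recrawled.

    `endpoint` is a hypothetical placeholder for the recrawl address with
    <API_KEY> and <API_SECRET> already substituted. Percent-encoding the
    post URL before appending it is an assumption.
    """
    return Request(endpoint + quote(post_url, safe=""), method="POST")


def send_recrawl_request(req: Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (raises on HTTP errors)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical endpoint: replace with the real recrawl address.
    endpoint = "https://api.example.com/recrawl?url="
    req = build_recrawl_request(endpoint, "https://blog.example.com/my-post")
    print(req.get_method(), req.full_url)  # inspect before sending
```

Building and sending are kept separate so the request can be inspected or logged before it goes out; calling `send_recrawl_request(req)` would actually contact the server.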
If your CMS supports webhooks, you may be able to automate this process as well.
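One way such automation might look, sketched in Python under heavy assumptions: a handler that parses a CMS webhook payload and, for post updates, calls whatever function sends the recrawl POST request described in this section. The payload shape (`event` and `url` fields), the `post.updated` event name, and the function name are hypothetical; real CMS webhooks differ, so map the fields to your CMS's format.

```python
import json
from typing import Callable


def handle_cms_webhook(body: bytes, trigger_recrawl: Callable[[str], None]) -> bool:
    """Hypothetical handler: request a recrawl when a CMS reports a post update.

    Assumes a JSON body like {"event": "post.updated", "url": "https://..."}.
    These field names are placeholders; adjust them to your CMS.
    Returns True if a recrawl was requested.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    if not isinstance(payload, dict) or payload.get("event") != "post.updated":
        return False
    url = payload.get("url")
    if not isinstance(url, str) or not url:
        return False
    # trigger_recrawl is expected to send the recrawl POST request.
    trigger_recrawl(url)
    return True
```

Passing the recrawl call in as `trigger_recrawl` keeps the handler independent of how the request is sent, which also makes it easy to test with a stub.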
Request a bulk recrawl from Parse.ly Support
If you have a large number of URLs that need to be updated, please contact Parse.ly Support to set up a bulk recrawl.
Last updated: December 02, 2022