Updating metadata via a recrawl
In a recrawl, we dispatch our web crawler to article URLs that already have metadata in our system so that it can update the metadata associated with those records. This is useful when, for example, you’ve configured your metadata incorrectly, or when you’ve added richer metadata retroactively and would like to see it reflected in your dashboard and API results.
Note
A recrawl applies new metadata only to data from the three days before the recrawl and to data collected afterward. To correct older historical data, a rebuild of your historical data is necessary. More information on rebuilds is available here.
There are five ways that we can update existing metadata with this process:
Recrawl via Automatic Metadata Change Detection
Read more about this opt-in feature that automatically detects and updates your metadata.
Recrawl via Check For Metadata Updates feature
Read more about how to manually check for metadata updates within each Post Details page.
Recrawl a single URL from the dashboard
A local site administrator can submit individual URLs to recrawl via a form found in the Parse.ly dashboard. This form is accessible via the API Settings page in the admin.
Trigger a new article crawl from the Parse.ly API
Normally, we crawl articles during the first 24 hours after publication, when the metadata is most likely to change. You can also notify our servers when you update a post. To do so, submit a POST request to the following address, appending the URL of the post to the end of it in place of <URL>. Replace <API_KEY> with your Site ID (apikey) and <API_SECRET> with the token that account administrators can find under API Settings.
https://dash.parsely.com/<API_KEY>/ping_crawl?secret=<API_SECRET>&url=<URL>
If your CMS supports webhooks, you may be able to automate this process as well.
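As a rough illustration, the request described above could be sent from a script or a CMS hook along the following lines. This is a minimal sketch in Python using only the standard library, not an official client. The function names, the example Site ID, and the example secret are hypothetical, and percent-encoding the post URL in the query string is an assumption based on ordinary HTTP practice rather than something stated in this article.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base address taken from the ping_crawl URL shown above.
PING_CRAWL_BASE = "https://dash.parsely.com"


def build_ping_crawl_url(api_key: str, api_secret: str, post_url: str) -> str:
    """Build the ping_crawl address for one post (hypothetical helper).

    Assumption: the secret and the post URL are percent-encoded as
    ordinary query-string values.
    """
    query = urlencode({"secret": api_secret, "url": post_url})
    return f"{PING_CRAWL_BASE}/{api_key}/ping_crawl?{query}"


def request_recrawl(api_key: str, api_secret: str, post_url: str,
                    timeout: float = 10.0) -> int:
    """Send the POST request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so
    callers may want to catch that exception and log or retry.
    """
    target = build_ping_crawl_url(api_key, api_secret, post_url)
    request = Request(target, data=b"", method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (placeholder values, not real credentials):
# request_recrawl("example.com", "my-secret", "https://example.com/news/story")
```

Keeping the URL construction separate from the network call makes it easy to check the address your integration generates before sending anything, for instance from a webhook handler that fires when a post is updated.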
Request a bulk recrawl from Parse.ly Support
If you have a large number of URLs that need updating, please contact support@parsely.com to set up a bulk recrawl.
Last updated: August 15, 2024